summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-06-23x86: avoid avoid passing around 'thread_info' in stack dumping codeLinus Torvalds
None of the code actually wants a thread_info, it all wants a task_struct, and it's just converting to a thread_info pointer much too early. No semantic change. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23locking: avoid passing around 'thread_info' in mutex debugging codeLinus Torvalds
None of the code actually wants a thread_info, it all wants a task_struct, and it's just converting back and forth between the two ("ti->task" to get the task_struct from the thread_info, and "task_thread_info(task)" to go the other way). No semantic change. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-23Btrfs: Force stripesize to the value of sectorsizeChandan Rajendra
Btrfs code currently assumes stripesize to be same as sectorsize. However Btrfs-progs (until commit df05c7ed455f519e6e15e46196392e4757257305) has been setting btrfs_super_block->stripesize to a value of 4096. This commit makes sure that the value of btrfs_super_block->stripesize is a power of 2. Later, it unconditionally sets btrfs_root->stripesize to sectorsize. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23btrfs: fix disk_i_size update bug when fallocate() failsWang Xiaoguang
When doing truncate operation, btrfs_setsize() will first call truncate_setsize() to set new inode->i_size, but if later btrfs_truncate() fails, btrfs_setsize() will call "i_size_write(inode, BTRFS_I(inode)->disk_i_size)" to reset the inmemory inode size, now bug occurs. It's because for truncate case btrfs_ordered_update_i_size() directly uses inode->i_size to update BTRFS_I(inode)->disk_i_size, indeed we should use the "offset" argument to update disk_i_size. Here is the call graph: ==>btrfs_truncate() ====>btrfs_truncate_inode_items() ======>btrfs_ordered_update_i_size(inode, last_size, NULL); Here btrfs_ordered_update_i_size()'s offset argument is last_size. And below test case can reveal this bug: dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100 dev=$(losetup --show -f fs.img) mkdir -p /mnt/mntpoint mkfs.btrfs -f $dev mount $dev /mnt/mntpoint cd /mnt/mntpoint echo "workdir is: /mnt/mntpoint" blocksize=$((128 * 1024)) dd if=/dev/zero of=testfile bs=$blocksize count=1 sync count=$((17*1024*1024*1024/blocksize)) echo "file size is:" $((count*blocksize)) for ((i = 1; i <= $count; i++)); do i=$((i + 1)) dst_offset=$((blocksize * i)) xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\ testfile > /dev/null done sync truncate --size 0 testfile ls -l testfile du -sh testfile exit In this case, truncate operation will fail for enospc reason and "du -sh testfile" returns value greater than 0, but testfile's size is 0, we need to reflect correct inode->i_size. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23Btrfs: fix error handling in map_private_extent_bufferLiu Bo
map_private_extent_buffer() can return -EINVAL in two different cases, 1. when the requested contents span two pages if nodesize is larger than pagesize, 2. when it detects something insane. The 2nd one used to be only a WARN_ON(1), and we decided to return a error to callers, but we didn't fix up all its callers, which will be addressed by this patch. Without this, btrfs may end up with 'general protection', ie. reading invalid memory. Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23Btrfs: fix error return code in btrfs_init_test_fs()Wei Yongjun
Fix to return a negative error code from the kern_mount() error handling case instead of 0(ret is set to 0 by register_filesystem), as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23Merge branches '4.7-rc-misc', 'hfi1-fixes', 'i40iw-rc-fixes' and ↵Doug Ledford
'mellanox-rc-fixes' into k.o/for-4.7-rc
2016-06-23IB/srpt: Reduce QP buffer sizeBart Van Assche
The memory needed for the send and receive queues associated with a QP is proportional to the max_sge parameter. The current value of that parameter is such that with an mlx4 HCA the QP buffer size is 8 MB. Since DMA is used for communication between HCA and CPU that buffer either has to be allocated coherently or map_single() must succeed for that buffer. Since large contiguous allocations are fragile and since the maximum segment size for e.g. swiotlb is 256 KB, reduce the max_sge parameter. This patch avoids that the following text appears on the console after SRP logout and relogin on a system equipped with multiple IB HCAs: mlx4_core 0000:05:00.0: swiotlb buffer is full (sz: 8388608 bytes) swiotlb: coherent allocation failed for device 0000:05:00.0 size=8388608 CPU: 11 PID: 148 Comm: kworker/11:1 Not tainted 4.7.0-rc4-dbg+ #1 Call Trace: [<ffffffff812c6d35>] dump_stack+0x67/0x92 [<ffffffff812efe71>] swiotlb_alloc_coherent+0x141/0x150 [<ffffffff810458be>] x86_swiotlb_alloc_coherent+0x3e/0x50 [<ffffffffa03861fa>] mlx4_buf_direct_alloc.isra.5+0x9a/0x120 [mlx4_core] [<ffffffffa0386545>] mlx4_buf_alloc+0x165/0x1a0 [mlx4_core] [<ffffffffa035053d>] create_qp_common.isra.29+0x57d/0xff0 [mlx4_ib] [<ffffffffa03510da>] mlx4_ib_create_qp+0x12a/0x3f0 [mlx4_ib] [<ffffffffa031154a>] ib_create_qp+0x3a/0x250 [ib_core] [<ffffffffa055dd4b>] srpt_cm_handler+0x4bb/0xcad [ib_srpt] [<ffffffffa02c1ab0>] cm_process_work+0x20/0xf0 [ib_cm] [<ffffffffa02c3640>] cm_work_handler+0x1ac0/0x2059 [ib_cm] [<ffffffff810737ed>] process_one_work+0x19d/0x490 [<ffffffff81073b29>] worker_thread+0x49/0x490 [<ffffffff8107a0ea>] kthread+0xea/0x100 [<ffffffff815b25af>] ret_from_fork+0x1f/0x40 Fixes: b99f8e4d7bcd ("IB/srpt: convert to the generic RDMA READ/WRITE API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23i40iw: Enable level-1 PBL for fast memory registrationShiraz Saleem
Set the chunk_size to enable level-1 PBL support when the fast memory page count is more than one. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23i40iw: Return correct max_fast_reg_page_list_lenFaisal Latif
Return correct value for max_fast_reg_page_list_len from i40iw_query_device(). Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23i40iw: Correct status check on i40iw_get_pbleFaisal Latif
i40iw_get_pble returns 0 on success. Correct the check on return code. Signed-off-by: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23i40iw: Correct CQ armingShiraz Saleem
CQ is armed for solicited events only, ignoring other notification flags. Correct this by arming for next and arming for solicited event if IB_CQ_SOLICITED is set. Also protect CQ shadow area update with spinlock. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/rdmavt: Correct qp_priv_alloc() return value testMike Marciniszyn
The current drivers return errors from this calldown wrapped in an ERR_PTR(). The rdmavt code incorrectly tests for NULL. The code is fixed to use IS_ERR() and change ret according to the driver return value. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/hfi1: Don't zero out qp->s_ack_queue in rvt_reset_qpAshutosh Dixit
Since rvt_reset_qp already zero's out qp->s_ack_queue head and tail pointers, there is no need to zero out qp->s_ack_queue itself. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/hfi1: Fix deadlock with txreq allocation slow pathMike Marciniszyn
A failure in the get_txreq() inline will result in a slow path retry using __get_txreq(). __get_txreq() attempts to procure the qp s_lock, which is already held in all callers. Fix by deleting the s_lock maintenance in __get_txreq() and add sparse syntax hooks to future proof the code. Cc: Stable <stable@vger.kernel.org> # 4.6+ Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx4: Prevent cross page boundary allocationChuck Lever
Prevent cross page boundary allocation by allocating new page, this is required to be aligned with ConnectX-3 HW requirements. Not doing that might cause to "RDMA read local protection" error. Fixes: 1b2cd0fc673c ('IB/mlx4: Support the new memory registration API') Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx4: Fix memory leak if QP creation failedDotan Barak
When RC, UC, or RAW QPs are created, a qp object is allocated (kzalloc). If at a later point (in procedure create_qp_common) the qp creation fails, this qp object must be freed. Fixes: 1ffeb2eb8be99 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx4: Verify port number in flow steering create flowYishai Hadas
In procedure mlx4_ib_create_flow, passing an invalid port number will cause an out-of-bounds array access. Data passed to this procedure can come from user-space. Therefore, need to validate port number before proceeding onwards. Note that we check against the number of physical ports declared at the verbs (ib core) level; When bonding is active, the verbs level sees one physical port, even though the low-level driver sees two ports. Fixes: f77c0162a339 ("IB/mlx4: Add receive flow steering support") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx4: Fix error flow when sending mads under SRIOVYishai Hadas
Fix mad send error flow to prevent double freeing address handles, and leaking tx_ring entries when SRIOV is active. If ib_mad_post_send fails, the address handle pointer in the tx_ring entry must be set to NULL (or there will be a double-free) and tx_tail must be incremented (or there will be a leak of tx_ring entries). The tx_ring is handled the same way in the send-completion handler. Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx4: Fix the SQ size of an RC QPYishai Hadas
When calculating the required size of an RC QP send queue, leave enough space for masked atomic operations, which require more space than "regular" atomic operation. Fixes: 6fa8f719844b ("IB/mlx4: Add support for masked atomic operations") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx5: Fix wrong naming of port_rcv_data counterTalat Batheesh
port_xmit_data is written instead of port_rcv_data. Fixes: 3efd9a11212d ('IB/mlx5: Modify MAD reading counters method to use counter registers') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/mlx5: Fix post send fence logicEli Cohen
If the caller specified IB_SEND_FENCE in the send flags of the work request and no previous work request stated that the successive one should be fenced, the work request would be executed without a fence. This could result in RDMA read or atomic operations failure due to a MR being invalidated. Fix this by adding the mlx5 enumeration for fencing RDMA/atomic operations and fix the logic to apply this. Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/uverbs: Initialize ib_qp_init_attr with zerosMaor Gottlieb
Initialize ib_qp_init_attr with zeros in order to avoid from garbage in fields that won't be set with user values. Fixes: a060b5629ab06 ('IB/core: generic RDMA READ/WRITE API') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix false search of the IB_SA_WELL_KNOWN_GUIDEli Cohen
When virtualziation is supported, VFs may send SA MADs to a GID formed by the concatenation of the subnet prefix with the IB_SA_WELL_KNOWN_GUID. When a response is required, the current code will search the local HCA's port for the received GID to figure out the GID index of the entry containing this GID. However, since this is not a real GID it will not be found and error will be printed. We change the logic to check if the destination GID is this special GID and avoid lookup in this case and use GID index 0. Fixes: a0c1b2a35087 ('IB/core: Support accessing SA in virtualized environment') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix RoCE v1 multicast join logic issueAlex Vesker
During multicast join of RoCEv1, IGMP join state and max hop limit were updated incorrectly. IGMP join should be sent and marked as joined only on RoCEv2 after a successful join. Max hops should be updated to the hop limit on RoCEv2 regardless of the join state. Fixes: bee3c3c91865 ('IB/cma: Join and leave multicast groups...') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23IB/core: Fix no default GIDs when netdevice reregistersTalat Batheesh
Currently, when the netdevice returned by get_netdev is unregistered, we delete all GIDs (including the default GIDs) and reset their attributes. Therefore, when we re-register it, no default GIDs will be assigned (as their "default GID") attribute will be reset. Fixing this by keeping "default GID" attribute. Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management') Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-06-23x86/xen: avoid m2p lookup when setting early page table entriesDavid Vrabel
When page tables entries are set using xen_set_pte_init() during early boot there is no page fault handler that could handle a fault when performing an M2P lookup. In 64 bit guests (usually dom0) early_ioremap() would fault in xen_set_pte_init() because an M2P lookup faults because the MFN is in MMIO space and not mapped in the M2P. This lookup is done to see if the PFN in in the range used for the initial page table pages, so that the PTE may be set as read-only. The M2P lookup can be avoided by moving the check (and clear of RW) earlier when the PFN is still available. Reported-by: Kevin Moraga <kmoragas@riseup.net> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com>
2016-06-23hwmon: (dell-smm) Cache fan_type() calls and change fan detectionPali Rohár
On more Dell machines (e.g. Dell Precision M3800) fan_type() call is too expensive (CPU is too long in SMM mode) and cause kernel to hang. This is bug in Dell SMM or BIOS. This patch caches type for each fan (as it should not change) and changes the way how fan presense is detected. First it try function fan_status() as was before commit f989e55452c7 ("i8k: Add support for fan labels"). And if that fails fallback to fan_type(). *_status() functions can fail in case fan is not currently accessible (e.g. present on GPU which is currently turned off). Reported-by: Tolga Cakir <cevelnet@gmail.com> Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=112021 Cc: stable@vger.kernel.org # v4.0+, will need backport Tested-by: Tolga Cakir <cevelnet@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-06-23Merge branch 'fixes' of ↵Rafael J. Wysocki
https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq into pm-devfreq Pull devfreq fixes for v4.7 from MyungJoo Ham. * 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq: PM / devfreq: fix initialization of current frequency in last status PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() check PM / devfreq: remove double put_device PM / devfreq: fix double call put_device PM / devfreq: fix duplicated kfree on devfreq pointer PM / devfreq: devm_kzalloc to have dev pointer more precisely
2016-06-23xen/pciback: Fix conf_space read/write overlap check.Andrey Grodzovsky
Current overlap check is evaluating to false a case where a filter field is fully contained (proper subset) of a r/w request. This change applies classical overlap check instead to include all the scenarios. More specifically, for (Hilscher GmbH CIFX 50E-DP(M/S)) device driver the logic is such that the entire confspace is read and written in 4 byte chunks. In this case as an example, CACHE_LINE_SIZE, LATENCY_TIMER and PCI_BIST are arriving together in one call to xen_pcibk_config_write() with offset == 0xc and size == 4. With the exsisting overlap check the LATENCY_TIMER field (offset == 0xd, length == 1) is fully contained in the write request and hence is excluded from write, which is incorrect. Signed-off-by: Andrey Grodzovsky <andrey2805@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <JBeulich@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-23x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()Juergen Gross
xen_cleanhighmap() is operating on level2_kernel_pgt only. The upper bound of the loop setting non-kernel-image entries to zero should not exceed the size of level2_kernel_pgt. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-23xen/balloon: Fix declared-but-not-defined warningRoss Lagerwall
Fix a declared-but-not-defined warning when building with XEN_BALLOON_MEMORY_HOTPLUG=n. This fixes a regression introduced by commit dfd74a1edfab ("xen/balloon: Fix crash when ballooning on x86 32 bit PAE"). Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-06-22Btrfs: don't do nocow check unless we have toJosef Bacik
Before we write into prealloc/nocow space we have to make sure that there are no references to the extents we are writing into, which means checking the extent tree and csum tree in the case of nocow. So we don't want to do the nocow dance unless we can't reserve data space, since it's a serious drag on performance. With the following sequence fallocate -l10737418240 /mnt/btrfs-test/file cp --reflink /mnt/btrfs-test/file /mnt/btrfs-test/link fio --name=randwrite --rw=randwrite --bs=4k --filename=/mnt/btrfs-test/file \ --end_fsync=1 we get the worst case scenario where we have to fall back on to doing the check anyway. Without this patch lat (usec): min=5, max=111598, avg=27.65, stdev=124.51 write: io=10240MB, bw=126876KB/s, iops=31718, runt= 82646msec With this patch lat (usec): min=3, max=91210, avg=14.09, stdev=110.62 write: io=10240MB, bw=212753KB/s, iops=53188, runt= 49286msec We get twice the throughput, half of the runtime, and half of the average latency. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> [ PAGE_CACHE_ removal related fixups ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22btrfs: fix deadlock in delayed_ref_async_startChris Mason
"Btrfs: track transid for delayed ref flushing" was deadlocking on btrfs_attach_transaction because its not safe to call from the async delayed ref start code. This commit brings back btrfs_join_transaction instead and checks for a blocked commit. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-22Btrfs: track transid for delayed ref flushingJosef Bacik
Using the offwakecputime bpf script I noticed most of our time was spent waiting on the delayed ref throttling. This is what is supposed to happen, but sometimes the transaction can commit and then we're waiting for throttling that doesn't matter anymore. So change this stuff to be a little smarter by tracking the transid we were in when we initiated the throttling. If the transaction we get is different then we can just bail out. This resulted in a 50% speedup in my fs_mark test, and reduced the amount of time spent throttling by 60 seconds over the entire run (which is about 30 minutes). Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2016-06-23powerpc/bpf/jit: Disable classic BPF JIT on ppc64leNaveen N. Rao
Classic BPF JIT was never ported completely to work on little endian powerpc. However, it can be enabled and will crash the system when used. As such, disable use of BPF JIT on ppc64le. Fixes: 7c105b63bd98 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Reported-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23powerpc: Fix faults caused by radix patching of SLB miss handlerMichael Ellerman
As part of the Radix MMU support we added some feature sections in the SLB miss handler. These are intended to catch the case that we incorrectly take an SLB miss when Radix is enabled, and instead of crashing weirdly they bail out to a well defined exit path and trigger an oops. However the way they were written meant the bailout case was enabled by default until we did CPU feature patching. On powermacs the early debug prints in setup_system() can cause an SLB miss, which happens before code patching, and so the SLB miss handler would incorrectly bailout and crash during boot. Fix it by inverting the sense of the feature section, so that the code which is in place at boot is correct for the hash case. Once we determine we are using Radix - which will never happen on a powermac - only then do we patch in the bailout case which unconditionally jumps. Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Reported-by: Denis Kirjanov <kda@linux-powerpc.org> Tested-by: Denis Kirjanov <kda@linux-powerpc.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23UBIFS: Implement ->migratepage()Kirill A. Shutemov
During page migrations UBIFS might get confused and the following assert triggers: [ 213.480000] UBIFS assert failed in ubifs_set_page_dirty at 1451 (pid 436) [ 213.490000] CPU: 0 PID: 436 Comm: drm-stress-test Not tainted 4.4.4-00176-geaa802524636-dirty #1008 [ 213.490000] Hardware name: Allwinner sun4i/sun5i Families [ 213.490000] [<c0015e70>] (unwind_backtrace) from [<c0012cdc>] (show_stack+0x10/0x14) [ 213.490000] [<c0012cdc>] (show_stack) from [<c02ad834>] (dump_stack+0x8c/0xa0) [ 213.490000] [<c02ad834>] (dump_stack) from [<c0236ee8>] (ubifs_set_page_dirty+0x44/0x50) [ 213.490000] [<c0236ee8>] (ubifs_set_page_dirty) from [<c00fa0bc>] (try_to_unmap_one+0x10c/0x3a8) [ 213.490000] [<c00fa0bc>] (try_to_unmap_one) from [<c00fadb4>] (rmap_walk+0xb4/0x290) [ 213.490000] [<c00fadb4>] (rmap_walk) from [<c00fb1bc>] (try_to_unmap+0x64/0x80) [ 213.490000] [<c00fb1bc>] (try_to_unmap) from [<c010dc28>] (migrate_pages+0x328/0x7a0) [ 213.490000] [<c010dc28>] (migrate_pages) from [<c00d0cb0>] (alloc_contig_range+0x168/0x2f4) [ 213.490000] [<c00d0cb0>] (alloc_contig_range) from [<c010ec00>] (cma_alloc+0x170/0x2c0) [ 213.490000] [<c010ec00>] (cma_alloc) from [<c001a958>] (__alloc_from_contiguous+0x38/0xd8) [ 213.490000] [<c001a958>] (__alloc_from_contiguous) from [<c001ad44>] (__dma_alloc+0x23c/0x274) [ 213.490000] [<c001ad44>] (__dma_alloc) from [<c001ae08>] (arm_dma_alloc+0x54/0x5c) [ 213.490000] [<c001ae08>] (arm_dma_alloc) from [<c035cecc>] (drm_gem_cma_create+0xb8/0xf0) [ 213.490000] [<c035cecc>] (drm_gem_cma_create) from [<c035cf20>] (drm_gem_cma_create_with_handle+0x1c/0xe8) [ 213.490000] [<c035cf20>] (drm_gem_cma_create_with_handle) from [<c035d088>] (drm_gem_cma_dumb_create+0x3c/0x48) [ 213.490000] [<c035d088>] (drm_gem_cma_dumb_create) from [<c0341ed8>] (drm_ioctl+0x12c/0x444) [ 213.490000] [<c0341ed8>] (drm_ioctl) from [<c0121adc>] (do_vfs_ioctl+0x3f4/0x614) [ 213.490000] [<c0121adc>] (do_vfs_ioctl) from [<c0121d30>] (SyS_ioctl+0x34/0x5c) [ 213.490000] [<c0121d30>] (SyS_ioctl) from [<c000f2c0>] (ret_fast_syscall+0x0/0x34) UBIFS is using PagePrivate() which can have different meanings across filesystems. Therefore the generic page migration code cannot handle this case correctly. We have to implement our own migration function which basically does a plain copy but also duplicates the page private flag. UBIFS is not a block device filesystem and cannot use buffer_migrate_page(). Cc: stable@vger.kernel.org Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [rw: Massaged changelog, build fixes, etc...] Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de>
2016-06-23mm: Export migrate_page_move_mapping and migrate_page_copyRichard Weinberger
Export these symbols such that UBIFS can implement ->migratepage. Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Christoph Hellwig <hch@lst.de>
2016-06-23ubi: Make recover_peb power cut awareRichard Weinberger
recover_peb() was never power cut aware, if a power cut happened right after writing the VID header upon next attach UBI would blindly use the new partial written PEB and all data from the old PEB is lost. In order to make recover_peb() power cut aware, write the new VID with a proper crc and copy_flag set such that the UBI attach process will detect whether the new PEB is completely written or not. We cannot directly use ubi_eba_atomic_leb_change() since we'd have to unlock the LEB which is facing a write error. Cc: stable@vger.kernel.org Reported-by: Jörg Pfähler <pfaehler@isse.de> Reviewed-by: Jörg Pfähler <pfaehler@isse.de> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-06-23gpio: make library immune to error pointersLinus Walleij
Most functions that take a GPIO descriptor in need to check the descriptor for IS_ERR(). We do this mostly in the VALIDATE_DESC() macro except for the gpiod_to_irq() function which needs special handling. Cc: stable@vger.kernel.org Reported-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-06-23gpio: make sure gpiod_to_irq() returns negative on NULL descLinus Walleij
commit 54d77198fdfbc4f0fe11b4252c1d9c97d51a3264 ("gpio: bail out silently on NULL descriptors") doesn't work for gpiod_to_irq(): drivers assume that NULL descriptors will give negative IRQ numbers in return. It has been pointed out that returning 0 is NO_IRQ and that drivers should be amended to treat this as an error, but that is for the longer term: now let us repair the semantics. Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Reported-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-06-23gpio: 104-idi-48: Fix missing spin_lock_init for ack_lockAxel Lin
Fixes: 9ae482104cb9 ("gpio: 104-idi-48: Clear pending interrupt once in IRQ handler") Signed-off-by: Axel Lin <axel.lin@ingics.com> Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-06-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns fix from Eric Biederman: "This contains just a single small patch that fixes a tiny hole in the logic of allowing unprivileged mounting of proc and sysfs. In practice I don't think anyone is affected because having MNT_RDONLY clear in mnt->mnt_flags but MS_RDONLY set in sb->s_flags is very weird for a filesystem, and weirder for proc and sysfs. However if it happens let's handle it correctly and then no one has to to worry about this crazy case" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mnt: Account for MS_RDONLY in fs_fully_visible
2016-06-22Merge tag 'gpio-v4.7-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "More GPIO fixes. Most prominent the gpiod_to_irq() fix brought to my attention by Hans de Goede. The hardening patch is a consequence of the reasoning around that bug. - It was discovered that too many parts of the kernel does not respect gpiod_to_irq() returning zero for an invalid IRQ. While this gets fixed, we need to make it return negative errorcodes again. - Harden the library a bit when passed error pointers. It is a bug to use these, but let's be helpful and warn the users. - Fix an uninitialized spinlock in the 104-idi-48 driver" * tag 'gpio-v4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: make library immune to error pointers gpio: make sure gpiod_to_irq() returns negative on NULL desc gpio: 104-idi-48: Fix missing spin_lock_init for ack_lock
2016-06-22arm64: hibernate: Don't hibernate on systems with stuck CPUsJames Morse
Hibernate relies on cpu hotplug to prevent secondary cores executing the kernel text while it is being restored. Add a call to cpus_are_stuck_in_kernel() to determine if there are CPUs not counted by 'num_online_cpus()', and prevent hibernate in this case. Fixes: 82869ac57b5 ("arm64: kernel: Add support for hibernate/suspend-to-disk") Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-22arm64: smp: Add function to determine if cpus are stuck in the kernelJames Morse
kernel/smp.c has a fancy counter that keeps track of the number of CPUs it marked as not-present and left in cpu_park_loop(). If there are any CPUs spinning in here, features like kexec or hibernate may release them by overwriting this memory. This problem also occurs on machines using spin-tables to release secondary cores. After commit 44dbcc93ab67 ("arm64: Fix behavior of maxcpus=N") we bring all known cpus into the secondary holding pen, meaning this memory can't be re-used by kexec or hibernate. Add a function cpus_are_stuck_in_kernel() to determine if either of these cases have occurred. Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-22ALSA: hda/tegra: iomem fixups for sparse warningsBen Dooks
The readl/writel are not being passed __iomem annotated variables, so fix the following sparse warnings by adding __iomem in: sound/pci/hda/hda_tegra.c:120:9: warning: incorrect type in argument 2 (different address spaces) sound/pci/hda/hda_tegra.c:120:9: expected void volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:120:9: got unsigned int [usertype] *addr sound/pci/hda/hda_tegra.c:125:16: warning: incorrect type in argument 1 (different address spaces) sound/pci/hda/hda_tegra.c:125:16: expected void const volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:125:16: got unsigned int [usertype] *addr sound/pci/hda/hda_tegra.c:134:13: warning: incorrect type in argument 1 (different address spaces) sound/pci/hda/hda_tegra.c:134:13: expected void const volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:134:13: got void *dword_addr sound/pci/hda/hda_tegra.c:137:9: warning: incorrect type in argument 2 (different address spaces) sound/pci/hda/hda_tegra.c:137:9: expected void volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:137:9: got void *dword_addr sound/pci/hda/hda_tegra.c:146:13: warning: incorrect type in argument 1 (different address spaces) sound/pci/hda/hda_tegra.c:146:13: expected void const volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:146:13: got void *dword_addr sound/pci/hda/hda_tegra.c:156:13: warning: incorrect type in argument 1 (different address spaces) sound/pci/hda/hda_tegra.c:156:13: expected void const volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:156:13: got void *dword_addr sound/pci/hda/hda_tegra.c:159:9: warning: incorrect type in argument 2 (different address spaces) sound/pci/hda/hda_tegra.c:159:9: expected void volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:159:9: got void *dword_addr sound/pci/hda/hda_tegra.c:168:13: warning: incorrect type in argument 1 (different address spaces) sound/pci/hda/hda_tegra.c:168:13: expected void const volatile [noderef] <asn:2>*addr sound/pci/hda/hda_tegra.c:168:13: got void *dword_addr sound/pci/hda/hda_tegra.c:173:23: warning: incorrect type in initializer (incompatible argument 2 (different address spaces)) sound/pci/hda/hda_tegra.c:173:23: expected void ( *reg_writel )( ... ) sound/pci/hda/hda_tegra.c:173:23: got void ( static [toplevel] *<noident> )( ... ) sound/pci/hda/hda_tegra.c:174:22: warning: incorrect type in initializer (incompatible argument 1 (different address spaces)) sound/pci/hda/hda_tegra.c:174:22: expected unsigned int ( *reg_readl )( ... ) sound/pci/hda/hda_tegra.c:174:22: got unsigned int ( static [toplevel] *<noident> )( ... ) sound/pci/hda/hda_tegra.c:175:23: warning: incorrect type in initializer (incompatible argument 2 (different address spaces)) sound/pci/hda/hda_tegra.c:175:23: expected void ( *reg_writew )( ... ) sound/pci/hda/hda_tegra.c:175:23: got void ( static [toplevel] *<noident> )( ... ) sound/pci/hda/hda_tegra.c:176:22: warning: incorrect type in initializer (incompatible argument 1 (different address spaces)) sound/pci/hda/hda_tegra.c:176:22: expected unsigned short ( *reg_readw )( ... ) sound/pci/hda/hda_tegra.c:176:22: got unsigned short ( static [toplevel] *<noident> )( ... ) sound/pci/hda/hda_tegra.c:177:23: warning: incorrect type in initializer (incompatible argument 2 (different address spaces)) sound/pci/hda/hda_tegra.c:177:23: expected void ( *reg_writeb )( ... ) sound/pci/hda/hda_tegra.c:177:23: got void ( static [toplevel] *<noident> )( ... ) sound/pci/hda/hda_tegra.c:178:22: warning: incorrect type in initializer (incompatible argument 1 (different address spaces)) sound/pci/hda/hda_tegra.c:178:22: expected unsigned char ( *reg_readb )( ... ) sound/pci/hda/hda_tegra.c:178:22: got unsigned char ( static [toplevel] *<noident> )( ... ) Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2016-06-22PM / devfreq: fix initialization of current frequency in last statusLukasz Luba
Some systems need current frequency from last_status for calculation but it is zeroed during initialization. When the device starts there is no history, but we can assume that the last frequency was the same as the initial frequency (which is also used in 'previous_freq'). The log shows the result of this misinterpreted value. [ 2.042847] ... Failed to get voltage for frequency 0: -34 Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2016-06-22PM / devfreq: exynos-nocp: Remove incorrect IS_ERR() checkDan Carpenter
Smatch complains because platform_get_resource() returns NULL on error and not an error pointer so the check is wrong. Julia Lawall pointed out that normally we don't check these, because devm_ioremap_resource() has a check for NULL. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>