summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-01-29mm: numa: initialise numa balancing after jump label initialisationMel Gorman
The command line parsing takes place before jump labels are initialised which generates a warning if numa_balancing= is specified and CONFIG_JUMP_LABEL is set. On older kernels before commit c4b2c0c5f647 ("static_key: WARN on usage before jump_label_init was called") the kernel would have crashed. This patch enables automatic numa balancing later in the initialisation process if numa_balancing= is specified. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-29mm/page-writeback.c: do not count anon pages as dirtyable memoryJohannes Weiner
The VM is currently heavily tuned to avoid swapping. Whether that is good or bad is a separate discussion, but as long as the VM won't swap to make room for dirty cache, we can not consider anonymous pages when calculating the amount of dirtyable memory, the baseline to which dirty_background_ratio and dirty_ratio are applied. A simple workload that occupies a significant size (40+%, depending on memory layout, storage speeds etc.) of memory with anon/tmpfs pages and uses the remainder for a streaming writer demonstrates this problem. In that case, the actual cache pages are a small fraction of what is considered dirtyable overall, which results in an relatively large portion of the cache pages to be dirtied. As kswapd starts rotating these, random tasks enter direct reclaim and stall on IO. Only consider free pages and file pages dirtyable. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-29mm/page-writeback.c: fix dirty_balance_reserve subtraction from dirtyable memoryJohannes Weiner
Tejun reported stuttering and latency spikes on a system where random tasks would enter direct reclaim and get stuck on dirty pages. Around 50% of memory was occupied by tmpfs backed by an SSD, and another disk (rotating) was reading and writing at max speed to shrink a partition. : The problem was pretty ridiculous. It's a 8gig machine w/ one ssd and 10k : rpm harddrive and I could reliably reproduce constant stuttering every : several seconds for as long as buffered IO was going on on the hard drive : either with tmpfs occupying somewhere above 4gig or a test program which : allocates about the same amount of anon memory. Although swap usage was : zero, turning off swap also made the problem go away too. : : The trigger conditions seem quite plausible - high anon memory usage w/ : heavy buffered IO and swap configured - and it's highly likely that this : is happening in the wild too. (this can happen with copying large files : to usb sticks too, right?) This patch (of 2): The dirty_balance_reserve is an approximation of the fraction of free pages that the page allocator does not make available for page cache allocations. As a result, it has to be taken into account when calculating the amount of "dirtyable memory", the baseline to which dirty_background_ratio and dirty_ratio are applied. However, currently the reserve is subtracted from the sum of free and reclaimable pages, which is non-sensical and leads to erroneous results when the system is dominated by unreclaimable pages and the dirty_balance_reserve is bigger than free+reclaimable. In that case, at least the already allocated cache should be considered dirtyable. Fix the calculation by subtracting the reserve from the amount of free pages, then adding the reclaimable pages on top. [akpm@linux-foundation.org: fix CONFIG_HIGHMEM build] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-29mm: document improved handling of swappiness==0Aaron Tomlin
Prior to commit fe35004fbf9e ("mm: avoid swapping out with swappiness==0") setting swappiness to 0, reclaim code could still evict recently used user anonymous memory to swap even though there is a significant amount of RAM used for page cache. The behaviour of setting swappiness to 0 has since changed. When set, the reclaim code does not initiate swap until the amount of free pages and file-backed pages, is less than the high water mark in a zone. Let's update the documentation to reflect this. [akpm@linux-foundation.org: remove comma, per Randy] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Bryn M. Reeves <bmr@redhat.com> Cc: Satoru Moriya <satoru.moriya@hds.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-29lib/genalloc.c: add check gen_pool_dma_alloc() if dma pointer is not NULLLad, Prabhakar
In the gen_pool_dma_alloc() the dma pointer can be NULL and while assigning gen_pool_virt_to_phys(pool, vaddr) to dma caused the following crash on da850 evm: Unable to handle kernel NULL pointer dereference at virtual address 00000000 Internal error: Oops: 805 [#1] PREEMPT ARM Modules linked in: CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.13.0-rc1-00001-g0609e45-dirty #5 task: c4830000 ti: c4832000 task.ti: c4832000 PC is at gen_pool_dma_alloc+0x30/0x3c LR is at gen_pool_virt_to_phys+0x74/0x80 Process swapper, call trace: gen_pool_dma_alloc+0x30/0x3c davinci_pm_probe+0x40/0xa8 platform_drv_probe+0x1c/0x4c driver_probe_device+0x98/0x22c __driver_attach+0x8c/0x90 bus_for_each_dev+0x6c/0x8c bus_add_driver+0x124/0x1d4 driver_register+0x78/0xf8 platform_driver_probe+0x20/0xa4 davinci_init_late+0xc/0x14 init_machine_late+0x1c/0x28 do_one_initcall+0x34/0x15c kernel_init_freeable+0xe4/0x1ac kernel_init+0x8/0xec This patch fixes the above. [akpm@linux-foundation.org: update kerneldoc] Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Nicolin Chen <b42378@freescale.com> Cc: Joe Perches <joe@perches.com> Cc: Sachin Kamat <sachin.kamat@linaro.org> Cc: <stable@vger.kernel.org> [3.13.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-29Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fanotify use-after-free fixes from Jan Kara: "Three fixes for the fanotify use after free problems guys were reporting. I have ended up with different lifetime rules for struct fanotify_event_info depending on whether it is for permission event or normal event which isn't ideal. My plan is to split these into two different structures (as permission events need larger struct anyway) which will make the rules trivial again. But that can wait for later I guess (but I can add the patch to the pile if you want), now I wanted to make -rc1 boot for these guys" [ "These guys" being Jiri Kosina and Dave Jones that reported the slab corruption issues due to incorrect object lifetimes ] * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: Fix use after free for permission events fsnotify: Do not return merged event from fsnotify_add_notify_event() fanotify: Fix use after free in mask checking
2014-01-29ceph: fix posix ACL hooksSage Weil
The merge of commit 7221fe4c2ed7 ("ceph: add acl for cephfs") raced with upstream changes in the generic POSIX ACL code (eg commit 2aeccbe957d0 "fs: add generic xattr_acl handlers" and others). Some of the fallout was fixed in commit 4db658ea0ca ("ceph: Fix up after semantic merge conflict"), but it was incomplete: the set_acl inode_operation wasn't getting set, and the prototype needed to be adjusted a bit (it doesn't take a dentry anymore). Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-30drm/nouveau: resume display if any later suspend bits failIlia Mirkin
If either idling channels or suspending the fence were to fail, the display would never be resumed. Also if a client fails, resume the fence (not functionally important, but it would potentially leak memory). See https://bugs.freedesktop.org/show_bug.cgi?id=70213 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2014-01-30drm/nouveau: fix lock unbalance in nouveau_crtc_page_flipMaarten Lankhorst
Fixes a regression introduced by d5c1e84b3a130f0 "drm/nouveau: hold mutex while syncing to kernel channel". Cc: stable@vger.kernel.org #3.13 Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2014-01-30drm/nouveau: implement hooks for needed for drm vblank timestamping supportBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2014-01-30drm/nouveau/disp: add a method to fetch info needed by drm vblank timestampingBen Skeggs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2014-01-30drm/nv50: fill in crtc mode struct members from crtc_mode_fixupBen Skeggs
The DRM uses the adjusted mode to calculate constants for vblank timestamping. Our encoder mode_fixup (usually) replaces this data with our backend mode information, which doesn't have the needed data filled in already. Reported-by: Mario Kleiner mario.kleiner.de@gmail.com Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2014-01-29drm/radeon/dce8: workaround for atom BlankCrtc tableAlex Deucher
Some DCE8 boards have a funky BlankCrtc table that results in a timeout when trying to blank the display. The timeout is harmless (all operations needed from the table are complete), but wastes time and is confusing to users so work around it. bug: https://bugs.freedesktop.org/show_bug.cgi?id=73420 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2014-01-29drm/radeon/DCE4+: clear bios scratch dpms bit (v2)Alex Deucher
The BlankCrtc table in some DCE8 boards has some logic shortcuts for the vbios when this bit is set. Clear it for driver use. v2: fix typo Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73420 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2014-01-29drm/radeon: set si_notify_smc_display_change properlyAlex Deucher
This is effectively a revert of 4573388c92ee60b4ed72b8d95b73df861189988c. Forcing a display active when there is none causes problems with dpm on some SI boards which results in improperly initialized dpm state and boot failures on some boards. As for the bug commit 4573388c92ee tried to address, one can manually force the state to high for better performance when using the card as a headless compute node until a better fix is developed. bugs: https://bugs.freedesktop.org/show_bug.cgi?id=73788 https://bugs.freedesktop.org/show_bug.cgi?id=69395 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> cc: stable@vger.kernel.org
2014-01-29drm/radeon: fix DAC interrupt handling on DCE5+Alex Deucher
DCE5 and newer hardware only has 1 DAC. Use the correct offset. This may fix display problems on certain board configurations. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2014-01-29drm/radeon: clean up active vram sizingAlex Deucher
If we are not able to properly initialize one of the gpu engines for buffer paging, we limit vram to the size of the cpu visible aperture. We generally either use the gfx or dma engine to do this. Clean up the size limiting code to only adjust the size based on what ring is selected for buffer paging rather than making assumptions about which engine is selected for paging. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2014-01-29drm/radeon: skip async dma init on r6xxAlex Deucher
The hw is buggy and it's not currently used, but it's currently still initialized by the driver. Skip the init. Skipping init also seems to improve stability with dpm on some r6xx asics. bug: https://bugs.freedesktop.org/show_bug.cgi?id=66963 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
2014-01-29drm/radeon/runpm: don't runtime suspend non-PX cardsAlex Deucher
Prevent runtime suspend of non-PX GPUs. Runtime suspend is not what we want in those cases. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2014-01-29drm/radeon: add ring to fence trace functionsChristian König
Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-01-29drm/radeon: add missing trace pointChristian König
Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-01-29drm/radeon: fix VMID use trackingChristian König
Otherwise we allocate a new VMID on nearly every submit. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2014-01-29Update Jean Delvare's e-mail addressJean Delvare
Signed-off-by: Jean Delvare <khali@linux-fr.org>
2014-01-29hwmon: (it87) Print proper names for the IT8771E and IT8772EJean Delvare
The driver prints IT8771F and IT8772F instead of IT8771E and IT8772E respectively when the driver is loaded. This is a cosmetic only bug but let's fix it. Signed-off-by: Jean Delvare <khali@linux-fr.org>
2014-01-29hwmon: (it87) Add support for the ITE IT8603ERudolf Marek
Add support for IT8603E. This closes bug #57861: https://bugzilla.kernel.org/show_bug.cgi?id=57861 [JD: Fixes and clean-ups.] Signed-off-by: Rudolf Marek <r.marek@assembler.cz> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2014-01-29Merge branch 'kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm-queuePaolo Bonzini
Conflicts: arch/powerpc/kvm/book3s_hv_rmhandlers.S arch/powerpc/kvm/booke.c
2014-01-29x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101Paolo Bonzini
When Hyper-V hypervisor leaves are present, KVM must relocate its own leaves at 0x40000100, because Windows does not look for Hyper-V leaves at indices other than 0x40000000. In this case, the KVM features are at 0x40000101, but the old code would always look at 0x40000001. Fix by using kvm_cpuid_base(). This also requires making the function non-inline, since kvm_cpuid_base() is static. Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d Cc: stable@vger.kernel.org Cc: mtosatti@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-29x86, kvm: cache the base of the KVM cpuid leavesPaolo Bonzini
It is unnecessary to go through hypervisor_cpuid_base every time a leaf is found (which will be every time a feature is requested after the next patch). Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d Cc: stable@vger.kernel.org Cc: mtosatti@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-29kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdefPaolo Bonzini
Self explanatory. Reported-by: Radim Krcmar <rkrcmar@redhat.com> Cc: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-29Merge commit 'f4bcd8ccddb02833340652e9f46f5127828eb79d' into x86/buildH. Peter Anvin
Bring in upstream merge of x86/kaslr for future patches. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-29xtensa: fixup simdisk driver to work with immutable bio_vecsJens Axboe
Geert reported: arch/xtensa/platforms/iss/simdisk.c:108:23: error: 'struct bio' has no member named 'bi_sector' arch/xtensa/platforms/iss/simdisk.c:110:2: error: incompatible types when assigning to type 'int' from type 'struct bvec_iter' arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_size' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_size' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_size' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bv_len' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union make[2]: *** [arch/xtensa/platforms/iss/simdisk.o] Error 1 Fixup the usage of bio_for_each_segment(). Also fix wrong use of __bio_kunmap_atomic() - it needs the mapped buffer passed in, not the originally mapped page. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2014-01-29Btrfs: fix spin_unlock in check_ref_cleanupChris Mason
Our goto out should have gone a little farther. Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: setup inode location during btrfs_init_inode_lockedChris Mason
We have a race during inode init because the BTRFS_I(inode)->location is setup after the inode hash table lock is dropped. btrfs_find_actor uses the location field, so our search might not find an existing inode in the hash table if we race with the inode init code. This commit changes things to setup the location field sooner. Also the find actor now uses only the location objectid to match inodes. For inode hashing, we just need a unique and stable test, it doesn't have to reflect the inode numbers we show to userland. Signed-off-by: Chris Mason <clm@fb.com> CC: stable@vger.kernel.org
2014-01-29Btrfs: don't use ram_bytes for uncompressed inline itemsChris Mason
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect the new size. The fixe uses the size directly from the item header when reading uncompressed inlines, and also fixes truncate to update the size as it goes. Reported-by: Jens Axboe <axboe@fb.com> Signed-off-by: Chris Mason <clm@fb.com> CC: stable@vger.kernel.org
2014-01-29Btrfs: fix btrfs_search_slot_for_read backwards iterationFilipe David Borba Manana
If the current path's leaf slot is 0, we do search for the previous leaf (via btrfs_prev_leaf) and set the new path's leaf slot to a value corresponding to the number of items - 1 of the former leaf. Fix this by using the slot set by btrfs_prev_leaf, decrementing it by 1 if it's equal to the leaf's number of items. Use of btrfs_search_slot_for_read() for backward iteration is used in particular by the send feature, which could miss items when the input leaf has less items than its previous leaf. This could be reproduced by running btrfs/007 from xfstests in a loop. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: do not export ulist functionsWang Shilong
There are not any users that use ulist except Btrfs,don't export them. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: rework ulist with list+rb_treeWang Shilong
We are really suffering from now ulist's implementation, some developers gave their try, and i just gave some of my ideas for things: 1. use list+rb_tree instead of arrary+rb_tree 2. add cur_list to iterator rather than ulist structure. 3. add seqnum into every node when they are added, this is used to do selfcheck when iterating node. I noticed Zach Brown's comments before, long term is to kick off ulist implementation, however, for now, we need at least avoid arrary from ulist. Cc: Liu Bo <bo.li.liu@oracle.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Zach Brown <zab@redhat.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: fix memory leaks on walking backrefs failureWang Shilong
When walking backrefs, we may iterate every inode's extent and add/merge them into ulist, and the caller will free memory from ulist. However, if we fail to allocate inode's extents element memory or ulist_add() fail to allocate memory, we won't add allocated memory into ulist, and the caller won't free some allocated memory thus memory leaks happen. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: fix send file hole detection leading to data corruptionFilipe David Borba Manana
There was a case where file hole detection was incorrect and it would cause an incremental send to override a section of a file with zeroes. This happened in the case where between the last leaf we processed which contained a file extent item for our current inode and the leaf we're currently are at (and has a file extent item for our current inode) there are only leafs containing exclusively file extent items for our current inode, and none of them was updated since the previous send operation. The file hole detection code would incorrectly consider the file range covered by these leafs as a hole. A test case for xfstests follows soon. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: add a reschedule point in btrfs_find_all_roots()Wang Shilong
I can easily trigger the following warnings when enabling quota in my virtual machine(running Opensuse), Steps are firstly creating a subvolume full of fragment extents, and then create many snapshots (500 in my test case). [ 2362.808459] BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-qgroup-re:1970] [ 2362.809023] task: e4af8450 ti: e371c000 task.ti: e371c000 [ 2362.809026] EIP: 0060:[<fa38f4ae>] EFLAGS: 00000246 CPU: 0 [ 2362.809049] EIP is at __merge_refs+0x5e/0x100 [btrfs] [ 2362.809051] EAX: 00000000 EBX: cfadbcf0 ECX: 00000000 EDX: cfadbcb0 [ 2362.809052] ESI: dd8d3370 EDI: e371dde0 EBP: e371dd6c ESP: e371dd5c [ 2362.809054] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 2362.809055] CR0: 80050033 CR2: ac454d50 CR3: 009a9000 CR4: 001407d0 [ 2362.809099] Stack: [ 2362.809100] 00000001 e371dde0 dfcc6890 f29f8000 e371de28 fa39016d 00000011 00000001 [ 2362.809105] 99bfc000 00000000 93928000 00000000 00000001 00000050 e371dda8 00000001 [ 2362.809109] f3a31000 f3413000 00000001 e371ddb8 000040a8 00000202 00000000 00000023 [ 2362.809113] Call Trace: [ 2362.809136] [<fa39016d>] find_parent_nodes+0x34d/0x1280 [btrfs] [ 2362.809156] [<fa391172>] btrfs_find_all_roots+0xb2/0x110 [btrfs] [ 2362.809174] [<fa3934a8>] btrfs_qgroup_rescan_worker+0x358/0x7a0 [btrfs] [ 2362.809180] [<c024d0ce>] ? lock_timer_base.isra.39+0x1e/0x40 [ 2362.809199] [<fa3648df>] worker_loop+0xff/0x470 [btrfs] [ 2362.809204] [<c027a88a>] ? __wake_up_locked+0x1a/0x20 [ 2362.809221] [<fa3647e0>] ? btrfs_queue_worker+0x2b0/0x2b0 [btrfs] [ 2362.809225] [<c025ebbc>] kthread+0x9c/0xb0 [ 2362.809229] [<c06b487b>] ret_from_kernel_thread+0x1b/0x30 [ 2362.809233] [<c025eb20>] ? kthread_create_on_node+0x110/0x110 By adding a reschedule point at the end of btrfs_find_all_roots(), i no longer hit these warnings. Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: make send's file extent item search more efficientFilipe David Borba Manana
Instead of looking for a file extent item, process it, release the path and do a btree search for the next file extent item, just process all file extent items in a leaf without intermediate btree searches. This way we save cpu and we're not blocking other tasks or affecting concurrency on the btree, because send's paths use the commit root and skip btree node/leaf locking. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: fix to catch all errors when resolving indirect refWang Shilong
We can only tolerate ENOENT here, for other errors, we should return directly. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: fix protection between walking backrefs and root deletionWang Shilong
There is a race condition between resolving indirect ref and root deletion, and we should gurantee that root can not be destroyed to avoid accessing broken tree here. Here we fix it by holding @subvol_srcu, and we will release it as soon as we have held root node lock. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29btrfs: fix warning while merging two adjacent extentsGui Hecheng
When we have two adjacent extents in relink_extent_backref, we try to merge them. When we use btrfs_search_slot to locate the slot for the current extent, we shouldn't set "ins_len = 1", because we will merge it into the previous extent rather than insert a new item. Otherwise, we may happen to create a new leaf in btrfs_search_slot and path->slot[0] will be 0. Then we try to fetch the previous item using "path->slots[0]--", and it will cause a warning as follows: [ 145.713385] WARNING: CPU: 3 PID: 1796 at fs/btrfs/extent_io.c:5043 map_private_extent_buffer+0xd4/0xe0 [ 145.713387] btrfs bad mapping eb start 5337088 len 4096, wanted 167772306 8 ... [ 145.713462] [<ffffffffa034b1f4>] map_private_extent_buffer+0xd4/0xe0 [ 145.713476] [<ffffffffa030097a>] ? btrfs_free_path+0x2a/0x40 [ 145.713485] [<ffffffffa0340864>] btrfs_get_token_64+0x64/0xf0 [ 145.713498] [<ffffffffa033472c>] relink_extent_backref+0x41c/0x820 [ 145.713508] [<ffffffffa0334d69>] btrfs_finish_ordered_io+0x239/0xa80 I encounter this warning when running defrag having mkfs.btrfs with option -M. At the same time there are read/writes & snapshots running at background. Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29Btrfs: fix infinite path build loops in incremental sendFilipe David Borba Manana
The send operation processes inodes by their ascending number, and assumes that any rename/move operation can be successfully performed (sent to the caller) once all previous inodes (those with a smaller inode number than the one we're currently processing) were processed. This is not true when an incremental send had to process an hierarchical change between 2 snapshots where the parent-children relationship between directory inodes was reversed - that is, parents became children and children became parents. This situation made the path building code go into an infinite loop, which kept allocating more and more memory that eventually lead to a krealloc warning being displayed in dmesg: WARNING: CPU: 1 PID: 5705 at mm/page_alloc.c:2477 __alloc_pages_nodemask+0x365/0xad0() Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) snd_hda_codec_hdmi snd_hda_codec_realtek joydev radeon snd_hda_intel snd_hda_codec snd_hwdep snd_seq_midi snd_pcm psmouse i915 snd_rawmidi serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer ttm snd_seq_device rfcomm drm_kms_helper parport_pc bnep bluetooth drm ppdev snd soundcore i2c_algo_bit snd_page_alloc binfmt_misc video lp parport r8169 mii hid_generic usbhid hid CPU: 1 PID: 5705 Comm: btrfs Tainted: G O 3.13.0-rc7-fdm-btrfs-next-18+ #3 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012 [ 5381.660441] 00000000000009ad ffff8806f6f2f4e8 ffffffff81777434 0000000000000007 [ 5381.660447] 0000000000000000 ffff8806f6f2f528 ffffffff8104a9ec ffff8807038f36f0 [ 5381.660452] 0000000000000000 0000000000000206 ffff8807038f2490 ffff8807038f36f0 [ 5381.660457] Call Trace: [ 5381.660464] [<ffffffff81777434>] dump_stack+0x4e/0x68 [ 5381.660471] [<ffffffff8104a9ec>] warn_slowpath_common+0x8c/0xc0 [ 5381.660476] [<ffffffff8104aa3a>] warn_slowpath_null+0x1a/0x20 [ 5381.660480] [<ffffffff81144995>] __alloc_pages_nodemask+0x365/0xad0 [ 5381.660487] [<ffffffff8108313f>] ? local_clock+0x4f/0x60 [ 5381.660491] [<ffffffff811430e8>] ? free_one_page+0x98/0x440 [ 5381.660495] [<ffffffff8108313f>] ? local_clock+0x4f/0x60 [ 5381.660502] [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50 [ 5381.660508] [<ffffffff81095fb8>] ? trace_hardirqs_off_caller+0x28/0xd0 [ 5381.660515] [<ffffffff81183caf>] alloc_pages_current+0x10f/0x1f0 [ 5381.660520] [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50 [ 5381.660524] [<ffffffff8113fae4>] __get_free_pages+0x14/0x50 [ 5381.660530] [<ffffffff8115dace>] kmalloc_order_trace+0x3e/0x100 [ 5381.660536] [<ffffffff81191ea0>] __kmalloc_track_caller+0x220/0x230 [ 5381.660560] [<ffffffffa0729fdb>] ? fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs] [ 5381.660564] [<ffffffff8178085c>] ? retint_restore_args+0xe/0xe [ 5381.660569] [<ffffffff811580ef>] krealloc+0x6f/0xb0 [ 5381.660586] [<ffffffffa0729fdb>] fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs] [ 5381.660601] [<ffffffffa072a208>] fs_path_prepare_for_add+0x98/0xb0 [btrfs] [ 5381.660615] [<ffffffffa072a2bc>] fs_path_add_path+0x2c/0x60 [btrfs] [ 5381.660628] [<ffffffffa072c55c>] get_cur_path+0x7c/0x1c0 [btrfs] Even without this loop, the incremental send couldn't succeed, because it would attempt to send a rename/move operation for the lower inode before the highest inode number was renamed/move. This issue is easy to trigger with the following steps: $ mkfs.btrfs -f /dev/sdb3 $ mount /dev/sdb3 /mnt/btrfs $ mkdir -p /mnt/btrfs/a/b/c/d $ mkdir /mnt/btrfs/a/b/c2 $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1 $ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c2/d2 $ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/c2/d2/cc $ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2 $ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send The structure of the filesystem when the first snapshot is taken is: . (ino 256) |-- a (ino 257) |-- b (ino 258) |-- c (ino 259) | |-- d (ino 260) | |-- c2 (ino 261) And its structure when the second snapshot is taken is: . (ino 256) |-- a (ino 257) |-- b (ino 258) |-- c2 (ino 261) |-- d2 (ino 260) |-- cc (ino 259) Before the move/rename operation is performed for the inode 259, the move/rename for inode 260 must be performed, since 259 is now a child of 260. A test case for xfstests, with a more complex scenario, will follow soon. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-01-29fanotify: Fix use after free for permission eventsJan Kara
Currently struct fanotify_event_info has been destroyed immediately after reporting its contents to userspace. However that is wrong for permission events because those need to stay around until userspace provides response which is filled back in fanotify_event_info. So change to code to free permission events only after we have got the response from userspace. Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Reported-and-tested-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Jan Kara <jack@suse.cz>
2014-01-29fsnotify: Do not return merged event from fsnotify_add_notify_event()Jan Kara
The event returned from fsnotify_add_notify_event() cannot ever be used safely as the event may be freed by the time the function returns (after dropping notification_mutex). So change the prototype to just return whether the event was added or merged into some existing event. Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Reported-and-tested-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Jan Kara <jack@suse.cz>
2014-01-29fanotify: Fix use after free in mask checkingJan Kara
We cannot use the event structure returned from fsnotify_add_notify_event() because that event can be freed by the time that function returns. Use the mask argument passed into the event handler directly instead. This also fixes a possible problem when we could unnecessarily wait for permission response for a normal fanotify event which got merged with a permission event. We also disallow merging of permission event with any other event so that we know the permission event which we just created is the one on which we should wait for permission response. Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Reported-and-tested-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Jan Kara <jack@suse.cz>
2014-01-29dmaengine: mmp_pdma: fix mismergeArnd Bergmann
The merge between 2b7f65b11d87f "mmp_pdma: Style neatening" and 8010dad55a0ab0 "dma: add dma_get_any_slave_channel(), for use in of_xlate()" caused a build error by leaving obsolete code in place: mmp_pdma.c: In function 'mmp_pdma_dma_xlate': mmp_pdma.c:909:31: error: 'candidate' undeclared mmp_pdma.c:912:3: error: label 'retry' used but not defined mmp_pdma.c:901:24: warning: unused variable 'c' [-Wunused-variable] This removes the extraneous lines. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2014-01-29Merge branches 'pm-cpufreq' and 'pm-devfreq'Rafael J. Wysocki
* pm-cpufreq: acpi-cpufreq: De-register CPU notifier and free struct msr on error. * pm-devfreq: PM / devfreq: Disable Exynos4 driver build on multiplatform