Age | Commit message (Collapse) | Author |
|
The command line parsing takes place before jump labels are initialised
which generates a warning if numa_balancing= is specified and
CONFIG_JUMP_LABEL is set.
On older kernels before commit c4b2c0c5f647 ("static_key: WARN on usage
before jump_label_init was called") the kernel would have crashed. This
patch enables automatic numa balancing later in the initialisation
process if numa_balancing= is specified.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The VM is currently heavily tuned to avoid swapping. Whether that is
good or bad is a separate discussion, but as long as the VM won't swap
to make room for dirty cache, we can not consider anonymous pages when
calculating the amount of dirtyable memory, the baseline to which
dirty_background_ratio and dirty_ratio are applied.
A simple workload that occupies a significant size (40+%, depending on
memory layout, storage speeds etc.) of memory with anon/tmpfs pages and
uses the remainder for a streaming writer demonstrates this problem. In
that case, the actual cache pages are a small fraction of what is
considered dirtyable overall, which results in an relatively large
portion of the cache pages to be dirtied. As kswapd starts rotating
these, random tasks enter direct reclaim and stall on IO.
Only consider free pages and file pages dirtyable.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Tejun reported stuttering and latency spikes on a system where random
tasks would enter direct reclaim and get stuck on dirty pages. Around
50% of memory was occupied by tmpfs backed by an SSD, and another disk
(rotating) was reading and writing at max speed to shrink a partition.
: The problem was pretty ridiculous. It's a 8gig machine w/ one ssd and 10k
: rpm harddrive and I could reliably reproduce constant stuttering every
: several seconds for as long as buffered IO was going on on the hard drive
: either with tmpfs occupying somewhere above 4gig or a test program which
: allocates about the same amount of anon memory. Although swap usage was
: zero, turning off swap also made the problem go away too.
:
: The trigger conditions seem quite plausible - high anon memory usage w/
: heavy buffered IO and swap configured - and it's highly likely that this
: is happening in the wild too. (this can happen with copying large files
: to usb sticks too, right?)
This patch (of 2):
The dirty_balance_reserve is an approximation of the fraction of free
pages that the page allocator does not make available for page cache
allocations. As a result, it has to be taken into account when
calculating the amount of "dirtyable memory", the baseline to which
dirty_background_ratio and dirty_ratio are applied.
However, currently the reserve is subtracted from the sum of free and
reclaimable pages, which is non-sensical and leads to erroneous results
when the system is dominated by unreclaimable pages and the
dirty_balance_reserve is bigger than free+reclaimable. In that case, at
least the already allocated cache should be considered dirtyable.
Fix the calculation by subtracting the reserve from the amount of free
pages, then adding the reclaimable pages on top.
[akpm@linux-foundation.org: fix CONFIG_HIGHMEM build]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Prior to commit fe35004fbf9e ("mm: avoid swapping out with
swappiness==0") setting swappiness to 0, reclaim code could still evict
recently used user anonymous memory to swap even though there is a
significant amount of RAM used for page cache.
The behaviour of setting swappiness to 0 has since changed. When set,
the reclaim code does not initiate swap until the amount of free pages
and file-backed pages, is less than the high water mark in a zone.
Let's update the documentation to reflect this.
[akpm@linux-foundation.org: remove comma, per Randy]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Bryn M. Reeves <bmr@redhat.com>
Cc: Satoru Moriya <satoru.moriya@hds.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In the gen_pool_dma_alloc() the dma pointer can be NULL and while
assigning gen_pool_virt_to_phys(pool, vaddr) to dma caused the following
crash on da850 evm:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Internal error: Oops: 805 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.13.0-rc1-00001-g0609e45-dirty #5
task: c4830000 ti: c4832000 task.ti: c4832000
PC is at gen_pool_dma_alloc+0x30/0x3c
LR is at gen_pool_virt_to_phys+0x74/0x80
Process swapper, call trace:
gen_pool_dma_alloc+0x30/0x3c
davinci_pm_probe+0x40/0xa8
platform_drv_probe+0x1c/0x4c
driver_probe_device+0x98/0x22c
__driver_attach+0x8c/0x90
bus_for_each_dev+0x6c/0x8c
bus_add_driver+0x124/0x1d4
driver_register+0x78/0xf8
platform_driver_probe+0x20/0xa4
davinci_init_late+0xc/0x14
init_machine_late+0x1c/0x28
do_one_initcall+0x34/0x15c
kernel_init_freeable+0xe4/0x1ac
kernel_init+0x8/0xec
This patch fixes the above.
[akpm@linux-foundation.org: update kerneldoc]
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Nicolin Chen <b42378@freescale.com>
Cc: Joe Perches <joe@perches.com>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Cc: <stable@vger.kernel.org> [3.13.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify use-after-free fixes from Jan Kara:
"Three fixes for the fanotify use after free problems guys were
reporting.
I have ended up with different lifetime rules for struct
fanotify_event_info depending on whether it is for permission event or
normal event which isn't ideal. My plan is to split these into two
different structures (as permission events need larger struct anyway)
which will make the rules trivial again. But that can wait for later
I guess (but I can add the patch to the pile if you want), now I
wanted to make -rc1 boot for these guys"
[ "These guys" being Jiri Kosina and Dave Jones that reported the slab
corruption issues due to incorrect object lifetimes ]
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: Fix use after free for permission events
fsnotify: Do not return merged event from fsnotify_add_notify_event()
fanotify: Fix use after free in mask checking
|
|
The merge of commit 7221fe4c2ed7 ("ceph: add acl for cephfs") raced with
upstream changes in the generic POSIX ACL code (eg commit 2aeccbe957d0
"fs: add generic xattr_acl handlers" and others).
Some of the fallout was fixed in commit 4db658ea0ca ("ceph: Fix up after
semantic merge conflict"), but it was incomplete: the set_acl
inode_operation wasn't getting set, and the prototype needed to be
adjusted a bit (it doesn't take a dentry anymore).
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If either idling channels or suspending the fence were to fail, the
display would never be resumed. Also if a client fails, resume the fence
(not functionally important, but it would potentially leak memory).
See https://bugs.freedesktop.org/show_bug.cgi?id=70213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
Fixes a regression introduced by d5c1e84b3a130f0
"drm/nouveau: hold mutex while syncing to kernel channel".
Cc: stable@vger.kernel.org #3.13
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
The DRM uses the adjusted mode to calculate constants for vblank
timestamping. Our encoder mode_fixup (usually) replaces this data
with our backend mode information, which doesn't have the needed
data filled in already.
Reported-by: Mario Kleiner mario.kleiner.de@gmail.com
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
|
|
Some DCE8 boards have a funky BlankCrtc table that results
in a timeout when trying to blank the display. The
timeout is harmless (all operations needed from the table
are complete), but wastes time and is confusing to users so
work around it.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=73420
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
The BlankCrtc table in some DCE8 boards has some
logic shortcuts for the vbios when this bit is set.
Clear it for driver use.
v2: fix typo
Bug:
https://bugs.freedesktop.org/show_bug.cgi?id=73420
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
This is effectively a revert of 4573388c92ee60b4ed72b8d95b73df861189988c.
Forcing a display active when there is none causes problems with
dpm on some SI boards which results in improperly initialized
dpm state and boot failures on some boards. As for the bug commit
4573388c92ee tried to address, one can manually force the state to
high for better performance when using the card as a headless compute
node until a better fix is developed.
bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=73788
https://bugs.freedesktop.org/show_bug.cgi?id=69395
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
cc: stable@vger.kernel.org
|
|
DCE5 and newer hardware only has 1 DAC. Use the correct
offset. This may fix display problems on certain board
configurations.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
If we are not able to properly initialize one of the gpu
engines for buffer paging, we limit vram to the size of
the cpu visible aperture. We generally either use the gfx
or dma engine to do this. Clean up the size limiting code
to only adjust the size based on what ring is selected
for buffer paging rather than making assumptions about which
engine is selected for paging.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
The hw is buggy and it's not currently used, but it's
currently still initialized by the driver. Skip the init.
Skipping init also seems to improve stability with dpm on
some r6xx asics.
bug:
https://bugs.freedesktop.org/show_bug.cgi?id=66963
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
|
|
Prevent runtime suspend of non-PX GPUs. Runtime suspend is
not what we want in those cases.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Otherwise we allocate a new VMID on nearly every submit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Signed-off-by: Jean Delvare <khali@linux-fr.org>
|
|
The driver prints IT8771F and IT8772F instead of IT8771E and IT8772E
respectively when the driver is loaded. This is a cosmetic only bug
but let's fix it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
|
|
Add support for IT8603E.
This closes bug #57861:
https://bugzilla.kernel.org/show_bug.cgi?id=57861
[JD: Fixes and clean-ups.]
Signed-off-by: Rudolf Marek <r.marek@assembler.cz>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
|
|
Conflicts:
arch/powerpc/kvm/book3s_hv_rmhandlers.S
arch/powerpc/kvm/booke.c
|
|
When Hyper-V hypervisor leaves are present, KVM must relocate
its own leaves at 0x40000100, because Windows does not look for
Hyper-V leaves at indices other than 0x40000000. In this case,
the KVM features are at 0x40000101, but the old code would always
look at 0x40000001.
Fix by using kvm_cpuid_base(). This also requires making the
function non-inline, since kvm_cpuid_base() is static.
Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It is unnecessary to go through hypervisor_cpuid_base every time
a leaf is found (which will be every time a feature is requested
after the next patch).
Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Self explanatory.
Reported-by: Radim Krcmar <rkrcmar@redhat.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Bring in upstream merge of x86/kaslr for future patches.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Geert reported:
arch/xtensa/platforms/iss/simdisk.c:108:23: error: 'struct bio' has no member named 'bi_sector'
arch/xtensa/platforms/iss/simdisk.c:110:2: error: incompatible types when assigning to type 'int' from type 'struct bvec_iter'
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_size' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_size' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_size' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:110:2: error: request for member 'bv_len' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_size' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_idx' in something not a structure or union
arch/xtensa/platforms/iss/simdisk.c:111:18: error: request for member 'bi_bvec_done' in something not a structure or union
make[2]: *** [arch/xtensa/platforms/iss/simdisk.o] Error 1
Fixup the usage of bio_for_each_segment(). Also fix wrong use
of __bio_kunmap_atomic() - it needs the mapped buffer passed in,
not the originally mapped page.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Our goto out should have gone a little farther.
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We have a race during inode init because the BTRFS_I(inode)->location is setup
after the inode hash table lock is dropped. btrfs_find_actor uses the location
field, so our search might not find an existing inode in the hash table if we
race with the inode init code.
This commit changes things to setup the location field sooner. Also the find actor now
uses only the location objectid to match inodes. For inode hashing, we just
need a unique and stable test, it doesn't have to reflect the inode numbers we
show to userland.
Signed-off-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org
|
|
If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
the new size. The fixe uses the size directly from the item header when
reading uncompressed inlines, and also fixes truncate to update the
size as it goes.
Reported-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
CC: stable@vger.kernel.org
|
|
If the current path's leaf slot is 0, we do search for the previous
leaf (via btrfs_prev_leaf) and set the new path's leaf slot to a
value corresponding to the number of items - 1 of the former leaf.
Fix this by using the slot set by btrfs_prev_leaf, decrementing it
by 1 if it's equal to the leaf's number of items.
Use of btrfs_search_slot_for_read() for backward iteration is used in
particular by the send feature, which could miss items when the input
leaf has less items than its previous leaf.
This could be reproduced by running btrfs/007 from xfstests in a loop.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
There are not any users that use ulist except Btrfs,don't
export them.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We are really suffering from now ulist's implementation, some developers
gave their try, and i just gave some of my ideas for things:
1. use list+rb_tree instead of arrary+rb_tree
2. add cur_list to iterator rather than ulist structure.
3. add seqnum into every node when they are added, this is
used to do selfcheck when iterating node.
I noticed Zach Brown's comments before, long term is to kick off
ulist implementation, however, for now, we need at least avoid
arrary from ulist.
Cc: Liu Bo <bo.li.liu@oracle.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Zach Brown <zab@redhat.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
When walking backrefs, we may iterate every inode's extent
and add/merge them into ulist, and the caller will free memory
from ulist.
However, if we fail to allocate inode's extents element
memory or ulist_add() fail to allocate memory, we won't
add allocated memory into ulist, and the caller won't
free some allocated memory thus memory leaks happen.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
There was a case where file hole detection was incorrect and it would
cause an incremental send to override a section of a file with zeroes.
This happened in the case where between the last leaf we processed which
contained a file extent item for our current inode and the leaf we're
currently are at (and has a file extent item for our current inode) there
are only leafs containing exclusively file extent items for our current
inode, and none of them was updated since the previous send operation.
The file hole detection code would incorrectly consider the file range
covered by these leafs as a hole.
A test case for xfstests follows soon.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
I can easily trigger the following warnings when enabling quota
in my virtual machine(running Opensuse), Steps are firstly creating
a subvolume full of fragment extents, and then create many snapshots
(500 in my test case).
[ 2362.808459] BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-qgroup-re:1970]
[ 2362.809023] task: e4af8450 ti: e371c000 task.ti: e371c000
[ 2362.809026] EIP: 0060:[<fa38f4ae>] EFLAGS: 00000246 CPU: 0
[ 2362.809049] EIP is at __merge_refs+0x5e/0x100 [btrfs]
[ 2362.809051] EAX: 00000000 EBX: cfadbcf0 ECX: 00000000 EDX: cfadbcb0
[ 2362.809052] ESI: dd8d3370 EDI: e371dde0 EBP: e371dd6c ESP: e371dd5c
[ 2362.809054] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2362.809055] CR0: 80050033 CR2: ac454d50 CR3: 009a9000 CR4: 001407d0
[ 2362.809099] Stack:
[ 2362.809100] 00000001 e371dde0 dfcc6890 f29f8000 e371de28 fa39016d 00000011 00000001
[ 2362.809105] 99bfc000 00000000 93928000 00000000 00000001 00000050 e371dda8 00000001
[ 2362.809109] f3a31000 f3413000 00000001 e371ddb8 000040a8 00000202 00000000 00000023
[ 2362.809113] Call Trace:
[ 2362.809136] [<fa39016d>] find_parent_nodes+0x34d/0x1280 [btrfs]
[ 2362.809156] [<fa391172>] btrfs_find_all_roots+0xb2/0x110 [btrfs]
[ 2362.809174] [<fa3934a8>] btrfs_qgroup_rescan_worker+0x358/0x7a0 [btrfs]
[ 2362.809180] [<c024d0ce>] ? lock_timer_base.isra.39+0x1e/0x40
[ 2362.809199] [<fa3648df>] worker_loop+0xff/0x470 [btrfs]
[ 2362.809204] [<c027a88a>] ? __wake_up_locked+0x1a/0x20
[ 2362.809221] [<fa3647e0>] ? btrfs_queue_worker+0x2b0/0x2b0 [btrfs]
[ 2362.809225] [<c025ebbc>] kthread+0x9c/0xb0
[ 2362.809229] [<c06b487b>] ret_from_kernel_thread+0x1b/0x30
[ 2362.809233] [<c025eb20>] ? kthread_create_on_node+0x110/0x110
By adding a reschedule point at the end of btrfs_find_all_roots(), i no longer
hit these warnings.
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Instead of looking for a file extent item, process it, release the path
and do a btree search for the next file extent item, just process all
file extent items in a leaf without intermediate btree searches. This way
we save cpu and we're not blocking other tasks or affecting concurrency on
the btree, because send's paths use the commit root and skip btree node/leaf
locking.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We can only tolerate ENOENT here, for other errors, we should
return directly.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
There is a race condition between resolving indirect ref and root deletion,
and we should gurantee that root can not be destroyed to avoid accessing
broken tree here.
Here we fix it by holding @subvol_srcu, and we will release it as soon
as we have held root node lock.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
When we have two adjacent extents in relink_extent_backref,
we try to merge them. When we use btrfs_search_slot to locate the
slot for the current extent, we shouldn't set "ins_len = 1",
because we will merge it into the previous extent rather than
insert a new item. Otherwise, we may happen to create a new leaf
in btrfs_search_slot and path->slot[0] will be 0. Then we try to
fetch the previous item using "path->slots[0]--", and it will cause
a warning as follows:
[ 145.713385] WARNING: CPU: 3 PID: 1796 at fs/btrfs/extent_io.c:5043 map_private_extent_buffer+0xd4/0xe0
[ 145.713387] btrfs bad mapping eb start 5337088 len 4096, wanted 167772306 8
...
[ 145.713462] [<ffffffffa034b1f4>] map_private_extent_buffer+0xd4/0xe0
[ 145.713476] [<ffffffffa030097a>] ? btrfs_free_path+0x2a/0x40
[ 145.713485] [<ffffffffa0340864>] btrfs_get_token_64+0x64/0xf0
[ 145.713498] [<ffffffffa033472c>] relink_extent_backref+0x41c/0x820
[ 145.713508] [<ffffffffa0334d69>] btrfs_finish_ordered_io+0x239/0xa80
I encounter this warning when running defrag having mkfs.btrfs
with option -M. At the same time there are read/writes & snapshots
running at background.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
The send operation processes inodes by their ascending number, and assumes
that any rename/move operation can be successfully performed (sent to the
caller) once all previous inodes (those with a smaller inode number than the
one we're currently processing) were processed.
This is not true when an incremental send had to process an hierarchical change
between 2 snapshots where the parent-children relationship between directory
inodes was reversed - that is, parents became children and children became
parents. This situation made the path building code go into an infinite loop,
which kept allocating more and more memory that eventually lead to a krealloc
warning being displayed in dmesg:
WARNING: CPU: 1 PID: 5705 at mm/page_alloc.c:2477 __alloc_pages_nodemask+0x365/0xad0()
Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) snd_hda_codec_hdmi snd_hda_codec_realtek joydev radeon snd_hda_intel snd_hda_codec snd_hwdep snd_seq_midi snd_pcm psmouse i915 snd_rawmidi serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer ttm snd_seq_device rfcomm drm_kms_helper parport_pc bnep bluetooth drm ppdev snd soundcore i2c_algo_bit snd_page_alloc binfmt_misc video lp parport r8169 mii hid_generic usbhid hid
CPU: 1 PID: 5705 Comm: btrfs Tainted: G O 3.13.0-rc7-fdm-btrfs-next-18+ #3
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
[ 5381.660441] 00000000000009ad ffff8806f6f2f4e8 ffffffff81777434 0000000000000007
[ 5381.660447] 0000000000000000 ffff8806f6f2f528 ffffffff8104a9ec ffff8807038f36f0
[ 5381.660452] 0000000000000000 0000000000000206 ffff8807038f2490 ffff8807038f36f0
[ 5381.660457] Call Trace:
[ 5381.660464] [<ffffffff81777434>] dump_stack+0x4e/0x68
[ 5381.660471] [<ffffffff8104a9ec>] warn_slowpath_common+0x8c/0xc0
[ 5381.660476] [<ffffffff8104aa3a>] warn_slowpath_null+0x1a/0x20
[ 5381.660480] [<ffffffff81144995>] __alloc_pages_nodemask+0x365/0xad0
[ 5381.660487] [<ffffffff8108313f>] ? local_clock+0x4f/0x60
[ 5381.660491] [<ffffffff811430e8>] ? free_one_page+0x98/0x440
[ 5381.660495] [<ffffffff8108313f>] ? local_clock+0x4f/0x60
[ 5381.660502] [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50
[ 5381.660508] [<ffffffff81095fb8>] ? trace_hardirqs_off_caller+0x28/0xd0
[ 5381.660515] [<ffffffff81183caf>] alloc_pages_current+0x10f/0x1f0
[ 5381.660520] [<ffffffff8113fae4>] ? __get_free_pages+0x14/0x50
[ 5381.660524] [<ffffffff8113fae4>] __get_free_pages+0x14/0x50
[ 5381.660530] [<ffffffff8115dace>] kmalloc_order_trace+0x3e/0x100
[ 5381.660536] [<ffffffff81191ea0>] __kmalloc_track_caller+0x220/0x230
[ 5381.660560] [<ffffffffa0729fdb>] ? fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs]
[ 5381.660564] [<ffffffff8178085c>] ? retint_restore_args+0xe/0xe
[ 5381.660569] [<ffffffff811580ef>] krealloc+0x6f/0xb0
[ 5381.660586] [<ffffffffa0729fdb>] fs_path_ensure_buf.part.12+0x6b/0x200 [btrfs]
[ 5381.660601] [<ffffffffa072a208>] fs_path_prepare_for_add+0x98/0xb0 [btrfs]
[ 5381.660615] [<ffffffffa072a2bc>] fs_path_add_path+0x2c/0x60 [btrfs]
[ 5381.660628] [<ffffffffa072c55c>] get_cur_path+0x7c/0x1c0 [btrfs]
Even without this loop, the incremental send couldn't succeed, because it would attempt
to send a rename/move operation for the lower inode before the highest inode number was
renamed/move. This issue is easy to trigger with the following steps:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ mkdir -p /mnt/btrfs/a/b/c/d
$ mkdir /mnt/btrfs/a/b/c2
$ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap1
$ mv /mnt/btrfs/a/b/c/d /mnt/btrfs/a/b/c2/d2
$ mv /mnt/btrfs/a/b/c /mnt/btrfs/a/b/c2/d2/cc
$ btrfs subvol snapshot -r /mnt/btrfs /mnt/btrfs/snap2
$ btrfs send -p /mnt/btrfs/snap1 /mnt/btrfs/snap2 > /tmp/incremental.send
The structure of the filesystem when the first snapshot is taken is:
. (ino 256)
|-- a (ino 257)
|-- b (ino 258)
|-- c (ino 259)
| |-- d (ino 260)
|
|-- c2 (ino 261)
And its structure when the second snapshot is taken is:
. (ino 256)
|-- a (ino 257)
|-- b (ino 258)
|-- c2 (ino 261)
|-- d2 (ino 260)
|-- cc (ino 259)
Before the move/rename operation is performed for the inode 259, the
move/rename for inode 260 must be performed, since 259 is now a child
of 260.
A test case for xfstests, with a more complex scenario, will follow soon.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
|
Currently struct fanotify_event_info has been destroyed immediately
after reporting its contents to userspace. However that is wrong for
permission events because those need to stay around until userspace
provides response which is filled back in fanotify_event_info. So change
to code to free permission events only after we have got the response
from userspace.
Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The event returned from fsnotify_add_notify_event() cannot ever be used
safely as the event may be freed by the time the function returns (after
dropping notification_mutex). So change the prototype to just return
whether the event was added or merged into some existing event.
Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
We cannot use the event structure returned from
fsnotify_add_notify_event() because that event can be freed by the time
that function returns. Use the mask argument passed into the event
handler directly instead. This also fixes a possible problem when we
could unnecessarily wait for permission response for a normal fanotify
event which got merged with a permission event.
We also disallow merging of permission event with any other event so
that we know the permission event which we just created is the one on
which we should wait for permission response.
Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Jan Kara <jack@suse.cz>
|
|
The merge between 2b7f65b11d87f "mmp_pdma: Style neatening" and
8010dad55a0ab0 "dma: add dma_get_any_slave_channel(), for use in of_xlate()"
caused a build error by leaving obsolete code in place:
mmp_pdma.c: In function 'mmp_pdma_dma_xlate':
mmp_pdma.c:909:31: error: 'candidate' undeclared
mmp_pdma.c:912:3: error: label 'retry' used but not defined
mmp_pdma.c:901:24: warning: unused variable 'c' [-Wunused-variable]
This removes the extraneous lines.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
|
|
* pm-cpufreq:
acpi-cpufreq: De-register CPU notifier and free struct msr on error.
* pm-devfreq:
PM / devfreq: Disable Exynos4 driver build on multiplatform
|