Age | Commit message (Collapse) | Author |
|
commit 848c23b78fafdcd3270b06a30737f8dbd70c347f upstream.
Commit 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent before
submit it to user") introduced a warning to catch unemitted cached
fiemap extent.
However such warning doesn't take the following case into consideration:
0 4K 8K
|<---- fiemap range --->|
|<----------- On-disk extent ------------------>|
In this case, the whole 0~8K is cached, and since it's larger than
fiemap range, it break the fiemap extent emit loop.
This leaves the fiemap extent cached but not emitted, and caught by the
final fiemap extent sanity check, causing kernel warning.
This patch removes the kernel warning and renames the sanity check to
emit_last_fiemap_cache() since it's possible and valid to have cached
fiemap extent.
Reported-by: David Sterba <dsterba@suse.cz>
Reported-by: Adam Borowski <kilobyte@angband.pl>
Fixes: 4751832da990 ("btrfs: fiemap: Cache and merge fiemap extent ...")
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 92a8109e4d3a34fb6b115c9098b51767dc933444 ]
The code tries to allocate a contiguous buffer with a size supplied by
the server (maxBuf). This could fail if memory is fragmented since it
results in high order allocations for commonly used server
implementations. It is also wasteful since there are probably
few locks in the usual case. Limit the buffer to be no larger than a
page to avoid memory allocation failures due to fragmentation.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit cb5b020a8d38f77209d0472a0fea755299a8ec78 upstream.
This reverts commit 8099b047ecc431518b9bb6bdbba3549bbecdc343.
It turns out that people do actually depend on the shebang string being
truncated, and on the fact that an interpreter (like perl) will often
just re-interpret it entirely to get the full argument list.
Reported-by: Samuel Dionne-Riel <samuel@dionne-riel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
cifs)"
This reverts commit 4cd376638c893cf5bf1072eeaac884f62b7ac71e which is
commit 6e785302dad32228819d8066e5376acd15d0e6ba upstream.
Yi writes:
I notice that 4.4.169 merged 60da90b224ba7 ("cifs: In Kconfig
CONFIG_CIFS_POSIX needs depends on legacy (insecure cifs)") add
a Kconfig dependency CIFS_ALLOW_INSECURE_LEGACY, which was not
defined in 4.4 stable, so after this patch we are not able to
enable CIFS_POSIX anymore. Linux 4.4 stable didn't merge the
legacy dialects codes, so do we really need this patch for 4.4?
So revert this patch in 4.9 as well.
Reported-by: "zhangyi (F)" <yi.zhang@huawei.com>
Cc: Steve French <stfrench@microsoft.com>
Cc: Pavel Shilovsky <pshilov@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 53da6a53e1d414e05759fa59b7032ee08f4e22d7 upstream.
The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
the client is making a call that matches a previous (slot, seqid) pair
but that *isn't* actually a replay, because some detail of the call
doesn't actually match the previous one.
Catching every such case is difficult, but we may as well catch a few
easy ones. This also handles the case described in the previous patch,
in a different way.
The spec does however require us to catch the case where the difference
is in the rpc credentials. This prevents somebody from snooping another
user's replies by fabricating retries.
(But the practical value of the attack is limited by the fact that the
replies with the most sensitive data are READ replies, which are not
normally cached.)
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 085def3ade52f2ffe3e31f42e98c27dcc222dd37 upstream.
Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.
Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.
The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.
The protocol permits us to cache replies even if the client didn't ask
us to. And it's easy to do so in the case of solo SEQUENCE compounds.
So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFSERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.
Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops. This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original reply, it will increment the
sequence id. If it did get the original reply, it won't increment, and
nothing else that about the reply really matters much. But we can at
least attempt to return valid xdr!
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d88c93f090f708c18195553b352b9f205e65418f upstream.
debugfs_rename() needs to check that the dentries passed into it really
are valid, as sometimes they are not (i.e. if the return value of
another debugfs call is passed into this one.) So fix this up by
properly checking if the two parent directories are errors (they are
allowed to be NULL), and if the dentry to rename is not NULL or an
error.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 97e1532ef81acb31c30f9e75bf00306c33a77812 upstream.
Dereferencing req->page_descs[0] will Oops if req->max_pages is zero.
Reported-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
Tested-by: syzbot+c1e36d30ee3416289cc0@syzkaller.appspotmail.com
Fixes: b2430d7567a3 ("fuse: add per-page descriptor <offset, length> to fuse_req")
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a2ebba824106dabe79937a9f29a875f837e1b6d4 upstream.
NR_WRITEBACK_TEMP is accounted on the temporary page in the request, not
the page cache page.
Fixes: 8b284dc47291 ("fuse: writepages: handle same page rewrites")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9509941e9c534920ccc4771ae70bd6cbbe79df1c upstream.
Some of the pipe_buf_release() handlers seem to assume that the pipe is
locked - in particular, anon_pipe_buf_release() accesses pipe->tmp_page
without taking any extra locks. From a glance through the callers of
pipe_buf_release(), it looks like FUSE is the only one that calls
pipe_buf_release() without having the pipe locked.
This bug should only lead to a memory leak, nothing terrible.
Fixes: dd3bb14f44a6 ("fuse: support splice() writing to fuse device")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 8099b047ecc431518b9bb6bdbba3549bbecdc343 ]
load_script() simply truncates bprm->buf and this is very wrong if the
length of shebang string exceeds BINPRM_BUF_SIZE-2. This can silently
truncate i_arg or (worse) we can execute the wrong binary if buf[2:126]
happens to be the valid executable path.
Change load_script() to return ENOEXEC if it can't find '\n' or zero in
bprm->buf. Note that '\0' can come from either
prepare_binprm()->memset() or from kernel_read(), we do not care.
Link: http://lkml.kernel.org/r/20181112160931.GA28463@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ben Woodard <woodard@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 76699a67f3041ff4c7af6d6ee9be2bfbf1ffb671 ]
The ep->ovflist is a secondary ready-list to temporarily store events
that might occur when doing sproc without holding the ep->wq.lock. This
accounts for every time we check for ready events and also send events
back to userspace; both callbacks, particularly the latter because of
copy_to_user, can account for a non-trivial time.
As such, the unlikely() check to see if the pointer is being used, seems
both misleading and sub-optimal. In fact, we go to an awful lot of
trouble to sync both lists, and populating the ovflist is far from an
uncommon scenario.
For example, profiling a concurrent epoll_wait(2) benchmark, with
CONFIG_PROFILE_ANNOTATED_BRANCHES shows that for a two threads a 33%
incorrect rate was seen; and when incrementally increasing the number of
epoll instances (which is used, for example for multiple queuing load
balancing models), up to a 90% incorrect rate was seen.
Similarly, by deleting the prediction, 3% throughput boost was seen
across incremental threads.
Link: http://lkml.kernel.org/r/20181108051006.18751-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 70306d9dce75abde855cefaf32b3f71eed8602a3 ]
For sync io read in ocfs2_read_blocks_sync(), first clear bh uptodate flag
and submit the io, second wait io done, last check whether bh uptodate, if
not return io error.
If two sync io for the same bh were issued, it could be the first io done
and set uptodate flag, but just before check that flag, the second io came
in and cleared uptodate, then ocfs2_read_blocks_sync() for the first io
will return IO error.
Indeed it's not necessary to clear uptodate flag, as the io end handler
end_buffer_read_sync() will set or clear it based on io succeed or failed.
The following message was found from a nfs server but the underlying
storage returned no error.
[4106438.567376] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2780 ERROR: read block 1238823695 failed -5
[4106438.567569] (nfsd,7146,3):ocfs2_get_suballoc_slot_bit:2812 ERROR: status = -5
[4106438.567611] (nfsd,7146,3):ocfs2_test_inode_bit:2894 ERROR: get alloc slot and bit failed -5
[4106438.567643] (nfsd,7146,3):ocfs2_test_inode_bit:2932 ERROR: status = -5
[4106438.567675] (nfsd,7146,3):ocfs2_get_dentry:94 ERROR: test inode bit failed -5
Same issue in non sync read ocfs2_read_blocks(), fixed it as well.
Link: http://lkml.kernel.org/r/20181121020023.3034-4-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit e4589fa545e0020dbbc3c9bde35f35f949901392 ]
When there is a failure in f2fs_fill_super() after/during
the recovery of fsync'd nodes, it frees the current sbi and
retries again. This time the mount is successful, but the files
that got recovered before retry, still holds the extent tree,
whose extent nodes list is corrupted since sbi and sbi->extent_list
is freed up. The list_del corruption issue is observed when the
file system is getting unmounted and when those recoverd files extent
node is being freed up in the below context.
list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
<...>
kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
lr : __list_del_entry_valid+0x94/0xb4
pc : __list_del_entry_valid+0x94/0xb4
<...>
Call trace:
__list_del_entry_valid+0x94/0xb4
__release_extent_node+0xb0/0x114
__free_extent_tree+0x58/0x7c
f2fs_shrink_extent_tree+0xdc/0x3b0
f2fs_leave_shrinker+0x28/0x7c
f2fs_put_super+0xfc/0x1e0
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78
__cleanup_mnt+0x1c/0x28
task_work_run+0x48/0xd0
do_notify_resume+0x678/0xe98
work_pending+0x8/0x14
Fix this by not creating extents for those recovered files if shrinker is
not registered yet. Once mount is successful and shrinker is registered,
those files can have extents again.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 59a63e479ce36a3f24444c3a36efe82b78e4a8e0 ]
RHBZ: 1021460
There is an issue where when multiple threads open/close the same directory
ntwrk_buf_start might end up being NULL, causing the call to smbCalcSize
later to oops with a NULL deref.
The real bug is why this happens and why this can become NULL for an
open cfile, which should not be allowed.
This patch tries to avoid a oops until the time when we fix the underlying
issue.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 594d1644cd59447f4fceb592448d5cd09eb09b5e ]
This patch removes the check from nfs_compare_mount_options to see if a
`sec' option was passed for the current mount before comparing auth
flavors and instead just always compares auth flavors.
Consider the following scenario:
You have a server with the address 192.168.1.1 and two exports /export/a
and /export/b. The first export supports `sys' and `krb5' security, the
second just `sys'.
Assume you start with no mounts from the server.
The following results in EIOs being returned as the kernel nfs client
incorrectly thinks it can share the underlying `struct nfs_server's:
$ mkdir /tmp/{a,b}
$ sudo mount -t nfs -o vers=3,sec=krb5 192.168.1.1:/export/a /tmp/a
$ sudo mount -t nfs -o vers=3 192.168.1.1:/export/b /tmp/b
$ df >/dev/null
df: ā/tmp/bā: Input/output error
Signed-off-by: Chris Perl <cperl@janestreet.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit d288d95842f1503414b7eebce3773bac3390457e ]
When inode is corrupted so that extent type is invalid, some functions
(such as udf_truncate_extents()) will just BUG. Check that extent type
is valid when loading the inode to memory.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 62a063b8e7d1db684db3f207261a466fa3194e72 ]
Anatoly Trosinenko reports that this:
1) Checkout fresh master Linux branch (tested with commit e195ca6cb)
2) Copy x84_64-config-4.14 to .config, then enable NFS server v4 and build
3) From `kvm-xfstests shell`:
results in NULL dereference in locks_end_grace.
Check that nfsd has been started before trying to end the grace period.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit f6176473a0c7472380eef72ebeb330cf9485bf0a ]
When call f2fs_acl_create_masq() failed, the caller f2fs_acl_create()
should return -EIO instead of -ENOMEM, this patch makes it consistent
with posix_acl_create() which has been fixed in commit beaf226b863a
("posix_acl: don't ignore return value of posix_acl_create_masq()").
Fixes: 83dfe53c185e ("f2fs: fix reference leaks in f2fs_acl_create")
Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit b61ac5b720146c619c7cdf17eff2551b934399e5 ]
This patch move dir data flush to write checkpoint process, by
doing this, it may reduce some time for dir fsync.
pre:
-f2fs_do_sync_file enter
-file_write_and_wait_range <- flush & wait
-write_checkpoint
-do_checkpoint <- wait all
-f2fs_do_sync_file exit
now:
-f2fs_do_sync_file enter
-write_checkpoint
-block_operations <- flush dir & no wait
-do_checkpoint <- wait all
-f2fs_do_sync_file exit
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 216f0efd19b9cc32207934fd1b87a45f2c4c593e ]
Before this patch, recovery would cause all callbacks to be delayed,
put on a queue, and afterward they were all queued to the callback
work queue. This patch does the same thing, but occasionally takes
a break after 25 of them so it won't swamp the CPU at the expense
of other RT processes like corosync.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit b469e7e47c8a075cc08bcd1e85d4365134bdcdd5 upstream.
When an event is reported on a sub-directory and the parent inode has
a mark mask with FS_EVENT_ON_CHILD|FS_ISDIR, the event will be sent to
fsnotify() even if the event type is not in the parent mark mask
(e.g. FS_OPEN).
Further more, if that event happened on a mount or a filesystem with
a mount/sb mark that does have that event type in their mask, the "on
child" event will be reported on the mount/sb mark. That is not
desired, because user will get a duplicate event for the same action.
Note that the event reported on the victim inode is never merged with
the event reported on the parent inode, because of the check in
should_merge(): old_fsn->inode == new_fsn->inode.
Fix this by looking for a match of an actual event type (i.e. not just
FS_ISDIR) in parent's inode mark mask and by not reporting an "on child"
event to group if event type is only found on mount/sb marks.
[backport hint: The bug seems to have always been in fanotify, but this
patch will only apply cleanly to v4.19.y]
Cc: <stable@vger.kernel.org> # v4.19
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[amir: backport to v4.9]
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 79f546a696bff2590169fb5684e23d65f4d9f591 upstream.
We recently had an oops reported on a 4.14 kernel in
xfs_reclaim_inodes_count() where sb->s_fs_info pointed to garbage
and so the m_perag_tree lookup walked into lala land. It produces
an oops down this path during the failed mount:
radix_tree_gang_lookup_tag+0xc4/0x130
xfs_perag_get_tag+0x37/0xf0
xfs_reclaim_inodes_count+0x32/0x40
xfs_fs_nr_cached_objects+0x11/0x20
super_cache_count+0x35/0xc0
shrink_slab.part.66+0xb1/0x370
shrink_node+0x7e/0x1a0
try_to_free_pages+0x199/0x470
__alloc_pages_slowpath+0x3a1/0xd20
__alloc_pages_nodemask+0x1c3/0x200
cache_grow_begin+0x20b/0x2e0
fallback_alloc+0x160/0x200
kmem_cache_alloc+0x111/0x4e0
The problem is that the superblock shrinker is running before the
filesystem structures it depends on have been fully set up. i.e.
the shrinker is registered in sget(), before ->fill_super() has been
called, and the shrinker can call into the filesystem before
fill_super() does it's setup work. Essentially we are exposed to
both use-after-free and use-before-initialisation bugs here.
To fix this, add a check for the SB_BORN flag in super_cache_count.
In general, this flag is not set until ->fs_mount() completes
successfully, so we know that it is set after the filesystem
setup has completed. This matches the trylock_super() behaviour
which will not let super_cache_scan() run if SB_BORN is not set, and
hence will not allow the superblock shrinker from entering the
filesystem while it is being set up or after it has failed setup
and is being torn down.
Cc: stable@kernel.org
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 28eb24ff75c5ac130eb326b3b4d0dcecfc0f427d upstream.
In case a hostname resolves to a different IP address (e.g. long
running mounts), make sure to resolve it every time prior to calling
generic_ip_connect() in reconnect.
Suggested-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e74c98ca2d6ae4376cc15fa2a22483430909d96b upstream.
This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.
It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone. Let's revert this
for now to have more time for a proper fix.
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 1dbd449c9943e3145148cc893c2461b72ba6fef0 upstream.
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.
Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 8e6e72aeceaaed5aeeb1cb43d3085de7ceb14f79 upstream.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
A bug has been discovered when redirecting splice output to regular files
on EXT4 and tmpfs. Other filesystems might be affected.
This commit fixes the issue for stable series kernel, using one of the
change introduced during the rewrite and refactoring of vfs_iter_write in
4.13, specifically in the
commit abbb65899aec ("fs: implement vfs_iter_write using do_iter_write").
This issue affects v4.4 and v4.9 stable series of kernels.
Without this fix for v4.4 and v4.9 stable, the following upstream commits
(and their dependencies would need to be backported):
* commit abbb65899aec ("fs: implement vfs_iter_write using do_iter_write")
* commit 18e9710ee59c ("fs: implement vfs_iter_read using do_iter_read")
* commit edab5fe38c2c
("fs: move more code into do_iter_read/do_iter_write")
* commit 19c735868dd0 ("fs: remove __do_readv_writev")
* commit 26c87fb7d10d ("fs: remove do_compat_readv_writev")
* commit 251b42a1dc64 ("fs: remove do_readv_writev")
as well as the following dependencies:
* commit bb7462b6fd64
("vfs: use helpers for calling f_op->{read,write}_iter()")
* commit 0f78d06ac1e9
("vfs: pass type instead of fn to do_{loop,iter}_readv_writev()")
* commit 7687a7a4435f
("vfs: extract common parts of {compat_,}do_readv_writev()")
In order to reduce the changes, this commit uses only the part of
commit abbb65899aec ("fs: implement vfs_iter_write using do_iter_write")
that fixes the issue.
This issue and the reproducer can be found on
https://bugzilla.kernel.org/show_bug.cgi?id=85381
Reported-by: Richard Li <richardpku@gmail.com>
Reported-by: Chad Miller <millchad@amazon.com>
Reviewed-by: Stefan Nuernberger <snu@amazon.de>
Reviewed-by: Frank Becker <becke@amazon.de>
Signed-off-by: Jimmy Durand Wesolowski <jdw@amazon.de>
|
|
commit 0d228ece59a35a9b9e8ff0d40653234a6d90f61e upstream.
At the time of forced unmount we place the running replace to
BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state, so when the system comes
back and expect the target device is missing.
Then let the replace state continue to be in
BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED state instead of
BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED as there isn't any matching scrub
running as part of replace.
Fixes: e93c89c1aaaa ("Btrfs: add new sources for device replace code")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 5c06147128fbbdf7a84232c5f0d808f53153defe upstream.
When we fail to start a transaction in btrfs_dev_replace_start, we leave
dev_replace->replace_start set to STARTED but clear ->srcdev and
->tgtdev. Later, that can result in an Oops in
btrfs_dev_replace_progress when having state set to STARTED or SUSPENDED
implies that ->srcdev is valid.
Also fix error handling when the state is already STARTED or SUSPENDED
while starting. That, too, will clear ->srcdev and ->tgtdev even though
it doesn't own them. This should be an impossible case to hit since we
should be protected by the BTRFS_FS_EXCL_OP bit being set. Let's add an
ASSERT there while we're at it.
Fixes: e93c89c1aaaaa (Btrfs: add new sources for device replace code)
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 0ea295dd853e0879a9a30ab61f923c26be35b902 upstream.
The function truncate_node frees the page with f2fs_put_page. However,
the page index is read after that. So, the patch reads the index before
freeing the page.
Fixes: bf39c00a9a7f ("f2fs: drop obsolete node page when it is truncated")
Cc: <stable@vger.kernel.org>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit acc58d0bab55a50e02c25f00bd6a210ee121595f upstream.
When doing MTU i/o we need to leave some credits for
possible reopen requests and other operations happening
in parallel. Currently we leave 1 credit which is not
enough even for reopen only: we need at least 2 credits
if durable handle reconnect fails. Also there may be
other operations at the same time including compounding
ones which require 3 credits at a time each. Fix this
by leaving 8 credits which is big enough to cover most
scenarios.
Was able to reproduce this when server was configured
to give out fewer credits than usual.
The proper fix would be to reconnect a file handle first
and then obtain credits for an MTU request but this leads
to bigger code changes and should happen in other patches.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ]
mount.ocfs2 ignore the inconsistent error that journal is clean but
local alloc is unrecovered. After mount, local alloc not empty, then
reserver cluster didn't alloc a new local alloc window, reserveration
map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the
following panic.
This issue was reported at
https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html
and was advised to fixed during mount. But this is a very unusual
inconsistent state, usually journal dirty flag should be cleared at the
last stage of umount until every other things go right. We may need do
further debug to check that. Any way to avoid possible futher
corruption, mount should be abort and fsck should be run.
(mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
found = 6518, set = 6518, taken = 8192, off = 15912372
ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
------------[ cut here ]------------
kernel BUG at fs/ocfs2/reservations.c:507!
invalid opcode: 0000 [#1] SMP
Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
RIP: 0010:[<ffffffffa05e96a8>] [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
Call Trace:
ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
__ocfs2_claim_clusters+0x178/0x360 [ocfs2]
ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
ocfs2_write_begin+0x13e/0x230 [ocfs2]
generic_perform_write+0xbf/0x1c0
__generic_file_write_iter+0x19c/0x1d0
ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
__vfs_write+0xb8/0x110
vfs_write+0xa9/0x1b0
SyS_write+0x46/0xb0
system_call_fastpath+0x18/0xd7
Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
RSP <ffff8800ea4db668>
---[ end trace 566f07529f2edf3c ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit 30696378f68a9e3dad6bfe55938b112e72af00c2 ]
The ramoops backend currently calls persistent_ram_save_old() even
if a buffer is empty. While this appears to work, it is does not seem
like the right thing to do and could lead to future bugs so lets avoid
that. It also prevents misleading prints in the logs which claim the
buffer is valid.
I got something like:
found existing buffer, size 0, start 0
When I was expecting:
no valid data in buffer (sig = ...)
This bails out early (and reports with pr_debug()), since it's an
acceptable state.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
[ Upstream commit a788c5272769ddbcdbab297cf386413eeac04463 ]
jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER
is defined then a write buffer is available and has been initialized.
However, this does is not the case when the mtd device has no
out-of-band buffer:
int jffs2_nand_flash_setup(struct jffs2_sb_info *c)
{
if (!c->mtd->oobsize)
return 0;
...
The resulting call to cancel_delayed_work_sync passing a uninitialized
(but zeroed) delayed_work struct forces lockdep to become disabled.
[ 90.050639] overlayfs: upper fs does not support tmpfile.
[ 90.652264] INFO: trying to register non-static key.
[ 90.662171] the code is fine but needs lockdep annotation.
[ 90.673090] turning off the locking correctness validator.
[ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0
[ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7
[ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff
[ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420
[ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90
[ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000
[ 90.780033] ...
[ 90.784902] Call Trace:
[ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148
[ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c
[ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c
[ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc
[ 90.828345] [<8003f27c>] flush_work+0x200/0x24c
[ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210
[ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54
[ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120
[ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c
[ 90.875067] [<80014424>] syscall_common+0x34/0x58
Signed-off-by: Daniel Santos <daniel.santos@pobox.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 04906b2f542c23626b0ef6219b808406f8dddbe9 upstream.
bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
init_page_buffers+0x3e2/0x530 fs/buffer.c:904
grow_dev_page fs/buffer.c:947 [inline]
grow_buffers fs/buffer.c:1009 [inline]
__getblk_slow fs/buffer.c:1036 [inline]
__getblk_gfp+0x906/0xb10 fs/buffer.c:1313
__bread_gfp+0x2d/0x310 fs/buffer.c:1347
sb_bread include/linux/buffer_head.h:307 [inline]
fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
fat_ent_read_block fs/fat/fatent.c:441 [inline]
fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
__fat_get_block fs/fat/inode.c:148 [inline]
...
Trivial reproducer for the problem looks like:
truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt
Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.
Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.
Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 74d5d229b1bf60f93bff244b2dfc0eb21ec32a07 upstream.
If we flip read-only before we initiate writeback on all dirty pages for
ordered extents we've created then we'll have ordered extents left over
on umount, which results in all sorts of bad things happening. Fix this
by making sure we wait on ordered extents if we have to do the aborted
transaction cleanup stuff.
generic/475 can produce this warning:
[ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G W 5.0.0-rc1-default+ #394
[ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
[ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286
[ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000
[ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000
[ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000
[ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0
[ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000
[ 8531.202707] FS: 00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000
[ 8531.204851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0
[ 8531.207516] Call Trace:
[ 8531.208175] btrfs_free_fs_roots+0xdb/0x170 [btrfs]
[ 8531.210209] ? wait_for_completion+0x5b/0x190
[ 8531.211303] close_ctree+0x157/0x350 [btrfs]
[ 8531.212412] generic_shutdown_super+0x64/0x100
[ 8531.213485] kill_anon_super+0x14/0x30
[ 8531.214430] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 8531.215539] deactivate_locked_super+0x29/0x60
[ 8531.216633] cleanup_mnt+0x3b/0x70
[ 8531.217497] task_work_run+0x98/0xc0
[ 8531.218397] exit_to_usermode_loop+0x83/0x90
[ 8531.219324] do_syscall_64+0x15b/0x180
[ 8531.220192] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 8531.221286] RIP: 0033:0x7fc68e5e4d07
[ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6
[ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07
[ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80
[ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80
[ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80
[ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000
Leaving a tree in the rb-tree:
3853 void btrfs_free_fs_root(struct btrfs_root *root)
3854 {
3855 iput(root->ino_cache_inode);
3856 WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
CC: stable@vger.kernel.org
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add stacktrace ]
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If CONFIG_SECCOMP=n, /proc/self/status includes an empty line. This causes
the iotop application to bail out with an error message.
File "/usr/local/lib64/python2.7/site-packages/iotop/data.py", line 196,
in parse_proc_pid_status
key, value = line.split(':\t', 1)
ValueError: need more than 1 value to unpack
The problem is seen in v4.9.y but not upstream because commit af884cd4a5ae6
("proc: report no_new_privs state") has not been backported to v4.9.y.
The backport of commit fae1fa0fc6cc ("proc: Provide details on speculation
flaw mitigations") tried to address the resulting differences but was
wrong, introducing the problem.
Fixes: 51ef9af2a35b ("proc: Provide details on speculation flaw mitigations")
Cc: Kees Cook <keescook@chromium.org>
Cc: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Kees Cook <keescook@chromium.org>
|
|
commit d47b8715953ad0cf82bb0a9d30d7b11d83bc9c11 upstream.
i_times of inode will be set with current system time which can be
configured through 'date', so it's not safe to judge dnode block as
garbage data or unchanged inode depend on i_times.
Now, we have used enhanced 'cp_ver + cp' crc method to verify valid
dnode block, so I expect recoverying invalid dnode is almost not
possible.
This reverts commit 807b1e1c8e08452948495b1a9985ab46d329e5c2.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e86807862e6880809f191c4cea7f88a489f0ed34 upstream.
The xfstests generic/475 test switches the underlying device with
dm-error while running a stress test. This results in a large number
of file system errors, and since we can't lock the buffer head when
marking the superblock dirty in the ext4_grp_locked_error() case, it's
possible the superblock to be !buffer_uptodate() without
buffer_write_io_error() being true.
We need to set buffer_uptodate() before we call mark_buffer_dirty() or
this will trigger a WARN_ON. It's safe to do this since the
superblock must have been properly read into memory or the mount would
have been successful. So if buffer_uptodate() is not set, we can
safely assume that this happened due to a failed attempt to write the
superblock.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2b08b1f12cd664dc7d5c84ead9ff25ae97ad5491 upstream.
The ext4_inline_data_fiemap() function calls fiemap_fill_next_extent()
while still holding the xattr semaphore. This is not necessary and it
triggers a circular lockdep warning. This is because
fiemap_fill_next_extent() could trigger a page fault when it writes
into page which triggers a page fault. If that page is mmaped from
the inline file in question, this could very well result in a
deadlock.
This problem can be reproduced using generic/519 with a file system
configuration which has the inline_data feature enabled.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 812c0cab2c0dfad977605dbadf9148490ca5d93f upstream.
There are enough credits reserved for most dioread_nolock writes;
however, if the extent tree is sufficiently deep, and/or quota is
enabled, the code was not allowing for all eventualities when
reserving journal credits for the unwritten extent conversion.
This problem can be seen using xfstests ext4/034:
WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180
Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180
...
EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata
EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28
EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit b9a74cde94957d82003fb9f7ab4777938ca851cd upstream.
If maxBuf is small but non-zero, it could result in a zero sized lock
element array which we would then try and access OOB.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit ee13919c2e8d1f904e035ad4b4239029a8994131 upstream.
Currently we hide EINTR code returned from sock_sendmsg()
and return 0 instead. This makes a caller think that we
successfully completed the network operation which is not
true. Fix this by properly returning EINTR to callers.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 3c1392d4c49962a31874af14ae9ff289cb2b3851 upstream.
Updating mseq makes client think importer mds has accepted all prior
cap messages and importer mds knows what caps client wants. Actually
some cap messages may have been dropped because of mseq mismatch.
If mseq is left untouched, importing cap's mds_wanted later will get
reset by cap import message.
Cc: stable@vger.kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream.
Fix the resource group wrap-around logic in gfs2_rbm_find that commit
e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the
same bitmaps; there is a risk that future changes will turn this into an
endless loop.
Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 6ff9b09e00a441599f3aacdf577254455a048bc9 upstream.
In gfs2_create_inode, after setting and releasing the acl / default_acl, the
acl / default_acl pointers are not set to NULL as they should be. In that
state, when the function reaches label fail_free_acls, gfs2_create_inode will
try to release the same acls again.
Fix that by setting the pointers to NULL after releasing the acls. Slightly
simplify the logic. Also, posix_acl_release checks for NULL already, so
there is no need to duplicate those checks here.
Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure")
Reported-by: Pan Bian <bianpan2016@163.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit d47b41aceeadc6b58abc9c7c6485bef7cfb75636 upstream.
According to comment in dlm_user_request() ua should be freed
in dlm_free_lkb() after successful attach to lkb.
However ua is attached to lkb not in set_lock_args() but later,
inside request_lock().
Fixes 597d0cae0f99 ("[DLM] dlm: user locks")
Cc: stable@kernel.org # 2.6.19
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit c0174726c3976e67da8649ac62cae43220ae173a upstream.
Fixes 6d40c4a708e0 ("dlm: improve error and debug messages")
Cc: stable@kernel.org # 3.5
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 23851e978f31eda8b2d01bd410d3026659ca06c7 upstream.
Fixes 3d6aa675fff9 ("dlm: keep lkbs in idr")
Cc: stable@kernel.org # 3.1
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|