Age | Commit message (Collapse) | Author |
|
If the server reboots while we are engaged in a delegation return, and
there is a pNFS layout with return-on-close set, then the current code
can end up deadlocking in pnfs_roc() when nfs_inode_set_delegation()
tries to return the old delegation.
Now that delegreturn actually uses its own copy of the stateid, it
should be safe to just always update the delegation stateid in place.
Fixes: 078000d02d57 ("pNFS: We want return-on-close to complete when evicting the inode")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
The 'nfs_server' and 'mount_server' structures include a union of
'struct sockaddr' (with the older 16 bytes max address size) and
'struct sockaddr_storage' which is large enough to hold all the
supported sa_family types (128 bytes max size). The runtime memcpy()
buffer overflow checker is seeing attempts to write beyond the 16
bytes as an overflow, but the actual expected size is that of 'struct
sockaddr_storage'. Plumb the use of 'struct sockaddr_storage' more
completely through-out NFS, which results in adjusting the memcpy()
buffers to the correct union members. Avoids this false positive run-time
warning under CONFIG_FORTIFY_SOURCE:
memcpy: detected field-spanning write (size 28) of single field "&ctx->nfs_server.address" at fs/nfs/namespace.c:178 (size 16)
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Fix the following coccicheck warning:
fs/nfs/dir.c:2494:2-7: WARNING:
NULL check before some freeing functions is not needed.
Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull fscrypt fix from Eric Biggers:
"Fix a memory leak that was introduced by a change that went into -rc1"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: fix keyring memory leak on mount failure
|
|
xfs_rename can update up to 5 inodes: src_dp, target_dp, src_ip, target_ip
and wip. So we need to increase the inode reservation to match.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
There is a memory leak reported by kmemleak:
unreferenced object 0xffff88817104ef80 (size 224):
comm "xfs_admin", pid 47165, jiffies 4298708825 (age 1333.476s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
60 a8 b3 00 81 88 ff ff a8 10 5a 00 81 88 ff ff `.........Z.....
backtrace:
[<ffffffff819171e1>] __alloc_file+0x21/0x250
[<ffffffff81918061>] alloc_empty_file+0x41/0xf0
[<ffffffff81948cda>] path_openat+0xea/0x3d30
[<ffffffff8194ec89>] do_filp_open+0x1b9/0x290
[<ffffffff8192660e>] do_open_execat+0xce/0x5b0
[<ffffffff81926b17>] open_exec+0x27/0x50
[<ffffffff81a69250>] load_elf_binary+0x510/0x3ed0
[<ffffffff81927759>] bprm_execve+0x599/0x1240
[<ffffffff8192a997>] do_execveat_common.isra.0+0x4c7/0x680
[<ffffffff8192b078>] __x64_sys_execve+0x88/0xb0
[<ffffffff83bbf0a5>] do_syscall_64+0x35/0x80
If "interp_elf_ex" fails to allocate memory in load_elf_binary(),
the program will take the "out_free_ph" error handing path,
resulting in "interpreter" file resource is not released.
Fix it by adding an error handing path "out_free_file", which will
release the file resource when "interp_elf_ex" failed to allocate
memory.
Fixes: 0693ffebcfe5 ("fs/binfmt_elf.c: allocate less for static executable")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221024154421.982230-1-lizetao1@huawei.com
|
|
unshare_sighand should only access oldsighand->action
while holding oldsighand->siglock, to make sure that
newsighand->action is in a consistent state.
Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/AM8PR10MB470871DEBD1DED081F9CC391E4389@AM8PR10MB4708.EURPRD10.PROD.OUTLOOK.COM
|
|
[BUG]
There are two reports (the earliest one from LKP, a more recent one from
kernel bugzilla) that we can have some chunks with 0 as sub_stripes.
This will cause divide-by-zero errors at btrfs_rmap_block, which is
introduced by a recent kernel patch ac0677348f3c ("btrfs: merge
calculations for simple striped profiles in btrfs_rmap_block"):
if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
BTRFS_BLOCK_GROUP_RAID10)) {
stripe_nr = stripe_nr * map->num_stripes + i;
stripe_nr = div_u64(stripe_nr, map->sub_stripes); <<<
}
[CAUSE]
From the more recent report, it has been proven that we have some chunks
with 0 as sub_stripes, mostly caused by older mkfs.
It turns out that the mkfs.btrfs fix is only introduced in 6718ab4d33aa
("btrfs-progs: Initialize sub_stripes to 1 in btrfs_alloc_data_chunk")
which is included in v5.4 btrfs-progs release.
So there would be quite some old filesystems with such 0 sub_stripes.
[FIX]
Just don't trust the sub_stripes values from disk.
We have a trusted btrfs_raid_array[] to fetch the correct sub_stripes
numbers for each profile and that are fixed.
By this, we can keep the compatibility with older filesystems while
still avoid divide-by-zero bugs.
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Viktor Kuzmin <kvaster@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216559
Fixes: ac0677348f3c ("btrfs: merge calculations for simple striped profiles in btrfs_rmap_block")
CC: stable@vger.kernel.org # 6.0
Reviewed-by: Su Yue <glass@fydeos.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The type of parameter generation has been u32 since the beginning,
however all callers pass a u64 generation, so unify the types to prevent
potential loss.
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 9ed0a72e5b35 ("btrfs: send: fix failures when processing inodes with
no links") tries to fix all incremental send cases of orphan inodes the
send operation will meet. However, there's still a bug causing the corner
subcase fails with a ENOENT error.
Here's shortened steps of that subcase:
$ btrfs subvolume create vol
$ touch vol/foo
$ btrfs subvolume snapshot -r vol snap1
$ btrfs subvolume snapshot -r vol snap2
# Turn the second snapshot to RW mode and delete the file while
# holding an open file descriptor on it
$ btrfs property set snap2 ro false
$ exec 73<snap2/foo
$ rm snap2/foo
# Set the second snapshot back to RO mode and do an incremental send
# with an unusal reverse order
$ btrfs property set snap2 ro true
$ btrfs send -p snap2 snap1 > /dev/null
At subvol snap1
ERROR: send ioctl failed with -2: No such file or directory
It's subcase 3 of BTRFS_COMPARE_TREE_CHANGED in the commit 9ed0a72e5b35
("btrfs: send: fix failures when processing inodes with no links"). And
it's not a common case. We still have not met it in the real world.
Theoretically, this case can happen in a batch cascading snapshot backup.
In cascading backups, the receive operation in the middle may cause orphan
inodes to appear because of the open file descriptors on the snapshot files
during receiving. And if we don't do the batch snapshot backups in their
creation order, then we can have an inode, which is an orphan in the parent
snapshot but refers to a file in the send snapshot. Since an orphan inode
has no paths, the send operation will fail with a ENOENT error if it
tries to generate a path for it.
In that patch, this subcase will be treated as an inode with a new
generation. However, when the routine tries to delete the old paths in
the parent snapshot, the function process_all_refs() doesn't check whether
there are paths recorded or not before it calls the function
process_recorded_refs(). And the function process_recorded_refs() try
to get the first path in the parent snapshot in the beginning. Since it has
no paths in the parent snapshot, the send operation fails.
To fix this, we can easily put a link count check to avoid entering the
deletion routine like what we do a link count check to avoid creating a
new one. Moreover, we can assume that the function process_all_refs()
can always collect references to process because we know it has a
positive link count.
Fixes: 9ed0a72e5b35 ("btrfs: send: fix failures when processing inodes with no links")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Previous commit a05d3c915314 ("btrfs: check superblock to ensure the fs
was not modified at thaw time") only checks the content of the super
block, but it doesn't really check if the on-disk super block has a
matching checksum.
This patch will add the checksum verification to thaw time superblock
verification.
This involves the following extra changes:
- Export btrfs_check_super_csum()
As we need to call it in super.c.
- Change the argument list of btrfs_check_super_csum()
Instead of passing a char *, directly pass struct btrfs_super_block *
pointer.
- Verify that our checksum type didn't change before checking the
checksum value, like it's done at mount time
Fixes: a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We have been seeing the following panic in production
kernel BUG at fs/btrfs/tree-mod-log.c:677!
invalid opcode: 0000 [#1] SMP
RIP: 0010:tree_mod_log_rewind+0x1b4/0x200
RSP: 0000:ffffc9002c02f890 EFLAGS: 00010293
RAX: 0000000000000003 RBX: ffff8882b448c700 RCX: 0000000000000000
RDX: 0000000000008000 RSI: 00000000000000a7 RDI: ffff88877d831c00
RBP: 0000000000000002 R08: 000000000000009f R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000100c40 R12: 0000000000000001
R13: ffff8886c26d6a00 R14: ffff88829f5424f8 R15: ffff88877d831a00
FS: 00007fee1d80c780(0000) GS:ffff8890400c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fee1963a020 CR3: 0000000434f33002 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
btrfs_get_old_root+0x12b/0x420
btrfs_search_old_slot+0x64/0x2f0
? tree_mod_log_oldest_root+0x3d/0xf0
resolve_indirect_ref+0xfd/0x660
? ulist_alloc+0x31/0x60
? kmem_cache_alloc_trace+0x114/0x2c0
find_parent_nodes+0x97a/0x17e0
? ulist_alloc+0x30/0x60
btrfs_find_all_roots_safe+0x97/0x150
iterate_extent_inodes+0x154/0x370
? btrfs_search_path_in_tree+0x240/0x240
iterate_inodes_from_logical+0x98/0xd0
? btrfs_search_path_in_tree+0x240/0x240
btrfs_ioctl_logical_to_ino+0xd9/0x180
btrfs_ioctl+0xe2/0x2ec0
? __mod_memcg_lruvec_state+0x3d/0x280
? do_sys_openat2+0x6d/0x140
? kretprobe_dispatcher+0x47/0x70
? kretprobe_rethook_handler+0x38/0x50
? rethook_trampoline_handler+0x82/0x140
? arch_rethook_trampoline_callback+0x3b/0x50
? kmem_cache_free+0xfb/0x270
? do_sys_openat2+0xd5/0x140
__x64_sys_ioctl+0x71/0xb0
do_syscall_64+0x2d/0x40
Which is this code in tree_mod_log_rewind()
switch (tm->op) {
case BTRFS_MOD_LOG_KEY_REMOVE_WHILE_FREEING:
BUG_ON(tm->slot < n);
This occurs because we replay the nodes in order that they happened, and
when we do a REPLACE we will log a REMOVE_WHILE_FREEING for every slot,
starting at 0. 'n' here is the number of items in this block, which in
this case was 1, but we had 2 REMOVE_WHILE_FREEING operations.
The actual root cause of this was that we were replaying operations for
a block that shouldn't have been replayed. Consider the following
sequence of events
1. We have an already modified root, and we do a btrfs_get_tree_mod_seq().
2. We begin removing items from this root, triggering KEY_REPLACE for
it's child slots.
3. We remove one of the 2 children this root node points to, thus triggering
the root node promotion of the remaining child, and freeing this node.
4. We modify a new root, and re-allocate the above node to the root node of
this other root.
The tree mod log looks something like this
logical 0 op KEY_REPLACE (slot 1) seq 2
logical 0 op KEY_REMOVE (slot 1) seq 3
logical 0 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 4
logical 4096 op LOG_ROOT_REPLACE (old logical 0) seq 5
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 1) seq 6
logical 8192 op KEY_REMOVE_WHILE_FREEING (slot 0) seq 7
logical 0 op LOG_ROOT_REPLACE (old logical 8192) seq 8
>From here the bug is triggered by the following steps
1. Call btrfs_get_old_root() on the new_root.
2. We call tree_mod_log_oldest_root(btrfs_root_node(new_root)), which is
currently logical 0.
3. tree_mod_log_oldest_root() calls tree_mod_log_search_oldest(), which
gives us the KEY_REPLACE seq 2, and since that's not a
LOG_ROOT_REPLACE we incorrectly believe that we don't have an old
root, because we expect that the most recent change should be a
LOG_ROOT_REPLACE.
4. Back in tree_mod_log_oldest_root() we don't have a LOG_ROOT_REPLACE,
so we don't set old_root, we simply use our existing extent buffer.
5. Since we're using our existing extent buffer (logical 0) we call
tree_mod_log_search(0) in order to get the newest change to start the
rewind from, which ends up being the LOG_ROOT_REPLACE at seq 8.
6. Again since we didn't find an old_root we simply clone logical 0 at
it's current state.
7. We call tree_mod_log_rewind() with the cloned extent buffer.
8. Set n = btrfs_header_nritems(logical 0), which would be whatever the
original nritems was when we COWed the original root, say for this
example it's 2.
9. We start from the newest operation and work our way forward, so we
see LOG_ROOT_REPLACE which we ignore.
10. Next we see KEY_REMOVE_WHILE_FREEING for slot 0, which triggers the
BUG_ON(tm->slot < n), because it expects if we've done this we have a
completely empty extent buffer to replay completely.
The correct thing would be to find the first LOG_ROOT_REPLACE, and then
get the old_root set to logical 8192. In fact making that change fixes
this particular problem.
However consider the much more complicated case. We have a child node
in this tree and the above situation. In the above case we freed one
of the child blocks at the seq 3 operation. If this block was also
re-allocated and got new tree mod log operations we would have a
different problem. btrfs_search_old_slot(orig root) would get down to
the logical 0 root that still pointed at that node. However in
btrfs_search_old_slot() we call tree_mod_log_rewind(buf) directly. This
is not context aware enough to know which operations we should be
replaying. If the block was re-allocated multiple times we may only
want to replay a range of operations, and determining what that range is
isn't possible to determine.
We could maybe solve this by keeping track of which root the node
belonged to at every tree mod log operation, and then passing this
around to make sure we're only replaying operations that relate to the
root we're trying to rewind.
However there's a simpler way to solve this problem, simply disallow
reallocations if we have currently running tree mod log users. We
already do this for leaf's, so we're simply expanding this to nodes as
well. This is a relatively uncommon occurrence, and the problem is
complicated enough I'm worried that we will still have corner cases in
the reallocation case. So fix this in the most straightforward way
possible.
Fixes: bd989ba359f2 ("Btrfs: add tree modification log functions")
CC: stable@vger.kernel.org # 3.3+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
After changes in commit 917f32a23501 ("btrfs: give struct btrfs_bio a
real end_io handler") the layout of btrfs_bio can be improved. There
are two holes and the structure size is 264 bytes on release build. By
reordering the iterator we can get rid of the holes and the size is 256
bytes which fits to slabs much better.
Final layout:
struct btrfs_bio {
unsigned int mirror_num; /* 0 4 */
struct bvec_iter iter; /* 4 20 */
u64 file_offset; /* 24 8 */
struct btrfs_device * device; /* 32 8 */
u8 * csum; /* 40 8 */
u8 csum_inline[64]; /* 48 64 */
/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
btrfs_bio_end_io_t end_io; /* 112 8 */
void * private; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
struct work_struct end_io_work; /* 128 32 */
struct bio bio; /* 160 96 */
/* size: 256, cachelines: 4, members: 10 */
};
Fixes: 917f32a23501 ("btrfs: give struct btrfs_bio a real end_io handler")
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently if full_stripe_write() failed to allocate the pages for
parity, it will call __free_raid_bio() first, then return -ENOMEM.
But some caller of full_stripe_write() will also call __free_raid_bio()
again, this would cause double freeing.
And it's not a logically sound either, normally we should either free
the memory at the same level where we allocated it, or let endio to
handle everything.
So this patch will solve the double freeing by make
raid56_parity_write() to handle the error and free the rbio.
Just like what we do in raid56_parity_recover().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In raid56_alloc_missing_rbio(), if we can not determine where the
missing device is inside the full stripe, we just BUG_ON().
This is not necessary especially the only caller inside scrub.c is
already properly checking the return value, and will treat it as a
memory allocation failure.
Fix the error handling by:
- Add an extra warning for the reason
Although personally speaking it may be better to be an ASSERT().
- Properly free the allocated rbio
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There is a kmemleak when writedata alloc failed:
unreferenced object 0xffff888175ae4000 (size 4096):
comm "dd", pid 19419, jiffies 4296028749 (age 739.396s)
hex dump (first 32 bytes):
80 02 b0 04 00 ea ff ff c0 02 b0 04 00 ea ff ff ................
80 22 4c 04 00 ea ff ff c0 22 4c 04 00 ea ff ff ."L......"L.....
backtrace:
[<0000000072fdbb86>] __kmalloc_node+0x50/0x150
[<0000000039faf56f>] __iov_iter_get_pages_alloc+0x605/0xdd0
[<00000000f862a9d4>] iov_iter_get_pages_alloc2+0x3b/0x80
[<000000008f226067>] cifs_write_from_iter+0x2ae/0xe40
[<000000001f78f2f1>] __cifs_writev+0x337/0x5c0
[<00000000257fcef5>] vfs_write+0x503/0x690
[<000000008778a238>] ksys_write+0xb9/0x150
[<00000000ed82047c>] do_syscall_64+0x35/0x80
[<000000003365551d>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
__iov_iter_get_pages_alloc+0x605/0xdd0 is:
want_pages_array at lib/iov_iter.c:1304
(inlined by) __iov_iter_get_pages_alloc at lib/iov_iter.c:1457
If writedata allocate failed, the pages and pagevec should be cleanup.
Fixes: 8c5f9c1ab7cb ("CIFS: Add support for direct I/O write")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
There is a memory leak when writedata alloc failed:
unreferenced object 0xffff888192364000 (size 8192):
comm "sync", pid 22839, jiffies 4297313967 (age 60.230s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000027de0814>] __kmalloc+0x4d/0x150
[<00000000b21e81ab>] cifs_writepages+0x35f/0x14a0
[<0000000076f7d20e>] do_writepages+0x10a/0x360
[<00000000d6a36edc>] filemap_fdatawrite_wbc+0x95/0xc0
[<000000005751a323>] __filemap_fdatawrite_range+0xa7/0xe0
[<0000000088afb0ca>] file_write_and_wait_range+0x66/0xb0
[<0000000063dbc443>] cifs_strict_fsync+0x80/0x5f0
[<00000000c4624754>] __x64_sys_fsync+0x40/0x70
[<000000002c0dc744>] do_syscall_64+0x35/0x80
[<0000000052f46bee>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
cifs_writepages+0x35f/0x14a0 is:
kmalloc_array at include/linux/slab.h:628
(inlined by) kcalloc at include/linux/slab.h:659
(inlined by) cifs_writedata_alloc at fs/cifs/file.c:2438
(inlined by) wdata_alloc_and_fillpages at fs/cifs/file.c:2527
(inlined by) cifs_writepages at fs/cifs/file.c:2705
If writedata alloc failed in cifs_writedata_alloc(), the pages array
should be freed.
Fixes: 8e7360f67e75 ("CIFS: Add support for direct pages in wdata")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- fixes for the EFI variable store refactor that landed in v6.0
- fixes for issues that were introduced during the merge window
- back out some changes related to EFI zboot signing - we'll add a
better solution for this during the next cycle
* tag 'efi-fixes-for-v6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0
efi: libstub: Fix incorrect payload size in zboot header
efi: libstub: Give efi_main() asmlinkage qualification
efi: efivars: Fix variable writes without query_variable_store()
efi: ssdt: Don't free memory if ACPI table was loaded successfully
efi: libstub: Remove zboot signing from build options
|
|
Pull cifs fixes from Steve French:
- memory leak fixes
- fixes for directory leases, including an important one which fixes a
problem noticed by git functional tests
- fixes relating to missing free_xid calls (helpful for
tracing/debugging of entry/exit into cifs.ko)
- a multichannel fix
- a small cleanup fix (use of list_move instead of list_del/list_add)
* tag '6.1-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
cifs: fix memory leaks in session setup
cifs: drop the lease for cached directories on rmdir or rename
smb3: interface count displayed incorrectly
cifs: Fix memory leak when build ntlmssp negotiate blob failed
cifs: set rc to -ENOENT if we can not get a dentry for the cached dir
cifs: use LIST_HEAD() and list_move() to simplify code
cifs: Fix xid leak in cifs_get_file_info_unix()
cifs: Fix xid leak in cifs_ses_add_channel()
cifs: Fix xid leak in cifs_flock()
cifs: Fix xid leak in cifs_copy_file_range()
cifs: Fix xid leak in cifs_create()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Fixes for patches merged in v6.1"
* tag 'nfsd-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: ensure we always call fh_verify_error tracepoint
NFSD: unregister shrinker when nfsd_init_net() fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morron:
"Seventeen hotfixes, mainly for MM.
Five are cc:stable and the remainder address post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nouveau: fix migrate_to_ram() for faulting page
mm/huge_memory: do not clobber swp_entry_t during THP split
hugetlb: fix memory leak associated with vma_lock structure
mm/page_alloc: reduce potential fragmentation in make_alloc_exact()
mm: /proc/pid/smaps_rollup: fix maple tree search
mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
mm/mmap: fix MAP_FIXED address return on VMA merge
mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
mm/mmap: undo ->mmap() when mas_preallocate() fails
init: Kconfig: fix spelling mistake "satify" -> "satisfy"
ocfs2: clear dinode links count in case of error
ocfs2: fix BUG when iput after ocfs2_mknod fails
gcov: support GCC 12.1 and newer compilers
zsmalloc: zs_destroy_pool: add size_class NULL check
mm/mempolicy: fix mbind_range() arguments to vma_merge()
mailmap: update email for Qais Yousef
mailmap: update Dan Carpenter's email address
|
|
Commit bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer")
refactored the efivars layer so that the 'business logic' related to
which UEFI variables affect the boot flow in which way could be moved
out of it, and into the efivarfs driver.
This inadvertently broke setting variables on firmware implementations
that lack the QueryVariableInfo() boot service, because we no longer
tolerate a EFI_UNSUPPORTED result from check_var_size() when calling
efivar_entry_set_get_size(), which now ends up calling check_var_size()
a second time inadvertently.
If QueryVariableInfo() is missing, we support writes of up to 64k -
let's move that logic into check_var_size(), and drop the redundant
call.
Cc: <stable@vger.kernel.org> # v6.0
Fixes: bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
/proc/pid/smaps_rollup showed 0 kB for everything: now find first vma.
Link: https://lkml.kernel.org/r/3011bee7-182-97a2-1083-d5f5b688e54b@google.com
Fixes: c4c84f06285e ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In ocfs2_mknod(), if error occurs after dinode successfully allocated,
ocfs2 i_links_count will not be 0.
So even though we clear inode i_nlink before iput in error handling, it
still won't wipe inode since we'll refresh inode from dinode during inode
lock. So just like clear inode i_nlink, we clear ocfs2 i_links_count as
well. Also do the same change for ocfs2_symlink().
Link: https://lkml.kernel.org/r/20221017130227.234480-2-joseph.qi@linux.alibaba.com
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit b1529a41f777 "ocfs2: should reclaim the inode if
'__ocfs2_mknod_locked' returns an error" tried to reclaim the claimed
inode if __ocfs2_mknod_locked() fails later. But this introduce a race,
the freed bit may be reused immediately by another thread, which will
update dinode, e.g. i_generation. Then iput this inode will lead to BUG:
inode->i_generation != le32_to_cpu(fe->i_generation)
We could make this inode as bad, but we did want to do operations like
wipe in some cases. Since the claimed inode bit can only affect that an
dinode is missing and will return back after fsck, it seems not a big
problem. So just leave it as is by revert the reclaim logic.
Link: https://lkml.kernel.org/r/20221017130227.234480-1-joseph.qi@linux.alibaba.com
Fixes: b1529a41f777 ("ocfs2: should reclaim the inode if '__ocfs2_mknod_locked' returns an error")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Yan Wang <wangyan122@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
kmemleak reported a sequence of memory leaks, and one of them indicated we
failed to free a pointer:
comm "mount", pid 19610, jiffies 4297086464 (age 60.635s)
hex dump (first 8 bytes):
73 64 61 00 81 88 ff ff sda.....
backtrace:
[<00000000d77f3e04>] kstrdup_const+0x46/0x70
[<00000000e51fa804>] kobject_set_name_vargs+0x2f/0xb0
[<00000000247cd595>] kobject_init_and_add+0xb0/0x120
[<00000000f9139aaf>] xfs_mountfs+0x367/0xfc0
[<00000000250d3caf>] xfs_fs_fill_super+0xa16/0xdc0
[<000000008d873d38>] get_tree_bdev+0x256/0x390
[<000000004881f3fa>] vfs_get_tree+0x41/0xf0
[<000000008291ab52>] path_mount+0x9b3/0xdd0
[<0000000022ba8f2d>] __x64_sys_mount+0x190/0x1d0
As mentioned in kobject_init_and_add() comment, if this function
returns an error, kobject_put() must be called to properly clean up
the memory associated with the object. Apparently, xfs_sysfs_init()
does not follow such a requirement. When kobject_init_and_add()
returns an error, the space of kobj->kobject.name alloced by
kstrdup_const() is unfree, which will cause the above stack.
Fix it by adding kobject_put() when kobject_init_and_add returns an
error.
Fixes: a31b1d3d89e4 ("xfs: add xfs_mount sysfs kobject")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
When `xfs_sysfs_init` returns failed, `mp->m_errortag` needs to free.
Otherwise kmemleak would report memory leak after mounting xfs image:
unreferenced object 0xffff888101364900 (size 192):
comm "mount", pid 13099, jiffies 4294915218 (age 335.207s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f08ad25c>] __kmalloc+0x41/0x1b0
[<00000000dca9aeb6>] kmem_alloc+0xfd/0x430
[<0000000040361882>] xfs_errortag_init+0x20/0x110
[<00000000b384a0f6>] xfs_mountfs+0x6ea/0x1a30
[<000000003774395d>] xfs_fs_fill_super+0xe10/0x1a80
[<000000009cf07b6c>] get_tree_bdev+0x3e7/0x700
[<00000000046b5426>] vfs_get_tree+0x8e/0x2e0
[<00000000952ec082>] path_mount+0xf8c/0x1990
[<00000000beb1f838>] do_mount+0xee/0x110
[<000000000e9c41bb>] __x64_sys_mount+0x14b/0x1f0
[<00000000f7bb938e>] do_syscall_64+0x3b/0x90
[<000000003fcd67a9>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: c68401011522 ("xfs: expose errortag knobs via sysfs")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
The assignment to pointer lip is not really required, the pointer lip
is redundant and can be removed.
Cleans up clang-scan warning:
warning: Although the value stored to 'lip' is used in the enclosing
expression, the value is never actually read from 'lip'
[deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
For leaf dir, In most cases, there should be as many bestfree slots
as the dir data blocks that can fit under i_size (except for [1]).
Root cause is we don't examin the number bestfree slots, when the slots
number less than dir data blocks, if we need to allocate new dir data
block and update the bestfree array, we will use the dir block number as
index to assign bestfree array, while we did not check the leaf buf
boundary which may cause UAF or other memory access problem. This issue
can also triggered with test cases xfs/473 from fstests.
According to Dave Chinner & Darrick's suggestion, adding buffer verifier
to detect this abnormal situation in time.
Simplify the testcase for fstest xfs/554 [1]
The error log is shown as follows:
==================================================================
BUG: KASAN: use-after-free in xfs_dir2_leaf_addname+0x1995/0x1ac0
Write of size 2 at addr ffff88810168b000 by task touch/1552
CPU: 5 PID: 1552 Comm: touch Not tainted 6.0.0-rc3+ #101
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x4d/0x66
print_report.cold+0xf6/0x691
kasan_report+0xa8/0x120
xfs_dir2_leaf_addname+0x1995/0x1ac0
xfs_dir_createname+0x58c/0x7f0
xfs_create+0x7af/0x1010
xfs_generic_create+0x270/0x5e0
path_openat+0x270b/0x3450
do_filp_open+0x1cf/0x2b0
do_sys_openat2+0x46b/0x7a0
do_sys_open+0xb7/0x130
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f33 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f33 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f33 R15: 0000000000000000
</TASK>
The buggy address belongs to the physical page:
page:ffffea000405a2c0 refcount:0 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x10168b
flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 002fffff80000000 ffffea0004057788 ffffea000402dbc8 0000000000000000
raw: 0000000000000000 0000000000170000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88810168af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88810168af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88810168b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88810168b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88810168b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
00000000: 58 44 44 33 5b 53 35 c2 00 00 00 00 00 00 00 78
XDD3[S5........x
XFS (sdb): Internal error xfs_dir2_data_use_free at line 1200 of file
fs/xfs/libxfs/xfs_dir2_data.c. Caller
xfs_dir2_data_use_free+0x28a/0xeb0
CPU: 5 PID: 1552 Comm: touch Tainted: G B 6.0.0-rc3+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x4d/0x66
xfs_corruption_error+0x132/0x150
xfs_dir2_data_use_free+0x198/0xeb0
xfs_dir2_leaf_addname+0xa59/0x1ac0
xfs_dir_createname+0x58c/0x7f0
xfs_create+0x7af/0x1010
xfs_generic_create+0x270/0x5e0
path_openat+0x270b/0x3450
do_filp_open+0x1cf/0x2b0
do_sys_openat2+0x46b/0x7a0
do_sys_open+0xb7/0x130
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe4d9e9312b
Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0
75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00
f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b
RDX: 0000000000000941 RSI: 00007ffda4c17f46 RDI: 00000000ffffff9c
RBP: 00007ffda4c17f46 R08: 0000000000000000 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941
R13: 00007fe4d9f631a4 R14: 00007ffda4c17f46 R15: 0000000000000000
</TASK>
XFS (sdb): Corruption detected. Unmount and run xfs_repair
[1] https://lore.kernel.org/all/20220928095355.2074025-1-guoxuenan@huawei.com/
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
There's a race in fuse's readdir cache that can result in an uninitilized
page being read. The page lock is supposed to prevent this from happening
but in the following case it doesn't:
Two fuse_add_dirent_to_cache() start out and get the same parameters
(size=0,offset=0). One of them wins the race to create and lock the page,
after which it fills in data, sets rdc.size and unlocks the page.
In the meantime the page gets evicted from the cache before the other
instance gets to run. That one also creates the page, but finds the
size to be mismatched, bails out and leaves the uninitialized page in the
cache.
Fix by marking a filled page uptodate and ignoring non-uptodate pages.
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for
fscrypt_master_key") moved the keyring destruction from __put_super() to
generic_shutdown_super() so that the filesystem's block device(s) are
still available. Unfortunately, this causes a memory leak in the case
where a mount is attempted with the test_dummy_encryption mount option,
but the mount fails after the option has already been processed.
To fix this, attempt the keyring destruction in both places.
Reported-by: syzbot+104c2a89561289cec13e@syzkaller.appspotmail.com
Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Link: https://lore.kernel.org/r/20221011213838.209879-1-ebiggers@kernel.org
|
|
To 2.40
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We were only zeroing out the ntlmssp blob but forgot to free the
allocated buffer in the end of SMB2_sess_auth_rawntlmssp_negotiate()
and SMB2_sess_auth_rawntlmssp_authenticate() functions.
This fixes below kmemleak reports:
unreferenced object 0xffff88800ddcfc60 (size 96):
comm "mount.cifs", pid 758, jiffies 4294696066 (age 42.967s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d0beeb29>] __kmalloc+0x39/0xa0
[<00000000e3834047>] build_ntlmssp_smb3_negotiate_blob+0x2c/0x110 [cifs]
[<00000000e85f5ab2>] SMB2_sess_auth_rawntlmssp_negotiate+0xd3/0x230 [cifs]
[<0000000080fdb897>] SMB2_sess_setup+0x16c/0x2a0 [cifs]
[<000000009af320a8>] cifs_setup_session+0x13b/0x370 [cifs]
[<00000000f15d5982>] cifs_get_smb_ses+0x643/0xb90 [cifs]
[<00000000fe15eb90>] mount_get_conns+0x63/0x3e0 [cifs]
[<00000000768aba03>] mount_get_dfs_conns+0x16/0xa0 [cifs]
[<00000000cf1cf146>] cifs_mount+0x1c2/0x9a0 [cifs]
[<000000000d66b51e>] cifs_smb3_do_mount+0x10e/0x710 [cifs]
[<0000000077a996c5>] smb3_get_tree+0xf4/0x200 [cifs]
[<0000000094dbd041>] vfs_get_tree+0x23/0xc0
[<000000003a8561de>] path_mount+0x2d3/0xb50
[<00000000ed5c86d6>] __x64_sys_mount+0x102/0x140
[<00000000142142f3>] do_syscall_64+0x3b/0x90
[<00000000e2b89731>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
unreferenced object 0xffff88801437f000 (size 512):
comm "mount.cifs", pid 758, jiffies 4294696067 (age 42.970s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d0beeb29>] __kmalloc+0x39/0xa0
[<00000000004f53d2>] build_ntlmssp_auth_blob+0x4f/0x340 [cifs]
[<000000005f333084>] SMB2_sess_auth_rawntlmssp_authenticate+0xd4/0x250 [cifs]
[<0000000080fdb897>] SMB2_sess_setup+0x16c/0x2a0 [cifs]
[<000000009af320a8>] cifs_setup_session+0x13b/0x370 [cifs]
[<00000000f15d5982>] cifs_get_smb_ses+0x643/0xb90 [cifs]
[<00000000fe15eb90>] mount_get_conns+0x63/0x3e0 [cifs]
[<00000000768aba03>] mount_get_dfs_conns+0x16/0xa0 [cifs]
[<00000000cf1cf146>] cifs_mount+0x1c2/0x9a0 [cifs]
[<000000000d66b51e>] cifs_smb3_do_mount+0x10e/0x710 [cifs]
[<0000000077a996c5>] smb3_get_tree+0xf4/0x200 [cifs]
[<0000000094dbd041>] vfs_get_tree+0x23/0xc0
[<000000003a8561de>] path_mount+0x2d3/0xb50
[<00000000ed5c86d6>] __x64_sys_mount+0x102/0x140
[<00000000142142f3>] do_syscall_64+0x3b/0x90
[<00000000e2b89731>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: a4e430c8c8ba ("cifs: replace kfree() with kfree_sensitive() for sensitive data")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When we delete or rename a directory we must also drop any cached lease we have
on the directory.
Fixes: a350d6e73f5e ("cifs: enable caching of directories for which a lease is held")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The "Server interfaces" count in /proc/fs/cifs/DebugData increases
as the interfaces are requeried, rather than being reset to the new
value. This could cause a problem if the server disabled
multichannel as the iface_count is checked in try_adding_channels
to see if multichannel still supported.
Also fixes a coverity warning:
Addresses-Coverity: 1526374 ("Concurrent data access violations (MISSING_LOCK)")
Cc: <stable@vger.kernel.org>
Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
KASAN reported a UAF bug when I was running xfs/235:
BUG: KASAN: use-after-free in xlog_recover_process_intents+0xa77/0xae0 [xfs]
Read of size 8 at addr ffff88804391b360 by task mount/5680
CPU: 2 PID: 5680 Comm: mount Not tainted 6.0.0-xfsx #6.0.0 77e7b52a4943a975441e5ac90a5ad7748b7867f6
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x2cc/0x682
kasan_report+0xa3/0x120
xlog_recover_process_intents+0xa77/0xae0 [xfs fb841c7180aad3f8359438576e27867f5795667e]
xlog_recover_finish+0x7d/0x970 [xfs fb841c7180aad3f8359438576e27867f5795667e]
xfs_log_mount_finish+0x2d7/0x5d0 [xfs fb841c7180aad3f8359438576e27867f5795667e]
xfs_mountfs+0x11d4/0x1d10 [xfs fb841c7180aad3f8359438576e27867f5795667e]
xfs_fs_fill_super+0x13d5/0x1a80 [xfs fb841c7180aad3f8359438576e27867f5795667e]
get_tree_bdev+0x3da/0x6e0
vfs_get_tree+0x7d/0x240
path_mount+0xdd3/0x17d0
__x64_sys_mount+0x1fa/0x270
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7ff5bc069eae
Code: 48 8b 0d 85 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 1f 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe433fd448 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff5bc069eae
RDX: 00005575d7213290 RSI: 00005575d72132d0 RDI: 00005575d72132b0
RBP: 00005575d7212fd0 R08: 00005575d7213230 R09: 00005575d7213fe0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00005575d7213290 R14: 00005575d72132b0 R15: 00005575d7212fd0
</TASK>
Allocated by task 5680:
kasan_save_stack+0x1e/0x40
__kasan_slab_alloc+0x66/0x80
kmem_cache_alloc+0x152/0x320
xfs_rui_init+0x17a/0x1b0 [xfs]
xlog_recover_rui_commit_pass2+0xb9/0x2e0 [xfs]
xlog_recover_items_pass2+0xe9/0x220 [xfs]
xlog_recover_commit_trans+0x673/0x900 [xfs]
xlog_recovery_process_trans+0xbe/0x130 [xfs]
xlog_recover_process_data+0x103/0x2a0 [xfs]
xlog_do_recovery_pass+0x548/0xc60 [xfs]
xlog_do_log_recovery+0x62/0xc0 [xfs]
xlog_do_recover+0x73/0x480 [xfs]
xlog_recover+0x229/0x460 [xfs]
xfs_log_mount+0x284/0x640 [xfs]
xfs_mountfs+0xf8b/0x1d10 [xfs]
xfs_fs_fill_super+0x13d5/0x1a80 [xfs]
get_tree_bdev+0x3da/0x6e0
vfs_get_tree+0x7d/0x240
path_mount+0xdd3/0x17d0
__x64_sys_mount+0x1fa/0x270
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 5680:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
____kasan_slab_free+0x144/0x1b0
slab_free_freelist_hook+0xab/0x180
kmem_cache_free+0x1f1/0x410
xfs_rud_item_release+0x33/0x80 [xfs]
xfs_trans_free_items+0xc3/0x220 [xfs]
xfs_trans_cancel+0x1fa/0x590 [xfs]
xfs_rui_item_recover+0x913/0xd60 [xfs]
xlog_recover_process_intents+0x24e/0xae0 [xfs]
xlog_recover_finish+0x7d/0x970 [xfs]
xfs_log_mount_finish+0x2d7/0x5d0 [xfs]
xfs_mountfs+0x11d4/0x1d10 [xfs]
xfs_fs_fill_super+0x13d5/0x1a80 [xfs]
get_tree_bdev+0x3da/0x6e0
vfs_get_tree+0x7d/0x240
path_mount+0xdd3/0x17d0
__x64_sys_mount+0x1fa/0x270
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The buggy address belongs to the object at ffff88804391b300
which belongs to the cache xfs_rui_item of size 688
The buggy address is located 96 bytes inside of
688-byte region [ffff88804391b300, ffff88804391b5b0)
The buggy address belongs to the physical page:
page:ffffea00010e4600 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888043919320 pfn:0x43918
head:ffffea00010e4600 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff)
raw: 04fff80000010200 0000000000000000 dead000000000122 ffff88807f0eadc0
raw: ffff888043919320 0000000080140010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88804391b200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88804391b280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88804391b300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88804391b380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88804391b400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The test fuzzes an rmap btree block and starts writer threads to induce
a filesystem shutdown on the corrupt block. When the filesystem is
remounted, recovery will try to replay the committed rmap intent item,
but the corruption problem causes the recovery transaction to fail.
Cancelling the transaction frees the RUD, which frees the RUI that we
recovered.
When we return to xlog_recover_process_intents, @lip is now a dangling
pointer, and we cannot use it to find the iop_recover method for the
tracepoint. Hence we must store the item ops before calling
->iop_recover if we want to give it to the tracepoint so that the trace
data will tell us exactly which intent item failed.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fiemap fixes:
- add missing path cache update
- fix processing of delayed data and tree refs during backref
walking, this could lead to reporting incorrect extent sharing
- fix extent range locking under heavy contention to avoid deadlocks
- make it possible to test send v3 in debugging mode
- update links in MAINTAINERS
* tag 'for-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
MAINTAINERS: update btrfs website links and files
btrfs: ignore fiemap path cache if we have multiple leaves for a data extent
btrfs: fix processing of delayed tree block refs during backref walking
btrfs: fix processing of delayed data refs during backref walking
btrfs: delete stale comments after merge conflict resolution
btrfs: unlock locked extent area if we have contention
btrfs: send: update command for protocol version check
btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUG
btrfs: add missing path cache update during fiemap
|
|
There is a memory leak when mount cifs:
unreferenced object 0xffff888166059600 (size 448):
comm "mount.cifs", pid 51391, jiffies 4295596373 (age 330.596s)
hex dump (first 32 bytes):
fe 53 4d 42 40 00 00 00 00 00 00 00 01 00 82 00 .SMB@...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000060609a61>] mempool_alloc+0xe1/0x260
[<00000000adfa6c63>] cifs_small_buf_get+0x24/0x60
[<00000000ebb404c7>] __smb2_plain_req_init+0x32/0x460
[<00000000bcf875b4>] SMB2_sess_alloc_buffer+0xa4/0x3f0
[<00000000753a2987>] SMB2_sess_auth_rawntlmssp_negotiate+0xf5/0x480
[<00000000f0c1f4f9>] SMB2_sess_setup+0x253/0x410
[<00000000a8b83303>] cifs_setup_session+0x18f/0x4c0
[<00000000854bd16d>] cifs_get_smb_ses+0xae7/0x13c0
[<000000006cbc43d9>] mount_get_conns+0x7a/0x730
[<000000005922d816>] cifs_mount+0x103/0xd10
[<00000000e33def3b>] cifs_smb3_do_mount+0x1dd/0xc90
[<0000000078034979>] smb3_get_tree+0x1d5/0x300
[<000000004371f980>] vfs_get_tree+0x41/0xf0
[<00000000b670d8a7>] path_mount+0x9b3/0xdd0
[<000000005e839a7d>] __x64_sys_mount+0x190/0x1d0
[<000000009404c3b9>] do_syscall_64+0x35/0x80
When build ntlmssp negotiate blob failed, the session setup request
should be freed.
Fixes: 49bd49f983b5 ("cifs: send workstation name during ntlmssp session setup")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We already set rc to this return code further down in the function but
we can set it earlier in order to suppress a smash warning.
Also fix a false positive for Coverity. The reason this is a false positive is
that this happens during umount after all files and directories have been closed
but mosetting on ->on_list to suppress the warning.
Reported-by: Dan carpenter <dan.carpenter@oracle.com>
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1525256 ("Concurrent data access violations")
Fixes: a350d6e73f5e ("cifs: enable caching of directories for which a lease is held")
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD().
Using list_move() instead of list_del() and list_add().
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If stardup the symlink target failed, should free the xid,
otherwise the xid will be leaked.
Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Before return, should free the xid, otherwise, the
xid will be leaked.
Fixes: d70e9fa55884 ("cifs: try opening channels after mounting")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If not flock, before return -ENOLCK, should free the xid,
otherwise, the xid will be leaked.
Fixes: d0677992d2af ("cifs: add support for flock")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If the file is used by swap, before return -EOPNOTSUPP, should
free the xid, otherwise, the xid will be leaked.
Fixes: 4e8aea30f775 ("smb3: enable swap on SMB3 mounts")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If the cifs already shutdown, we should free the xid before return,
otherwise, the xid will be leaked.
Fixes: 087f757b0129 ("cifs: add shutdown support")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
s_inodes is superblock-specific resource, which should be
protected by sb's specific lock s_inode_list_lock.
Link: https://lore.kernel.org/r/TYCP286MB23238380DE3B74874E8D78ABCA299@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
Fixes: 7d41963759fe ("erofs: Support sharing cookies in the same domain")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Partial decompression should be checked after updating length.
It's a new regression when introducing multi-reference pclusters.
Fixes: 2bfab9c0edac ("erofs: record the longest decompressed size in this round")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221014064915.8103-1-hsiangkao@linux.alibaba.com
|
|
If other duplicated copies exist in one decompression shot, should
leave the old page as is rather than replace it with the new duplicated
one. Otherwise, the following cold path to deal with duplicated copies
will use the invalid bvec. It impacts compressed data deduplication.
Also, shift the onlinepage EIO bit to avoid touching the signed bit.
Fixes: 267f2492c8f7 ("erofs: introduce multi-reference pclusters (fully-referenced)")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221012045056.13421-1-hsiangkao@linux.alibaba.com
|
|
Note that we are still accessing 'h_idata_size' and 'h_fragmentoff'
after calling erofs_put_metabuf(), that is not correct. Fix it.
Fixes: ab92184ff8f1 ("erofs: add on-disk compressed tail-packing inline support")
Fixes: b15b2e307c3a ("erofs: support on-disk compressed fragments data")
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221005013528.62977-1-zbestahu@163.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
|