Age | Commit message (Collapse) | Author |
|
Since we have btrfs_fs_info::sb (struct super_block) as our block device
holder, we can safely use fs_holder_ops for all of our block devices.
This enables freezing/thawing the filesystem from the underlying block
devices.
This may enhance hibernation/suspend support since previously
freezing/thawing a block device managed by btrfs won't do anything btrfs
specific, but only syncing the block device.
Thus before this change, freezing the underlying block devices won't
prevent future writes into the filesystem, thus may cause problems for
hibernation/suspend when btrfs is involved.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The file system type is not a very useful holder as it doesn't allow us
to go back to the actual file system instance. Pass the super_block
instead which is useful when passed back to the file system driver.
This matches what is done for all other block device based file systems,
and allows us to remove btrfs_fs_info::bdev_holder completely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently we always call btrfs_open_devices() before creating the
super block.
It's fine for now because:
- No blk_holder_ops is provided
- btrfs_fs_type is used as a holder
This means no matter who wins the device opening race, the holder will be
the same thus not affecting the later sget_fc() race.
And since no blk_holder_ops is provided, no bdev operation is depending on
the holder.
But this will no longer be true if we want to implement a proper
blk_holder_ops using fs_holder_ops.
This means we will need a proper super block as the bdev holder.
To prepare for such change:
- Add btrfs_fs_devices::holding member
This will prevent btrfs_free_stale_devices() and btrfs_close_device()
from deleting the fs_devices when there is another process trying to
mount the fs.
Along with the new member, here come the two helpers,
btrfs_fs_devices_inc_holding() and btrfs_fs_devices_dec_holding().
This will allow us to hold fs_devices without opening it.
This is needed because we cannot hold uuid_mutex while calling
sget_fc(), this will reverse the lock sequence with s_umount, causing
a lockdep warning.
- Delay btrfs_open_devices() until a super block is returned
This means we have to hold the initial fs_devices first, then unlock
uuid_mutex, call sget_fc(), then re-lock uuid_mutex, and decrease the
holding number.
For new super block case, we continue to btrfs_open_devices() with
uuid_mutex hold.
For existing super block case, we can unlock uuid_mutex and continue.
Although this means a more complex error handling path, as if we
didn't call btrfs_open_devices() (either got an existing sb, or
sget_fc() failed), we cannot let btrfs_put_fs_info() cleanup the
fs_devices, as it can be freed at any time after we decrease the hold
on fs_devices and unlock uuid_mutex.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
As part of the preparation for btrfs blk_holder_ops, we want to ensure
the holder of a block device has a proper lifespan.
However btrfs is always using fput() to close a block device, which has
one problem:
- fput() is deferred
Meaning we can have a block device with invalid (aka, freed) holder.
To avoid the problem and align the behavior to other code, just call
bdev_fput() instead.
There is some extra requirement on the locking, but that's all resolved
by previous patches and we should be safe to call bdev_fput().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Although btrfs is not yet implementing blk_holder_ops, there is a
requirement for proper blk_holder_ops:
- blkdev_put() must not be called under sb->s_umount
The blkdev_put()/bdev_fput() must not be called under sb->s_umount to
avoid lock order reversal with disk->open_mutex.
This is for the proper blk_holder_ops callbacks.
Currently we're fine because we call regular fput() which defers the
blk holder reclaiming.
To prepare for the future of blk_holder_ops, move the
btrfs_close_devices() calls into btrfs_free_fs_info().
That will be called from kill_sb() callbacks, which is also called for
error handing during mount failures, or there is already an existing
super block.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When calling sget_fc(), there are 3 different situations:
a) Critical error
No super block created.
b) A new super block is created
The fc->s_fs_info is transferred to the super block, and fc->s_fs_info
is reset to NULL.
In this case sb->s_root should still be NULL, and needs to be properly
initialized later by btrfs_fill_super().
c) An existing super block is returned
The fc->s_fs_info is untouched, and anything related to that fs_info
should be properly cleaned up.
This is not obvious even with the extra comments at sget_fc().
Enhance the situation by:
- Add comments for case b) and c)
Especially for case c), the fs_info and fs_devices cleanup happens at
different timing, thus needs extra explanation.
- Move the comments closer to case b) and case c)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[EXISTING PROBLEM]
Currently btrfs mount is split into two parts:
- btrfs_get_tree_subvol()
Which sets up the very basic fs_info, and eventually calls
mount_subvol() to mount the target subvolume.
- btrfs_get_tree_super()
This is the part doing super block allocation and if there is no
existing super block, do the real open_ctree() to open the fs.
However currently we're doing this in a complex re-entering way:
vfs_get_tree()
|- btrfs_get_tree()
|- btrfs_get_tree_subvol()
|- vfs_get_tree()
| |- btrfs_get_tree()
| |- btrfs_get_tree_super()
|- mount_subvol()
This is definitely not that easy to grasp.
[ENHANCEMENT]
The function vfs_get_tree() is only doing the following work:
- Call get_tree() call back
- Call super_wake()
- Call security_sb_set_mnt_opts()
In our case, super_wake() can be skipped, as after
btrfs_get_tree_subvol() finishes, vfs_get_tree() will call super_wake()
on the super block we got anyway.
The same applies to security_sb_set_mnt_opts(), as long as we do not
free the security from our original fc in btrfs_get_tree_subvol(), the
first vfs_get_tree() call will handle the security correctly.
So here we only need to:
- Replace vfs_get_tree() call with btrfs_get_tree_super()
- Keep the existing fc->security for vfs_get_tree() to handle the
security
This will remove the re-entering behavior and make thing much easier to
follow.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_scan_one_device() opens the block device only to read the super
block. Instead of passing a blk_mode_t argument to sometimes open
it for writing, just hard code BLK_OPEN_READ as it will never write
to the device or hand the block_device out to someone else.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_uring_encoded_read()
btrfs_uring_encoded_read() returns early with -ENOTTY if the uring_cmd
is issued with IO_URING_F_COMPAT but the kernel doesn't support compat
syscalls. However, this early return bypasses the syscall accounting.
Go to out_acct instead to ensure the syscall is counted.
Fixes: 34310c442e17 ("btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)")
CC: stable@vger.kernel.org # 6.15+
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The name 'dir' is misleading as it's the inode number of the directory,
so rename it accordingly.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There's no reason to pass 'struct path' to btrfs_mksubvol(), though it's
been like the since the first commit 76dda93c6ae2c1 ("Btrfs: add
snapshot/subvolume destroy ioctl"). We only use the dentry so we should
pass it directly.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We pass name and length of subvolumes separately to the related
functions, while this can be a struct qstr which is otherwise used for
dentry interfaces.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
strcpy() is discouraged from use due to lack of bounds checking.
Replaces it with strscpy(), the recommended alternative for null
terminated strings, to follow best practices.
There are instances where strscpy() cannot be used such as where both
the source and destination are character pointers. In that instance we
can use sysfs_emit().
Link: https://github.com/KSPP/linux/issues/88
Suggested-by: Anthony Iliopoulos <ailiop@suse.com>
Signed-off-by: Brahmajit Das <bdas@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Once upon a time there was a need to cache address of extent buffer
pages, as it was a costly operation (map_private_extent_buffer(),
cfed81a04eb555 ("Btrfs: add the ability to cache a pointer into the
eb")). This was not even due to use of HIGHMEM, this had been removed
before that due to possible locking issues (a65917156e3459 ("Btrfs:
stop using highmem for extent_buffers")).
Over time the amount of work in the set/get helpers got reduced and
became quite straightforward bounds checking with an unaligned
read/write, commit db3756c879773c ("btrfs: remove unused
map_private_extent_buffer").
The actual caching of the page_address()/folio_address() in the token
was more work for very little gain. This depended on subsequent access
into the same page/folio, otherwise the cached pointer had to be
updated.
For metadata-heavy operations this showed up in the 'perf top' profile
where the btrfs_get_token_32() calls were at the top, on my testing
machine consuming about 2-3%. The other generic 32/64 bit helpers also
appeared in the profile with similar fraction.
After removing use of the token helpers we can remove them completely,
this leads to reduction of btrfs.ko by 6.7KiB on release config.
text data bss dec hex filename
1463289 115665 16088 1595042 1856a2 pre/btrfs.ko
1456601 115665 16088 1588354 183c82 post/btrfs.ko
DELTA: -6688
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The token versions of set/get accessors will be removed, use the normal
helpers.
There's additional overhead of the token helpers that update the cached
address in case it moves to another page/folio. The normal versions
don't need to do that.
Note this is similar to fill_inode_item() in inode.c but with slight
differences. The two functions could be deduplicated eventually.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The token versions of set/get accessors will be removed, use the normal
helpers.
There's additional overhead of the token helpers that update the cached
address in case it moves to another page/folio. The normal versions
don't need to do that.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The token versions of set/get accessors will be removed, use the normal
helpers. The btrfs_item members use that interface the most but there
are no real benefits anymore.
This reduces stack consumption on x86_64 release config:
setup_items_for_insert -32 (144 -> 112)
__push_leaf_left -32 (176 -> 144)
btrfs_extend_item -16 (104 -> 88)
copy_for_split -32 (144 -> 112)
btrfs_del_items -16 (144 -> 128)
btrfs_truncate_item -24 (152 -> 128)
__push_leaf_right -24 (168 -> 144)
and module size:
text data bss dec hex filename
1463615 115665 16088 1595368 1857e8 pre/btrfs.ko
1463413 115665 16088 1595166 18571e post/btrfs.ko
DELTA: -202
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
It's not used anymore after commit 091344508249 ("btrfs: qgroup: use
qgroup_iterator in qgroup_convert_meta()"), so remove it.
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There's a race between a task disabling quotas and another running the
rescan ioctl that can result in a use-after-free of qgroup records from
the fs_info->qgroup_tree rbtree.
This happens as follows:
1) Task A enters btrfs_ioctl_quota_rescan() -> btrfs_qgroup_rescan();
2) Task B enters btrfs_quota_disable() and calls
btrfs_qgroup_wait_for_completion(), which does nothing because at that
point fs_info->qgroup_rescan_running is false (it wasn't set yet by
task A);
3) Task B calls btrfs_free_qgroup_config() which starts freeing qgroups
from fs_info->qgroup_tree without taking the lock fs_info->qgroup_lock;
4) Task A enters qgroup_rescan_zero_tracking() which starts iterating
the fs_info->qgroup_tree tree while holding fs_info->qgroup_lock,
but task B is freeing qgroup records from that tree without holding
the lock, resulting in a use-after-free.
Fix this by taking fs_info->qgroup_lock at btrfs_free_qgroup_config().
Also at btrfs_qgroup_rescan() don't start the rescan worker if quotas
were already disabled.
Reported-by: cen zhang <zzzccc427@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAFRLqsV+cMDETFuzqdKSHk_FDm6tneea45krsHqPD6B3FetLpQ@mail.gmail.com/
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we failed to insert the tree mod log operation, we are not removing the
dirty status from the allocated and dirtied extent buffer before we free
it. Removing the dirty status is needed for several reasons such as to
adjust the fs_info->dirty_metadata_bytes counter and remove the dirty
status from the respective folios. So add the missing call to
btrfs_clear_buffer_dirty().
Fixes: f61aa7ba08ab ("btrfs: do not BUG_ON() on tree mod log failure at insert_new_root()")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_dump_space_info()'s parameter dump_block_groups is used as a boolean
although it is defined as an integer.
Change it from int to bool.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Any conversion of offsets in the logical or the physical mapping space
of the pages is done by a shift and the target type should be pgoff_t
(type of struct page::index). Fix the locations where it's still
unsigned long.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
btrfs_compress_set_level()
Refactor the btrfs_compress_set_level() function by replacing the
nested usage of min() and max() macro with clamp() to simplify the
code and improve readability.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: George Hu <integral@archlinux.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Since 'snprintf()' returns the number of characters which would
be emitted and output truncation is handled by 'ASSERT()', it
should be safe to use that return value instead of the subsequent
calls to 'strlen()' in 'gen_unique_name()'.
This also reduces the module's text size.
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1897006 161571 16136 2074713 1fa859 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1896848 161571 16136 2074555 1fa7bb fs/btrfs/btrfs.ko
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At btrfs_qgroup_inherit() we allocate a qgroup record even if qgroups are
not enabled, which is unnecessary overhead and can result in subvolume
creation to fail with -ENOMEM, as create_subvol() calls this function.
Improve on this by making btrfs_qgroup_inherit() check if qgroups are
enabled earlier and return if they are not, avoiding the unnecessary
memory allocation and taking some locks.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The add_qgroup_rb() function never returns an error pointer anymore since
commit 8d54518b5e52 ("btrfs: qgroup: pre-allocate btrfs_qgroup to reduce
GFP_ATOMIC usage"), so checking for an error pointer result at
btrfs_quota_enable() is pointless.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Instead of a single if statement with multiple conditions, split it into
several if statements testing only one condition at a time and return true
or false immediately after. This makes it more immediate to reason.
The module's text size also slightly decreases, at least with gcc 14.2.0
on x86_64.
Before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1897138 161583 16136 2074857 1fa8e9 fs/btrfs/btrfs.ko
After:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1896976 161583 16136 2074695 1fa847 fs/btrfs/btrfs.ko
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This is an exported function and therefore it should have a 'btrfs_'
prefix, to make it clear it's btrfs specific, avoid future name collisions
with code outside btrfs, and make its naming consistent with most other
btrfs exported functions.
So add a 'btrfs_' prefix to it and make it return bool instead of int,
since all we need is to return true or false.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The __add_inode_ref() function is quite big and with too much nesting, so
move the code that processes inode extrefs into a helper function, to make
the function easier to read and reduce the level of indentation too.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The __add_inode_ref() function is quite big and with too much nesting, so
move the code that processes inode refs into a helper function, to make
the function easier to read and reduce the level of indentation too.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Almost everywhere we want to use a btrfs inode and therefore we have a
lot of calls to BTRFS_I(), making the code more verbose. Instead use btrfs
inode local variables to avoid so much use of BTRFS_I().
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There's no need to call d_inode(dentry) when calling btrfs_unlink_inode()
since we have already stored that in a local inode variable. So just use
the local variable to make the code less verbose.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Our message helpers accept NULL for the fs_info in the context that does
not provide and print the common header of the message. The use of pr_*
helpers is only for special reasons, like module loading, device
scanning or multi-line output (print-tree).
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 323ac95bce44 ("Btrfs: don't read leaf blocks containing only
checksums during truncate") changed the condition from `level == 0` to
`level == path->lowest_level`, while its original purpose was just to do
some leaf node handling (calling btrfs_item_key_to_cpu()) and skip some
code that doesn't fit leaf nodes.
After changing the condition, the code path:
1. Also handles the non-leaf nodes when path->lowest_level is nonzero,
which is wrong. However btrfs_search_forward() is never called with a
nonzero path->lowest_level, which makes this bug not found before.
2. Makes the later if block with the same condition, which was originally
used to handle non-leaf node (calling btrfs_node_key_to_cpu()) when
lowest_level is not zero, dead code.
Since btrfs_search_forward() is never called for a path with a
lowest_level different from zero, just completely remove the partial
support for a non-zero lowest_level, simplifying a bit the code, and
assert that lowest_level is zero at the start of the function.
Suggested-by: Qu Wenruo <wqu@suse.com>
Fixes: 323ac95bce44 ("Btrfs: don't read leaf blocks containing only checksums during truncate")
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The function btrfs_lookup_inode_extref(` no longer requires transaction
handle, insert length, or COW flag, as the only caller now performs a
read-only lookup using trans == NULL, ins_len == 0 and cow == 0.
This function was introduced in the early days where extref feature was
introduced by commit f186373fef00 ("btrfs: extended inode refs").
Then some cleanup was done in commit 33b98f227111 ("btrfs: cleanup:
removed unused 'btrfs_get_inode_ref_index'"), which removed the only
caller passing trans and other COW specific options.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Unify naming of return value to the preferred way.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Unify naming of return value to the preferred way.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Unify naming of return value to the preferred way.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Unify naming of return value to the preferred way.
Reviewed-by: Daniel Vacek <neelx@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Unify naming of return value to the preferred way.
Reviewed-by: Daniel Vacek <neelx@suse.com>yy
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Every time we add free space to the free space tree or we remove free
space from the free space tree, we do a lookup for the block group's free
space info item in the free space tree. This takes time, navigating the
btree and we may block either on IO when reading extent buffers from disk
or on extent buffer lock contention due to concurrency.
Instead of doing this lookup every time, cache the result in the block
structure and use it after the first lookup. This adds two boolean members
to the block group structure but doesn't increase the structure's size.
The following script that runs fs_mark was used to measure the time spent
on run_delayed_tree_ref(), since down that call chain we have calls to
add and remove free space to/from the free space tree (calls to
btrfs_add_to_free_space_tree() and btrfs_remove_from_free_space_tree()):
$ cat test.sh
#!/bin/bash
DEV=/dev/nullb0
MNT=/mnt
FILES=100000
THREADS=$(nproc --all)
echo "performance" | \
tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
umount $DEV &> /dev/null
mkfs.btrfs -f $DEV
mount -o ssd $DEV $MNT
OPTS="-S 0 -L 5 -n $FILES -s 0 -t $THREADS -k"
for ((i = 1; i <= $THREADS; i++)); do
OPTS="$OPTS -d $MNT/d$i"
done
fs_mark $OPTS
umount $MNT
This is a heavy metadata test as it's exercising only file creation, so a
lot of allocations of metadata extents, creating delayed refs for adding
new metadata extents and dropping existing ones due to COW. The results
of the times it took to execute run_delayed_tree_ref(), in nanoseconds,
are the following.
Before this change:
Range: 1868.000 - 6482857.000; Mean: 10231.430; Median: 7005.000; Stddev: 27993.173
Percentiles: 90th: 13342.000; 95th: 23279.000; 99th: 82448.000
1868.000 - 4222.038: 270696 ############
4222.038 - 9541.029: 1201327 #####################################################
9541.029 - 21559.383: 385436 #################
21559.383 - 48715.063: 64942 ###
48715.063 - 110073.800: 31454 #
110073.800 - 248714.944: 8218 |
248714.944 - 561977.042: 1030 |
561977.042 - 1269798.254: 295 |
1269798.254 - 2869132.711: 116 |
2869132.711 - 6482857.000: 28 |
After this change:
Range: 1554.000 - 4557014.000; Mean: 9168.164; Median: 6391.000; Stddev: 21467.060
Percentiles: 90th: 12478.000; 95th: 20964.000; 99th: 72234.000
1554.000 - 3453.820: 219004 ############
3453.820 - 7674.743: 980645 #####################################################
7674.743 - 17052.574: 552486 ##############################
17052.574 - 37887.762: 68558 ####
37887.762 - 84178.322: 31557 ##
84178.322 - 187024.331: 12102 #
187024.331 - 415522.355: 1364 |
415522.355 - 923187.626: 256 |
923187.626 - 2051092.468: 125 |
2051092.468 - 4557014.000: 21 |
Approximate improvement in the first four buckets is about 20%.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When adding and removing free space to the free space tree, we need to
lookup the respective block group's free info item in the free space tree,
check its flags for the BTRFS_FREE_SPACE_USING_BITMAPS bit and then
release the path.
Move these steps into a helper function and use it in both sites.
This will also help avoiding duplicate code in a subsequent change.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There's no need to dereference the block group to extract fs_info as we
have already stored fs_info in a local variable. So use the local variable
instead.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There's no need to subtract 1 from path->slots[0] and then decrement the
slot, we can reduce the generated assembly code by decrementing the slot
and then use it.
Module size before:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1846220 162998 16136 2025354 1ee78a fs/btrfs/btrfs.ko
Module size after:
$ size fs/btrfs/btrfs.ko
text data bss dec hex filename
1846204 162998 16136 2025338 1ee77a fs/btrfs/btrfs.ko
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The argument is used as a boolean, so switch its type from int to bool.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The free_space_set_bits() is used both to set a range of bits or to clear
range of bits, depending on the 'bit' argument value. So the name is very
misleading since it's not used only to set bits. Furthermore the 'bit'
argument is an integer when a boolean is all that is needed plus its name
is singular, which gives the idea the function operates on a single bit
and not on a range of bits.
So rename the function to free_space_modify_bits(), rename the 'bit'
argument to 'set_bits' and turn the argument to bool instead of int.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
A few of the free space tree exported functions have a 'btrfs_' prefix in
their name, but most don't. Not only is this inconsistent, the preferred
style is to have such a prefix, to avoid potential collisions in the
future with other kernel code and offer a somewhat better readibility by
making it obvious in calls sites that we are calling btrfs specific code.
So add the 'btrfs_' prefix to all free space tree functions that are
missing it.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
All we do under the label is to return, so there's no point in having it,
just return directly whenever we get an error.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|