Age | Commit message | Author
2024-09-09 | bcachefs: Drop memalloc_nofs_save() in bch2_btree_node_mem_alloc() | Kent Overstreet
It's really not needed: the only locks used here are the btree cache lock, which we drop for GFP_WAIT allocations, and btree node locks, which we also drop for GFP_WAIT allocations.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Simplify bch2_xattr_emit() implementation | Youling Tang
Use helper functions to make the code more readable. Similar to commit a5488f29835c ("fs: simplify ->listxattr() implementation").
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: drop unused posix acl handlers | Youling Tang
Remove struct nop_posix_acl_{access,default} from bcachefs, which doesn't depend on the xattr handler in its inode->i_op->listxattr() method in any way. There's nothing more to do than to simply remove the handler. It's been effectively unused ever since we introduced the new posix acl api. See [1] for details.
Link [1]: https://patchwork.kernel.org/project/linux-fsdevel/cover/20230125-fs-acl-remove-generic-xattr-handlers-v3-0-f760cc58967d@kernel.org/
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Remove unused parameter | Alan Huang
iter here is unused; remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Remove the prev array stuff | Alan Huang
After reducing the search range when building the aux tree, the prev array stuff is no longer useful, so remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Minimize the search range used to calculate the mantissa | Alan Huang
When the search key's mantissa is larger than node i's, we know that the search key is larger than the first key of the cacheline corresponding to node i. So when calculating the mantissas of the nodes to the right of node i, the left end of the search range can be the first key of node i.
Once the search range is minimized, the mantissa we calculate can carry more useful bits, which reduces slow-path comparisons. Besides, we can now remove all the prev array stuff.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Convert open-coded extra computation to helper | Alan Huang
This patch replaces the open-coded extra computation with eytzinger1_extra.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Remove dead code in __build_ro_aux_tree | Alan Huang
This logic is no longer useful since commit 3ce8b463e3e0 ("bcachefs: kill bset_tree->max_key"), so remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Remove unused parameter of bkey_mantissa_bits_dropped | Alan Huang
The idx parameter of bkey_mantissa_bits_dropped is unused; remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Remove unused parameter of bkey_mantissa | Alan Huang
The idx parameter of bkey_mantissa has been unused since commit b904a7991802 ("bcachefs: Go back to 16 bit mantissa bkey floats"), so remove it.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_sb_nr_devices() | Kent Overstreet
Factor out a helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: trivial open_bucket_add_buckets() cleanup | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Fix a spelling error in docs | Xiaxi Shen
Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: promote_whole_extents is now a normal option | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Move rebalance_status out of sysfs/internal | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: remove the unused parameter in macro bkey_crc_next | Julian Sun
The macro definition of bkey_crc_next accepted five parameters, but only four of them were used. Let's remove the unused one.
The patch has only passed compilation tests, but it should be fine.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: fix macro definition allocate_dropping_locks | Julian Sun
The macro allocate_dropping_locks accepts a parameter _trans but never used it; instead it referred directly to the variable trans, which only resolves if the calling function happens to have a local variable of that name.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: fix macro definition allocate_dropping_locks_errcode | Julian Sun
The macro allocate_dropping_locks_errcode accepts a parameter _trans but never used it; instead it referred directly to the variable trans, which only resolves if the calling function happens to have a local variable of that name.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
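The bug class fixed by the two commits above is easy to reproduce outside the kernel. The following is a minimal sketch, not the bcachefs macros themselves: struct ctx, BUMP_BUGGY, BUMP_FIXED and both helper functions are hypothetical names chosen for illustration.

```c
/* Hypothetical stand-in for a transaction object; not a bcachefs type. */
struct ctx {
	int restarts;
};

/*
 * Buggy: the parameter is named _trans, but the body refers to `trans`,
 * so the macro silently captures whatever `trans` is in the caller's scope.
 */
#define BUMP_BUGGY(_trans)	((trans)->restarts++)

/* Fixed: the body only uses the parameter it was given. */
#define BUMP_FIXED(_trans)	((_trans)->restarts++)

/* Intends to bump `other`, but the buggy macro bumps `trans` instead. */
static int bump_other_buggy(struct ctx *trans, struct ctx *other)
{
	BUMP_BUGGY(other);
	return other->restarts;
}

/* Bumps `other` as intended and leaves `trans` alone. */
static int bump_other_fixed(struct ctx *trans, struct ctx *other)
{
	(void)trans;
	BUMP_FIXED(other);
	return other->restarts;
}
```

With the buggy variant, a caller that has no variable named trans fails to compile, and a caller that does have one may operate on the wrong object without any diagnostic.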
2024-09-09 | bcachefs: remove the unused macro definition | Julian Sun
The macro bch2_kthread_wait_event_ioclock_timeout is no longer used, so let's remove it.
The patch has passed compilation tests.
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: quota_reserve_range() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_folio_set() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: range_has_data() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_seek_hole() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_seek_data() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_xattr_list() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_readdir() -> for_each_btree_key_in_subvolume_upto | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: for_each_btree_key_in_subvolume_upto() | Kent Overstreet
New helper for looping over keys in a given subvolume.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_fiemap(): call trans_begin() on every loop iter | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bchfs_read(): call trans_begin() on every loop iter | Kent Overstreet
Same as the recent change for __bch2_read(); also, kill the now-unnecessary btree_trans_too_many_iters() calls.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: kill bch2_btree_iter_peek_and_restart() | Kent Overstreet
Dead code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Btree path tracepoints | Kent Overstreet
Fastpath tracepoints, rarely needed, only enabled with CONFIG_BCACHEFS_PATH_TRACEPOINTS.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Add check for btree_path ref overflow | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Mark bch_inode_info as SLAB_ACCOUNT | Youling Tang
After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg: account for certain kmem allocations to memcg").
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: allocate inode by using alloc_inode_sb() | Youling Tang
Inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() to alloc_inode_sb(). This also fixes [1], avoiding the NULL pointer dereference BUG in list_lru_add() when CONFIG_MEMCG is enabled.
Links:
[1]: https://lore.kernel.org/all/20589721-46c0-4344-b2ef-6ab48bbe2ea5@linux.dev/
[2]: https://lore.kernel.org/all/7db60e36-9c96-4938-a28d-a9745e287386@linux.dev/
Fixes: 86d81ec5f5f0 ("bcachefs: Mark bch_inode_info as SLAB_ACCOUNT")
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Opt_durability can now be set via bch2_opt_set_sb() | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: bch2_opt_set_sb() can now set (some) device options | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: data_allowed is now an opts.h option | Kent Overstreet
We need this so that cmd_option in userspace can handle it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Annotate struct bucket_array with __counted_by() | Thorsten Blum
Add the __counted_by compiler attribute to the flexible array member bucket to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
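As a rough userspace illustration of the annotation described above: the sketch below ties a flexible array member to its element count. COUNTED_BY, struct demo_array and demo_array_alloc() are hypothetical names; the kernel supplies its own __counted_by() macro, and the real struct bucket_array layout is not reproduced here. On compilers without the attribute the macro expands to nothing.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical userspace fallback: expand to the counted_by attribute only
 * when the compiler reports support for it, otherwise to nothing.
 */
#if defined(__has_attribute)
# if __has_attribute(__counted_by__)
#  define COUNTED_BY(member) __attribute__((__counted_by__(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Illustrative struct only; not the real struct bucket_array. */
struct demo_array {
	size_t		nr;			/* number of valid elements in b[] */
	unsigned char	b[] COUNTED_BY(nr);	/* flexible array member */
};

/* Allocate an array with nr zeroed elements (no overflow check, for brevity). */
static struct demo_array *demo_array_alloc(size_t nr)
{
	struct demo_array *a = calloc(1, sizeof(*a) + nr * sizeof(a->b[0]));

	if (a)
		a->nr = nr;	/* set the count before touching b[] */
	return a;
}
```

The important convention is that the counter is assigned before the array is first accessed, since bounds checks are evaluated against the counter's current value.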
2024-09-09 | bcachefs: Fix format specifier in bch2_btree_key_cache_to_text() | Nathan Chancellor
When building for a 32-bit architecture, for which 'size_t' is 'unsigned int', there is a compiler warning due to use of '%lu':

  In file included from fs/bcachefs/vstructs.h:5,
                   from fs/bcachefs/bcachefs_format.h:80,
                   from fs/bcachefs/bcachefs.h:207,
                   from fs/bcachefs/btree_key_cache.c:3:
  fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text':
  fs/bcachefs/btree_key_cache.c:795:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
    795 |         prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending));
        |                         ^~~~~~~~~~~~~~~~~~
  fs/bcachefs/util.h:78:63: note: in definition of macro 'prt_printf'
     78 | #define prt_printf(_out, ...)           bch2_prt_printf(_out, __VA_ARGS__)
        |                                                               ^~~~~~~~~~
  fs/bcachefs/btree_key_cache.c:795:38: note: format string is defined here
    795 |         prt_printf(out, "pending:\t%lu\r\n", per_cpu_sum(bc->nr_pending));
        |                                    ~~^
        |                                      |
        |                                      long unsigned int
        |                                    %u
  cc1: all warnings being treated as errors

Use the proper specifier, '%zu', to resolve the warning.
Fixes: e447e49977b8 ("bcachefs: key cache can now allocate from pending")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
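The same portability rule applies to plain C outside the kernel. In the small sketch below, format_pending() is a hypothetical helper (not a bcachefs function) that uses standard snprintf() rather than prt_printf().

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a size_t portably. '%lu' only matches size_t on targets where
 * size_t is 'unsigned long'; '%zu' matches size_t on every target.
 * Returns the number of characters snprintf() would have written.
 */
static int format_pending(char *buf, size_t len, size_t nr_pending)
{
	return snprintf(buf, len, "pending:\t%zu\n", nr_pending);
}
```

Building with -Wformat (part of -Wall on GCC and Clang) flags a mismatched specifier at compile time, which is how the 32-bit build error above was caught.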
2024-09-09 | bcachefs: key cache can now allocate from pending | Kent Overstreet
btree_trans objects can hold the btree_trans_barrier SRCU read lock for an extended amount of time (they shouldn't, but it's difficult to guarantee).
The SRCU barrier blocks memory reclaim, so to avoid too many stranded key cache items, this uses the new pending_rcu_items to allocate from pending items, like we did before, but now without a global lock on the key cache.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Rip out freelists from btree key cache | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: rcu_pending now works in userspace | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: rcu_pending | Kent Overstreet
Generic data structure for explicitly tracking pending RCU items, allowing items to be dequeued (i.e. allocate from items pending freeing). Works with conventional RCU and SRCU, and possibly other RCU flavors in the future, meaning this can serve as a more generic replacement for SLAB_TYPESAFE_BY_RCU.

Pending items are tracked in radix trees; if memory allocation fails, we fall back to linked lists.

An rcu_pending is initialized with a callback, which is invoked when pending items' grace periods have expired. Two types of callback processing are handled specially:

- RCU_PENDING_KVFREE_FN

  New backend for kvfree_rcu(). Slightly faster, and eliminates the synchronize_rcu() slowpath in kvfree_rcu_mightsleep() - instead, an rcu_head is allocated if we don't have one and can't use the radix tree.

  TODO:
  - add a shrinker (as in the existing kvfree_rcu implementation) so that memory reclaim can free expired objects if callback processing isn't keeping up, and to expedite a grace period if we're under memory pressure and too much memory is stranded by RCU
  - add a counter for amount of memory pending

- RCU_PENDING_CALL_RCU_FN

  Accelerated backend for call_rcu() - pending callbacks are tracked in a radix tree to eliminate linked list overhead.

These are intended to serve as replacement backends for kvfree_rcu() and call_rcu(); they may be of interest to other users (e.g. SLAB_TYPESAFE_BY_RCU users).

Note:

Internally, we're using a single rearming call_rcu() callback for notifications from the core RCU subsystem when objects are ready to be processed.

Ideally we would be getting a callback every time a grace period completes for which we have objects, but that would require multiple rcu_heads in flight, and since the number of gp sequence numbers with uncompleted callbacks is not bounded, we can't do that yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
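To make the idea of "allocating from items pending freeing" concrete, here is a deliberately simplified, single-threaded toy model. It is not the rcu_pending API: every name below (struct pending, pending_enqueue(), pending_dequeue(), pending_process(), count_processed()) is hypothetical, a fixed-size array stands in for the radix trees and list fallback, grace periods are modeled as plain increasing sequence numbers, and there is no locking or sequence wraparound handling.

```c
#include <stddef.h>

#define PENDING_MAX 8

/* One object waiting for grace period `seq` to complete. */
struct pending_item {
	void		*obj;
	unsigned long	seq;
};

struct pending {
	struct pending_item	items[PENDING_MAX];
	size_t			nr;
	void			(*process)(void *obj);	/* run once the grace period expired */
};

static void pending_init(struct pending *p, void (*process)(void *obj))
{
	p->nr = 0;
	p->process = process;
}

/* Queue obj until grace period seq has completed; returns -1 if full. */
static int pending_enqueue(struct pending *p, void *obj, unsigned long seq)
{
	if (p->nr == PENDING_MAX)
		return -1;
	p->items[p->nr++] = (struct pending_item) { .obj = obj, .seq = seq };
	return 0;
}

/* "Allocate from pending": take back the newest queued object, or NULL. */
static void *pending_dequeue(struct pending *p)
{
	return p->nr ? p->items[--p->nr].obj : NULL;
}

/* Invoke the callback on every item whose grace period has completed. */
static size_t pending_process(struct pending *p, unsigned long completed_seq)
{
	size_t i, kept = 0, done = 0;

	for (i = 0; i < p->nr; i++) {
		if (p->items[i].seq <= completed_seq) {
			p->process(p->items[i].obj);
			done++;
		} else {
			p->items[kept++] = p->items[i];
		}
	}
	p->nr = kept;
	return done;
}

/* Example callback: treats obj as an int counter and increments it. */
static void count_processed(void *obj)
{
	(*(int *)obj)++;
}
```

A dequeued object is handed back for reuse as the same type without waiting for its grace period, which is the property that makes this kind of structure comparable to SLAB_TYPESAFE_BY_RCU; readers must therefore revalidate objects they look up.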
2024-09-09 | lib/generic-radix-tree.c: add preallocation | Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | lib/generic-radix-tree.c: genradix_ptr_inlined() | Kent Overstreet
Provide an inlined fast path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Fix deadlock in __wait_on_freeing_inode() | Kent Overstreet
We can't call __wait_on_freeing_inode() with btree locks held; we're waiting on another thread that's in evict(), and before it clears that bit it needs to write that inode to flush timestamps - deadlock.
Fixing this involves a fair amount of re-jiggering to plumb a new transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: switch to rhashtable for vfs inodes hash | Kent Overstreet
The standard vfs inode hash table suffers from painful lock contention - this is long overdue.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | inode: make __iget() a static inline | Kent Overstreet
bcachefs is switching to an rhashtable for vfs inodes instead of the standard inode.c hashtable, so we need __iget() to be callable from outside inode.c. It could be exported, but a static inline makes more sense for a single atomic_inc().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Replace div_u64 with div64_u64 where second param is u64 | Reed Riley
Bcachefs often uses this function to divide by nanosecond times, which can easily cause problems when the divisor is cast to u32. For example, `cat /sys/fs/bcachefs/*/internal/rebalance_status` would return invalid data in the `duration waited` field because dividing by the number of nanoseconds in a minute requires the divisor parameter to be u64.
Signed-off-by: Reed Riley <reed@riley.engineer>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09 | bcachefs: Fix sysfs rebalance duration waited formatting | Feiko Nanninga
cat /sys/fs/bcachefs/*/internal/rebalance_status
waiting
  io wait duration:  13.5 GiB
  io wait remaining: 627 MiB
  duration waited:   1392 m

duration waited was increasing at a rate of about 14 times the expected rate.
div_u64 takes a u32 divisor, but u->nsecs (from time_units[]) can be bigger than u32.
Signed-off-by: Feiko Nanninga <feiko.nanninga@fnanninga.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
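The factor of roughly 14 falls out of the arithmetic: one minute is 60,000,000,000 ns, which truncates to 4,165,425,152 when squeezed into 32 bits, and 60,000,000,000 / 4,165,425,152 is about 14.4. The sketch below reproduces this in userspace. The two division functions are stand-ins that only mirror the argument widths of the kernel helpers of the same names; NSEC_PER_MINUTE, minutes_buggy() and minutes_fixed() are hypothetical names for illustration.

```c
#include <stdint.h>

/* Stand-in for the kernel helper: note the 32-bit divisor. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Stand-in for the kernel helper: 64-bit divisor. */
static uint64_t div64_u64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}

/* 60e9 ns; does not fit in 32 bits. */
#define NSEC_PER_MINUTE	(60ULL * 1000000000ULL)

/* Divisor is truncated to 60e9 mod 2^32 = 4165425152, inflating the result. */
static uint64_t minutes_buggy(uint64_t nsecs)
{
	return div_u64(nsecs, (uint32_t)NSEC_PER_MINUTE);
}

/* Divisor keeps its full 64-bit value. */
static uint64_t minutes_fixed(uint64_t nsecs)
{
	return div64_u64(nsecs, NSEC_PER_MINUTE);
}
```

For an actual wait of 100 minutes, minutes_fixed() yields 100 while minutes_buggy() yields 1440, the same order of inflation as the 1392 m shown in the sysfs output above.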