summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_iter.c
AgeCommit message (Collapse)Author
2024-10-29bcachefs: Fix NULL ptr dereference in btree_node_iter_and_journal_peekPiotr Zalewski
Add NULL check for key returned from bch2_btree_and_journal_iter_peek in btree_node_iter_and_journal_peek to avoid NULL ptr dereference in bch2_bkey_buf_reassemble. When key returned from bch2_btree_and_journal_iter_peek is NULL it means that btree topology needs repair. Print topology error message with position at which node wasn't found, its parent node information and btree_id with level. Return error code returned by bch2_topology_error to ensure that topology error is handled properly by recovery. Reported-by: syzbot+005ef9aa519f30d97657@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=005ef9aa519f30d97657 Fixes: 5222a4607cd8 ("bcachefs: BTREE_ITER_WITH_JOURNAL") Suggested-by: Alan Huang <mmpgouride@gmail.com> Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-06bcachefs: btree_iter_peek_upto() now handles BTREE_ITER_all_snapshotsKent Overstreet
end_pos now compares against snapshot ID when required Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: kill bch2_btree_iter_peek_and_restart()Kent Overstreet
dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Btree path tracepointsKent Overstreet
Fastpath tracepoints, rarely needed, only enabled with CONFIG_BCACHEFS_PATH_TRACEPOINTS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-09bcachefs: Add check for btree_path ref overflowKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13bcachefs: Convert for_each_btree_node() to lockrestart_do()Kent Overstreet
for_each_btree_node() now works similarly to for_each_btree_key(), where the loop body is passed as an argument to be passed to lockrestart_do(). This now calls trans_begin() on every loop iteration - which fixes an SRCU warning in backpointers fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-07bcachefs: Add missing path_traverse() to btree_iter_next_node()Kent Overstreet
This fixes a bug exposed by the next path - we pop an assert in path_set_should_be_locked(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-18bcachefs: Fix fsck warning about btree_trans not passed to fsck errorKent Overstreet
If a btree_trans is in use it's supposed to be passed to fsck_err so that it can be unlocked if we're waiting on userspace input; but the btree IO paths do call fsck errors where a btree_trans exists on the stack but it's not passed through. But it's ok, because it's unlocked while doing IO. Fixes: a850bde6498b ("bcachefs: fsck_err() may now take a btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Add lockdep support for btree node locksKent Overstreet
This adds lockdep tracking for held btree locks with a single dep_map in btree_trans, i.e. tracking all held btree locks as one object. This is more practical and more useful than having lockdep track held btree locks individually, because - we can take more locks than lockdep can track (unbounded, now that we have dynamically resizable btree paths) - there's no lock ordering between btree locks for lockdep to track (we do cycle detection) - and this makes it easy to teach lockdep that btree locks are not safe to hold while invoking memory reclaim. The last rule is one that lockdep would never learn, because we only do trylock() from within shrinkers - but we very much do not want to be invoking memory reclaim while holding btree node locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Simplify btree key cache fill pathKent Overstreet
Don't allocate the new bkey_cached until after we've done the btree lookup; this means we can kill bkey_cached.valid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: kill key cache arg to bch2_assert_pos_locked()Kent Overstreet
this is an internal implementation detail - and we're improving key cache coherency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Plumb more logging through stdio redirectKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: fsck_err() may now take a btree_transKent Overstreet
fsck_err() now optionally takes a btree_trans; if the current thread has one, it is required that it be passed. The next patch will use this to unlock when waiting for user input. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-14bcachefs: Disk space accounting rewriteKent Overstreet
Main part of the disk accounting rewrite. This is a wholesale rewrite of the existing disk space accounting, which relies on percepu counters that are sharded by journal buffer, and rolled up and added to each journal write. With the new scheme, every set of counters is a distinct key in the accounting btree; this fixes scaling limitations of the old scheme, where counters took up space in each journal entry and required multiple percpu counters. Now, in memory accounting requires a single set of percpu counters - not multiple for each in flight journal buffer - and in the future we'll probably also have counters that don't use in memory percpu counters, they're not strictly required. An accounting update is now a normal btree update, using the btree write buffer path. At transaction commit time, we apply accounting updates to the in memory counters, which are percpu counters indexed in an eytzinger tree by the accounting key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-07-11bcachefs: Set PF_MEMALLOC_NOFS when trans->lockedKent Overstreet
proper lock ordering is: fs_reclaim -> btree node locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Fix btree_trans list orderingKent Overstreet
The debug code relies on btree_trans_list being ordered so that it can resume on subsequent calls or lock restarts. However, it was using trans->locknig_wait.task.pid, which is incorrect since btree_trans objects are cached and reused - typically by different tasks. Fix this by switching to pointer order, and also sort them lazily when required - speeding up the btree_trans_get() fastpath. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-23bcachefs: Fix race between trans_put() and btree_transactions_read()Kent Overstreet
debug.c was using closure_get() on a different thread's closure where the we don't know if the object being refcounted is alive. We keep btree_trans objects on a list so they can be printed by debug code, and because it is cost prohibitive to touch the btree_trans list every time we allocate and free btree_trans objects, cached objects are also on this list. However, we do not want the debug code to see cached but not in use btree_trans objects - critically because the btree_paths array will have been freed (if it was reallocated). closure_get() is also incorrect to use when that get may race with it hitting zero, i.e. we must already have a ref on the object or know the ref can't currently hit 0 for other reasons (as used in the cycle detector). to fix this, use the previously introduced closure_get_not_zero(), closure_return_sync(), and closure_init_stack_release(); the debug code now can only take a ref on a trans object if it's alive and in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-19bcachefs: Fix bch2_trans_put()Kent Overstreet
reference: https://github.com/koverstreet/bcachefs/issues/692 trans->ref is the reference used by the cycle detector, which walks btree_trans objects of other threads to walk the graph of held locks and issue wakeups when an abort is required. We have to wait for the ref to go to 1 before freeing trans->paths or clearing trans->locking_wait.task. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Add missing synchronize_srcu_expedited() call when shutting downKent Overstreet
We use the polling interface to srcu for tracking pending frees; when shutting down we don't need to wait for an srcu barrier to free them, but SRCU still gets confused if we shutdown with an outstanding grace period. Reported-by: syzbot+6a038377f0a594d7d44e@syzkaller.appspotmail.com Reported-by: syzbot+0ece6edfd05ed20e32d9@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-06-10bcachefs: Delete incorrect BTREE_ID_NR assertionKent Overstreet
for forwards compat we now explicitly allow mounting and using filesystems with unknown btrees, and we have to walk them for fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-20bcachefs: Improve bch2_assert_pos_locked()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: x-macroize journal flags enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: uninline set_btree_iter_dontneed()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: fix btree_path_clone() ip_allocatedKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_trans_verify_not_unlocked()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_btree_path_can_relock()Kent Overstreet
With the new assertions, we shouldn't be holding locks when trans->locked is false, thus, we shouldn't use relock when we just want to check if we can relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: trans->lockedKent Overstreet
Add a field for tracking whether a transaction object holds btree locks, and assertions to verify state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: maintain lock invariants in btree_iter_next_node()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: get_unlocked_mut_path -> bch2_path_get_unlocked_mutKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: iter/update/trigger/str_hash flag cleanupKent Overstreet
Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: __BTREE_ITER_ALL_SNAPSHOTS -> BTREE_ITER_SNAPSHOT_FIELDKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: When traversing to interior nodes, propagate result to paths to ↵Kent Overstreet
same leaf node Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: bch2_btree_path_to_text()Kent Overstreet
Long form version of bch2_btree_path_to_text() - useful in error messages and tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-08bcachefs: prt_printf() now respects \r\n\tKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Improved topology repair checksKent Overstreet
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: fix trans->mem realloc in __bch2_trans_kmallocHongbo Li
The old code doesn't consider the mem alloced from mempool when call krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX" is inaccurate when trans->mem was allocated by krealloc function. Instead, we use used_mempool stuff to record the situation, and realloc or free the trans->mem in elegant way. Also, after krealloc failed in __bch2_trans_kmalloc, the old data should be copied to the new buffer when alloc from mempool_alloc. Fixes: 31403dca5bb1 ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Use kvzalloc() when dynamically allocating btree pathsKent Overstreet
THis silences a mm/page_alloc.c warning about allocating more than a page with GFP_NOFAIL - and there's no reason for this to not have a vmalloc fallback anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Save key_cache_path in peek_slot()Kent Overstreet
When bch2_btree_iter_peek_slot() clones the iterator to search for the next key, and then discovers that the key from the cloned iterator is the key we want to return - we also want to save the iter->key_cache_path as well, for the update path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Kill more -EIO error codesKent Overstreet
This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10bcachefs: btree_and_journal_iter.transKent Overstreet
we now always have a btree_trans when using a btree_and_journal_iter; prep work for adding prefetching to btree_and_journal_iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10bcachefs: Set path->uptodate when no node at levelKent Overstreet
We were failing to set path->uptodate when reaching the end of a btree node iterator, causing the new prefetch code for backpointers gc to go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10bcachefs: Fix journal replay with unreadable btree rootsKent Overstreet
When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-24bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btreeKent Overstreet
If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the keyspace where no keys are visible in the current snapshot, we have a problem - we'll scan for a very long time before scanning terminates. Awhile back, this was fixed for most cases with peek_upto() (and assertions that enforce that it's being used). But the fix missed the fact that the inodes btree is different - every key offset is in a different snapshot tree, not just the inode field. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Improve trace_trans_restart_relockKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Add an option to control btree node prefetchingKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: track transaction durationsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: btree_trans always has statsKent Overstreet
reserve slot 0 for unknown (when we overflow), to avoid some branches Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_slot_updatesKent Overstreet
refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_btree_trans_peek_prev_updatesKent Overstreet
bch2_btree_iter_peek_prev() now supports BTREE_ITER_WITH_UPDATES Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>