Add a type descriptor to btree_bkey_cached_common - there's no reason
not to, since we've got padding that was otherwise unused, and this is a
nice cleanup (and helpful in later patches).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

With the new cycle detector, taking a write lock will be able to fail -
unless we pass it nofail, which is possible but not preferred.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.
This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions:
- btree_node_lock_nopath, which may fail and return a transaction
  restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
  cannot deadlock (i.e. because we're holding no other locks).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
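A sketch of the calling convention these two functions imply, with
assumed signatures and stub types (illustrative only, not the actual
bcachefs declarations):

    struct btree_trans;                 /* stand-ins for the real types */
    struct btree_bkey_cached_common;
    enum six_lock_type { SIX_LOCK_read, SIX_LOCK_intent, SIX_LOCK_write };

    /* Assumed signatures, for illustration: */
    int  btree_node_lock_nopath(struct btree_trans *,
                                struct btree_bkey_cached_common *,
                                enum six_lock_type);
    void btree_node_lock_nopath_nofail(struct btree_trans *,
                                       struct btree_bkey_cached_common *,
                                       enum six_lock_type);

    static int lock_node_example(struct btree_trans *trans,
                                 struct btree_bkey_cached_common *b)
    {
            int ret = btree_node_lock_nopath(trans, b, SIX_LOCK_intent);
            if (ret)        /* transaction restart: unwind and retry */
                    return ret;

            /* ... node is locked; do work, then unlock ... */
            return 0;
    }

    /* When we hold no other locks, deadlock is impossible, so:
     *      btree_node_lock_nopath_nofail(trans, b, SIX_LOCK_read);
     */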
This fixes a small memory leak.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

We need to use the right class for some assertions to work correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Turns out this assertion was something we could legitimately hit - add a
comment describing what's going on, and handle it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

These were used more prior to getting rid of the in-memory bucket
arrays - they don't serve much purpose anymore, and deleting them lets
us write better assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

- use strlcpy(), not strncpy()
- add tracepoints for btree_path alloc and free
- give the tracepoint for key cache upgrade fail a proper name
- add a tracepoint for btree_node_upgrade_fail
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

This aids in debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Now that we have error codes with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
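A toy model of the scheme: one parent "transaction restart" class with a
distinct subcode per restart reason, so callers can match on the whole
class. Names and numbering here are invented for illustration, not the
actual bcachefs error tables:

    /* Invented error-code hierarchy, for illustration only: */
    enum {
            ERR_transaction_restart = 2048,         /* the class itself */
            ERR_transaction_restart_relock,         /* one code per reason */
            ERR_transaction_restart_fault_inject,
            ERR_transaction_restart_journal_res,
            ERR_transaction_restart_MAX,
    };

    /* Matching on the class catches every restart reason: */
    static int is_transaction_restart(int err)
    {
            return err >= ERR_transaction_restart &&
                   err <  ERR_transaction_restart_MAX;
    }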
We now record the length of time btree locks are held and expose this
in debugfs. Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

We need the caller name and a place to store our results - btree_trans
provides this.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Originally, the btree key cache code would always allocate new entries
by reusing from the recently-freed list, if that list wasn't empty. But
that behaviour was dropped, for lock contention reasons.
But it seems that entries stranded on the freed list have been
contributing to some of our OOM issues, because long-running btree
transactions will prevent them from being freed.
This patch re-adds allocating from the freed list, but it also adds
percpu buffers to solve the lock contention issues - and the new percpu
freed lists will improve the evict paths, too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
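A minimal sketch of the allocation order this describes - per-CPU buffer
first (no lock contention), then the shared freed list, then the
allocator. All names here are invented for illustration:

    #include <stdlib.h>

    #define PCPU_FREED_NR 16

    struct pcpu_freed {                 /* one of these per CPU */
            void    *objs[PCPU_FREED_NR];
            unsigned nr;
    };

    struct shared_freed;                            /* stub */
    void *shared_freed_pop(struct shared_freed *);  /* takes its lock */

    void *kc_alloc(struct pcpu_freed *mine, struct shared_freed *shared,
                   size_t size)
    {
            void *obj;

            if (mine->nr)                   /* lockless fast path */
                    return mine->objs[--mine->nr];

            obj = shared_freed_pop(shared); /* contended slow path */
            if (obj)
                    return obj;

            return malloc(size);            /* nothing freed to reuse */
    }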
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

If we allocate a buffer that's a bit bigger than necessary, the
transaction commit path will be much less likely to have to reallocate -
which would require a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
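The idea in one line - pad the requested size so slightly larger
follow-up requests still fit. The helper and the growth factor are
invented examples, not the values the patch uses:

    #include <stddef.h>

    /* Illustrative only: over-allocate by 50% so the commit path
     * rarely needs to reallocate (and therefore restart). */
    static size_t buf_alloc_size(size_t need)
    {
            return need + need / 2;
    }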
These showed up when building for mips.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Since journal reclaim -> btree key cache flushing may require the
allocation of new btree nodes, it has an implicit dependency on copygc
in order to make forward progress - so we should avoid blocking copygc
unless the journal is really close to full.
This introduces watermarks to replace our single MAY_GET_UNRESERVED bit
in the journal, and adds a watermark for copygc and plumbs it through.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
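A sketch of what watermarks buy over a single MAY_GET_UNRESERVED bit:
each class of journal user is allowed to dip into a different amount of
remaining space. Names and thresholds here are invented for
illustration:

    enum journal_watermark {            /* invented names */
            WATERMARK_normal,           /* ordinary writes */
            WATERMARK_copygc,           /* copygc may dig deeper */
            WATERMARK_reclaim,          /* journal reclaim: deepest */
    };

    static int journal_res_ok(unsigned sectors_free,
                              enum journal_watermark w)
    {
            /* Illustrative thresholds, one per watermark: */
            static const unsigned min_free[] = { 512, 128, 0 };

            return sectors_free > min_free[w];
    }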
Six locks have a percpu mode, which we use for interior btree nodes, as
well as btree key cache keys for the subvolumes btree. We've been
switching locks back and forth between percpu and non-percpu mode as
needed, but it turns out this is racy - when we're reusing an existing
node, other threads could be attempting to lock it while we're switching
it between modes.
This patch fixes this by never switching 'struct btree' between the two
modes, and instead segregating them between two different freed lists.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
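A structural sketch of the segregated freed lists - reuse only ever
happens like-for-like, so a node's lock mode never changes while the
node is live. Field and helper names are invented:

    struct freed_node {                       /* invented stand-in */
            struct freed_node *next;
            int                lock_is_pcpu;  /* fixed for the node's life */
    };

    struct btree_cache_sketch {
            struct freed_node *freed_pcpu;    /* percpu-mode locks only */
            struct freed_node *freed_nonpcpu; /* regular locks only */
    };

    static struct freed_node *reuse_node(struct btree_cache_sketch *c,
                                         int want_pcpu)
    {
            struct freed_node **list = want_pcpu ? &c->freed_pcpu
                                                 : &c->freed_nonpcpu;
            struct freed_node *n = *list;

            if (n)                  /* never switch a node between modes */
                    *list = n->next;
            return n;
    }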
This helps with lock contention in the journalling code: instead of
updating our journal pin on every write, only get a journal pin if we
don't have one.
This means we can avoid hammering on journal locks nearly as much, at
the cost of carrying around a journal pin for an older entry than the
one we actually need. To handle that, we update our journal pin to the
correct one when flushed by journal reclaim, if needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
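A sketch of the fast path this creates - keep whatever pin we already
hold and only take the journal lock when we have none. Types and helpers
are invented for illustration:

    struct journal_pin_sketch {
            unsigned long long seq;     /* 0 = not pinning anything */
    };

    /* Slow path, takes journal locks (stub): */
    void journal_pin_add(struct journal_pin_sketch *, unsigned long long);

    static void pin_for_write(struct journal_pin_sketch *pin,
                              unsigned long long cur_seq)
    {
            if (pin->seq)           /* an older pin is good enough: */
                    return;         /* skip the journal lock entirely */

            journal_pin_add(pin, cur_seq);
    }

    /* Journal reclaim later flushes us; at that point we re-pin the
     * entry we actually need, if any. */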
We were emitting two trace events on transaction restart in this code
path - delete the redundant one.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

- Updates to non key cache iterators will now be transparently
  redirected to the key cache for cached btrees.
- Except when creating new keys: then the update goes to the underlying
  btree.
For iterating over a cached btree to work, we need to ensure that if a
key exists in the key cache, it also exists in the btree - otherwise
the iterator code will skip past it and not check the key cache.
Otherwise, for consistency, all updates should go to the same place -
the key cache.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
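The routing rule above, as a sketch (invented helper names): creations
go to the btree so every cached key stays backed by a btree key;
everything else goes to the key cache:

    struct update;                                  /* stubs */
    int key_exists_in_btree(const struct update *);
    int key_cache_update(struct update *);
    int btree_update(struct update *);

    static int update_cached_btree(struct update *u)
    {
            if (!key_exists_in_btree(u))
                    return btree_update(u);     /* creating a new key */

            return key_cache_update(u);         /* everything else */
    }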
This is the start of cache coherency with the btree key cache - this
adds a btree iterator flag that causes lookups to also check the key
cache when we're iterating over the btree (not iterating over the key
cache).
Note that we could still race with another thread creating an item in
the key cache and updating it, since we aren't holding the key cache
locked if it wasn't found. The next patch for the update path will
address this by causing the transaction to restart if the key cache is
found to be dirty.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

btree_key_cache_flush_pos() uses BTREE_ITER_CACHED_NOFILL - but it
wasn't checking for !ck->valid. It does check for the entry being
dirty, so it shouldn't matter - but refactor it a bit and add an
assertion.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

This improves the transaction restart tracepoints - adding distinct
tracepoints for all the locations and reasons a transaction might have
been restarted, and ensures that there's a tracepoint for every
transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work
  in userspace
- We don't need to print the bcachefs: or the filesystem name prefix in
  userspace
- Improve a few error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

This adds some missing diagnostics for rare but annoying-to-debug
runtime allocation failure paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

This fixes some bugs when we hit an error very early in the filesystem
startup path, before most things have been initialized.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

For snapshots, we need to implement btree lookups that return the first
key that's an ancestor of the snapshot ID the lookup is being done in -
and filter out keys in unrelated snapshots. This patch adds the btree
iterator flag BTREE_ITER_FILTER_SNAPSHOTS, which does that filtering.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
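A sketch of the visibility rule being implemented: at a given position,
return the first key whose snapshot is an ancestor of the iterator's
snapshot, skipping keys that belong to unrelated snapshots. Types and
helper names are invented:

    #include <stddef.h>

    struct key_sketch {
            struct key_sketch *next;    /* next key at/after this pos */
            unsigned           snapshot;
    };

    /* Stub: true if 'anc' is an ancestor of (or equal to) 'id': */
    int snapshot_is_ancestor(unsigned anc, unsigned id);

    static struct key_sketch *
    filter_snapshots(struct key_sketch *k, unsigned iter_snapshot)
    {
            for (; k; k = k->next)
                    if (snapshot_is_ancestor(k->snapshot, iter_snapshot))
                            return k;

            return NULL;    /* nothing visible in this snapshot */
    }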
This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on-disk data structures. The next
patches will start hooking up this new code to existing code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Figured out the bug we were chasing, and it had nothing to do with
locking btree iterators/paths out of order.
This reverts commit ff08733dd298c969aec7c7828095458f73fd5374.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

btree_path_traverse_all() traverses btree iterators in sorted order, and
thus shouldn't see transaction restarts due to potential deadlocks - but
sometimes we do. This patch adds some more assertions and tracks some
more state to help track this down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path, which is
now reference counted.
This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long-lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
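The shape of the split, sketched with invented field names - paths are
refcounted and shared, and an iterator clones its path only when it is
about to mutate a shared one (copy-on-write):

    struct btree_path_sketch {
            unsigned ref;               /* iterators pointing at this path */
            /* ... position, level, locks ... */
    };

    struct btree_iter_sketch {
            struct btree_path_sketch *path;     /* possibly shared */
    };

    /* Stub: allocate a copy, drop a ref on the original: */
    struct btree_path_sketch *path_clone(struct btree_path_sketch *);

    static void iter_about_to_mutate(struct btree_iter_sketch *iter)
    {
            if (iter->path->ref > 1)            /* shared: copy on write */
                    iter->path = path_clone(iter->path);

            /* now safe to mutate iter->path in place */
    }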
This was used for an optimization that hasn't existed in quite a while -
iter->uptodate will probably be going away as well.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Disfavoured, and should go away.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

With the recent transaction restart changes, it's no longer needed - all
transaction commits have BTREE_INSERT_NOUNLOCK semantics.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

We're working to standardize handling of transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Downgrading of btree iterators is something that should only happen
explicitly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

We weren't correctly verifying that we had interior node intent locks -
this patch also fixes bugs uncovered by the new assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Add a new flag to control assertions about updates to internal snapshot
nodes, which normally should not be written to - to be used in an
upcoming patch.
Also do some renaming - trigger_flags is now update_flags.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

We're seeing livelocks that appear to be due to
bch2_btree_key_cache_scan repeatedly scanning and blocking other tasks
from using the key cache lock - we probably shouldn't be reporting
objects that can't actually be freed yet.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bch2_varint_decode() can read up to 7 bytes past the end of the buffer,
which means we need to allocate slightly larger key cache buffers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
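The fix in miniature - size every key cache buffer with room for the
decoder's possible overread. The helper name is invented; the 7-byte pad
is the figure from the message above:

    #include <stdlib.h>

    #define VARINT_DECODE_PAD 7     /* max bytes read past the end */

    static void *alloc_key_cache_buf(size_t key_bytes)
    {
            return malloc(key_bytes + VARINT_DECODE_PAD);
    }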
Flushing the btree key cache needs to use allocation reserves - journal
reclaim depends on flushing the btree key cache for making forward
progress, and the allocator and copygc depend on journal reclaim making
forward progress.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

We need to flush the btree key cache when it's too dirty, because
otherwise the shrinker won't be able to reclaim memory - this is done by
journal reclaim. But journal reclaim also kicks btree node writes: this
meant that btree node writes were getting kicked much too often just
because we needed to flush btree key cache keys.
This patch splits journal pins into two different lists, and teaches
journal reclaim to not flush btree node writes when it only needs to
flush key cache keys.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
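The two-list split, sketched with invented names - reclaim walks only
the list whose pins it actually needs to flush, so key cache flushes no
longer kick btree node writes as a side effect:

    struct pin_sketch {
            struct pin_sketch *next;
    };

    struct journal_entry_pins_sketch {
            struct pin_sketch *key_cache_pins;  /* key cache flushes */
            struct pin_sketch *btree_pins;      /* btree node writes */
    };

    void flush_list(struct pin_sketch *);       /* stub */

    static void reclaim_flush(struct journal_entry_pins_sketch *pins,
                              int need_btree_flush)
    {
            flush_list(pins->key_cache_pins);

            if (need_btree_flush)       /* only when truly necessary */
                    flush_list(pins->btree_pins);
    }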