Age | Commit message (Collapse) | Author |
|
Dead code cleanup.
Link: https://lore.kernel.org/linux-bcachefs/20250612224059.39fddd07@batman.local.home/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We've been seeing some livelock-ish behavior in the index update part of
the main write path, and while we've got low level btree path
tracepoints, we've been lacking high level btree iterator tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for when we insert only part of an extent, due to too
many overwrites.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for any time we return an error and unwind.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- Convert to a 'fs_str' tracepoint that just emits as a string: this
lets us build up the tracepoint with a printbuf, using our pretty
printers, and they're much easier to manage
- Include locks_held, before and after
- Include the btree node pointer we failed on (error pointer, null, or
real node)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Internal moves shouldn't add new rebalance_work, but it's been reported
that this seems to be happening. Add a tracepoint and counter so we can
see what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Copygc needs to be able to move extents that have bitrotted. We don't
want to delete them - in the future we'll have an API for "read me the
data even if there's checksum errors", and in general we don't want to
delete anything unless the user asks us to.
That will require writing it with a new checksum, which means we can't
forget that there was a checksum error so we return the correct error to
userspace.
Rebalance also wants to skip bad extents; we can now use the poison flag
for that.
This is currently disabled by default, as we want read fua support so
that we can distinguish between transient and permanent errors from the
device. It may be enabled with the module parameter:
poison_extents_on_checksum_error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a simple tracepoint for stripe creation, we'll want to expand this
later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reorganize counters a bit, grouping related counters together.
New counters:
- io_read_inline
- io_read_hole
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a more general version of bch2_evacuate_bucket - to be used for
scrub.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The discard path is supposed to issue journal flushes when there's too
many buckets empty buckets that need a journal commit before they can be
written to again, but at some point this code seems to have been lost.
Bring it back with a new optimization to make sure we don't issue too
many journal flushes: the journal now tracks the sequence number of the
most recent flush in progress, which the discard path uses when deciding
which buckets need a journal flush.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for inserting new accounting entries: we're seeing odd
spinning behaviour in accounting read.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
86a494c8eef9 ("bcachefs: Kill bch2_get_next_backpointer()") dropped some
things the tracepoint emitted because bch2_evacuate_bucket() no longer
looks at the alloc key - but we did want at least some of that.
We still no longer look at the alloc key so we can't report on the
fragmentation number, but that's a direct function of dirty_sectors and
a copygc concern anyways - copygc should get its own tracepoint that
includes information from the fragmentation LRU.
But we can report on the number of sectors we moved and the bucket size.
Co-developed-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fastpath tracepoints, rarely needed, only enabled with
CONFIG_BCACHEFS_PATH_TRACEPOINTS.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
include information about the state of the btree key cache
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add trace_bch2_sync_fs() and trace_bch2_fsync() implementations.
The output in trace is as follows:
sync-29779 [000] ..... 193.700935: bch2_sync_fs: dev 254,16 wait 1
<...>-40027 [002] ..... 342.535227: bch2_fsync: dev 254,32 ino 4099 parent 4096 datasync 1
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.
There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
mv /tmp/test-file $a;
done
I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.
Note, the same updates will need to be done for:
__assign_str_len()
__assign_rel_str()
__assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
|
|
Tracepoints are garbage, and perf trace even cuts off some of our
fields.
Much nicer to just trace a string, and then we can build nicely
formatted output with printbufs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for downcasting private errors to standard errors, so
they can be recovered even when not logged; also, add some
documentation.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bcachefs btree nodes are big - typically 256k - and btree roots are
pinned in memory. As we're now up to 18 btrees, we now have significant
memory overhead in mostly empty btree roots.
And in the future we're going to start enforcing that certain btree node
boundaries exist, to solve lock contention issues - analagous to XFS's
AGIs.
Thus, we need to start allocating smaller btree node buffers when we
can. This patch changes code that refers to the filesystem constant
c->opts.btree_node_size to refer to the btree node buffer size -
btree_buf_bytes() - where appropriate.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out bch2_journal_bufs_to_text(), and use it in the
journal_entry_full() tracepoint; when we can't get a journal reservation
we need to know the outstanding journal entry sizes to know if the
problem is due to excessive flushing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It appears this was accidentally deleted at some point - also, do a bit
of cleanup.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now include backtraces for every thread involved in the cycle.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These were for extra info in tracepoints for debugging a specialized
issue - we do not want to bloat btree_path for this, at least in release
builds.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This gives us more context information - e.g. which codepath is invoking
btree node reads.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In the CI, we're seeing tests failing due to excessive would_deadlock
transaction restarts - the tracepoint now includes the lock cycle that
occured.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now include the list of paths in use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- add a tracepoint for write_buffer_flush_sync; this is expensive
- fix the write_buffer_flush_slowpath tracepoint
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint for rebalance, printing out
- the target option
- the compression option
- the key being rebalanced
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Renamed from trace_move_extent_alloc_mem_fail, because there are other
reasons we colud fail (disk space allocation failure).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This deletes the complicated and somewhat expensive journal
pre-reservation machinery in favor of just using journal watermarks:
when the journal is more than half full, we run journal reclaim more
aggressively, and when the journal is more than 3/4s full we only allow
journal reclaim to get new journal reservations.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We should only be downgrading locks on success - otherwise, our
transaction restarts won't be getting the correct locks and we'll
livelock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
data_progress_list is gone - it was redundant with moving_context_list
The upcoming rebalance rewrite is going to have it using two different
move_stats objects with the same moving_context, depending on whether
it's scanning or using the rebalance_work btree - this patch plumbs
stats around a bit differently so that will work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a tracepoint to print the reason a read wasn't promoted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In https://github.com/koverstreet/bcachefs/issues/450, we're seeing
unexplained btree_path_relock_fail events - according to the information
currently in the tracepoint, it appears the relock should be succeeding.
This adds lock counts to the tracepoint to help track it down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Using drop_locks_do() ensures that every unlock() is paired with a
relock(), with proper error checking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.
On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.
More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.
The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.
On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.
Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Move path tracepoints now include the key being moved. Also, add new
tracepoints for the start of move_extent, and evacuate_bucket.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This greatly expands the move_extent_fail tracepoint - now it includes
all the information we have available, including exactly why the extent
wasn't updated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Seeing occasional test failures where we get stuck in a livelock that
involves this event - this will help track it down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|