If we don't leave stale pointers around, we won't have to deal with
bucket gen wraparound.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Cached pointers now have backpointers.
This means that we'll be able to kill cached pointers in the
bucket_invalidate path, when invalidating/reusing buckets containing
cached data, instead of leaving them around to be cleaned up by gc_gens
garbage collection - which requires a full metadata scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Transactional triggers need to run in a defined ordering, which is not
quite the same as btree ID integer comparison.
Previously this was handled in a hacky way in
bch2_trans_commit_run_triggers(), since it was only the alloc btree that
needed special handling, but upcoming stripe btree changes are going to
require more ordering changes - so, define that ordering.
Next patch will change the transaction commit path to use it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
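
To illustrate ordering by an explicit rank rather than by the raw btree ID, here is a minimal sketch. The enum values, the rank table, and every demo_* name are hypothetical, and the order shown is made up for the example; it is not the ordering this patch defines.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical btree IDs; the integer values are NOT the trigger order. */
enum demo_btree_id {
	DEMO_BTREE_extents,
	DEMO_BTREE_alloc,
	DEMO_BTREE_stripes,
	DEMO_BTREE_NR,
};

/* Made-up rank table: a lower rank means the trigger runs earlier. */
static const unsigned demo_trigger_rank[DEMO_BTREE_NR] = {
	[DEMO_BTREE_extents]	= 0,
	[DEMO_BTREE_stripes]	= 1,
	[DEMO_BTREE_alloc]	= 2,
};

/* Returns nonzero if triggers for btree a run before those for btree b. */
int demo_trigger_runs_before(enum demo_btree_id a, enum demo_btree_id b)
{
	return demo_trigger_rank[a] < demo_trigger_rank[b];
}

/* qsort() comparator: compares ranks, not the enum values themselves. */
static int demo_trigger_cmp(const void *l, const void *r)
{
	unsigned a = demo_trigger_rank[*(const enum demo_btree_id *) l];
	unsigned b = demo_trigger_rank[*(const enum demo_btree_id *) r];

	return (a > b) - (a < b);
}

/* Sorts an array of btree IDs into (hypothetical) trigger order. */
void demo_sort_by_trigger_order(enum demo_btree_id *ids, size_t nr)
{
	qsort(ids, nr, sizeof(*ids), demo_trigger_cmp);
}
```

In this sketch DEMO_BTREE_alloc compares lower than DEMO_BTREE_stripes as an integer but sorts after it, which is the kind of mismatch a dedicated ordering is meant to capture.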
|
|
Introduce per-entry locks, like with struct bucket - the stripes heap is
going away.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It's now easier to add new LRU types.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pass in the backpointer explicitly, instead of assuming 'referring_k' is
an alloc key and calculating it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
FRAGMENTATION_START was incorrect: there's currently only one
fragmentation LRU (at the end of the reserved bits for LRU type), and
we're getting ready to add a stripe fragmentation lru - so give it a
better name.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Minor cleanup, no reason for the caller to have to do this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
A user has been seeing the "error verifying existing checksum while
rewriting existing data (memory corruption?)" error.
This generally indicates a hardware issue (and that may be the case
here), but it might also indicate a bug, in which case we need more
information to look for patterns.
Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This option only applies filesystem wide.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The eytzinger code was previously relying on the following wrap-around
properties and their "eytzinger0" equivalents:
eytzinger1_prev(0, size) == eytzinger1_last(size)
eytzinger1_next(0, size) == eytzinger1_first(size)
However, these properties are no longer relied upon and no longer
necessary, so remove the corresponding asserts and forbid the use of
eytzinger1_prev(0, size) and eytzinger1_next(0, size).
This allows further simplification of the code in eytzinger1_next() and
eytzinger1_prev(): where the left shifting happens, eytzinger1_next() is
trying to move i to the lowest child on the left, which is equivalent to
doubling i until the next doubling would cause it to be greater than
size. This is implemented by shifting i to the left so that the most
significant bits align and then shifting i to the right by one if the
result is greater than size.
Likewise, eytzinger1_prev() is trying to move to the lowest child on the
right; the same applies here.
The 1-offset in (size - 1) in eytzinger1_next() isn't needed at all, but
the equivalent offset in eytzinger1_prev() is surprisingly needed to
preserve the 'eytzinger1_prev(0, size) == eytzinger1_last(size)'
property. However, since we no longer support that property, we can get
rid of these offsets as well. This saves one addition in each function
and makes the code less confusing.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
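
The "doubling versus shifting" equivalence described above can be sketched in portable C. This is an illustration only, not the kernel code: the demo_* names are hypothetical, demo_msb() is a loop-based stand-in for a find-last-set primitive, and 0 is used as the "no successor" result. All functions assume 1 <= i <= size.

```c
#include <assert.h>

/* Index of the most significant set bit; x must be nonzero. */
static unsigned demo_msb(unsigned x)
{
	unsigned n = 0;

	while (x >>= 1)
		n++;
	return n;
}

/* Leftmost descendant of i: double i while the result still fits. */
unsigned demo_leftmost_loop(unsigned i, unsigned size)
{
	while (2 * i <= size)
		i *= 2;
	return i;
}

/* Same result via shifts: align the top bits, back off by one if too big. */
unsigned demo_leftmost_shift(unsigned i, unsigned size)
{
	i <<= demo_msb(size) - demo_msb(i);
	i >>= i > size;
	return i;
}

/* In-order successor of i in a 1-based eytzinger tree; 0 means none. */
unsigned demo_eytzinger1_next(unsigned i, unsigned size)
{
	assert(i != 0 && i <= size);

	if (2 * i + 1 <= size)
		return demo_leftmost_shift(2 * i + 1, size);

	/* No right child: strip trailing 1 bits, then step to the parent. */
	while (i & 1)
		i >>= 1;
	return i >> 1;
}

/* Walks the tree in order (size >= 1) and returns the number of nodes seen. */
unsigned demo_count_inorder(unsigned size)
{
	unsigned i, n = 0;

	for (i = demo_leftmost_loop(1, size); i; i = demo_eytzinger1_next(i, size))
		n++;
	return n;
}
```

For size 10 this walk visits the indexes 8, 4, 9, 2, 10, 5, 1, 6, 3, 7 and then stops at 0.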
|
|
In this second step, transform the eytzinger indexes i, j, and k in
eytzinger1_sort_r() from 0-based to 1-based. This step looks a bit
messy, but the resulting code is slightly better.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In this first step, convert the eytzinger sort functions to use 1-based
primitives.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Several of the algorithms on eytzinger trees are implemented in terms of
the eytzinger0 primitives. However, those algorithms can just as easily
be expressed in terms of the eytzinger1 primitives, and that leads to
better and easier to understand code. Start by converting
eytzinger0_find().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Function eytzinger0_find() isn't currently covered, so add a self test.
We can rely on eytzinger0_find_le() here because it is being
tested independently.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add an eytzinger0_find_ge() self test similar to eytzinger0_find_gt().
Note that this test requires eytzinger0_find_ge() to return the first
matching element in the array in case of duplicates. To prevent
bisection errors, we only add this test after strengthening the original
implementation (see the previous commit).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Implement eytzinger0_find_ge() directly instead of implementing it in
terms of eytzinger0_find_le() and adjusting the result.
This turns eytzinger0_find_ge() into a minimum search, so when there are
duplicate elements, the result of eytzinger0_find_ge() will now always
point at the first matching element.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
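
A minimal sketch of a "greater or equal" lookup written as a minimum search, to show why duplicates resolve to the first match. It uses a plain int array in 1-based eytzinger order and hypothetical demo_* names; it is not the kernel implementation, which works on generic elements with a comparison callback.

```c
#include <assert.h>

/*
 * base1[1..size] holds the elements in 1-based eytzinger order; base1[0]
 * is unused. Returns the index of the first element >= key in sorted
 * order, or 0 if every element is smaller than key.
 */
unsigned demo_eytzinger1_find_ge(const int *base1, unsigned size, int key)
{
	unsigned i = 1, best = 0;

	while (i <= size) {
		if (base1[i] >= key) {
			best = i;	/* candidate; look left for an earlier one */
			i = 2 * i;
		} else {
			i = 2 * i + 1;	/* too small, and so is its left subtree */
		}
	}
	return best;
}

/* Sorted values 10 20 20 20 30 40 50 laid out in eytzinger order. */
const int demo_eyt[8] = { 0, 20, 20, 40, 10, 20, 30, 50 };
```

Because every match moves the search to the left, the last candidate recorded is the earliest matching element in sorted order: searching demo_eyt for 20 yields index 2, the first of the three 20s.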
|
|
Instead of implementing eytzinger0_find_gt() in terms of
eytzinger0_find_le() and adjusting the result, implement it directly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add an eytzinger0_find_gt() self test similar to eytzinger0_find_le().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Replace the over-complicated implementation of eytzinger0_find_le() by
an equivalent, simpler version.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
eytzinger0_find_le() is also easy to convert to 1-based eytzinger (but
see the next commit).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Rename eytzinger0_find_test_val() to eytzinger0_find_test_le() and add a
new eytzinger0_find_test_val() wrapper that calls it.
We have already established that the array is sorted in eytzinger order,
so we can use the eytzinger iterator functions and check the boundary
conditions to verify the result of eytzinger0_find_le().
Only scan the entire array if we get an incorrect result. When we need
to scan, use eytzinger0_for_each_prev() so that we'll stop at the
highest matching element in the array in case there are duplicates;
going through the array linearly wouldn't give us that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add an eytzinger0_for_each_prev() macro for iterating through an
eytzinger array in reverse.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In eytzinger0_find_test(), remember the smallest element seen so far
instead of comparing adjacent array elements.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In eytzinger[01]_test(), make sure that eytzinger[01]_for_each()
iterates over all array elements.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix an obvious typo in cmp_u16().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
pr_info() format strings need to be newline terminated.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The iterator variable of eytzinger0_for_each() loops has been changed to
be locally scoped at some point, so remove variables defined outside the
loop that are now unused. In addition and for clarity, use a different
variable inside those loops where an outside variable would be shadowed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When EYTZINGER_DEBUG is defined, <linux/bug.h> needs to be included.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Use an eytzinger0_for_each() loop here.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Necessary for adding backpointers for cached pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We have other metadata IO types covered; this was missing.
Note: this includes the time until completion, i.e. including parent
pointer update.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
impossible
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for stripe backpointers: this path previously would get very
confused at being asked to process (remove redundant replicas) stripes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Increase journal pipelining.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since we're increasing the number of 'struct journal_bufs', we don't
want them all permanently holding onto buffers for the journal data -
that'd be 16 * 2MB = 32MB, or potentially more.
Add a single-element mempool (open coded, since buffer size varies);
this also means we won't be hitting the memory allocator every time we
open and close a journal entry/buffer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
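
The idea of a one-slot buffer cache with variable buffer sizes can be sketched in userspace C as follows. Everything here is an assumption made for illustration: the demo_* names are hypothetical, malloc()/free() stand in for the kernel allocator, and there is no locking, which a real journal implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Holds at most one idle buffer for reuse. */
struct demo_buf_cache {
	void	*buf;
	size_t	size;
};

/*
 * Returns a buffer of at least *size bytes, or NULL on allocation failure.
 * On success *size is updated to the real size of the returned buffer.
 */
void *demo_buf_get(struct demo_buf_cache *c, size_t *size)
{
	if (c->buf && c->size >= *size) {
		void *p = c->buf;

		*size = c->size;
		c->buf = NULL;
		return p;
	}
	return malloc(*size);
}

/* Keep p if the slot is empty or p is larger than the cached buffer. */
void demo_buf_put(struct demo_buf_cache *c, void *p, size_t size)
{
	if (!c->buf || size > c->size) {
		free(c->buf);	/* free(NULL) is a no-op */
		c->buf = p;
		c->size = size;
	} else {
		free(p);
	}
}

/* Returns 1 if a second, smaller request is served from the cached buffer. */
int demo_buf_reuse(void)
{
	struct demo_buf_cache c = { NULL, 0 };
	size_t a = 4096, b = 1024;
	void *p, *q;
	int reused;

	p = demo_buf_get(&c, &a);
	if (!p)
		return 0;
	demo_buf_put(&c, p, a);

	q = demo_buf_get(&c, &b);	/* expected to come from the cache */
	reused = (q == p && b == 4096);
	free(q);
	return reused;
}
```

Only one idle buffer is ever retained, so idle memory stays bounded by a single buffer regardless of how many journal buffers exist, while back-to-back open/close cycles avoid the allocator.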
|
|
This is a small optimization, reducing the number of cachelines we touch
in the fast path - and it's also necessary for the next patch that
increases JOURNAL_BUF_NR.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More dead code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Dead code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This code needs quite a bit of work: we don't want to be walking all
metadata in the filesystem, we should just be walking backpointers, and
it should be switched to a data ioctl that can report progress via a
file descriptor, not the system console.
But that'll take more work - before we can safely walk only backpointers
we need to change device add to not reuse device indexes, since with
that change accounting being wrong introduces the possibility of
removing a device that still has pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The backpointers code has progress indicators; these aren't great, since
they print to the dmesg console and we much prefer to have progress
indicators reporting to a specific userspace program so they're not
spamming the system console.
But not all codepaths that need progress indicators support that yet,
and we don't want users to think "this is hung".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
restarts
We're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
number
We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.
Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Iterating over backpointers on a specific device is potentially much
cheaper than walking all filesystem data.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reorganize counters a bit, grouping related counters together.
New counters:
- io_read_inline
- io_read_hole
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When ancestor is less than IS_ANCESTOR_BITMAP, we would get an incorrect
result.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a new data op to walk all data and metadata in a filesystem,
checking if it can be read successfully, and on error repairing from
another copy if possible.
- New helper: bch2_dev_idx_is_online(), so that we can bail out and
report to userspace when we're unable to scrub because the device is
offline
- data_update_opts, which controls the data move path, now understands
scrub: data is only read, not written. The read path is responsible
for rewriting on read error, as with other reads.
- scrub_pred skips data extents that don't have checksums
- bch_ioctl_data has a new scrub member, which has a data_types field
for data types to check - i.e. all data types, or only metadata.
- Add new entries to bch_move_stats so that we can report numbers for
corrected and uncorrected errors
- Add a new enum to bch_ioctl_data_event for explicitly reporting
completion and return code (i.e. device offline)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|