Age | Commit message (Collapse) | Author |
|
Add a tracepoint for any time we return an error and unwind.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The new guard(), scoped_guard() allow for more natural code.
Some of the uses with creative flow control have been left.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Recover from "journal and btree in same bucket".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is straightforward enough: check_fix_ptrs() currently only runs
before we go RW, so updating the btree root pointer in c->btree_roots
suffices - it'll be written out in the first journal write we do.
For that, do_bch2_trans_commit_to_journal_replay() now handles
JSET_ENTRY_btree_root entries.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Also, don't error out in bucket_ref_update_err(): we don't want to
return -BCH_ERR_cannot_rewind_recovery if it's not an insert, if it's an
overwrite we continue.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We don't want to change the bucket gen, on gen mismatch: it's possible
to have multiple btree nodes with different gens in the same bucket that
we want to keep, if we have to recover from btree node scan.
It's also not necessary to set g->gen_valid; add a comment to that
effect.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add some more helpers, and mismatches is now a superset of the empty
bitmap - simplifies most checks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Consolidate the run_explicit_recovery_pass() interfaces by adding a
flags parameter; this will also let us add a RUN_RECOVERY_PASS_ratelimit
flag.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More self healing work: we're going to be calling
check_bucket_backpointer_mismatch() at runtime, outside of fsck.
Then when we need to we'll kick off the full
check_extents_to_backpointers recovery pass.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we detect an error that requires running a recovery pass, and we're
not in recovery, we won't be able to fix it until the next mount - make
sure we're noting in the superblock that it needs to run.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert device IO refs to enumerated_refs, for easier debugging of
refcount issues.
Simple conversion: enumerate all users and convert to the new helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_print_string_as_lines() is a low level helper that allows messages
longer than 1k to be printed without truncation.
But we should always be printing with the helpers that take a filesystem
object, if we're in fsck they direct output to the userspace process
controlling fsck instead of the dmesg log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We prefer helpers that emit log messages to printbufs rather than
printing them directly; that way, we can ensure that different log
messages from the same event are grouped together and formatted
appropriately in the dmesg log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The resize memcpy path was totally busted.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
One reference to bch_dev_usage wasn't updated, which meant we weren't
reading the full bch_dev_usage_full - oops.
Fixes: 955ba7b5ea03 ("bcachefs: bch_dev_usage_full")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
All the fastpaths that need device usage don't need the sector totals or
fragmentation, just bucket counts.
Split bch_dev_usage up into two different versions, the normal one with
just bucket counts.
This is also a stack usage improvement, since we have a bch_dev_usage on
the stack in the allocation path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now have separate per device io_refs for read and write access.
This fixes a device removal bug where the discard workers were still
running while we're removing alloc info for that device.
It's also a bit of hardening; we no longer allow writes to devices that
are read-only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Struct with embedded VLA...
memcpy: detected field-spanning write (size 8) of single field "&gc->r.e" at fs/bcachefs/ec.c:465 (size 3)
WARNING: CPU: 1 PID: 936 at fs/bcachefs/ec.c:465 bch2_trigger_stripe+0x706/0x730
Modules linked in:
CPU: 1 UID: 0 PID: 936 Comm: mount.bcachefs Not tainted 6.14.0-rc6-ktest-00236-gefb0b5c62dbc #55
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:bch2_trigger_stripe+0x706/0x730
Code: b4 00 01 b9 03 00 00 00 48 89 fb 48 c7 c7 33 54 da 81 48 89 d6 49 89 d6 48 c7 c2 c3 36 db 81 e8 60 54 c5 ff 48 89 df 4c 89 f2 <0f> 0b e9 5c fd ff ff e8 fe 5e 4e 00 bf 10 00 00 00 48 c7 c6 ff ff
RSP: 0018:ffff88817081f680 EFLAGS: 00010246
RAX: f8fe7dd1c56b5600 RBX: ffff888101265368 RCX: 0000000000000027
RDX: 0000000000000008 RSI: 00000000fffbffff RDI: ffff888101265368
RBP: 0000000000000000 R08: 000000000003ffff R09: ffff88817f1fe000
R10: 00000000000bfffd R11: 0000000000000004 R12: ffff8881012652c0
R13: 0000000000000000 R14: 0000000000000008 R15: ffff88817081f6c9
FS: 00007fc428bc7c80(0000) GS:ffff888179280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd3ee4a038 CR3: 000000010a9bc000 CR4: 0000000000750eb0
PKRU: 55555554
Call Trace:
<TASK>
? __warn+0xce/0x1b0
? bch2_trigger_stripe+0x706/0x730
? report_bug+0x11b/0x1a0
? bch2_trigger_stripe+0x706/0x730
? handle_bug+0x5e/0x90
? exc_invalid_op+0x1a/0x50
? asm_exc_invalid_op+0x1a/0x20
? bch2_trigger_stripe+0x706/0x730
bch2_gc_mark_key+0x2cf/0x430
bch2_check_allocations+0x1a64/0x1ed0
? vsnprintf+0x1ad/0x420
? bch2_check_allocations+0x191f/0x1ed0
bch2_run_recovery_passes+0x13b/0x2b0
bch2_fs_recovery+0x9b7/0x1290
? __bch2_print+0xb2/0xf0
? bch2_printbuf_exit+0x1e/0x30
? print_mount_opts+0x153/0x180
bch2_fs_start+0x274/0x3b0
bch2_fs_get_tree+0x516/0x6e0
vfs_get_tree+0x21/0xa0
do_new_mount+0x153/0x350
__x64_sys_mount+0x16c/0x1f0
do_syscall_64+0x6c/0x140
? arch_exit_to_user_mode_prepare+0x9/0x40
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Not all compilers fully initialize these - they're not guaranteed to
because of the union shenanigans.
Fixes: https://github.com/koverstreet/bcachefs/issues/844
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.
Also, add better indenting.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're hitting some issues with uninitialized struct padding, flagged by
kmsan.
They appear to be falso positives, otherwise bch2_accounting_validate()
would have flagged them as "junk at end". But for now, we'll need to
initialize disk_accounting_pos with memset().
This adds a new helper, bch2_disk_accounting_mod2(), that initializes a
disk_accounting_pos and does the accounting mod all at once - so overall
things actually get slightly more ergonomic.
BCH_DISK_ACCOUNTING_replicas keys are left for now; KMSAN isn't warning
about them and they're a bit special.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Cached pointers now have backpointers.
This means that we'll be able to kill cached pointers in the
bucket_invalidate path, when invalidating/reusing buckets containing
cached data, instead of leaving them around to be cleaned up by gc_gens
garbago collection - which requires a full metadata scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Introduce per-entry locks, like with struct bucket - the stripes heap is
going away.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a version of kvmalloc() that doesn't have the INT_MAX limit; large
filesystems do hit this.
We'll want to get rid of the in-memory bucket gens array, but we're not
there quite yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch_backpointer no longer contains the bucket_offset field, it's just a
direct LBA mapping (with low bits to account for compressed extent
splitting), so we don't need to refer to the device to construct it
anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Originally, we ran insert triggers before overwrite so that if an extent
was being moved (by fallocate insert/collapse range), the bucket sector
count wouldn't hit 0 partway through, and so we don't trigger state
changes caused by that too soon.
But this is better solved by just moving the data type change to the
alloc trigger itself, where it's already called.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't hold mark_lock while calling fsck_err() - that's a deadlock,
mark_lock is meant to be a leaf node lock.
It's also unnecessary for gc_bucket() and bucket_gen(); rcu suffices
since the bucket_gens array describes its size, and we can't race with
device removal or resize during gc/fsck since that takes state lock.
Reported-by: syzbot+38641fcbda1aaffefdd4@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The early-early allocation path, bch2_bucket_alloc_new_fs(), is no
longer needed - and inconsistencies around new_fs_bucket_idx have been a
frequent source of bugs.
Reported-by: syzbot+592425844580a6598410@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Since we no longer store backpointers in alloc keys, there's no reason
not to pass around bkey_i_backpointers; this means we don't have to pass
the bucket pos separately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we find an error that indicates that we need to run fsck, we can
specify that directly with run_explicit_recovery_pass().
These are now log_fsck_err() calls: we're just logging in the superblock
that an error occurred - and possibly doing an emergency shutdown,
depending on policy.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add the __counted_by compiler attribute to the flexible array member b
to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Use struct_size() to calculate the number of bytes to be allocated.
Update bucket_gens->nbuckets and bucket_gens->nbuckets_minus_first when
resizing.
Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_folio_reservation_get_partial(), on partial success, will now
return a reservation that's aligned to the filesystem blocksize.
This is a partial fix for fstests generic/299 - fio verify is badly
behaved in the presence of short writes that aren't aligned to its
blocksize.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This is necessary for erasure coded pointers to devices that have been
removed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We want to be using private errcodes whenever possible, for better error
messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_dev_rcu() now properly errors if the device is invalid
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
No reason for it not to be where it's needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add the __counted_by compiler attribute to the flexible array members
devs to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Increment nr_devs before adding a new device to the devs array and
adjust the array indexes accordingly. Add a helper macro for adding a
new device.
In bch2_journal_read(), explicitly set nr_devs to 0.
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fixes: 49aa7830396b ("bcachefs: Fix rebalance_work accounting")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
rebalance_work was keying off of the presence of rebelance_opts in the
extent - but that was incorrect, we keep those around after rebalance
for indirect extents since the inode's options are not directly
available
Fixes: 20ac515a9cc7 ("bcachefs: bch_acct_rebalance_work")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We run this in full RW mode now, so we have to guard against the
superblock buffer being reallocated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reported-by: syzbot+e3938cd6d761b78750e6@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This adds another disk accounting counter to track usage per inode
number (any snapshot ID).
This will be used for a couple things:
- It'll give us a way to tell the user how much space a given file ista
consuming in all snapshots; i.e. how much extra space it's consuming
due to snapshot versioning.
- It counts number of extents and total size of extents (both in btree
keyspace sectors and actual disk usage), meaning it gives us average
extent size: that is, it'll let us cheaply find fragmented files that
should be defragmented.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a line for capacity
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|