Age | Commit message (Collapse) | Author |
|
This fixes recursive subvolume removal.
Subvolume deletion is asynchronous; fs_path_parent, and thus the entry
in the subvolume_children btree, need to be cleared when the subvolume
is unlinked from the fs heirarchy - else we'll spuriously think a
subvolume has children and deletion will fail.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Debug for https://github.com/koverstreet/bcachefs/issues/843
Print useful debug info and go emergency read-only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.
Also, add better indenting.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.
Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
To be used by the mount helper in userspace, where we still have options
to be parsed by other layers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add a mode to disable automatic switching to percpu mode, useful when a
time_stats will only be used by one thread and we don't want to have to
flush the percpu buffers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When a filesystem is mounted read-only, subsequent attempts to mount it
as read-write fail with EBUSY. Previously, the error path in
bch2_fs_get_tree() would unconditionally call __bch2_fs_stop(),
improperly freeing resources for a filesystem that was still actively
mounted. This change modifies the error path to only call
__bch2_fs_stop() if the superblock has no valid root dentry, ensuring
resources are not cleaned up prematurely when the filesystem is in use.
Signed-off-by: Florian Albrechtskirchinger <falbrechtskirchinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We want to log errors all at once, not spread across multiple printks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Don't print a newline on empty string; this was causing us to also print
an extra newline when we got to the end of th string.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
syzbot discovered that this one is possible: we have pointers, but none
of them are to valid devices.
Reported-by: syzbot+336a6e6a2dbb7d4dba9a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The hole we find in the btree might be fully dirty in the page cache. If
so, keep searching.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't call bch2_seek_pagecache_hole(), and block on page locks, with
btree locks held.
This is easily fixed because we're at the end of the transaction - we
can just unlock, we don't need a drop_locks_do().
Reported-by: https://github.com/nagalun
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
state_lock guards against devices coming or leaving, changing state, or
the filesystem changing between ro <-> rw.
But it's not necessary for running recovery passes, and holding it
blocks asynchronous events that would cause us to go RO or kick out
devices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There's no reason for this not to be world readable - it provides the
currently supported on disk format version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fixes "task out to lunch" warnings during recovery on large machines
with lots of dirty data in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree node scan has to wait on kthread workers that scan each device,
potentially for awhile.
We would like this to be interruptible, but we may need a different
mechanism than signals for that - we've had bugs in the past where
mounts were failing due to checking for signals, and no explanation on
where they came from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Data move -> move_get_io_opts -> bch2_get_update_rebalance_opts
requires a not_extents iterator; this fixes the path where we're walking
the extents btree and chase a reflink pointer into the reflink btree.
bch2_lookup_indirect_extent() requires working with an extents iterator
(due to peek_slot() semantics), so we implement
bch2_lookup_indirect_extent_for_move().
This is simplified because there's no need to report
indirect_extent_missing_errors here, that can be deferred until fsck or
when a user reads that data.
Reported-by: Maƫl Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There's various checks for "are we going to compress this" - but we're
not going to compress if we know it's incompressible.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We weren't checking that accounting keys have the expected number of
accounters. Originally we probably wanted to be flexible on this, but it
doesn't look like that will be required - accounting is extended by
adding new counter types, not more counters to an existing type.
This means we can drop a BUG_ON() that popped once in automated testing,
and the new validation will make that bug easier to track down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
They were being truncated, printk has a 1k limit per call
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Also, improve the message in prep_encoded_data() - it now prints
good/bad checksums, and checksum type.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
__bch2_read, before calling __bch2_read_extent(), sets bvec_iter.bi_size
to "the size we can read from the current extent" with a swap, and
restores it to "the size for the total read" after the read_extent call
with another swap.
But we neglected to do the restore before the "if (ret) goto err;" -
which is a problem if we're retrying those errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we're moving an extent that was partially overwritten,
bch2_write_rechecksum() will trim it to the currenty live range.
If we then also want to compress it, it'll be decrypted - but the nonce
has been advanced for the overwritten start of the extent that we
dropped, and we were using the nonce we calculated before rechecksum().
Reported-by: Gabriel de Perthuis <g2p.code@gmail.com>
Fixes: 127d90d2823e ("bcachefs: bch2_write_prep_encoded_data() now returns errcode")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_dev_usage_read() is fairly expensive, we should optimize this more.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It turned out a user was wondering why we were going read-only after a
write error, and he didn't realize he didn't have replication enabled -
this will make that more obvious, and we should be printing it anyways.
Link: https://www.reddit.com/r/bcachefs/comments/1jf9akl/large_data_transfers_switched_bcachefs_to_readonly/
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
00636 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b0
00636 Mem abort info:
00636 ESR = 0x0000000096000005
00636 EC = 0x25: DABT (current EL), IL = 32 bits
00636 SET = 0, FnV = 0
00636 EA = 0, S1PTW = 0
00636 FSC = 0x05: level 1 translation fault
00636 Data abort info:
00636 ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
00636 CM = 0, WnR = 0, TnD = 0, TagAccess = 0
00636 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
00636 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000101b10000
00636 [00000000000000b0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
00636 Internal error: Oops: 0000000096000005 [#1] SMP
00636 Modules linked in:
00636 CPU: 12 UID: 0 PID: 79369 Comm: cat Not tainted 6.14.0-rc6-ktest-g3783b8973ab7 #17757
00636 Hardware name: linux,dummy-virt (DT)
00636 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00636 pc : print_chain+0xb8/0x170
00636 lr : print_chain+0xa0/0x170
00636 sp : ffffff80d9c1bbb0
00636 x29: ffffff80d9c1bbb0 x28: 0000000000000002 x27: ffffff80c1be8250
00636 x26: ffffff80dd9b0000 x25: 0000000000000020 x24: 000000000000002d
00636 x23: 000000000000003c x22: ffffffc080a54518 x21: ffffff80da6e00d0
00636 x20: ffffff80da6e0170 x19: ffffff80c1a1d240 x18: 00000000ffffffff
00636 x17: 3535303937202d3c x16: 203139202d3c2035 x15: 00000000ffffffff
00636 x14: 0000000000000000 x13: ffffff80d71b63f1 x12: 0000000000000006
00636 x11: ffffffc080beb1c0 x10: 0000000000000020 x9 : 00000000000134cc
00636 x8 : 0000000000000020 x7 : 0000000000000004 x6 : 0000000000000020
00636 x5 : ffffff80d71b63f7 x4 : ffffffc080a5451b x3 : 0000000000000000
00636 x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
00636 Call trace:
00636 print_chain+0xb8/0x170 (P)
00636 bch2_check_for_deadlock+0x444/0x5a0
00636 bch2_btree_deadlock_read+0xb4/0x1c8
00636 full_proxy_read+0x74/0xd8
00636 vfs_read+0x90/0x300
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In debug mode, we save the call stack on transaction restart - but
there's no locking, so we can't touch it if we're issuing the restart
from another thread.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're hitting some issues with uninitialized struct padding, flagged by
kmsan.
They appear to be falso positives, otherwise bch2_accounting_validate()
would have flagged them as "junk at end". But for now, we'll need to
initialize disk_accounting_pos with memset().
This adds a new helper, bch2_disk_accounting_mod2(), that initializes a
disk_accounting_pos and does the accounting mod all at once - so overall
things actually get slightly more ergonomic.
BCH_DISK_ACCOUNTING_replicas keys are left for now; KMSAN isn't warning
about them and they're a bit special.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fix a kmsan splat
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We appear to be tripping over a compiler/kmsan bug with padding fields -
this is an easy workaround.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We may sometimes read from uninitialized memory; we know, and that's ok.
We check if a btree node has been reused before waiting on any
outstanding IO.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Catching these early makes them a lot easier to track down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We store to all fields, so the kmsan warnings were spurious - but
initializing via stores to bitfields appear to have been giving the
compiler/kmsan trouble, and they're not necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
kmsan doesn't know about inline assembly, obviously; this will close a
ton of syzbot bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
New data types might be added later, so we don't want to disallow
unknown data types - that'll be a compatibility hassle later. Instead,
ignore them.
Reported-by: syzbot+3a290f5ff67ca3023834@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
These are counted as stripe data in the corresponding alloc keys.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More on the "full online self healing" project:
We now run most of the dirent <-> inode consistency checks, with repair
code, at runtime - the exact same check and repair code that fsck runs.
This will allow us to repair the "dirent points to inode that does not
point back" inconsistencies that have been popping up at runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for calling bch2_check_dirent_target() from bch2_lookup().
- Add an inline wrapper, if the target and backpointer match we can skip
the function call.
- We don't (yet?) want to remove the dirent we did the lookup from (when
we find a directory or subvol with multiple valid dirents pointing to
it), we can defer on that until later. For now, add an "are we in
fsck?" parameter.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're gradually running more and more fsck.c checks at runtime,
whereever applicable; when we do so they get moved out of fsck.c.
Next patch will call bch2_check_dirent_target() from bch2_lookup().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
name <-> inode, code for managing the relationships between inodes and
dirents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Replace these with proper private error codes, so that when we get an
error message we're not sifting through the entire codebase to see where
it came from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Prep work for killing off EIO and replacing them with proper private
error codes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There's no reason for the caller to do the actual logging, it's all done
the same.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're fixing option parsing in userspace, it now obeys
OPT_SB_FIELD_SECTORS
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The smp_rmb() guarantees that reads from reservations.counter
occur before accessing cur_entry_u64s. It's paired with the
atomic64_try_cmpxchg in journal_entry_open.
Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert these to standard error codes, which means we can pass them
outside the journal code, they're easier to pass to tracepoints, etc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|