Age | Commit message (Collapse) | Author |
|
The new guard(), scoped_guard() allow for more natural code.
Some of the uses with creative flow control have been left.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Convert device IO refs to enumerated_refs, for easier debugging of
refcount issues.
Simple conversion: enumerate all users and convert to the new helpers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Drop the single-purpose write ref code in bcachefs.h, and convert to
enumarated refs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_print_string_as_lines() is a low level helper that allows messages
longer than 1k to be printed without truncation.
But we should always be printing with the helpers that take a filesystem
object, if we're in fsck they direct output to the userspace process
controlling fsck instead of the dmesg log.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Replace with logging the error in the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We won't be root causing this in the immediate future, and it's fairly
innocuous - so just log it in the superblock.
https://github.com/koverstreet/bcachefs/issues/869
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was planned to be done ages ago, now finally completed; there are
places where we have quite a few btree_trans objects on the stack, so
this reduces stack usage somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We now have separate per device io_refs for read and write access.
This fixes a device removal bug where the discard workers were still
running while we're removing alloc info for that device.
It's also a bit of hardening; we no longer allow writes to devices that
are read-only.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The hole we find in the btree might be fully dirty in the page cache. If
so, keep searching.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't call bch2_seek_pagecache_hole(), and block on page locks, with
btree locks held.
This is easily fixed because we're at the end of the transaction - we
can just unlock, we don't need a drop_locks_do().
Reported-by: https://github.com/nagalun
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
__bch_truncate_folio() may return 1 to indicate dirtyness of the folio
being truncated, needed for fpunch to get the i_size writes correct.
But truncate was forgetting to clear ret, and sometimes returning it as
an error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, io path option changes on a file would be picked up
automatically and applied to existing data - but not for reflinked data,
as we had no way of doing this safely. A user may have had permission to
copy (and reflink) a given file, but not write to it, and if so they
shouldn't be allowed to change e.g. nr_replicas or other options.
This uses the incompat feature mechanism in the previous patch to add a
new incompatible flag to bch_reflink_p, indicating whether a given
reflink pointer may propagate io path option changes back to the
indirect extent.
In this initial patch we're only setting it for the source extents.
We'd like to set it for the destination in a reflink copy, when the user
has write access to the source, but that requires mnt_idmap which is not
curretly plumbed up to remap_file_range.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More check and repair code: this fixes a warning in
bch2_journal_flush_seq_async()
Reported-by: syzbot+d119b445ec739e7f3068@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
__filemap_get_folio the return value cannot be NULL, so unnecessary checks
are removed.
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We'll be introducing btree_iter_peek_prev_min(), so rename for
consistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This used to not matter, but now we're being more strict.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fix a bug where mount was failing with -ERESTARTSYS:
https://github.com/koverstreet/bcachefs/issues/741
We only want the interruptible wait when called from fsync.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
the standard vfs inode hash table suffers from painful lock contention -
this is long overdue
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
By removing the early-exit when REMAP_FILE_DEDUP is set, we should be
able to support the fideduperange ioctl, albeit less efficiently than if
we handled some of the extent locking and comparison logic inside
bcachefs. Extent comparison logic already exists inside of
`__generic_remap_file_range_prep`.
Signed-off-by: Reed Riley <reed@riley.engineer>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Add trace_bch2_sync_fs() and trace_bch2_fsync() implementations.
The output in trace is as follows:
sync-29779 [000] ..... 193.700935: bch2_sync_fs: dev 254,16 wait 1
<...>-40027 [002] ..... 342.535227: bch2_fsync: dev 254,32 ino 4099 parent 4096 datasync 1
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We already using mapping_set_error() in bch2_writepage_io_done(), so all
we need to do is to use file_check_and_advance_wb_err() when handling
fsync() requests in bch2_fsync().
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
better stack traces
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fsync has a slightly odd usage of -EROFS, where it means "does not
support fsync". I didn't choose it...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Combine iter/update/trigger/str_hash flags into a single enum, and
x-macroize them for a to_text() function later.
These flags are all for a specific iter/key/update context, so it makes
sense to group them together - iter/update/trigger flags were already
given distinct bits, this cleans up and unifies that handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_journal_flush_seq requires us to have a write ref
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
REQ_OP_FLUSH is only for internal use in the blk-mq and request based
drivers. File systems and other block layer consumers must use
REQ_OP_WRITE | REQ_PREFLUSH as documented in
Documentation/block/writeback_cache_control.rst.
While REQ_OP_FLUSH appears to work for blk-mq drivers it does not
get the proper flush state machine handling, and completely fails
for any bio based drivers, including all the stacking drivers. The
block layer will also get a check in 6.8 to reject this use case
entirely.
[Note: completely untested, but as this never got fixed since the
original bug report in November:
https://bugzilla.kernel.org/show_bug.cgi?id=218184
and the the discussion in December:
https://lore.kernel.org/all/20231221053016.72cqcfg46vxwohcj@moria.home.lan/T/
this seems to be best way to force it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
drop_locks_do() should not be used in a fastpath without first trying
the do in nonblocking mode - the unlock and relock will cause excessive
transaction restarts and potentially livelocking with other threads that
are contending for the same locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When investigating transient failures of generic/441 on bcachefs, it
was determined that the cause of the failure was a combination of
unconditional emergency shutdown and racing between background
journal activity and the test switchover from a working device
mapper table to an error injecting table.
Part of the reason for this sequence of events is that bcachefs
aggressively flushes as much as possible during fsync(), regardless
of errors. While this is reasonable behavior, it is technically
unnecessary because once an error is returned from fsync(), the
caller cannot make any assumptions about the resilience of data.
Tweak the bch2_fsync() logic to return an error on failure of any of
the steps involved in the flush. Note that this change alone does
not prevent generic/441 failure, but in combination with a test
tweak to avoid racing during the dm-error table switchover it avoids
the unnecessary shutdowns and allows the test to pass reliably on
bcachefs.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In an ideal world, we'd have a common helper that could be used for
sorting a list of inodes into the correct lock order, and then the same
lock ordering could be used for any type of inode lock, not just
i_rwsem.
But the lock ordering rules for i_rwsem are a bit complicated, so -
abandon that dream for now and do it the more standard way.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.
But we have to do a heap allocatation to initialize it anyways, so
there's no real downside to heap allocating the entire thing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
end_offset
The variables start_offset and end_offset are being initialized with
values that are never read, they being re-assigned later on. The
initializations are redundant and can be removed.
Cleans up clang-scan build warnings:
fs/bcachefs/fs-io.c:243:11: warning: Value stored to 'start_offset' during
its initialization is never read [deadcode.DeadStores]
fs/bcachefs/fs-io.c:244:11: warning: Value stored to 'end_offset' during
its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This pulls the non vfs specific parts of truncate and finsert/fcollapse
out of fs-io.c, and moves them to io_misc.c.
This is prep work for logging these operations, to make them atomic in
the event of a crash.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This will be used when we need to re-hash a directory tree when setting
flags.
It is not possible to have concurrent btree_trans on a thread.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
fs-io.c is too big - time for some reorganization
- fs-dio.c: direct io
- fs-pagecache.c: pagecache data structures (bch_folio), utility code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We've observed significant lock thrashing on fstests generic/083 in
fallocate, due to dropping and retaking btree locks when checking the
pagecache for data.
This adds a nonblocking mode to bch2_clamp_data_hole(), where we only
use folio_trylock(), and can thus be used safely while btree locks are
held - thus we only have to drop btree locks as a fallback, on actual
lock contention.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Previously, fallocate would only check the state of the extents btree
when determining if we need to create a reservation.
But the page cache might already have dirty data or a disk reservation.
This changes __bchfs_fallocate() to call bch2_seek_pagecache_hole() to
check for this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now that we have distinct error codes for different memory allocation
failures, the early init log messages are no longer needed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
GFP_NOFS doesn't ever make sense. If we're allocatingc memory it should
be GFP_NOWAIT if btree locks are held, GFP_KERNEL otherwise.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We've been using __GFP_NOFAIL for allocating struct bch_folio, our
private per-folio state.
However, that struct is variable size - it holds state for each sector
in the folio, and folios can be quite large now, which means it's
possible for bch_folio to be larger than PAGE_SIZE now.
__GFP_NOFAIL allocations are undesirable in normal circumstances, but
particularly so at >= PAGE_SIZE, and warnings are emitted for that.
So, this patch adds proper error paths and eliminates most uses of
__GFP_NOFAIL. Also, do some more cleanup of gfp flags w.r.t. btree node
locks: we can use GFP_KERNEL, but only if we're not holding btree locks,
and if we are holding btree locks we should be using GFP_NOWAIT.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Now that we can reliably designate and find the master subvolume out of
a tree of snapshots, we can finally make quotas work with snapshots:
That is - quotas will now _ignore_ snapshot subvolumes, and only be in
effect for the master (non snapshot) subvolume.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|