summaryrefslogtreecommitdiff
path: root/fs/bcachefs/fs-io.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: bch2_trans_get_iter() no longer returns errorsKent Overstreet
Since we now always preallocate the maximum number of iterators when we initialize a btree transaction, getting an iterator never fails - we can delete a fair amount of error path code. This patch also simplifies the iterator allocation code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for __readahead_batch getting partial batchKent Overstreet
We were incorrectly ignoring the return value of __readahead_batch, leading to a null ptr deref in __bch2_page_state_create(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Deadlock prevention for ei_pagecache_lockKent Overstreet
In the dio write path, when get_user_pages() invokes the fault handler we have a recursive locking situation - we have to handle the lock ordering ourselves or we have a deadlock: this patch addresses that by checking for locking ordering violations and doing the unlock/relock dance if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use attach_page_private and detach_page_privateMatthew Wilcox (Oracle)
These recently added helpers simplify the code. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Remove page_state_init_for_readMatthew Wilcox (Oracle)
This is dead code; delete the function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix rare use after free in read pathKent Overstreet
If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent() reallocates the buffer, k in bch2_read - which we pointed at the bkey_on_stack buffer - will now point to a stale buffer. Whoops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix __bch2_truncate_page()Kent Overstreet
__bch2_truncate_page() will mark some of the blocks in a page as unallocated. But, if the page is mmapped (and writable), every block in the page needs to be marked dirty, else those blocks won't be written by __bch2_writepage(). The solution is to change those userspace mappings to RO, so that we force bch2_page_mkwrite() to be called again. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix short buffered writesKent Overstreet
In the buffered write path, we have to check for short writes that write to the full page, where the page wasn't UpToDate; when this happens, the page is partly garbage, so we have to zero it out and revert that part of the write. This check was wrong - we reverted total from copied, but didn't revert the iov_iter, probably also leading to corrupted writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't cap ios in dio write path at 2 MBKent Overstreet
It appears this was erronious, a different bug was responsible Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor dio write code to reinit bch_write_opKent Overstreet
This fixes a bug where the BCH_WRITE_SKIP_CLOSURE_PUT was set incorrectly, causing the completion to be delivered multiple times. oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an option to disable reflink supportKent Overstreet
Reflink might be buggy, so we're adding an option so users can help bisect what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix stack corruptionYuxuan Shui
When a bkey_on_stack is passed to bch_read_indirect_extent, there is no guarantee that it will be big enough to hold the bkey. And bch_read_indirect_extent is not aware of bkey_on_stack to call realloc on it. This cause a stack corruption. This commit makes bch_read_indirect_extent aware of bkey_on_stack so it can call realloc when appropriate. Tested-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't issue writes that are more than 1 MBKent Overstreet
the bcachefs io path in io.c can't bounce writes larger than that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix fallocate FL_INSERT_RANGEKent Overstreet
This was another bug because of bch2_btree_iter_set_pos() invalidating iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a use after free in dio write pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill TRANS_RESET_MEM|TRANS_RESET_ITERSKent Overstreet
All iterators should be released now with bch2_trans_iter_put(), so TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is always used. Also convert more code to __bch2_trans_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Sort & deduplicate updates in bch2_trans_update()Kent Overstreet
Previously, when doing multiple update in the same transaction commit that overwrote each other, we relied on doing the updates in the same order as the bch2_trans_update() calls in order to get the correct result. But that wasn't correct for triggers; bch2_trans_mark_update() when marking overwrites would do the wrong thing because it hadn't seen the update that was being overwritten. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out btree_trigger_flagsKent Overstreet
The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_INSERT_ATOMICKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert all bch2_trans_commit() users to BTREE_INSERT_ATOMICKent Overstreet
BTREE_INSERT_ATOMIC should really be the default mode, and there's not that much code that doesn't need it - so this is prep work for getting rid of the flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_trans_reset() calls should be at the tops of loopsKent Overstreet
It needs to be called when we get -EINTR due to e.g. lock restart - this fixes a transaction iterators overflow bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for an assertion on filesystem errorKent Overstreet
Normally the in memory i_size is always greater than or equal to i_size on disk; this doesn't hold on filesystem error. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bkey_on_stack_reassemble()Kent Overstreet
Small helper function. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Reorganize extents.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Inline data extentsKent Overstreet
This implements extents that have their data inline, in the value, instead of the bkey value being pointers to the data - and the read and write paths are updated to read from these new extent types and write them out, when the write size is small enough. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out extent_update.cKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rework of cut_front & cut_backKent Overstreet
This changes bch2_cut_front and bch2_cut_back so that they're able to shorten the size of the value, and it also changes the extent update path to update the accounting in the btree node when this happens. When the size of the value is shortened, they zero out the space that's no longer used, so it's interpreted as noops (as implemented in the last patch). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bkey_on_stackKent Overstreet
This implements code for storing small bkeys on the stack and allocating out of a mempool if they're too big. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use wbc_to_write_flags()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Some reflink fixesKent Overstreet
len might fit into a loff_t when aligned_len does not - make sure we use a u64 for aligned_len. Also, we weren't always extending the inode correctly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Eliminate function calls in DIO fastpathsKent Overstreet
We can assume that usually buffered and O_DIRECT IO won't be mixed, and the calls to flush the page cache won't be needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: DIO write path only needs to shoot down pagecache once, not twiceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add pagecache_add lock to buffered IO path, fault pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't hold inode lock longer than necessary in dio write pathKent Overstreet
In theory we should be able to do (non appending/extending) dio writes without taking the inode lock at all - but this gets us most of the way there. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Avoid atomics in write fast pathKent Overstreet
This adds some horrible hacks, but the atomic ops for closures were getting to be a pretty expensive part of the write path. We don't want to rip out closures entirely from the write path, because they're used for e.g. waiting on the allocator, or waiting on the journal flush, and that stuff would get really ugly without closures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix an error path raceKent Overstreet
On IO error, bch2_writepages_io_done() will set the page state to indicate nothing's already reserved (since the write didn't happen, we don't know what's already reserved). This can race with the buffered IO path, in between getting a disk reservation and calling bch2_set_page_dirty(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor bch2_trans_commit() pathKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Limit bios in writepages path to 256MKent Overstreet
This works around a bug where bio_full() doesn't check for bio->bi_iter.bi_size overflowing - and, we don't really want to build bios that are that big anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bchfs_extent_update()Kent Overstreet
The generic IO path now handles inode updates for i_size and i_sectors - this means we can drop a fair amount of code from fs-io.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert bch2_fpunch to bch2_extent_update()Kent Overstreet
As before - we're moving non Linux specific code out of fs-io.c. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out bchfs_extent_update()Kent Overstreet
The next few patches are going to be more moving the logic around i_size/i_sectors updates to io.c, and better separating the Linux VFS specific code from core bcachefs code, to better support the fuse port. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill some dependencies on ei_inodeKent Overstreet
Moving bch2_extent_update() to io.c will be greatly simplified if we no longer have to keep ei_inode.bi_size/bi_sectors up to date. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check if extending inode differentlyKent Overstreet
In bch2_extent_update(), we have to update the inode if i_size is changing (the file is being extend) or if i_sectors is changing, but we want to avoid touching the inode if it's not necessary. Change sum_sector_overwrites() to also check if there's already data above where we're writing to - this means we're definitely not extending the file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add a lock to bch_page_stateKent Overstreet
We can't use the page lock to protect it, because on writeback IO error we need to access the page state before calling end_page_writeback() and the page lock semantics are completely insane so that deadlocks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_extent_atomic_end() now traverses iterKent Overstreet
This fixes a bug in io.c bch2_write_index_default() - it was missing the traverse call, but bch2_extent_atomic_end returns an error now and can just call it itself. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_inode_peek()/bch2_inode_write()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix __bch2_buffered_write() returning -ENOMEMKent Overstreet
When grab_cache_page_write_begin() fails but we did pin some pages, we shouldn't return -ENOMEM, we should do a partial write. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rework btree iterator lifetimesKent Overstreet
The btree_trans struct needs to memoize/cache btree iterators, so that on transaction restart we don't have to completely redo btree lookups, and so that we can do them all at once in the correct order when the transaction had to restart to avoid a deadlock. This switches the btree iterator lookups to work based on iterator position, instead of trying to match them up based on the stack trace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill deferred btree updatesKent Overstreet
Will be replaced by cached btree iterators Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix for partial buffered writesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>