summaryrefslogtreecommitdiff
path: root/fs/bcachefs/ec.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Split out struct gc_stripe from struct stripeKent Overstreet
We have two radix trees of stripes - one that mirrors some information from the stripes btree in normal operation, and another that GC uses to recalculate block usage counts. The normal one is now only used for finding partially empty stripes in order to reuse them - the normal stripes radix tree and the GC stripes radix tree are used significantly differently, so this patch splits them into separate types. In an upcoming patch we'll be replacing c->stripes with a btree that indexes stripes by the order we want to reuse them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Erasure coding fixesKent Overstreet
When we added the stripe and stripe_redundancy fields to alloc keys, we neglected to add them to the functions that convert back and forth with the in-memory types. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Push c->mark_lock usage down to where it is neededKent Overstreet
This changes the bch2_mark_key() and related paths to take mark lock where it is needed, instead of taking it in the upper transaction commit path - by pushing down locking we'll be able to handle fsck errors locally instead of requiring a separate check in the btree_gc code for replicas being marked. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert bucket_alloc_ret to negative error codesKent Overstreet
Start a new header, errcode.h, for bcachefs-private error codes - more error codes will be converted later. This patch just converts bucket_alloc_ret so that they can be mixed with standard error codes and passed as ERR_PTR errors - the ec.c code was doing this already, but incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: More enum stringsKent Overstreet
This patch converts more enums in the on disk format to our standard x-macro-with-strings deal - to enable better pretty-printing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: BTREE_TRIGGER_INSERT now only means insertKent Overstreet
This allows triggers to distinguish between a key entering the btree - i.e. being called from the trans commit path - vs. being called on a key that already exists, i.e. by GC. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert bch2_mark_key() to take a btree_trans *Kent Overstreet
This helps to unify the interface between bch2_mark_key() and bch2_trans_mark_key() - and it also gives access to the journal reservation and journal seq in the mark_key path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Assorted ec fixesKent Overstreet
- The backpointer that ec_stripe_update_ptrs() uses now needs to include the snapshot ID, which means we have to change where we add the backpointer to after getting the snapshot ID for the new extents - ec_stripe_update_ptrs() needs to be calling bch2_trans_begin() - improve error message in bch2_mark_stripe() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: bch2_trans_exit() no longer returns errorsKent Overstreet
Now that peek_node()/next_node() are converted to return errors directly, we don't need bch2_trans_exit() to return errors - it's cleaner this way and wasn't used much anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_pathKent Overstreet
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_ITER_SET_POS_AFTER_COMMITKent Overstreet
BTREE_ITER_SET_POS_AFTER_COMMIT is used internally to automagically advance extent btree iterators on sucessful commit. But with the upcomnig btree_path patch it's getting more awkward to support, and it adds overhead to core data structures that's only used in a few places, and can be easily done by the caller instead. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Reduce iter->trans usageKent Overstreet
Disfavoured, and should go away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: ensure iter->should_be_locked is setDan Robertson
Ensure that iter->should_be_locked value is set to true before we call bch2_trans_update in ec_stripe_update_ptrs. Signed-off-by: Dan Robertson <dan@dlrobertson.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Extensive triggers cleanupsKent Overstreet
- We no longer mark subsets of extents, they're marked like regular keys now - which means we can drop the offset & sectors arguments to trigger functions - Drop other arguments that are no longer needed anymore in various places - fs_usage - Drop the logic for handling extents in bch2_mark_update() that isn't needed anymore, to match bch2_trans_mark_update() - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we don't have an old key to mark Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for errors from bch2_trans_update()Kent Overstreet
Upcoming refactoring is going to change bch2_trans_update() to start returning transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Clean up bch2_btree_and_journal_walk()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix two btree iterator leaksKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Replace bch2_btree_iter_next() calls with bch2_btree_iter_advanceKent Overstreet
The way btree iterators work internally has been changing, particularly with the iter->real_pos changes, and bch2_btree_iter_next() is no longer hyper optimized - it's just advance followed by peek, so it's more efficient to just call advance where we're not using the return value of bch2_btree_iter_next(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Require all btree iterators to be freedKent Overstreet
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Have fsck check for stripe pointers matching stripeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix unnecessary read amplificaiton when allocating ec stripesRobbie Litchfield
When allocating an erasure coding stripe, bcachefs will always reuse any partial stripes before reserving a new stripe. This causes unnecessary read amplification when preparing a stripe for writing. This patch changes bcachefs to always reserve new stripes first, only relying on stripe reuse when copygc needs more time to empty buckets from existing stripes. Signed-off-by: Robbie Litchfield <blam.kiwi@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: KEY_TYPE_alloc_v2Kent Overstreet
This introduces a new version of KEY_TYPE_alloc, which uses the new varint encoding introduced for inodes. This means we'll eventually be able to support much larger bucket sizes (for SMR devices), and the read/write time fields are expanded to 64 bits - which will be used in the next patch to get rid of the periodic rescaling of those fields. Also, for buckets that are members of erasure coded stripes, this adds persistent fields for the index of the stripe they're members of and the stripe redundancy. This is part of work to get rid of having to scan and read into memory the alloc and stripes btrees at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix build in userspaceKent Overstreet
The userspace bch_err() macro doesn't use the filesystem argument. Could also be fixed with a better macro. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't allocate stripes at POS_MINKent Overstreet
In the future, stripe index 0 will be a sentinal value. This patch doesn't disallow stripes at POS_MIN yet, leaving that for when we do the on disk format changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rework allocating buckets for stripesKent Overstreet
Allocating buckets for existing stripes was busted, in part because the data structures were too contorted. This reworks new stripes so that we have an array of open buckets that matches blocks in the stripe, and it's sparse if we're reusing an existing stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Preserve stripe blockcounts on existing stripesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill stripe->dirtyKent Overstreet
This makes bch2_stripes_write() work more like bch2_alloc_write(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Factor out bch2_ec_stripes_heap_start()Kent Overstreet
This fixes a bug where mark and sweep gc incorrectly was clearing out the stripes heap and causing assertions to fire later - simpler to just create the stripes heap after gc has finished. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Erasure coding fixes & refactoringKent Overstreet
- Originally bch_extent_stripe_ptr didn't contain the block index, instead we'd have to search through the stripe pointers to figure out which pointer matched. When the block field was added to bch_extent_stripe_ptr, not all of the code was updated to use it. This patch fixes that, and we also now verify that field where it makes sense. - The ec_stripe_buf_init/exit() functions have been improved, and are now used by the bch2_ec_read_extent() (recovery read) path. - get_stripe_key() is now used by bch2_ec_read_extent(). - We now have a getter and setter for checksums within a stripe, like we had previously for block sector counts, and ec_generate_checksums and ec_validate_checksums are now quite a bit smaller and cleaner. ec.c still needs a lot of work, but this patch is slowly moving things in the right direction. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Reduce/kill BKEY_PADDED useKent Overstreet
With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use separate new stripes for copygc and non-copygcKent Overstreet
Allocations for copygc have to be kept separate from everything else, so that copygc doesn't get starved. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change allocations for ec stripes to blockingKent Overstreet
We don't want writes to not get erasure coded just because the allocator temporarily wasn't keeping up. However, it's not guaranteed that these allocations will ever succeed, we can currently get stuck - especially if devices are different sizes - we still have work to do in this area. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't read existing stripes synchronously in write pathKent Overstreet
Previously, in the stripe creation path, when reusing an existing stripe we'd read the existing stripe synchronously - ouch. Now, we allocate two stripe bufs if we're using an existing stripe, so that we can do the read asynchronously - and, we read the full stripe so that we can run recovery, if necessary. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't use BTREE_INSERT_USE_RESERVE so muchKent Overstreet
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add BCH_BKEY_PTRS_MAXKent Overstreet
This now means "the maximum number of pointers within a bkey" - and bch_devs_list is updated to use it instead of BCH_REPLICAS_MAX, since stripes can contain more than BCH_REPLICAS_MAX pointers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Only try to get existing stripe once in stripe create pathKent Overstreet
The stripe creation path was too state-machiney: it would always run the full state machine until it had succesfully created a new stripe. But if we tried to get and reuse an existing stripe after we'd already allocated some buckets, the buckets we'd allocated might have conflicted with the blocks in the existing stripe we need to keep - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve some IO error messagesKent Overstreet
it's useful to know whether an error was for a read or a write - this also standardizes error messages a bit more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Build fixes for 32bit x86Kent Overstreet
PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying... also switch to atomic64_cmpxchg instead of cmpxchg() for journal_seq_copy, as atomic64_cmpxchg has a fallback that uses spinlocks for when it's not supported. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Account for stripe parity sectors separatelyKent Overstreet
Instead of trying to charge EC parity to the data within the stripe (which is subject to rounding errors), let's charge it to the stripe itself. It should also make -ENOSPC issues easier to deal with if we charge for parity blocks up front, and means we can also make more fine grained accounting available to the user. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improvements to writing alloc infoKent Overstreet
Now that we've got transactional alloc info updates (and have for awhile), we don't need to write it out on shutdown, and we don't need to write it out on startup except when GC found errors - this is a big improvement to mount/unmount performance. This patch also fixes a few bugs where we weren't writing out alloc info (on new filesystems, and new devices) and should have been. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Perf improvements for bch_alloc_read()Kent Overstreet
On large filesystems reading in the alloc info takes a significant amount of time. But we don't need to be calling into the fully general bch2_mark_key() path, just open code what we need in bch2_alloc_read_fn(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix bch2_new_stripes_to_text()Kent Overstreet
painful looking typo, fortunately difficult to hit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert various code to printbufKent Overstreet
printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add bch2_blk_status_to_str()Kent Overstreet
We define our own BLK_STS_REMOVED, so we need our own to_str helper too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop extra pointers when marking data as in a stripeKent Overstreet
We ideally want the buckets used for the extra initial replicas to be reused right away. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for data typesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allow existing stripes to be updated with new data bucketsKent Overstreet
This solves internal fragmentation within stripes. We already have copygc, which evacuates buckets that are partially or mostly empty, but it's up to the ec code that manages stripes to deal with stripes that have empty buckets in them. This patch changes the path for creating new stripes to check if there's existing stripes with empty buckets - and if so, update them with new data buckets instead of creating new stripes. TODO: improve the disk space accounting so that we can only use this (more expensive path) when we have too much fragmentation in existing stripes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor stripe creationKent Overstreet
Prep work for the patch to update existing stripes with new data blocks. This moves allocating new stripes into ec.c, and also sets up the data structures so that we can handly only allocating some of the blocks in a stripe. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Move stripe creation to workqueueKent Overstreet
This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>