summaryrefslogtreecommitdiff
path: root/fs/bcachefs/extents.c
AgeCommit message (Collapse)Author
2023-10-22bcachefs: Extent helper improvementsKent Overstreet
- __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available outside extents. - Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and non const versions - bch2_extent_has_ptr() now returns the pointer it found Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix buffer overrun in ec_stripe_update_extent()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check for redundant ec entries/stripe ptrsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Cached pointers should not be erasure codedKent Overstreet
There's no reason to erasure code cached pointers: we'll always have another copy, and it'll be cheaper to read the other copy than do a reconstruct read. And erasure coded cached pointers would add complications that we'd rather not have to deal with, so let's make sure to disallow them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change bkey_invalid() rw param to flagsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Nocow supportKent Overstreet
This adds support for nocow mode, where we do writes in-place when possible. Patch components: - New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled - To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight, we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue. - The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this. - Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes, we may want to replace this with a sequence number. - New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path. XXX: switch to sequence numbers instead of bitmask for devs needing journal flush XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Unwritten extents supportKent Overstreet
- bch2_extent_merge checks unwritten bit - read path returns 0s for unwritten extents without actually reading - reflink path skips over unwritten extents - bch2_bkey_ptrs_invalid() checks for extents with both written and unwritten extents, and non-normal extents (stripes, btree ptrs) with unwritten ptrs - fiemap checks for unwritten extents and returns FIEMAP_EXTENT_UNWRITTEN Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Better inlining in core write pathKent Overstreet
Provide inline versions of some allocation functions - bch2_alloc_sectors_done_inlined() - bch2_alloc_sectors_append_ptrs_inlined() and use them in the core IO path. Also, inline bch2_extent_update_i_size_sectors() and bch2_bkey_append_ptr(). In the core write path, function call overhead matters - every function call is a jump to a new location and a potential cache miss. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More errcode cleanupKent Overstreet
We shouldn't be overloading standard error codes now that we have provisions for bcachefs-specific errorcodes: this patch converts super.c and super-io.c to per error site errcodes, with a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New bpos_cmp(), bkey_cmp() replacementsKent Overstreet
This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Assorted checkpatch fixesKent Overstreet
checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Redo data_update interfaceKent Overstreet
This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer by device to rewrite, we're now able to specify any or all of the pointers in the original extent to be rewrited, as a bitmask. data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas, or add new ones. This fixes a bug where with background compression on replicated filesystems, where rebalance -> data_update would incorrectly drop the wrong old replica, and keep trying to recompress an extent pointer and each time failing to drop the right replica. Oops. Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop - it only goes off of the data_update_options passed to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Check for extents with too many ptrsKent Overstreet
We have a hardcoded maximum on number of pointers in an extent that's used by some other data structures - notably bch_devs_list - but we weren't actually checking for it. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Printbuf reworkKent Overstreet
This converts bcachefs to the modern printbuf interface/implementation, synced with the version to be submitted upstream. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix extent mergingKent Overstreet
When merging extents, we have to check that we won't overflow size fields in any CRC entries - but the check for this was wrong, because in the loop it was in we weren't keeping a pointer to the (packed, encoded) CRC field. Fix this by moving it to its own loop. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add rw to .key_invalid()Kent Overstreet
This adds a new parameter to .key_invalid() methods for whether the key is being read or written; the idea being that methods can do more aggressive checks when a key is newly created and being written, when we wouldn't want to delete the key because of those checks. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Convert .key_invalid methods to printbufsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improve bch2_bkey_ptrs_to_text()Kent Overstreet
Print bucket:offset when the filesystem is online; this makes debugging easier when correlating with alloc updates. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Add .to_text() methods for all superblock sectionsKent Overstreet
This patch improves the superblock .to_text() methods and adds methods for all types that were missing them. It also improves printbufs by allowing them to specfiy what units we want to be printing in, and adds new wrapper methods for unifying our kernel and userspace environments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Turn encoded_extent_max into a regular optionKent Overstreet
It'll now be handled at format time and in sysfs like other options - it still can only be set at format time, though. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Option improvementsKent Overstreet
This adds flags for options that must be a power of two (block size and btree node size), and options that are stored in the superblock as a power of two (encoded extent max). Also: options are now stored in memory in the same units they're displayed in (bytes): we now convert when getting and setting from the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: More enum stringsKent Overstreet
This patch converts more enums in the on disk format to our standard x-macro-with-strings deal - to enable better pretty-printing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Fix implementation of KEY_TYPE_errorKent Overstreet
When force-removing a device, we were silently dropping extents that we no longer had pointers for - we should have been switching them to KEY_TYPE_error, so that reads for data that was lost return errors. This patch adds the logic for switching a key to KEY_TYPE_error to bch2_bkey_drop_ptr(), and improves the logic somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Plumb through subvolume idKent Overstreet
To implement snapshots, we need every filesystem btree operation (every btree operation without a subvolume) to start by looking up the subvolume and getting the current snapshot ID, with bch2_subvolume_get_snapshot() - then, that snapshot ID is used for doing btree lookups in BTREE_ITER_FILTER_SNAPSHOTS mode. This patch adds those bch2_subvolume_get_snapshot() calls, and also switches to passing around a subvol_inum instead of just an inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: btree_pathKent Overstreet
This splits btree_iter into two components: btree_iter is now the externally visible componont, and it points to a btree_path which is now reference counted. This means we no longer have to clone iterators up front if they might be mutated - btree_path can be shared by multiple iterators, and cloned if an iterator would mutate a shared btree_path. This will help us use iterators more efficiently, as well as slimming down the main long lived state in btree_trans, and significantly cleans up the logic for iterator lifetimes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve btree_bad_header() error messageKent Overstreet
We should always print out the full btree node ptr. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Merging for indirect extentsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Improved extent mergingKent Overstreet
Previously, checksummed extents could only be merged when the checksum covered only the currently live data. xfstest generic/064 creates a test file, then uses finsert calls to split the extent, then collapse calls to see if they get merged. But without any reads to trigger the narrow_crcs path, each of the split extents will still have a checksum for the entire original extent. This patch improves the extent merge path so that if either of the extents we're attempting to merge has a checksum that covers the entire merged extent, we just use that checksum. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Clean up key mergingKent Overstreet
This patch simplifies the key merging code by getting rid of partial merges - it's simpler and saner if we just don't merge extents when they'd overflow k->size. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22bcachefs: Start using bpos.snapshot fieldKent Overstreet
This patch starts treating the bpos.snapshot field like part of the key in the btree code: * bpos_successor() and bpos_predecessor() now include the snapshot field * Keys in btrees that will be using snapshots (extents, inodes, dirents and xattrs) now always have their snapshot field set to U32_MAX The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that determines whether we're iterating over keys in all snapshots or not - internally, this controlls whether bkey_(successor|predecessor) increment/decrement the snapshot field, or only the higher bits of the key. We add a new member to struct btree_iter, iter->snapshot: when BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always equal iter->snapshot, which will be 0 for btrees that don't use snapshots, and alsways U32_MAX for btrees that will use snapshots (until we enable snapshot creation). This patch also introduces a new metadata version number, and compat code for reading from/writing to older versions - this isn't a forced upgrade (yet). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add an .invalid method for bch2_btree_ptr_v2Kent Overstreet
It was using the method for btree_ptr_v1, but that wasn't checking all the fields. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Don't overwrite snapshot field in bch2_cut_back()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill bkey ops->debugcheck methodKent Overstreet
This code used to be used for running some assertions on alloc info at runtime, but it long predates fsck and hasn't been good for much in ages - we can delete it now. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Require all btree iterators to be freedKent Overstreet
We keep running into occasional bugs with btree transaction iterators overflowing - this will make those bugs more visible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for more enumsKent Overstreet
This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename BTREE_ID enums for consistency with other enumsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: KEY_TYPE_discard is no longer usedKent Overstreet
KEY_TYPE_discard used to be used for extent whiteouts, but when handling over overlapping extents was lifted above the core btree code it became unused. This patch updates various code to reflect that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix some (spurious) warnings about uninitialized varsKent Overstreet
These are only complained about when building in userspace, for some reason. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop invalid stripe ptrs in fsckKent Overstreet
More repair code, now that we can repair extents during initial gc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: KEY_TYPE_alloc_v2Kent Overstreet
This introduces a new version of KEY_TYPE_alloc, which uses the new varint encoding introduced for inodes. This means we'll eventually be able to support much larger bucket sizes (for SMR devices), and the read/write time fields are expanded to 64 bits - which will be used in the next patch to get rid of the periodic rescaling of those fields. Also, for buckets that are members of erasure coded stripes, this adds persistent fields for the index of the stripe they're members of and the stripe redundancy. This is part of work to get rid of having to scan and read into memory the alloc and stripes btrees at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add BTREE_PTR_RANGE_UPDATEDKent Overstreet
This is so that when we discover btree topology issues, we can just update the pointer to a btree node and signal btree read path that the min/max keys in the node header should be updated from the node pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Change when we allow overwritesKent Overstreet
Originally, we'd check for -ENOSPC when getting a disk reservation whenever the new extent took up more space on disk than the old extent. Erasure coding screwed this up, because with erasure coding writes are initially replicated, and then in the background the extra replicas are dropped when the stripe is created. This means that with erasure coding enabled, writes will always take up more space on disk than the data they're overwriting - but, according to posix, overwrites aren't supposed to return ENOSPC. So, in this patch we fudge things: if the new extent has more replicas than the _effective_ replicas of the old extent, or if the old extent is compressed and the new one isn't, we check for ENOSPC when getting the disk reservation - otherwise, we don't. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Check for duplicate device ptrs in bch2_bkey_ptrs_invalid()Kent Overstreet
This is something we clearly should be checking for, but weren't - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop sysfs interface to debug parametersKent Overstreet
It's not used much anymore, the module paramter interface is better. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Indirect inline data extentsKent Overstreet
When inline data extents were added, reflink was forgotten about - we need indirect inline data extents for reflink + inline data to work correctly. This patch adds them, and a new feature bit that's flipped when they're used. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Refactor replicas codeKent Overstreet
Awhile back the mechanism for garbage collecting unused replicas entries was significantly improved, but some cleanup was missed - this patch does that now. This is also prep work for a patch to account for erasure coded parity blocks separately - we need to consolidate the logic for checking/marking the various replicas entries from one bkey into a single function. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix extent_ptr_durability() calculation for erasure coded dataKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use x-macros for data typesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Improve assorted error messagesKent Overstreet
This also consolidates the various checks in bch2_mark_pointer() and bch2_trans_mark_pointer(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a locking bug in bch2_btree_ptr_debugcheck()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>