summaryrefslogtreecommitdiff
path: root/fs/xfs/scrub/bmap.c
AgeCommit message (Collapse)Author
2025-03-03xfs: allow COW forks on zoned file systems in xchk_bmapChristoph Hellwig
Zoned file systems can have COW forks even without reflinks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
2024-12-23xfs: cross-reference checks with the rt refcount btreeDarrick J. Wong
Use the realtime refcount btree to implement cross-reference checks in other data structures. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappingsDarrick J. Wong
Teach the bmbt scrubber how to perform a comprehensive check that the rmapbt does not contain /any/ mappings that are not described by bmbt records when it's dealing with a realtime file. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23xfs: cross-reference the realtime rmapbtDarrick J. Wong
Teach the data fork and realtime bitmap scrubbers to cross-reference information with the realtime rmap btree. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23xfs: allow queued realtime intents to drain before scrubbingDarrick J. Wong
When a writer thread executes a chain of log intent items for the realtime volume, the ILOCKs taken during each step are for each rt metadata file, not the entire rt volume itself. Although scrub takes all rt metadata ILOCKs, this isn't sufficient to guard against scrub checking the rt volume while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding the realtime volume. When there's a collision, cross-referencing between data structures (e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the mount structure the same drain that we use to protect scrub against concurrent AG updates, but this time for the realtime volume. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-23xfs: support file data forks containing metadata btreesDarrick J. Wong
Create a new fork format type for metadata btrees. This fork type requires that the inode is in the metadata directory tree, and only applies to the data fork. The actual type of the metadata btree itself is determined by the di_metatype field. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: don't coalesce file mappings that cross rtgroup boundaries in scrubDarrick J. Wong
The bmbt scrubber will combine file mappings if they are mergeable to reduce the number of cross-referencing checks. However, we shouldn't combine mappings that cross rt group boundaries because that will cause verifiers to trip incorrectly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: move RT bitmap and summary information to the rtgroupChristoph Hellwig
Move the pointers to the RT bitmap and summary inodes as well as the summary cache to the rtgroups structure to prepare for having a separate bitmap and summary inodes for each rtgroup. Code using the inodes now needs to operate on a rtgroup. Where easily possible such code is converted to iterate over all rtgroups, else rtgroup 0 (the only one that can currently exist) is hardcoded. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: add a generic group pointer to the btree cursorChristoph Hellwig
Replace the pag pointers in the type specific union with a generic xfs_group pointer. This prepares for adding realtime group support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: switch perag iteration from the for_each macros to a while based iteratorChristoph Hellwig
The current for_each_perag* macros are a bit annoying in that they require the caller to both provide an object and an index iterator, and also somewhat obsfucate the underlying control flow mechanism. Switch to open coded while loops using new xfs_perag_next{,_from,_range} helpers that return the next pag structure to iterate on based on the previous one or NULL for the loop start. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpersChristoph Hellwig
Add helpers to convert an agbno to a daddr or fsbno based on a pag structure. This provides a simpler conversion and better type safety compared to the existing code that passes the mount structure and the agno separately. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-08-14xfs: attr forks require attr, not attr2Darrick J. Wong
It turns out that I misunderstood the difference between the attr and attr2 feature bits. "attr" means that at some point an attr fork was created somewhere in the filesystem. "attr2" means that inodes have variable-sized forks, but says nothing about whether or not there actually /are/ attr forks in the system. If we have an attr fork, we only need to check that attr is set. Fixes: 99d9d8d05da26 ("xfs: scrub inode block mappings") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-22xfs: create a helper to decide if a file mapping targets the rt volumeDarrick J. Wong
Create a helper so that we can stop open-coding this decision everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: repair inode fork block mapping data structuresDarrick J. Wong
Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: skip the rmapbt search on an empty attr fork unless we know it was zappedDarrick J. Wong
The attribute fork scrubber can optionally scan the reverse mapping records of the filesystem to determine if the fork is missing mappings that it should have. However, this is a very expensive operation, so we only want to do this if we suspect that the fork is missing records. For attribute forks the criteria for suspicion is that the attr fork is in EXTENTS format and has zero extents. However, there are several ways that a file can end up in this state through regular filesystem usage. For example, an LSM can set a s_security hook but then decide not to set an ACL; or an attr set can create the attr fork but then the actual set operation fails with ENOSPC; or we can delete all the attrs on a file whose data fork is in btree format, in which case we do not delete the attr fork. We don't want to run the expensive check for any case that can be arrived at through regular operations. However. When online inode repair decides to zap an attribute fork, it cannot determine if it is zapping ACL information. As a precaution it removes all the discretionary access control permissions and sets the user and group ids to zero. Check these three additional conditions to decide if we want to scan the rmap records. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: set inode sick state flags when we zap either ondisk forkDarrick J. Wong
In a few patches, we'll add some online repair code that tries to massage the ondisk inode record just enough to get it to pass the inode verifiers so that we can continue with more file repairs. Part of that massaging can include zapping the ondisk forks to clear errors. After that point, the bmap fork repair functions will rebuild the zapped forks. Christoph asked for stronger protections against online repair zapping a fork to get the inode to load vs. other threads trying to access the partially repaired file. Do this by adding a special "[DA]FORK_ZAPPED" inode health flag whenever repair zaps a fork, and sprinkling checks for that flag into the various file operations for things that don't like handling an unexpected zero-extents fork. In practice xfs_scrub will scrub and fix the forks almost immediately after zapping them, so the window is very small. However, if a crash or unmount should occur, we can still detect these zapped inode forks by looking for a zero-extents fork when data was expected. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15xfs: try to attach dquots to files before repairing themDarrick J. Wong
Inode resource usage is tracked in the quota metadata. Repairing a file might change the resources used by that file, which means that we need to attach dquots to the file that we're examining before accessing anything in the file protected by the ILOCK. However, there's a twist: a dquot cache miss requires the dquot to be read in from the quota file, during which we drop the ILOCK on the file being examined. This means that we *must* try to attach the dquots before taking the ILOCK. Therefore, dquots must be attached to files in the scrub setup function. If doing so yields corruption errors (or unknown dquot errors), we instead clear the quotachecked status, which will cause a quotacheck on next mount. A future series will make this trigger live quotacheck. While we're here, change the xrep_ino_dqattach function to use the unlocked dqattach functions so that we avoid cycling the ILOCK if the inode already has dquots attached. This makes the naming and locking requirements consistent with the rest of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-10-17xfs: rename xfs_verify_rtext to xfs_verify_rtbextDarrick J. Wong
This helper function validates that a range of *blocks* in the realtime section is completely contained within the realtime section. It does /not/ validate ranges of *rtextents*. Rename the function to avoid suggesting that it does, and change the type of the @len parameter since xfs_rtblock_t is a position unit, not a length unit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-08-10xfs: don't check reflink iflag state when checking cow forkDarrick J. Wong
Any inode on a reflink filesystem can have a cow fork, even if the inode does not have the reflink iflag set. This happens either because the inode once had the iflag set but does not now, because we don't free the incore cow fork until the icache deletes the inode; or because we're running in alwayscow mode. Either way, we can collapse both of the xfs_is_reflink_inode calls into one, and change it to xfs_has_reflink, now that the bmap checker will return ENOENT if there is no pointer to the incore fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-10xfs: simplify returns in xchk_bmapDarrick J. Wong
Remove the pointless goto and return code in xchk_bmap, since it only serves to obscure what's going on in the function. Instead, return whichever error code is appropriate there. For nonexistent forks, this should have been ENOENT. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-10xfs: wrap ilock/iunlock operations on sc->ipDarrick J. Wong
Scrub tracks the resources that it's holding onto in the xfs_scrub structure. This includes the inode being checked (if applicable) and the inode lock state of that inode. Replace the open-coded structure manipulation with a trivial helper to eliminate sources of error. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-06-05xfs: fix broken logic when detecting mergeable bmap recordsDarrick J. Wong
Commit 6bc6c99a944c was a well-intentioned effort to initiate consolidation of adjacent bmbt mapping records by setting the PREEN flag. Consolidation can only happen if the length of the combined record doesn't overflow the 21-bit blockcount field of the bmbt recordset. Unfortunately, the length test is inverted, leading to it triggering on data forks like these: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..16777207]: 76110848..92888055 0 (76110848..92888055) 16777208 1: [16777208..20639743]: 92888056..96750591 0 (92888056..96750591) 3862536 Note that record 0 has a length of 16777208 512b blocks. This corresponds to 2097151 4k fsblocks, which is the maximum. Hence the two records cannot be merged. However, the logic is still wrong even if we change the in-loop comparison, because the scope of our examination isn't broad enough inside the loop to detect mappings like this: 0: [0..9]: 76110838..76110847 0 (76110838..76110847) 10 1: [10..16777217]: 76110848..92888055 0 (76110848..92888055) 16777208 2: [16777218..20639753]: 92888056..96750591 0 (92888056..96750591) 3862536 These three records could be merged into two, but one cannot determine this purely from looking at records 0-1 or 1-2 in isolation. Hoist the mergability detection outside the loop, and base its decision making on whether or not a merged mapping could be expressed in fewer bmbt records. While we're at it, fix the incorrect return type of the iter function. Fixes: 336642f79283 ("xfs: alert the user about data/attr fork mappings that could be merged") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-05-02xfs: flush dirty data and drain directios before scrubbing cow forkDarrick J. Wong
When we're scrubbing the COW fork, we need to take MMAPLOCK_EXCL to prevent page_mkwrite from modifying any inode state. The ILOCK should suffice to avoid confusing online fsck, but let's take the same locks that we do everywhere else. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-11xfs: don't call xchk_bmap_check_rmaps for btree-format file forksDarrick J. Wong
The logic at the end of xchk_bmap_want_check_rmaps tries to detect a file fork that has been zapped by what will become the online inode repair code. Zapped forks are in FMT_EXTENTS with zero extents, and some sort of hint that there's supposed to be data somewhere in the filesystem. Unfortunately, the inverted logic here is confusing and has the effect that we always call xchk_bmap_check_rmaps for FMT_BTREE forks. This is horribly inefficient and unnecessary, so invert the logic to get rid of this performance problem. This has caused 8h delays in generic/333 and generic/334. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: split the xchk_bmap_check_rmaps into a predicateDarrick J. Wong
This function has two parts: the second part scans every reverse mapping record for this file fork to make sure that there's a corresponding mapping in the fork, and the first part decides if we even want to do that. Split the first part into a separate predicate so that we can make more changes to it in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: alert the user about data/attr fork mappings that could be mergedDarrick J. Wong
If the data or attr forks have mappings that could be merged, let the user know that the structure could be optimized. This isn't a filesystem corruption since the regular filesystem does not try to be smart about merging bmbt records. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: split xchk_bmap_xref_rmap into two functionsDarrick J. Wong
There's more special-cased functionality than not in this function. Split it into two so that each can be far more cohesive. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: accumulate iextent records when checking bmapDarrick J. Wong
Currently, the bmap scrubber checks file fork mappings individually. In the case that the file uses multiple mappings to a single contiguous piece of space, the scrubber repeatedly locks the AG to check the existence of a reverse mapping that overlaps this file mapping. If the reverse mapping starts before or ends after the mapping we're checking, it will also crawl around in the bmbt checking correspondence for adjacent extents. This is not very time efficient because it does the crawling while holding the AGF buffer, and checks the middle mappings multiple times. Instead, create a custom iextent record iterator function that combines multiple adjacent allocated mappings into one large incore bmbt record. This is feasible because the incore bmbt record length is 64-bits wide. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: change bmap scrubber to store the previous mappingDarrick J. Wong
Convert the inode data/attr/cow fork scrubber to remember the entire previous mapping, not just the next expected offset. No behavior changes here, but this will enable some better checking in subsequent patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: don't take the MMAPLOCK when scrubbing file metadataDarrick J. Wong
The MMAPLOCK stabilizes mappings in a file's pagecache. Therefore, we do not need it to check directories, symlinks, extended attributes, or file-based metadata. Reduce its usage to the one case that requires it, which is when we want to scrub the data fork of a regular file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: rename xchk_get_inode -> xchk_iget_for_scrubbingDarrick J. Wong
Dave Chinner suggested renaming this function to make more obvious what it does. The function returns an incore inode to callers that want to scrub a metadata structure that hangs off an inode. If the iget fails with EINVAL, it will single-step the loading process to distinguish between actually free inodes or impossible inumbers (ENOENT); discrepancies between the inobt freemask and the free status in the inode record (EFSCORRUPTED). Any other negative errno is returned unchanged. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: ensure that single-owner file blocks are not owned by othersDarrick J. Wong
For any file fork mapping that can only have a single owner, make sure that there are no other rmap owners for that mapping. This patch requires the more detailed checking provided by xfs_rmap_count_owners so that we can know how many rmap records for a given range of space had a matching owner, how many had a non-matching owner, and how many conflicted with the records that have a matching owner. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: ensure that all metadata and data blocks are not cow staging extentsDarrick J. Wong
Make sure that all filesystem metadata blocks and file data blocks are not also marked as CoW staging extents. The extra checking added here was inspired by an actual VM host filesystem corruption incident due to bugs in the CoW handling of 4.x kernels. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: standardize ondisk to incore conversion for bmap btreesDarrick J. Wong
Fix all xfs_bmbt_disk_get_all callsites to call xfs_bmap_validate_extent and bubble up corruption reports. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: minimize overhead of drain wakeups by using jump labelsDarrick J. Wong
To reduce the runtime overhead even further when online fsck isn't running, use a static branch key to decide if we call wake_up on the drain. For compilers that support jump labels, the call to wake_up is replaced by a nop sled when nobody is waiting for intents to drain. From my initial microbenchmarking, every transition of the static key between the on and off states takes about 22000ns to complete; this is paid entirely by the xfs_scrub process. When the static key is off (which it should be when fsck isn't running), the nop sled adds an overhead of approximately 0.36ns to runtime code. The post-atomic lockless waiter check adds about 0.03ns, which is basically free. For the few compilers that don't support jump labels, runtime code pays the cost of calling wake_up on an empty waitqueue, which was observed to be about 30ns. However, most architectures that have sufficient memory and CPU capacity to run XFS also support jump labels, so this is not much of a worry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: update copyright years for scrub/ filesDarrick J. Wong
Update the copyright years in the scrub/ source code files. This isn't required, but it's helpful to remind myself just how long it's taken to develop this feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: fix author and spdx headers on scrub/ filesDarrick J. Wong
Fix the spdx tags to match current practice, and update the author contact information. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-02-13xfs: active perag reference countingDave Chinner
We need to be able to dynamically remove instantiated AGs from memory safely, either for shrinking the filesystem or paging AG state in and out of memory (e.g. supporting millions of AGs). This means we need to be able to safely exclude operations from accessing perags while dynamic removal is in progress. To do this, introduce the concept of active and passive references. Active references are required for high level operations that make use of an AG for a given operation (e.g. allocation) and pin the perag in memory for the duration of the operation that is operating on the perag (e.g. transaction scope). This means we can fail to get an active reference to an AG, hence callers of the new active reference API must be able to handle lookup failure gracefully. Passive references are used in low level code, where we might need to access the perag structure for the purposes of completing high level operations. For example, buffers need to use passive references because: - we need to be able to do metadata IO during operations like grow and shrink transactions where high level active references to the AG have already been blocked - buffers need to pin the perag until they are reclaimed from memory, something that high level code has no direct control over. - unused cached buffers should not prevent a shrink from being started. Hence we have active references that will form exclusion barriers for operations to be performed on an AG, and passive references that will prevent reclaim of the perag until all objects with passive references have been reclaimed themselves. This patch introduce xfs_perag_grab()/xfs_perag_rele() as the API for active AG reference functionality. We also need to convert the for_each_perag*() iterators to use active references, which will start the process of converting high level code over to using active references. Conversion of non-iterator based code to active references will be done in followup patches. Note that the implementation using reference counting is really just a development vehicle for the API to ensure we don't have any leaks in the callers. Once we need to remove perag structures from memory dyanmically, we will need a much more robust per-ag state transition mechanism for preventing new references from being taken while we wait for existing references to drain before removal from memory can occur.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-11-16xfs: teach scrub to flag non-extents format cow forksDarrick J. Wong
CoW forks only exist in memory, which means that they can only ever have an incore extent tree. Hence they must always be FMT_EXTENTS, so check this when we're scrubbing them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16xfs: check that CoW fork extents are not sharedDarrick J. Wong
Ensure that extents in an inode's CoW fork are not marked as shared in the refcount btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16xfs: block map scrub should handle incore delalloc reservationsDarrick J. Wong
Enhance the block map scrubber to check delayed allocation reservations. Though there are no physical space allocations to check, we do need to make sure that the range of file offsets being mapped are correct, and to bump the lastoff cursor so that key order checking works correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16xfs: teach scrub to check for adjacent bmaps when rmap larger than bmapDarrick J. Wong
When scrub is checking file fork mappings against rmap records and the rmap record starts before or ends after the bmap record, check the adjacent bmap records to make sure that they're adjacent to the one we're checking. This helps us to detect cases where the rmaps cover territory that the bmaps do not. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16xfs: fix perag loop in xchk_bmap_check_rmapsDarrick J. Wong
sparse complains that we can return an uninitialized error from this function and that pag could be uninitialized. We know that there are no zero-AG filesystems and hence we had to call xchk_bmap_check_ag_rmaps at least once, so this is not actually possible, but I'm too worn out from automated complaints from unsophisticated AIs so let's just fix this and move on to more interesting problems, eh? Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-09xfs: convert XFS_IFORK_PTR to a static inline helperDarrick J. Wong
We're about to make this logic do a bit more, so convert the macro to a static inline function for better typechecking and fewer shouty macros. No functional changes here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-07xfs: pass perag to xfs_alloc_read_agf()Dave Chinner
xfs_alloc_read_agf() initialises the perag if it hasn't been done yet, so it makes sense to pass it the perag rather than pull a reference from the buffer. This allows callers to be per-ag centric rather than passing mount/agno pairs everywhere. Whilst modifying the xfs_reflink_find_shared() function definition, declare it static and remove the extern declaration as it is an internal function only these days. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-04-27xfs: simplify xfs_rmap_lookup_le call sitesDarrick J. Wong
Most callers of xfs_rmap_lookup_le will retrieve the btree record immediately if the lookup succeeds. The overlapped version of this function (xfs_rmap_lookup_le_range) will return the record if the lookup succeeds, so make the regular version do it too. Get rid of the useless len argument, since it's not part of the lookup key. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-04-11xfs: Define max extent length based on on-disk format definitionChandan Babu R
The maximum extent length depends on maximum block count that can be stored in a BMBT record. Hence this commit defines MAXEXTLEN based on BMBT_BLOCKCOUNT_BITLEN. While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN. Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
2021-10-19xfs: prepare xfs_btree_cur for dynamic cursor heightsDarrick J. Wong
Split out the btree level information into a separate struct and put it at the end of the cursor structure as a VLA. Files with huge data forks (and in the future, the realtime rmap btree) will require the ability to support many more levels than a per-AG btree cursor, which means that we're going to create per-btree type cursor caches to conserve memory for the more common case. Note that a subsequent patch actually introduces dynamic cursor heights. This one merely rearranges the structure to prepare for that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-08-20xfs: fix perag structure refcounting error when scrub failsDarrick J. Wong
The kernel test robot found the following bug when running xfs/355 to scrub a bmap btree: XFS: Assertion failed: !sa->pag, file: fs/xfs/scrub/common.c, line: 412 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:110! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 1415 Comm: xfs_scrub Not tainted 5.14.0-rc4-00021-g48c6615cc557 #1 Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 RIP: 0010:assfail+0x23/0x28 [xfs] RSP: 0018:ffffc9000aacb890 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffffc9000aacbcc8 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffc09e7dcd RBP: ffffc9000aacbc80 R08: ffff8881fdf17d50 R09: 0000000000000000 R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000 R13: ffff88820c7ed000 R14: 0000000000000001 R15: ffffc9000aacb980 FS: 00007f185b955700(0000) GS:ffff8881fdf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7f6ef43000 CR3: 000000020de38002 CR4: 00000000001706e0 Call Trace: xchk_ag_read_headers+0xda/0x100 [xfs] xchk_ag_init+0x15/0x40 [xfs] xchk_btree_check_block_owner+0x76/0x180 [xfs] xchk_btree_get_block+0xd0/0x140 [xfs] xchk_btree+0x32e/0x440 [xfs] xchk_bmap_btree+0xd4/0x140 [xfs] xchk_bmap+0x1eb/0x3c0 [xfs] xfs_scrub_metadata+0x227/0x4c0 [xfs] xfs_ioc_scrub_metadata+0x50/0xc0 [xfs] xfs_file_ioctl+0x90c/0xc40 [xfs] __x64_sys_ioctl+0x83/0xc0 do_syscall_64+0x3b/0xc0 The unusual handling of errors while initializing struct xchk_ag is the root cause here. Since the beginning of xfs_scrub, the goal of xchk_ag_read_headers has been to read all three AG header buffers and attach them both to the xchk_ag structure and the scrub transaction. Corruption errors on any of the three headers doesn't necessarily trigger an immediate return to userspace, because xfs_scrub can also tell us to /fix/ the problem. In other words, it's possible for the xchk_ag init functions to return an error code and a partially filled out structure so that scrub can use however much information it managed to pull. Before 5.15, it was sufficient to cancel (or commit) the scrub transaction on the way out of the scrub code to release the buffers. Ccommit 48c6615cc557 added a reference to the perag structure to struct xchk_ag. Since perag structures are not attached to transactions like buffers are, this adds the requirement that the perag ref be released explicitly. The scrub teardown function xchk_teardown was amended to do this for the xchk_ag embedded in struct xfs_scrub. Unfortunately, I forgot that certain parts of the scrub code probe multiple AGs and therefore handle the initialization and cleanup on their own. Specifically, the bmbt scrubber will initialize it long enough to cross-reference AG metadata for btree blocks and for the extent mappings in the bmbt. If one of the AG headers is corrupt, the init function returns with a live perag structure reference and some of the AG header buffers. If an error occurs, the cross referencing will be noted as XCORRUPTion and skipped, but the main scrub process will move on to the next record. It is now necessary to release the perag reference before we try to analyze something from a different AG, or else we'll trip over the assertion noted above. Fixes: 48c6615cc557 ("xfs: grab active perag ref when reading AG headers") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-19xfs: convert xfs_sb_version_has checks to use mount featuresDave Chinner
This is a conversion of the remaining xfs_sb_version_has..(sbp) checks to use xfs_has_..(mp) feature checks. This was largely done with a vim replacement macro that did: :0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR> A couple of other variants were also used, and the rest touched up by hand. $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1127533 311352 484 1439369 15f689 (TOTALS) after 1125360 311352 484 1437196 15ee0c (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>