summaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)Author
2018-06-01xfs: xfs_rtword_t should be unsigned, not signedDarrick J. Wong
xfs_rtword_t is used for bit manipulations in the realtime bitmap file. Since we're performing bit shifts with this type, we don't want sign extension and we don't want to be left shifting negative quantities because that's undefined behavior. This also shuts up these UBSAN warnings: UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_rtbitmap.c:833:48 signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-05-30xfs: repair superblocksDarrick J. Wong
If one of the backup superblocks is found to differ seriously from superblock 0, write out a fresh copy from the in-core sb. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-29xfs: fix inobt magic number checkDarrick J. Wong
In commit a6a781a58befcbd467c ("xfs: have buffer verifier functions report failing address") the bad magic number return was ported incorrectly. Fixes: a6a781a58befcbd467ce843af4eaca3906aa1f08 Reported-by: syzbot+08ab33be0178b76851c8@syzkaller.appspotmail.com Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-05-16xfs: implement online get/set fs labelEric Sandeen
The GET ioctl is trivial, just return the current label. The SET ioctl is more involved: It transactionally modifies the superblock to write a new filesystem label to the primary super. A new variant of xfs_sync_sb then writes the superblock buffer immediately to disk so that the change is visible from userspace. It then invalidates any page cache that userspace might have previously read on the block device so that i.e. blkid can see the change immediately, and updates all secondary superblocks as userspace relable does. Signed-off-by: Eric Sandeen <sandeen@redhat.com> [darrick: use dchinner's new xfs_update_secondary_sbs function] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: factor the ag length extension code into libxfsDave Chinner
Growfs currently manually codes the extension of the last AG in a filesytem during the growfs process. Factor that out of the growfs code and move it into libxfs along with teh rest of the AG header modification code. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: move growfs core to libxfsDave Chinner
So it can be shared with userspace (e.g. mkfs) easily. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: implement the metadata repair ioctl flagDarrick J. Wong
Plumb in the pieces necessary to make the "scrub" subfunction of the scrub ioctl actually work. This means that we make the IFLAG_REPAIR flag to the scrub ioctl actually do something, and we add an errortag knob so that xfstests can force the kernel to rebuild a metadata structure even if there's nothing wrong with it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: teach xfs_bmapi_remap to accept some bmapi flagsDarrick J. Wong
Teach xfs_bmapi_remap how to map in unwritten extent and to skip rmap updates. This enables us to rebuild real and unwritten extents from the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: make xfs_bmapi_remapi work with attribute forksDarrick J. Wong
Add a new flags argument to xfs_bmapi_remapi so that we can pass BMAPI flags into the function. This enables us to pass in BMAPI_ATTRFORK so that we can remap things into the attribute fork. Eventually the online repair code will use this to rebuild attribute forks, so make it non-static. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: hoist xfs_scrub_agfl_walk to libxfs as xfs_agfl_walkDarrick J. Wong
This function is basically a generic AGFL block iterator, so promote it to libxfs ahead of online repair wanting to use it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: superblock scrub should use short-lived buffersDarrick J. Wong
Secondary superblocks are rarely used, so create a helper to read a given non-primary AG's superblock and ensure that it won't stick around hogging memory. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-15xfs: factor out nodiscard helpersBrian Foster
The changes to skip discards of speculative preallocation and unwritten extents introduced several new wrapper functions through the bunmapi -> extent free codepath to reduce churn in all of the associated callers. In several cases, these wrappers simply toggle a single flag to skip or not skip discards for the resulting blocks. The explicit _nodiscard() wrappers for such an isolated set of callers is a bit overkill. Kill off these wrappers and replace with the calls to the underlying functions in the contexts that need to control discard behavior. Retain the wrappers that preserve the original calling conventions to serve the original purpose of reducing code churn. This is a refactoring patch and does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-15xfs: add BMAPI_NORMAP flag to perform block remapping without updating rmapbtDarrick J. Wong
Add a new flag, XFS_BMAPI_NORMAP, which will perform file block remapping without updating the rmapbt. This will be used by the repair code to reconstruct bmbts from the rmapbt, in which case we don't want the rmapbt update. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: add repair helpers for the reference count btreeDarrick J. Wong
Add a couple of functions to the refcount btree and generic btree code that will be used to repair the refcountbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: add repair helpers for the reverse mapping btreeDarrick J. Wong
Add a couple of functions to the reverse mapping btree that will be used to repair the rmapbt. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: expose various functions to repair codeDarrick J. Wong
Expose various helpers that the repair code will want to use. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-15xfs: add helpers to calculate btree sizeDarrick J. Wong
Add a bunch of helper functions that calculate the sizes of various btrees. These will be used to repair btrees and btree headers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-05-10xfs: replace XFS_QMOPT_DQALLOC with a simple booleanDarrick J. Wong
DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the _dqget family of functions cares about is DQALLOC. Therefore, change it to a boolean 'can alloc?' flag for the dqget interfaces where that makes sense. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-10xfs: remove unnecessary xfs_qm_dqattach parameterDarrick J. Wong
The flags argument is always zero, get rid of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: refactor XFS_QMOPT_DQNEXT out of existenceDarrick J. Wong
There's only one caller of DQNEXT and its semantics can be moved into a separate function, so create the function and get rid of the flag. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-05-10xfs: don't discard on free of unwritten extentsBrian Foster
Unwritten extents by definition have not been written to until they are converted to normal written extents. If unwritten extents are freed from a file, it is therefore guaranteed that the blocks have not been written to since allocation (note that zero range punches and reallocates blocks). To cut down on online discards generated from workloads that make use of preallocation, skip discards of extents if they are in the unwritten state when the extent is freed. Note that this optimization does not apply to log recovery, during which all freed extents are discarded if online discard is enabled. Also note that it may be possible for a filesystem crash to occur after write completion of an unwritten extent but before unwritten conversion such that the extent remains unwritten after log recovery. Since this pseudo-inconsistency may already be possible after a crash (consider writing to recently allocated blocks where the allocation transaction is lost after a crash), this change shouldn't introduce any fundamental limitations that don't already exist. In short, on storage stacks where discards are important, it's good practice to run an occasional fstrim even with online discard enabled in the filesystem, particularly after a crash. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: add bmapi nodiscard flagBrian Foster
Freed extents are unconditionally discarded when online discard is enabled. Define XFS_BMAPI_NODISCARD to allow callers to bypass discards when unnecessary. For example, this will be useful for eofblocks trimming. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: get rid of the log item descriptorDave Chinner
It's just a connector between a transaction and a log item. There's a 1:1 relationship between a log item descriptor and a log item, and a 1:1 relationship between a log item descriptor and a transaction. Both relationships are created and terminated at the same time, so why do we even have the descriptor? Replace it with a specific list_head in the log item and a new log item dirtied flag to replace the XFS_LID_DIRTY flag. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: adder caller IP to xfs_defer* tracepointsDave Chinner
So it's clear in the trace where they are being called from. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-10xfs: add missing rmap error returnDarrick J. Wong
xfs_rmap_lookup_le_range can return errors, so we need to check for them and bail out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-09xfs: bmap debugging should never panic the systemDarrick J. Wong
Don't panic() the system if the bmap records are garbage, just call ASSERT which gives us the same backtrace but enables developers to control if the system goes down or not. This makes debugging with generic/388 much easier because it won't reboot the machine midway through a run just because btree_read_bufl returns EIO when the fs has already shut down. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-05-09xfs: defer agfl block frees from deferred ops processing contextBrian Foster
Now that AGFL block frees are deferred when dfops is set in the transaction, start deferring AGFL block frees from contexts that are known to push the limits of existing log reservations. The first such context is deferred operation processing itself. This primarily targets deferred extent frees (such as file extents and inode chunks), but in doing so covers all allocation operations that occur in deferred operation processing context. Update xfs_defer_finish() to set and reset ->t_agfl_dfops across the processing sequence. This means that any AGFL block frees due to allocation events result in the addition of new EFIs to the dfops rather than being processed immediately. xfs_defer_finish() rolls the transaction at least once more to process the frees of the AGFL blocks back to the allocation btrees and returns once the AGFL is rectified. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: defer agfl block frees when dfops is availableBrian Foster
The AGFL fixup code executes before every block allocation/free and rectifies the AGFL based on the current, dynamic allocation requirements of the fs. The AGFL must hold a minimum number of blocks to satisfy a worst case split of the free space btrees caused by the impending allocation operation. The AGFL is also updated to maintain the implicit requirement for a minimum number of free slots to satisfy a worst case join of the free space btrees. Since the AGFL caches individual blocks, AGFL reduction typically involves multiple, single block frees. We've had reports of transaction overrun problems during certain workloads that boil down to AGFL reduction freeing multiple blocks and consuming more space in the log than was reserved for the transaction. Since the objective of freeing AGFL blocks is to ensure free AGFL free slots are available for the upcoming allocation, one way to address this problem is to release surplus blocks from the AGFL immediately but defer the free of those blocks (similar to how file-mapped blocks are unmapped from the file in one transaction and freed via a deferred operation) until the transaction is rolled. This turns AGFL reduction into an operation with predictable log reservation consumption. Add the capability to defer AGFL block frees when a deferred ops list is available to the AGFL fixup code. Add a dfops pointer to the transaction to carry dfops through various contexts to the allocator context. Deferring AGFL frees is conditional behavior based on whether the transaction pointer is populated. The long term objective is to reuse the transaction pointer to clean up all unrelated callchains that pass dfops on the stack along with a transaction and in doing so, consistently defer AGFL blocks from the allocator. A bit of customization is required to handle deferred completion processing because AGFL blocks are accounted against a per-ag reservation pool and AGFL blocks are not inserted into the extent busy list when freed (they are inserted when used and released back to the AGFL). Reuse the majority of the existing deferred extent free infrastructure and customize it appropriately to handle AGFL blocks. Note that this patch only adds infrastructure. It does not change behavior because no callers have been updated to pass ->t_agfl_dfops into the allocation code. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: create agfl block free helper functionBrian Foster
Refactor the AGFL block free code into a new helper such that it can be invoked from deferred context. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: print specific dqblk that failed verifiersEric Sandeen
Rather than printing the top of the buffer that held a corrupted dqblk, restructure things to print out the specific one that failed by pushing the calls to the verifier_error function down into the verifier which iterates over the buffer and detects the error. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: add full xfs_dqblk verifierEric Sandeen
Add an xfs_dqblk verifier so that it can check the uuid on V5 filesystems; it calls the existing xfs_dquot_verify verifier to validate the xfs_disk_dquot_t contained inside it. This lets us move the uuid verification out of the crc verifier, which makes little sense. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: pass full xfs_dqblk to repair during quotacheckEric Sandeen
It's a bit dicey to pass in the smaller xfs_disk_dquot and then cast it to something larger; pass in the full xfs_dqblk so we know the caller has sent us the right thing. Rename the function to xfs_dqblk_repair for clarity. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: check type in quota verifier during quotacheckEric Sandeen
During quotacheck we send in the quota type, so verify that as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: remove unused flags arg from xfs_dquot_verifyEric Sandeen
Long ago the flags argument was used to determine whether to issue warnings about corruptions, but that's done elsewhere now and the flag is unused here, so remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-09xfs: make xfs_buf_incore out of lineDave Chinner
Move xfs_buf_incore out of line and make it the only way to look up a buffer in the buffer cache from outside the buffer cache. Convert the external users of _xfs_buf_find() to xfs_buf_incore() and make _xfs_buf_find() static. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: actually rename xfs_incore -> xfs_buf_incore] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-17xfs: don't fail when converting shortform attr to long form during ATTR_REPLACEDarrick J. Wong
Kanda Motohiro reported that expanding a tiny xattr into a large xattr fails on XFS because we remove the tiny xattr from a shortform fork and then try to re-add it after converting the fork to extents format having not removed the ATTR_REPLACE flag. This fails because the attr is no longer present, causing a fs shutdown. This is derived from the patch in his bug report, but we really shouldn't ignore a nonzero retval from the remove call. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199119 Reported-by: kanda.motohiro@gmail.com Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-17xfs: set format back to extents if xfs_bmap_extents_to_btreeEric Sandeen
If xfs_bmap_extents_to_btree fails in a mode where we call xfs_iroot_realloc(-1) to de-allocate the root, set the format back to extents. Otherwise we can assume we can dereference ifp->if_broot based on the XFS_DINODE_FMT_BTREE format, and crash. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199423 Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-17xfs: enhance dinode verifierEric Sandeen
Add several more validations to xfs_dinode_verify: - For LOCAL data fork formats, di_nextents must be 0. - For LOCAL attr fork formats, di_anextents must be 0. - For inodes with no attr fork offset, - format must be XFS_DINODE_FMT_EXTENTS if set at all - di_anextents must be 0. Thanks to dchinner for pointing out a couple related checks I had forgotten to add. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199377 Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-09xfs: non-scrub - remove unused function parametersEric Sandeen
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-26xfs: clean up xfs_mount allocation and dynamic initializersBrian Foster
Most of the generic data structures embedded in xfs_mount are dynamically initialized immediately after mp is allocated. A few fields are left out and initialized during the xfs_mountfs() sequence, after mp has been attached to the superblock. To clean this up and help prevent premature access of associated fields, refactor xfs_mount allocation and all dependent init calls into a new helper. This self-documents that all low level data structures (i.e., locks, trees, etc.) should be initialized before xfs_mount is attached to the superblock. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-23xfs: remove dead inode version setting codeDave Chinner
We can only get into the branch if CRCs are enabled, so there's no need to check inside the branch for CRCs being enabled.... Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-23xfs: don't accept inode buffers with suspicious unlinked chainsDarrick J. Wong
When we're verifying inode buffers, sanity-check the unlinked pointer. We don't want to run the risk of trying to purge something that's obviously broken. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-23xfs: move inode extent size hint validation to libxfsDarrick J. Wong
Extent size hint validation is used by scrub to decide if there's an error, and it will be used by repair to decide to remove the hint. Since these use the same validation functions, move them to libxfs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-23xfs: refactor inode buffer verifier error loggingDarrick J. Wong
When the inode buffer verifier encounters an error, it's much more helpful to print a buffer from the offending inode instead of just the start of the inode chunk buffer. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-23xfs: refactor inode verifier error loggingDarrick J. Wong
Refactor some of the inode verifier failure logging call sites to use the new xfs_inode_verifier_error method which dumps the offending buffer as well as the code location of the failed check. This trims the output, makes it clearer to the admin that repair must be run, and gives the developers more details to work from. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-23xfs: refactor bmap record validationDarrick J. Wong
Refactor the bmap validator into a more complete helper that looks for extents that run off the end of the device, overflow into the next AG, or have invalid flag states. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-23xfs: sanity-check the unused space before trying to use itDarrick J. Wong
In xfs_dir2_data_use_free, we examine on-disk metadata and ASSERT if it doesn't make sense. Since a carefully crafted fuzzed image can cause the kernel to crash after blowing a bunch of assertions, let's move those checks into a validator function and rig everything up to return EFSCORRUPTED to userspace. Found by lastbit fuzzing ltail.bestcount via xfs/391. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-03-23xfs: detect agfl count corruption and reset agflBrian Foster
The struct xfs_agfl v5 header was originally introduced with unexpected padding that caused the AGFL to operate with one less slot than intended. The header has since been packed, but the fix left an incompatibility for users who upgrade from an old kernel with the unpacked header to a newer kernel with the packed header while the AGFL happens to wrap around the end. The newer kernel recognizes one extra slot at the physical end of the AGFL that the previous kernel did not. The new kernel will eventually attempt to allocate a block from that slot, which contains invalid data, and cause a crash. This condition can be detected by comparing the active range of the AGFL to the count. While this detects a padding mismatch, it can also trigger false positives for unrelated flcount corruption. Since we cannot distinguish a size mismatch due to padding from unrelated corruption, we can't trust the AGFL enough to simply repopulate the empty slot. Instead, avoid unnecessarily complex detection logic and and use a solution that can handle any form of flcount corruption that slips through read verifiers: distrust the entire AGFL and reset it to an empty state. Any valid blocks within the AGFL are intentionally leaked. This requires xfs_repair to rectify (which was already necessary based on the state the AGFL was found in). The reset mitigates the side effect of the padding mismatch problem from a filesystem crash to a free space accounting inconsistency. The generic approach also means that this patch can be safely backported to kernels with or without a packed struct xfs_agfl. Check the AGF for an invalid freelist count on initial read from disk. If detected, set a flag on the xfs_perag to indicate that a reset is required before the AGFL can be used. In the first transaction that attempts to use a flagged AGFL, reset it to empty, warn the user about the inconsistency and allow the freelist fixup code to repopulate the AGFL with new blocks. The xfs_perag flag is cleared to eliminate the need for repeated checks on each block allocation operation. This allows kernels that include the packing fix commit 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct") to handle older unpacked AGFL formats without a filesystem crash. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by Dave Chiluk <chiluk+linuxxfs@indeed.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-11xfs: account only rmapbt-used blocks against rmapbt perag resBrian Foster
The rmapbt perag metadata reservation reserves blocks for the reverse mapping btree (rmapbt). Since the rmapbt uses blocks from the agfl and perag accounting is updated as blocks are allocated from the allocation btrees, the reservation actually accounts blocks as they are allocated to (or freed from) the agfl rather than the rmapbt itself. While this works for blocks that are eventually used for the rmapbt, not all agfl blocks are destined for the rmapbt. Blocks that are allocated to the agfl (and thus "reserved" for the rmapbt) but then used by another structure leads to a growing inconsistency over time between the runtime tracking of rmapbt usage vs. actual rmapbt usage. Since the runtime tracking thinks all agfl blocks are rmapbt blocks, it essentially believes that less future reservation is required to satisfy the rmapbt than what is actually necessary. The inconsistency is rectified across mount cycles because the perag reservation is initialized based on the actual rmapbt usage at mount time. The problem, however, is that the excessive drain of the reservation at runtime opens a window to allocate blocks for other purposes that might be required for the rmapbt on a subsequent mount. This problem can be demonstrated by a simple test that runs an allocation workload to consume agfl blocks over time and then observe the difference in the agfl reservation requirement across an unmount/mount cycle: mount ...: xfs_ag_resv_init: ... resv 3193 ask 3194 len 3194 ... ... : xfs_ag_resv_alloc_extent: ... resv 2957 ask 3194 len 1 umount...: xfs_ag_resv_free: ... resv 2956 ask 3194 len 0 mount ...: xfs_ag_resv_init: ... resv 3052 ask 3194 len 3194 As the above tracepoints show, the reservation requirement reduces from 3194 blocks to 2956 blocks as the workload runs. Without any other changes in the filesystem, the same reservation requirement jumps from 2956 to 3052 blocks over a umount/mount cycle. To address this divergence, update the RMAPBT reservation to account blocks used for the rmapbt only rather than all blocks filled into the agfl. This patch makes several high-level changes toward that end: 1.) Reintroduce an AGFL reservation type to serve as an accounting no-op for blocks allocated to (or freed from) the AGFL. 2.) Invoke RMAPBT usage accounting from the actual rmapbt block allocation path rather than the AGFL allocation path. The first change is required because agfl blocks are considered free blocks throughout their lifetime. The perag reservation subsystem is invoked unconditionally by the allocation subsystem, so we need a way to tell the perag subsystem (via the allocation subsystem) to not make any accounting changes for blocks filled into the AGFL. The second change causes the in-core RMAPBT reservation usage accounting to remain consistent with the on-disk state at all times and eliminates the risk of leaving the rmapbt reservation underfilled. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-03-11xfs: rename agfl perag res type to rmapbtBrian Foster
The AGFL perag reservation type accounts all allocations that feed into (or are released from) the allocation group free list (agfl). The purpose of the reservation is to support worst case conditions for the reverse mapping btree (rmapbt). As such, the agfl reservation usage accounting only considers rmapbt usage when the in-core counters are initialized at mount time. This implementation inconsistency leads to divergence of the in-core and on-disk usage accounting over time. In preparation to resolve this inconsistency and adjust the AGFL reservation into an rmapbt specific reservation, rename the AGFL reservation type and associated accounting fields to something more rmapbt-specific. Also fix up a couple tracepoints that incorrectly use the AGFL reservation type to pass the agfl state of the associated extent where the raw reservation type is expected. Note that this patch does not change perag reservation behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>