summaryrefslogtreecommitdiff
path: root/fs/xfs/scrub/common.c
AgeCommit message (Collapse)Author
2023-05-02xfs: disable reaping in fscounters scrubDarrick J. Wong
The fscounters scrub code doesn't work properly because it cannot quiesce updates to the percpu counters in the filesystem, hence it returns false corruption reports. This has been fixed properly in one of the online repair patchsets that are under review by replacing the xchk_disable_reaping calls with an exclusive filesystem freeze. Disabling background gc isn't sufficient to fix the problem. In other words, scrub doesn't need to call xfs_inodegc_stop, which is just as well since it wasn't correct to allow scrub to call xfs_inodegc_start when something else could be calling xfs_inodegc_stop (e.g. trying to freeze the filesystem). Neuter the scrubber for now, and remove the xchk_*_reaping functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2023-04-11xfs: don't take the MMAPLOCK when scrubbing file metadataDarrick J. Wong
The MMAPLOCK stabilizes mappings in a file's pagecache. Therefore, we do not need it to check directories, symlinks, extended attributes, or file-based metadata. Reduce its usage to the one case that requires it, which is when we want to scrub the data fork of a regular file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: retain the AGI when we can't iget an inode to scrub the coreDarrick J. Wong
xchk_get_inode is not quite the right function to be calling from the inode scrubber setup function. The common get_inode function either gets an inode and installs it in the scrub context, or it returns an error code explaining what happened. This is acceptable for most file scrubbers because it is not in their scope to fix corruptions in the inode core and fork areas that cause iget to fail. Dealing with these problems is within the scope of the inode scrubber, however. If iget fails with EFSCORRUPTED, we need to xchk_inode to flag that as corruption. Since we can't get our hands on an incore inode, we need to hold the AGI to prevent inode allocation activity so that nothing changes in the inode metadata. Looking ahead to the inode core repair patches, we will also need to hold the AGI buffer into xrep_inode so that we can make modifications to the xfs_dinode structure without any other thread swooping in to allocate or free the inode. Adapt the xchk_get_inode into xchk_setup_inode since this is a one-off use case where the error codes we check for are a little different, and the return state is much different from the common function. xchk_setup_inode prepares to check or repair an inode record, so it must continue the scrub operation even if the inode/inobt verifiers cause xfs_iget to return EFSCORRUPTED. This is done by attaching the locked AGI buffer to the scrub transaction and returning 0 to move on to the actual scrub. (Later, the online inode repair code will also want the xfs_imap structure so that it can reset the ondisk xfs_dinode structure.) xchk_get_inode retrieves an inode on behalf of a scrubber that operates on an incore inode -- data/attr/cow forks, directories, xattrs, symlinks, parent pointers, etc. If the inode/inobt verifiers fail and xfs_iget returns EFSCORRUPTED, we want to exit to userspace (because the caller should be fix the inode first) and drop everything we acquired along the way. A behavior common to both functions is that it's possible that xfs_scrub asked for a scrub-by-handle concurrent with the inode being freed or the passed-in inumber is invalid. In this case, we call xfs_imap to see if the inobt index thinks the inode is allocated, and return ENOENT ("nothing to check here") to userspace if this is not the case. The imap lookup is why both functions call xchk_iget_agi. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: rename xchk_get_inode -> xchk_iget_for_scrubbingDarrick J. Wong
Dave Chinner suggested renaming this function to make more obvious what it does. The function returns an incore inode to callers that want to scrub a metadata structure that hangs off an inode. If the iget fails with EINVAL, it will single-step the loading process to distinguish between actually free inodes or impossible inumbers (ENOENT); discrepancies between the inobt freemask and the free status in the inode record (EFSCORRUPTED). Any other negative errno is returned unchanged. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: fix an inode lookup race in xchk_get_inodeDarrick J. Wong
In commit d658e, we tried to improve the robustnes of xchk_get_inode in the face of EINVAL returns from iget by calling xfs_imap to see if the inobt itself thinks that the inode is allocated. Unfortunately, that commit didn't consider the possibility that the inode gets allocated after iget but before imap. In this case, the imap call will succeed, but we turn that into a corruption error and tell userspace the inode is corrupt. Avoid this false corruption report by grabbing the AGI header and retrying the iget before calling imap. If the iget succeeds, we can proceed with the usual scrub-by-handle code. Fix all the incorrect comments too, since unreadable/corrupt inodes no longer result in EINVAL returns. Fixes: d658e72b4a09 ("xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: manage inode DONTCACHE status at irele timeDarrick J. Wong
Right now, there are statements scattered all over the online fsck codebase about how we can't use XFS_IGET_DONTCACHE because of concerns about scrub's unusual practice of releasing inodes with transactions held. However, iget is the wrong place to handle this -- the DONTCACHE state doesn't matter at all until we try to *release* the inode, and here we get things wrong in multiple ways: First, if we /do/ have a transaction, we must NOT drop the inode, because the inode could have dirty pages, dropping the inode will trigger writeback, and writeback can trigger a nested transaction. Second, if the inode already had an active reference and the DONTCACHE flag set, the icache hit when scrub grabs another ref will not clear DONTCACHE. This is sort of by design, since DONTCACHE is now used to initiate cache drops so that sysadmins can change a file's access mode between pagecache and DAX. Third, if we do actually have the last active reference to the inode, we can set DONTCACHE to avoid polluting the cache. This is the /one/ case where we actually want that flag. Create an xchk_irele helper to encode all that logic and switch the online fsck code to use it. Since this now means that nearly all scrubbers use the same xfs_iget flags, we can wrap them too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: fix parent pointer scrub racing with subdirectory reparentingDarrick J. Wong
Jan Kara pointed out that rename() doesn't lock a subdirectory that is being moved from one parent to another, even though the move requires an update to the subdirectory's dotdot entry. This means that it's *not* sufficient to hold a directory's IOLOCK to stabilize the dotdot entry. We must hold the ILOCK of both the child and the alleged parent, and there's no use in holding the parent's IOLOCK. With that in mind, we can get rid of all the messy code that tries to grab the parent's IOLOCK, which means we don't need to let go of the ILOCK of the directory whose parent we are checking. We still have to use nonblocking mode to take the ILOCK of the alleged parent, so the revalidation loop has to stay. However, we can remove the retry counter, since threads aren't supposed to hold the ILOCK for long periods of time. Remove the inverted ilock helper from the common code since nobody uses it. Remove the entire source of -EDEADLOCK-based "retry harder" scrub executions. Link: https://lore.kernel.org/linux-xfs/20230117123735.un7wbamlbdihninm@quack3/ Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: scrub should use ECHRNG to signal that the drain is neededDarrick J. Wong
In the previous patch, we added jump labels to the intent drain code so that regular filesystem operations need not pay the price of checking for someone (scrub) waiting on intents to drain from some part of the filesystem when that someone isn't running. However, I observed that xfs/285 now spends a lot more time pushing the AIL from the inode btree scrubber than it used to. This is because the inobt scrubber will try push the AIL to try to get logged inode cores written to the filesystem when it sees a weird discrepancy between the ondisk inode and the inobt records. This AIL push is triggered when the setup function sees TRY_HARDER is set; and the requisite EDEADLOCK return is initiated when the discrepancy is seen. The solution to this performance slow down is to use a different result code (ECHRNG) for scrub code to signal that it needs to wait for deferred intent work items to drain out of some part of the filesystem. When this happens, set a new scrub state flag (XCHK_NEED_DRAIN) so that setup functions will activate the jump label. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: minimize overhead of drain wakeups by using jump labelsDarrick J. Wong
To reduce the runtime overhead even further when online fsck isn't running, use a static branch key to decide if we call wake_up on the drain. For compilers that support jump labels, the call to wake_up is replaced by a nop sled when nobody is waiting for intents to drain. From my initial microbenchmarking, every transition of the static key between the on and off states takes about 22000ns to complete; this is paid entirely by the xfs_scrub process. When the static key is off (which it should be when fsck isn't running), the nop sled adds an overhead of approximately 0.36ns to runtime code. The post-atomic lockless waiter check adds about 0.03ns, which is basically free. For the few compilers that don't support jump labels, runtime code pays the cost of calling wake_up on an empty waitqueue, which was observed to be about 30ns. However, most architectures that have sufficient memory and CPU capacity to run XFS also support jump labels, so this is not much of a worry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: allow queued AG intents to drain before scrubbingDarrick J. Wong
When a writer thread executes a chain of log intent items, the AG header buffer locks will cycle during a transaction roll to get from one intent item to the next in a chain. Although scrub takes all AG header buffer locks, this isn't sufficient to guard against scrub checking an AG while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding allocation groups. When there's a collision, cross-referencing between data structures (e.g. rmapbt and refcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the perag structure the count of active intents and make scrub wait until it has both AG header buffer locks and the intent counter reaches zero. One quirk of the drain code is that deferred bmap updates also bump and drop the intent counter. A fundamental decision made during the design phase of the reverse mapping feature is that updates to the rmapbt records are always made by the same code that updates the primary metadata. In other words, callers of bmapi functions expect that the bmapi functions will queue deferred rmap updates. Some parts of the reflink code queue deferred refcount (CUI) and bmap (BUI) updates in the same head transaction, but the deferred work manager completely finishes the CUI before the BUI work is started. As a result, the CUI drops the intent count long before the deferred rmap (RUI) update even has a chance to bump the intent count. The only way to keep the intent count elevated between the CUI and RUI is for the BUI to bump the counter until the RUI has been created. A second quirk of the intent drain code is that deferred work items must increment the intent counter as soon as the work item is added to the transaction. When a BUI completes and queues an RUI, the RUI must increment the counter before the BUI decrements it. The only way to accomplish this is to require that the counter be bumped as soon as the deferred work item is created in memory. In the next patches we'll improve on this facility, but this patch provides the basic functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: update copyright years for scrub/ filesDarrick J. Wong
Update the copyright years in the scrub/ source code files. This isn't required, but it's helpful to remind myself just how long it's taken to develop this feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-04-11xfs: fix author and spdx headers on scrub/ filesDarrick J. Wong
Fix the spdx tags to match current practice, and update the author contact information. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-02-13xfs: inobt can use perags in many more places than it doesDave Chinner
Lots of code in the inobt infrastructure is passed both xfs_mount and perags. We only need perags for the per-ag inode allocation code, so reduce the duplication by passing only the perags as the primary object. This ends up reducing the code size by a bit: text data bss dec hex filename orig 1138878 323979 548 1463405 16546d (TOTALS) patched 1138709 323979 548 1463236 1653c4 (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2023-02-13xfs: convert xfs_imap() to take a peragDave Chinner
Callers have referenced perags but they don't pass it into xfs_imap() so it takes it's own reference. Fix that so we can change inode allocation over to using active references. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-11-16xfs: check inode core when scrubbing metadata filesDarrick J. Wong
Metadata files (e.g. realtime bitmaps and quota files) do not show up in the bulkstat output, which means that scrub-by-handle does not work; they can only be checked through a specific scrub type. Therefore, each scrub type calls xchk_metadata_inode_forks to check the metadata for whatever's in the file. Unfortunately, that function doesn't actually check the inode record itself. Refactor the function a bit to make that happen. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-11-16xfs: don't track the AGFL buffer in the scrub AG contextDarrick J. Wong
While scrubbing an allocation group, we don't need to hold the AGFL buffer as part of the scrub context. All that is necessary to lock an AG is to hold the AGI and AGF buffers, so fix all the existing users of the AGFL buffer to grab them only when necessary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2022-07-07xfs: pass perag to xfs_alloc_read_agflDave Chinner
We have the perag in most places we call xfs_alloc_read_agfl, so pass the perag instead of a mount/agno pair. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-07-07xfs: pass perag to xfs_alloc_read_agf()Dave Chinner
xfs_alloc_read_agf() initialises the perag if it hasn't been done yet, so it makes sense to pass it the perag rather than pull a reference from the buffer. This allows callers to be per-ag centric rather than passing mount/agno pairs everywhere. Whilst modifying the xfs_reflink_find_shared() function definition, declare it static and remove the extern declaration as it is an internal function only these days. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-07-07xfs: pass perag to xfs_ialloc_read_agi()Dave Chinner
xfs_ialloc_read_agi() initialises the perag if it hasn't been done yet, so it makes sense to pass it the perag rather than pull a reference from the buffer. This allows callers to be per-ag centric rather than passing mount/agno pairs everywhere. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-05-04xfs: Set up infrastructure for log attribute replayAllison Henderson
Currently attributes are modified directly across one or more transactions. But they are not logged or replayed in the event of an error. The goal of log attr replay is to enable logging and replaying of attribute operations using the existing delayed operations infrastructure. This will later enable the attributes to become part of larger multi part operations that also must first be recorded to the log. This is mostly of interest in the scheme of parent pointers which would need to maintain an attribute containing parent inode information any time an inode is moved, created, or removed. Parent pointers would then be of interest to any feature that would need to quickly derive an inode path from the mount point. Online scrub, nfs lookups and fs grow or shrink operations are all features that could take advantage of this. This patch adds two new log item types for setting or removing attributes as deferred operations. The xfs_attri_log_item will log an intent to set or remove an attribute. The corresponding xfs_attrd_log_item holds a reference to the xfs_attri_log_item and is freed once the transaction is done. Both log items use a generic xfs_attr_log_format structure that contains the attribute name, value, flags, inode, and an op_flag that indicates if the operations is a set or remove. [dchinner: added extra little bits needed for intent whiteouts] Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2021-08-19xfs: convert bp->b_bn references to xfs_buf_daddr()Dave Chinner
Stop directly referencing b_bn in code outside the buffer cache, as b_bn is supposed to be used only as an internal cache index. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: convert xfs_sb_version_has checks to use mount featuresDave Chinner
This is a conversion of the remaining xfs_sb_version_has..(sbp) checks to use xfs_has_..(mp) feature checks. This was largely done with a vim replacement macro that did: :0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR> A couple of other variants were also used, and the rest touched up by hand. $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1127533 311352 484 1439369 15f689 (TOTALS) after 1125360 311352 484 1437196 15ee0c (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: convert remaining mount flags to state flagsDave Chinner
The remaining mount flags kept in m_flags are actually runtime state flags. These change dynamically, so they really should be updated atomically so we don't potentially lose an update due to racing modifications. Convert these remaining flags to be stored in m_opstate and use atomic bitops to set and clear the flags. This also adds a couple of simple wrappers for common state checks - read only and shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-19xfs: replace xfs_sb_version checks with feature flag checksDave Chinner
Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is ia lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filenam before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-08-18xfs: make the record pointer passed to query_range functions constDarrick J. Wong
The query_range functions are supposed to call a caller-supplied function on each record found in the dataset. These functions don't own the memory storing the record, so don't let them change the record. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-18xfs: remove unnecessary agno variable from struct xchk_agDarrick J. Wong
Now that we always grab an active reference to a perag structure when dealing with perag metadata, we can remove this unnecessary variable. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-08-09xfs: grab active perag ref when reading AG headersDarrick J. Wong
This patch prepares scrub to deal with the possibility of tearing down entire AGs by changing the order of resource acquisition to match the rest of the XFS codebase. In other words, scrub now grabs AG resources in order of: perag structure, then AGI/AGF/AGFL buffers, then btree cursors; and releases them in reverse order. This requires us to distinguish xchk_ag_init callers -- some are responding to a user request to check AG metadata, in which case we can return ENOENT to userspace; but other callers have an ondisk reference to an AG that they're trying to cross-reference. In this second case, the lack of an AG means there's ondisk corruption, since ondisk metadata cannot point into nonexistent space. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
2021-08-09xfs: don't run speculative preallocation gc when fs is frozenDarrick J. Wong
Now that we have the infrastructure to switch background workers on and off at will, fix the block gc worker code so that we don't actually run the worker when the filesystem is frozen, same as we do for deferred inactivation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2021-08-06xfs: per-cpu deferred inode inactivation queuesDave Chinner
Move inode inactivation to background work contexts so that it no longer runs in the context that releases the final reference to an inode. This will allow process work that ends up blocking on inactivation to continue doing work while the filesytem processes the inactivation in the background. A typical demonstration of this is unlinking an inode with lots of extents. The extents are removed during inactivation, so this blocks the process that unlinked the inode from the directory structure. By moving the inactivation to the background process, the userspace applicaiton can keep working (e.g. unlinking the next inode in the directory) while the inactivation work on the previous inode is done by a different CPU. The implementation of the queue is relatively simple. We use a per-cpu lockless linked list (llist) to queue inodes for inactivation without requiring serialisation mechanisms, and a work item to allow the queue to be processed by a CPU bound worker thread. We also keep a count of the queue depth so that we can trigger work after a number of deferred inactivations have been queued. The use of a bound workqueue with a single work depth allows the workqueue to run one work item per CPU. We queue the work item on the CPU we are currently running on, and so this essentially gives us affine per-cpu worker threads for the per-cpu queues. THis maintains the effective CPU affinity that occurs within XFS at the AG level due to all objects in a directory being local to an AG. Hence inactivation work tends to run on the same CPU that last accessed all the objects that inactivation accesses and this maintains hot CPU caches for unlink workloads. A depth of 32 inodes was chosen to match the number of inodes in an inode cluster buffer. This hopefully allows sequential allocation/unlink behaviours to defering inactivation of all the inodes in a single cluster buffer at a time, further helping maintain hot CPU and buffer cache accesses while running inactivations. A hard per-cpu queue throttle of 256 inode has been set to avoid runaway queuing when inodes that take a long to time inactivate are being processed. For example, when unlinking inodes with large numbers of extents that can take a lot of processing to free. Signed-off-by: Dave Chinner <dchinner@redhat.com> [djwong: tweak comments and tracepoints, convert opflags to state bits] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-07-02Merge tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "Most of the work this cycle has been on refactoring various parts of the codebase. The biggest non-cleanup changes are (1) reducing the number of cache flushes sent when writing the log; (2) a substantial number of log recovery fixes; and (3) I started accepting pull requests from contributors if the commits in their branches match what's been sent to the list. For a week or so I /had/ staged a major cleanup of the logging code from Dave Chinner, but it exposed so many lurking bugs in other parts of the logging and log recovery code that I decided to defer that patchset until we can address those latent bugs. Larger cleanups this time include walking the incore inode cache (me) and rework of the extended attribute code (Allison) to prepare it for adding logged xattr updates (and directory tree parent pointers) in future releases. Summary: - Refactor the buffer cache to use bulk page allocation - Convert agnumber-based AG iteration to walk per-AG structures - Clean up some unit conversions and other code warts - Reduce spinlock contention in the directio fastpath - Collapse all the inode cache walks into a single function - Remove indirect function calls from the inode cache walk code - Dramatically reduce the number of cache flushes sent when writing log buffers - Preserve inode sickness reports for longer - Rename xfs_eofblocks since it controls inode cache walks - Refactor the extended attribute code to prepare it for the addition of log intent items to make xattrs fully transactional - A few fixes to earlier large patchsets - Log recovery fixes so that we don't accidentally mark the log clean when log intent recovery fails - Fix some latent SOB errors - Clean up shutdown messages that get logged to dmesg - Fix a regression in the online shrink code - Fix a UAF in the buffer logging code if the fs goes offline - Fix uninitialized error variables - Fix a UAF in the CIL when commited log item callbacks race with a shutdown - Fix a bug where the CIL could hang trying to push part of the log ring buffer that hasn't been filled yet" * tag 'xfs-5.14-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (102 commits) xfs: don't wait on future iclogs when pushing the CIL xfs: Fix a CIL UAF by getting get rid of the iclog callback lock xfs: remove callback dequeue loop from xlog_state_do_iclog_callbacks xfs: don't nest icloglock inside ic_callback_lock xfs: Initialize error in xfs_attr_remove_iter xfs: fix endianness issue in xfs_ag_shrink_space xfs: remove dead stale buf unpin handling code xfs: hold buffer across unpin and potential shutdown processing xfs: force the log offline when log intent item recovery fails xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes xfs: shorten the shutdown messages to a single line xfs: print name of function causing fs shutdown instead of hex pointer xfs: fix type mismatches in the inode reclaim functions xfs: separate primary inode selection criteria in xfs_iget_cache_hit xfs: refactor the inode recycling code xfs: add iclog state trace events xfs: xfs_log_force_lsn isn't passed a LSN xfs: Fix CIL throttle hang when CIL space used going backwards xfs: journal IO cache flush reductions xfs: remove need_start_rec parameter from xlog_write() ...
2021-06-28Merge tag 'fallthrough-fixes-clang-5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull fallthrough fixes from Gustavo Silva: "Fix many fall-through warnings when building with Clang 12.0.0 and '-Wimplicit-fallthrough' so that we at some point will be able to enable that warning by default" * tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (26 commits) rxrpc: Fix fall-through warnings for Clang drm/nouveau/clk: Fix fall-through warnings for Clang drm/nouveau/therm: Fix fall-through warnings for Clang drm/nouveau: Fix fall-through warnings for Clang xfs: Fix fall-through warnings for Clang xfrm: Fix fall-through warnings for Clang tipc: Fix fall-through warnings for Clang sctp: Fix fall-through warnings for Clang rds: Fix fall-through warnings for Clang net/packet: Fix fall-through warnings for Clang net: netrom: Fix fall-through warnings for Clang ide: Fix fall-through warnings for Clang hwmon: (max6621) Fix fall-through warnings for Clang hwmon: (corsair-cpro) Fix fall-through warnings for Clang firewire: core: Fix fall-through warnings for Clang braille_console: Fix fall-through warnings for Clang ipv4: Fix fall-through warnings for Clang qlcnic: Fix fall-through warnings for Clang bnxt_en: Fix fall-through warnings for Clang netxen_nic: Fix fall-through warnings for Clang ...
2021-06-08Merge tag 'xfs-perag-conv-tag' of ↵Darrick J. Wong
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2 xfs: initial agnumber -> perag conversions for shrink If we want to use active references to the perag to be able to gate shrink removing AGs and hence perags safely, we've got a fair bit of work to do actually use perags in all the places we need to. There's a lot of code that iterates ag numbers and then looks up perags from that, often multiple times for the same perag in the one operation. If we want to use reference counted perags for access control, then we need to convert all these uses to perag iterators, not agno iterators. [Patches 1-4] The first step of this is consolidating all the perag management - init, free, get, put, etc into a common location. THis is spread all over the place right now, so move it all into libxfs/xfs_ag.[ch]. This does expose kernel only bits of the perag to libxfs and hence userspace, so the structures and code is rearranged to minimise the number of ifdefs that need to be added to the userspace codebase. The perag iterator in xfs_icache.c is promoted to a first class API and expanded to the needs of the code as required. [Patches 5-10] These are the first basic perag iterator conversions and changes to pass the perag down the stack from those iterators where appropriate. A lot of this is obvious, simple changes, though in some places we stop passing the perag down the stack because the code enters into an as yet unconverted subsystem that still uses raw AGs. [Patches 11-16] These replace the agno passed in the btree cursor for per-ag btree operations with a perag that is passed to the cursor init function. The cursor takes it's own reference to the perag, and the reference is dropped when the cursor is deleted. Hence we get reference coverage for the entire time the cursor is active, even if the code that initialised the cursor drops it's reference before the cursor or any of it's children (duplicates) have been deleted. The first patch adds the perag infrastructure for the cursor, the next four patches convert a btree cursor at a time, and the last removes the agno from the cursor once it is unused. [Patches 17-21] These patches are a demonstration of the simplifications and cleanups that come from plumbing the perag through interfaces that select and then operate on a specific AG. In this case the inode allocation algorithm does up to three walks across all AGs before it either allocates an inode or fails. Two of these walks are purely just to select the AG, and even then it doesn't guarantee inode allocation success so there's a third walk if the selected AG allocation fails. These patches collapse the selection and allocation into a single loop, simplifies the error handling because xfs_dir_ialloc() always returns ENOSPC if no AG was selected for inode allocation or we fail to allocate an inode in any AG, gets rid of xfs_dir_ialloc() wrapper, converts inode allocation to run entirely from a single perag instance, and then factors xfs_dialloc() into a much, much simpler loop which is easy to understand. Hence we end up with the same inode allocation logic, but it only needs two complete iterations at worst, makes AG selection and allocation atomic w.r.t. shrink and chops out out over 100 lines of code from this hot code path. [Patch 22] Converts the unlink path to pass perags through it. There's more conversion work to be done, but this patchset gets through a large chunk of it in one hit. Most of the iterators are converted, so once this is solidified we can move on to converting these to active references for being able to free perags while the fs is still active. * tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (23 commits) xfs: remove xfs_perag_t xfs: use perag through unlink processing xfs: clean up and simplify xfs_dialloc() xfs: inode allocation can use a single perag instance xfs: get rid of xfs_dir_ialloc() xfs: collapse AG selection for inode allocation xfs: simplify xfs_dialloc_select_ag() return values xfs: remove agno from btree cursor xfs: use perag for ialloc btree cursors xfs: convert allocbt cursors to use perags xfs: convert refcount btree cursor to use perags xfs: convert rmap btree cursor to using a perag xfs: add a perag to the btree cursor xfs: pass perags around in fsmap data dev functions xfs: push perags through the ag reservation callouts xfs: pass perags through to the busy extent code xfs: convert secondary superblock walk to use perags xfs: convert xfs_iwalk to use perag references xfs: convert raw ag walks to use for_each_perag xfs: make for_each_perag... a first class citizen ...
2021-06-02xfs: use perag for ialloc btree cursorsDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02xfs: convert allocbt cursors to use peragsDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02xfs: convert refcount btree cursor to use peragsDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02xfs: convert rmap btree cursor to using a peragDave Chinner
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-06-02xfs: add a perag to the btree cursorDave Chinner
Which will eventually completely replace the agno in it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-06-02xfs: move xfs_perag_get/put to xfs_ag.[ch]Dave Chinner
They are AG functions, not superblock functions, so move them to the appropriate location. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-05-26xfs: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix the following warnings by replacing /* fall through */ comments, and its variants, with the new pseudo-keyword macro fallthrough: fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Notice that Clang doesn't recognize /* fall through */ comments as implicit fall-through markings, so in order to globally enable -Wimplicit-fallthrough for Clang, these comments need to be replaced with fallthrough; in the whole codebase. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2021-05-20xfs: fix deadlock retry tracepoint argumentsDarrick J. Wong
sc->ip is the inode that's being scrubbed, which means that it's not set for scrub types that don't involve inodes. If one of those scrubbers (e.g. inode btrees) returns EDEADLOCK, we'll trip over the null pointer. Fix that by reporting either the file being examined or the file that was used to call scrub. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-04-09xfs: get rid of the ip parameter to xchk_setup_*Darrick J. Wong
Now that the scrub context stores a pointer to the file that was used to invoke the scrub call, the struct xfs_inode pointer that we passed to all the setup functions is no longer necessary. This is only ever used if the caller wants us to scrub the metadata of the open file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-04-07xfs: move the di_flags field to struct xfs_inodeChristoph Hellwig
In preparation of removing the historic icinode struct, move the flags field into the containing xfs_inode structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-03-25xfs: remove return value from xchk_ag_btcur_initDarrick J. Wong
Functions called by this function cannot fail, so get rid of the return and error checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-03-25xfs: set the scrub AG number in xchk_ag_read_headersDarrick J. Wong
Since xchk_ag_read_headers initializes fields in struct xchk_ag, we might as well set the AG number and save the callers the trouble. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-02-03xfs: rename block gc start and stop functionsDarrick J. Wong
Shorten the names of the two functions that start and stop block preallocation garbage collection and move them up to the other blockgc functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-12-09xfs: remove unneeded return value check for *init_cursor()Joseph Qi
Since *init_cursor() can always return a valid cursor, the NULL check in caller is unneeded. So clean them up. This also keeps the behavior consistent with other callers. Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28xfs: remove unused header filesEric Sandeen
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-04-30xfs: add online scrub for superblock countersDarrick J. Wong
Teach online scrub how to check the filesystem summary counters. We use the incore delalloc block counter along with the incore AG headers to compute expected values for fdblocks, icount, and ifree, and then check that the percpu counter is within a certain threshold of the expected value. This is done to avoid having to freeze or otherwise lock the filesystem, which means that we're only checking that the counters are fairly close, not that they're exactly correct. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-04-26xfs: allow scrubbers to pause background reclaimDarrick J. Wong
The forthcoming summary counter patch races with regular filesystem activity to compute rough expected values for the counters. This design was chosen to avoid having to freeze the entire filesystem to check the counters, but while that's running we'd prefer to minimize background reclamation activity to reduce the perturbations to the incore free block count. Therefore, provide a way for scrubbers to disable background posteof and cowblock reclamation. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-04-16xfs: scrub should only cross-reference with healthy btreesDarrick J. Wong
Skip cross-referencing with a btree if the health report tells us that it's known to be bad. This should reduce the dmesg spew considerably. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>