summaryrefslogtreecommitdiff
path: root/fs/xfs/scrub/xfarray.c
AgeCommit message (Collapse)Author
2024-05-27xfs: drop xfarray sortinfo folio on errorDarrick J. Wong
Chandan Babu reports the following livelock in xfs/708: run fstests xfs/708 at 2024-05-04 15:35:29 XFS (loop16): EXPERIMENTAL online scrub feature in use. Use at your own risk! XFS (loop5): Mounting V5 Filesystem e96086f0-a2f9-4424-a1d5-c75d53d823be XFS (loop5): Ending clean mount XFS (loop5): Quotacheck needed: Please wait. XFS (loop5): Quotacheck: Done. XFS (loop5): EXPERIMENTAL online scrub feature in use. Use at your own risk! INFO: task xfs_io:143725 blocked for more than 122 seconds. Not tainted 6.9.0-rc4+ #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:xfs_io state:D stack:0 pid:143725 tgid:143725 ppid:117661 flags:0x00004006 Call Trace: <TASK> __schedule+0x69c/0x17a0 schedule+0x74/0x1b0 io_schedule+0xc4/0x140 folio_wait_bit_common+0x254/0x650 shmem_undo_range+0x9d5/0xb40 shmem_evict_inode+0x322/0x8f0 evict+0x24e/0x560 __dentry_kill+0x17d/0x4d0 dput+0x263/0x430 __fput+0x2fc/0xaa0 task_work_run+0x132/0x210 get_signal+0x1a8/0x1910 arch_do_signal_or_restart+0x7b/0x2f0 syscall_exit_to_user_mode+0x1c2/0x200 do_syscall_64+0x72/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e The shmem code is trying to drop all the folios attached to a shmem file and gets stuck on a locked folio after a bnobt repair. It looks like the process has a signal pending, so I started looking for places where we lock an xfile folio and then deal with a fatal signal. I found a bug in xfarray_sort_scan via code inspection. This function is called to set up the scanning phase of a quicksort operation, which may involve grabbing a locked xfile folio. If we exit the function with an error code, the caller does not call xfarray_sort_scan_done to put the xfile folio. If _sort_scan returns an error code while si->folio is set, we leak the reference and never unlock the folio. Therefore, change xfarray_sort to call _scan_done on exit. This is safe to call multiple times because it sets si->folio to NULL and ignores a NULL si->folio. Also change _sort_scan to use an intermediate variable so that we never pollute si->folio with an errptr. Fixes: 232ea052775f9 ("xfs: enable sorting of xfile-backed arrays") Reported-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-04-23xfs: reduce the rate of cond_resched calls inside scrubDarrick J. Wong
We really don't want to call cond_resched every single time we go through a loop in scrub -- there may be billions of records, and probing into the scheduler itself has overhead. Reduce this overhead by only calling cond_resched 10x per second; and add a counter so that we only check jiffies once every 1000 records or so. Surprisingly, this reduces scrub-only fstests runtime by about 2%. I used the bmapinflate xfs_db command to produce a billion-extent file and this stupid gadget reduced the scrub runtime by about 4%. From a stupid microbenchmark of calling these things 1 billion times, I estimate that cond_resched costs about 5.5ns per call; jiffes costs about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per call. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15xfs: repair extended attributesDarrick J. Wong
If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, stage a new attribute structure in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-02-21xfs: convert xfarray_pagesort to deal with large foliosDarrick J. Wong
Convert xfarray_pagesort to handle large folios by introducing a new xfile_get_folio routine that can return a folio of arbitrary size, and using heapsort on the full folio. This also corrects an off-by-one bug in the calculation of len in xfarray_pagesort that was papered over by xfarray_want_pagesort. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-21xfs: fix a comment in xfarray.cChristoph Hellwig
xfiles are shmem files, not memfds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-21xfs: remove xfarray_sortinfo.page_kaddrChristoph Hellwig
Now that xfile pages don't need kmapping, there is no need to cache the kernel virtual address for them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-21xfs: don't allow highmem pages in xfile mappingsChristoph Hellwig
XFS is generally used on 64-bit, non-highmem platforms and xfile mappings are accessed all the time. Reduce our pain by not allowing any highmem mappings in the xfile page cache and remove all the kmap calls for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-02-21xfs: remove the xfile_pread/pwrite APIsChristoph Hellwig
All current and pending xfile users use the xfile_obj_load and xfile_obj_store API, so make those the actual implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2023-08-10xfs: improve xfarray quicksort pivotDarrick J. Wong
Now that we have the means to do insertion sorts of small in-memory subsets of an xfarray, use it to improve the quicksort pivot algorithm by reading 7 records into memory and finding the median of that. This should prevent bad partitioning when a[lo] and a[hi] end up next to each other in the final sort, which can happen when sorting for cntbt repair when the free space is extremely fragmented (e.g. generic/176). This doesn't speed up the average quicksort run by much, but it will (hopefully) avoid the quadratic time collapse for which quicksort is famous. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-10xfs: cache pages used for xfarray quicksort convergenceDarrick J. Wong
After quicksort picks a pivot item for a particular subsort, it walks the records in that subset from the outside in, rearranging them so that every record less than the pivot comes before it, and every record greater than the pivot comes after it. This scan has a lot of locality, so we can speed it up quite a bit by grabbing the xfile backing page and holding onto it as long as we possibly can. Doing so reduces the runtime by another 5% on the author's computer. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-10xfs: speed up xfarray sort by sorting xfile page contents directlyDarrick J. Wong
If all the records in an xfarray subset live within the same memory page, we can short-circuit even more quicksort recursion by mapping that page into the local CPU and using the kernel's heapsort function to sort the subset. On the author's computer, this reduces the runtime by another 15% on a 500,000 element array. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-10xfs: convert xfarray insertion sort to heapsort using scratchpad memoryDarrick J. Wong
In the previous patch, we created a very basic quicksort implementation for xfile arrays. While the use of an alternate sorting algorithm to avoid quicksort recursion on very small subsets reduces the runtime modestly, we could do better than a load and store-heavy insertion sort, particularly since each load and store requires a page mapping lookup in the xfile. For a small increase in kernel memory requirements, we could instead bulk load the xfarray records into memory, use the kernel's existing heapsort implementation to sort the records, and bulk store the memory buffer back into the xfile. On the author's computer, this reduces the runtime by about 5% on a 500,000 element array. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-10xfs: enable sorting of xfile-backed arraysDarrick J. Wong
The btree bulk loading code requires that records be provided in the correct record sort order for the given btree type. In general, repair code cannot be required to collect records in order, and it is not feasible to insert new records in the middle of an array to maintain sort order. Implement a sorting algorithm so that we can sort the records just prior to bulk loading. In principle, an xfarray could consume many gigabytes of memory and its backing pages can be sent out to disk at any time. This means that we cannot map the entire array into memory at once, so we must find a way to divide the work into smaller portions (e.g. a page) that /can/ be mapped into memory. Quicksort seems like a reasonable fit for this purpose, since it uses a divide and conquer strategy to keep its average runtime logarithmic. The solution presented here is a port of the glibc implementation, which itself is derived from the median-of-three and tail call recursion strategies outlined by Sedgwick. Subsequent patches will optimize the implementation further by utilizing the kernel's heapsort on directly-mapped memory whenever possible, and improving the quicksort pivot selection algorithm to try to avoid O(n^2) collapses. Note: The sorting functionality gets its own patch because the basic big array mechanisms were plenty for a single code patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-08-10xfs: create a big array data structureDarrick J. Wong
Create a simple 'big array' data structure for storage of fixed-size metadata records that will be used to reconstruct a btree index. For repair operations, the most important operations are append, iterate, and sort. Earlier implementations of the big array used linked lists and suffered from severe problems -- pinning all records in kernel memory was not a good idea and frequently lead to OOM situations; random access was very inefficient; and record overhead for the lists was unacceptably high at 40-60%. Therefore, the big memory array relies on the 'xfile' abstraction, which creates a memfd file and stores the records in page cache pages. Since the memfd is created in tmpfs, the memory pages can be pushed out to disk if necessary and we have a built-in usage limit of 50% of physical memory. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev> Reviewed-by: Dave Chinner <dchinner@redhat.com>