linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2024-04-16	Merge tag 'repair-xattrs-6.10_2024-04-15' of ↵	Chandan Babu R
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA xfs: online repair of extended attributes This series employs atomic extent swapping to enable safe reconstruction of extended attribute data attached to a file. Because xattrs do not have any redundant information to draw off of, we can at best salvage as much data as we can and build a new structure. Rebuilding an extended attribute structure consists of these three steps: First, we walk the existing attributes to salvage as many of them as we can, by adding them as new attributes attached to the repair tempfile. We need to add a new xfile-based data structure to hold blobs of arbitrary length to stage the xattr names and values. Second, we write the salvaged attributes to a temporary file, and use atomic extent swaps to exchange the entire attribute fork between the two files. Finally, we reap the old xattr blocks (which are now in the temporary file) as carefully as we can. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-xattrs-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: create an xattr iteration function for scrub xfs: flag empty xattr leaf blocks for optimization xfs: scrub should set preen if attr leaf has holes xfs: repair extended attributes xfs: use atomic extent swapping to fix user file fork data xfs: create a blob array data structure xfs: enable discarding of folios backing an xfile
2024-04-16	Merge tag 'dirattr-validate-owners-6.10_2024-04-15' of ↵	Chandan Babu R
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA xfs: set and validate dir/attr block owners There are a couple of significant changes that need to be made to the directory and xattr code before we can support online repairs of those data structures. The first change is because online repair is designed to use libxfs to create a replacement dir/xattr structure in a temporary file, and use atomic extent swapping to commit the corrected structure. To avoid the performance hit of walking every block of the new structure to rewrite the owner number before the swap, we instead change libxfs to allow callers of the dir and xattr code the ability to set an explicit owner number to be written into the header fields of any new blocks that are created. For regular operation this will be the directory inode number. The second change is to update the dir/xattr code to actually check the owner number in each block that is read off the disk, since we don't currently do that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'dirattr-validate-owners-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: validate explicit directory free block owners xfs: validate explicit directory block buffer owners xfs: validate explicit directory data buffer owners xfs: validate directory leaf buffer owners xfs: validate dabtree node buffer owners xfs: validate attr remote value buffer owners xfs: validate attr leaf buffer owners xfs: reduce indenting in xfs_attr_node_list xfs: use the xfs_da_args owner field to set new dir/attr block owner xfs: add an explicit owner field to xfs_da_args
2024-04-16	Merge tag 'repair-rtsummary-6.10_2024-04-15' of ↵	Chandan Babu R
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA xfs: online repair of realtime summaries We now have all the infrastructure we need to repair file metadata. We'll begin with the realtime summary file, because it is the least complex data structure. To support this we need to add three more pieces to the temporary file code from the previous patchset -- preallocating space in the temp file, formatting metadata into that space and writing the blocks to disk, and swapping the fork mappings atomically. After that, the actual reconstruction of the realtime summary information is pretty simple, since we can simply write the incore copy computed by the rtsummary scrubber to the temporary file, swap the contents, and reap the old blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-rtsummary-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: online repair of realtime summaries xfs: teach the tempfile to set up atomic file content exchanges xfs: support preallocating and copying content into temporary files
2024-04-16	Merge tag 'repair-tempfiles-6.10_2024-04-15' of ↵	Chandan Babu R
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA xfs: create temporary files for online repair As mentioned earlier, the repair strategy for file-based metadata is to build a new copy in a temporary file and swap the file fork mappings with the metadata inode. We've built the atomic extent swap facility, so now we need to build a facility for handling private temporary files. The first step is to teach the filesystem to ignore the temporary files. We'll mark them as PRIVATE in the VFS so that the kernel security modules will leave it alone. The second step is to add the online repair code the ability to create a temporary file and reap extents from the temporary file after the extent swap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'repair-tempfiles-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: add the ability to reap entire inode forks xfs: refactor live buffer invalidation for repairs xfs: create temporary files and directories for online repair xfs: hide private inodes from bulkstat and handle functions
2024-04-16	Merge tag 'atomic-file-updates-6.10_2024-04-15' of ↵	Chandan Babu R
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA xfs: atomic file content exchanges This series creates a new XFS_IOC_EXCHANGE_RANGE ioctl to exchange ranges of bytes between two files atomically. This new functionality enables data storage programs to stage and commit file updates such that reader programs will see either the old contents or the new contents in their entirety, with no chance of torn writes. A successful call completion guarantees that the new contents will be seen even if the system fails. The ability to exchange file fork mappings between files in this manner is critical to supporting online filesystem repair, which is built upon the strategy of constructing a clean copy of a damaged structure and committing the new structure into the metadata file atomically. The ioctls exist to facilitate testing of the new functionality and to enable future application program designs. User programs will be able to update files atomically by opening an O_TMPFILE, reflinking the source file to it, making whatever updates they want to make, and exchange the relevant ranges of the temp file with the original file. If the updates are aligned with the file block size, a new (since v2) flag provides for exchanging only the written areas. Note that application software must quiesce writes to the file while it stages an atomic update. This will be addressed by a subsequent series. This mechanism solves the clunkiness of two existing atomic file update mechanisms: for O_TRUNC + rewrite, this eliminates the brief period where other programs can see an empty file. For create tempfile + rename, the need to copy file attributes and extended attributes for each file update is eliminated. However, this method introduces its own awkwardness -- any program initiating an exchange now needs to have a way to signal to other programs that the file contents have changed. For file access mediated via read and write, fanotify or inotify are probably sufficient. For mmaped files, that may not be fast enough. Here is the proposed manual page: IOCTL-XFS-EXCHANGE-RANGE(2System Calls ManuIOCTL-XFS-EXCHANGE-RANGE(2) NAME ioctl_xfs_exchange_range - exchange the contents of parts of two files SYNOPSIS #include <sys/ioctl.h> #include <xfs/xfs_fs.h> int ioctl(int file2_fd, XFS_IOC_EXCHANGE_RANGE, struct xfs_ex‐ change_range arg); DESCRIPTION Given a range of bytes in a first file file1_fd and a second range of bytes in a second file file2_fd, this ioctl(2) ex‐ changes the contents of the two ranges. Exchanges are atomic with regards to concurrent file opera‐ tions. Implementations must guarantee that readers see either the old contents or the new contents in their entirety, even if the system fails. The system call parameters are conveyed in structures of the following form: struct xfs_exchange_range { __s32 file1_fd; __u32 pad; __u64 file1_offset; __u64 file2_offset; __u64 length; __u64 flags; }; The field pad must be zero. The fields file1_fd, file1_offset, and length define the first range of bytes to be exchanged. The fields file2_fd, file2_offset, and length define the second range of bytes to be exchanged. Both files must be from the same filesystem mount. If the two file descriptors represent the same file, the byte ranges must not overlap. Most disk-based filesystems require that the starts of both ranges must be aligned to the file block size. If this is the case, the ends of the ranges must also be so aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set. The field flags control the behavior of the exchange operation. XFS_EXCHANGE_RANGE_TO_EOF Ignore the length parameter. All bytes in file1_fd from file1_offset to EOF are moved to file2_fd, and file2's size is set to (file2_offset+(file1_length- file1_offset)). Meanwhile, all bytes in file2 from file2_offset to EOF are moved to file1 and file1's size is set to (file1_offset+(file2_length- file2_offset)). XFS_EXCHANGE_RANGE_DSYNC Ensure that all modified in-core data in both file ranges and all metadata updates pertaining to the exchange operation are flushed to persistent storage before the call returns. Opening either file de‐ scriptor with O_SYNC or O_DSYNC will have the same effect. XFS_EXCHANGE_RANGE_FILE1_WRITTEN Only exchange sub-ranges of file1_fd that are known to contain data written by application software. Each sub-range may be expanded (both upwards and downwards) to align with the file allocation unit. For files on the data device, this is one filesystem block. For files on the realtime device, this is the realtime extent size. This facility can be used to implement fast atomic scatter-gather writes of any complexity for software-defined storage targets if all writes are aligned to the file allocation unit. XFS_EXCHANGE_RANGE_DRY_RUN Check the parameters and the feasibility of the op‐ eration, but do not change anything. RETURN VALUE On error, -1 is returned, and errno is set to indicate the er‐ ror. ERRORS Error codes can be one of, but are not limited to, the follow‐ ing: EBADF file1_fd is not open for reading and writing or is open for append-only writes; or file2_fd is not open for reading and writing or is open for append-only writes. EINVAL The parameters are not correct for these files. This error can also appear if either file descriptor repre‐ sents a device, FIFO, or socket. Disk filesystems gen‐ erally require the offset and length arguments to be aligned to the fundamental block sizes of both files. EIO An I/O error occurred. EISDIR One of the files is a directory. ENOMEM The kernel was unable to allocate sufficient memory to perform the operation. ENOSPC There is not enough free space in the filesystem ex‐ change the contents safely. EOPNOTSUPP The filesystem does not support exchanging bytes between the two files. EPERM file1_fd or file2_fd are immutable. ETXTBSY One of the files is a swap file. EUCLEAN The filesystem is corrupt. EXDEV file1_fd and file2_fd are not on the same mounted filesystem. CONFORMING TO This API is XFS-specific. USE CASES Several use cases are imagined for this system call. In all cases, application software must coordinate updates to the file because the exchange is performed unconditionally. The first is a data storage program that wants to commit non- contiguous updates to a file atomically and coordinates write access to that file. This can be done by creating a temporary file, calling FICLONE(2) to share the contents, and staging the updates into the temporary file. The FULL_FILES flag is recom‐ mended for this purpose. The temporary file can be deleted or punched out afterwards. An example program might look like this: int fd = open("/some/file", O_RDWR); int temp_fd = open("/some", O_TMPFILE \| O_RDWR); ioctl(temp_fd, FICLONE, fd); / append 1MB of records / lseek(temp_fd, 0, SEEK_END); write(temp_fd, data1, 1000000); / update record index / pwrite(temp_fd, data1, 600, 98765); pwrite(temp_fd, data2, 320, 54321); pwrite(temp_fd, data2, 15, 0); / commit the entire update / struct xfs_exchange_range args = { .file1_fd = temp_fd, .flags = XFS_EXCHANGE_RANGE_TO_EOF, }; ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args); The second is a software-defined storage host (e.g. a disk jukebox) which implements an atomic scatter-gather write com‐ mand. Provided the exported disk's logical block size matches the file's allocation unit size, this can be done by creating a temporary file and writing the data at the appropriate offsets. It is recommended that the temporary file be truncated to the size of the regular file before any writes are staged to the temporary file to avoid issues with zeroing during EOF exten‐ sion. Use this call with the FILE1_WRITTEN flag to exchange only the file allocation units involved in the emulated de‐ vice's write command. The temporary file should be truncated or punched out completely before being reused to stage another write. An example program might look like this: int fd = open("/some/file", O_RDWR); int temp_fd = open("/some", O_TMPFILE \| O_RDWR); struct stat sb; int blksz; fstat(fd, &sb); blksz = sb.st_blksize; / land scatter gather writes between 100fsb and 500fsb / pwrite(temp_fd, data1, blksz 2, blksz * 100); pwrite(temp_fd, data2, blksz * 20, blksz * 480); pwrite(temp_fd, data3, blksz * 7, blksz * 257); /* commit the entire update / struct xfs_exchange_range args = { .file1_fd = temp_fd, .file1_offset = blksz 100, .file2_offset = blksz * 100, .length = blksz * 400, .flags = XFS_EXCHANGE_RANGE_FILE1_WRITTEN \| XFS_EXCHANGE_RANGE_FILE1_DSYNC, }; ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args); NOTES Some filesystems may limit the amount of data or the number of extents that can be exchanged in a single call. SEE ALSO ioctl(2) XFS 2024-02-10 IOCTL-XFS-EXCHANGE-RANGE(2) The reference implementation in XFS creates a new log incompat feature and log intent items to track high level progress of swapping ranges of two files and finish interrupted work if the system goes down. Sample code can be found in the corresponding changes to xfs_io to exercise the use case mentioned above. Note that this function is /not/ the O_DIRECT atomic untorn file writes concept that has also been floating around for years. It is also not the RWF_ATOMIC patchset that has been shared. This RFC is constructed entirely in software, which means that there are no limitations other than the general filesystem limits. As a side note, the original motivation behind the kernel functionality is online repair of file-based metadata. The atomic file content exchange is implemented as an atomic exchange of file fork mappings, which means that we can implement online reconstruction of extended attributes and directories by building a new one in another inode and exchanging the contents. Subsequent patchsets adapt the online filesystem repair code to use atomic file exchanges. This enables repair functions to construct a clean copy of a directory, xattr information, symbolic links, realtime bitmaps, and realtime summary information in a temporary inode. If this completes successfully, the new contents can be committed atomically into the inode being repaired. This is essential to avoid making corruption problems worse if the system goes down in the middle of running repair. For userspace, this series also includes the userspace pieces needed to test the new functionality, and a sample implementation of atomic file updates. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'atomic-file-updates-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: enable logged file mapping exchange feature docs: update swapext -> exchmaps language xfs: capture inode generation numbers in the ondisk exchmaps log item xfs: support non-power-of-two rtextsize with exchange-range xfs: make file range exchange support realtime files xfs: condense symbolic links after a mapping exchange operation xfs: condense directories after a mapping exchange operation xfs: condense extended attributes after a mapping exchange operation xfs: add error injection to test file mapping exchange recovery xfs: bind together the front and back ends of the file range exchange code xfs: create deferred log items for file mapping exchanges xfs: introduce a file mapping exchange log intent item xfs: create a incompat flag for atomic file mapping exchanges xfs: introduce new file range exchange ioctl vfs: export remap and write check helpers
2024-04-16	Merge tag 'file-exchange-refactorings-6.10_2024-04-15' of ↵	Chandan Babu R
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA xfs: refactorings for atomic file content exchanges This series applies various cleanups and refactorings to file IO handling code ahead of the main series to implement atomic file content exchanges. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'file-exchange-refactorings-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: constify xfs_bmap_is_written_extent xfs: refactor non-power-of-two alignment checks xfs: hoist multi-fsb allocation unit detection to a helper xfs: create a new helper to return a file's allocation unit xfs: declare xfs_file.c symbols in xfs_file.h xfs: move xfs_iops.c declarations out of xfs_inode.h xfs: move inode lease breaking functions to xfs_inode.c
2024-04-16	Merge tag 'log-incompat-permissions-6.10_2024-04-15' of ↵	Chandan Babu R
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA xfs: improve log incompat feature handling This patchset improves the performance of log incompat feature bit handling by making a few changes to how the filesystem handles them. First, we now only clear the bits during a clean unmount to reduce calls to the (expensive) upgrade function to once per bit per mount. Second, we now only allow incompat feature upgrades for sysadmins or if the sysadmin explicitly allows it via mount option. Currently the only log incompat user is logged xattrs, which requires CONFIG_XFS_DEBUG=y, so there should be no user visible impact to this change. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> * tag 'log-incompat-permissions-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: only clear log incompat flags at clean unmount xfs: fix error bailout in xrep_abt_build_new_trees xfs: fix potential AGI <-> ILOCK ABBA deadlock in xrep_dinode_findmode_walk_directory xfs: fix an AGI lock acquisition ordering problem in xrep_dinode_findmode xfs: pass xfs_buf lookup flags to xfs_*read_agi
2024-04-15	xfs: create an xattr iteration function for scrub	Darrick J. Wong
	Create a streamlined function to walk a file's xattrs, without all the cursor management stuff in the regular listxattr. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: flag empty xattr leaf blocks for optimization	Darrick J. Wong
	Empty xattr leaf blocks at offset zero are a waste of space but otherwise harmless. If we encounter one, flag it as an opportunity for optimization. If we encounter empty attr leaf blocks anywhere else in the attr fork, that's corruption. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: scrub should set preen if attr leaf has holes	Darrick J. Wong
	If an attr block indicates that it could use compaction, set the preen flag to have the attr fork rebuilt, since the attr fork rebuilder can take care of that for us. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: repair extended attributes	Darrick J. Wong
	If the extended attributes look bad, try to sift through the rubble to find whatever keys/values we can, stage a new attribute structure in a temporary file and use the atomic extent swapping mechanism to commit the results in bulk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: use atomic extent swapping to fix user file fork data	Darrick J. Wong
	Build on the code that was recently added to the temporary repair file code so that we can atomically switch the contents of any file fork, even if the fork is in local format. The upcoming functions to repair xattrs, directories, and symlinks will need that capability. Repair can lock out access to these user files by holding IOLOCK_EXCL on these user files. Therefore, it is safe to drop the ILOCK of both the file being repaired and the tempfile being used for staging, and cancel the scrub transaction. We do this so that we can reuse the resource estimation and transaction allocation functions used by a regular file exchange operation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: create a blob array data structure	Darrick J. Wong
	Create a simple 'blob array' data structure for storage of arbitrarily sized metadata objects that will be used to reconstruct metadata. For the intended usage (temporarily storing extended attribute names and values) we only have to support storing objects and retrieving them. Use the xfile abstraction to store the attribute information in memory that can be swapped out. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: enable discarding of folios backing an xfile	Darrick J. Wong
	Create a new xfile function to discard the page cache that's backing part of an xfile. The next patch wil use this to drop parts of an xfile that aren't needed anymore. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: validate explicit directory free block owners	Darrick J. Wong
	Port the existing directory freespace block header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: validate explicit directory block buffer owners	Darrick J. Wong
	Port the existing directory block header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: validate explicit directory data buffer owners	Darrick J. Wong
	Port the existing directory data header checking function to accept an owner number instead of an xfs_inode, then update the callsites to use xfs_da_args.owner when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: validate directory leaf buffer owners	Darrick J. Wong
	Check the owner field of directory leaf blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: validate dabtree node buffer owners	Darrick J. Wong
	Check the owner field of dabtree node blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: validate attr remote value buffer owners	Darrick J. Wong
	Check the owner field of xattr remote value blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: validate attr leaf buffer owners	Darrick J. Wong
	Create a leaf block header checking function to validate the owner field of xattr leaf blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: reduce indenting in xfs_attr_node_list	Darrick J. Wong
	Reduce the indentation here so that we can add some things in the next patch without going over the column limits. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: use the xfs_da_args owner field to set new dir/attr block owner	Darrick J. Wong
	When we're creating leaf, data, freespace, or dabtree blocks for directories and xattrs, use the explicit owner field (instead of the xfs_inode) to set the owner field. This will enable online repair to construct replacement data structures in a temporary file without having to change the owner fields prior to swapping the new and old structures. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: add an explicit owner field to xfs_da_args	Darrick J. Wong
	Add an explicit owner field to xfs_da_args, which will make it easier for online fsck to set the owner field of the temporary directory and xattr structures that it builds to repair damaged metadata. Note: I hopefully found all the xfs_da_args definitions by looking for automatic stack variable declarations and xfs_da_args.dp assignments: git grep -E '(args.dp =\|struct xfs_da_args[[:space:]][a-z0-9][a-z0-9]*)' Note that callers of xfs_attr_{get,set,change} can set the owner to zero (or leave it unset) to have the default set to args->dp. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: online repair of realtime summaries	Darrick J. Wong
	Repair the realtime summary data by constructing a new rtsummary file in the scrub temporary file, then atomically swapping the contents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: add the ability to reap entire inode forks	Darrick J. Wong
	In preparation for supporting repair of indexed file-based metadata (such as realtime bitmaps, directories, and extended attribute data), add a function to reap the old blocks after a metadata repair finishes. IOWs, this is an elaborate bunmapi call that deals with crosslinked blocks by unmapping them without freeing them, and also scans for incore buffers to invalidate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: teach the tempfile to set up atomic file content exchanges	Darrick J. Wong
	Create some new routines to exchange the contents of a temporary file created to stage a repair with another ondisk file. This will be used by the realtime summary repair function to commit atomically the new rtsummary data, which will be staged in the tempfile. The rest of XFS coordinates access to the realtime metadata inodes solely through the ILOCK. For repair to hold its exclusive access to the realtime summary file, it has to allocate a single large transaction and roll it repeatedly throughout the repair while holding the ILOCK. In turn, this means that for now there's only a partial file mapping exchange implementation for the temporary file because we can only work within an existing transaction. For now, the only tempswap functions needed here are to estimate the resource requirements of the exchange, reserve more space/quota to an existing transaction, and kick off the actual exchange. The rest will be added in a later patch in preparation for repairing xattrs and directories. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: support preallocating and copying content into temporary files	Darrick J. Wong
	Create the routines we need to preallocate space in a temporary ondisk file and then copy the contents of an xfile into the tempfile. The upcoming rtsummary repair feature will construct the contents of a realtime summary file in memory, after which it will want to copy all that into the ondisk temporary file before atomically committing the new rtsummary contents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: refactor live buffer invalidation for repairs	Darrick J. Wong
	In an upcoming patch, we will need to be able to look for xfs_buf objects caching file-based metadata blocks without needing to walk the (possibly corrupt) structures to find all the buffers. Repair already has most of the code needed to scan the buffer cache, so hoist these utility functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: create temporary files and directories for online repair	Darrick J. Wong
	Teach the online repair code how to create temporary files or directories. These temporary files can be used to stage reconstructed information until we're ready to perform an atomic extent swap to commit the new metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: hide private inodes from bulkstat and handle functions	Darrick J. Wong
	We're about to start adding functionality that uses internal inodes that are private to XFS. What this means is that userspace should never be able to access any information about these files, and should not be able to open these files by handle. To prevent users from ever finding the file or mis-interactions with the security apparatus, set S_PRIVATE on the inode. Don't allow bulkstat, open-by-handle, or linking of S_PRIVATE files into the directory tree. This should keep private inodes actually private. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: enable logged file mapping exchange feature	Darrick J. Wong
	Add the XFS_SB_FEAT_INCOMPAT_EXCHRANGE feature to the set of features that we will permit when mounting a filesystem. This turns on support for the file range exchange feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	docs: update swapext -> exchmaps language	Darrick J. Wong
	Start reworking the atomic swapext design documentation to refer to its new file contents/mapping exchange name. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: capture inode generation numbers in the ondisk exchmaps log item	Darrick J. Wong
	Per some very late review comments, capture the generation numbers of both inodes involved in a file content exchange operation so that we don't accidentally target files with have been reallocated. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: support non-power-of-two rtextsize with exchange-range	Darrick J. Wong
	The generic exchange-range alignment checks use (fast) bitmasking operations to perform block alignment checks on the exchange parameters. Unfortunately, bitmasks require that the alignment size be a power of two. This isn't true for realtime devices with a non-power-of-two extent size, so we have to copy-pasta the generic checks using long division for this to work properly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: make file range exchange support realtime files	Darrick J. Wong
	Now that bmap items support the realtime device, we can add the necessary pieces to the file range exchange code to support exchanging mappings. All we really need to do here is adjust the blockcount upwards to the end of the rt extent and remove the inode checks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: condense symbolic links after a mapping exchange operation	Darrick J. Wong
	The previous commit added a new file mapping exchange flag that enables us to perform post-exchange processing on file2 once we're done exchanging the extent mappings. Now add this ability for symlinks. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online symlink repair feature can salvage the remote target in a temporary link and exchange the data fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed symlink down to inline format if possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: condense directories after a mapping exchange operation	Darrick J. Wong
	The previous commit added a new file mapping exchange flag that enables us to perform post-swap processing on file2 once we're done exchanging extent mappings. Now add this ability for directories. This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online directory repair feature can create salvaged dirents in a temporary directory and exchange the data fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed directory down to inline format if possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: condense extended attributes after a mapping exchange operation	Darrick J. Wong
	Add a new file mapping exchange flag that enables us to perform post-exchange processing on file2 once we're done exchanging the extent mappings. If we were swapping mappings between extended attribute forks, we want to be able to convert file2's attr fork from block to inline format. (This implies that all fork contents are exchanged.) This isn't used anywhere right now, but we need to have the basic ondisk flags in place so that a future online xattr repair feature can create salvaged attrs in a temporary file and exchange the attr fork mappings when ready. If one file is in extents format and the other is inline, we will have to promote both to extents format to perform the exchange. After the exchange, we can try to condense the fixed file's attr fork back down to inline format if possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: add error injection to test file mapping exchange recovery	Darrick J. Wong
	Add an errortag so that we can test recovery of exchmaps log items. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: bind together the front and back ends of the file range exchange code	Darrick J. Wong
	So far, we've constructed the front end of the file range exchange code that does all the checking; and the back end of the file mapping exchange code that actually does the work. Glue these two pieces together so that we can turn on the functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: create deferred log items for file mapping exchanges	Darrick J. Wong
	Now that we've created the skeleton of a log intent item to track and restart file mapping exchange operations, add the upper level logic to commit intent items and turn them into concrete work recorded in the log. This builds on the existing bmap update intent items that have been around for a while now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: introduce a file mapping exchange log intent item	Darrick J. Wong
	Introduce a new intent log item to handle exchanging mappings between the forks of two files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: create a incompat flag for atomic file mapping exchanges	Darrick J. Wong
	Create a incompat flag so that we only attempt to process file mapping exchange log items if the filesystem supports it, and a geometry flag to advertise support if it's present. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: introduce new file range exchange ioctl	Darrick J. Wong
	Introduce a new ioctl to handle exchanging ranges of bytes between files. The goal here is to perform the exchange atomically with respect to applications -- either they see the file contents before the exchange or they see that A-B is now B-A, even if the kernel crashes. My original goal with all this code was to make it so that online repair can build a replacement directory or xattr structure in a temporary file and commit the repair by atomically exchanging all the data blocks between the two files. However, I needed a way to test this mechanism thoroughly, so I've been evolving an ioctl interface since then. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	vfs: export remap and write check helpers	Darrick J. Wong
	Export these functions so that the next patch can use them to check the file ranges being passed to the XFS_IOC_EXCHANGE_RANGE operation. Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: constify xfs_bmap_is_written_extent	Darrick J. Wong
	This predicate doesn't modify the structure that's being passed in, so we can mark it const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: refactor non-power-of-two alignment checks	Darrick J. Wong
	Create a helper function that can compute if a 64-bit number is an integer multiple of a 32-bit number, where the 32-bit number is not required to be an even power of two. This is needed for some new code for the realtime device, where we can set 37k allocation units and then have to remap them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: hoist multi-fsb allocation unit detection to a helper	Darrick J. Wong
	Replace the open-coded logic to decide if a file has a multi-fsb allocation unit to a helper to make the code easier to read. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-04-15	xfs: create a new helper to return a file's allocation unit	Darrick J. Wong
	Create a new helper function to calculate the fundamental allocation unit (i.e. the smallest unit of space we can allocate) of a file. Things are going to get hairy with range-exchange on the realtime device, so prepare for this now. Remove the static attribute from xfs_is_falloc_aligned since the next patch will need it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>