summaryrefslogtreecommitdiff
path: root/fs/xfs/xfs_iops.c
AgeCommit message (Collapse)Author
2021-01-24xfs: support idmapped mountsChristoph Hellwig
Enable idmapped mounts for xfs. This basically just means passing down the user_namespace argument from the VFS methods down to where it is passed to the relevant helpers. Note that full-filesystem bulkstat is not supported from inside idmapped mounts as it is an administrative operation that acts on the whole file system. The limitation is not applied to the bulkstat single operation that just operates on a single inode. Link: https://lore.kernel.org/r/20210121131959.646623-40-christian.brauner@ubuntu.com Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24fs: make helpers idmap mount awareChristian Brauner
Extend some inode methods with an additional user namespace argument. A filesystem that is aware of idmapped mounts will receive the user namespace the mount has been marked with. This can be used for additional permission checking and also to enable filesystems to translate between uids and gids if they need to. We have implemented all relevant helpers in earlier patches. As requested we simply extend the exisiting inode method instead of introducing new ones. This is a little more code churn but it's mostly mechanical and doesnt't leave us with additional inode methods. Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24acl: handle idmapped mountsChristian Brauner
The posix acl permission checking helpers determine whether a caller is privileged over an inode according to the acls associated with the inode. Add helpers that make it possible to handle acls on idmapped mounts. The vfs and the filesystems targeted by this first iteration make use of posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to translate basic posix access and default permissions such as the ACL_USER and ACL_GROUP type according to the initial user namespace (or the superblock's user namespace) to and from the caller's current user namespace. Adapt these two helpers to handle idmapped mounts whereby we either map from or into the mount's user namespace depending on in which direction we're translating. Similarly, cap_convert_nscap() is used by the vfs to translate user namespace and non-user namespace aware filesystem capabilities from the superblock's user namespace to the caller's user namespace. Enable it to handle idmapped mounts by accounting for the mount's user namespace. In addition the fileystems targeted in the first iteration of this patch series make use of the posix_acl_chmod() and, posix_acl_update_mode() helpers. Both helpers perform permission checks on the target inode. Let them handle idmapped mounts. These two helpers are called when posix acls are set by the respective filesystems to handle this case we extend the ->set() method to take an additional user namespace argument to pass the mount's user namespace down. Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24attr: handle idmapped mountsChristian Brauner
When file attributes are changed most filesystems rely on the setattr_prepare(), setattr_copy(), and notify_change() helpers for initialization and permission checking. Let them handle idmapped mounts. If the inode is accessed through an idmapped mount map it into the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Helpers that perform checks on the ia_uid and ia_gid fields in struct iattr assume that ia_uid and ia_gid are intended values and have already been mapped correctly at the userspace-kernelspace boundary as we already do today. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com Cc: Christoph Hellwig <hch@lst.de> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-22xfs: Fix assert failure in xfs_setattr_size()Yumei Huang
An assert failure is triggered by syzkaller test due to ATTR_KILL_PRIV is not cleared before xfs_setattr_size. As ATTR_KILL_PRIV is not checked/used by xfs_setattr_size, just remove it from the assert. Signed-off-by: Yumei Huang <yuhuang@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2020-12-12xfs: open code updating i_mode in xfs_set_aclChristoph Hellwig
Rather than going through the big and hairy xfs_setattr_nonsize function, just open code a transactional i_mode and i_ctime update. This allows to mark xfs_setattr_nonsize and remove the flags argument to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-12-12xfs: remove xfs_vn_setattr_nonsizeChristoph Hellwig
Merge xfs_vn_setattr_nonsize into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-12-09xfs: remove unnecessary null check in xfs_generic_createKaixu Xia
The function posix_acl_release() test the passed-in argument and move on only when it is non-null, so maybe the null check in xfs_generic_create is unnecessary. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-11-04xfs: flush new eof page on truncate to avoid post-eof corruptionBrian Foster
It is possible to expose non-zeroed post-EOF data in XFS if the new EOF page is dirty, backed by an unwritten block and the truncate happens to race with writeback. iomap_truncate_page() will not zero the post-EOF portion of the page if the underlying block is unwritten. The subsequent call to truncate_setsize() will, but doesn't dirty the page. Therefore, if writeback happens to complete after iomap_truncate_page() (so it still sees the unwritten block) but before truncate_setsize(), the cached page becomes inconsistent with the on-disk block. A mapped read after the associated page is reclaimed or invalidated exposes non-zero post-EOF data. For example, consider the following sequence when run on a kernel modified to explicitly flush the new EOF page within the race window: $ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file $ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file ... $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: 00 00 00 00 00 00 00 00 ........ $ umount /mnt/; mount <dev> /mnt/ $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: cd cd cd cd cd cd cd cd ........ Update xfs_setattr_size() to explicitly flush the new EOF page prior to the page truncate to ensure iomap has the latest state of the underlying block. Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-09-25xfs: directly call xfs_generic_create() for ->create() and ->mkdir()Kaixu Xia
The current create and mkdir handlers both call the xfs_vn_mknod() which is a wrapper routine around xfs_generic_create() function. Actually the create and mkdir handlers can directly call xfs_generic_create() function and reduce the call chain. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-06-09mmap locking API: convert mmap_sem commentsMichel Lespinasse
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-05Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of bug fixes and cleanups for ext4, including: - Fix performance problems found in dioread_nolock now that it is the default, caused by transaction leaks. - Clean up fiemap handling in ext4 - Clean up and refactor multiple block allocator (mballoc) code - Fix a problem with mballoc with a smaller file systems running out of blocks because they couldn't properly use blocks that had been reserved by inode preallocation. - Fixed a race in ext4_sync_parent() versus rename() - Simplify the error handling in the extent manipulation code - Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers. - Avoid passing an error pointer to brelse in ext4_xattr_set() - Fix race which could result to freeing an inode on the dirty last in data=journal mode. - Fix refcount handling if ext4_iget() fails - Fix a crash in generic/019 caused by a corrupted extent node" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits) ext4: avoid unnecessary transaction starts during writeback ext4: don't block for O_DIRECT if IOCB_NOWAIT is set ext4: remove the access_ok() check in ext4_ioctl_get_es_cache fs: remove the access_ok() check in ioctl_fiemap fs: handle FIEMAP_FLAG_SYNC in fiemap_prep fs: move fiemap range validation into the file systems instances iomap: fix the iomap_fiemap prototype fs: move the fiemap definitions out of fs.h fs: mark __generic_block_fiemap static ext4: remove the call to fiemap_check_flags in ext4_fiemap ext4: split _ext4_fiemap ext4: fix fiemap size checks for bitmap files ext4: fix EXT4_MAX_LOGICAL_BLOCK macro add comment for ext4_dir_entry_2 file_type member jbd2: avoid leaking transaction credits when unreserving handle ext4: drop ext4_journal_free_reserved() ext4: mballoc: use lock for checking free blocks while retrying ext4: mballoc: refactor ext4_mb_good_group() ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling ext4: mballoc: refactor ext4_mb_discard_preallocations() ...
2020-06-03fs: move the fiemap definitions out of fs.hChristoph Hellwig
No need to pull the fiemap definitions into almost every file in the kernel build. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-05-19xfs: move the per-fork nextents fields into struct xfs_iforkChristoph Hellwig
There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()Ira Weiny
The functionality in xfs_diflags_to_linux() and xfs_diflags_to_iflags() are nearly identical. The only difference is that *_to_linux() is called after inode setup and disallows changing the DAX flag. Combining them can be done with a flag which indicates if this is the initial setup to allow the DAX flag to be properly set only at init time. So remove xfs_diflags_to_linux() and call the modified xfs_diflags_to_iflags() directly. While we are here simplify xfs_diflags_to_iflags() to take struct xfs_inode and use xfs_ip2xflags() to ensure future diflags are included correctly. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04fs/xfs: Create function xfs_inode_should_enable_dax()Ira Weiny
xfs_inode_supports_dax() should reflect if the inode can support DAX not that it is enabled for DAX. Change the use of xfs_inode_supports_dax() to reflect only if the inode and underlying storage support dax. Add a new function xfs_inode_should_enable_dax() which reflects if the inode should be enabled for DAX. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYSIra Weiny
In prep for the new tri-state mount option which then introduces XFS_MOUNT_DAX_NEVER. Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04xfs: combine two if statements with same conditionKaixu Xia
The two if statements have same condition, and the mask value does not change in xfs_setattr_nonsize(), so combine them. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19xfs: remove the di_version field from struct icdinodeChristoph Hellwig
We know the version is 3 if on a v5 file system. For earlier file systems formats we always upgrade the remaining v1 inodes to v2 and thus only use v2 inodes. Use the xfs_sb_version_has_large_dinode helper to check if we deal with small or large dinodes, and thus remove the need for the di_version field in struct icdinode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: clean up the attr flag confusionChristoph Hellwig
The ATTR_* flags have a long IRIX history, where they a userspace interface, the on-disk format and an internal interface. We've split out the on-disk interface to the XFS_ATTR_* values, but despite (or because?) of that the flag have still been a mess. Switch the internal interface to pass the on-disk XFS_ATTR_* flags for the namespace and the Linux XATTR_* flags for the actual flags instead. The ATTR_* values that are actually used are move to xfs_fs.h with a new XFS_IOC_* prefix to not conflict with the userspace version that has the same name and must have the same value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: pass an initialized xfs_da_args structure to xfs_attr_setChristoph Hellwig
Instead of converting from one style of arguments to another in xfs_attr_set, pass the structure from higher up in the call chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: remove the icdinode di_uid/di_gid membersChristoph Hellwig
Use the Linux inode i_uid/i_gid members everywhere and just convert from/to the scalar value when reading or writing the on-disk inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02xfs: ensure that the inode uid/gid match values match the icdinode onesChristoph Hellwig
Instead of only synchronizing the uid/gid values in xfs_setup_inode, ensure that they always match to prepare for removing the icdinode fields. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-09xfs: Remove all strlen in all xfs_attr_* functions for attr names.Allison Henderson
This helps to pre-simplify the extra handling of the null terminator in delayed operations which use memcpy rather than strlen. Later when we introduce parent pointers, attribute names will become binary, so strlen will not work at all. Removing uses of strlen now will help reduce complexities later Signed-off-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13xfs: merge the projid fields in struct xfs_icdinodeChristoph Hellwig
There is no point in splitting the fields like this in an purely in-memory structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13xfs: use a struct timespec64 for the in-core crtimeChristoph Hellwig
struct xfs_icdinode is purely an in-memory data structure, so don't use a log on-disk structure for it. This simplifies the code a bit, and also reduces our include hell slightly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix a minor indenting problem in xfs_trans_ichgtime] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13xfs: convert open coded corruption check to use XFS_IS_CORRUPTDarrick J. Wong
Convert the last of the open coded corruption check and report idioms to use the XFS_IS_CORRUPT macro. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-10xfs: remove the now unused dir ops infrastructureChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-10xfs: move the node header size to struct xfs_da_geometryChristoph Hellwig
Move the node header size field to struct xfs_da_geometry, and remove the now unused non-directory dir ops infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-04xfs: always log corruption errorsDarrick J. Wong
Make sure we log something to dmesg whenever we return -EFSCORRUPTED up the call stack. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-10-29xfs: reverse the polarity of XFS_MOUNT_COMPAT_IOSIZEChristoph Hellwig
Replace XFS_MOUNT_COMPAT_IOSIZE with an inverted XFS_MOUNT_LARGEIO flag that makes the usage more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-29xfs: rename the XFS_MOUNT_DFLT_IOSIZE option toChristoph Hellwig
Make the flag match the mount option and usage. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-29xfs: rename the m_writeio_* fields in struct xfs_mountChristoph Hellwig
Use the allocsize name to match the mount option and usage instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-29xfs: remove the m_readio_* fields in struct xfs_mountChristoph Hellwig
m_readio_blocks is entirely unused, and m_readio_blocks is only used in xfs_stat_blksize in a max statements that is a no-op as it always has the same value as m_writeio_log. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-29xfs: cleanup calculating the stat optimal I/O sizeChristoph Hellwig
Move xfs_preferred_iosize to xfs_iops.c, unobsfucate it and also handle the realtime special case in the helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-28xfs: add a xfs_inode_buftarg helperChristoph Hellwig
Add a new xfs_inode_buftarg helper that gets the data I/O buftarg for a given inode. Replace the existing xfs_find_bdev_for_inode and xfs_find_daxdev_for_inode helpers with this new general one and cleanup some of the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: split the iomap ops for buffered vs direct writesChristoph Hellwig
Instead of lots of magic conditionals in the main write_begin handler this make the intent very clear. Thing will become even better once we support delayed allocations for extent size hints and realtime allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: split out a new set of read-only iomap opsChristoph Hellwig
Start untangling xfs_file_iomap_begin by splitting out the read-only case into its own set of iomap_ops with a very simply iomap_begin helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-08-22xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOTDarrick J. Wong
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp fails on account of being out of disk quota. I ran his reproducer script: # adduser dummy # adduser dummy plugdev # dd if=/dev/zero bs=1M count=100 of=test.img # mkfs.xfs test.img # mount -t xfs -o gquota test.img /mnt # mkdir -p /mnt/dummy # chown -c dummy /mnt/dummy # xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt (and then as user dummy) $ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo $ chgrp plugdev /mnt/dummy/foo and saw: ================================================ WARNING: lock held when returning to user space! 5.3.0-rc5 #rc5 Tainted: G W ------------------------------------------------ chgrp/47006 is leaving the kernel with locks still held! 1 lock held by chgrp/47006: #0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs] ...which is clearly caused by xfs_setattr_nonsize failing to unlock the ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing unlock. Reported-by: benjamin.moody@gmail.com Fixes: 253f4911f297 ("xfs: better xfs_trans_alloc interface") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org>
2019-06-28xfs: remove unused header filesEric Sandeen
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-03-01xfs: fix reporting supported extra file attributes for statx()Luis R. Rodriguez
statx(2) notes that any attribute that is not indicated as supported by stx_attributes_mask has no usable value. Commit 5f955f26f3d42d ("xfs: report crtime and attribute flags to statx") added support for informing userspace of extra file attributes but forgot to list these flags as supported making reporting them rather useless for the pedantic userspace author. $ git describe --contains 5f955f26f3d42d04aba65590a32eb70eedb7f37d v4.11-rc6~5^2^2~2 Fixes: 5f955f26f3d42d ("xfs: report crtime and attribute flags to statx") Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: add a comment reminding people to keep attributes_mask up to date] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-02-14xfs: don't ever put nlink > 0 inodes on the unlinked listDarrick J. Wong
When XFS creates an O_TMPFILE file, the inode is created with nlink = 1, put on the unlinked list, and then the VFS sets nlink = 0 in d_tmpfile. If we crash before anything logs the inode (it's dirty incore but the vfs doesn't tell us it's dirty so we never log that change), the iunlink processing part of recovery will then explode with a pile of: XFS: Assertion failed: VFS_I(ip)->i_nlink == 0, file: fs/xfs/xfs_log_recover.c, line: 5072 Worse yet, since nlink is nonzero, the inodes also don't get cleaned up and they just leak until the next xfs_repair run. Therefore, change xfs_iunlink to require that inodes being put on the unlinked list have nlink == 0, change the tmpfile callers to instantiate nodes that way, and set the nlink to 1 just prior to calling d_tmpfile. Fix the comment for xfs_iunlink while we're at it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-09-29xfs: don't crash the vfs on a garbage inline symlinkDarrick J. Wong
The VFS routine that calls ->get_link blindly copies whatever's returned into the user's buffer. If we return a NULL pointer, the vfs will crash on the null pointer. Therefore, return -EFSCORRUPTED instead of blowing up the kernel. [dgc: clean up with hch's suggestions] Reported-by: wen.xu@gatech.edu Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-08-14Merge tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "This is the second part of the XFS changes for 4.19. The biggest changes are the removal of buffer heads frm XFS, a massive reworking of the deferred transaction operations handling code, the removal of the long defunct barrier/nobarrier mount options, and the addition of a few more online repair functions. Summary: - Use extent maps to track pagecache page status instead of bufferhead state. - Refactor pagecache read and write paths to use the new iomap library functions, which enable us to drop the old bufferhead code for pagesize == blocksize filesystems. - Set up parallel per-block-per-page metadata to track subpage information that was tracked by buffer heads, which enables us to drop the old bufferhead code for pagesize > blocksize filesystems. - Tie a deferred ops control structure to a transaction so that we can take advantage of an upper-level dfops without having to plumb pointer passing through the code. - Refactor the deferred ops code to track deferred ops as part of the transaction structure (instead of as a separate data structure) so that we can simplify the scoping rules around defer_ops. - Refactor twisty delwri buffer submission code to avoid deadlocks. - Shorten and fix indenting problems in the scrub code. - Detect obviously bad summary counts at mount and fix them. - Directly associate deferred ops control structure with a transaction so that callers no longer have to manage it themselves. - Remove a couple of IRIX-era inode macros. - Remove the long-deprecated 'barrier' and 'nobarrier' mount options. - Clean up the inode fork structure a bit. - Check for bad fs summary counter values in the superblock. - Reduce COW fork lookups during writeback. - Refactor the deferred ops control structures into the transaction structure, thereby eliminating the need for transaction users to handle the deferred ops as a separate data structure. - Add the ability to repair AG headers online. - Fix a crash due to insufficient return value checking. - Various fixes and cleanups" * tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (155 commits) xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree xfs: remove b_last_holder & associated macros iomap: Switch to offset_in_page for clarity xfs: Close race between direct IO and xfs_break_layouts() xfs: repair the AGI xfs: repair the AGFL xfs: repair the AGF xfs: remove dead error handling code in xfs_dquot_disk_alloc() xfs: use WRITE_ONCE to update if_seq xfs: fix a comment in xfs_log_reserve xfs: only validate summary counts on primary superblock xfs: substitute spaces with tabs xfs: fold dfops into the transaction xfs: always defer agfl block frees xfs: pass transaction to xfs_defer_add() xfs: replace xfs_defer_ops ->dop_pending with on-stack list xfs: cancel dfops on xfs_defer_finish() error xfs: clean out superfluous dfops dop params/vars xfs: drop dop param from xfs_defer_op_type ->finish_item() callback xfs: automatic dfops inode relogging ...
2018-08-03new helper: inode_fake_hash()Al Viro
open-coded in a quite a few places... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-26xfs: clean up IRELE/iput callsitesDarrick J. Wong
Replace the IRELE macro with a proper function so that we can do proper typechecking and so that we can stop open-coding iput in scrub, which means that we'll be able to ftrace inode lifetimes going through scrub correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-07-26xfs: remove all boilerplate defer init/finish codeBrian Foster
At this point, the transaction subsystem completely manages deferred items internally such that the common and boilerplate xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() -> xfs_trans_commit() sequence can be replaced with a simple transaction allocation and commit. Remove all such boilerplate deferred ops code. In doing so, we change each case over to use the dfops in the transaction and specifically eliminate: - The on-stack dfops and associated xfs_defer_init() call, as the internal dfops is initialized on transaction allocation. - xfs_bmap_finish() calls that precede a final xfs_trans_commit() of a transaction. - xfs_defer_cancel() calls in error handlers that precede a transaction cancel. The only deferred ops calls that remain are those that are non-deterministic with respect to the final commit of the associated transaction or are open-coded due to special handling. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-26xfs: pull up dfops from xfs_itruncate_extents()Brian Foster
xfs_itruncate_extents[_flags]() uses a local dfops with a transaction provided by the caller. It uses hacky ->t_dfops replacement logic to avoid stomping over an already populated ->t_dfops. The latter never occurs for current callers and the logic itself is not really appropriate. Clean this up by updating all callers to initialize a dfops and to use that down in xfs_itruncate_extents(). This more closely resembles the upcoming logic where dfops will be embedded within the transaction. We can also replace the xfs_defer_init() in the xfs_itruncate_extents_flags() loop with an assert. Both dfops and firstblock should be in a valid state after xfs_defer_finish() and the inode joined to the dfops is fixed throughout the loop. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-12Merge tag 'xfs-4.18-merge-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "Here's the second round of patches for XFS for 4.18. Most of the commits are small cleanups, bug fixes, and continued strengthening of metadata verifiers; the bulk of the diff is the conversion of the fs/xfs/ tree to use SPDX tags. This series has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Summary: - Strengthen metadata checking to avoid ASSERTing on bad disk contents - Validate btree records that are being retrieved for clients - Strengthen root inode verification - Convert license blurbs to SPDX tags - Enable changing DAX flag on directories - Fix some writeback deadlocks in reflink - Refactor out some old xfs helpers - Move type verifiers to a separate file - Fix some fuzzer crashes - Various other bug fixes" * tag 'xfs-4.18-merge-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits) xfs: update incore per-AG inode count xfs: replace do_mod with native operations xfs: don't call xfs_da_shrink_inode with NULL bp xfs: clean up MIN/MAX xfs: move various type verifiers to common file xfs: xfs_reflink_convert_cow() memory allocation deadlock xfs: setup VFS i_rwsem lockdep state correctly xfs: fix string handling in label get/set functions xfs: convert to SPDX license tags xfs: validate btree records on retrieval xfs: push corruption -> ESTALE conversion to xfs_nfs_get_inode() xfs: verify root inode more thoroughly xfs: verify COW extent size hint is valid in inode verifier xfs: verify extent size hint is valid in inode verifier xfs: catch bad stripe alignment configurations iomap: fsync swap files before iterating mappings xfs: use xfs_trans_getsb in xfs_sync_sb_buf xfs: don't assert on corrupted unlinked inode list xfs: explicitly pass buffer size to xfs_corruption_error xfs: don't assert when on-disk btree pointers are garbage ...