summaryrefslogtreecommitdiff
path: root/fs/afs/dir.c
AgeCommit message (Collapse)Author
2025-04-08VFS: rename lookup_one_len family to lookup_noperm and remove permission checkNeilBrown
The lookup_one_len family of functions is (now) only used internally by a filesystem on itself either - in a context where permission checking is irrelevant such as by a virtual filesystem populating itself, or xfs accessing its ORPHANAGE or dquota accessing the quota file; or - in a context where a permission check (MAY_EXEC on the parent) has just been performed such as a network filesystem finding in "silly-rename" file in the same directory. This is also the context after the _parentat() functions where currently lookup_one_qstr_excl() is used. So the permission check is pointless. The name "one_len" is unhelpful in understanding the purpose of these functions and should be changed. Most of the callers pass the len as "strlen()" so using a qstr and QSTR() can simplify the code. This patch renames these functions (include lookup_positive_unlocked() which is part of the family despite the name) to have a name based on "lookup_noperm". They are changed to receive a 'struct qstr' instead of separate name and len. In a few cases the use of QSTR() results in a new call to strlen(). try_lookup_noperm() takes a pointer to a qstr instead of the whole qstr. This is consistent with d_hash_and_lookup() (which is nearly identical) and useful for lookup_noperm_unlocked(). The new lookup_noperm_common() doesn't take a qstr yet. That will be tidied up in a subsequent patch. Signed-off-by: NeilBrown <neil@brown.name> Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-24Merge tag 'vfs-6.15-rc1.afs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs afs updates from Christian Brauner: "This contains the work for afs for this cycle: - Fix an occasional hang that's only really encountered when rmmod'ing the kafs module - Remove the "-o autocell" mount option. This is obsolete with the dynamic root and removing it makes the next patch slightly easier - Change how the dynamic root mount is constructed. Currently, the root directory is (de)populated when it is (un)mounted if there are cells already configured and, further, pairs of automount points have to be created/removed each time a cell is added/deleted This is changed so that readdir on the root dir lists all the known cell automount pairs plus the @cell symlinks and the inodes and dentries are constructed by lookup on demand. This simplifies the cell management code - A few improvements to the afs_volume and afs_server tracepoints - Pass trace info into the afs_lookup_cell() function to allow the trace log to indicate the purpose of the lookup - Remove the 'net' parameter from afs_unuse_cell() as it's superfluous - In rxrpc, allow a kernel app (such as kafs) to store a word of information on rxrpc_peer records - Use the information stored on the rxrpc_peer record to point to the afs_server record. This allows the server address lookup to be done away with - Simplify the afs_server ref/activity accounting to make each one self-contained and not garbage collected from the cell management work item - Simplify the afs_cell ref/activity accounting to make each one of these also self-contained and not driven by a central management work item The current code was intended to make it such that a single timer for the namespace and one work item per cell could do all the work required to maintain these records. This, however, made for some sequencing problems when cleaning up these records. Further, the attempt to pass refs along with timers and work items made getting it right rather tricky when the timer or work item already had a ref attached and now a ref had to be got rid of" * tag 'vfs-6.15-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Simplify cell record handling afs: Fix afs_server ref accounting afs: Use the per-peer app data provided by rxrpc rxrpc: Allow the app to store private data on peer structs afs: Drop the net parameter from afs_unuse_cell() afs: Make afs_lookup_cell() take a trace note afs: Improve server refcount/active count tracing afs: Improve afs_volume tracing to display a debug ID afs: Change dynroot to create contents on demand afs: Remove the "autocell" mount option
2025-03-10afs: Remove the "autocell" mount optionDavid Howells
Remove the "autocell" mount option. It was an attempt to do automounting of arbitrary cells based on what the user looked up but within the root directory of a mounted volume. This isn't really the right thing to do, and using the "dyn" mount option to get the dynamic root is the right way to do it. The kafs-client package uses "-o dyn" when mounting /afs, so it should be safe to drop "-o autocell". Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-7-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-3-dhowells@redhat.com/ # v4
2025-02-27Change inode_operations.mkdir to return struct dentry *NeilBrown
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-30Merge tag 'pull-revalidate' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs d_revalidate updates from Al Viro: "Provide stable parent and name to ->d_revalidate() instances Most of the filesystem methods where we care about dentry name and parent have their stability guaranteed by the callers; ->d_revalidate() is the major exception. It's easy enough for callers to supply stable values for expected name and expected parent of the dentry being validated. That kills quite a bit of boilerplate in ->d_revalidate() instances, along with a bunch of races where they used to access ->d_name without sufficient precautions" * tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: fix ->rename_sem exclusion orangefs_d_revalidate(): use stable parent inode and name passed by caller ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller nfs: fix ->d_revalidate() UAF on ->d_name accesses nfs{,4}_lookup_validate(): use stable parent inode passed by caller gfs2_drevalidate(): use stable parent inode and name passed by caller fuse_dentry_revalidate(): use stable parent inode and name passed by caller vfat_revalidate{,_ci}(): use stable parent inode passed by caller exfat_d_revalidate(): use stable parent inode passed by caller fscrypt_d_revalidate(): use stable parent inode passed by caller ceph_d_revalidate(): propagate stable name down into request encoding ceph_d_revalidate(): use stable parent inode passed by caller afs_d_revalidate(): use stable name and parent inode passed by caller Pass parent directory inode and expected name to ->d_revalidate() generic_ci_d_compare(): use shortname_storage ext4 fast_commit: make use of name_snapshot primitives dissolve external_name.u into separate members make take_dentry_name_snapshot() lockless dcache: back inline names with a struct-wrapped array of unsigned long make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-27afs_d_revalidate(): use stable name and parent inode passed by callerAl Viro
No need to bother with boilerplate for obtaining the latter and for the former we really should not count upon ->d_name.name remaining stable under us. Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27Pass parent directory inode and expected name to ->d_revalidate()Al Viro
->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-12-20afs: Locally initialise the contents of a new symlink on creationDavid Howells
Since we know what the contents of a symlink will be when we create it on the server, initialise its contents locally too to avoid the need to download it. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-31-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Use the contained hashtable to search a directoryDavid Howells
Each directory image contains a hashtable with 128 buckets to speed up searching. Currently, kafs does not use this, but rather iterates over all the occupied slots in the image as it can share this with readdir. Switch kafs to use the hashtable for lookups to reduce the latency. Care must be taken that the hash chains are acyclic. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-30-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Make afs_mkdir() locally initialise a new directory's contentDavid Howells
Initialise a new directory's content when it is created by mkdir locally rather than downloading the content from the server as we can predict what it's going to look like. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-29-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20netfs: Change the read result collector to only use one work itemDavid Howells
Change the way netfslib collects read results to do all the collection for a particular read request using a single work item that walks along the subrequest queue as subrequests make progress or complete, unlocking folios progressively rather than doing the unlock in parallel as parallel requests come in. The code is remodelled to be more like the write-side code, though only using a single stream. This makes it more directly comparable and thus easier to duplicate fixes between the two sides. This has a number of advantages: (1) It's simpler. There doesn't need to be a complex donation mechanism to handle mismatches between the size and alignment of subrequests and folios. The collector unlocks folios as the subrequests covering each complete. (2) It should cause less scheduler overhead as there's a single work item in play unlocking pages in parallel when a read gets split up into a lot of subrequests instead of one per subrequest. Whilst the parallellism is nice in theory, in practice, the vast majority of loads are sequential reads of the whole file, so committing a bunch of threads to unlocking folios out of order doesn't help in those cases. (3) It should make it easier to implement content decryption. A folio cannot be decrypted until all the requests that contribute to it have completed - and, again, most loads are sequential and so, most of the time, we want to begin decryption sequentially (though it's great if the decryption can happen in parallel). There is a disadvantage in that we're losing the ability to decrypt and unlock things on an as-things-arrive basis which may affect some applications. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-28-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Use netfslib for directoriesDavid Howells
In the AFS ecosystem, directories are just a special type of file that is downloaded and parsed locally. Download is done by the same mechanism as ordinary files and the data can be cached. There is one important semantic restriction on directories over files: the client must download the entire directory in one go because, for example, the server could fabricate the contents of the blob on the fly with each download and give a different image each time. So that we can cache the directory download, switch AFS directory support over to using the netfslib single-object API, thereby allowing directory content to be stored in the local cache. To make this work, the following changes are made: (1) A directory's contents are now stored in a folio_queue chain attached to the afs_vnode (inode) struct rather than its associated pagecache, though multipage folios are still used to hold the data. The folio queue is discarded when the directory inode is evicted. This also helps with the phasing out of ITER_XARRAY. (2) Various directory operations are made to use and unuse the cache cookie. (3) The content checking, content dumping and content iteration are now performed with a standard iov_iter iterator over the contents of the folio queue. (4) Iteration and modification must be done with the vnode's validate_lock held. In conjunction with (1), this means that the iteration can be done without the need to lock pages or take extra refs on them, unlike when accessing ->i_pages. (5) Convert to using netfs_read_single() to read data. (6) Provide a ->writepages() to call netfs_writeback_single() to save the data to the cache according to the VM's scheduling whilst holding the validate_lock read-locked as (4). (7) Change local directory image editing functions: (a) Provide a function to get a specific block by number from the folio_queue as we can no longer use the i_pages xarray to locate folios by index. This uses a cursor to remember the current position as we need to iterate through the directory contents. The block is kmapped before being returned. (b) Make the function in (a) extend the directory by an extra folio if we run out of space. (c) Raise the check of the block free space counter, for those blocks that have one, higher in the function to eliminate a call to get a block. (d) Remove the page unlocking and putting done during the editing loops. This is no longer necessary as the folio_queue holds the references and the pages are no longer in the pagecache. (e) Mark the inode dirty and pin the cache usage till writeback at the end of a successful edit. (8) Don't set the large_folios flag on the inode as we do the allocation ourselves rather than the VM doing it automatically. (9) Mark the inode as being a single object that isn't uploaded to the server. (10) Enable caching on directories. (11) Only set the upload key for writeback for regular files. Notes: (*) We keep the ->release_folio(), ->invalidate_folio() and ->migrate_folio() ops as we set the mapping pointer on the folio. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-22-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Add more tracepoints to do with tracking validityDavid Howells
Add wrappers to set and clear the callback promise and to mark a directory as invalidated, and add tracepoints to track these events: (1) afs_cb_promise: Log when a callback promise is set on a vnode. (2) afs_vnode_invalid: Log when the server's callback promise for a vnode is no longer valid and we need to refetch the vnode metadata. (3) afs_dir_invalid: Log when the contents of a directory are marked invalid and requiring refetching from the server and the cache invalidating. and two tracepoints to record data version number management: (4) afs_set_dv: Log when the DV is recorded on a vnode. (5) afs_dv_mismatch: Log when the DV recorded on a vnode plus the expected delta for the operation does not match the DV we got back from the server. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-18-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTYDavid Howells
AFS servers pass back a code indicating EEXIST when they're asked to remove a directory that is not empty rather than ENOTEMPTY because not all the systems that an AFS server can run on have the latter error available and AFS preexisted the addition of that error in general. Fix afs_rmdir() to translate EEXIST to ENOTEMPTY. Fixes: 260a980317da ("[AFS]: Add "directory write" support.") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-13-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24afs: Fix missing subdir edit when renamed between parent dirsDavid Howells
When rename moves an AFS subdirectory between parent directories, the subdir also needs a bit of editing: the ".." entry needs updating to point to the new parent (though I don't make use of the info) and the DV needs incrementing by 1 to reflect the change of content. The server also sends a callback break notification on the subdirectory if we have one, but we can take care of recovering the promise next time we access the subdir. This can be triggered by something like: mount -t afs %example.com:xfstest.test20 /xfstest.test/ mkdir /xfstest.test/{aaa,bbb,aaa/ccc} touch /xfstest.test/bbb/ccc/d mv /xfstest.test/{aaa/ccc,bbb/ccc} touch /xfstest.test/bbb/ccc/e When the pathwalk for the second touch hits "ccc", kafs spots that the DV is incorrect and downloads it again (so the fix is not critical). Fix this, if the rename target is a directory and the old and new parents are different, by: (1) Incrementing the DV number of the target locally. (2) Editing the ".." entry in the target to refer to its new parent's vnode ID and uniquifier. Link: https://lore.kernel.org/r/3340431.1729680010@warthog.procyon.org.uk Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") cc: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-03afs: drop usage of folio_file_posKairui Song
folio_file_pos is only needed for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use folio_pos instead. It can't be a swap cache page here. Swap mapping may only call into fs through swap_rw and that is not supported for afs. So just drop it and use folio_pos instead. Link: https://lkml.kernel.org/r/20240521175854.96038-6-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: David Howells <dhowells@redhat.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-14afs: Revert "afs: Hide silly-rename files from userspace"David Howells
This reverts commit 57e9d49c54528c49b8bffe6d99d782ea051ea534. This undoes the hiding of .__afsXXXX silly-rename files. The problem with hiding them is that rm can't then manually delete them. This also reverts commit 5f7a07646655fb4108da527565dcdc80124b14c4 ("afs: Fix endless loop in directory parsing") as that's a bugfix for the above. Fixes: 57e9d49c5452 ("afs: Hide silly-rename files from userspace") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008102.html Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/3085695.1710328121@warthog.procyon.org.uk Reviewed-by: Jeffrey E Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-27afs: Fix endless loop in directory parsingDavid Howells
If a directory has a block with only ".__afsXXXX" files in it (from uncompleted silly-rename), these .__afsXXXX files are skipped but without advancing the file position in the dir_context. This leads to afs_dir_iterate() repeating the block again and again. Fix this by making the code that skips the .__afsXXXX file also manually advance the file position. The symptoms are a soft lookup: watchdog: BUG: soft lockup - CPU#3 stuck for 52s! [check:5737] ... RIP: 0010:afs_dir_iterate_block+0x39/0x1fd ... ? watchdog_timer_fn+0x1a6/0x213 ... ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? afs_dir_iterate_block+0x39/0x1fd afs_dir_iterate+0x10a/0x148 afs_readdir+0x30/0x4a iterate_dir+0x93/0xd3 __do_sys_getdents64+0x6b/0xd4 This is almost certainly the actual fix for: https://bugzilla.kernel.org/show_bug.cgi?id=218496 Fixes: 57e9d49c5452 ("afs: Hide silly-rename files from userspace") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/786185.1708694102@warthog.procyon.org.uk Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-22afs: Fix error handling with lookup via FS.InlineBulkStatusDavid Howells
When afs does a lookup, it tries to use FS.InlineBulkStatus to preemptively look up a bunch of files in the parent directory and cache this locally, on the basis that we might want to look at them too (for example if someone does an ls on a directory, they may want want to then stat every file listed). FS.InlineBulkStatus can be considered a compound op with the normal abort code applying to the compound as a whole. Each status fetch within the compound is then given its own individual abort code - but assuming no error that prevents the bulk fetch from returning the compound result will be 0, even if all the constituent status fetches failed. At the conclusion of afs_do_lookup(), we should use the abort code from the appropriate status to determine the error to return, if any - but instead it is assumed that we were successful if the op as a whole succeeded and we return an incompletely initialised inode, resulting in ENOENT, no matter the actual reason. In the particular instance reported, a vnode with no permission granted to be accessed is being given a UAEACCES abort code which should be reported as EACCES, but is instead being reported as ENOENT. Fix this by abandoning the inode (which will be cleaned up with the op) if file[1] has an abort code indicated and turn that abort code into an error instead. Whilst we're at it, add a tracepoint so that the abort codes of the individual subrequests of FS.InlineBulkStatus can be logged. At the moment only the container abort code can be 0. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-22afs: Hide silly-rename files from userspaceDavid Howells
There appears to be a race between silly-rename files being created/removed and various userspace tools iterating over the contents of a directory, leading to such errors as: find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory tar: ./include/linux/greybus/.__afs3C95: File removed before we read it when building a kernel. Fix afs_readdir() so that it doesn't return .__afsXXXX silly-rename files to userspace. This doesn't stop them being looked up directly by name as we need to be able to look them up from within the kernel as part of the silly-rename algorithm. Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-22afs: Don't use certain unnecessary folio_*() functionsDavid Howells
Filesystems should use folio->index and folio->mapping, instead of folio_index(folio), folio_mapping() and folio_file_mapping() since they know that it's in the pagecache. Change this automagically with: perl -p -i -e 's/folio_mapping[(]([^)]*)[)]/\1->mapping/g' fs/afs/*.c perl -p -i -e 's/folio_file_mapping[(]([^)]*)[)]/\1->mapping/g' fs/afs/*.c perl -p -i -e 's/folio_index[(]([^)]*)[)]/\1->index/g' fs/afs/*.c Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
2024-01-01afs: Overhaul invalidation handling to better support RO volumesDavid Howells
Overhaul the third party-induced invalidation handling, making use of the previously added volume-level event counters (cb_scrub and cb_ro_snapshot) that are now being parsed out of the VolSync record returned by the fileserver in many of its replies. This allows better handling of RO (and Backup) volumes. Since these are snapshot of a RW volume that are updated atomically simultantanously across all servers that host them, they only require a single callback promise for the entire volume. The currently upstream code assumes that RO volumes operate in the same manner as RW volumes, and that each file has its own individual callback - which means that it does a status fetch for *every* file in a RO volume, whether or not the volume got "released" (volume callback breaks can occur for other reasons too, such as the volumeserver taking ownership of a volume from a fileserver). To this end, make the following changes: (1) Change the meaning of the volume's cb_v_break counter so that it is now a hint that we need to issue a status fetch to work out the state of a volume. cb_v_break is incremented by volume break callbacks and by server initialisation callbacks. (2) Add a second counter, cb_v_check, to the afs_volume struct such that if this differs from cb_v_break, we need to do a check. When the check is complete, cb_v_check is advanced to what cb_v_break was at the start of the status fetch. (3) Move the list of mmap'd vnodes to the volume and trigger removal of PTEs that map to files on a volume break rather than on a server break. (4) When a server reinitialisation callback comes in, use the server-to-volume reverse mapping added in a preceding patch to iterate over all the volumes using that server and clear the volume callback promises for that server and the general volume promise as a whole to trigger reanalysis. (5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE (TIME64_MIN) value in the cb_expires_at field, reducing the number of checks we need to make. (6) Change afs_check_validity() to quickly see if various event counters have been incremented or if the vnode or volume callback promise is due to expire/has expired without making any changes to the state. That is now left to afs_validate() as this may get more complicated in future as we may have to examine server records too. (7) Overhaul afs_validate() so that it does a single status fetch if we need to check the state of either the vnode or the volume - and do so under appropriate locking. The function does the following steps: (A) If the vnode/volume is no longer seen as valid, then we take the vnode validation lock and, if the volume promise has expired, the volume check lock also. The latter prevents redundant checks being made to find out if a new version of the volume got released. (B) If a previous RPC call found that the volsync changed unexpectedly or that a RO volume was updated, then we unmap all PTEs pointing to the file to stop mmap being used for access. (C) If the vnode is still seen to be of uncertain validity, then we perform an FS.FetchStatus RPC op to jointly update the volume status and the vnode status. This assessment is done as part of parsing the reply: If the RO volume creation timestamp advances, cb_ro_snapshot is incremented; if either the creation or update timestamps changes in an unexpected way, the cb_scrub counter is incremented If the Data Version returned doesn't match the copy we have locally, then we ask for the pagecache to be zapped. This takes care of handling RO update. (D) If cb_scrub differs between volume and vnode, the vnode's pagecache is zapped and the vnode's cb_scrub is updated unless the file is marked as having been deleted. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Fix comment in afs_do_lookup()David Howells
Fix the comment in afs_do_lookup() that says that slot 0 is used for the fid being looked up and slot 1 is used for the directory. It's actually done the other way round. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-12-24afs: Simplify error handlingDavid Howells
Simplify error handling a bit by moving it from the afs_addr_cursor struct to the afs_operation and afs_vl_cursor structs and using the error prioritisation function for accumulating errors from multiple sources (AFS tries to rotate between multiple fileservers, some of which may be inaccessible or in some state of offlinedness). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-12-24afs: Wrap most op->error accesses with inline funcsDavid Howells
Wrap most op->error accesses with inline funcs which will make it easier for a subsequent patch to replace op->error with something else. Two functions are added to this end: (1) afs_op_error() - Get the error code. (2) afs_op_set_error() - Set the error code. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-06-07afs: Fix setting of mtime when creating a file/dir/symlinkDavid Howells
kafs incorrectly passes a zero mtime (ie. 1st Jan 1970) to the server when creating a file, dir or symlink because the mtime recorded in the afs_operation struct gets passed to the server by the marshalling routines, but the afs_mkdir(), afs_create() and afs_symlink() functions don't set it. This gets masked if a file or directory is subsequently modified. Fix this by filling in op->mtime before calling the create op. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-02afs: Avoid endless loop if file is larger than expectedMarc Dionne
afs_read_dir fetches an amount of data that's based on what the inode size is thought to be. If the file on the server is larger than what was fetched, the code rechecks i_size and retries. If the local i_size was not properly updated, this can lead to an endless loop of fetching i_size from the server and noticing each time that the size is larger on the server. If it is known that the remote size is larger than i_size, bump up the fetch size to that size. Fixes: f3ddee8dc4e2 ("afs: Fix directory handling") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org
2023-04-05mm: return an ERR_PTR from __filemap_get_folioChristoph Hellwig
Instead of returning NULL for all errors, distinguish between: - no entry found and not asked to allocated (-ENOENT) - failed to allocate memory (-ENOMEM) - would block (-EAGAIN) so that callers don't have to guess the error based on the passed in flags. Also pass through the error through the direct callers: filemap_get_folio, filemap_lock_folio filemap_grab_folio and filemap_get_incore_folio. [hch@lst.de: fix null-pointer deref] Link: https://lkml.kernel.org/r/20230310070023.GA13563@lst.de Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004 Link: https://lkml.kernel.org/r/20230307143410.28031-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2] Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mkdir() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->symlink() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->create() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-12-22afs: Stop implementing ->writepage()David Howells
We're trying to get rid of the ->writepage() hook[1]. Stop afs from using it by unlocking the page and calling afs_writepages_region() rather than folio_write_one(). A flag is passed to afs_writepages_region() to indicate that it should only write a single region so that we don't flush the entire file in ->write_begin(), but do add other dirty data to the region being written to try and reduce the number of RPC ops. This requires ->migrate_folio() to be implemented, so point that at filemap_migrate_folio() for files and also for symlinks and directories. This can be tested by turning on the afs_folio_dirty tracepoint and then doing something like: xfs_io -c "w 2223 7000" -c "w 15000 22222" -c "w 23 7" /afs/my/test/foo and then looking in the trace to see if the write at position 15000 gets stored before page 0 gets dirtied for the write at position 23. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Christoph Hellwig <hch@lst.de> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20221113162902.883850-1-hch@lst.de/ [1] Link: https://lore.kernel.org/r/166876785552.222254.4403222906022558715.stgit@warthog.procyon.org.uk/ # v1
2022-11-25use less confusing names for iov_iter direction initializersAl Viro
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-17Change calling conventions for filldir_tAl Viro
filldir_t instances (directory iterators callbacks) used to return 0 for "OK, keep going" or -E... for "stop". Note that it's *NOT* how the error values are reported - the rules for those are callback-dependent and ->iterate{,_shared}() instances only care about zero vs. non-zero (look at emit_dir() and friends). So let's just return bool ("should we keep going?") - it's less confusing that way. The choice between "true means keep going" and "true means stop" is bikesheddable; we have two groups of callbacks - do something for everything in directory, until we run into problem and find an entry in directory and do something to it. The former tended to use 0/-E... conventions - -E<something> on failure. The latter tended to use 0/1, 1 being "stop, we are done". The callers treated anything non-zero as "stop", ignoring which non-zero value did they get. "true means stop" would be more natural for the second group; "true means keep going" - for the first one. I tried both variants and the things like if allocation failed something = -ENOMEM; return true; just looked unnatural and asking for trouble. [folded suggestion from Matthew Wilcox <willy@infradead.org>] Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-06-09netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_contextDavid Howells
While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> cc: Jonathan Corbet <corbet@lwn.net> cc: Eric Van Hensbergen <ericvh@gmail.com> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <smfrench@gmail.com> cc: William Kucharski <william.kucharski@oracle.com> cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> cc: Dave Chinner <david@fromorbit.com> cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-01afs: Fix infinite loop found by xfstest generic/676David Howells
In AFS, a directory is handled as a file that the client downloads and parses locally for the purposes of performing lookup and getdents operations. The in-kernel afs filesystem has a number of functions that do this. A directory file is arranged as a series of 2K blocks divided into 32-byte slots, where a directory entry occupies one or more slots, plus each block starts with one or more metadata blocks. When parsing a block, if the last slots are occupied by a dirent that occupies more than a single slot and the file position points at a slot that's not the initial one, the logic in afs_dir_iterate_block() that skips over it won't advance the file pointer to the end of it. This will cause an infinite loop in getdents() as it will keep retrying that block and failing to advance beyond the final entry. Fix this by advancing the file pointer if the next entry will be beyond it when we skip a block. This was found by the generic/676 xfstest but can also be triggered with something like: ~/xfstests-dev/src/t_readdir_3 /xfstest.test/z 4000 1 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Tested-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lore.kernel.org/r/165391973497.110268.2939296942213894166.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-09afs: Convert to release_folioMatthew Wilcox (Oracle)
A straightforward conversion as they already work in terms of folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-03-15afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio()Matthew Wilcox (Oracle)
This is a trivial change. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2022-03-15afs: Convert directory aops to invalidate_folioMatthew Wilcox (Oracle)
Use folio->index instead of folio_index() because there's no way we're writing a page from the swapcache to a directory. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
2021-11-10afs: Use folios in directory handlingDavid Howells
Convert the AFS directory handling code to use folios. With these changes, afs passes -g quick xfstests. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing@auristor.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/162877312172.3085614.992850861791211206.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162981154845.1901565.2078707403143240098.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005746215.2472992.8321380998443828308.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163584190457.4023316.10544419117563104940.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/CAH2r5mtECQA6K_OGgU=_G8qLY3G-6-jo1odVyF9EK+O2-EWLFg@mail.gmail.com/ # v3 Link: https://lore.kernel.org/r/163649330345.309189.11182522282723655658.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163657854055.834781.5800946340537517009.stgit@warthog.procyon.org.uk/ # v5
2021-09-13afs: Fix incorrect triggering of sillyrename on 3rd-party invalidationDavid Howells
The AFS filesystem is currently triggering the silly-rename cleanup from afs_d_revalidate() when it sees that a dentry has been changed by a third party[1]. It should not be doing this as the cleanup includes deleting the silly-rename target file on iput. Fix this by removing the places in the d_revalidate handling that validate anything other than the directory and the dirent. It probably should not be looking to validate the target inode of the dentry also. This includes removing the point in afs_d_revalidate() where the inode that a dentry used to point to was marked as being deleted (AFS_VNODE_DELETED). We don't know it got deleted. It could have been renamed or it could have hard links remaining. This was reproduced by cloning a git repo onto an afs volume on one machine, switching to another machine and doing "git status", then switching back to the first and doing "git status". The second status would show weird output due to ".git/index" getting deleted by the above mentioned mechanism. A simpler way to do it is to do: machine 1: touch a machine 2: touch b; mv -f b a machine 1: stat a on an afs volume. The bug shows up as the stat failing with ENOENT and the file server log showing that machine 1 deleted "a". Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename") Reported-by: Markus Suvanto <markus.suvanto@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=214217#c4 [1] Link: https://lore.kernel.org/r/163111668100.283156.3851669884664475428.stgit@warthog.procyon.org.uk/
2021-09-13afs: Add missing vnode validation checksDavid Howells
afs_d_revalidate() should only be validating the directory entry it is given and the directory to which that belongs; it shouldn't be validating the inode/vnode to which that dentry points. Besides, validation need to be done even if we don't call afs_d_revalidate() - which might be the case if we're starting from a file descriptor. In order for afs_d_revalidate() to be fixed, validation points must be added in some other places. Certain directory operations, such as afs_unlink(), already check this, but not all and not all file operations either. Note that the validation of a vnode not only checks to see if the attributes we have are correct, but also gets a promise from the server to notify us if that file gets changed by a third party. Add the following checks: - Check the vnode we're going to make a hard link to. - Check the vnode we're going to move/rename. - Check the vnode we're going to read from. - Check the vnode we're going to write to. - Check the vnode we're going to sync. - Check the vnode we're going to make a mapped page writable for. Some of these aren't strictly necessary as we're going to perform a server operation that might get the attributes anyway from which we can determine if something changed - though it might not get us a callback promise. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/163111667354.283156.12720698333342917516.stgit@warthog.procyon.org.uk/
2021-07-21afs: Remove redundant assignment to retJiapeng Chong
Variable ret is set to -ENOENT and -ENOMEM but this value is never read as it is overwritten or not used later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: fs/afs/dir.c:2014:4: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]. fs/afs/dir.c:659:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores]. [DH made the following modifications: - In afs_rename(), -ENOMEM should be placed in op->error instead of ret, rather than the assignment being removed entirely. afs_put_operation() will pick it up from there and return it. - If afs_sillyrename() fails, its error code should be placed in op->error rather than in ret also. ] Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/1619691492-83866-1-git-send-email-jiapeng.chong@linux.alibaba.com Link: https://lore.kernel.org/r/162609465444.3133237.7562832521724298900.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/162610729052.3408253.17364333638838151299.stgit@warthog.procyon.org.uk/ # v2
2021-05-27afs: Fix the nlink handling of dir-over-dir renameDavid Howells
Fix rename of one directory over another such that the nlink on the deleted directory is cleared to 0 rather than being decremented to 1. This was causing the generic/035 xfstest to fail. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/162194384460.3999479.7605572278074191079.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-01afs: Fix speculative status fetchesDavid Howells
The generic/464 xfstest causes kAFS to emit occasional warnings of the form: kAFS: vnode modified {100055:8a} 30->31 YFS.StoreData64 (c=6015) This indicates that the data version received back from the server did not match the expected value (the DV should be incremented monotonically for each individual modification op committed to a vnode). What is happening is that a lookup call is doing a bulk status fetch speculatively on a bunch of vnodes in a directory besides getting the status of the vnode it's actually interested in. This is racing with a StoreData operation (though it could also occur with, say, a MakeDir op). On the client, a modification operation locks the vnode, but the bulk status fetch only locks the parent directory, so no ordering is imposed there (thereby avoiding an avenue to deadlock). On the server, the StoreData op handler doesn't lock the vnode until it's received all the request data, and downgrades the lock after committing the data until it has finished sending change notifications to other clients - which allows the status fetch to occur before it has finished. This means that: - a status fetch can access the target vnode either side of the exclusive section of the modification - the status fetch could start before the modification, yet finish after, and vice-versa. - the status fetch and the modification RPCs can complete in either order. - the status fetch can return either the before or the after DV from the modification. - the status fetch might regress the locally cached DV. Some of these are handled by the previous fix[1], but that's not sufficient because it checks the DV it received against the DV it cached at the start of the op, but the DV might've been updated in the meantime by a locally generated modification op. Fix this by the following means: (1) Keep track of when we're performing a modification operation on a vnode. This is done by marking vnode parameters with a 'modification' note that causes the AFS_VNODE_MODIFYING flag to be set on the vnode for the duration. (2) Alter the speculation race detection to ignore speculative status fetches if either the vnode is marked as being modified or the data version number is not what we expected. Note that whilst the "vnode modified" warning does get recovered from as it causes the client to refetch the status at the next opportunity, it will also invalidate the pagecache, so changes might get lost. Fixes: a9e5c87ca744 ("afs: Fix speculative status fetch going out of order wrt to modifications") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-and-reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/160605082531.252452.14708077925602709042.stgit@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/linux-fsdevel/161961335926.39335.2552653972195467566.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-23afs: Prepare for use of THPsDavid Howells
As a prelude to supporting transparent huge pages, use thp_size() and similar rather than PAGE_SIZE/SHIFT. Further, try and frame everything in terms of file positions and lengths rather than page indices and numbers of pages. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588540227.3465195.4752143929716269062.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118155821.1232039.540445038028845740.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161051439.2537118.15577827510426326534.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340415869.1303470.6040191748634322355.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539559365.286939.18344613540296085269.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653815142.2770958.454490670311230206.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789098713.6155.16394227991842480300.stgit@warthog.procyon.org.uk/ # v6
2021-04-23afs: Set up the iov_iter before calling afs_extract_data()David Howells
afs_extract_data() sets up a temporary iov_iter and passes it to AF_RXRPC each time it is called to describe the remaining buffer to be filled. Instead: (1) Put an iterator in the afs_call struct. (2) Set the iterator for each marshalling stage to load data into the appropriate places. A number of convenience functions are provided to this end (eg. afs_extract_to_buf()). This iterator is then passed to afs_extract_data(). (3) Use the new ITER_XARRAY iterator when reading data to load directly into the inode's pages without needing to create a list of them. This will allow O_DIRECT calls to be supported in future patches. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/152898380012.11616.12094591785228251717.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/153685394431.14766.3178466345696987059.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/153999787395.866.11218209749223643998.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/154033911195.12041.3882700371848894587.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/158861250059.340223.1248231474865140653.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465827399.1377938.11181327349704960046.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588533776.3465195.3612752083351956948.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118151238.1232039.17015723405750601161.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161047240.2537118.14721975104810564022.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340410333.1303470.16260122230371140878.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539554187.286939.15305559004905459852.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653810525.2770958.4630666029125411789.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789093719.6155.7877160739235087723.stgit@warthog.procyon.org.uk/ # v6
2021-04-23afs: Move key to afs_read structDavid Howells
Stash the key used to authenticate read operations in the afs_read struct. This will be necessary to reissue the operation against the server if a read from the cache fails in upcoming cache changes. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/158861248336.340223.1851189950710196001.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465823899.1377938.11925978022348532049.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588529557.3465195.7303323479305254243.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118147693.1232039.13780672951838643842.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161043340.2537118.511899217704140722.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340406678.1303470.12676824086429446370.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539550819.286939.1268332875889175195.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653806683.2770958.11300984379283401542.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789089556.6155.14603302893431820997.stgit@warthog.procyon.org.uk/ # v6
2021-03-15afs: Stop listxattr() from listing "afs.*" attributesDavid Howells
afs_listxattr() lists all the available special afs xattrs (i.e. those in the "afs.*" space), no matter what type of server we're dealing with. But OpenAFS servers, for example, cannot deal with some of the extra-capable attributes that AuriStor (YFS) servers provide. Unfortunately, the presence of the afs.yfs.* attributes causes errors[1] for anything that tries to read them if the server is of the wrong type. Fix the problem by removing afs_listxattr() so that none of the special xattrs are listed (AFS doesn't support xattrs). It does mean, however, that getfattr won't list them, though they can still be accessed with getxattr() and setxattr(). This can be tested with something like: getfattr -d -m ".*" /afs/example.com/path/to/file With this change, none of the afs.* attributes should be visible. Changes: ver #2: - Hide all of the afs.* xattrs, not just the ACL ones. Fixes: ae46578b963f ("afs: Get YFS ACLs and information through xattrs") Reported-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003502.html [1] Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003567.html # v1 Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003573.html # v2