summaryrefslogtreecommitdiff
path: root/include/trace/events/afs.h
AgeCommit message (Collapse)Author
2025-04-14rxrpc: Display security params in the afs_cb_call tracepointDavid Howells
Make the afs_cb_call tracepoint display some security parameters to make debugging easier. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250411095303.2316168-12-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-10afs: Simplify cell record handlingDavid Howells
Simplify afs_cell record handling to avoid very occasional races that cause module removal to hang (it waits for all cell records to be removed). There are two things that particularly contribute to the difficulty: firstly, the code tries to pass a ref on the cell to the cell's maintenance work item (which gets awkward if the work item is already queued); and, secondly, there's an overall cell manager that tries to use just one timer for the entire cell collection (to avoid having loads of timers). However, both of these are probably unnecessarily restrictive. To simplify this, the following changes are made: (1) The cell record collection manager is removed. Each cell record manages itself individually. (2) Each afs_cell is given a second work item (cell->destroyer) that is queued when its refcount reaches zero. This is not done in the context of the putting thread as it might be in an inconvenient place to sleep. (3) Each afs_cell is given its own timer. The timer is used to expire the cell record after a period of unuse if not otherwise pinned and can also be used for other maintenance tasks if necessary (of which there are currently none as DNS refresh is triggered by filesystem operations). (4) The afs_cell manager work item (cell->manager) is no longer given a ref on the cell when queued; rather, the manager must be deleted. This does away with the need to deal with the consequences of losing a race to queue cell->manager. Clean up of extra queuing is deferred to the destroyer. (5) The cell destroyer work item makes sure the cell timer is removed and that the normal cell work is cancelled before farming the actual destruction off to RCU. (6) When a network namespace is destroyed or the kafs module is unloaded, it's now a simple matter of marking the namespace as dead then just waking up all the cell work items. They will then remove and destroy themselves once all remaining activity counts and/or a ref counts are dropped. This makes sure that all server records are dropped first. (7) The cell record state set is reduced to just four states: SETTING_UP, ACTIVE, REMOVING and DEAD. The record persists in the active state even when it's not being used until the time comes to remove it rather than downgrading it to an inactive state from whence it can be restored. This means that the cell still appears in /proc and /afs when not in use until it switches to the REMOVING state - at which point it is removed. Note that the REMOVING state is included so that someone wanting to resurrect the cell record is forced to wait whilst the cell is torn down in that state. Once it's in the DEAD state, it has been removed from net->cells tree and is no longer findable and can be replaced. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-16-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-12-dhowells@redhat.com/ # v4
2025-03-10afs: Fix afs_server ref accountingDavid Howells
The current way that afs_server refs are accounted and cleaned up sometimes cause rmmod to hang when it is waiting for cell records to be removed. The problem is that the cell cleanup might occasionally happen before the server cleanup and then there's nothing that causes the cell to garbage-collect the remaining servers as they become inactive. Partially fix this by: (1) Give each afs_server record its own management timer that rather than relying on the cell manager's central timer to drive each individual cell's maintenance work item to garbage collect servers. This timer is set when afs_unuse_server() reduces a server's activity count to zero and will schedule the server's destroyer work item upon firing. (2) Give each afs_server record its own destroyer work item that removes the record from the cell's database, shuts down the timer, cancels any pending work for itself, sends an RPC to the server to cancel outstanding callbacks. This change, in combination with the timer, obviates the need to try and coordinate so closely between the cell record and a bunch of other server records to try and tear everything down in a coordinated fashion. With this, the cell record is pinned until the server RCU is complete and namespace/module removal will wait until all the cell records are removed. (3) Now that incoming calls are mapped to servers (and thus cells) using data attached to an rxrpc_peer, the UUID-to-server mapping tree is moved from the namespace to the cell (cell->fs_servers). This means there can no longer be duplicates therein - and that allows the mapping tree to be simpler as there doesn't need to be a chain of same-UUID servers that are in different cells. (4) The lock protecting the UUID mapping tree is switched to an rw_semaphore on the cell rather than a seqlock on the namespace as it's now only used during mounting in contexts in which we're allowed to sleep. (5) When it comes time for a cell that is being removed to purge its set of servers, it just needs to iterate over them and wake them up. Once a server becomes inactive, its destroyer work item will observe the state of the cell and immediately remove that record. (6) When a server record is removed, it is marked AFS_SERVER_FL_EXPIRED to prevent reattempts at removal. The record will be dispatched to RCU for destruction once its refcount reaches 0. (7) The AFS_SERVER_FL_UNCREATED/CREATING flags are used to synchronise simultaneous creation attempts. If one attempt fails, it will abandon the attempt and allow another to try again. Note that the record can't just be abandoned when dead as it's bound into a server list attached to a volume and only subject to replacement if the server list obtained for the volume from the VLDB changes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-15-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-11-dhowells@redhat.com/ # v4
2025-03-10afs: Use the per-peer app data provided by rxrpcDavid Howells
Make use of the per-peer application data that rxrpc now allows the application to store on the rxrpc_peer struct to hold a back pointer to the afs_server record that peer represents an endpoint for. Then, when a call comes in to the AFS cache manager, this can be used to map it to the correct server record rather than having to use a UUID-to-server mapping table and having to do an additional lookup. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-14-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-10-dhowells@redhat.com/ # v4
2025-03-10afs: Drop the net parameter from afs_unuse_cell()David Howells
Remove the redundant net parameter to afs_unuse_cell() as cell->net can be used instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-12-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-8-dhowells@redhat.com/ # v4
2025-03-10afs: Make afs_lookup_cell() take a trace noteDavid Howells
Pass a note to be added to the afs_cell tracepoint to afs_lookup_cell() so that different callers can be distinguished. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-11-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-7-dhowells@redhat.com/ # v4
2025-03-10afs: Improve server refcount/active count tracingDavid Howells
Improve server refcount/active count tracing to distinguish between simply getting/putting a ref and using/unusing the server record (which changes the activity count as well as the refcount). This makes it a bit easier to work out what's going on. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-10-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-6-dhowells@redhat.com/ # v4
2025-03-10afs: Improve afs_volume tracing to display a debug IDDavid Howells
Improve the tracing of afs_volume objects to include displaying a debug ID so that different instances of volumes with the same "vid" can be distinguished. Also be consistent about displaying the volume's refcount (and not the cell's). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-9-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-5-dhowells@redhat.com/ # v4
2025-03-10afs: Change dynroot to create contents on demandDavid Howells
Change the AFS dynamic root to do things differently: (1) Rather than having the creation of cell records create inodes and dentries for cell mountpoints, create them on demand during lookup. This simplifies cell management and locking as we no longer have to create these objects in advance *and* on speculative lookup by the user for a cell that isn't precreated. (2) Rather than using the libfs dentry-based readdir (the dentries now no longer exist until accessed from (1)), have readdir generate the contents by reading the list of cells. The @cell symlinks get pushed in positions 2 and 3 if rootcell has been configured. (3) Make the @cell symlink dentries persist for the life of the superblock or until reclaimed, but make cell mountpoints disappear immediately if unused. It's not perfect as someone doing an "ls -l /afs" may create a whole bunch of dentries which will be garbage collected immediately. But any dentry that gets automounted will be pinned by the mount, so it shouldn't be too bad. (4) Allocate the inode numbers for the cell mountpoints from an IDR to prevent duplicates appearing in the event it cycles round. The number allocated from the IDR is doubled to provide two inode numbers - one for the normal cell name (RO) and one for the dotted cell name (RW). Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20250224234154.2014840-8-dhowells@redhat.com/ # v1 Link: https://lore.kernel.org/r/20250310094206.801057-4-dhowells@redhat.com/ # v4
2025-02-21afs: Give an afs_server object a ref on the afs_cell object it points toDavid Howells
Give an afs_server object a ref on the afs_cell object it points to so that the cell doesn't get deleted before the server record. Whilst this is circular (cell -> vol -> server_list -> server -> cell), the ref only pins the memory, not the lifetime as that's controlled by the activity counter. When the volume's activity counter reaches 0, it detaches from the cell and discards its server list; when a cell's activity counter reaches 0, it discards its root volume. At that point, the circularity is cut. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250218192250.296870-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20Merge tag 'vfs-6.14-rc1.afs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull afs updates from Christian Brauner: "Dynamic root improvements: - Create an /afs/.<cell> mountpoint to match the /afs/<cell> mountpoint when a cell is created - Add some more checks on cell names proposed by the user to prevent dodgy symlink bodies from being created. Also prevent rootcell from being altered once set to simplify the locking - Change the handling of /afs/@cell from being a dentry name substitution at lookup time to making it a symlink to the current cell name and also provide a /afs/.@cell symlink to point to the dotted cell mountpoint Fixes: - Fix the abort code check in the fallback handling for the YFS.RemoveFile2 RPC call - Use call->op->server() for oridnary filesystem RPC calls that have an operation descriptor instead of call->server()" * tag 'vfs-6.14-rc1.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Fix the fallback handling for the YFS.RemoveFile2 RPC call afs: Make /afs/@cell and /afs/.@cell symlinks afs: Add rootcell checks afs: Make /afs/.<cell> as well as /afs/<cell> mountpoints
2025-01-10afs: Make /afs/@cell and /afs/.@cell symlinksDavid Howells
Make /afs/@cell a symlink in the /afs dynamic root to match what other AFS clients do rather than doing a substitution in the dentry name. This has the bonus of being tab-expandable also. Further, provide a /afs/.@cell symlink to point to the dotted cell share. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250107183454.608451-4-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Add a tracepoint for afs_read_receive()David Howells
Add a tracepoint for afs_read_receive() to allow potential missed wakeups to be debugged. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-32-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Make afs_mkdir() locally initialise a new directory's contentDavid Howells
Initialise a new directory's content when it is created by mkdir locally rather than downloading the content from the server as we can predict what it's going to look like. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-29-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Fix cleanup of immediately failed async callsDavid Howells
If we manage to begin an async call, but fail to transmit any data on it due to a signal, we then abort it which causes a race between the notification of call completion from rxrpc and our attempt to cancel the notification. The notification will be necessary, however, for async FetchData to terminate the netfs subrequest. However, since we get a notification from rxrpc upon completion of a call (aborted or otherwise), we can just leave it to that. This leads to calls not getting cleaned up, but appearing in /proc/net/rxrpc/calls as being aborted with code 6. Fix this by making the "error_do_abort:" case of afs_make_call() abort the call and then abandon it to the notification handler. Fixes: 34fa47612bfe ("afs: Fix race in async call refcounting") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-25-dhowells@redhat.com cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Use netfslib for symlinks, allowing them to be cachedDavid Howells
Use netfslib to read symlinks, thereby allowing them to be cached by fscache and cachefiles. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-23-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Use netfslib for directoriesDavid Howells
In the AFS ecosystem, directories are just a special type of file that is downloaded and parsed locally. Download is done by the same mechanism as ordinary files and the data can be cached. There is one important semantic restriction on directories over files: the client must download the entire directory in one go because, for example, the server could fabricate the contents of the blob on the fly with each download and give a different image each time. So that we can cache the directory download, switch AFS directory support over to using the netfslib single-object API, thereby allowing directory content to be stored in the local cache. To make this work, the following changes are made: (1) A directory's contents are now stored in a folio_queue chain attached to the afs_vnode (inode) struct rather than its associated pagecache, though multipage folios are still used to hold the data. The folio queue is discarded when the directory inode is evicted. This also helps with the phasing out of ITER_XARRAY. (2) Various directory operations are made to use and unuse the cache cookie. (3) The content checking, content dumping and content iteration are now performed with a standard iov_iter iterator over the contents of the folio queue. (4) Iteration and modification must be done with the vnode's validate_lock held. In conjunction with (1), this means that the iteration can be done without the need to lock pages or take extra refs on them, unlike when accessing ->i_pages. (5) Convert to using netfs_read_single() to read data. (6) Provide a ->writepages() to call netfs_writeback_single() to save the data to the cache according to the VM's scheduling whilst holding the validate_lock read-locked as (4). (7) Change local directory image editing functions: (a) Provide a function to get a specific block by number from the folio_queue as we can no longer use the i_pages xarray to locate folios by index. This uses a cursor to remember the current position as we need to iterate through the directory contents. The block is kmapped before being returned. (b) Make the function in (a) extend the directory by an extra folio if we run out of space. (c) Raise the check of the block free space counter, for those blocks that have one, higher in the function to eliminate a call to get a block. (d) Remove the page unlocking and putting done during the editing loops. This is no longer necessary as the folio_queue holds the references and the pages are no longer in the pagecache. (e) Mark the inode dirty and pin the cache usage till writeback at the end of a successful edit. (8) Don't set the large_folios flag on the inode as we do the allocation ourselves rather than the VM doing it automatically. (9) Mark the inode as being a single object that isn't uploaded to the server. (10) Enable caching on directories. (11) Only set the upload key for writeback for regular files. Notes: (*) We keep the ->release_folio(), ->invalidate_folio() and ->migrate_folio() ops as we set the mapping pointer on the folio. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-22-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20afs: Add more tracepoints to do with tracking validityDavid Howells
Add wrappers to set and clear the callback promise and to mark a directory as invalidated, and add tracepoints to track these events: (1) afs_cb_promise: Log when a callback promise is set on a vnode. (2) afs_vnode_invalid: Log when the server's callback promise for a vnode is no longer valid and we need to refetch the vnode metadata. (3) afs_dir_invalid: Log when the contents of a directory are marked invalid and requiring refetching from the server and the cache invalidating. and two tracepoints to record data version number management: (4) afs_set_dv: Log when the DV is recorded on a vnode. (5) afs_dv_mismatch: Log when the DV recorded on a vnode plus the expected delta for the operation does not match the DV we got back from the server. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-18-dhowells@redhat.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24afs: Fix missing subdir edit when renamed between parent dirsDavid Howells
When rename moves an AFS subdirectory between parent directories, the subdir also needs a bit of editing: the ".." entry needs updating to point to the new parent (though I don't make use of the info) and the DV needs incrementing by 1 to reflect the change of content. The server also sends a callback break notification on the subdirectory if we have one, but we can take care of recovering the promise next time we access the subdir. This can be triggered by something like: mount -t afs %example.com:xfstest.test20 /xfstest.test/ mkdir /xfstest.test/{aaa,bbb,aaa/ccc} touch /xfstest.test/bbb/ccc/d mv /xfstest.test/{aaa/ccc,bbb/ccc} touch /xfstest.test/bbb/ccc/e When the pathwalk for the second touch hits "ccc", kafs spots that the DV is incorrect and downloads it again (so the fix is not critical). Fix this, if the rename target is a directory and the old and new parents are different, by: (1) Incrementing the DV number of the target locally. (2) Editing the ".." entry in the target to refer to its new parent's vnode ID and uniquifier. Link: https://lore.kernel.org/r/3340431.1729680010@warthog.procyon.org.uk Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") cc: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-11Merge tag 'vfs-6.9.file' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull file locking updates from Christian Brauner: "A few years ago struct file_lock_context was added to allow for separate lists to track different types of file locks instead of using a singly-linked list for all of them. Now leases no longer need to be tracked using struct file_lock. However, a lot of the infrastructure is identical for leases and locks so separating them isn't trivial. This splits a group of fields used by both file locks and leases into a new struct file_lock_core. The new core struct is embedded in struct file_lock. Coccinelle was used to convert a lot of the callers to deal with the move, with the remaining 25% or so converted by hand. Afterwards several internal functions in fs/locks.c are made to work with struct file_lock_core. Ultimately this allows to split struct file_lock into struct file_lock and struct file_lease. The file lease APIs are then converted to take struct file_lease" * tag 'vfs-6.9.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (51 commits) filelock: fix deadlock detection in POSIX locking filelock: always define for_each_file_lock() smb: remove redundant check filelock: don't do security checks on nfsd setlease calls filelock: split leases out of struct file_lock filelock: remove temporary compatibility macros smb/server: adapt to breakup of struct file_lock smb/client: adapt to breakup of struct file_lock ocfs2: adapt to breakup of struct file_lock nfsd: adapt to breakup of struct file_lock nfs: adapt to breakup of struct file_lock lockd: adapt to breakup of struct file_lock fuse: adapt to breakup of struct file_lock gfs2: adapt to breakup of struct file_lock dlm: adapt to breakup of struct file_lock ceph: adapt to breakup of struct file_lock afs: adapt to breakup of struct file_lock 9p: adapt to breakup of struct file_lock filelock: convert seqfile handling to use file_lock_core filelock: convert locks_translate_pid to take file_lock_core ...
2024-02-05afs: adapt to breakup of struct file_lockJeff Layton
Most of the existing APIs have remained the same, but subsystems that access file_lock fields directly need to reach into struct file_lock_core now. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-35-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-22afs: Fix error handling with lookup via FS.InlineBulkStatusDavid Howells
When afs does a lookup, it tries to use FS.InlineBulkStatus to preemptively look up a bunch of files in the parent directory and cache this locally, on the basis that we might want to look at them too (for example if someone does an ls on a directory, they may want want to then stat every file listed). FS.InlineBulkStatus can be considered a compound op with the normal abort code applying to the compound as a whole. Each status fetch within the compound is then given its own individual abort code - but assuming no error that prevents the bulk fetch from returning the compound result will be 0, even if all the constituent status fetches failed. At the conclusion of afs_do_lookup(), we should use the abort code from the appropriate status to determine the error to return, if any - but instead it is assumed that we were successful if the op as a whole succeeded and we return an incompletely initialised inode, resulting in ENOENT, no matter the actual reason. In the particular instance reported, a vnode with no permission granted to be accessed is being given a UAEACCES abort code which should be reported as EACCES, but is instead being reported as ENOENT. Fix this by abandoning the inode (which will be cleaned up with the op) if file[1] has an abort code indicated and turn that abort code into an error instead. Whilst we're at it, add a tracepoint so that the abort codes of the individual subrequests of FS.InlineBulkStatus can be logged. At the moment only the container abort code can be 0. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-19Merge tag 'vfs-6.8.netfs' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: "This extends the netfs helper library that network filesystems can use to replace their own implementations. Both afs and 9p are ported. cifs is ready as well but the patches are way bigger and will be routed separately once this is merged. That will remove lots of code as well. The overal goal is to get high-level I/O and knowledge of the page cache and ouf of the filesystem drivers. This includes knowledge about the existence of pages and folios The pull request converts afs and 9p. This removes about 800 lines of code from afs and 300 from 9p. For 9p it is now possible to do writes in larger than a page chunks. Additionally, multipage folio support can be turned on for 9p. Separate patches exist for cifs removing another 2000+ lines. I've included detailed information in the individual pulls I took. Summary: - Add NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to prevent these from happening at the same time. - Support for direct and unbuffered I/O. - Support for write-through caching in the page cache. - O_*SYNC and RWF_*SYNC writes use write-through rather than writing to the page cache and then flushing afterwards. - Support for write-streaming. - Support for write grouping. - Skip reads for which the server could only return zeros or EOF. - The fscache module is now part of the netfs library and the corresponding maintainer entry is updated. - Some helpers from the fscache subsystem are renamed to mark them as belonging to the netfs library. - Follow-up fixes for the netfs library. - Follow-up fixes for the 9p conversion" * tag 'vfs-6.8.netfs' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (50 commits) netfs: Fix wrong #ifdef hiding wait cachefiles: Fix signed/unsigned mixup netfs: Fix the loop that unmarks folios after writing to the cache netfs: Fix interaction between write-streaming and cachefiles culling netfs: Count DIO writes netfs: Mark netfs_unbuffered_write_iter_locked() static netfs: Fix proc/fs/fscache symlink to point to "netfs" not "../netfs" netfs: Rearrange netfs_io_subrequest to put request pointer first 9p: Use length of data written to the server in preference to error 9p: Do a couple of cleanups 9p: Fix initialisation of netfs_inode for 9p cachefiles: Fix __cachefiles_prepare_write() 9p: Use netfslib read/write_iter afs: Use the netfs write helpers netfs: Export the netfs_sreq tracepoint netfs: Optimise away reads above the point at which there can be no data netfs: Implement a write-through caching option netfs: Provide a launder_folio implementation netfs: Provide a writepages implementation netfs, cachefiles: Pass upper bound length to allow expansion ...
2024-01-01afs: trace: Log afs_make_call(), including server addressDavid Howells
Add a tracepoint to log calls to afs_make_call(), including the destination server address. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Fix fileserver rotationDavid Howells
Fix the fileserver rotation so that it doesn't use RTT as the basis for deciding which server and address to use as this doesn't necessarily give a good indication of the best path. Instead, use the configurable preference list in conjunction with whatever probes have succeeded at the time of looking. To this end, make the following changes: (1) Keep an array of "server states" to track what addresses we've tried on each server and move the waitqueue entries there that we'll need for probing. (2) Each afs_server_state struct is made to pin the corresponding server's endpoint state rather than the afs_operation struct carrying a pin on the server we're currently looking at. (3) Drop the server list preference; we now always rescan the server list. (4) afs_wait_for_probes() now uses the server state list to guide it in what it waits for (and to provide the waitqueue entries) and returns an indication of whether we'd got a response, run out of responsive addresses or the endpoint state had been superseded and we need to restart the iteration. (5) Call afs_get_address_preferences*() occasionally to refresh the preference values. (6) When picking a server, scan the addresses of the servers for which we have as-yet untested communications, looking for the highest priority one and use that instead of trying all the addresses for a particular server in ascending-RTT order. (7) When a Busy or Offline state is seen across all available servers, do a short sleep. (8) If we detect that we accessed a future RO volume version whilst it is undergoing replication, reissue the op against the older version until at least half of the servers are replicated. (9) Whilst RO replication is ongoing, increase the frequency of Volume Location server checks for that volume to every ten minutes instead of hourly. Also add a tracepoint to track progress through the rotation algorithm. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Overhaul invalidation handling to better support RO volumesDavid Howells
Overhaul the third party-induced invalidation handling, making use of the previously added volume-level event counters (cb_scrub and cb_ro_snapshot) that are now being parsed out of the VolSync record returned by the fileserver in many of its replies. This allows better handling of RO (and Backup) volumes. Since these are snapshot of a RW volume that are updated atomically simultantanously across all servers that host them, they only require a single callback promise for the entire volume. The currently upstream code assumes that RO volumes operate in the same manner as RW volumes, and that each file has its own individual callback - which means that it does a status fetch for *every* file in a RO volume, whether or not the volume got "released" (volume callback breaks can occur for other reasons too, such as the volumeserver taking ownership of a volume from a fileserver). To this end, make the following changes: (1) Change the meaning of the volume's cb_v_break counter so that it is now a hint that we need to issue a status fetch to work out the state of a volume. cb_v_break is incremented by volume break callbacks and by server initialisation callbacks. (2) Add a second counter, cb_v_check, to the afs_volume struct such that if this differs from cb_v_break, we need to do a check. When the check is complete, cb_v_check is advanced to what cb_v_break was at the start of the status fetch. (3) Move the list of mmap'd vnodes to the volume and trigger removal of PTEs that map to files on a volume break rather than on a server break. (4) When a server reinitialisation callback comes in, use the server-to-volume reverse mapping added in a preceding patch to iterate over all the volumes using that server and clear the volume callback promises for that server and the general volume promise as a whole to trigger reanalysis. (5) Replace the AFS_VNODE_CB_PROMISED flag with an AFS_NO_CB_PROMISE (TIME64_MIN) value in the cb_expires_at field, reducing the number of checks we need to make. (6) Change afs_check_validity() to quickly see if various event counters have been incremented or if the vnode or volume callback promise is due to expire/has expired without making any changes to the state. That is now left to afs_validate() as this may get more complicated in future as we may have to examine server records too. (7) Overhaul afs_validate() so that it does a single status fetch if we need to check the state of either the vnode or the volume - and do so under appropriate locking. The function does the following steps: (A) If the vnode/volume is no longer seen as valid, then we take the vnode validation lock and, if the volume promise has expired, the volume check lock also. The latter prevents redundant checks being made to find out if a new version of the volume got released. (B) If a previous RPC call found that the volsync changed unexpectedly or that a RO volume was updated, then we unmap all PTEs pointing to the file to stop mmap being used for access. (C) If the vnode is still seen to be of uncertain validity, then we perform an FS.FetchStatus RPC op to jointly update the volume status and the vnode status. This assessment is done as part of parsing the reply: If the RO volume creation timestamp advances, cb_ro_snapshot is incremented; if either the creation or update timestamps changes in an unexpected way, the cb_scrub counter is incremented If the Data Version returned doesn't match the copy we have locally, then we ask for the pagecache to be zapped. This takes care of handling RO update. (D) If cb_scrub differs between volume and vnode, the vnode's pagecache is zapped and the vnode's cb_scrub is updated unless the file is marked as having been deleted. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Parse the VolSync record in the reply of a number of RPC opsDavid Howells
A number of fileserver RPC operations return a VolSync record as part of their reply that gives some information about the state of the volume being accessed, including: (1) A volume Creation timestamp. For an RW volume, this is the time at which the volume was created; if it changes, the RW volume was presumably restored from a backup and all cached data should be scrubbed as Data Version numbers could regress on the files in the volume. For an RO volume, this is the time it was last snapshotted from the RW volume. It is expected to advance each time this happens; if it regresses, cached data should be scrubbed. (2) A volume Update timestamp (Auristor only). For an RW volume, this is updated any time any change is made to a volume or its contents. If it regresses, all cached data must be scrubbed. For an RO volume, this is a copy of the RW volume's Update timestamp at the point of snapshotting. It can be used as a version number when checking to see if a callback on a RO volume was due to a snapshot. If it regresses, all cached data must be scrubbed. but this is currently not made use of by the in-kernel afs filesystem. Make the afs filesystem use this by: (1) Add an update time field to the afs_volsync struct and use a value of TIME64_MIN in both that and the creation time to indicate that they are unset. (2) Add creation and update time fields to the afs_volume struct and use this to track the two timestamps. (3) Add a volsync_lock mutex to the afs_volume struct to control modification access for when we detect a change in these values. (3) Add a 'pre-op volsync' struct to the afs_operation struct to record the state of the volume tracking before the op. (4) Add a new counter, cb_scrub, to the afs_volume struct to count events that require all data to be scrubbed. A copy is placed in the afs_vnode struct (inode) and if they no longer match, a scrub takes place. (5) When the result of an operation is being parsed, parse the VolSync data too, if it is provided. Note that the two timestamps are handled separately, since they don't work in quite the same way. - If the afs_volume tracking is unset, just set it and do nothing else. - If the result timestamps are the same as the ones in afs_volume, do nothing. - If the timestamps regress, increment cb_scrub if not already done so. - If the creation timestamp on a RW volume changes, increment cb_scrub if not already done so. - If the creation timestamp on a RO volume advances, update the server list and see if the current server has been excluded, if so reissue the op. Once over half of the replication sites have been updated, increment cb_ro_snapshot to indicate updates may be required and switch over to excluding unupdated replication sites. - If the creation timestamp on a Backup volume advances, just increment cb_ro_snapshot to trigger updates. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Apply server breaks to mmap'd files in the call processorDavid Howells
Apply server breaks to mmap'd files that are being used from that server from the call processor work function rather than punting it off to a workqueue. The work item, afs_server_init_callback(), then bumps each individual inode off to its own work item introducing a potentially lengthy delay. This reduces that delay at the cost of extending the amount of time we delay replying to the CB.InitCallBack3 notification RPC from the server. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Keep a record of the current fileserver endpoint stateDavid Howells
Keep a record of the current fileserver endpoint state, including the probe state, and replace it when a new probe is started rather than just squelching the old state and overwriting it. Clearance of the old state can cause a race if there's another thread also currently trying to communicate with that server. It appears that this race might be the culprit for some occasions where kafs complains about invalid data in the RPC reply because the rotation algorithm fell all the way through without actually issuing an RPC call and the error return got filled in from the probe state (which has a zero error recorded). Whatever happens to be in the caller's reply buffer is then taken as the response. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Dispatch vlserver probes in priority orderDavid Howells
When probing all the addresses for a volume location server, dispatch them in order of descending priority to try and get back highest priority one first. Also add a tracepoint to show the transmission and completion of the probes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2024-01-01afs: Dispatch fileserver probes in priority orderDavid Howells
When probing all the addresses for a fileserver, dispatch them in order of descending priority to try and get back highest priority one first. Also add a tracepoint to show the transmission and completion of the probes. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-12-28afs: Use the netfs write helpersDavid Howells
Make afs use the netfs write helpers. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24afs: Fold the afs_addr_cursor struct inDavid Howells
Fold the afs_addr_cursor struct into the afs_operation struct and the afs_vl_cursor struct and fold its operations into their callers also. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-12-24afs: Add a tracepoint for struct afs_addr_listDavid Howells
Add a tracepoint to track the lifetime of the afs_addr_list struct. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org
2023-12-24afs: Don't use folio->private to record partial modificationDavid Howells
AFS currently uses folio->private to store the range of bytes within a folio that have been modified - the idea being that if we have, say, a 2MiB folio and someone writes a single byte, we only have to write back that single page and not the whole 2MiB folio - thereby saving on network bandwidth. Remove this, at least for now, and accept the extra network load (which doesn't matter in the common case of writing a whole file at a time from beginning to end). This makes folio->private available for netfslib to use. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24afs: Automatically generate trace tag enumsDavid Howells
Automatically generate trace tag enums from the symbol -> string mapping tables rather than having the enums as well, thereby reducing duplicated data. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
2023-12-24afs: Remove whitespace before most ')' from the trace headerDavid Howells
checkpatch objects to whitespace before ')', so remove most of it from the afs trace header. Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org
2022-08-02afs: Fix access after dec in put functionsDavid Howells
Reference-putting functions should not access the object being put after decrementing the refcount unless they reduce the refcount to zero. Fix a couple of instances of this in afs by copying the information to be logged by tracepoint to local variables before doing the decrement. [Fixed a bit in afs_put_server() that I'd missed but Marc caught] Fixes: 341f741f04be ("afs: Refcount the afs_call struct") Fixes: 452181936931 ("afs: Trace afs_server usage") Fixes: 977e5f8ed0ab ("afs: Split the usage count on struct afs_server") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/165911278430.3745403.16526310736054780645.stgit@warthog.procyon.org.uk/ # v1
2022-08-02afs: Use refcount_t rather than atomic_tDavid Howells
Use refcount_t rather than atomic_t in afs to make use of the count checking facilities provided. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/165911277768.3745403.423349776836296452.stgit@warthog.procyon.org.uk/ # v1
2021-11-10netfs, 9p, afs, ceph: Use foliosDavid Howells
Convert the netfs helper library to use folios throughout, convert the 9p and afs filesystems to use folios in their file I/O paths and convert the ceph filesystem to use just enough folios to compile. With these changes, afs passes -g quick xfstests. Changes ======= ver #5: - Got rid of folio_end{io,_read,_write}() and inlined the stuff it does instead (Willy decided he didn't want this after all). ver #4: - Fixed a bug in afs_redirty_page() whereby it didn't set the next page index in the loop and returned too early. - Simplified a check in v9fs_vfs_write_folio_locked()[1]. - Undid a change to afs_symlink_readpage()[1]. - Used offset_in_folio() in afs_write_end()[1]. - Changed from using page_endio() to folio_end{io,_read,_write}()[1]. ver #2: - Add 9p foliation. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Tested-by: Dominique Martinet <asmadeus@codewreck.org> Tested-by: kafs-testing@auristor.com cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Dominique Martinet <asmadeus@codewreck.org> cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5
2021-09-13afs: Try to avoid taking RCU read lock when checking vnode validityDavid Howells
Try to avoid taking the RCU read lock when checking the validity of a vnode's callback state. The only thing it's needed for is to pin the parent volume's server list whilst we search it to find the record of the server we're currently using to see if it has been reinitialised (ie. it sent us a CB.InitCallBackState* RPC). Do this by the following means: (1) Keep an additional per-cell counter (fs_s_break) that's incremented each time any of the fileservers in the cell reinitialises. Since the new counter can be accessed without RCU from the vnode, we can check that first - and only if it differs, get the RCU read lock and check the volume's server list. (2) Replace afs_get_s_break_rcu() with afs_check_server_good() which now indicates whether the callback promise is still expected to be present on the server. This does the checks as described in (1). (3) Restructure afs_check_validity() to take account of the change in (2). We can also get rid of the valid variable and just use the need_clear variable with the addition of the afs_cb_break_no_promise reason. (4) afs_check_validity() probably shouldn't be altering vnode->cb_v_break and vnode->cb_s_break when it doesn't have cb_lock exclusively locked. Move the change to vnode->cb_v_break to __afs_break_callback(). Delegate the change to vnode->cb_s_break to afs_select_fileserver() and set vnode->cb_fs_s_break there also. (5) afs_validate() no longer needs to get the RCU read lock around its call to afs_check_validity() - and can skip the call entirely if we don't have a promise. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Markus Suvanto <markus.suvanto@gmail.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/163111669583.283156.1397603105683094563.stgit@warthog.procyon.org.uk/
2021-07-21afs: Fix tracepoint string placement with built-in AFSDavid Howells
To quote Alexey[1]: I was adding custom tracepoint to the kernel, grabbed full F34 kernel .config, disabled modules and booted whole shebang as VM kernel. Then did perf record -a -e ... It crashed: general protection fault, probably for non-canonical address 0x435f5346592e4243: 0000 [#1] SMP PTI CPU: 1 PID: 842 Comm: cat Not tainted 5.12.6+ #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:t_show+0x22/0xd0 Then reproducer was narrowed to # cat /sys/kernel/tracing/printk_formats Original F34 kernel with modules didn't crash. So I started to disable options and after disabling AFS everything started working again. The root cause is that AFS was placing char arrays content into a section full of _pointers_ to strings with predictable consequences. Non canonical address 435f5346592e4243 is "CB.YFS_" which came from CM_NAME macro. Steps to reproduce: CONFIG_AFS=y CONFIG_TRACING=y # cat /sys/kernel/tracing/printk_formats Fix this by the following means: (1) Add enum->string translation tables in the event header with the AFS and YFS cache/callback manager operations listed by RPC operation ID. (2) Modify the afs_cb_call tracepoint to print the string from the translation table rather than using the string at the afs_call name pointer. (3) Switch translation table depending on the service we're being accessed as (AFS or YFS) in the tracepoint print clause. Will this cause problems to userspace utilities? Note that the symbolic representation of the YFS service ID isn't available to this header, so I've put it in as a number. I'm not sure if this is the best way to do this. (4) Remove the name wrangling (CM_NAME) macro and put the names directly into the afs_call_type structs in cmservice.c. Fixes: 8e8d7f13b6d5a9 ("afs: Add some tracepoints") Reported-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: Andrew Morton <akpm@linux-foundation.org> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YLAXfvZ+rObEOdc%2F@localhost.localdomain/ [1] Link: https://lore.kernel.org/r/643721.1623754699@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/162430903582.2896199.6098150063997983353.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/162609463957.3133237.15916579353149746363.stgit@warthog.procyon.org.uk/ # v1 (repost) Link: https://lore.kernel.org/r/162610726860.3408253.445207609466288531.stgit@warthog.procyon.org.uk/ # v2
2021-04-23afs: Use ITER_XARRAY for writingDavid Howells
Use a single ITER_XARRAY iterator to describe the portion of a file to be transmitted to the server rather than generating a series of small ITER_BVEC iterators on the fly. This will make it easier to implement AIO in afs. In theory we could maybe use one giant ITER_BVEC, but that means potentially allocating a huge array of bio_vec structs (max 256 per page) when in fact the pagecache already has a structure listing all the relevant pages (radix_tree/xarray) that can be walked over. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/153685395197.14766.16289516750731233933.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/158861251312.340223.17924900795425422532.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465828607.1377938.6903132788463419368.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588535018.3465195.14509994354240338307.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118152415.1232039.6452879415814850025.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161048194.2537118.13763612220937637316.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340411602.1303470.4661108879482218408.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539555629.286939.5241869986617154517.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653811456.2770958.7017388543246759245.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789095005.6155.6789055030327407928.stgit@warthog.procyon.org.uk/ # v6
2021-04-23afs: Pass page into dirty region helpers to provide THP sizeDavid Howells
Pass a pointer to the page being accessed into the dirty region helpers so that the size of the page can be determined in case it's a transparent huge page. This also required the page to be passed into the afs_page_dirty trace point - so there's no need to specifically pass in the index or private data as these can be retrieved directly from the page struct. Signed-off-by: David Howells <dhowells@redhat.com> Tested-By: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588527183.3465195.16107942526481976308.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118144921.1232039.11377711180492625929.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161040747.2537118.11435394902674511430.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340404553.1303470.11414163641767769882.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/161539548385.286939.8864598314493255313.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/161653804285.2770958.3497360004849598038.stgit@warthog.procyon.org.uk/ # v5 Link: https://lore.kernel.org/r/161789087043.6155.16922142208140170528.stgit@warthog.procyon.org.uk/ # v6
2021-01-04afs: Fix directory entry size calculationDavid Howells
The number of dirent records used by an AFS directory entry should be calculated using the assumption that there is a 16-byte name field in the first block, rather than a 20-byte name field (which is actually the case). This miscalculation is historic and effectively standard, so we have to use it. The calculation we need to use is: 1 + (((strlen(name) + 1) + 15) >> 5) where we are adding one to the strlen() result to account for the NUL termination. Fix this by the following means: (1) Create an inline function to do the calculation for a given name length. (2) Use the function to calculate the number of records used for a dirent in afs_dir_iterate_block(). Use this to move the over-end check out of the loop since it only needs to be done once. Further use this to only go through the loop for the 2nd+ records composing an entry. The only test there now is for if the record is allocated - and we already checked the first block at the top of the outer loop. (3) Add a max name length check in afs_dir_iterate_block(). (4) Make afs_edit_dir_add() and afs_edit_dir_remove() use the function from (1) to calculate the number of blocks rather than doing it incorrectly themselves. Fixes: 63a4681ff39c ("afs: Locally edit directory data for mkdir/create/unlink/...") Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Marc Dionne <marc.dionne@auristor.com>
2020-10-29afs: Fix afs_invalidatepage to adjust the dirty regionDavid Howells
Fix afs_invalidatepage() to adjust the dirty region recorded in page->private when truncating a page. If the dirty region is entirely removed, then the private data is cleared and the page dirty state is cleared. Without this, if the page is truncated and then expanded again by truncate, zeros from the expanded, but no-longer dirty region may get written back to the server if the page gets laundered due to a conflicting 3rd-party write. It mustn't, however, shorten the dirty region of the page if that page is still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is stored in page->private to record this. Fixes: 4343d00872e1 ("afs: Get rid of the afs_writeback record") Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-29afs: Wrap page->private manipulations in inline functionsDavid Howells
The afs filesystem uses page->private to store the dirty range within a page such that in the event of a conflicting 3rd-party write to the server, we write back just the bits that got changed locally. However, there are a couple of problems with this: (1) I need a bit to note if the page might be mapped so that partial invalidation doesn't shrink the range. (2) There aren't necessarily sufficient bits to store the entire range of data altered (say it's a 32-bit system with 64KiB pages or transparent huge pages are in use). So wrap the accesses in inline functions so that future commits can change how this works. Also move them out of the tracing header into the in-directory header. There's not really any need for them to be in the tracing header. Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-16afs: Don't assert on unpurgeable server recordsDavid Howells
Don't give an assertion failure on unpurgeable afs_server records - which kills the thread - but rather emit a trace line when we are purging a record (which only happens during network namespace removal or rmmod) and print a notice of the problem. Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-16afs: Add tracing for cell refcount and active user countDavid Howells
Add a tracepoint to log the cell refcount and active user count and pass in a reason code through various functions that manipulate these counters. Additionally, a helper function, afs_see_cell(), is provided to log interesting places that deal with a cell without actually doing any accounting directly. Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04afs: Add a tracepoint to track the lifetime of the afs_volume structDavid Howells
Add a tracepoint to track the lifetime of the afs_volume struct. Signed-off-by: David Howells <dhowells@redhat.com>