summaryrefslogtreecommitdiff
path: root/Documentation/filesystems
AgeCommit message (Collapse)Author
2019-01-26mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smapsMichal Hocko
[ Upstream commit 7550c6079846a24f30d15ac75a941c8515dbedfb ] Patch series "THP eligibility reporting via proc". This series of three patches aims at making THP eligibility reporting much more robust and long term sustainable. The trigger for the change is a regression report [2] and the long follow up discussion. In short the specific application didn't have good API to query whether a particular mapping can be backed by THP so it has used VMA flags to workaround that. These flags represent a deep internal state of VMAs and as such they should be used by userspace with a great deal of caution. A similar has happened for [3] when users complained that VM_MIXEDMAP is no longer set on DAX mappings. Again a lack of a proper API led to an abuse. The first patch in the series tries to emphasise that that the semantic of flags might change and any application consuming those should be really careful. The remaining two patches provide a more suitable interface to address [2] and provide a consistent API to query the THP status both for each VMA and process wide as well. [1] http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz This patch (of 3): Even though vma flags exported via /proc/<pid>/smaps are explicitly documented to be not guaranteed for future compatibility the warning doesn't go far enough because it doesn't mention semantic changes to those flags. And they are important as well because these flags are a deep implementation internal to the MM code and the semantic might change at any time. Let's consider two recent examples: http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz : commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the : mean time certain customer of ours started poking into /proc/<pid>/smaps : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA : flags, the application just fails to start complaining that DAX support is : missing in the kernel. http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com : Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") : introduced a regression in that userspace cannot always determine the set : of vmas where thp is ineligible. : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps : to determine if a vma is eligible to be backed by hugepages. : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to : be disabled and emit "nh" as a flag for the corresponding vmas as part of : /proc/pid/smaps. After the commit, thp is disabled by means of an mm : flag and "nh" is not emitted. : This causes smaps parsing libraries to assume a vma is eligible for thp : and ends up puzzling the user on why its memory is not backed by thp. In both cases userspace was relying on a semantic of a specific VMA flag. The primary reason why that happened is a lack of a proper interface. While this has been worked on and it will be fixed properly, it seems that our wording could see some refinement and be more vocal about semantic aspect of these flags as well. Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Oppenheimer <bepvte@gmail.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-04-08fs/proc: Stop trying to report thread stacksAndy Lutomirski
commit b18cb64ead400c01bf1580eeba330ace51f8087d upstream. This reverts more of: b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps") ... which was partially reverted by: 65376df58217 ("proc: revert /proc/<pid>/maps [stack:TID] annotation") Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps. In current kernels, /proc/PID/maps (or /proc/TID/maps even for threads) shows "[stack]" for VMAs in the mm's stack address range. In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the target thread's stack's VMA. This is racy, probably returns garbage and, on arches with CONFIG_TASK_INFO_IN_THREAD=y, is also crash-prone: KSTK_ESP is not safe to use on tasks that aren't known to be running ordinary process-context kernel code. This patch removes the difference and just shows "[stack]" for VMAs in the mm's stack range. This is IMO much more sensible -- the actual "stack" address really is treated specially by the VM code, and the current thread stack isn't even well-defined for programs that frequently switch stacks on their own. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux API <linux-api@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tycho Andersen <tycho.andersen@canonical.com> Link: http://lkml.kernel.org/r/3e678474ec14e0a0ec34c611016753eea2e1b8ba.1475257877.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-08proc: revert /proc/<pid>/maps [stack:TID] annotationJohannes Weiner
commit 65376df582174ffcec9e6471bf5b0dd79ba05e4a upstream. Commit b76437579d13 ("procfs: mark thread stack correctly in proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps. Finding the task of a stack VMA requires walking the entire thread list, turning this into quadratic behavior: a thousand threads means a thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a million combinations. The cost is not in proportion to the usefulness as described in the patch. Drop the [stack:TID] annotation to make /proc/<pid>/maps (and /proc/<pid>/numa_maps) usable again for higher thread counts. The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as identifying the stack VMA there is an O(1) operation. Siddesh said: "The end users needed a way to identify thread stacks programmatically and there wasn't a way to do that. I'm afraid I no longer remember (or have access to the resources that would aid my memory since I changed employers) the details of their requirement. However, I did do this on my own time because I thought it was an interesting project for me and nobody really gave any feedback then as to its utility, so as far as I am concerned you could roll back the main thread maps information since the information is available in the thread-specific files" Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-02-25ext4: correct documentation for grpid mount optionErnesto A. Fernández
commit 9f0372488cc9243018a812e8cfbf27de650b187b upstream. The grpid option is currently described as being the same as nogrpid. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12efi: Make efivarfs entries immutable by defaultPeter Jones
[ Upstream commit ed8b0de5a33d2a2557dce7f9429dca8cb5bc5879 ] "rm -rf" is bricking some peoples' laptops because of variables being used to store non-reinitializable firmware driver data that's required to POST the hardware. These are 100% bugs, and they need to be fixed, but in the mean time it shouldn't be easy to *accidentally* brick machines. We have to have delete working, and picking which variables do and don't work for deletion is quite intractable, so instead make everything immutable by default (except for a whitelist), and make tools that aren't quite so broad-spectrum unset the immutable flag. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2014-11-20ovl: rename filesystem type to "overlay"Miklos Szeredi
Some distributions carry an "old" format of overlayfs while mainline has a "new" format. The distros will possibly want to keep the old overlayfs alongside the new for compatibility reasons. To make it possible to differentiate the two versions change the name of the new one from "overlayfs" to "overlay". Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Andy Whitcroft <apw@canonical.com>
2014-10-24overlay: overlay filesystem documentationNeil Brown
Document the overlay filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-24vfs: add i_op->dentry_open()Miklos Szeredi
Add a new inode operation i_op->dentry_open(). This is for stacked filesystems that want to return a struct file from a different filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-10-16NTFS: Remove changelog from Documentation/filesystems/ntfs.txt.Anton Altaparmakov
Changelog is in git history, no need to have a copy in the documentation. Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
2014-10-14autofs: the documentation I wanted to readNeilBrown
This documents autofs from the perspective of what the module actually supports rather than how automount is expected to use it. It is formatted using "markdown" and works best with Markdown.pl (markdown_py doesn't like some constructs). [rdunlap@infradead.org: copy editing] Signed-off-by: NeilBrown <neilb@suse.de> Cc: Randy Dunlap <rdunlap@infradead.org> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-13Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The big thing in this pile is Eric's unmount-on-rmdir series; we finally have everything we need for that. The final piece of prereqs is delayed mntput() - now filesystem shutdown always happens on shallow stack. Other than that, we have several new primitives for iov_iter (Matt Wilcox, culled from his XIP-related series) pushing the conversion to ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c cleanups and fixes (including the external name refcounting, which gives consistent behaviour of d_move() wrt procfs symlinks for long and short names alike) and assorted cleanups and fixes all over the place. This is just the first pile; there's a lot of stuff from various people that ought to go in this window. Starting with unionmount/overlayfs mess... ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits) fs/file_table.c: Update alloc_file() comment vfs: Deduplicate code shared by xattr system calls operating on paths reiserfs: remove pointless forward declaration of struct nameidata don't need that forward declaration of struct nameidata in dcache.h anymore take dname_external() into fs/dcache.c let path_init() failures treated the same way as subsequent link_path_walk() fix misuses of f_count() in ppp and netlink ncpfs: use list_for_each_entry() for d_subdirs walk vfs: move getname() from callers to do_mount() gfs2_atomic_open(): skip lookups on hashed dentry [infiniband] remove pointless assignments gadgetfs: saner API for gadgetfs_create_file() f_fs: saner API for ffs_sb_create_file() jfs: don't hash direct inode [s390] remove pointless assignment of ->f_op in vmlogrdr ->open() ecryptfs: ->f_op is never NULL android: ->f_op is never NULL nouveau: __iomem misannotations missing annotation in fs/file.c fs: namespace: suppress 'may be used uninitialized' warnings ...
2014-10-11Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linuxLinus Torvalds
Pull file locking related changes from Jeff Layton: "This release is a little more busy for file locking changes than the last: - a set of patches from Kinglong Mee to fix the lockowner handling in knfsd - a pile of cleanups to the internal file lease API. This should get us a bit closer to allowing for setlease methods that can block. There are some dependencies between mine and Bruce's trees this cycle, and I based my tree on top of the requisite patches in Bruce's tree" * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits) locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING locks: flock_make_lock should return a struct file_lock (or PTR_ERR) locks: set fl_owner for leases to filp instead of current->files locks: give lm_break a return value locks: __break_lease cleanup in preparation of allowing direct removal of leases locks: remove i_have_this_lease check from __break_lease locks: move freeing of leases outside of i_lock locks: move i_lock acquisition into generic_*_lease handlers locks: define a lm_setup handler for leases locks: plumb a "priv" pointer into the setlease routines nfsd: don't keep a pointer to the lease in nfs4_file locks: clean up vfs_setlease kerneldoc comments locks: generic_delete_lease doesn't need a file_lock at all nfsd: fix potential lease memory leak in nfs4_setlease locks: close potential race in lease_get_mtime security: make security_file_set_fowner, f_setown and __f_setown void return locks: consolidate "nolease" routines locks: remove lock_may_read and lock_may_write lockd: rip out deferred lock handling from testlock codepath NFSD: Get reference of lockowner when coping file_lock ...
2014-10-09vfs: fix typo in s_op->alloc_inode() documentationKirill Smelkov
The function which calls s_op->alloc_inode() is not inode_alloc(), but instead alloc_inode() which lives in fs/inode.c . The typo was there from the beginning from 5ea626aa (VFS: update documentation, 2005) - there was no standalone inode_alloc() for the whole kernel history. Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-08Merge tag 'f2fs-for-3.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set introduces a couple of new features such as large sector size, FITRIM, and atomic/volatile writes. Several patches enhance power-off recovery and checkpoint routines. The fsck.f2fs starts to support fixing corrupted partitions with recovery hints provided by this patch-set. Summary: - retain some recovery information for fsck.f2fs - enhance checkpoint speed - enhance flush command management - bug fix for lseek - tune in-place-update policies - enhance roll-forward speed - revisit all the roll-forward and fsync rules - support larget sector size - support FITRIM - support atomic and volatile writes And several clean-ups and bug fixes are included" * tag 'f2fs-for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits) f2fs: support volatile operations for transient data f2fs: support atomic writes f2fs: remove unused return value f2fs: clean up f2fs_ioctl functions f2fs: potential shift wrapping buf in f2fs_trim_fs() f2fs: call f2fs_unlock_op after error was handled f2fs: check the use of macros on block counts and addresses f2fs: refactor flush_nat_entries to remove costly reorganizing ops f2fs: introduce FITRIM in f2fs_ioctl f2fs: introduce cp_control structure f2fs: use more free segments until SSR is activated f2fs: change the ipu_policy option to enable combinations f2fs: fix to search whole dirty segmap when get_victim f2fs: fix to clean previous mount option when remount_fs f2fs: skip punching hole in special condition f2fs: support large sector size f2fs: fix to truncate blocks past EOF in ->setattr f2fs: update i_size when __allocate_data_block f2fs: use MAX_BIO_BLOCKS(sbi) f2fs: remove redundant operation during roll-forward recovery ...
2014-10-07locks: move freeing of leases outside of i_lockJeff Layton
There was only one place where we still could free a file_lock while holding the i_lock -- lease_modify. Add a new list_head argument to the lm_change operation, pass in a private list when calling it, and fix those callers to dispose of the list once the lock has been dropped. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07locks: move i_lock acquisition into generic_*_lease handlersJeff Layton
Now that we have a saner internal API for managing leases, we no longer need to mandate that the inode->i_lock be held over most of the lease code. Push it down into generic_add_lease and generic_delete_lease. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-10-07locks: plumb a "priv" pointer into the setlease routinesJeff Layton
In later patches, we're going to add a new lock_manager_operation to finish setting up the lease while still holding the i_lock. To do this, we'll need to pass a little bit of info in the fcntl setlease case (primarily an fasync structure). Plumb the extra pointer into there in advance of that. We declare this pointer as a void ** to make it clear that this is private info, and that the caller isn't required to set this unless the lm_setup specifically requires it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-09-26Documentation: update .gitignore filesPeter Foley
Add some missing files to .gitignore. Push Documentation/.gitignore down into subdirectories. Signed-off-by: Peter Foley <pefoley2@pefoley.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-09-26Documentation: add makefiles for more targetsPeter Foley
Add a bunch of previously unbuilt source files to the Documentation build machinery. Signed-off-by: Peter Foley <pefoley2@pefoley.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-09-26Documentation: use subdir-y to avoid unnecessary built-in.o filesPeter Foley
Change the Documentation makefiles from obj-m to subdir-y to avoid generating unnecessary built-in.o files since nothing in Documentation/ is ever linked in to vmlinux. Signed-off-by: Peter Foley <pefoley2@pefoley.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-09-23f2fs: change the ipu_policy option to enable combinationsJaegeuk Kim
This patch changes the ipu_policy setting to use any combination of orthogonal policies. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-16f2fs: give an option to enable in-place-updates during fsync to usersJaegeuk Kim
If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-09-07Documentation: NFS/RDMA: Document separate Kconfig symbolsPaul Bolle
The NFS/RDMA Kconfig symbol was split into separate options for client and server in commit 2e8c12e1b765 ("xprtrdma: add separate Kconfig options for NFSoRDMA client and server support"). Update the documentation to reflect this split. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-07Documentation: seq_file: Document seq_open_private(), seq_release_private()Rob Jones
Despite the fact that these functions have been around for years, they are little used (only 15 uses in 13 files at the preseht time) even though many other files use work-arounds to achieve the same result. By documenting them, hopefully they will become more widely used. Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-16Merge tag 'locks-v3.17-2' of git://git.samba.org/jlayton/linuxLinus Torvalds
Pull file locking bugfixes from Jeff Layton: "Most of these patches are to fix a long-standing regression that crept in when the BKL was removed from the file-locking code. The code was converted to use a conventional spinlock, but some fl_release_private ops can block and you can end up sleeping inside the lock. There's also a patch to make /proc/locks show delegations as 'DELEG'" * tag 'locks-v3.17-2' of git://git.samba.org/jlayton/linux: locks: update Locking documentation to clarify fl_release_private behavior locks: move locks_free_lock calls in do_fcntl_add_lease outside spinlock locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped locks: don't reuse file_lock in __posix_lock_file locks: don't call locks_release_private from locks_copy_lock locks: show delegations as "DELEG" in /proc/locks
2014-08-14locks: update Locking documentation to clarify fl_release_private behaviorJeff Layton
Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Jeff Layton <jlayton@primarydata.com>
2014-08-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Stuff in here: - acct.c fixes and general rework of mnt_pin mechanism. That allows to go for delayed-mntput stuff, which will permit mntput() on deep stack without worrying about stack overflows - fs shutdown will happen on shallow stack. IOW, we can do Eric's umount-on-rmdir series without introducing tons of stack overflows on new mntput() call chains it introduces. - Bruce's d_splice_alias() patches - more Miklos' rename() stuff. - a couple of regression fixes (stable fodder, in the end of branch) and a fix for API idiocy in iov_iter.c. There definitely will be another pile, maybe even two. I'd like to get Eric's series in this time, but even if we miss it, it'll go right in the beginning of for-next in the next cycle - the tricky part of prereqs is in this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) fix copy_tree() regression __generic_file_write_iter(): fix handling of sync error after DIO switch iov_iter_get_pages() to passing maximal number of pages fs: mark __d_obtain_alias static dcache: d_splice_alias should detect loops exportfs: update Exporting documentation dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED dcache: remove unused d_find_alias parameter dcache: d_obtain_alias callers don't all want DISCONNECTED dcache: d_splice_alias should ignore DCACHE_DISCONNECTED dcache: d_splice_alias mustn't create directory aliases dcache: close d_move race in d_splice_alias dcache: move d_splice_alias namei: trivial fix to vfs_rename_dir comment VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode. cifs: support RENAME_NOREPLACE hostfs: support rename flags shmem: support RENAME_EXCHANGE shmem: support RENAME_NOREPLACE btrfs: add RENAME_NOREPLACE ...
2014-08-09Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull CIFS updates from Steve French: "The most visible change in this set is the additional of multi-credit support for SMB2/SMB3 which dramatically improves the large file i/o performance for these dialects and significantly increases the maximum i/o size used on the wire for SMB2/SMB3. Also reconnection behavior after network failure is improved" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (35 commits) Add worker function to set allocation size [CIFS] Fix incorrect hex vs. decimal in some debug print statements update CIFS TODO list Add Pavel to contributor list in cifs AUTHORS file Update cifs version CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2 CIFS: Optimize readpages in a short read case on reconnects CIFS: Optimize cifs_user_read() in a short read case on reconnects CIFS: Improve indentation in cifs_user_read() CIFS: Fix possible buffer corruption in cifs_user_read() CIFS: Count got bytes in read_into_pages() CIFS: Use separate var for the number of bytes got in async read CIFS: Indicate reconnect with ECONNABORTED error code CIFS: Use multicredits for SMB 2.1/3 reads CIFS: Fix rsize usage for sync read CIFS: Fix rsize usage in user read CIFS: Separate page reading from user read CIFS: Fix rsize usage in readpages CIFS: Separate page search from readpages CIFS: Use multicredits for SMB 2.1/3 writes ...
2014-08-07exportfs: update Exporting documentationJ. Bruce Fields
Minor documentation updates: - refer to d_obtain_alias rather than d_alloc_anon - explain when to use d_splice_alias and when d_materialise_unique. - cut some details of d_splice_alias/d_materialise_unique implementation. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.NeilBrown
In REF-walk mode, ->d_manage can return -EISDIR to indicate that the dentry is not really a mount trap (or even a mount point) and that any mounts or any DCACHE_NEED_AUTOMOUNT flag should be ignored. RCU-walk mode doesn't currently support this, so if there is a dentry with DCACHE_NEED_AUTOMOUNT set but which shouldn't be a mount-trap, lookup_fast() will always drop in REF-walk mode. With this patch, an -EISDIR from ->d_manage will always cause mounts and automounts to be ignored, both in REF-walk and RCU-walk. Bug-fixed-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-05Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...
2014-08-04Merge tag 'for-f2fs-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series includes patches to: - add nobarrier mount option - support tmpfile and rename2 - enhance the fdatasync behavior - fix the error path - fix the recovery routine - refactor a part of the checkpoint procedure - reduce some lock contentions" * tag 'for-f2fs-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: use for_each_set_bit to simplify the code f2fs: add f2fs_balance_fs for expand_inode_data f2fs: invalidate xattr node page when evict inode f2fs: avoid skipping recover_inline_xattr after recover_inline_data f2fs: add tracepoint for f2fs_direct_IO f2fs: reduce competition among node page writes f2fs: fix coding style f2fs: remove redundant lines in allocate_data_block f2fs: add tracepoint for f2fs_issue_flush f2fs: avoid retrying wrong recovery routine when error was occurred f2fs: test before set/clear bits f2fs: fix wrong condition for unlikely f2fs: enable in-place-update for fdatasync f2fs: skip unnecessary data writes during fsync f2fs: add info of appended or updated data writes f2fs: use radix_tree for ino management f2fs: add infra for ino management f2fs: punch the core function for inode management f2fs: add nobarrier mount option f2fs: fix to put root inode in error path of fill_super ...
2014-08-02update CIFS TODO listSteve French
Signed-off-by: Steve French <smfrench@gmail.com>
2014-08-02Add Pavel to contributor list in cifs AUTHORS fileSteve French
Signed-off-by: Steve French <smfrench@gmail.com> CC: Pavel Shilovsky <pshilovsky@samba.org>
2014-07-29f2fs: add nobarrier mount optionJaegeuk Kim
This patch adds a mount option, nobarrier, in f2fs. The assumption in here is that file system keeps the IO ordering, but doesn't care about cache flushes inside the storages. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-07-18docs: Procfs -- Document timerfd outputCyrill Gorcunov
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Link: http://lkml.kernel.org/r/20140715215703.199905126@openvz.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-16sched: Remove proliferation of wait_on_bit() action functionsNeilBrown
The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "This the bunch that sat in -next + lock_parent() fix. This is the minimal set; there's more pending stuff. In particular, I really hope to get acct.c fixes merged this cycle - we need that to deal sanely with delayed-mntput stuff. In the next pile, hopefully - that series is fairly short and localized (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more iov_iter work. Most of prereqs for ->splice_write with sane locking order are there and Kent's dio rewrite would also fit nicely on top of this pile" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits) lock_parent: don't step on stale ->d_parent of all-but-freed one kill generic_file_splice_write() ceph: switch to iter_file_splice_write() shmem: switch to iter_file_splice_write() nfs: switch to iter_splice_write_file() fs/splice.c: remove unneeded exports ocfs2: switch to iter_file_splice_write() ->splice_write() via ->write_iter() bio_vec-backed iov_iter optimize copy_page_{to,from}_iter() bury generic_file_aio_{read,write} lustre: get rid of messing with iovecs ceph: switch to ->write_iter() ceph_sync_direct_write: stop poking into iov_iter guts ceph_sync_read: stop poking into iov_iter guts new helper: copy_page_from_iter() fuse: switch to ->write_iter() btrfs: switch to ->write_iter() ocfs2: switch to ->write_iter() xfs: switch to ->write_iter() ...
2014-06-12Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linusAl Viro
Backmerge of dcache.c changes from mainline. It's that, or complete rebase... Conflicts: fs/splice.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-06-10Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "The largest piece is a long-overdue rewrite of the xdr code to remove some annoying limitations: for example, there was no way to return ACLs larger than 4K, and readdir results were returned only in 4k chunks, limiting performance on large directories. Also: - part of Neil Brown's work to make NFS work reliably over the loopback interface (so client and server can run on the same machine without deadlocks). The rest of it is coming through other trees. - cleanup and bugfixes for some of the server RDMA code, from Steve Wise. - Various cleanup of NFSv4 state code in preparation for an overhaul of the locking, from Jeff, Trond, and Benny. - smaller bugfixes and cleanup from Christoph Hellwig and Kinglong Mee. Thanks to everyone! This summer looks likely to be busier than usual for knfsd. Hopefully we won't break it too badly; testing definitely welcomed" * 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits) nfsd4: fix FREE_STATEID lockowner leak svcrdma: Fence LOCAL_INV work requests svcrdma: refactor marshalling logic nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry nfs4: remove unused CHANGE_SECURITY_LABEL nfsd4: kill READ64 nfsd4: kill READ32 nfsd4: simplify server xdr->next_page use nfsd4: hash deleg stateid only on successful nfs4_set_delegation nfsd4: rename recall_lock to state_lock nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open nfsd4: use recall_lock for delegation hashing nfsd: fix laundromat next-run-time calculation nfsd: make nfsd4_encode_fattr static SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING nfsd: remove unused function nfsd_read_file nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer NFSD: Error out when getting more than one fsloc/secinfo/uuid NFSD: Using type of uint32_t for ex_nflavors instead of int ...
2014-06-09Merge tag 'for-f2fs-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, there is no special interesting feature, but we've investigated a couple of tuning points with respect to the I/O flow. Several major bug fixes and a bunch of clean-ups also have been made. This patch-set includes the following major enhancement patches: - enhance wait_on_page_writeback - support SEEK_DATA and SEEK_HOLE - enhance readahead flows - enhance IO flushes - support fiemap - add some tracepoints The other bug fixes are as follows: - fix to support a large volume > 2TB correctly - recovery bug fix wrt fallocated space - fix recursive lock on xattr operations - fix some cases on the remount flow And, there are a bunch of cleanups" * tag 'for-f2fs-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (52 commits) f2fs: support f2fs_fiemap f2fs: avoid not to call remove_dirty_inode f2fs: recover fallocated space f2fs: fix to recover data written by dio f2fs: large volume support f2fs: avoid crash when trace f2fs_submit_page_mbio event in ra_sum_pages f2fs: avoid overflow when large directory feathure is enabled f2fs: fix recursive lock by f2fs_setxattr MAINTAINERS: add a co-maintainer from samsung for F2FS MAINTAINERS: change the email address for f2fs f2fs: use inode_init_owner() to simplify codes f2fs: avoid to use slab memory in f2fs_issue_flush for efficiency f2fs: add a tracepoint for f2fs_read_data_page f2fs: add a tracepoint for f2fs_write_{meta,node,data}_pages f2fs: add a tracepoint for f2fs_write_{meta,node,data}_page f2fs: add a tracepoint for f2fs_write_end f2fs: add a tracepoint for f2fs_write_begin f2fs: fix checkpatch warning f2fs: deactivate inode page if the inode is evicted f2fs: decrease the lock granularity during write_begin ...
2014-06-06Documentation/filesystems/seq_file.txt: create_proc_entry deprecatedFabian Frederick
Linked article in seq_file.txt still uses create_proc_entry which was removed in commit 80e928f7ebb9 ("proc: Kill create_proc_entry()"). This patch adds information for kernel 3.10 and above Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06fs/fat/: add support for DOS 1.x formatted volumesConrad Meyer
Add structure for parsed BPB information, struct fat_bios_param_block, and move all of the deserialization and validation logic from fat_fill_super() into fat_read_bpb(). Add a 'dos1xfloppy' mount option to infer DOS 2.x BIOS Parameter Block defaults from block device geometry for ancient floppies and floppy images, as a fall-back from the default BPB parsing logic. When fat_read_bpb() finds an invalid FAT filesystem and dos1xfloppy is set, fall back to fat_read_static_bpb(). fat_read_static_bpb() validates that the entire BPB is zero, and that the floppy has a DOS-style 8086 code bootstrapping header. Then it fills in default BPB values from media size and a table.[0] Media size is assumed to be static for archaic FAT volumes. See also: [1]. Fixes kernel.org bug #42617. [0]: https://en.wikipedia.org/wiki/File_Allocation_Table#Exceptions [1]: http://www.win.tue.nl/~aeb/linux/fs/fat/fat-1.html [hirofumi@mail.parknet.co.jp: fix missed error code] Signed-off-by: Conrad Meyer <cse.cem@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial into next Pull trivial tree changes from Jiri Kosina: "Usual pile of patches from trivial tree that make the world go round" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) staging: go7007: remove reference to CONFIG_KMOD aic7xxx: Remove obsolete preprocessor define of: dma: doc fixes doc: fix incorrect formula to calculate CommitLimit value doc: Note need of bc in the kernel build from 3.10 onwards mm: Fix printk typo in dmapool.c modpost: Fix comment typo "Modules.symvers" Kconfig.debug: Grammar s/addition/additional/ wimax: Spelling s/than/that/, wording s/destinatary/recipient/ aic7xxx: Spelling s/termnation/termination/ arm64: mm: Remove superfluous "the" in comment of: Spelling s/anonymouns/anonymous/ dma: imx-sdma: Spelling s/determnine/determine/ ath10k: Improve grammar in comments ath6kl: Spelling s/determnine/determine/ of: Improve grammar for of_alias_get_id() documentation drm/exynos: Spelling s/contro/control/ radio-bcm2048.c: fix wrong overflow check doc: printk-formats: do not mention casts for u64/s64 doc: spelling error changes ...
2014-06-04f2fs: avoid overflow when large directory feathure is enabledChao Yu
When large directory feathure is enable, We have one case which could cause overflow in dir_buckets() as following: special case: level + dir_level >= 32 and level < MAX_DIR_HASH_DEPTH / 2. Here we define MAX_DIR_BUCKETS to limit the return value when the condition could trigger potential overflow. Changes from V1 o modify description of calculation in f2fs.txt suggested by Changman Lee. Suggested-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2014-05-30nfsd4: allow exotic read compoundsJ. Bruce Fields
I'm not sure why a client would want to stuff multiple reads in a single compound rpc, but it's legal for them to do it, and we should really support it. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-25Documentation: update /proc/stat "intr" count summaryJan Moskyto Matejka
The sum at the beginning of line "intr" includes also unnumbered interrupts. It implies that the sum at the beginning isn't the sum of the remainder of the line, not even an estimation. Fixed the documentation to mention that. This behaviour was added to /proc/stat in commit a2eddfa95919 ("x86: make /proc/stat account for all interrupts") Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-22doc: fix incorrect formula to calculate CommitLimit valuePetr Oros
The formula to calculate "CommitLimit" value mentioned in kernel documentation is incorrect. Right formula is: CommitLimit = ([total RAM pages] - [total huge TLB pages]) * overcommit_ratio / 100 + [total swap pages] Signed-off-by: Petr Oros <poros@redhat.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-05-06new methods: ->read_iter() and ->write_iter()Al Viro
Beginning to introduce those. Just the callers for now, and it's clumsier than it'll eventually become; once we finish converting aio_read and aio_write instances, the things will get nicer. For now, these guys are in parallel to ->aio_read() and ->aio_write(); they take iocb and iov_iter, with everything in iov_iter already validated. File offset is passed in iocb->ki_pos, iov/nr_segs - in iov_iter. Main concerns in that series are stack footprint and ability to split the damn thing cleanly. [fix from Peter Ujfalusi <peter.ujfalusi@ti.com> folded] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-06pass iov_iter to ->direct_IO()Al Viro
unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>