summaryrefslogtreecommitdiff
path: root/fs/ceph
AgeCommit message (Collapse)Author
2018-04-02ceph: fix root quota realm checkYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: don't check quota for snap inodeYan, Zheng
snap inode's i_snap_realm is not pointing to ceph_snap_realm. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: quota: update MDS when max_bytes is approachingLuis Henriques
When we're reaching the ceph.quota.max_bytes limit, i.e., when writing more than 1/16th of the space left in a quota realm, update the MDS with the new file size. This mirrors the fuse-client approach with commit 122c50315ed1 ("client: Inform mds file size when approaching quota limit"), in the ceph git tree. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: quota: support for ceph.quota.max_bytesLuis Henriques
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: quota: don't allow cross-quota renamesLuis Henriques
This patch changes ceph_rename so that -EXDEV is returned if an attempt is made to mv a file between two different dir trees with different quotas setup. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: quota: support for ceph.quota.max_filesLuis Henriques
This patch adds support for the max_files quota. It hooks into all the ceph functions that add new filesystem objects that need to be checked against the quota limits. When these limits are hit, -EDQUOT is returned. Note that we're not checking quotas on ceph_link(). ceph_link doesn't really create a new inode, and since the MDS doesn't update the directory statistics when a new (hard) link is created (only with symlinks), they are not accounted as a new file. Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: quota: add initial infrastructure to support cephfs quotasLuis Henriques
This patch adds the infrastructure required to support cephfs quotas as it is currently implemented in the ceph fuse client. Cephfs quotas can be set on any directory, and can restrict the number of bytes or the number of files stored beneath that point in the directory hierarchy. Quotas are set using the extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', and can be removed by setting these attributes to '0'. Link: http://tracker.ceph.com/issues/22372 Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: rename function drop_leases() to a more descriptive nameYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: fix invalid point dereference for error case in mdsc destroyChengguang Xu
1. set fsc->mdsc after successfully allocate all necessary memory in mdsc init. 2. if fsc->mdsc is NULL, just skip destroy operation in mdsc destroy. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: return proper bool type to caller instead of pointerChengguang Xu
Change to return true/false only for bool type return code. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: optimize memory usageChengguang Xu
In current code, regular file and directory use same struct ceph_file_info to store fs specific data so the struct has to include some fields which are only used for directory (e.g., readdir related info), when having plenty of regular files, it will lead to memory waste. This patch introduces dedicated ceph_dir_file_info cache for readdir related thins. So that regular file does not include those unused fields anymore. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: optimize mds session registerChengguang Xu
Do memory allocation first, so that avoid unnecessary initialization of newly allocated session in error case. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: add __init attribution to init funcitonsChengguang Xu
Add __init attribution to the functions which are called only once during initiating/registering operations and deleting unnecessary symbol exports. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: filter out used flags when printing unused open flagsChengguang Xu
Filter out used access mode flags when printing unused open flags. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: don't wait on writeback when there is no more dirty pagesYan, Zheng
In sync mode, writepages() needs to write all dirty pages. But it can only write dirty pages associated with the oldest snapc. To write dirty pages associated with next snapc, it needs to wait until current writes complete. If there is no more dirty pages, writepages() should not wait on writeback. Otherwise, dirty page writeback becomes very slow. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: invalidate pages that beyond EOF in ceph_writepages_start()Yan, Zheng
Dirty pages can be associated with different capsnap. Different capsnap may have different EOF value. So invalidating dirty pages according to the largest EOF value is wrong. Dirty pages beyond EOF, but associated with other capsnap, do not get invalidated. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: mark the cap cache as unreclaimableChengguang Xu
Releasing cap is affected by many factors (e.g., avail_count/reserve_count/min_count) and min_count could be specified high volume in client mount option. Hence it's better to mark cap cache as unreclaimable in case of non-trivial discrepancies between memory shown as reclaimable and what is actually reclaimed. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: change variable name to follow common ruleChengguang Xu
Variable name ci is mostly used for ceph_inode_info. Variable name fi is mostly used for ceph_file_info. Variable name cf is mostly used for ceph_cap_flush. Change variable name to follow above common rules in case of confusing. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: optimizing cap reservationChengguang Xu
When caps_avail_count is in a low level, most newly trimmed caps will probably go into ->caps_list and caps_avail_count will be increased. Hence after trimming, should recheck caps_avail_count to effectly reuse newly trimmed caps. Also, when releasing unnecessary caps follow the same rule of ceph_put_cap. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: release unreserved caps if having enough available capsChengguang Xu
When unreserving caps check if there is too mamy available caps in the ->caps_list, if so release unreserved caps. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: optimizing cap allocationChengguang Xu
When setting high volume of caps_min_count or having many unreserved caps, unused caps may always keep in the ->caps_list even can't get new cap from kmem_cache_alloc because lack of maximum limitation of caps_avail_count. Hence reuse caps in ->caps_list if available, it's maybe better than setting max limitation of caps_avail_count and releasing unused caps when reaching the limit. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: adding protection for showing cap reservation infoChengguang Xu
Adding spinlock protection during getting cap reservation ralated fields so that the numbers match below BUG_ON condition in the code. BUG_ON(mdsc->caps_total_count != mdsc->caps_use_count + mdsc->caps_reserve_count + mdsc->caps_avail_count); Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: use seq_show_option for string type optionsChengguang Xu
Using seq_show_option to replace seq_printf for string type options. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: change permission for readonly debugfs entriesChengguang Xu
Remove write permission for debugfs entries which only have readonly function. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: keep consistent semantic in fscache related option combinationChengguang Xu
When specifying multiple fscache related options, the result isn't always the same as option order, this fix will keep strict consistent meaning by order. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02ceph: add newline to end of debug message formatChengguang Xu
Some of dout format do not include newline in the end, fix for the files which are in fs/ceph and net/ceph directories, and changing printk to dout for printing debug info in super.c Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: move ceph_calc_file_object_mapping() to striper.cIlya Dryomov
ceph_calc_file_object_mapping() has nothing to do with osdmaps. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02libceph, ceph: change ceph_calc_file_object_mapping() signatureIlya Dryomov
- make it void - xlen (object extent length) out parameter should be u32 because only a single stripe unit is mapped at a time Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-03-30ceph: only dirty ITER_IOVEC pages for direct readYan, Zheng
If a page is already locked, attempting to dirty it leads to a deadlock in lock_page(). This is what currently happens to ITER_BVEC pages when a dio-enabled loop device is backed by ceph: $ losetup --direct-io /dev/loop0 /mnt/cephfs/img $ xfs_io -c 'pread 0 4k' /dev/loop0 Follow other file systems and only dirty ITER_IOVEC pages. Cc: stable@kernel.org Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-03-01ceph: fix potential memory leak in init_caches()Chengguang Xu
There is lack of cache destroy operation for ceph_file_cachep when failing from fscache register. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-02-26ceph: fix dentry leak when failing to init debugfsChengguang Xu
When failing from ceph_fs_debugfs_init() in ceph_real_mount(), there is lack of dput of root_dentry and it causes slab errors, so change the calling order of ceph_fs_debugfs_init() and open_root_dentry() and do some cleanups to avoid this issue. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-02-26libceph, ceph: avoid memory leak when specifying same option several timesChengguang Xu
When parsing string option, in order to avoid memory leak we need to carefully free it first in case of specifying same option several times. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-02-26ceph: flush dirty caps of unlinked inode ASAPZhi Zhang
Client should release unlinked inode from its cache ASAP. But client can't release inode with dirty caps. Link: http://tracker.ceph.com/issues/22886 Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-02-22get rid of pointless includes of fs_struct.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-08Merge tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "Things have been very quiet on the rbd side, as work continues on the big ticket items slated for the next merge window. On the CephFS side we have a large number of cap handling improvements, a fix for our long-standing abuse of ->journal_info in ceph_readpages() and yet another dentry pointer management patch" * tag 'ceph-for-4.16-rc1' of git://github.com/ceph/ceph-client: ceph: improving efficiency of syncfs libceph: check kstrndup() return value ceph: try to allocate enough memory for reserved caps ceph: fix race of queuing delayed caps ceph: delete unreachable code in ceph_check_caps() ceph: limit rate of cap import/export error messages ceph: fix incorrect snaprealm when adding caps ceph: fix un-balanced fsc->writeback_count update ceph: track read contexts in ceph_file_info ceph: avoid dereferencing invalid pointer during cached readdir ceph: use atomic_t for ceph_inode_info::i_shared_gen ceph: cleanup traceless reply handling for rename ceph: voluntarily drop Fx cap for readdir request ceph: properly drop caps for setattr request ceph: voluntarily drop Lx cap for link/rename requests ceph: voluntarily drop Ax cap for requests that create new inode rbd: whitelist RBD_FEATURE_OPERATIONS feature bit rbd: don't NULL out ->obj_request in rbd_img_obj_parent_read_full() rbd: use kmem_cache_zalloc() in rbd_img_request_create() rbd: obj_request->completion is unused
2018-01-31Merge tag 'docs-4.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer" * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits) linux-next: docs-rst: Fix typos in kfigure.py linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt Documentation: Fix misconversion of #if docs: add index entry for networking/msg_zerocopy Documentation: security/credentials.rst: explain need to sort group_list LICENSES: Add MPL-1.1 license LICENSES: Add the GPL 1.0 license LICENSES: Add Linux syscall note exception LICENSES: Add the MIT license LICENSES: Add the BSD-3-clause "Clear" license LICENSES: Add the BSD 3-clause "New" or "Revised" License LICENSES: Add the BSD 2-clause "Simplified" license LICENSES: Add the LGPL-2.1 license LICENSES: Add the LGPL 2.0 license LICENSES: Add the GPL 2.0 license Documentation: Add license-rules.rst to describe how to properly identify file licenses scripts: kernel_doc: better handle show warnings logic fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at doc: md: Fix a file name to md-fault.c in fault-injection.txt errseq: Add to documentation tree ...
2018-01-30ceph: improving efficiency of syncfsChengguang Xu
write_inode() could be called variety of reasons, in the case of syncfs(2) there is no need to wait for flush getting completed in write_inode(), ->sync_fs is for guaranteeing flush completion for all inodes at that point. Signed-off-by: Chengguang Xu <cgxu519@icloud.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: try to allocate enough memory for reserved capsZhi Zhang
ceph_reserve_caps() may not reserve enough caps under high memory pressure, but it saved the needed caps number that expected to be reserved. When getting caps, crash would happen due to number mismatch. Now we will try to trim more caps when failing to allocate memory for caps need to be reserved, then try again. If still failing to allocate memory, return -ENOMEM. Signed-off-by: Zhi Zhang <zhang.david2011@gmail.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: fix race of queuing delayed capsYan, Zheng
When called with CHECK_CAPS_AUTHONLY flag, ceph_check_caps() only processes auth caps. In that case, it's unsafe to remove inode from mdsc->cap_delay_list, because there can be delayed non-auth caps. Besides, ceph_check_caps() may lock/unlock i_ceph_lock several times, when multiple threads call ceph_check_caps() at the same time. It's possible that one thread calls __cap_delay_requeue(), another thread calls __cap_delay_cancel(). __cap_delay_cancel() should be called at very beginning of ceph_check_caps(), so that it does not race with __cap_delay_requeue(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: delete unreachable code in ceph_check_caps()Yan, Zheng
"revoking & (CEPH_CAP_FILE_CACHE|CEPH_CAP_FILE_LAZYIO)" has already been tested before calling try_nonblocking_invalidate() Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: limit rate of cap import/export error messagesYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: fix incorrect snaprealm when adding capsYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: fix un-balanced fsc->writeback_count updateYan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: track read contexts in ceph_file_infoYan, Zheng
Previously ceph_read_iter() uses current->journal to pass context info to ceph_readpages(), so that ceph_readpages() can distinguish read(2) from readahead(2)/fadvise(2)/madvise(2). The problem is that page fault can happen when copying data to userspace memory. Page fault may call other filesystem's page_mkwrite() if the userspace memory is mapped to a file. The later filesystem may also want to use current->journal. The fix is define a on-stack data structure in ceph_read_iter(), add it to context list in ceph_file_info. ceph_readpages() searches the list, find if there is a context belongs to current thread. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: avoid dereferencing invalid pointer during cached readdirYan, Zheng
Readdir cache keeps array of dentry pointers in page cache. If any dentry in readdir cache gets pruned, ceph_d_prune() disables readdir cache for later readdir syscall. The problem is that ceph_d_prune() ignores unhashed dentry. Ideally MDS should have already revoked CEPH_CAP_FILE_SHARED (which also disables readdir cache) when dentry gets unhashed. But if it is somehow MDS does not properly revoke CEPH_CAP_FILE_SHARED and the unhashed dentry gets pruned later, ceph_d_prune() will not disable readdir cache, later readdir may reference invalid dentry pointer. The fix is make ceph_d_prune() do extra check for unhashed dentry. Disable readdir cache if the unhashed dentry is still referenced by readdir cache. Another fix in this patch is handle d_splice_alias(). If a dentry gets spliced into new parent dentry, treat it as if it was pruned (call ceph_d_prune() for it). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: use atomic_t for ceph_inode_info::i_shared_genYan, Zheng
It allows accessing i_shared_gen without holding i_ceph_lock. It is preparation for later patch. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: cleanup traceless reply handling for renameYan, Zheng
ceph_fill_trace() already calls ceph_invalidate_dir_request() for traceless reply. No need to duplicate the code in ceph_rename(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: voluntarily drop Fx cap for readdir requestYan, Zheng
MDS need to rdlock directory inode's filelock when handling readdir request. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: properly drop caps for setattr requestYan, Zheng
For CEPH_SETATTR_ATIME, MDS needs to xlock filelock, Fsxrw caps are not allowed for xlocked filelock. For CEPH_SETATTR_SIZE request that truncates file to smaller size, MDS needs to xlock filelock, Fsxrw caps are not allowed for xlocked filelock. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-01-29ceph: voluntarily drop Lx cap for link/rename requestsYan, Zheng
MDS need to xlock inode's linklock when handling link/rename requests. Voluntarily dropping CEPH_CAP_AUTH_EXCL avoids a cap revoke message. Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>