summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2020-01-17ext4: fix deadlock allocating crypto bounce page from mempoolEric Biggers
ext4_writepages() on an encrypted file has to encrypt the data, but it can't modify the pagecache pages in-place, so it encrypts the data into bounce pages and writes those instead. All bounce pages are allocated from a mempool using GFP_NOFS. This is not correct use of a mempool, and it can deadlock. This is because GFP_NOFS includes __GFP_DIRECT_RECLAIM, which enables the "never fail" mode for mempool_alloc() where a failed allocation will fall back to waiting for one of the preallocated elements in the pool. But since this mode is used for all a bio's pages and not just the first, it can deadlock waiting for pages already in the bio to be freed. This deadlock can be reproduced by patching mempool_alloc() to pretend that pool->alloc() always fails (so that it always falls back to the preallocations), and then creating an encrypted file of size > 128 KiB. Fix it by only using GFP_NOFS for the first page in the bio. For subsequent pages just use GFP_NOWAIT, and if any of those fail, just submit the bio and start a new one. This will need to be fixed in f2fs too, but that's less straightforward. Fixes: c9af28fdd449 ("ext4 crypto: don't let data integrity writebacks fail with ENOMEM") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191231181149.47619-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: Delete ext4_kvzvalloc()Naoto Kobayashi
Since we're not using ext4_kvzalloc(), delete this function. Signed-off-by: Naoto Kobayashi <naoto.kobayashi4c@gmail.com> Link: https://lore.kernel.org/r/20191227080523.31808-2-naoto.kobayashi4c@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: re-enable extent zeroout optimization on encrypted filesEric Biggers
For encrypted files, commit 36086d43f657 ("ext4 crypto: fix bugs in ext4_encrypted_zeroout()") disabled the optimization where when a write occurs to the middle of an unwritten extent, the head and/or tail of the extent (when they aren't too large) are zeroed out, turned into an initialized extent, and merged with the part being written to. This optimization helps prevent fragmentation of the extent tree. However, disabling this optimization also made fscrypt_zeroout_range() nearly impossible to test, as now it's only reachable via the very rare case in ext4_split_extent_at() where allocating a new extent tree block fails due to ENOSPC. 'gce-xfstests -c ext4/encrypt -g auto' doesn't even hit this at all. It's preferable to avoid really rare cases that are hard to test. That commit also cited data corruption in xfstest generic/127 as a reason to disable the extent zeroout optimization, but that's no longer reproducible anymore. It also cited fscrypt_zeroout_range() having poor performance, but I've written a patch to fix that. Therefore, re-enable the extent zeroout optimization on encrypted files. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191226161114.53606-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: only use fscrypt_zeroout_range() on regular filesEric Biggers
fscrypt_zeroout_range() is only for encrypted regular files, not for encrypted directories or symlinks. Fortunately, currently it seems it's never called on non-regular files. But to be safe ext4 should explicitly check S_ISREG() before calling it. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191226161022.53490-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: allow ZERO_RANGE on encrypted filesEric Biggers
When ext4 encryption support was first added, ZERO_RANGE was disallowed, supposedly because test failures (e.g. ext4/001) were seen when enabling it, and at the time there wasn't enough time/interest to debug it. However, there's actually no reason why ZERO_RANGE can't work on encrypted files. And it fact it *does* work now. Whole blocks in the zeroed range are converted to unwritten extents, as usual; encryption makes no difference for that part. Partial blocks are zeroed in the pagecache and then ->writepages() encrypts those blocks as usual. ext4_block_zero_page_range() handles reading and decrypting the block if needed before actually doing the pagecache write. Also, f2fs has always supported ZERO_RANGE on encrypted files. As far as I can tell, the reason that ext4/001 was failing in v4.1 was actually because of one of the bugs fixed by commit 36086d43f657 ("ext4 crypto: fix bugs in ext4_encrypted_zeroout()"). The bug made ext4_encrypted_zeroout() always return a positive value, which caused unwritten extents in encrypted files to sometimes not be marked as initialized after being written to. This bug was not actually in ZERO_RANGE; it just happened to trigger during the extents manipulation done in ext4/001 (and probably other tests too). So, let's enable ZERO_RANGE on encrypted files on ext4. Tested with: gce-xfstests -c ext4/encrypt -g auto gce-xfstests -c ext4/encrypt_1k -g auto Got the same set of test failures both with and without this patch. But with this patch 6 fewer tests are skipped: ext4/001, generic/008, generic/009, generic/033, generic/096, and generic/511. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191226154216.4808-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: handle decryption error in __ext4_block_zero_page_range()Eric Biggers
fscrypt_decrypt_pagecache_blocks() can fail, because it uses skcipher_request_alloc(), which uses kmalloc(), which can fail; and also because it calls crypto_skcipher_decrypt(), which can fail depending on the driver that actually implements the crypto. Therefore it's not appropriate to WARN on decryption error in __ext4_block_zero_page_range(). Remove the WARN and just handle the error instead. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191226154105.4704-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: remove unnecessary selections from EXT3_FSEric Biggers
Since EXT3_FS already selects EXT4_FS, there's no reason for it to redundantly select all the selections of EXT4_FS -- notwithstanding the comments that claim otherwise. Remove these redundant selections to avoid confusion. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191226153920.4466-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2020-01-17ext4: use true,false for bool variablezhengbin
Fixes coccicheck warning: fs/ext4/extents.c:5271:6-12: WARNING: Assignment of 0/1 to bool variable fs/ext4/extents.c:5287:4-10: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1577241959-138695-1-git-send-email-zhengbin13@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: uninline ext4_inode_journal_mode()Eric Biggers
Determining an inode's journaling mode has gotten more complicated over time. Move ext4_inode_journal_mode() from an inline function into ext4_jbd2.c to reduce the compiled code size. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191209233602.117778-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2020-01-17ext4: remove unnecessary ifdefs in htree_dirblock_to_tree()Eric Biggers
The ifdefs for CONFIG_FS_ENCRYPTION in htree_dirblock_to_tree() are unnecessary, as the called functions are already stubbed out when !CONFIG_FS_ENCRYPTION. Remove them. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20191209213225.18477-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2020-01-17ext4: remove unnecessary assignment in ext4_htree_store_dirent()Chengguang Xu
We have allocated memory using kzalloc() so don't have to set 0 again in last byte. Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Link: https://lore.kernel.org/r/20191206054317.3107-1-cgxu519@mykernel.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-17ext4: avoid fetching btime in ext4_getattr() unless requestedTheodore Ts'o
Linus observed that an allmodconfig build which does a lot of stat(2) calls that ext4_getattr() was a noticeable (1%) amount of CPU time, due to the cache line for i_extra_isize getting pulled in. Since the normal stat system call doesn't return btime, it's a complete waste. So only calculate btime when it is explicitly requested. [ Fixed to check against request_mask instead of query_flags. ] Link: https://lore.kernel.org/r/CAHk-=wivmk_j6KbTX+Er64mLrG8abXZo0M10PNdAnHc8fWXfsQ@mail.gmail.com Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-14fs-verity: implement readahead of Merkle tree pagesEric Biggers
When fs-verity verifies data pages, currently it reads each Merkle tree page synchronously using read_mapping_page(). Therefore, when the Merkle tree pages aren't already cached, fs-verity causes an extra 4 KiB I/O request for every 512 KiB of data (assuming that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in more I/O requests and performance loss than is strictly necessary. Therefore, implement readahead of the Merkle tree pages. For simplicity, we take advantage of the fact that the kernel already does readahead of the file's *data*, just like it does for any other file. Due to this, we don't really need a separate readahead state (struct file_ra_state) just for the Merkle tree, but rather we just need to piggy-back on the existing data readahead requests. We also only really need to bother with the first level of the Merkle tree, since the usual fan-out factor is 128, so normally over 99% of Merkle tree I/O requests are for the first level. Therefore, make fsverity_verify_bio() enable readahead of the first Merkle tree level, for up to 1/4 the number of pages in the bio, when it sees that the REQ_RAHEAD flag is set on the bio. The readahead size is then passed down to ->read_merkle_tree_page() for the filesystem to (optionally) implement if it sees that the requested page is uncached. While we're at it, also make build_merkle_tree_level() set the Merkle tree readahead size, since it's easy to do there. However, for now don't set the readahead size in fsverity_verify_page(), since currently it's only used to verify holes on ext4 and f2fs, and it would need parameters added to know how much to read ahead. This patch significantly improves fs-verity sequential read performance. Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches: On an ARM64 phone (using sha256-ce): Before: 217 MB/s After: 263 MB/s (compare to sha256sum of non-verity file: 357 MB/s) In an x86_64 VM (using sha256-avx2): Before: 173 MB/s After: 215 MB/s (compare to sha256sum of non-verity file: 223 MB/s) Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2020-01-09kunit: allow kunit tests to be loaded as a moduleAlan Maguire
As tests are added to kunit, it will become less feasible to execute all built tests together. By supporting modular tests we provide a simple way to do selective execution on a running system; specifying CONFIG_KUNIT=y CONFIG_KUNIT_EXAMPLE_TEST=m ...means we can simply "insmod example-test.ko" to run the tests. To achieve this we need to do the following: o export the required symbols in kunit o string-stream tests utilize non-exported symbols so for now we skip building them when CONFIG_KUNIT_TEST=m. o drivers/base/power/qos-test.c contains a few unexported interface references, namely freq_qos_read_value() and freq_constraints_init(). Both of these could be potentially defined as static inline functions in include/linux/pm_qos.h, but for now we simply avoid supporting module build for that test suite. o support a new way of declaring test suites. Because a module cannot do multiple late_initcall()s, we provide a kunit_test_suites() macro to declare multiple suites within the same module at once. o some test module names would have been too general ("test-test" and "example-test" for kunit tests, "inode-test" for ext4 tests); rename these as appropriate ("kunit-test", "kunit-example-test" and "ext4-inode-test" respectively). Also define kunit_test_suite() via kunit_test_suites() as callers in other trees may need the old definition. Co-developed-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Knut Omang <knut.omang@oracle.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Acked-by: Theodore Ts'o <tytso@mit.edu> # for ext4 bits Acked-by: David Gow <davidgow@google.com> # For list-test Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2020-01-03dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()Vivek Goyal
As of now dax_writeback_mapping_range() takes "struct block_device" as a parameter and dax_dev is searched from bdev name. This also involves taking a fresh reference on dax_dev and putting that reference at the end of function. We are developing a new filesystem virtio-fs and using dax to access host page cache directly. But there is no block device. IOW, we want to make use of dax but want to get rid of this assumption that there is always a block device associated with dax_dev. So pass in "struct dax_device" as parameter instead of bdev. ext2/ext4/xfs are current users and they already have a reference on dax_device. So there is no need to take reference and drop reference to dax_device on each call of this function. Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20200103183307.GB13350@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-12-31fscrypt: Allow modular crypto algorithmsHerbert Xu
The commit 643fa9612bf1 ("fscrypt: remove filesystem specific build config option") removed modular support for fs/crypto. This causes the Crypto API to be built-in whenever fscrypt is enabled. This makes it very difficult for me to test modular builds of the Crypto API without disabling fscrypt which is a pain. As fscrypt is still evolving and it's developing new ties with the fs layer, it's hard to build it as a module for now. However, the actual algorithms are not required until a filesystem is mounted. Therefore we can allow them to be built as modules. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20191227024700.7vrzuux32uyfdgum@gondor.apana.org.au Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31fscrypt: don't check for ENOKEY from fscrypt_get_encryption_info()Eric Biggers
fscrypt_get_encryption_info() returns 0 if the encryption key is unavailable; it never returns ENOKEY. So remove checks for ENOKEY. Link: https://lore.kernel.org/r/20191209212348.243331-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-26ext4: Optimize ext4 DIO overwritesJan Kara
Currently we start transaction for mapping every extent for writing using direct IO. This is unnecessary when we know we are overwriting already allocated blocks and the overhead of starting a transaction can be significant especially for multithreaded workloads doing small writes. Use iomap operations that avoid starting a transaction for direct IO overwrites. This improves throughput of 4k random writes - fio jobfile: [global] rw=randrw norandommap=1 invalidate=0 bs=4k numjobs=16 time_based=1 ramp_time=30 runtime=120 group_reporting=1 ioengine=psync direct=1 size=16G filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31 file_service_type=random nrfiles=32 from 3018MB/s to 4059MB/s in my test VM running test against simulated pmem device (note that before iomap conversion, this workload was able to achieve 3708MB/s because old direct IO path avoided transaction start for overwrites as well). For dax, the win is even larger improving throughput from 3042MB/s to 4311MB/s. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191218174433.19380-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26ext4: export information about first/last errors via /sys/fs/ext4/<dev>Theodore Ts'o
Make {first,last}_error_{ino,block,line,func,errcode} available via sysfs. Also add a missing newline for {first,last}_error_time. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26ext4: simulate various I/O and checksum errors when reading metadataTheodore Ts'o
This allows us to test various error handling code paths Link: https://lore.kernel.org/r/20191209012317.59398-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26ext4: save the error code which triggered an ext4_error() in the superblockTheodore Ts'o
This allows the cause of an ext4_error() report to be categorized based on whether it was triggered due to an I/O error, or an memory allocation error, or other possible causes. Most errors are caused by a detected file system inconsistency, so the default code stored in the superblock will be EXT4_ERR_EFSCORRUPTED. Link: https://lore.kernel.org/r/20191204032335.7683-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-26Merge branch 'rk/inode_lock' into devTheodore Ts'o
These are ilock patches which helps improve the current inode lock scalabiliy problem in ext4 DIO mixed read/write workload case. The problem was first reported by Joseph [1]. This should help improve mixed read/write workload cases for databases which use directIO. These patches are based upon upstream discussion with Jan Kara & Joseph [2]. The problem really is that in case of DIO overwrites, we start with a exclusive lock and then downgrade it later to shared lock. This causes a scalability problem in case of mixed DIO read/write workload case. i.e. if we have any ongoing DIO reads and then comes a DIO writes, (since writes starts with excl. inode lock) then it has to wait until the shared lock is released (which only happens when DIO read is completed). Same is true for vice versa as well. The same can be easily observed with perf-tools trace analysis [3]. For more details, including performance numbers, please see [4]. [1] https://lore.kernel.org/linux-ext4/1566871552-60946-4-git-send-email-joseph.qi@linux.alibaba.com/ [2] https://lore.kernel.org/linux-ext4/20190910215720.GA7561@quack2.suse.cz/ [3] https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/ext4/perf.report [4] https://lore.kernel.org/r/20191212055557.11151-1-riteshh@linux.ibm.com
2019-12-22ext4: Move to shared i_rwsem even without dioread_nolock mount optRitesh Harjani
We were using shared locking only in case of dioread_nolock mount option in case of DIO overwrites. This mount condition is not needed anymore with current code, since:- 1. No race between buffered writes & DIO overwrites. Since buffIO writes takes exclusive lock & DIO overwrites will take shared locking. Also DIO path will make sure to flush and wait for any dirty page cache data. 2. No race between buffered reads & DIO overwrites, since there is no block allocation that is possible with DIO overwrites. So no stale data exposure should happen. Same is the case between DIO reads & DIO overwrites. 3. Also other paths like truncate is protected, since we wait there for any DIO in flight to be over. Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191212055557.11151-4-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22ext4: Start with shared i_rwsem in case of DIO instead of exclusiveRitesh Harjani
Earlier there was no shared lock in DIO read path. But this patch (16c54688592ce: ext4: Allow parallel DIO reads) simplified some of the locking mechanism while still allowing for parallel DIO reads by adding shared lock in inode DIO read path. But this created problem with mixed read/write workload. It is due to the fact that in DIO path, we first start with exclusive lock and only when we determine that it is a ovewrite IO, we downgrade the lock. This causes the problem, since we still have shared locking in DIO reads. So, this patch tries to fix this issue by starting with shared lock and then switching to exclusive lock only when required based on ext4_dio_write_checks(). Other than that, it also simplifies below cases:- 1. Simplified ext4_unaligned_aio API to ext4_unaligned_io. Previous API was abused in the sense that it was not really checking for AIO anywhere also it used to check for extending writes. So this API was renamed and simplified to ext4_unaligned_io() which actully only checks if the IO is really unaligned. Now, in case of unaligned direct IO, iomap_dio_rw needs to do zeroing of partial block and that will require serialization against other direct IOs in the same block. So we take a exclusive inode lock for any unaligned DIO. In case of AIO we also need to wait for any outstanding IOs to complete so that conversion from unwritten to written is completed before anyone try to map the overlapping block. Hence we take exclusive inode lock and also wait for inode_dio_wait() for unaligned DIO case. Please note since we are anyway taking an exclusive lock in unaligned IO, inode_dio_wait() becomes a no-op in case of non-AIO DIO. 2. Added ext4_extending_io(). This checks if the IO is extending the file. 3. Added ext4_dio_write_checks(). In this we start with shared inode lock and only switch to exclusive lock if required. So in most cases with aligned, non-extending, dioread_nolock & overwrites, it tries to write with a shared lock. If not, then we restart the operation in ext4_dio_write_checks(), after acquiring exclusive lock. Reviewed-by: Jan Kara <jack@suse.cz> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191212055557.11151-3-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAITRitesh Harjani
Apparently our current rwsem code doesn't like doing the trylock, then lock for real scheme. So change our dax read/write methods to just do the trylock for the RWF_NOWAIT case. This seems to fix AIM7 regression in some scalable filesystems upto ~25% in some cases. Claimed in commit 942491c9e6d6 ("xfs: fix AIM7 regression") Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191212055557.11151-2-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22ext4: treat buffers contining write errors as valid in ext4_sb_bread()Theodore Ts'o
In commit 7963e5ac9012 ("ext4: treat buffers with write errors as containing valid data") we missed changing ext4_sb_bread() to use ext4_buffer_uptodate(). So fix this oversight. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-22Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Ext4 bug fixes, including a regression fix" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: clarify impact of 'commit' mount option ext4: fix unused-but-set-variable warning in ext4_add_entry() jbd2: fix kernel-doc notation warning ext4: use RCU API in debug_print_tree ext4: validate the debug_want_extra_isize mount option at parse time ext4: reserve revoke credits in __ext4_new_inode ext4: unlock on error in ext4_expand_extra_isize() ext4: optimize __ext4_check_dir_entry() ext4: check for directory entries too close to block end ext4: fix ext4_empty_dir() for directories with holes
2019-12-21ext4: fix unused-but-set-variable warning in ext4_add_entry()Yunfeng Ye
Warning is found when compile with "-Wunused-but-set-variable": fs/ext4/namei.c: In function ‘ext4_add_entry’: fs/ext4/namei.c:2167:23: warning: variable ‘sbi’ set but not used [-Wunused-but-set-variable] struct ext4_sb_info *sbi; ^~~ Fix this by moving the variable @sbi under CONFIG_UNICODE. Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/cb5eb904-224a-9701-c38f-cb23514b1fff@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-15ext4: use RCU API in debug_print_treePhong Tran
struct ext4_sb_info.system_blks was marked __rcu. But access the pointer without using RCU lock and dereference. Sparse warning with __rcu notation: block_validity.c:139:29: warning: incorrect type in argument 1 (different address spaces) block_validity.c:139:29: expected struct rb_root const * block_validity.c:139:29: got struct rb_root [noderef] <asn:4> * Link: https://lore.kernel.org/r/20191213153306.30744-1-tranmanphong@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Phong Tran <tranmanphong@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-15ext4: validate the debug_want_extra_isize mount option at parse timeTheodore Ts'o
Instead of setting s_want_extra_size and then making sure that it is a valid value afterwards, validate the field before we set it. This avoids races and other problems when remounting the file system. Link: https://lore.kernel.org/r/20191215063020.GA11512@mit.edu Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-and-tested-by: syzbot+4a39a025912b265cacef@syzkaller.appspotmail.com
2019-12-14ext4: reserve revoke credits in __ext4_new_inodeyangerkun
It's possible that __ext4_new_inode will release the xattr block, so it will trigger a warning since there is revoke credits will be 0 if the handle == NULL. The below scripts can reproduce it easily. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3861 at fs/jbd2/revoke.c:374 jbd2_journal_revoke+0x30e/0x540 fs/jbd2/revoke.c:374 ... __ext4_forget+0x1d7/0x800 fs/ext4/ext4_jbd2.c:248 ext4_free_blocks+0x213/0x1d60 fs/ext4/mballoc.c:4743 ext4_xattr_release_block+0x55b/0x780 fs/ext4/xattr.c:1254 ext4_xattr_block_set+0x1c2c/0x2c40 fs/ext4/xattr.c:2112 ext4_xattr_set_handle+0xa7e/0x1090 fs/ext4/xattr.c:2384 __ext4_set_acl+0x54d/0x6c0 fs/ext4/acl.c:214 ext4_init_acl+0x218/0x2e0 fs/ext4/acl.c:293 __ext4_new_inode+0x352a/0x42b0 fs/ext4/ialloc.c:1151 ext4_mkdir+0x2e9/0xbd0 fs/ext4/namei.c:2774 vfs_mkdir+0x386/0x5f0 fs/namei.c:3811 do_mkdirat+0x11c/0x210 fs/namei.c:3834 do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:294 ... ------------------------------------- scripts: mkfs.ext4 /dev/vdb mount /dev/vdb /mnt cd /mnt && mkdir dir && for i in {1..8}; do setfacl -dm "u:user_"$i":rx" dir; done mkdir dir/dir1 && mv dir/dir1 ./ sh repro.sh && add some user [root@localhost ~]# cat repro.sh while [ 1 -eq 1 ]; do rm -rf dir rm -rf dir1/dir1 mkdir dir for i in {1..8}; do setfacl -dm "u:test"$i":rx" dir; done setfacl -m "u:user_9:rx" dir & mkdir dir1/dir1 & done Before exec repro.sh, dir1 has inherit the default acl from dir, and xattr block of dir1 dir is not the same, so the h_refcount of these two dir's xattr block will be 1. Then repro.sh can trigger the warning with the situation show as below. The last h_refcount can be clear with mkdir, and __ext4_new_inode has not reserved revoke credits, so the warning will happened, fix it by reserve revoke credits in __ext4_new_inode. Thread 1 Thread 2 mkdir dir set default acl(will create a xattr block blk1 and the refcount of ext4_xattr_header will be 1) ... mkdir dir1/dir1 ->....->ext4_init_acl ->__ext4_set_acl(set default acl, will reuse blk1, and h_refcount will be 2) setfacl->ext4_set_acl->... ->ext4_xattr_block_set(will create new block blk2 to store xattr) ->__ext4_set_acl(set access acl, since h_refcount of blk1 is 2, will create blk3 to store xattr) ->ext4_xattr_release_block(dec h_refcount of blk1 to 1) ->ext4_xattr_release_block(dec h_refcount and since it is 0, will release the block and trigger the warning) Link: https://lore.kernel.org/r/20191213014900.47228-1-yangerkun@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-14ext4: unlock on error in ext4_expand_extra_isize()Dan Carpenter
We need to unlock the xattr before returning on this error path. Cc: stable@kernel.org # 4.13 Fixes: c03b45b853f5 ("ext4, project: expand inode extra size if possible") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20191213185010.6k7yl2tck3wlsdkt@kili.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-14ext4: optimize __ext4_check_dir_entry()Theodore Ts'o
Make __ext4_check_dir_entry() a bit easier to understand, and reduce the object size of the function by over 11%. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20191209004346.38526-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-14ext4: check for directory entries too close to block endJan Kara
ext4_check_dir_entry() currently does not catch a case when a directory entry ends so close to the block end that the header of the next directory entry would not fit in the remaining space. This can lead to directory iteration code trying to access address beyond end of current buffer head leading to oops. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191202170213.4761-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-14ext4: fix ext4_empty_dir() for directories with holesJan Kara
Function ext4_empty_dir() doesn't correctly handle directories with holes and crashes on bh->b_data dereference when bh is NULL. Reorganize the loop to use 'offset' variable all the times instead of comparing pointers to current direntry with bh->b_data pointer. Also add more strict checking of '.' and '..' directory entries to avoid entering loop in possibly invalid state on corrupted filesystems. References: CVE-2019-19037 CC: stable@vger.kernel.org Fixes: 4e19d6b65fb4 ("ext4: allow directory holes") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191202170213.4761-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-12-09fs/ext4/inode-test: Fix inode test on 32 bit platforms.Iurii Zaikin
Fixes the issue caused by the fact that in C in the expression of the form -1234L only 1234L is the actual literal, the unary minus is an operation applied to the literal. Which means that to express the lower bound for the type one has to negate the upper bound and subtract 1. Original error: Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == -2147483648 timestamp.tv_sec == 2147483648 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits: msb:1 lower_bound:1 extra_bits: 0 Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == 2147483648 timestamp.tv_sec == 6442450944 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on: msb:1 lower_bound:1 extra_bits: 1 Expected test_data[i].expected.tv_sec == timestamp.tv_sec, but test_data[i].expected.tv_sec == 6442450944 timestamp.tv_sec == 10737418240 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on: msb:1 lower_bound:1 extra_bits: 2 not ok 1 - inode_test_xtimestamp_decoding not ok 1 - ext4_inode_test Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Iurii Zaikin <yzaikin@google.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Brendan Higgins <brendanhiggins@google.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-12-01Merge tag 'compat-ioctl-5.5' of ↵Linus Torvalds
git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-30Merge tag 'for_v5.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, quota, reiserfs cleanups and fixes from Jan Kara: - Refactor the quota on/off kernel internal interfaces (mostly for ubifs quota support as ubifs does not want to have inodes holding quota information) - A few other small quota fixes and cleanups - Various small ext2 fixes and cleanups - Reiserfs xattr fix and one cleanup * tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits) ext2: code cleanup for descriptor_loc() fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long ext2: fix improper function comment ext2: code cleanup for ext2_try_to_allocate() ext2: skip unnecessary operations in ext2_try_to_allocate() ext2: Simplify initialization in ext2_try_to_allocate() ext2: code cleanup by calling ext2_group_last_block_no() ext2: introduce new helper ext2_group_last_block_no() reiserfs: replace open-coded atomic_dec_and_mutex_lock() ext2: check err when partial != NULL quota: Handle quotas without quota inodes in dquot_get_state() quota: Make dquot_disable() work without quota inodes quota: Drop dquot_enable() fs: Use dquot_load_quota_inode() from filesystems quota: Rename vfs_load_quota_inode() to dquot_load_quota_inode() quota: Simplify dquot_resume() quota: Factor out setup of quota inode quota: Check that quota is not dirty before release quota: fix livelock in dquot_writeback_dquots ext2: don't set *count in the case of failure in ext2_try_to_allocate() ...
2019-11-30Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge window saw the the following new featuers added to ext4: - Direct I/O via iomap (required the iomap-for-next branch from Darrick as a prereq). - Support for using dioread-nolock where the block size < page size. - Support for encryption for file systems where the block size < page size. - Rework of journal credits handling so a revoke-heavy workload will not cause the journal to run out of space. - Replace bit-spinlocks with spinlocks in jbd2 Also included were some bug fixes and cleanups, mostly to clean up corner cases from fuzzed file systems and error path handling" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (59 commits) ext4: work around deleting a file with i_nlink == 0 safely ext4: add more paranoia checking in ext4_expand_extra_isize handling jbd2: make jbd2_handle_buffer_credits() handle reserved handles ext4: fix a bug in ext4_wait_for_tail_page_commit ext4: bio_alloc with __GFP_DIRECT_RECLAIM never fails ext4: code cleanup for get_next_id ext4: fix leak of quota reservations ext4: remove unused variable warning in parse_options() ext4: Enable encryption for subpage-sized blocks fs/buffer.c: support fscrypt in block_read_full_page() ext4: Add error handling for io_end_vec struct allocation jbd2: Fine tune estimate of necessary descriptor blocks jbd2: Provide trace event for handle restarts ext4: Reserve revoke credits for freed blocks jbd2: Make credit checking more strict jbd2: Rename h_buffer_credits to h_total_credits jbd2: Reserve space for revoke descriptor blocks jbd2: Drop jbd2_space_needed() jbd2: Account descriptor blocks into t_outstanding_credits jbd2: Factor out common parts of stopping and restarting a handle ...
2019-11-30Merge tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap updates from Darrick Wong: "In this release, we hoisted as much of XFS' writeback code into iomap as was practicable, refactored the unshare file data function, added the ability to perform buffered io copy on write, and tweaked various parts of the directio implementation as needed to port ext4's directio code (that will be a separate pull). Summary: - Make iomap_dio_rw callers explicitly tell us if they want us to wait - Port the xfs writeback code to iomap to complete the buffered io library functions - Refactor the unshare code to share common pieces - Add support for performing copy on write with buffered writes - Other minor fixes - Fix unchecked return in iomap_bmap - Fix a type casting bug in a ternary statement in iomap_dio_bio_actor - Improve tracepoints for easier diagnostic ability - Fix pipe page leakage in directio reads" * tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits) iomap: Fix pipe page leakage during splicing iomap: trace iomap_appply results iomap: fix return value of iomap_dio_bio_actor on 32bit systems iomap: iomap_bmap should check iomap_apply return value iomap: Fix overflow in iomap_page_mkwrite fs/iomap: remove redundant check in iomap_dio_rw() iomap: use a srcmap for a read-modify-write I/O iomap: renumber IOMAP_HOLE to 0 iomap: use write_begin to read pages to unshare iomap: move the zeroing case out of iomap_read_page_sync iomap: ignore non-shared or non-data blocks in xfs_file_dirty iomap: always use AOP_FLAG_NOFS in iomap_write_begin iomap: remove the unused iomap argument to __iomap_write_end iomap: better document the IOMAP_F_* flags iomap: enhance writeback error message iomap: pass a struct page to iomap_finish_page_writeback iomap: cleanup iomap_ioend_compare iomap: move struct iomap_page out of iomap.h iomap: warn on inline maps in iomap_writepage_map iomap: lift the xfs writeback code to iomap ...
2019-11-25Merge tag 'linux-kselftest-5.5-rc1-kunit' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest KUnit support gtom Shuah Khan: "This adds KUnit, a lightweight unit testing and mocking framework for the Linux kernel from Brendan Higgins. KUnit is not an end-to-end testing framework. It is currently supported on UML and sub-systems can write unit tests and run them in UML env. KUnit documentation is included in this update. In addition, this Kunit update adds 3 new kunit tests: - proc sysctl test from Iurii Zaikin - the 'list' doubly linked list test from David Gow - ext4 tests for decoding extended timestamps from Iurii Zaikin In the future KUnit will be linked to Kselftest framework to provide a way to trigger KUnit tests from user-space" * tag 'linux-kselftest-5.5-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (23 commits) lib/list-test: add a test for the 'list' doubly linked list ext4: add kunit test for decoding extended timestamps Documentation: kunit: Fix verification command kunit: Fix '--build_dir' option kunit: fix failure to build without printk MAINTAINERS: add proc sysctl KUnit test to PROC SYSCTL section kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec() MAINTAINERS: add entry for KUnit the unit testing framework Documentation: kunit: add documentation for KUnit kunit: defconfig: add defconfigs for building KUnit tests kunit: tool: add Python wrappers for running KUnit tests kunit: test: add tests for KUnit managed resources kunit: test: add the concept of assertions kunit: test: add tests for kunit test abort kunit: test: add support for test abort objtool: add kunit_try_catch_throw to the noreturn list kunit: test: add initial tests lib: enable building KUnit in lib/ kunit: test: add the concept of expectations kunit: test: add assertion printing library ...
2019-11-25Merge tag 'fsverity-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: "Expose the fs-verity bit through statx()" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: docs: fs-verity: mention statx() support f2fs: support STATX_ATTR_VERITY ext4: support STATX_ATTR_VERITY statx: define STATX_ATTR_VERITY docs: fs-verity: document first supported kernel version
2019-11-19Merge branch 'tt/misc' into devTheodore Ts'o
2019-11-19ext4: work around deleting a file with i_nlink == 0 safelyTheodore Ts'o
If the file system is corrupted such that a file's i_links_count is too small, then it's possible that when unlinking that file, i_nlink will already be zero. Previously we were working around this kind of corruption by forcing i_nlink to one; but we were doing this before trying to delete the directory entry --- and if the file system is corrupted enough that ext4_delete_entry() fails, then we exit with i_nlink elevated, and this causes the orphan inode list handling to be FUBAR'ed, such that when we unmount the file system, the orphan inode list can get corrupted. A better way to fix this is to simply skip trying to call drop_nlink() if i_nlink is already zero, thus moving the check to the place where it makes the most sense. https://bugzilla.kernel.org/show_bug.cgi?id=205433 Link: https://lore.kernel.org/r/20191112032903.8828-1-tytso@mit.edu Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2019-11-19ext4: add more paranoia checking in ext4_expand_extra_isize handlingTheodore Ts'o
It's possible to specify a non-zero s_want_extra_isize via debugging option, and this can cause bad things(tm) to happen when using a file system with an inode size of 128 bytes. Add better checking when the file system is mounted, as well as when we are actually doing the trying to do the inode expansion. Link: https://lore.kernel.org/r/20191110121510.GH23325@mit.edu Reported-by: syzbot+f8d6f8386ceacdbfff57@syzkaller.appspotmail.com Reported-by: syzbot+33d7ea72e47de3bdf4e1@syzkaller.appspotmail.com Reported-by: syzbot+44b6763edfc17144296f@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2019-11-14ext4: fix a bug in ext4_wait_for_tail_page_commityangerkun
No need to wait for any commit once the page is fully truncated. Besides, it may confuse e.g. concurrent ext4_writepage() with the page still be dirty (will be cleared by truncate_pagecache() in ext4_setattr()) but buffers has been freed; and then trigger a bug show as below: [ 26.057508] ------------[ cut here ]------------ [ 26.058531] kernel BUG at fs/ext4/inode.c:2134! ... [ 26.088130] Call trace: [ 26.088695] ext4_writepage+0x914/0xb28 [ 26.089541] writeout.isra.4+0x1b4/0x2b8 [ 26.090409] move_to_new_page+0x3b0/0x568 [ 26.091338] __unmap_and_move+0x648/0x988 [ 26.092241] unmap_and_move+0x48c/0xbb8 [ 26.093096] migrate_pages+0x220/0xb28 [ 26.093945] kernel_mbind+0x828/0xa18 [ 26.094791] __arm64_sys_mbind+0xc8/0x138 [ 26.095716] el0_svc_common+0x190/0x490 [ 26.096571] el0_svc_handler+0x60/0xd0 [ 26.097423] el0_svc+0x8/0xc Run the procedure (generate by syzkaller) parallel with ext3. void main() { int fd, fd1, ret; void *addr; size_t length = 4096; int flags; off_t offset = 0; char *str = "12345"; fd = open("a", O_RDWR | O_CREAT); assert(fd >= 0); /* Truncate to 4k */ ret = ftruncate(fd, length); assert(ret == 0); /* Journal data mode */ flags = 0xc00f; ret = ioctl(fd, _IOW('f', 2, long), &flags); assert(ret == 0); /* Truncate to 0 */ fd1 = open("a", O_TRUNC | O_NOATIME); assert(fd1 >= 0); addr = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_SHARED, fd, offset); assert(addr != (void *)-1); memcpy(addr, str, 5); mbind(addr, length, 0, 0, 0, MPOL_MF_MOVE); } And the bug will be triggered once we seen the below order. reproduce1 reproduce2 ... | ... truncate to 4k | change to journal data mode | | memcpy(set page dirty) truncate to 0: | ext4_setattr: | ... | ext4_wait_for_tail_page_commit | | mbind(trigger bug) truncate_pagecache(clean dirty)| ... ... | mbind will call ext4_writepage() since the page still be dirty, and then report the bug since the buffers has been free. Fix it by return directly once offset equals to 0 which means the page has been fully truncated. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: yangerkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20190919063508.1045-1-yangerkun@huawei.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-14ext4: bio_alloc with __GFP_DIRECT_RECLAIM never failsGao Xiang
Similar to [1] [2], bio_alloc with __GFP_DIRECT_RECLAIM flags guarantees bio allocation under some given restrictions, as stated in block/bio.c and fs/direct-io.c So here it's ok to not check for NULL value from bio_alloc(). [1] https://lore.kernel.org/r/20191030035518.65477-1-gaoxiang25@huawei.com [2] https://lore.kernel.org/r/20190830162812.GA10694@infradead.org Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Link: https://lore.kernel.org/r/20191031092315.139267-1-gaoxiang25@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-14ext4: code cleanup for get_next_idChengguang Xu
Now the checks in ext4_get_next_id() and dquot_get_next_id() are almost the same, so just call dquot_get_next_id() instead of ext4_get_next_id(). Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Link: https://lore.kernel.org/r/20191006103028.31299-1-cgxu519@mykernel.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-14ext4: fix leak of quota reservationsJan Kara
Commit 8fcc3a580651 ("ext4: rework reserved cluster accounting when invalidating pages") moved freeing of delayed allocation reservations from dirty page invalidation time to time when we evict corresponding status extent from extent status tree. For inodes which don't have any blocks allocated this may actually happen only in ext4_clear_blocks() which is after we've dropped references to quota structures from the inode. Thus reservation of quota leaked. Fix the problem by clearing quota information from the inode only after evicting extent status tree in ext4_clear_inode(). Link: https://lore.kernel.org/r/20191108115420.GI20863@quack2.suse.cz Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 8fcc3a580651 ("ext4: rework reserved cluster accounting when invalidating pages") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-11-14ext4: remove unused variable warning in parse_options()Olof Johansson
Commit c33fbe8f673c5 ("ext4: Enable blocksize < pagesize for dioread_nolock") removed the only user of 'sbi' outside of the ifdef, so it caused a new warning: fs/ext4/super.c:2068:23: warning: unused variable 'sbi' [-Wunused-variable] Fixes: c33fbe8f673c5 ("ext4: Enable blocksize < pagesize for dioread_nolock") Signed-off-by: Olof Johansson <olof@lixom.net> Link: https://lore.kernel.org/r/20191111022523.34256-1-olof@lixom.net Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>