path: root/fs/f2fs/f2fs.h
Age | Commit message | Author
2020-05-11 | f2fs: introduce mempool for {,de}compress intermediate page allocation | Chao Yu
If the compression feature is on and free memory is low, the page refault ratio is higher than before. The root cause is:
- the {,de}compression flow needs to allocate intermediate pages to store the compressed data of a cluster, so during their allocation the VM may reclaim mmapped pages.
- if those reclaimed pages belong to a compressed cluster, refaulting them causes more intermediate page allocations, which in turn reclaims more mmapped pages.
So this patch introduces a mempool for intermediate page allocation in order to avoid a high refault ratio. By default, the number of preallocated pages in the pool is 512; users can change it by setting the 'num_compress_pages' parameter during module initialization.
Ma Feng found warnings in the original patch and fixed them as below.
Fix the following sparse warnings:
fs/f2fs/compress.c:501:5: warning: symbol 'num_compress_pages' was not declared. Should it be static?
fs/f2fs/compress.c:530:6: warning: symbol 'f2fs_compress_free_page' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Ma Feng <mafeng.ma@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
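The reserve-and-fallback idea behind the pool can be sketched in userspace C. This is a minimal illustration, not the kernel's mempool API: struct page_pool, page_pool_init(), page_pool_alloc(), page_pool_free() and SKETCH_PAGE_SIZE are hypothetical names, and the pool size passed to page_pool_init() stands in for num_compress_pages.

```c
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096 /* hypothetical page size for the sketch */

/* Hypothetical pool of preallocated page-sized buffers (not the kernel mempool API). */
struct page_pool {
    void **free_pages; /* stack of preallocated pages not currently handed out */
    size_t nr_free;    /* number of pages on the stack */
    size_t capacity;   /* number of pages preallocated at init time */
};

/* Release every reserved page and the stack itself. */
static void page_pool_destroy(struct page_pool *pool)
{
    while (pool->nr_free > 0)
        free(pool->free_pages[--pool->nr_free]);
    free(pool->free_pages);
    pool->free_pages = NULL;
    pool->capacity = 0;
}

/* Preallocate @nr pages up front, like a pool sized by num_compress_pages. */
static int page_pool_init(struct page_pool *pool, size_t nr)
{
    pool->nr_free = 0;
    pool->capacity = nr;
    pool->free_pages = calloc(nr ? nr : 1, sizeof(void *));
    if (!pool->free_pages)
        return -1;
    while (pool->nr_free < nr) {
        void *page = malloc(SKETCH_PAGE_SIZE);
        if (!page) {
            page_pool_destroy(pool); /* undo partial preallocation */
            return -1;
        }
        pool->free_pages[pool->nr_free++] = page;
    }
    return 0;
}

/* Hand out a reserved page if one is left; otherwise fall back to the heap. */
static void *page_pool_alloc(struct page_pool *pool)
{
    if (pool->nr_free > 0)
        return pool->free_pages[--pool->nr_free];
    return malloc(SKETCH_PAGE_SIZE);
}

/* Refill the reserve first; only release to the heap once it is full again. */
static void page_pool_free(struct page_pool *pool, void *page)
{
    if (pool->nr_free < pool->capacity)
        pool->free_pages[pool->nr_free++] = page;
    else
        free(page);
}
```

One design difference worth noting: the kernel's mempool_alloc() first tries the normal allocator and dips into its reserved elements only when that fails, whereas this sketch hands out reserved pages first for simplicity.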
2020-05-08 | f2fs: support partial truncation on compressed inode | Chao Yu
Support partial truncation of both compressed and normal clusters on a compressed inode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08 | f2fs: remove redundant compress inode check | Chao Yu
The check is redundant because f2fs_post_read_required() already does it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-05-08 | f2fs: Use the correct style for SPDX License Identifier | Nishad Kamdar
This patch corrects the SPDX License Identifier style in header files related to F2FS File System support. For C header files, Documentation/process/license-rules.rst mandates C-like comments (as opposed to C source files, where C++ style should be used).
Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-17 | f2fs: support read iostat | Chao Yu
Add support for accounting read IOs issued from userspace and from the kernel.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-17 | f2fs: add tracepoint for f2fs iostat | Daeho Jeong
Add a tracepoint that reports f2fs iostat. Its default period is 3 seconds. The tracepoint can be used to monitor I/O statistics periodically.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-16 | f2fs: introduce sysfs/data_io_flag to attach REQ_META/FUA | Jaegeuk Kim
This patch introduces a way to explicitly attach REQ_META/FUA to all data writes of a given temperature, for example:
-> attach REQ_FUA to Hot Data writes
-> attach REQ_FUA to Hot|Warm Data writes
-> attach REQ_FUA to Hot|Warm|Cold Data writes
-> attach REQ_FUA to Hot|Warm|Cold Data writes as well as REQ_META to Hot Data writes
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
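Assuming the sysfs value is a bitmap in which bits 0-2 select REQ_FUA for Hot/Warm/Cold data and bits 3-5 select REQ_META for Hot/Warm/Cold data (Documentation/ABI/testing/sysfs-fs-f2fs is the authoritative reference for the layout), the four examples in this commit would correspond to the values 1, 3, 7 and 15. A minimal decoder sketch; enum temp, struct io_flags and decode_data_io_flag() are hypothetical names:

```c
#include <stdbool.h>

/* Temperatures in the bit order assumed for data_io_flag. */
enum temp { TEMP_HOT, TEMP_WARM, TEMP_COLD, NR_TEMP };

/* Which request flags a data write of one temperature would get. */
struct io_flags {
    bool fua;  /* attach REQ_FUA */
    bool meta; /* attach REQ_META */
};

/* Assumed layout: bits 0-2 = REQ_FUA Hot/Warm/Cold, bits 3-5 = REQ_META Hot/Warm/Cold. */
static struct io_flags decode_data_io_flag(unsigned int value, enum temp t)
{
    struct io_flags f;

    f.fua = value & (1u << t);
    f.meta = value & (1u << (t + NR_TEMP));
    return f;
}
```

For example, under this assumed layout a value of 15 sets FUA for all three temperatures and META only for Hot data.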
2020-04-07 | Merge tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs | Linus Torvalds
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've mainly focused on fixing bugs and addressing issues in recently introduced compression support.
Enhancement:
- add zstd support, and set LZ4 by default
- add ioctl() to show # of compressed blocks
- show mount time in debugfs
- replace rwsem with spinlock
- avoid lock contention in DIO reads
Some major bug fixes wrt compression:
- compressed block count
- memory access and leak
- remove obsolete fields
- flag controls
Other bug fixes and clean ups:
- fix overflow when handling .flags in inode_info
- fix SPO issue during resize FS flow
- fix compression with fsverity enabled
- potential deadlock when writing compressed pages
- show missing mount options"
* tag 'f2fs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits)
  f2fs: keep inline_data when compression conversion
  f2fs: fix to disable compression on directory
  f2fs: add missing CONFIG_F2FS_FS_COMPRESSION
  f2fs: switch discard_policy.timeout to bool type
  f2fs: fix to verify tpage before releasing in f2fs_free_dic()
  f2fs: show compression in statx
  f2fs: clean up dic->tpages assignment
  f2fs: compress: support zstd compress algorithm
  f2fs: compress: add .{init,destroy}_decompress_ctx callback
  f2fs: compress: fix to call missing destroy_compress_ctx()
  f2fs: change default compression algorithm
  f2fs: clean up {cic,dic}.ref handling
  f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()
  f2fs: xattr.h: Make stub helpers inline
  f2fs: fix to avoid double unlock
  f2fs: fix potential .flags overflow on 32bit architecture
  f2fs: fix NULL pointer dereference in f2fs_verity_work()
  f2fs: fix to clear PG_error if fsverity failed
  f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()
  f2fs: don't trigger data flush in foreground operation
  ...
2020-04-03 | f2fs: fix to disable compression on directory | Chao Yu
We need to call f2fs_disable_compressed_file() to disable compression on a directory.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-03 | f2fs: switch discard_policy.timeout to bool type | Chao Yu
When checking the discard timeout, we use the specific type UMOUNT_DISCARD_TIMEOUT, so just replace dpolicy.timeout with it, and switch dpolicy.timeout to bool type.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-04-03 | f2fs: compress: support zstd compress algorithm | Chao Yu
Add support for the zstd compression algorithm; use the "compress_algorithm=zstd" mount option to enable it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30 | f2fs: fix potential .flags overflow on 32bit architecture | Chao Yu
f2fs_inode_info.flags is an unsigned long variable, which has 32 bits on 32-bit architectures. Since we introduced the FI_MMAP_FILE flag when adding data compression support, we may access memory beyond the .flags field, corrupting the .i_sem field and resulting in the deadlock below.
To fix this issue, let's expand .flags into an array to reserve enough space to store new flags.
Call Trace:
 __schedule+0x8d0/0x13fc
 ? mark_held_locks+0xac/0x100
 schedule+0xcc/0x260
 rwsem_down_write_slowpath+0x3ab/0x65d
 down_write+0xc7/0xe0
 f2fs_drop_nlink+0x3d/0x600 [f2fs]
 f2fs_delete_inline_entry+0x300/0x440 [f2fs]
 f2fs_delete_entry+0x3a1/0x7f0 [f2fs]
 f2fs_unlink+0x500/0x790 [f2fs]
 vfs_unlink+0x211/0x490
 do_unlinkat+0x483/0x520
 sys_unlink+0x4a/0x70
 do_fast_syscall_32+0x12b/0x683
 entry_SYSENTER_32+0xaa/0x102
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Tested-by: Ondrej Jirman <megous@megous.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
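The shape of the fix, a flag bitmap stored as an array of unsigned long words so that a high flag number lands in a later word of the array instead of spilling into the next struct member, can be sketched in userspace. SKETCH_FI_MAX, struct inode_info_sketch and the sketch_*_flag() helpers are hypothetical; the kernel uses its own bitmap helpers rather than these functions.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical flag count: more flags than a 32-bit unsigned long can hold. */
#define SKETCH_FI_MAX 40

struct inode_info_sketch {
    /* Sized to hold every flag, instead of a single unsigned long. */
    unsigned long flags[SKETCH_BITS_TO_LONGS(SKETCH_FI_MAX)];
    unsigned long next_field; /* stands in for the field after .flags (i_sem in the report) */
};

static void sketch_set_flag(struct inode_info_sketch *fi, unsigned int nr)
{
    fi->flags[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static void sketch_clear_flag(struct inode_info_sketch *fi, unsigned int nr)
{
    fi->flags[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}

static bool sketch_test_flag(const struct inode_info_sketch *fi, unsigned int nr)
{
    return fi->flags[nr / SKETCH_BITS_PER_LONG] & (1UL << (nr % SKETCH_BITS_PER_LONG));
}
```

On a 32-bit build the array gets two words, so setting flag 39 touches flags[1] and leaves next_field alone; on a 64-bit build one word is enough.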
2020-03-30 | f2fs: don't trigger data flush in foreground operation | Chao Yu
Data flush can generate heavy IO and cause long latency, so it's not appropriate to trigger it in a foreground operation. We may also hit the potential deadlock below during data flush:
- f2fs_write_multi_pages
 - f2fs_write_raw_pages
  - f2fs_write_single_data_page
   - f2fs_balance_fs
    - f2fs_balance_fs_bg
     - f2fs_sync_dirty_inodes
      - filemap_fdatawrite -- stuck on flush same cluster
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30 | f2fs: clean up f2fs_may_encrypt() | Chao Yu
Merge the two conditions below into f2fs_may_encrypt() for cleanup:
- IS_ENCRYPTED()
- DUMMY_ENCRYPTION_ENABLED()
Checking the IS_ENCRYPTED(inode) condition in f2fs_init_inode_metadata() is enough, since we have already set the encrypt flag in f2fs_new_inode().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30 | f2fs: don't mark compressed inode dirty during f2fs_iget() | Chao Yu
- f2fs_iget
 - do_read_inode
  - set_inode_flag(, FI_COMPRESSED_FILE)
   - __mark_inode_dirty_flag(, true)
This is unnecessary, so let's mark a compressed inode dirty only during compressed inode conversion.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-25 | block: move the part_stat* helpers from genhd.h to a new header | Christoph Hellwig
These macros are only used by a few files. Move them out of genhd.h, which is included everywhere, into a new standalone header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-22 | f2fs: use kmem_cache pool during inline xattr lookups | Chao Yu
It's been observed that kzalloc() in lookup_all_xattrs() is called millions of times on Android, quickly becoming the top abuser of the slub memory allocator.
Use a dedicated kmem cache pool for xattr lookups to mitigate this.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 | f2fs: avoid __GFP_NOFAIL in f2fs_bio_alloc | Chao Yu
__f2fs_bio_alloc() won't fail because it is backed by a memory pool, so remove the unneeded __GFP_NOFAIL flag in __f2fs_bio_alloc().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 | f2fs: introduce F2FS_IOC_GET_COMPRESS_BLOCKS | Chao Yu
With this newly introduced interface, users can get the number of blocks saved by compression in the target inode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 | f2fs: fix to avoid triggering IO in write path | Chao Yu
In the write IO path, we need to avoid using GFP_KERNEL.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 | f2fs: introduce DEFAULT_IO_TIMEOUT | Chao Yu
As Geert Uytterhoeven reported, for the parameter HZ/50 in
  congestion_wait(BLK_RW_ASYNC, HZ/50);
HZ can be less than 50 on some platforms, in which case an unexpected timeout of 0 jiffies is passed to congestion_wait().
This patch introduces a macro DEFAULT_IO_TIMEOUT that wraps the fixed value msecs_to_jiffies(20) and uses it instead of HZ/50 to avoid this issue.
Quoted from Geert Uytterhoeven:
"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.
If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20), as that takes care of the special cases, and never returns 0."
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
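The arithmetic behind the report can be checked with a small sketch. ms_to_jiffies_roundup() is a hypothetical, simplified round-up conversion, ceil(ms * HZ / 1000), not the kernel's exact msecs_to_jiffies() implementation; HZ is passed as a parameter so that a low value such as 24 can be tried.

```c
/* Integer division truncates, so HZ/50 becomes 0 whenever HZ < 50. */
static unsigned long naive_timeout(unsigned int hz)
{
    return hz / 50;
}

/* Simplified round-up conversion: any non-zero ms yields at least 1 jiffy. */
static unsigned long ms_to_jiffies_roundup(unsigned int ms, unsigned int hz)
{
    return ((unsigned long)ms * hz + 999) / 1000;
}
```

With hz = 24, naive_timeout() yields 0 while ms_to_jiffies_roundup(20, 24) yields 1; with hz = 100 both yield 2.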
2020-03-19 | f2fs: clean up bggc mount option | Chao Yu
There are three states for background GC: on, off and sync. It's a little confusing to use combinations of test_opt(BG_GC) and test_opt(FORCE_FG_GC) to indicate the background GC state.
So let's remove the F2FS_MOUNT_BG_GC and F2FS_MOUNT_FORCE_FG_GC mount options, and add F2FS_OPTION().bggc_mode with the three states below to clean up the code and enhance bggc mode's scalability.
enum {
        BGGC_MODE_ON,           /* background gc is on */
        BGGC_MODE_OFF,          /* background gc is off */
        BGGC_MODE_SYNC,         /*
                                 * background gc is on, migrating blocks
                                 * like foreground gc
                                 */
};
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
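With the enum above, a single field answers both questions that the two old flags encoded. A minimal sketch, assuming the values come from the background_gc=on|off|sync mount option; parse_bggc_mode(), bggc_thread_enabled() and bggc_migrates_like_fg() are hypothetical helpers, not f2fs functions:

```c
#include <stdbool.h>
#include <string.h>

enum {
    BGGC_MODE_ON,   /* background gc is on */
    BGGC_MODE_OFF,  /* background gc is off */
    BGGC_MODE_SYNC, /* background gc is on, migrating blocks like foreground gc */
};

/* Hypothetical parser for a background_gc= value; returns -1 on unknown input. */
static int parse_bggc_mode(const char *value)
{
    if (strcmp(value, "on") == 0)
        return BGGC_MODE_ON;
    if (strcmp(value, "off") == 0)
        return BGGC_MODE_OFF;
    if (strcmp(value, "sync") == 0)
        return BGGC_MODE_SYNC;
    return -1;
}

/* Replaces the old BG_GC test: the GC thread runs unless the mode is off. */
static bool bggc_thread_enabled(int mode)
{
    return mode != BGGC_MODE_OFF;
}

/* Replaces the old FORCE_FG_GC test: only sync mode migrates like foreground GC. */
static bool bggc_migrates_like_fg(int mode)
{
    return mode == BGGC_MODE_SYNC;
}
```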
2020-03-19 | f2fs: clean up lfs/adaptive mount option | Chao Yu
This patch removes the F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount options, and adds F2FS_OPTION.fs_mode with the two states below to indicate the filesystem mode.
enum {
        FS_MODE_ADAPTIVE,       /* use both lfs/ssr allocation */
        FS_MODE_LFS,            /* use lfs allocation only */
};
It can enhance code readability and fs mode's scalability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 | f2fs: fix to show norecovery mount option | Chao Yu
Previously, the 'norecovery' mount option was shown as 'disable_roll_forward'; fix it to show the original option name.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-19 | f2fs: clean up codes with {f2fs_,}data_blkaddr() | Chao Yu
- rename datablock_addr() to data_blkaddr().
- wrap data_blkaddr() with f2fs_data_blkaddr() to clean up parameters.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-11 | f2fs: fix to check dirty pages during compressed inode conversion | Chao Yu
A compressed cluster can be generated during dirty data writeback, so if there are dirty pages on a compressed inode, converting the compressed inode to a non-compressed one must be disallowed.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-11 | f2fs: fix to account compressed inode correctly | Chao Yu
stat_inc_compr_inode() needs to check the FI_COMPRESSED_FILE flag, so in f2fs_disable_compressed_file(), we should call stat_dec_compr_inode() before clearing the FI_COMPRESSED_FILE flag.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 | f2fs: fix inconsistent comments | Chao Yu
Lack of maintenance on comments may mislead developers; fix them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-10 | f2fs: cover last_disk_size update with spinlock | Chao Yu
This change solves the hung task issue below:
INFO: task kworker/u16:1:58 blocked for more than 122 seconds.
      Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1   D    0    58      2 0x00000000
Workqueue: writeback wb_workfn (flush-179:0)
Backtrace:
 (__schedule) from [<c0913234>] (schedule+0x78/0xf4)
 (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0)
 (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70)
 (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac)
 (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4)
 (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c)
 (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4)
 (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454)
 (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0)
 (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4)
 (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338)
 (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c)
 (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544)
 (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574)
 (worker_thread) from [<c01564fc>] (kthread+0x144/0x170)
 (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
Reported-and-tested-by: Ondřej Jirman <megi@xff.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
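The pattern of the fix, protecting a small size update with a short, non-sleeping lock instead of a sleeping rwsem, can be sketched in userspace C11. The atomic_flag spinlock stands in for the kernel's spinlock_t; struct inode_size_sketch, update_last_disk_size() and the "only move forward" update rule are illustrative assumptions rather than the exact f2fs code.

```c
#include <stdatomic.h>

/* Minimal userspace spinlock standing in for the kernel's spinlock_t. */
struct sketch_spinlock {
    atomic_flag locked;
};

static void sketch_spin_lock(struct sketch_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* busy-wait until the holder releases the flag */
}

static void sketch_spin_unlock(struct sketch_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical per-inode state: the lock guards only last_disk_size. */
struct inode_size_sketch {
    struct sketch_spinlock size_lock;
    long long last_disk_size;
};

static void inode_size_init(struct inode_size_sketch *fi)
{
    atomic_flag_clear(&fi->size_lock.locked);
    fi->last_disk_size = 0;
}

/* Assumed rule: last_disk_size only ever grows, updated under the short lock. */
static void update_last_disk_size(struct inode_size_sketch *fi, long long psize)
{
    sketch_spin_lock(&fi->size_lock);
    if (fi->last_disk_size < psize)
        fi->last_disk_size = psize;
    sketch_spin_unlock(&fi->size_lock);
}
```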
2020-02-27 | f2fs: add missing function name in kernel message | Chao Yu
Otherwise, we cannot tell the exact location of a message when more than one place prints the same message.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 | f2fs: recycle unused compress_data.chksum feild | Chao Yu
In struct compress_data, the chksum field was never used; remove it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-02-27 | f2fs: fix the panic in do_checkpoint() | Sahitya Tummala
There could be a scenario where f2fs_sync_meta_pages() does not ensure that all F2FS_DIRTY_META pages are submitted for IO, resulting in the panic below in do_checkpoint():
f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) && !f2fs_cp_error(sbi));
This can happen under low memory, where the shrinker could be doing the writepage operation (stack shown below) at the same time as a checkpoint is running on another core.
schedule
down_write
f2fs_submit_page_write -> by this time, this page in the page cache is tagged as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY is cleared, due to which f2fs_sync_meta_pages() cannot sync this page in the do_checkpoint() path.
f2fs_do_write_meta_page
__f2fs_write_meta_page
f2fs_write_meta_page
shrink_page_list
shrink_inactive_list
shrink_node_memcg
shrink_node
kswapd
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-23 | f2fs: Add f2fs stats to sysfs | Hridya Valsaraju
Currently f2fs stats are only available from /d/f2fs/status. This patch adds some of the f2fs stats to sysfs so that they are accessible even when debugfs is not mounted.
The following sysfs nodes are added:
- /sys/fs/f2fs/<disk>/free_segments
- /sys/fs/f2fs/<disk>/cp_foreground_calls
- /sys/fs/f2fs/<disk>/cp_background_calls
- /sys/fs/f2fs/<disk>/gc_foreground_calls
- /sys/fs/f2fs/<disk>/gc_background_calls
- /sys/fs/f2fs/<disk>/moved_blocks_foreground
- /sys/fs/f2fs/<disk>/moved_blocks_background
- /sys/fs/f2fs/<disk>/avg_vblocks
Signed-off-by: Hridya Valsaraju <hridya@google.com>
[Jaegeuk Kim: allow STAT_FS without DEBUG_FS]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 | f2fs: change to use rwsem for gc_mutex | Chao Yu
A mutex does not serialize callers in order; to avoid starving an unlucky caller, let's use an rwsem instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 | f2fs: convert inline_dir early before starting rename | Jaegeuk Kim
If we hit an error during rename, we'll get two dentries in different directories.
Chao added a check of the room in inline_dir, which can avoid needless inversion. This should be done under inode_lock(&old_dir).
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-17 | f2fs: support data compression | Chao Yu
This patch tries to support compression in f2fs.
- A new term named cluster is defined as the basic unit of compression; a file can be divided logically into multiple clusters. One cluster includes 4 << n (n >= 0) logical pages, the compression size is also the cluster size, and each cluster can be compressed or not.
- In the cluster metadata layout, one special flag indicates whether a cluster is compressed or normal. For a compressed cluster, the following metadata maps the cluster to [1, 4 << n - 1] physical blocks, where f2fs stores data including the compress header and the compressed data.
- In order to eliminate write amplification during overwrite, F2FS only supports compression on write-once files: data can be compressed only when all logical blocks in the file are valid and the cluster compress ratio is lower than the specified threshold.
- To enable compression on a regular inode, there are three ways:
  * chattr +c file
  * chattr +c dir; touch dir/file
  * mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
                             [Dnode Structure]
             +-----------------------------------------------+
             | cluster 1 | cluster 2 | ......... | cluster N |
             +-----------------------------------------------+
             .           .                       .           .
       .                       .               .                       .
  .        Compressed Cluster       .        .         Normal Cluster          .
+----------+---------+---------+---------+  +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+  +---------+---------+---------+---------+
           .                             .
         .                                         .
       .                                                        .
      +-------------+-------------+----------+----------------------------+
      | data length | data chksum | reserved |      compressed data       |
      +-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
  Processing post read works in multiple workqueues shows low performance due to the scheduling overhead of multiple workqueues executing in order.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contains 4 blocks
  before overwrite    after overwrite
- VVVV            ->  CVNN
- CVNN            ->  VVVV
- CVNN            ->  CVNN
- CVNN            ->  CVVV
- CVVV            ->  CVNN
- CVVV            ->  CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related code, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithms.
  note that: will remove lzo backend if Jaegeuk agreed that too.
- update code according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
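The cluster geometry described in this commit can be made concrete with a little arithmetic. In this sketch the header size is a parameter because the exact on-disk size of the data length/checksum/reserved header is an assumption here; cluster_pages(), cluster_index(), cluster_offset(), compressed_blocks() and should_store_compressed() are hypothetical helpers, and the compress-ratio threshold mentioned in the commit is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed block/page size */

/* One cluster holds 4 << n logical pages. */
static unsigned int cluster_pages(unsigned int n)
{
    return 4u << n;
}

/* Which cluster a logical page index falls into. */
static unsigned long cluster_index(unsigned long page_idx, unsigned int n)
{
    return page_idx / cluster_pages(n);
}

/* Position of the page inside its cluster. */
static unsigned int cluster_offset(unsigned long page_idx, unsigned int n)
{
    return (unsigned int)(page_idx % cluster_pages(n));
}

/* Blocks needed for header + compressed bytes, rounded up to whole blocks. */
static unsigned int compressed_blocks(size_t header_bytes, size_t clen)
{
    return (unsigned int)((header_bytes + clen + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/*
 * A compressed cluster maps to [1, 4 << n - 1] physical blocks, because one
 * metadata slot holds the compress flag; output that needs more is not kept.
 */
static bool should_store_compressed(size_t header_bytes, size_t clen, unsigned int n)
{
    return compressed_blocks(header_bytes, clen) <= cluster_pages(n) - 1;
}
```

For n = 0 (4-page clusters) and an assumed 24-byte header, 12000 bytes of compressed output fit in 3 blocks and would be kept compressed, while 12288 bytes need 4 blocks and would not.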
2020-01-15 | f2fs: introduce private bioset | Chao Yu
In a low memory scenario, we can allocate multiple bios without submitting any of them.
- f2fs_write_checkpoint()
 - block_operations()
  - f2fs_sync_node_pages()
   step 1) flush cold nodes, allocate new bio from mempool
   - bio_alloc()
    - mempool_alloc()
   step 2) flush hot nodes, allocate a bio from mempool
   - bio_alloc()
    - mempool_alloc()
   step 3) flush warm nodes, be stuck in below call path
   - bio_alloc()
    - mempool_alloc()
     - loop to wait mempool element release, as we only reserved memory for two bio allocations, however the two bios allocated above may never be submitted.
So we need to avoid using the default bioset. In this patch we introduce a private bioset, in which we enlarge the mempool element count to the total number of log headers, so that we can make sure we have enough reserved memory when allocating and holding multiple bios.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
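Why two reserved elements are not enough can be shown with a counter model of the reserve, assuming the underlying allocator always fails (the low-memory case) so that every allocation must come from the reserve. struct reserve, reserve_take() and can_hold() are hypothetical, and the value 6 used below is only an illustrative log-header count.

```c
#include <stdbool.h>

/* Hypothetical counter model of a mempool's reserved elements. */
struct reserve {
    unsigned int total;  /* elements reserved at creation time */
    unsigned int in_use; /* elements currently handed out */
};

/* Returns false where a real mempool_alloc() would sleep waiting for a release. */
static bool reserve_take(struct reserve *r)
{
    if (r->in_use == r->total)
        return false;
    r->in_use++;
    return true;
}

/* One caller takes @n elements and holds them all before submitting any. */
static bool can_hold(unsigned int reserved, unsigned int n)
{
    struct reserve r = { reserved, 0 };

    for (unsigned int i = 0; i < n; i++)
        if (!reserve_take(&r))
            return false; /* would wait forever: nothing held gets released */
    return true;
}
```

can_hold(2, 3) is false, matching step 3 getting stuck, while can_hold(6, 3) is true.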
2020-01-15 | f2fs: cleanup duplicate stats for atomic files | Sahitya Tummala
Remove the duplicate sbi->aw_cnt stats counter that tracks the number of atomic files currently opened (it also shows an incorrect value sometimes). Use the more relevant label sbi->atomic_files to show in the stats.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-01-15 | f2fs: Check write pointer consistency of non-open zones | Shin'ichiro Kawasaki
To catch f2fs bugs in write pointer handling code for zoned block devices, check the write pointers of non-open zones that current segments do not point to. Do this check at mount time, after the fsync data recovery and the current segments' write pointer consistency fix. Or, when fsync data recovery is disabled by mount option, do the check when there is no fsync data.
Check two items by comparing write pointers with valid block maps in SIT. The first item is a check for zones with no valid blocks. When there are no valid blocks in a zone, the write pointer should be at the start of the zone. If not, the next write operation to the zone will cause an unaligned write error. If the write pointer is not at the zone start, reset the write pointer to the zone start.
The second item is a check between the write pointer position and the last valid block in the zone. The last valid block is not expected to be beyond the write pointer; in such a case, report it as a bug. No fix is required for such a zone, because the zone is not selected for the next write operation until the zone gets discarded.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
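The two checks reduce to a small decision function. The enum, check_zone_wp() and the use of -1 for "no valid blocks" are hypothetical; block addresses are plain integers, and the write pointer is taken to be the next block to be written.

```c
/* Hypothetical outcome of checking one non-open zone. */
enum wp_check {
    WP_OK,             /* write pointer consistent with SIT */
    WP_RESET_TO_START, /* empty zone whose write pointer has moved */
    WP_REPORT_BUG,     /* valid data at or beyond the write pointer */
};

/*
 * zone_start, wp: block addresses; last_valid: address of the last valid
 * block according to SIT, or -1 when the zone holds no valid blocks.
 */
static enum wp_check check_zone_wp(long zone_start, long wp, long last_valid)
{
    if (last_valid < 0) /* first item: no valid blocks in the zone */
        return wp == zone_start ? WP_OK : WP_RESET_TO_START;
    if (last_valid >= wp) /* second item: last valid block beyond the write pointer */
        return WP_REPORT_BUG;
    return WP_OK;
}
```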
2020-01-15 | f2fs: Check write pointer consistency of open zones | Shin'ichiro Kawasaki
On sudden f2fs shutdown, the write pointers of zoned block devices can advance further, but f2fs metadata keeps current segments at positions before the write operations. After remounting f2fs, this inconsistency causes write operations not at the write pointers, and an "Unaligned write command" error is reported.
To avoid the error, compare current segments with the write pointers of the open zones the current segments point to, during the mount operation. If the write pointer position is not aligned with the current segment position, assign a new zone to the current segment. Also check that the newly assigned zone has its write pointer at the zone start. If not, reset the write pointer of the zone.
Perform the consistency check during fsync recovery. To avoid losing fsync data, do the check after fsync data gets restored and before the checkpoint commit that flushes data at current segment positions. To avoid conflicts with the kworker's dirty data/node flush, do the fix within SBI_POR_DOING protection.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-19 | f2fs: Fix deadlock in f2fs_gc() context during atomic files handling | Sahitya Tummala
The FS got stuck in the stack below when the storage was in an almost full/dirty condition (when FG_GC is being done).
schedule_timeout
io_schedule_timeout
congestion_wait
f2fs_drop_inmem_pages_all
f2fs_gc
f2fs_balance_fs
__write_node_page
f2fs_fsync_node_pages
f2fs_do_sync_file
f2fs_ioctl
The root cause of this issue is a potential infinite loop in f2fs_drop_inmem_pages_all() for the case where gc_failure is true and there is an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is not set. Fix this by keeping track of the total number of atomic files currently opened and using that to exit from this condition.
Fix-suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-19 | f2fs: show f2fs instance in printk_ratelimited | Chao Yu
As Eric mentioned, bare printk{,_ratelimited} won't show which filesystem instance these messages come from; this patch shows the fs instance with the sb->s_id field in all places we missed before.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-11-07 | f2fs: support aligned pinned file | Jaegeuk Kim
This patch supports 2MB-aligned pinned files, which can guarantee no GC at all by allocating fully valid 2MB segments.
Check free segments by has_not_enough_free_secs() with a large budget.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-10-25 | f2fs: cache global IPU bio | Chao Yu
In commit 8648de2c581e ("f2fs: add bio cache for IPU"), we added f2fs_submit_ipu_bio() in __write_data_page() as below:
__write_data_page()
        if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
                f2fs_submit_ipu_bio(sbi, bio, page);
                ....
        }
in order to avoid the deadlock below:
Thread A                                        Thread B
- __write_data_page (inode x, page y)
 - f2fs_do_write_data_page
  - set_page_writeback ---- set writeback flag in page y
  - f2fs_inplace_write_data
 - f2fs_balance_fs
                                                 - lock gc_mutex
  - lock gc_mutex
                                                  - f2fs_gc
                                                   - do_garbage_collect
                                                    - gc_data_segment
                                                     - move_data_page
                                                      - f2fs_wait_on_page_writeback
                                                       - wait_on_page_writeback --- wait writeback of page y
However, the bio submission breaks the merging of IPU IOs.
So in this patch, let's add a global bio cache for merged IPU pages; then f2fs_wait_on_page_writeback() is able to submit the bio if a page under writeback is cached in the global bio cache.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-10-04 | f2fs: fix to update time in lazytime mode | Chao Yu
generic/018 reports an inconsistent atime state; the testcase is as below:
- open file with O_SYNC
- write file to construct fragmented space
- calc md5 of file
- record {a,c,m}time
- defrag file --- do nothing
- umount & mount
- check {a,c,m}time
The root cause is that, as f2fs enables lazytime by default, an atime update dirties the VFS inode rather than the f2fs inode (i.e. FI_DIRTY_INODE is not set), so a later f2fs_write_inode() called from the VFS fails to update the inode page due to our skip:
f2fs_write_inode()
        if (!is_inode_flag_set(inode, FI_DIRTY_INODE))
                return 0;
So eventually, after evict(), we lose the last atime forever.
To fix this issue, we need to check whether {a,c,m,cr}time is consistent between the inode cache and the inode page, and only skip f2fs_update_inode() if the f2fs inode is not dirty and the times are consistent as well.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
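The fixed skip condition, skipping the inode page update only when the f2fs inode is not dirty and the cached timestamps match the on-disk copy, can be sketched as follows. struct times_sketch, ts_equal(), times_consistent() and can_skip_inode_update() are hypothetical and only model the {a,c,m,cr}time comparison described in this commit.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical timestamp set: one copy for the inode cache, one for the inode page. */
struct times_sketch {
    struct timespec atime, ctime, mtime, crtime;
};

static bool ts_equal(struct timespec a, struct timespec b)
{
    return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

static bool times_consistent(const struct times_sketch *mem, const struct times_sketch *disk)
{
    return ts_equal(mem->atime, disk->atime) && ts_equal(mem->ctime, disk->ctime) &&
           ts_equal(mem->mtime, disk->mtime) && ts_equal(mem->crtime, disk->crtime);
}

/* Skip writing the inode page only if the f2fs inode is clean AND times match. */
static bool can_skip_inode_update(bool f2fs_inode_dirty, const struct times_sketch *mem,
                                  const struct times_sketch *disk)
{
    return !f2fs_inode_dirty && times_consistent(mem, disk);
}
```

In the lazytime case from the report, only the in-memory atime moves, so the f2fs inode is clean but the times differ, and the update is no longer skipped.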
2019-09-21 | Merge tag 'f2fs-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs | Linus Torvalds
Pull f2fs updates from Jaegeuk Kim:
"In this round, we introduced casefolding support in f2fs, and fixed various bugs in individual features such as IO alignment, checkpoint=disable, quota, and swapfile.
Enhancement:
- support casefolding w/ enhancement in ext4
- support fiemap for directory
- support FS_IO_GET|SET_FSLABEL
Bug fix:
- fix IO stuck during checkpoint=disable
- avoid infinite GC loop
- fix panic/overflow related to IO alignment feature
- fix livelock in swap file
- fix discard command leak
- disallow dio for atomic_write"
* tag 'f2fs-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
  f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
  f2fs: fix to add missing F2FS_IO_ALIGNED() condition
  f2fs: fix to fallback to buffered IO in IO aligned mode
  f2fs: fix to handle error path correctly in f2fs_map_blocks
  f2fs: fix extent corrupotion during directIO in LFS mode
  f2fs: check all the data segments against all node ones
  f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
  f2fs: fix inode rwsem regression
  f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
  f2fs: avoid infinite GC loop due to stale atomic files
  f2fs: Fix indefinite loop in f2fs_gc()
  f2fs: convert inline_data in prior to i_size_write
  f2fs: fix error path of f2fs_convert_inline_page()
  f2fs: add missing documents of reserve_root/resuid/resgid
  f2fs: fix flushing node pages when checkpoint is disabled
  f2fs: enhance f2fs_is_checkpoint_ready()'s readability
  f2fs: clean up __bio_alloc()'s parameter
  f2fs: fix wrong error injection path in inc_valid_block_count()
  f2fs: fix to writeout dirty inode during node flush
  f2fs: optimize case-insensitive lookups
  ...
2019-09-16 | f2fs: fix to fallback to buffered IO in IO aligned mode | Chao Yu
In LFS mode, we allow OPU for direct IO; however, we didn't consider the IO alignment feature, so direct IO can trigger unaligned IO. Let's just fall back to buffered IO to keep correct IO alignment semantics in all places.
Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-06 | f2fs: fix wrong error injection path in inc_valid_block_count() | Chao Yu
If FAULT_BLOCK type error injection is on, inc_valid_block_count() may decrease the sbi->alloc_valid_block_count percpu stat count incorrectly; fix it.
Fixes: 36b877af7992 ("f2fs: Keep alloc_valid_block_count in sync")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-06 | f2fs: optimize case-insensitive lookups | Chao Yu
This patch ports the casefold enhancement patch below from ext4 to f2fs:
  commit 3ae72562ad91 ("ext4: optimize case-insensitive lookups")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23 | f2fs: support FS_IOC_{GET,SET}FSLABEL | Chao Yu
Support the two generic fs ioctls FS_IOC_{GET,SET}FSLABEL, letting f2fs pass the generic/492 testcase.
Fixes were made by Eric where:
- f2fs: fix buffer overruns in FS_IOC_{GET, SET}FSLABEL
  utf16s_to_utf8s() and utf8s_to_utf16s() take the number of characters, not the number of bytes.
- f2fs: fix copying too many bytes in FS_IOC_SETFSLABEL
  Userspace provides a null-terminated string, so don't assume that the full FSLABEL_MAX bytes can always be copied.
- f2fs: add missing authorization check in FS_IOC_SETFSLABEL
  FS_IOC_SETFSLABEL modifies the filesystem superblock, so it shouldn't be allowed to regular users. Require CAP_SYS_ADMIN, like xfs and btrfs do.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>