summaryrefslogtreecommitdiff
path: root/fs/f2fs/data.c
AgeCommit message (Collapse)Author
2016-02-22f2fs: avoid multiple node page writes due to inline_dataJaegeuk Kim
The sceanrio is: 1. create fully node blocks 2. flush node blocks 3. write inline_data for all the node blocks again 4. flush node blocks redundantly So, this patch tries to flush inline_data when flushing node blocks. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22f2fs: do f2fs_balance_fs when block is allocatedJaegeuk Kim
We should consider data block allocation to trigger f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22f2fs: use writepages->lock for WB_SYNC_ALLJaegeuk Kim
If there are many writepages calls by multiple threads in background, we don't need to serialize to merge all the bios, since it's background. In such the case, it'd better to run writepages concurrently. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22f2fs: remove needless condition checkJaegeuk Kim
This patch removes needless condition variable. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-02-22f2fs: relocate is_merged_pageChao Yu
Operations in is_merged_page is related to inner bio cache, move it to data.c. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-22wrappers for ->i_mutex accessAl Viro
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-11f2fs: detect idle time depending on user behaviorJaegeuk Kim
This patch adds last time that user requested filesystem operations. This information is used to detect whether system is idle or not later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08f2fs: recognize encrypted data in f2fs_fiemapChao Yu
This patch fixes to teach f2fs_fiemap to recognize encrypted data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08f2fs: clean up f2fs_balance_fsJaegeuk Kim
This patch adds one parameter to clean up all the callers of f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08f2fs: avoid unnecessary f2fs_balance_fs callsJaegeuk Kim
Only when node page is newly dirtied, it needs to check whether we need to do f2fs_gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-08f2fs: check the page status filled from diskJaegeuk Kim
After reading a page, we need to check whether there is any error. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-06f2fs: read isize while holding i_mutex in fiemapFan Li
make sure the isize we read doesn't change during the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-01-03f2fs: introduce max_file_blocks in sbiChao Yu
Introduce max_file_blocks in sbi to store max block index of file in f2fs, it could be used to avoid unneeded calculation of max block index in runtime. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: fix overflow of sbi->max_file_blocks] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-31f2fs: write pending bios when cp_error is setJaegeuk Kim
When testing ioc_shutdown, put_super is able to be hanged by waiting for writebacking pages as follows. INFO: task umount:2723 blocked for more than 120 seconds. Tainted: G O 4.4.0-rc3+ #8 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D ffff88000859f9d8 0 2723 2110 0x00000000 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c Call Trace: [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8182310c>] schedule+0x3c/0x90 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30 [<ffffffff8111617c>] ? ktime_get+0xac/0x140 [<ffffffff818239f0>] ? bit_wait+0x50/0x50 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110 [<ffffffff81823a25>] bit_wait_io+0x35/0x50 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs] [<ffffffff812639bc>] evict+0xbc/0x190 [<ffffffff81263d19>] iput+0x229/0x2c0 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs] [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0 [<ffffffff812478f7>] kill_block_super+0x27/0x70 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs] [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20 [<ffffffff810ac463>] task_work_run+0x73/0xa0 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: use i_size_read to get i_sizeJaegeuk Kim
We need to use i_size_read() to get inode->i_size. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: add a max block check for get_data_block_bmapYunlei He
This patch adds a max block check for get_data_block_bmap. Trinity test program will send a block number as parameter into ioctl_fibmap, which will be used in get_node_path(), when the block number large than f2fs max blocks, it will trigger kernel bug. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> [Jaegeuk Kim: fix missing condition, pointed by Chao Yu] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: fix bugs and simplify codes of f2fs_fiemapFan Li
fix bugs: 1. len could be updated incorrectly when start+len is beyond isize. 2. If there is a hole consisting of more than two blocks, it could fail to add FIEMAP_EXTENT_LAST flag for the last extent. 3. If there is an extent beyond isize, when we search extents in a range that ends at isize, it will also return the extent beyond isize, which is outside the range. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: let user being aware of IO errorChao Yu
Sometimes we keep dumb when IO error occur in lower layer device, so user will not receive any error return value for some operation, but actually, the operation did not succeed. This sould be avoided, so this patch reports such kind of error to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: avoid f2fs_lock_op in f2fs_write_beginJaegeuk Kim
If f2fs_write_begin is to update data, we can bypass calling f2fs_lock_op() in order to avoid the checkpoint latency in the write syscall. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: introduce prepare_write_begin to clean upJaegeuk Kim
This patch adds prepare_write_begin to clean f2fs_write_begin. The major role of this function is to convert any inline_data and allocate or find block address. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: call f2fs_balance_fs only when node was changedJaegeuk Kim
If user tries to update or read data, we don't need to call f2fs_balance_fs which triggers f2fs_gc, which increases unnecessary long latency. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: reduce covered region of sbi->cp_rwsem in f2fs_map_blocksChao Yu
Only cover sbi->cp_rwsem on one dnode page's allocation and modification instead of multiple's in f2fs_map_blocks, it can reduce the covered region of cp_rwsem, then we can avoid potential long time delay for concurrent checkpointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: record node block allocation in dnode_of_dataJaegeuk Kim
This patch introduces recording node block allocation in dnode_of_data. This information helps to figure out whether any node block is allocated during specific file operations. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-30f2fs: check inline_data flag at converting timeJaegeuk Kim
We can check inode's inline_data flag when calling to convert it. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-17f2fs: optimize the flow of f2fs_map_blocksFan Li
check map->m_len right after it changes to avoid excess call to update dnode_of_data. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-16f2fs: record dirty status of regular/symlink inodeChao Yu
Maintain regular/symlink inode which has dirty pages in global dirty list and record their total dirty pages count like the way of handling directory inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04f2fs: kill f2fs_drop_largest_extentChao Yu
For direct IO, f2fs only allocate new address for the block which is not exist in the disk before, its mapping info should not exist in extent cache previously, so here we do not need to call f2fs_drop_largest_extent to drop related cache. Due to no more callers for f2fs_drop_largest_extent now, kill it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04f2fs: fix to remove directory inode from dirty listChao Yu
If last dirty dentry page was writebacked in reclaim path, we should remove its directory inode from global dirty list to avoid unnecessary flush for this inode when doing checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04f2fs: support file defragmentChao Yu
This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file defragment in a specified range of regular file. This ioctl can be used in very limited workload: if user expects high sequential read performance in randomly written file, this interface can be used for defragmentation, after that file can be written as continuous as possible in the device. Meanwhile, it has side-effect, it will make holes in segments where blocks located originally, so it's better to trigger GC to eliminate fragment in segments. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-12-04f2fs: commit atomic written page in LFS modeChao Yu
We should always commit atomic written pages in LFS mode, otherwise data will become corrupted if we encounter suddent power cut after partial pages committed in IPU mode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-20f2fs: support fiemap for inline_dataJaegeuk Kim
There is a FIEMAP_EXTENT_INLINE_DATA, pointed out by Marc. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-20f2fs: flush dirty data for bmapJaegeuk Kim
Users expect bmap will give allocated block addresses. Let's play likewise ext4. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-13f2fs crypto: fix racing of accessing encrypted page amongChao Yu
different competitors Since we use different page cache (normally inode's page cache for R/W and meta inode's page cache for GC) to cache the same physical block which is belong to an encrypted inode. Writeback of these two page cache should be exclusive, but now we didn't handle writeback state well, so there may be potential racing problem: a) kworker: f2fs_gc: - f2fs_write_data_pages - f2fs_write_data_page - do_write_data_page - write_data_page - f2fs_submit_page_mbio (page#1 in inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) - gc_data_segment - move_encrypted_block - pagecache_get_page (page#2 in meta inode's page cache was cached with the invalid datas of physical block located in new blkaddr) - f2fs_submit_page_mbio (page#1 was submitted, later, page#2 with invalid data will be submitted) b) f2fs_gc: - gc_data_segment - move_encrypted_block - f2fs_submit_page_mbio (page#1 in meta inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) user thread: - f2fs_write_begin - f2fs_submit_page_bio (we submit the request to block layer to update page#2 in inode's page cache with physical block located in new blkaddr, so here we may read gabbage data from new blkaddr since GC hasn't writebacked the page#1 yet) This patch fixes above potential racing problem for encrypted inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12f2fs: add a tracepoint for f2fs_read_data_pagesChao Yu
This patch adds a tracepoint for f2fs_read_data_pages to trace when pages are readahead by VFS. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12f2fs: set GFP_NOFS for grab_cache_pageJaegeuk Kim
For normal inodes, their pages are allocated with __GFP_FS, which can cause filesystem calls when reclaiming memory. This can incur a dead lock condition accordingly. So, this patch addresses this problem by introducing f2fs_grab_cache_page(.., bool for_write), which calls grab_cache_page_write_begin() with AOP_FLAG_NOFS. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12Revert "f2fs: do not skip dentry block writes"Jaegeuk Kim
The periodic checkpoint can resolve the previous issue. So, now we can use this again to improve the reported performance regression: https://lkml.org/lkml/2015/10/8/20 This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.
2015-10-09f2fs: do not skip dentry block writesJaegeuk Kim
Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory pressure and the number of dirty pages is pretty small. But, we didn't skip for normal data writes, which gives us not much big impact on overall performance. Moreover, by skipping some data writes, kworker falls into infinite loop to try to write blocks, when many dir inodes have only one dentry block. So, this patch removes skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09f2fs: use correct flag in f2fs_map_blocks()Chao Yu
We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc8865 ("f2fs: fix incorrect mapping for bmap"), but forget to use this flag in the right place, fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09f2fs: fix to handle io error in ->direct_IOChao Yu
Here is a oops reported as following message when testing generic/019 of xfstest: ------------[ cut here ]------------ kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882! invalid opcode: 0000 [#1] SMP Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_def CPU: 2 PID: 25441 Comm: fio Tainted: G O 4.3.0-rc1+ #6 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000 RIP: 0010:[<ffffffffa0784981>] [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP: 0018:ffff8803fd61f918 EFLAGS: 00010246 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001 FS: 00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0 Stack: 000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b 0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000 ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358 Call Trace: [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs] [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs] [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs] [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs] [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160 [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0 [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200 [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20 [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs] [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs] [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290 [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50 [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0 [<ffffffff8120b913>] do_io_submit+0x373/0x630 [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0 [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20 [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb RIP [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP <ffff8803fd61f918> ---[ end trace 2e577d7f711ddb86 ]--- The reason is that: in the test of generic/019, we will trigger a manmade IO error in block layer through debugfs, after that, prefree segment will no longer be freed, because we always skip doing gc or checkpoint when there occurs an IO error. Meanwhile fio with aio engine generated a large number of direct IOs, which continue allocating spaces in free segment until we run out of them, eventually, results in panic in new_curseg as no more free segment was found. So, this patch changes to return EIO in direct_IO for this condition. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09f2fs: reorganize f2fs_map_blocksChao Yu
In this patch, we try to reorganize f2fs_map_blocks to make block mapping flow more clear by using following structure: /* check status of mapping */ if (unmapped) { /* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */ if (create) { /* write path, handle dio write case here */ alloc_and_map; } else { /* * handle read cases from all call paths: * 1. generic read; * 2. dio read; * 3. fiemap; * 4. bmap */ } } /* map buffer_header */ Besides, this patch handles the missing case correctly for dio write: When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will not allocate blocks correctly for preallocated blocks, but returning with an unmapped buffer head, which will result in failure of dio write. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09f2fs: fix overflow of size calculationChao Yu
We have potential overflow issue when calculating size of object, when we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only 32-bits space in 32-bit architecture, left shifting will incur overflow, i.e: pgoff_t index = 0xFFFFFFFF; loff_t size = index << PAGE_CACHE_SHIFT; size: 0xFFFFF000 So we should cast index with 64-bits type to avoid this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-09-03Merge tag 'for-f2fs-4.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The major work includes fixing and enhancing the existing extent_cache feature, which has been well settling down so far and now it becomes a default mount option accordingly. Also, this version newly registers a f2fs memory shrinker to reclaim several objects consumed by a couple of data structures in order to avoid memory pressures. Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which triggers a cleaning job explicitly by users. Most of the other patches are to fix bugs occurred in the corner cases across the whole code area" * tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits) f2fs: upset segment_info repair f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent f2fs: update extent tree in batches f2fs: fix to release inode correctly f2fs: handle f2fs_truncate error correctly f2fs: avoid unneeded initializing when converting inline dentry f2fs: atomically set inode->i_flags f2fs: fix wrong pointer access during try_to_free_nids f2fs: use __GFP_NOFAIL to avoid infinite loop f2fs: lookup neighbor extent nodes for merging later f2fs: split __insert_extent_tree_ret for readability f2fs: kill dead code in __insert_extent_tree f2fs: adjust showing of extent cache stat f2fs: add largest/cached stat in extent cache f2fs: fix incorrect mapping for bmap f2fs: add annotation for space utilization of regular/inline dentry f2fs: fix to update cached_en of extent tree properly f2fs: fix typo f2fs: check the node block address of newly allocated nid f2fs: go out for insert_inode_locked failure ...
2015-09-02Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "This first core part of the block IO changes contains: - Cleanup of the bio IO error signaling from Christoph. We used to rely on the uptodate bit and passing around of an error, now we store the error in the bio itself. - Improvement of the above from myself, by shrinking the bio size down again to fit in two cachelines on x86-64. - Revert of the max_hw_sectors cap removal from a revision again, from Jeff Moyer. This caused performance regressions in various tests. Reinstate the limit, bump it to a more reasonable size instead. - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me. Most devices have huge trim limits, which can cause nasty latencies when deleting files. Enable the admin to configure the size down. We will look into having a more sane default instead of UINT_MAX sectors. - Improvement of the SGP gaps logic from Keith Busch. - Enable the block core to handle arbitrarily sized bios, which enables a nice simplification of bio_add_page() (which is an IO hot path). From Kent. - Improvements to the partition io stats accounting, making it faster. From Ming Lei. - Also from Ming Lei, a basic fixup for overflow of the sysfs pending file in blk-mq, as well as a fix for a blk-mq timeout race condition. - Ming Lin has been carrying Kents above mentioned patches forward for a while, and testing them. Ming also did a few fixes around that. - Sasha Levin found and fixed a use-after-free problem introduced by the bio->bi_error changes from Christoph. - Small blk cgroup cleanup from Viresh Kumar" * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits) blk: Fix bio_io_vec index when checking bvec gaps block: Replace SG_GAPS with new queue limits mask block: bump BLK_DEF_MAX_SECTORS to 2560 Revert "block: remove artifical max_hw_sectors cap" blk-mq: fix race between timeout and freeing request blk-mq: fix buffer overflow when reading sysfs file of 'pending' Documentation: update notes in biovecs about arbitrarily sized bios block: remove bio_get_nr_vecs() fs: use helper bio_add_page() instead of open coding on bi_io_vec block: kill merge_bvec_fn() completely md/raid5: get rid of bio_fits_rdev() md/raid5: split bio for chunk_aligned_read block: remove split code in blkdev_issue_{discard,write_same} btrfs: remove bio splitting and merge_bvec_fn() calls bcache: remove driver private bio splitting code block: simplify bio_add_page() block: make generic_make_request handle arbitrarily sized bios blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL) block: don't access bio->bi_error after bio_put() block: shrink struct bio down to 2 cache lines again ...
2015-08-21f2fs: fix incorrect mapping for bmapChao Yu
The test step is like below: 1. touch file 2. truncate -s $((1024*1024)) file 3. fallocate -o 0 -l $((1024*1024)) file 4. fibmap.f2fs file Our result of fibmap.f2fs showed below is not correct: file_pos start_blk end_blk blks 0 -937166132 -937166132 1 4096 -937166132 -937166132 1 8192 -937166132 -937166132 1 12288 -937166132 -937166132 1 16384 -937166132 -937166132 1 20480 -937166132 -937166132 1 ... 1040384 -937166132 -937166132 1 1044480 -937166132 -937166132 1 This is because f2fs_map_blocks will return with no error when meeting a hole or preallocated block, the caller __get_data_block will map the uninitialized variable value to bh->b_blocknr. Unfortunately generic_block_bmap will neither check the return value of get_data() nor check mapping info of buffer_head, result in returning the random block address. After fixing the issue, our result shows correctly: file_pos start_blk end_blk blks 0 0 0 256 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20f2fs: handle failed bio allocationJaegeuk Kim
As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the same time. So, we can't guarantee that bio is allocated all the time. " * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be * able to allocate a bio. This is due to the mempool guarantees. To make this * work, callers must never allocate more than 1 bio at a time from this pool. * Callers that need to allocate more than 1 bio must always submit the * previously allocated bio for IO before attempting to allocate a new one. * Failure to do so can cause deadlocks under memory pressure. " Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-13block: remove bio_get_nr_vecs()Kent Overstreet
We can always fill up the bio now, no need to estimate the possible size based on queue parameters. Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [hch: rebased and wrote a changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-11f2fs: remove inmem radix treeChao Yu
Previously, we use radix tree to index all registered page entries for atomic file, but now we only use radix tree to see whether current page is indexed or not, since the other user of radix tree is gone in commit 042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try to use one more efficient way: Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private value to indicate page indexing status. By using this way, we can save memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-11f2fs: report EINVAL for unalignment direct IOChao Yu
We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in detail is as fallow: dio04 <<<test_start>>> tag=dio04 stime=1432278894 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write diotest4 5 TPASS : Read beyond the file size ...... the result of ext4 with same environment: dio04 <<<test_start>>> tag=dio04 stime=1432259643 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TPASS : Odd count of read and write diotest4 4 TPASS : Read beyond the file size ...... The reason is that when triggering DIO in f2fs, we will return zero value in ->direct_IO if writer's buffer offset, file offset and transfer size is not alignment to block size of filesystem, resulting in falling back into buffered write instead of returning -EINVAL. This patch fixes that problem by returning correct error number for above case, and removing the judgement condition in check_direct_IO to make sure the verification will be enabled for direct reader too. Besides, Jaegeuk Kim pointed out that there is expectional cases we should always make direct-io falling back into buffered write, such as dio in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Chao Yu make small change and add detail description in commit message] Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05f2fs: use extent cache to optimize f2fs_reserve_blockFan Li
In some cases, we only need the block address when we call f2fs_reserve_block, other fields of struct dnode_of_data aren't necessary. We can try extent cache first for such cases in order to speed up the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05f2fs: fix to release inode page correctlyChao Yu
In following call path, we will pass a locked and referenced ipage pointer to get_new_data_page: - init_inode_metadata - make_empty_dir - get_new_data_page There are two exit paths in get_new_data_page when error occurs: 1) grab_cache_page fails, ipage will not be released; 2) f2fs_reserve_block fails, ipage will be released in callee. So, it's not consistent for error handling in get_new_data_page. For f2fs_reserve_block, it's not very easy to change the rule of error handling, since it's already complicated. Here we deside to choose an easy way to fix this issue: If any error occur in get_new_data_page, we will ensure releasing ipage in this function. The same issue is in f2fs_convert_inline_dir, fix that too. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>