Age | Commit message (Collapse) | Author |
|
This avoids -EINVAL when trying to freeze f2fs.
Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This is simpler, and as a side effect it replaces several uses of
kmap_atomic() with its recommended replacement kmap_local_page().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
ATGC is using SSR which violates LFS mode used by zoned device.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we mainly fixed some corner cases that manipulate a
per-file compression flag inappropriately. And, we found f2fs counted
valid blocks in a section incorrectly when zone capacity is set, and
thus, fixed it with additional sysfs entry to check it easily.
Lastly, this series includes several patches with respect to the new
atomic write support such as a couple of bug fixes and re-adding
atomic_write_abort support that we removed by mistake in the previous
release.
Enhancements:
- add sysfs entries to understand atomic write operations and zone
capacity
- introduce memory mode to get a hint for low-memory devices
- adjust the waiting time of foreground GC
- decompress clusters under softirq to avoid non-deterministic
latency
- do not skip updating inode when retrying to flush node page
- enforce single zone capacity
Bug fixes:
- set the compression/no-compression flags correctly
- revive F2FS_IOC_ABORT_VOLATILE_WRITE
- check inline_data during compressed inode conversion
- understand zone capacity when calculating valid block count
As usual, the series includes several minor clean-ups and sanity
checks"
* tag 'f2fs-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits)
f2fs: use onstack pages instead of pvec
f2fs: intorduce f2fs_all_cluster_page_ready
f2fs: clean up f2fs_abort_atomic_write()
f2fs: handle decompress only post processing in softirq
f2fs: do not allow to decompress files have FI_COMPRESS_RELEASED
f2fs: do not set compression bit if kernel doesn't support
f2fs: remove device type check for direct IO
f2fs: fix null-ptr-deref in f2fs_get_dnode_of_data
f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITE
f2fs: fix to do sanity check on segment type in build_sit_entries()
f2fs: obsolete unused MAX_DISCARD_BLOCKS
f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()
f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time
f2fs: introduce sysfs atomic write statistics
f2fs: don't bother wait_ms by foreground gc
f2fs: invalidate meta pages only for post_read required inode
f2fs: allow compression of files without blocks
f2fs: fix to check inline_data during compressed inode conversion
f2fs: Delete f2fs_copy_page() and replace with memcpy_page()
f2fs: fix to invalidate META_MAPPING before DIO write
...
|
|
f2fs_abort_atomic_write() has checked whether current inode is
atomic_write one or not, it's redundant to check in its caller,
remove it for cleanup.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
introduce the below 4 new sysfs node for atomic write statistics.
- current_atomic_write: the total current atomic write block count,
which is not committed yet.
- peak_atomic_write: the peak value of total current atomic write block
count after boot.
- committed_atomic_block: the accumulated total committed atomic write
block count after boot.
- revoked_atomic_block: the accumulated total revoked atomic write block
count after boot.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In order to simplify the complicated per-zone capacity, let's support
only one capacity for entire zoned device.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Introduce memory mode to supports "normal" and "low" memory modes.
"low" mode is to support low memory devices. Because of the nature of
low memory devices, in this mode, f2fs will try to save memory sometimes
by sacrificing performance. "normal" mode is the default mode and same
as before.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.
This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.
In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.
The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.
After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40
[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Switch f2fs over to the functions that are replacing
fscrypt_set_test_dummy_encryption(). Since f2fs hasn't been converted
to the new mount API yet, this doesn't really provide a benefit for
f2fs. But it allows fscrypt_set_test_dummy_encryption() to be removed.
Also take the opportunity to eliminate an #ifdef.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've refactored the existing atomic write support
implemented by in-memory operations to have storing data in disk
temporarily, which can give us a benefit to accept more atomic writes.
At the same time, we removed the existing volatile write support.
We've also revisited the file pinning and GC flows and found some
corner cases which contributeed abnormal system behaviours.
As usual, there're several minor code refactoring for readability,
sanity check, and clean ups.
Enhancements:
- allow compression for mmap files in compress_mode=user
- kill volatile write support
- change the current atomic write way
- give priority to select unpinned section for foreground GC
- introduce data read/write showing path info
- remove unnecessary f2fs_lock_op in f2fs_new_inode
Bug fixes:
- fix the file pinning flow during checkpoint=disable and GCs
- fix foreground and background GCs to select the right victims and
get free sections on time
- fix GC flags on defragmenting pages
- avoid an infinite loop to flush node pages
- fix fallocate to use file_modified to update permissions
consistently"
* tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: fix to tag gcing flag on page during file defragment
f2fs: replace F2FS_I(inode) and sbi by the local variable
f2fs: add f2fs_init_write_merge_io function
f2fs: avoid unneeded error handling for revoke_entry_slab allocation
f2fs: allow compression for mmap files in compress_mode=user
f2fs: fix typo in comment
f2fs: make f2fs_read_inline_data() more readable
f2fs: fix to do sanity check for inline inode
f2fs: fix fallocate to use file_modified to update permissions consistently
f2fs: don't use casefolded comparison for "." and ".."
f2fs: do not stop GC when requiring a free section
f2fs: keep wait_ms if EAGAIN happens
f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters
f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION
f2fs: kill volatile write support
f2fs: change the current atomic write way
f2fs: don't need inode lock for system hidden quota
f2fs: stop allocating pinned sections if EAGAIN happens
f2fs: skip GC if possible when checkpoint disabling
f2fs: give priority to select unpinned section for foreground GC
...
|
|
Almost all other initialization of variables in f2fs_fill_super are
extraced to a single function. Also do it for write_io[], which can
make code more clean.
This patch just refactors the code, theres no functional change.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The f2fs_gc uses a bitmap to indicate pinned sections, but when disabling
chckpoint, we call f2fs_gc() with NULL_SEGNO which selects the same dirty
segment as a victim all the time, resulting in checkpoint=disable failure,
for example. Let's pick another one, if we fail to collect it.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
No functional change.
- remove checkpoint=disable check for f2fs_write_checkpoint
- get sec_freed all the time
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There is no good reason to allow this mount option when the kernel isn't
configured with encryption support. Since this option is only for
testing, we can just fix this; we don't really need to worry about
breaking anyone who might be counting on this option being ignored.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Current atomic write has three major issues like below.
- keeps the updates in non-reclaimable memory space and they are even
hard to be migrated, which is not good for contiguous memory
allocation.
- disk spaces used for atomic files cannot be garbage collected, so
this makes it difficult for the filesystem to be defragmented.
- If atomic write operations hit the threshold of either memory usage
or garbage collection failure count, All the atomic write operations
will fail immediately.
To resolve the issues, I will keep a COW inode internally for all the
updates to be flushed from memory, when we need to flush them out in a
situation like high memory pressure. These COW inodes will be tagged
as orphan inodes to be reclaimed in case of sudden power-cut or system
failure during atomic writes.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's avoid false-alarmed lockdep warning.
[ 58.914674] [T1501146] -> #2 (&sb->s_type->i_mutex_key#20){+.+.}-{3:3}:
[ 58.915975] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.916738] [T1501146] system_server: f2fs_quota_sync+0x60/0x1a8
[ 58.917563] [T1501146] system_server: block_operations+0x16c/0x43c
[ 58.918410] [T1501146] system_server: f2fs_write_checkpoint+0x114/0x318
[ 58.919312] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c
[ 58.920214] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.920999] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738
[ 58.921862] [T1501146] system_server: f2fs_sync_file+0x30/0x48
[ 58.922667] [T1501146] system_server: __arm64_sys_fsync+0x84/0xf8
[ 58.923506] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
[ 58.924604] [T1501146] system_server: do_el0_svc+0x28/0xa0
[ 58.925366] [T1501146] system_server: el0_svc+0x24/0x38
[ 58.926094] [T1501146] system_server: el0_sync_handler+0x88/0xec
[ 58.926920] [T1501146] system_server: el0_sync+0x1b4/0x1c0
[ 58.927681] [T1501146] -> #1 (&sbi->cp_global_sem){+.+.}-{3:3}:
[ 58.928889] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.929650] [T1501146] system_server: f2fs_write_checkpoint+0xbc/0x318
[ 58.930541] [T1501146] system_server: f2fs_issue_checkpoint+0x178/0x21c
[ 58.931443] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.932226] [T1501146] system_server: sync_filesystem+0xac/0x130
[ 58.933053] [T1501146] system_server: generic_shutdown_super+0x38/0x150
[ 58.933958] [T1501146] system_server: kill_block_super+0x24/0x58
[ 58.934791] [T1501146] system_server: kill_f2fs_super+0xcc/0x124
[ 58.935618] [T1501146] system_server: deactivate_locked_super+0x90/0x120
[ 58.936529] [T1501146] system_server: deactivate_super+0x74/0xac
[ 58.937356] [T1501146] system_server: cleanup_mnt+0x128/0x168
[ 58.938150] [T1501146] system_server: __cleanup_mnt+0x18/0x28
[ 58.938944] [T1501146] system_server: task_work_run+0xb8/0x14c
[ 58.939749] [T1501146] system_server: do_notify_resume+0x114/0x1e8
[ 58.940595] [T1501146] system_server: work_pending+0xc/0x5f0
[ 58.941375] [T1501146] -> #0 (&sbi->gc_lock){+.+.}-{3:3}:
[ 58.942519] [T1501146] system_server: __lock_acquire+0x1270/0x2868
[ 58.943366] [T1501146] system_server: lock_acquire+0x114/0x294
[ 58.944169] [T1501146] system_server: down_write+0x7c/0xe0
[ 58.944930] [T1501146] system_server: f2fs_issue_checkpoint+0x13c/0x21c
[ 58.945831] [T1501146] system_server: f2fs_sync_fs+0x48/0x6c
[ 58.946614] [T1501146] system_server: f2fs_do_sync_file+0x334/0x738
[ 58.947472] [T1501146] system_server: f2fs_ioc_commit_atomic_write+0xc8/0x14c
[ 58.948439] [T1501146] system_server: __f2fs_ioctl+0x674/0x154c
[ 58.949253] [T1501146] system_server: f2fs_ioctl+0x54/0x88
[ 58.950018] [T1501146] system_server: __arm64_sys_ioctl+0xa8/0x110
[ 58.950865] [T1501146] system_server: el0_svc_common.llvm.12821150825140585682+0xd8/0x20c
[ 58.951965] [T1501146] system_server: do_el0_svc+0x28/0xa0
[ 58.952727] [T1501146] system_server: el0_svc+0x24/0x38
[ 58.953454] [T1501146] system_server: el0_sync_handler+0x88/0xec
[ 58.954279] [T1501146] system_server: el0_sync+0x1b4/0x1c0
Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If the number of unusable blocks is not larger than
unusable capacity, we can skip GC when checkpoint
disabling.
Signed-off-by: Weichao Guo <guoweichao@oppo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: Fix missing gc_mode assignment]
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There are no more aop flags left, so remove the parameter.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
F2FS zoned support has power of 2 zone size assumption in many places
such as in __f2fs_issue_discard_zone, init_blkz_info. As the power of 2
requirement has been removed from the block layer, explicitly add a
condition in f2fs to allow only power of 2 zone size devices.
This condition will be relaxed once those calculation based on power of
2 is made generic.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Instead of calling bdev_zone_sectors() multiple times, call
it once and cache the value locally. This will make the
subsequent change easier to read.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
There are multiple calculations and reads of fields of sbi that should
be protected by stat_lock. As stat_lock is not used to read these
values in statfs, this can lead to inconsistent results.
Extend the locking to prevent this issue.
Commit c9c8ed50d94c ("f2fs: fix to avoid potential race on
sbi->unusable_block_count access/update")
already added the use of sbi->stat_lock in statfs in
order to make the calculation of multiple, different fields atomic so
that results are consistent. This is similar to that patch regarding the
change in statfs.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch removes obsolete whint_mode.
Fixes: 41d36a9f3e53 ("fs: remove kiocb.ki_hint")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Merge updates from Andrew Morton:
- A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs
- Most the MM patches which precede the patches in Willy's tree: kasan,
pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
Docs/ABI/testing: add DAMON sysfs interface ABI document
Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
selftests/damon: add a test for DAMON sysfs interface
mm/damon/sysfs: support DAMOS stats
mm/damon/sysfs: support DAMOS watermarks
mm/damon/sysfs: support schemes prioritization
mm/damon/sysfs: support DAMOS quotas
mm/damon/sysfs: support DAMON-based Operation Schemes
mm/damon/sysfs: support the physical address space monitoring
mm/damon/sysfs: link DAMON for virtual address spaces monitoring
mm/damon: implement a minimal stub for sysfs-based DAMON interface
mm/damon/core: add number of each enum type values
mm/damon/core: allow non-exclusive DAMON start/stop
Docs/damon: update outdated term 'regions update interval'
Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
Docs/vm/damon: call low level monitoring primitives the operations
mm/damon: remove unnecessary CONFIG_DAMON option
mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
mm/damon/dbgfs-test: fix is_target_id() change
...
|
|
The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As congestion is no longer tracked, congestion_wait() is effectively
equivalent to io_schedule_timeout().
So introduce f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE
and call that instead.
Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, f2fs has some performance improvements for Android
workloads such as using read-unfair rwsems and adding some sysfs
entries to control GCs and discard commands in more details. In
addtiion, it has some tunings to improve the recovery speed after
sudden power-cut.
Enhancement:
- add reader-unfair rwsems with F2FS_UNFAIR_RWSEM: will replace with
generic API support
- adjust to make the readahead/recovery flow more efficiently
- sysfs entries to control issue speeds of GCs and Discard commands
- enable idmapped mounts
Bug fix:
- correct wrong error handling routines
- fix missing conditions in quota
- fix a potential deadlock between writeback and block plug routines
- fix a deadlock btween freezefs and evict_inode
We've added some boundary checks to avoid kernel panics on corrupted
images, and several minor code clean-ups"
* tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (27 commits)
f2fs: fix to do sanity check on .cp_pack_total_block_count
f2fs: make gc_urgent and gc_segment_mode sysfs node readable
f2fs: use aggressive GC policy during f2fs_disable_checkpoint()
f2fs: fix compressed file start atomic write may cause data corruption
f2fs: initialize sbi->gc_mode explicitly
f2fs: introduce gc_urgent_mid mode
f2fs: compress: fix to print raw data size in error path of lz4 decompression
f2fs: remove redundant parameter judgment
f2fs: use spin_lock to avoid hang
f2fs: don't get FREEZE lock in f2fs_evict_inode in frozen fs
f2fs: remove unnecessary read for F2FS_FITS_IN_INODE
f2fs: introduce F2FS_UNFAIR_RWSEM to support unfair rwsem
f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes
f2fs: fix to do sanity check on curseg->alloc_type
f2fs: fix to avoid potential deadlock
f2fs: quota: fix loop condition at f2fs_quota_sync()
f2fs: Restore rwsem lockdep support
f2fs: fix missing free nid in f2fs_handle_failed_inode
f2fs: support idmapped mounts
f2fs: add a way to limit roll forward recovery time
...
|
|
Let's enable GC_URGENT_HIGH mode during f2fs_disable_checkpoint(),
so that we can use SSR allocator for GCed data/node persistence,
it can improve the performance due to it avoiding migration of
data/node locates in selected target segment of SSR allocator.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It needs to initialized sbi->gc_mode to GC_NORMAL explicitly.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's purge inode cache in order to avoid the below deadlock.
[freeze test] shrinkder
freeze_super
- pwercpu_down_write(SB_FREEZE_FS)
- super_cache_scan
- down_read(&sb->s_umount)
- prune_icache_sb
- dispose_list
- evict
- f2fs_evict_inode
thaw_super
- down_write(&sb->s_umount);
- __percpu_down_read(SB_FREEZE_FS)
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
cnt should be passed to sb_has_quota_active() instead of type to check
active quota properly.
Moreover, when the type is -1, the compiler with enough inline knowledge
can discard sb_has_quota_active() check altogether, causing a NULL pointer
dereference at the following inode_lock(dqopt->files[cnt]):
[ 2.796010] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[ 2.796024] Mem abort info:
[ 2.796025] ESR = 0x96000005
[ 2.796028] EC = 0x25: DABT (current EL), IL = 32 bits
[ 2.796029] SET = 0, FnV = 0
[ 2.796031] EA = 0, S1PTW = 0
[ 2.796032] Data abort info:
[ 2.796034] ISV = 0, ISS = 0x00000005
[ 2.796035] CM = 0, WnR = 0
[ 2.796046] user pgtable: 4k pages, 39-bit VAs, pgdp=00000003370d1000
[ 2.796048] [00000000000000a0] pgd=0000000000000000, pud=0000000000000000
[ 2.796051] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 2.796056] CPU: 7 PID: 640 Comm: f2fs_ckpt-259:7 Tainted: G S 5.4.179-arter97-r8-64666-g2f16e087f9d8 #1
[ 2.796057] Hardware name: Qualcomm Technologies, Inc. Lahaina MTP lemonadep (DT)
[ 2.796059] pstate: 80c00005 (Nzcv daif +PAN +UAO)
[ 2.796065] pc : down_write+0x28/0x70
[ 2.796070] lr : f2fs_quota_sync+0x100/0x294
[ 2.796071] sp : ffffffa3f48ffc30
[ 2.796073] x29: ffffffa3f48ffc30 x28: 0000000000000000
[ 2.796075] x27: ffffffa3f6d718b8 x26: ffffffa415fe9d80
[ 2.796077] x25: ffffffa3f7290048 x24: 0000000000000001
[ 2.796078] x23: 0000000000000000 x22: ffffffa3f7290000
[ 2.796080] x21: ffffffa3f72904a0 x20: ffffffa3f7290110
[ 2.796081] x19: ffffffa3f77a9800 x18: ffffffc020aae038
[ 2.796083] x17: ffffffa40e38e040 x16: ffffffa40e38e6d0
[ 2.796085] x15: ffffffa40e38e6cc x14: ffffffa40e38e6d0
[ 2.796086] x13: 00000000000004f6 x12: 00162c44ff493000
[ 2.796088] x11: 0000000000000400 x10: ffffffa40e38c948
[ 2.796090] x9 : 0000000000000000 x8 : 00000000000000a0
[ 2.796091] x7 : 0000000000000000 x6 : 0000d1060f00002a
[ 2.796093] x5 : ffffffa3f48ff718 x4 : 000000000000000d
[ 2.796094] x3 : 00000000060c0000 x2 : 0000000000000001
[ 2.796096] x1 : 0000000000000000 x0 : 00000000000000a0
[ 2.796098] Call trace:
[ 2.796100] down_write+0x28/0x70
[ 2.796102] f2fs_quota_sync+0x100/0x294
[ 2.796104] block_operations+0x120/0x204
[ 2.796106] f2fs_write_checkpoint+0x11c/0x520
[ 2.796107] __checkpoint_and_complete_reqs+0x7c/0xd34
[ 2.796109] issue_checkpoint_thread+0x6c/0xb8
[ 2.796112] kthread+0x138/0x414
[ 2.796114] ret_from_fork+0x10/0x18
[ 2.796117] Code: aa0803e0 aa1f03e1 52800022 aa0103e9 (c8e97d02)
[ 2.796120] ---[ end trace 96e942e8eb6a0b53 ]---
[ 2.800116] Kernel panic - not syncing: Fatal exception
[ 2.800120] SMP: stopping secondary CPUs
Fixes: 9de71ede81e6 ("f2fs: quota: fix potential deadlock")
Cc: <stable@vger.kernel.org> # v5.15+
Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch enables idmapped mounts for f2fs, since all dedicated helpers
for this functionality existsm, so, in this patch we just pass down the
user_namespace argument from the VFS methods to the relevant helpers.
Simple idmap example on f2fs image:
1. truncate -s 128M f2fs.img
2. mkfs.f2fs f2fs.img
3. mount f2fs.img /mnt/f2fs/
4. touch /mnt/f2fs/file
5. ls -ln /mnt/f2fs/
total 0
-rw-r--r-- 1 0 0 0 2月 4 13:17 file
6. ./mount-idmapped --map-mount b:0:1001:1 /mnt/f2fs/ /mnt/scratch_f2fs/
7. ls -ln /mnt/scratch_f2fs/
total 0
-rw-r--r-- 1 1001 1001 0 2月 4 13:17 file
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This adds a sysfs entry to call checkpoint during fsync() in order to avoid
long elapsed time to run roll-forward recovery when booting the device.
Default value doesn't enforce the limitation which is same as before.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Once F2FS_IPU_FORCE policy is enabled in some cases:
a) f2fs forces to use F2FS_IPU_FORCE in a small-sized volume
b) user sets F2FS_IPU_FORCE policy via sysfs
Then we may fail to defragment file due to IPU policy check, it doesn't
make sense, let's introduce a new IPU policy to allow OPU during file
defragmentation.
In small-sized volume, let's enable F2FS_IPU_HONOR_OPU_WRITE policy
by default.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode cleanup from Gabriel Krisman Bertazi:
"A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
the previous CONFIG_UNICODE. It is -rc material since we don't want to
expose the former symbol on 5.17.
This has been living on linux-next for the past week"
* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: clean up the Kconfig symbol confusion
|
|
f2fs rw_semaphores work better if writers can starve readers,
especially for the checkpoint thread, because writers are strictly
more important than reader threads. This prevents significant priority
inversion between low-priority readers that blocked while trying to
acquire the read lock and a second acquisition of the write lock that
might be blocking high priority work.
Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.
Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.
Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tried to address some performance issues in
f2fs_checkpoint and direct IO flows. Also, there was a work to enhance
the page cache management used for compression. Other than them, we've
done typical work including sysfs, code clean-ups, tracepoint, sanity
check, in addition to bug fixes on corner cases.
Enhancements:
- use iomap for direct IO
- try to avoid lock contention to improve f2fs_ckpt speed
- avoid unnecessary memory allocation in compression flow
- POSIX_FADV_DONTNEED drops the page cache containing compression
pages
- add some sysfs entries (gc_urgent_high_remaining, pending_discard)
Bug fixes:
- try not to expose unwritten blocks to user by DIO (this was added
to avoid merge conflict; another patch is coming to address other
missing case)
- relax minor error condition for file pinning feature used in
Android OTA
- fix potential deadlock case in compression flow
- should not truncate any block on pinned file
In addition, we've done some code clean-ups and tracepoint/sanity
check improvement"
* tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits)
f2fs: do not allow partial truncation on pinned file
f2fs: remove redunant invalidate compress pages
f2fs: Simplify bool conversion
f2fs: don't drop compressed page cache in .{invalidate,release}page
f2fs: fix to reserve space for IO align feature
f2fs: fix to check available space of CP area correctly in update_ckpt_flags()
f2fs: support fault injection to f2fs_trylock_op()
f2fs: clean up __find_inline_xattr() with __find_xattr()
f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr()
f2fs: do not bother checkpoint by f2fs_get_node_info
f2fs: avoid down_write on nat_tree_lock during checkpoint
f2fs: compress: fix potential deadlock of compress file
f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file
f2fs: add gc_urgent_high_remaining sysfs node
f2fs: fix to do sanity check in is_alive()
f2fs: fix to avoid panic in is_alive() if metadata is inconsistent
f2fs: fix to do sanity check on inode type during garbage collection
f2fs: avoid duplicate call of mark_inode_dirty
f2fs: show number of pending discard commands
f2fs: support POSIX_FADV_DONTNEED drop compressed page cache
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode updates from Gabriel Krisman Bertazi:
"This includes patches from Christoph Hellwig to split the large data
tables of the unicode subsystem into a loadable module, which allow
users to not have them around if case-insensitive filesystems are not
to be used. It also includes minor code fixes to unicode and its
users, from the same author.
All the patches here have been on linux-next releases for the past
months"
* tag 'unicode-for-next-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: only export internal symbols for the selftests
unicode: Add utf8-data module
unicode: cache the normalization tables in struct unicode_map
unicode: move utf8cursor to utf8-selftest.c
unicode: simplify utf8len
unicode: remove the unused utf8{,n}age{min,max} functions
unicode: pass a UNICODE_AGE() tripple to utf8_load
unicode: mark the version field in struct unicode_map unsigned
unicode: remove the charset field from struct unicode_map
f2fs: simplify f2fs_sb_read_encoding
ext4: simplify ext4_sb_read_encoding
|
|
Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:
- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.
Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.
It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.
This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.
For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.
linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.
Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
https://bugzilla.kernel.org/show_bug.cgi?id=204137
With below script, we will hit panic during new segment allocation:
DISK=bingo.img
MOUNT_DIR=/mnt/f2fs
dd if=/dev/zero of=$DISK bs=1M count=105
mkfs.f2fe -a 1 -o 19 -t 1 -z 1 -f -q $DISK
mount -t f2fs $DISK $MOUNT_DIR -o "noinline_dentry,flush_merge,noextent_cache,mode=lfs,io_bits=7,fsync_mode=strict"
for (( i = 0; i < 4096; i++ )); do
name=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 10`
mkdir $MOUNT_DIR/$name
done
umount $MOUNT_DIR
rm $DISK
--- Core dump ---
Call Trace:
allocate_segment_by_default+0x9d/0x100 [f2fs]
f2fs_allocate_data_block+0x3c0/0x5c0 [f2fs]
do_write_page+0x62/0x110 [f2fs]
f2fs_outplace_write_data+0x43/0xc0 [f2fs]
f2fs_do_write_data_page+0x386/0x560 [f2fs]
__write_data_page+0x706/0x850 [f2fs]
f2fs_write_cache_pages+0x267/0x6a0 [f2fs]
f2fs_write_data_pages+0x19c/0x2e0 [f2fs]
do_writepages+0x1c/0x70
__filemap_fdatawrite_range+0xaa/0xe0
filemap_fdatawrite+0x1f/0x30
f2fs_sync_dirty_inodes+0x74/0x1f0 [f2fs]
block_operations+0xdc/0x350 [f2fs]
f2fs_write_checkpoint+0x104/0x1150 [f2fs]
f2fs_sync_fs+0xa2/0x120 [f2fs]
f2fs_balance_fs_bg+0x33c/0x390 [f2fs]
f2fs_write_node_pages+0x4c/0x1f0 [f2fs]
do_writepages+0x1c/0x70
__writeback_single_inode+0x45/0x320
writeback_sb_inodes+0x273/0x5c0
wb_writeback+0xff/0x2e0
wb_workfn+0xa1/0x370
process_one_work+0x138/0x350
worker_thread+0x4d/0x3d0
kthread+0x109/0x140
ret_from_fork+0x25/0x30
The root cause here is, with IO alignment feature enables, in worst
case, we need F2FS_IO_SIZE() free blocks space for single one 4k write
due to IO alignment feature will fill dummy pages to make IO being
aligned.
So we will easily run out of free segments during non-inline directory's
data writeback, even in process of foreground GC.
In order to fix this issue, I just propose to reserve additional free
space for IO alignment feature to handle worst case of free space usage
ratio during FGGC.
Fixes: 0a595ebaaa6b ("f2fs: support IO alignment for DATA and NODE writes")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
f2fs: support fault injection for f2fs_trylock_op()
This patch supports to inject fault into f2fs_trylock_op().
Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Added a new sysfs node called gc_urgent_high_remaining. The user can
set the trial count limit for GC urgent high mode with this value. If
GC thread gets to the limit, the mode will turn back to GC normal mode.
By default, the value is zero, which means there is no limit like before.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Pull zstd update from Nick Terrell:
"Update to zstd-1.4.10.
Add myself as the maintainer of zstd and update the zstd version in
the kernel, which is now 4 years out of date, to a much more recent
zstd release. This includes bug fixes, much more extensive fuzzing,
and performance improvements. And generates the kernel zstd
automatically from upstream zstd, so it is easier to keep the zstd
verison up to date, and we don't fall so far out of date again.
This includes 5 commits that update the zstd library version:
- Adds a new kernel-style wrapper around zstd.
This wrapper API is functionally equivalent to the subset of the
current zstd API that is currently used. The wrapper API changes to
be kernel style so that the symbols don't collide with zstd's
symbols. The update to zstd-1.4.10 maintains the same API and
preserves the semantics, so that none of the callers need to be
updated. All callers are updated in the commit, because there are
zero functional changes.
- Adds an indirection for `lib/decompress_unzstd.c` so it doesn't
depend on the layout of `lib/zstd/` to include every source file.
This allows the next patch to be automatically generated.
- Imports the zstd-1.4.10 source code. This commit is automatically
generated from upstream zstd (https://github.com/facebook/zstd).
- Adds me (terrelln@fb.com) as the maintainer of `lib/zstd`.
- Fixes a newly added build warning for clang.
The discussion around this patchset has been pretty long, so I've
included a FAQ-style summary of the history of the patchset, and why
we are taking this approach.
Why do we need to update?
-------------------------
The zstd version in the kernel is based off of zstd-1.3.1, which is
was released August 20, 2017. Since then zstd has seen many bug fixes
and performance improvements. And, importantly, upstream zstd is
continuously fuzzed by OSS-Fuzz, and bug fixes aren't backported to
older versions. So the only way to sanely get these fixes is to keep
up to date with upstream zstd.
There are no known security issues that affect the kernel, but we need
to be able to update in case there are. And while there are no known
security issues, there are relevant bug fixes. For example the problem
with large kernel decompression has been fixed upstream for over 2
years [1]
Additionally the performance improvements for kernel use cases are
significant. Measured for x86_64 on my Intel i9-9900k @ 3.6 GHz:
- BtrFS zstd compression at levels 1 and 3 is 5% faster
- BtrFS zstd decompression+read is 15% faster
- SquashFS zstd decompression+read is 15% faster
- F2FS zstd compression+write at level 3 is 8% faster
- F2FS zstd decompression+read is 20% faster
- ZRAM decompression+read is 30% faster
- Kernel zstd decompression is 35% faster
- Initramfs zstd decompression+build is 5% faster
On top of this, there are significant performance improvements coming
down the line in the next zstd release, and the new automated update
patch generation will allow us to pull them easily.
How is the update patch generated?
----------------------------------
The first two patches are preparation for updating the zstd version.
Then the 3rd patch in the series imports upstream zstd into the
kernel. This patch is automatically generated from upstream. A script
makes the necessary changes and imports it into the kernel. The
changes are:
- Replace all libc dependencies with kernel replacements and rewrite
includes.
- Remove unncessary portability macros like: #if defined(_MSC_VER).
- Use the kernel xxhash instead of bundling it.
This automation gets tested every commit by upstream's continuous
integration. When we cut a new zstd release, we will submit a patch to
the kernel to update the zstd version in the kernel.
The automated process makes it easy to keep the kernel version of zstd
up to date. The current zstd in the kernel shares the guts of the
code, but has a lot of API and minor changes to work in the kernel.
This is because at the time upstream zstd was not ready to be used in
the kernel envrionment as-is. But, since then upstream zstd has
evolved to support being used in the kernel as-is.
Why are we updating in one big patch?
-------------------------------------
The 3rd patch in the series is very large. This is because it is
restructuring the code, so it both deletes the existing zstd, and
re-adds the new structure. Future updates will be directly
proportional to the changes in upstream zstd since the last import.
They will admittidly be large, as zstd is an actively developed
project, and has hundreds of commits between every release. However,
there is no other great alternative.
One option ruled out is to replay every upstream zstd commit. This is
not feasible for several reasons:
- There are over 3500 upstream commits since the zstd version in the
kernel.
- The automation to automatically generate the kernel update was only
added recently, so older commits cannot easily be imported.
- Not every upstream zstd commit builds.
- Only zstd releases are "supported", and individual commits may have
bugs that were fixed before a release.
Another option to reduce the patch size would be to first reorganize
to the new file structure, and then apply the patch. However, the
current kernel zstd is formatted with clang-format to be more
"kernel-like". But, the new method imports zstd as-is, without
additional formatting, to allow for closer correlation with upstream,
and easier debugging. So the patch wouldn't be any smaller.
It also doesn't make sense to import upstream zstd commit by commit
going forward. Upstream zstd doesn't support production use cases
running of the development branch. We have a lot of post-commit
fuzzing that catches many bugs, so indiviudal commits may be buggy,
but fixed before a release. So going forward, I intend to import every
(important) zstd release into the Kernel.
So, while it isn't ideal, updating in one big patch is the only patch
I see forward.
Who is responsible for this code?
---------------------------------
I am. This patchset adds me as the maintainer for zstd. Previously,
there was no tree for zstd patches. Because of that, there were
several patches that either got ignored, or took a long time to merge,
since it wasn't clear which tree should pick them up. I'm officially
stepping up as maintainer, and setting up my tree as the path through
which zstd patches get merged. I'll make sure that patches to the
kernel zstd get ported upstream, so they aren't erased when the next
version update happens.
How is this code tested?
------------------------
I tested every caller of zstd on x86_64 (BtrFS, ZRAM, SquashFS, F2FS,
Kernel, InitRAMFS). I also tested Kernel & InitRAMFS on i386 and
aarch64. I checked both performance and correctness.
Also, thanks to many people in the community who have tested these
patches locally.
Lastly, this code will bake in linux-next before being merged into
v5.16.
Why update to zstd-1.4.10 when zstd-1.5.0 has been released?
------------------------------------------------------------
This patchset has been outstanding since 2020, and zstd-1.4.10 was the
latest release when it was created. Since the update patch is
automatically generated from upstream, I could generate it from
zstd-1.5.0.
However, there were some large stack usage regressions in zstd-1.5.0,
and are only fixed in the latest development branch. And the latest
development branch contains some new code that needs to bake in the
fuzzer before I would feel comfortable releasing to the kernel.
Once this patchset has been merged, and we've released zstd-1.5.1, we
can update the kernel to zstd-1.5.1, and exercise the update process.
You may notice that zstd-1.4.10 doesn't exist upstream. This release
is an artifical release based off of zstd-1.4.9, with some fixes for
the kernel backported from the development branch. I will tag the
zstd-1.4.10 release after this patchset is merged, so the Linux Kernel
is running a known version of zstd that can be debugged upstream.
Why was a wrapper API added?
----------------------------
The first versions of this patchset migrated the kernel to the
upstream zstd API. It first added a shim API that supported the new
upstream API with the old code, then updated callers to use the new
shim API, then transitioned to the new code and deleted the shim API.
However, Cristoph Hellwig suggested that we transition to a kernel
style API, and hide zstd's upstream API behind that. This is because
zstd's upstream API is supports many other use cases, and does not
follow the kernel style guide, while the kernel API is focused on the
kernel's use cases, and follows the kernel style guide.
Where is the previous discussion?
---------------------------------
Links for the discussions of the previous versions of the patch set
below. The largest changes in the design of the patchset are driven by
the discussions in v11, v5, and v1. Sorry for the mix of links, I
couldn't find most of the the threads on lkml.org"
Link: https://lkml.org/lkml/2020/9/29/27 [1]
Link: https://www.spinics.net/lists/linux-crypto/msg58189.html [v12]
Link: https://lore.kernel.org/linux-btrfs/20210430013157.747152-1-nickrterrell@gmail.com/ [v11]
Link: https://lore.kernel.org/lkml/20210426234621.870684-2-nickrterrell@gmail.com/ [v10]
Link: https://lore.kernel.org/linux-btrfs/20210330225112.496213-1-nickrterrell@gmail.com/ [v9]
Link: https://lore.kernel.org/linux-f2fs-devel/20210326191859.1542272-1-nickrterrell@gmail.com/ [v8]
Link: https://lkml.org/lkml/2020/12/3/1195 [v7]
Link: https://lkml.org/lkml/2020/12/2/1245 [v6]
Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v5]
Link: https://www.spinics.net/lists/linux-btrfs/msg105783.html [v4]
Link: https://lkml.org/lkml/2020/9/23/1074 [v3]
Link: https://www.spinics.net/lists/linux-btrfs/msg105505.html [v2]
Link: https://lore.kernel.org/linux-btrfs/20200916034307.2092020-1-nickrterrell@gmail.com/ [v1]
Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
* tag 'zstd-for-linus-v5.16' of git://github.com/terrelln/linux:
lib: zstd: Add cast to silence clang's -Wbitwise-instead-of-logical
MAINTAINERS: Add maintainer entry for zstd
lib: zstd: Upgrade to latest upstream zstd version 1.4.10
lib: zstd: Add decompress_sources.h for decompress_unzstd
lib: zstd: Add kernel-specific API
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we've applied relatively small number of patches which
fix subtle corner cases mainly, while introducing a new mount option
to be able to fragment the disk intentionally for performance tests.
Enhancements:
- add a mount option to fragmente on-disk layout to understand the
performance
- support direct IO for multi-partitions
- add a fault injection of dquot_initialize
Bug fixes:
- address some lockdep complaints
- fix a deadlock issue with quota
- fix a memory tuning condition
- fix compression condition to improve the ratio
- fix disabling compression on the non-empty compressed file
- invalidate cached pages before IPU/DIO writes
And, we've added some minor clean-ups as usual"
* tag 'f2fs-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: fix UAF in f2fs_available_free_memory
f2fs: invalidate META_MAPPING before IPU/DIO write
f2fs: support fault injection for dquot_initialize()
f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()
f2fs: compress: disallow disabling compress on non-empty compressed file
f2fs: compress: fix overwrite may reduce compress ratio unproperly
f2fs: multidevice: support direct IO
f2fs: introduce fragment allocation mode mount option
f2fs: replace snprintf in show functions with sysfs_emit
f2fs: include non-compressed blocks in compr_written_block
f2fs: fix wrong condition to trigger background checkpoint correctly
f2fs: fix to use WHINT_MODE
f2fs: fix up f2fs_lookup tracepoints
f2fs: set SBI_NEED_FSCK flag when inconsistent node block found
f2fs: introduce excess_dirty_threshold()
f2fs: avoid attaching SB_ACTIVE flag during mount
f2fs: quota: fix potential deadlock
f2fs: should use GFP_NOFS for directory inodes
|
|
if2fs_fill_super
-> f2fs_build_segment_manager
-> create_discard_cmd_control
-> f2fs_start_discard_thread
It invokes kthread_run to create a thread and run issue_discard_thread.
However, if f2fs_build_node_manager fails, the control flow goes to
free_nm and calls f2fs_destroy_node_manager. This function will free
sbi->nm_info. However, if issue_discard_thread accesses sbi->nm_info
after the deallocation, but before the f2fs_stop_discard_thread, it will
cause UAF(Use-after-free).
-> f2fs_destroy_segment_manager
-> destroy_discard_cmd_control
-> f2fs_stop_discard_thread
Fix this by stopping discard thread before f2fs_destroy_node_manager.
Note that, the commit d6d2b491a82e1 introduces the call of
f2fs_available_free_memory into issue_discard_thread.
Cc: stable@vger.kernel.org
Fixes: d6d2b491a82e ("f2fs: allow to change discard policy based on cached discard cmds")
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch:
- Moves `include/linux/zstd.h` -> `include/linux/zstd_lib.h`
- Updates modified zstd headers to yearless copyright
- Adds a new API in `include/linux/zstd.h` that is functionally
equivalent to the in-use subset of the current API. Functions are
renamed to avoid symbol collisions with zstd, to make it clear it is
not the upstream zstd API, and to follow the kernel style guide.
- Updates all callers to use the new API.
There are no functional changes in this patch. Since there are no
functional change, I felt it was okay to update all the callers in a
single patch. Once the API is approved, the callers are mechanically
changed.
This patch is preparing for the 3rd patch in this series, which updates
zstd to version 1.4.10. Since the upstream zstd API is no longer exposed
to callers, the update can happen transparently.
Signed-off-by: Nick Terrell <terrelln@fb.com>
Tested By: Paul Jones <paul@pauljones.id.au>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
|
|
This patch adds a new function f2fs_dquot_initialize() to wrap
dquot_initialize(), and it supports to inject fault into
f2fs_dquot_initialize() to simulate inner failure occurs in
dquot_initialize().
Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Pavel Machek reported in [1]
This code looks quite confused: part of function returns 1 on
corruption, part returns -errno. The problem is not stable-specific.
[1] https://lkml.org/lkml/2021/9/19/207
Let's fix to make 'insane cp_payload case' to return 1 rater than
EFSCORRUPTED, so that return value can be kept consistent for all
error cases, it can avoid confusion of code logic.
Fixes: 65ddf6564843 ("f2fs: fix to do sanity check for sb/cp fields correctly")
Reported-by: Pavel Machek <pavel@denx.de>
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 3c62be17d4f5 ("f2fs: support multiple devices") missed
to support direct IO for multiple device feature, this patch
adds to support the missing part of multidevice feature.
In addition, for multiple device image, we should be aware of
any issued direct write IO rather than just buffered write IO,
so that fsync and syncfs can issue a preflush command to the
device where direct write IO goes, to persist user data for
posix compliant.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|