Age | Commit message | Author |
|
Busy extent tracking is primarily used to ensure that freed blocks are
not reused for data allocations before the transaction that deleted them
has been committed to stable storage, and secondarily to drive online
discard. None of the use cases applies to zoned RTGs, as the zoned
allocator can't overwrite blocks before resetting the zone, which already
flushes out all transactions touching the RTGs.
So busy extent tracking is neither needed nor called for zoned RTGs,
but the code to skip allocating and freeing the structure somehow got
lost during the zoned XFS upstreaming process.
This not only causes these structures to be unnecessarily allocated, but
can also lead to memory leaks, as the xg_busy_extents pointer in the
xfs_group structure is overlaid with the pointer for the linked list of
zones to be reset.
Stop allocating and freeing the structure so that memory is not
pointlessly allocated and then leaked when the zone is reset.
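A hedged illustration of the overlay described above (member names and
layout are assumptions, not the real fs/xfs definitions):

/*
 * Sketch only: the same pointer slot serves two purposes, so an
 * allocation made for busy extent tracking is silently lost once the
 * zoned reset-list use overwrites it.
 */
struct xfs_group_sketch {
    union {
        void    *xg_busy_extents;       /* !zoned: busy extent tracking */
        void    *xg_zone_reset_list;    /* zoned: list of zones to be reset */
    };
};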
Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # v6.15
[cem: Fix type and add stable tag]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When processing mount options, efivarfs allocates efivarfs_fs_info (sfi)
early in fs_context initialization. However, sfi is associated with the
superblock and typically freed when the superblock is destroyed. If the
fs_context is released (final put) before fill_super is called—such as
on error paths or during reconfiguration—the sfi structure would leak,
as ownership never transfers to the superblock.
Implement the .free callback in efivarfs_context_ops to ensure any
allocated sfi is properly freed if the fs_context is torn down before
fill_super, preventing this memory leak.
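A minimal sketch of such a callback (illustrative; it assumes the usual
convention that the VFS clears fc->s_fs_info once the superblock has
taken ownership):

#include <linux/fs_context.h>
#include <linux/slab.h>

static void efivarfs_fs_context_free(struct fs_context *fc)
{
    kfree(fc->s_fs_info);   /* NULL once ownership moved to the sb (assumption) */
}

static const struct fs_context_operations efivarfs_context_ops = {
    /* ... existing callbacks such as .parse_param and .get_tree ... */
    .free   = efivarfs_fs_context_free,
};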
Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fixes: 5329aa5101f73c ("efivarfs: Add uid/gid mount options")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The only remaining user of ovl_cleanup() is ovl_cleanup_unlocked(), so we
no longer need both.
This patch renames ovl_cleanup() to ovl_cleanup_locked() and makes it
static.
ovl_cleanup_unlocked() is renamed to ovl_cleanup().
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-22-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of passing an inode *dir, pass a dentry *parent. This makes the
calling code slightly cleaner.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-21-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_check_rename_whiteout() now only holds the directory lock when
needed, and takes it again if necessary.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-20-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_whiteout() relies on the workdir i_rwsem to provide exclusive access
to ofs->whiteout which it manipulates. Rather than depending on this,
add a new mutex, "whiteout_lock", to explicitly provide the required
locking. Use guard(mutex) for this so that we can return without
needing to explicitly unlock.
Then take the lock on workdir only when needed - to look up the temp name
and to do the whiteout or link.
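For reference, the guard(mutex) pattern in use, as a schematic sketch
with stand-in types rather than the overlayfs code itself:

#include <linux/cleanup.h>
#include <linux/dcache.h>
#include <linux/errno.h>
#include <linux/mutex.h>

struct whiteout_state {             /* stand-in for the ofs fields named above */
    struct mutex whiteout_lock;
    struct dentry *whiteout;
};

static int whiteout_op_sketch(struct whiteout_state *s)
{
    guard(mutex)(&s->whiteout_lock);    /* dropped automatically on any return */

    if (!s->whiteout)
        return -ENOENT;                 /* early return, no explicit unlock */
    /* ... manipulate s->whiteout while the guard holds the lock ... */
    return 0;
}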
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-19-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than locking the directory(s) before calling
ovl_cleanup_and_whiteout(), change it (and ovl_whiteout()) to do the
locking, so the locking can be fine-grained, as will be needed for
proposed locking changes.
Sometimes this is called to whiteout something in the index dir, in
which case only that dir must be locked. In one case it is called on
something in an upperdir, so two directories must be locked. We use
ovl_lock_rename_workdir() for this and remove the restriction that
upperdir cannot be indexdir - because now sometimes it is.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-18-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This code:
performs a lookup_upper
creates a whiteout object
renames the whiteout over the result of the lookup
The create and the rename must be locked separately for proposed
directory locking changes. This patch takes a first step of moving the
lookup out of the locked region. A subsequent patch will separate the
create from the rename.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-17-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than calling ovl_workdir_cleanup() with the dir already locked,
change it to take the dir lock only when needed.
Also change ovl_workdir_cleanup() to take a dentry for the parent rather
than an inode.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-16-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Only take the dir lock when needed, rather than for the whole loop.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-15-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Instead of taking the directory lock for the whole cleanup, only take it
when needed.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-14-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In ovl_workdir_create() don't hold the dir lock for the whole time, but
only take it when needed.
It now gets taken separately for ovl_workdir_cleanup(). A subsequent
patch will move the locking into that function.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-13-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_cleanup_index() takes a lock on the directory and then does a lookup
and possibly one of two different cleanups.
This patch narrows the locking to use the _unlocked() versions of the
lookup and one cleanup, and just takes the lock for the other cleanup.
A subsequent patch will move the locking into that cleanup.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-12-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than lock the directory for the whole operation, use
ovl_lookup_upper_unlocked() and ovl_cleanup_unlocked() to take the lock
only when needed.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-11-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the rename lock immediately after the rename, and use
ovl_cleanup_unlocked() for cleanup.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-10-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Rather than having three separate goto labels: out_unlock, out_dput_old,
and out_dput, make use of the fact that dput() happily accepts a NULL
pointer to reduce this to just one goto label: out_unlock.
olddentry and newdentry are initialised to NULL and only set once a
valid dentry is found. They are then dput() late in the function.
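Schematically, the resulting pattern looks like this (example_lookup()
and example_do_rename() are hypothetical stand-ins, not overlayfs
functions):

#include <linux/dcache.h>
#include <linux/err.h>

static struct dentry *example_lookup(struct dentry *dir);               /* hypothetical */
static int example_do_rename(struct dentry *old, struct dentry *new);   /* hypothetical */

static int example_rename_step(struct dentry *dir1, struct dentry *dir2)
{
    struct dentry *olddentry = NULL;
    struct dentry *newdentry = NULL;
    int err;

    olddentry = example_lookup(dir1);
    err = PTR_ERR_OR_ZERO(olddentry);
    if (err) {
        olddentry = NULL;
        goto out_unlock;
    }

    newdentry = example_lookup(dir2);
    err = PTR_ERR_OR_ZERO(newdentry);
    if (err) {
        newdentry = NULL;
        goto out_unlock;
    }

    err = example_do_rename(olddentry, newdentry);

out_unlock:
    dput(newdentry);    /* dput() tolerates NULL */
    dput(olddentry);
    return err;
}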
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-9-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Unlock the parents immediately after the rename, and use
ovl_cleanup_unlocked() for cleanup, which takes a separate lock.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-8-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the locks immediately after rename, and use a separate lock for
cleanup.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Note that ovl_cleanup_whiteouts() operates on "upper", a child of
"upperdir" and does not require upperdir or workdir to be locked.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-7-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the directory lock immediately after the ovl_create_real() call and
take a separate lock later for cleanup in ovl_cleanup_unlocked() - if
needed.
This makes way for future changes where locks are taken on individual
dentries rather than the whole directory.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-6-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In ovl_copy_up_workdir() unlock immediately after the rename. There is
nothing else in the function that needs the lock.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-5-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl currently locks a directory or two and then performs multiple actions
in one or both directories. This is incompatible with proposed changes
which will lock just the dentry of objects being acted on.
This patch moves calls to ovl_create_temp() out of the locked regions and
has it take and release the relevant lock itself.
The lock that was taken before this function was called is now taken
after. This means that any code between where the lock was taken and
ovl_create_temp() is now unlocked. This necessitates the use of
ovl_cleanup_unlocked() and the creation of ovl_lookup_upper_unlocked().
These will be used more widely in future patches.
Now that the file is created before the lock is taken for rename, we
need to ensure the parent wasn't changed before the lock was gained.
ovl_lock_rename_workdir() is changed to optionally receive the dentries
that will be involved in the rename. If either is present but has the
wrong parent, an error is returned.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-4-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
ovl_copy_up_workdir() currently takes a rename lock on two directories,
then uses the lock to create a file in one directory, perform a rename,
and possibly unlink the file for cleanup. This is incompatible
with proposed changes which will lock just the dentry of objects being
acted on.
This patch moves the call to ovl_create_index() earlier in
ovl_copy_up_workdir() to before the lock is taken.
ovl_create_index() then takes the required lock only when needed.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-3-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If ovl_copy_up_data() fails, the error is not immediately handled but the
code continues on to call ovl_start_write() and lock_rename(),
presumably because both of these locks are needed for the cleanup.
Only then (if the lock was successful) is the error checked.
This makes the code a little hard to follow and could be fragile.
This patch changes the code to handle the error after ovl_start_write()
(which cannot fail, so there aren't multiple errors to deal with). A
new ovl_cleanup_unlocked() is created which takes the required directory
lock. This will be used extensively in later patches.
In general we need to check the parent is still correct after taking the
lock (as ovl_copy_up_workdir() does after a successful lock_rename()) so
that is included in ovl_cleanup_unlocked() using new ovl_parent_lock()
and ovl_parent_unlock() calls (it is planned to move this API into VFS code
eventually, though in a slightly different form).
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/20250716004725.1206467-2-neil@brown.name
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Case folding is often applied to subtrees and not on an entire
filesystem.
Disallowing layers from filesystems that support case folding is overly
limiting.
Replace the rule that case-folding capable filesystems are not allowed
as layers with a rule that case-folded directories are not allowed in a
merged directory stack.
Should case folding be enabled on an underlying directory while
overlayfs is mounted, the outcome is generally undefined.
Specifically, in ovl_lookup() we check the base underlying directory
and fail with -ESTALE and write a warning to kmsg if case folding is
enabled on an underlying directory.
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/linux-fsdevel/20250520051600.1903319-1-kent.overstreet@linux.dev/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250602171702.1941891-1-amir73il@gmail.com
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
file_user_path() now takes a const file ptr.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250607115304.2521155-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add internal helper backing_file_set_user_path() for the only
two cases that need to modify backing_file fields.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250607115304.2521155-2-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
There was a lot of common code in the codepaths used to convert an
inline directory and to create a new directory. To address this,
rename ext4_init_dot_dotdot() to ext4_init_dirblock() and then move
common code into that function.
This reduces the number of lines of code in fs/ext4/inline.c and
fs/ext4/namei.c, as well as reducing the size of their object files.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://patch.msgid.link/20250712181249.434530-3-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The strcpy() function is considered dangerous and eeeevil by people
who are using sophisticated code analysis tools such as "grep". This
is true even when a quick inspection would show that the source is a
constant string ("." or "..") and the destination is a fixed array
which is guaranteed to have enough space. Make the "grep" code
analysis tool happy by using memcpy() instead of strcpy(). :-)
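The flavour of the change, as an illustrative fragment with a stand-in
struct rather than the literal hunk:

#include <linux/string.h>

struct dirent_stub {                /* stand-in for the fixed-size name field */
    char name[8];
};

static void set_dot_names(struct dirent_stub *de, struct dirent_stub *de2)
{
    memcpy(de->name, ".", 2);       /* '.' plus the terminating NUL */
    memcpy(de2->name, "..", 3);     /* ".." plus the terminating NUL */
}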
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://patch.msgid.link/20250712181249.434530-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In a discussion over a proposed patch, "ext4: replace strcpy() with
'.' assignment"[1], I had asserted that directory entries in ext4 were
not NUL terminated, and hence it was safe to replace strcpy() with a
direct assignment. As it turns out, this was incorrect. It's true
for all directory entries *except* for '.' and '..', where the
kernel was using strcmp() and where e2fsck actually checks and offers
to fix things if '.' and '..' are not NUL terminated.
[1] https://lore.kernel.org/r/202505191316.JJMnPobO-lkp@intel.com
We can't change this without breaking old kernel versions, but in the
spirit of "be liberal in what you receive", use direct comparison of
de->name_len and de->name[0,1] instead of strcmp(). This has the side
benefit of reducing the compiled text size by 96 bytes on x86_64.
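Schematically, the comparison becomes something like the following
(illustrative macros over an entry with name_len/name fields, not the
exact ext4 hunk):

/* "." : name_len 1, first byte '.'; "..": name_len 2, both bytes '.' */
#define IS_DOT(de)      ((de)->name_len == 1 && (de)->name[0] == '.')
#define IS_DOTDOT(de)   ((de)->name_len == 2 && \
                         (de)->name[0] == '.' && (de)->name[1] == '.')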
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://patch.msgid.link/20250712181249.434530-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Currently we clear the BH_New bit in case of error and also in the
standard ext4_write_end() handler (in block_commit_write()). However,
ext4_journalled_write_end() misses this clearing and thus we are leaving
stale BH_New bits behind. Generally ext4_block_write_begin() clears
these bits before any harm can be done but in case blocksize < pagesize
and we hit some error when processing a page with these stale bits,
we'll try to zero buffers with these stale BH_New bits and jbd2 will
complain (as buffers were not prepared for writing in this transaction).
Fix the problem by clearing BH_New bits in ext4_journalled_write_end()
and WARN if ext4_block_write_begin() sees stale BH_New bits.
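A hedged sketch of the kind of clearing being added (illustrative, not
the exact fix):

#include <linux/buffer_head.h>

/* Walk the folio's buffer ring and drop any stale BH_New bits. */
static void clear_stale_new_buffers(struct folio *folio)
{
    struct buffer_head *head = folio_buffers(folio);
    struct buffer_head *bh = head;

    if (!head)
        return;
    do {
        clear_buffer_new(bh);
        bh = bh->b_this_page;
    } while (bh != head);
}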
Reported-by: Baolin Liu <liubaolin12138@163.com>
Reported-by: Zhi Long <longzhi@sangfor.com.cn>
Fixes: 3910b513fcdf ("ext4: persist the new uptodate buffers in ext4_journalled_zero_new_buffers")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250709084831.23876-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In ext4_io_end_defer_completion(), check if io_end->list_vec is empty to
avoid adding an io_end that requires no conversion to the
i_rsv_conversion_list, which in turn prevents starting an unnecessary
worker. An ext4_emergency_state() check is also added to avoid attempting
to abort the journal in an emergency state.
Additionally, ext4_put_io_end_defer() is refactored to call
ext4_io_end_defer_completion() directly instead of being open-coded.
This also prevents starting an unnecessary worker when EXT4_IO_END_FAILED
is set but data_err=abort is not enabled.
This ensures that the check in ext4_put_io_end_defer() is consistent with
the check in ext4_end_bio(). Otherwise, we might add an io_end to the
i_rsv_conversion_list and then call ext4_finish_bio(), after which the
inode could be freed before ext4_end_io_rsv_work() is called, triggering
a use-after-free issue.
Fixes: ce51afb8cc5e ("ext4: abort journal on data writeback failure if in data_err=abort mode")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250708111504.3208660-1-libaokun@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We have assembly code generated by a script. GCC successfully compiles
it. However, the kernel cannot load it on an ARM64 platform with a 4K
page size. In contrast, the same ELF file loads correctly on the same
platform with a 64K page size.
The root cause is the Linux kernel's ELF_MIN_ALIGN limitation on the
program headers of ELF files. The ELF file contains 78 program headers
(the script inserts many holes when generating the assembly code). On
ARM64 with a 4K page size, the ELF_MIN_ALIGN check enforces a maximum of
74 program headers, causing the ELF file to fail to load. However, with
a 64K page size, the limit is relaxed to over 1,184 program headers,
allowing the file to run correctly.
Kees Cook kindly identified[1] that this limitation was introduced in
Linux-0.99.15f without an explanation for its purpose.
The ELF specification does not impose such a restriction on program
headers. Remove the ELF_MIN_ALIGN limitation on program headers to
align with the ELF spec. After removing the ELF_MIN_ALIGN limitation,
the 64K size limit still exists, which should be sufficient.
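For reference, the check in load_elf_phdrs() is roughly the following
(paraphrased, not the exact diff):

size = sizeof(struct elf_phdr) * elf_ex->e_phnum;
if (size == 0 || size > 65536 || size > ELF_MIN_ALIGN)
    goto out;   /* dropping the ELF_MIN_ALIGN clause keeps the 64K cap */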
Suggested-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/linux-mm/202506270854.A729825@keescook/ [1]
Signed-off-by: Yin Fengwei <fengwei_yin@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250717110108.55586-1-fengwei_yin@linux.alibaba.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.16-rc7).
Conflicts:
Documentation/netlink/specs/ovpn.yaml
880d43ca9aa4 ("netlink: specs: clean up spaces in brackets")
af52020fc599 ("ovpn: reject unexpected netlink attributes")
drivers/net/phy/phy_device.c
a44312d58e78 ("net: phy: Don't register LEDs for genphy")
f0f2b992d818 ("net: phy: Don't register LEDs for genphy")
https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org
drivers/net/wireless/intel/iwlwifi/fw/regulatory.c
drivers/net/wireless/intel/iwlwifi/mld/regulatory.c
5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap")
ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware")
net/ipv6/mcast.c
ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()")
a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()")
https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor the condition for breaking the loop within xattr_find_entry().
Eliminate the usage of "<=" and take a shortcut when "!cmp" is
true.
Originally, the condition was "(cmp <= 0 && (sorted || cmp == 0))": after
establishing "cmp <= 0", it still has to check the values of "sorted" and
"cmp", and the second check of "cmp" is redundant because it has already
been evaluated.
Looking at the logic, when "cmp == 0" the branch is always taken, so
"cmp == 0" does not need to be rechecked: take the shortcut when
"cmp == 0" and only consult "sorted" when "cmp < 0", as sketched below.
The refactor shrinks the generated code size by 44 bytes. Saving these
instructions should also benefit execution efficiency.
$ ./scripts/bloat-o-meter vmlinux_old vmlinux_new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44)
Function old new delta
xattr_find_entry 300 256 -44
Total: Before=22989434, After=22989390, chg -0.00%
The test is done on kernel version 6.16 with x86_64 defconfig
and gcc 13.3.0.
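In code terms, the change is roughly the following (paraphrased;
logically equivalent, not necessarily the exact hunk):

/* before */
if (cmp <= 0 && (sorted || cmp == 0))
    break;

/* after: an exact match short-circuits; "sorted" is only consulted
 * once the probe has passed the target (cmp < 0). */
if (!cmp || (cmp < 0 && sorted))
    break;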
Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
Link: https://patch.msgid.link/20250708020013.175728-1-richard120310@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Relocate the function max_pow_of_two_factor() from the xfs code to the
common ilog2.h, as it will be used elsewhere.
Also simplify the function, as advised by Mikulas Patocka.
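A hedged sketch of what such a helper reduces to, assuming the usual bit
trick rather than quoting the relocated function verbatim:

/* For n != 0, the lowest set bit of n is its largest power-of-two factor. */
static inline unsigned long long max_pow_of_two_factor_sketch(unsigned long long n)
{
    return n & -n;
}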
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20250711105258.3135198-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
struct iomap_writepage_ctx includes a pointer to the file inode. In
writeback, use that instead of also passing the inode into
fuse_fill_wb_data.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/20250715202122.2282532-6-joannelkoong@gmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Hook into iomap_invalidate_folio() so that if the entire folio is being
invalidated during truncation, the dirty state is cleared and the folio
doesn't get written back. The folio's corresponding ifs struct will also
get freed.
Hook into iomap_is_partially_uptodate() since iomap tracks uptodateness
granularly when it does buffered writes.
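Schematically, the hook-up amounts to pointing the relevant
address_space_operations members at the iomap helpers (the aops name and
the omitted members here are assumptions):

#include <linux/fs.h>
#include <linux/iomap.h>

static const struct address_space_operations fuse_file_aops_sketch = {
    .invalidate_folio       = iomap_invalidate_folio,
    .is_partially_uptodate  = iomap_is_partially_uptodate,
    /* remaining callbacks unchanged */
};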
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/20250715202122.2282532-5-joannelkoong@gmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use iomap for folio laundering, which will do granular dirty
writeback when laundering a large folio.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/20250715202122.2282532-4-joannelkoong@gmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Use iomap for dirty folio writeback in ->writepages().
This allows for granular dirty writeback of large folios.
Only the dirty portions of the large folio will be written instead of
having to write out the entire folio. For example if there is a 1 MB
large folio and only 2 bytes in it are dirty, only the page for those
dirty bytes will be written out.
.dirty_folio needs to be set to iomap_dirty_folio so that the bitmap
iomap uses for dirty tracking correctly reflects dirty regions that need
to be written back.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/20250715202122.2282532-3-joannelkoong@gmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Have buffered writes go through iomap. This has two advantages:
* granular large folio synchronous reads
* granular large folio dirty tracking
If for example there is a 1 MB large folio and a write issued at pos 1
to pos 1 MB - 2, only the head and tail pages will need to be read in
and marked uptodate instead of the entire folio needing to be read in.
Non-relevant trailing pages are also skipped (eg if for a 1 MB large
folio a write is issued at pos 1 to 4099, only the first two pages are
read in and the ones after that are skipped).
iomap also has granular dirty tracking. This is useful in that when it
comes to writeback time, only the dirty portions of the large folio will
be written instead of having to write out the entire folio. For example
if there is a 1 MB large folio and only 2 bytes in it are dirty, only
the page for those dirty bytes get written out. Please note that
granular writeback is only done once fuse also uses iomap in writeback
(separate commit).
.release_folio needs to be set to iomap_release_folio so that any
allocated iomap ifs structs get freed.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/20250715202122.2282532-2-joannelkoong@gmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
maybe_casefold() shouldn't have been nooped, just bch2_casefold().
Fixes: 94426e4201fb ("bcachefs: opts.casefold_disabled")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
94426e4201fb, which added the killswitch for casefolding, accidentally
removed some of the ifdefs we need to avoid build errors.
It appears we need better build testing for different configurations; it
took two weeks for the robots to catch this one.
Fixes: 94426e4201fb ("bcachefs: opts.casefold_disabled")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Use bch2_dev_bucket_tryget() instead of bch2_dev_tryget() before
checking the bucket bitmap.
Reported-by: syzbot+3168625f36f4a539237e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_btree_node_drop_keys_outside_node() will (re)build aux search
trees, because it's also called by topology repair.
bch2_btree_node_read_done() was calling it before validating individual
keys; invalid ones have to be dropped.
If we call drop_keys_outside_node() first, then
bch2_bset_build_aux_tree() doesn't run because the node already has an
aux search tree - which was invalidated by the repair.
Reported-by: syzbot+c5e7a66b3b23ae65d44f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The allocator path has a "if we're really low on free buckets, check if
we should issue discards" - tweak this to also trigger discards if more
than 1/128th of the device is in need_discard state.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
It becomes possible to do discards after a journal flush, which
naturally the journal code is responsible for.
A prior refactoring seems to have broken this - which went unnoticed
because the foreground allocator path can also trigger discards.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When a node withdraws and it turns out that it is the only node that has
the filesystem mounted, gfs2 currently tries to replay the local journal
to bring the filesystem back into a consistent state. Not only is that
a very bad idea, it has also never worked because gfs2_recover_func()
will refuse to do anything during a withdraw.
However, before even getting to this point, gfs2_recover_func()
dereferences sdp->sd_jdesc->jd_inode. This was a use-after-free before
commit 04133b607a78 ("gfs2: Prevent double iput for journal on error")
and is a NULL pointer dereference since then.
Simply get rid of self recovery to fix that.
Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish")
Reported-by: Chunjie Zhu <chunjie.zhu@cloud.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
A fuzzer test introduced corruption that ends up with a depth of 0 in
dir_e_read(), causing an undefined shift by 32 at:
index = hash >> (32 - dip->i_depth);
As calculated in an open-coded way in dir_make_exhash(), the minimum
depth for an exhash directory is ilog2(sdp->sd_hash_ptrs) and 0 is
invalid as sdp->sd_hash_ptrs is fixed as sdp->bsize / 16 at mount time.
So we can avoid the undefined behaviour by checking for depth values
lower than the minimum in gfs2_dinode_in(). Values greater than the
maximum are already being checked for there.
Also switch the calculation in dir_make_exhash() to use ilog2() to
clarify how the depth is calculated.
Tested with the syzkaller repro.c and xfstests '-g quick'.
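A hedged sketch of the calculation and check described above
(identifiers follow the commit text; surrounding context is omitted):

/* dir_make_exhash(): minimum depth, now computed with ilog2() instead
 * of an open-coded loop (sd_hash_ptrs is fixed as bsize / 16 at mount). */
dip->i_depth = ilog2(sdp->sd_hash_ptrs);

/* gfs2_dinode_in(): reject depths below that minimum, mirroring the
 * existing check for depths above the maximum (illustrative form). */
if (depth < ilog2(sdp->sd_hash_ptrs))
    return -EIO;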
Reported-by: syzbot+4708579bb230a0582a57@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Set FOP_DONTCACHE in ext4_file_operations to declare support for
uncached buffered I/O.
To handle this flag, update ext4_write_begin() and ext4_da_write_begin()
to use write_begin_get_folio(), which encapsulates FGP_DONTCACHE logic
based on iocb->ki_flags.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-6-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Change the address_space_operations callbacks write_begin() and
write_end() to take struct kiocb * as the first argument instead of
struct file *.
Update all affected function prototypes, implementations, call sites,
and related documentation across VFS, filesystems, and block layer.
Part of a series refactoring address_space_operations write_begin and
write_end callbacks to use struct kiocb for passing write context and
flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|