linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
8 days	ext4: implement linear-like traversal across order xarrays	Baokun Li
	Although we now perform ordered traversal within an xarray, this is currently limited to a single xarray. However, we have multiple such xarrays, which prevents us from guaranteeing a linear-like traversal where all groups on the right are visited before all groups on the left. For example, suppose we have 128 block groups, with a target group of 64, a target length corresponding to an order of 1, and available free groups of 16 (order 1) and group 65 (order 8): For linear traversal, when no suitable free block is found in group 64, it will search in the next block group until group 127, then start searching from 0 up to block group 63. It ensures continuous forward traversal, which is consistent with the unidirectional rotation behavior of HDD platters. Additionally, the block group lock contention during freeing block is unavoidable. The goal increasing from 0 to 64 indicates that previously scanned groups (which had no suitable free space and are likely to free blocks later) and skipped groups (which are currently in use) have newly freed some used blocks. If we allocate blocks in these groups, the probability of competing with other processes increases. For non-linear traversal, we first traverse all groups in order_1. If only group 16 has free space in this list, we first traverse [63, 128), then traverse [0, 64) to find the available group 16, and then allocate blocks in group 16. Therefore, it cannot guarantee continuous traversal in one direction, thus increasing the probability of contention. So refactor ext4_mb_scan_groups_xarray() to ext4_mb_scan_groups_xa_range() to only traverse a fixed range of groups, and move the logic for handling wrap around to the caller. The caller first iterates through all xarrays in the range [start, ngroups) and then through the range [0, start). This approach simulates a linear scan, which reduces contention between freeing blocks and allocating blocks. Assume we have the following groups, where "\|" denotes the xarray traversal start position: order_1_groups: AB \| CD order_2_groups: EF \| GH Traversal order: Before: C > D > A > B > G > H > E > F After: C > D > G > H > A > B > E > F Performance test data follows: \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 19555 \| 20049 (+2.5%) \| 315636 \| 316724 (-0.3%) \| \|mb_optimize_scan=1 \| 15496 \| 19342 (+24.8%) \| 323569 \| 328324 (+1.4%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53192 \| 52125 (-2.0%) \| 212678 \| 215136 (+1.1%) \| \|mb_optimize_scan=1 \| 37636 \| 50331 (+33.7%) \| 214189 \| 209431 (-2.2%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-18-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: refactor choose group to scan group	Baokun Li
	This commit converts the `choose group` logic to `scan group` using previously prepared helper functions. This allows us to leverage xarrays for ordered non-linear traversal, thereby mitigating the "bouncing" issue inherent in the `choose group` mechanism. This also decouples linear and non-linear traversals, leading to cleaner and more readable code. Key changes: * ext4_mb_choose_next_group() is refactored to ext4_mb_scan_groups(). * Replaced ext4_mb_good_group() with ext4_mb_scan_group() in non-linear traversals, and related functions now return error codes instead of group info. * Added ext4_mb_scan_groups_linear() for performing linear scans starting from a specific group for a set number of times. * Linear scans now execute up to sbi->s_mb_max_linear_groups times, so ac_groups_linear_remaining is removed as it's no longer used. * ac->ac_criteria is now used directly instead of passing cr around. Also, ac->ac_criteria is incremented directly after groups scan fails for the corresponding criteria. * Since we're now directly scanning groups instead of finding a good group then scanning, the following variables and flags are no longer needed, s_bal_cX_groups_considered is sufficient. s_bal_p2_aligned_bad_suggestions s_bal_goal_fast_bad_suggestions s_bal_best_avail_bad_suggestions EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-17-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: convert free groups order lists to xarrays	Baokun Li
	While traversing the list, holding a spin_lock prevents load_buddy, making direct use of ext4_try_lock_group impossible. This can lead to a bouncing scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group() fails, forcing the list traversal to repeatedly restart from grp_A. In contrast, linear traversal directly uses ext4_try_lock_group(), avoiding this bouncing. Therefore, we need a lockless, ordered traversal to achieve linear-like efficiency. Therefore, this commit converts both average fragment size lists and largest free order lists into ordered xarrays. In an xarray, the index represents the block group number and the value holds the block group information; a non-empty value indicates the block group's presence. While insertion and deletion complexity remain O(1), lookup complexity changes from O(1) to O(nlogn), which may slightly reduce single-threaded performance. Additionally, xarray insertions might fail, potentially due to memory allocation issues. However, since we have linear traversal as a fallback, this isn't a major problem. Therefore, we've only added a warning message for insertion failures here. A helper function ext4_mb_find_good_group_xarray() is added to find good groups in the specified xarray starting at the specified position start, and when it reaches ngroups-1, it wraps around to 0 and then to start-1. This ensures an ordered traversal within the xarray. Performance test results are as follows: Single-process operations on an empty disk show negligible impact, while multi-process workloads demonstrate a noticeable performance gain. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 20097 \| 19555 (-2.6%) \| 316141 \| 315636 (-0.2%) \| \|mb_optimize_scan=1 \| 13318 \| 15496 (+16.3%) \| 325273 \| 323569 (-0.5%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53603 \| 53192 (-0.7%) \| 214243 \| 212678 (-0.7%) \| \|mb_optimize_scan=1 \| 20887 \| 37636 (+80.1%) \| 213632 \| 214189 (+0.2%) \| [ Applied spelling fixes per discussion on the ext4-list see thread referened in the Link tag. --tytso] Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-16-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: factor out ext4_mb_scan_group()	Baokun Li
	Extract ext4_mb_scan_group() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-15-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: factor out ext4_mb_might_prefetch()	Baokun Li
	Extract ext4_mb_might_prefetch() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-14-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: factor out __ext4_mb_scan_group()	Baokun Li
	Extract __ext4_mb_scan_group() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-13-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: fix largest free orders lists corruption on mb_optimize_scan switch	Baokun Li
	The grp->bb_largest_free_order is updated regardless of whether mb_optimize_scan is enabled. This can lead to inconsistencies between grp->bb_largest_free_order and the actual s_mb_largest_free_orders list index when mb_optimize_scan is repeatedly enabled and disabled via remount. For example, if mb_optimize_scan is initially enabled, largest free order is 3, and the group is in s_mb_largest_free_orders[3]. Then, mb_optimize_scan is disabled via remount, block allocations occur, updating largest free order to 2. Finally, mb_optimize_scan is re-enabled via remount, more block allocations update largest free order to 1. At this point, the group would be removed from s_mb_largest_free_orders[3] under the protection of s_mb_largest_free_orders_locks[2]. This lock mismatch can lead to list corruption. To fix this, whenever grp->bb_largest_free_order changes, we now always attempt to remove the group from its old order list. However, we only insert the group into the new order list if `mb_optimize_scan` is enabled. This approach helps prevent lock inconsistencies and ensures the data in the order lists remains reliable. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@vger.kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-12-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: fix zombie groups in average fragment size lists	Baokun Li
	Groups with no free blocks shouldn't be in any average fragment size list. However, when all blocks in a group are allocated(i.e., bb_fragments or bb_free is 0), we currently skip updating the average fragment size, which means the group isn't removed from its previous s_mb_avg_fragment_size[old] list. This created "zombie" groups that were always skipped during traversal as they couldn't satisfy any block allocation requests, negatively impacting traversal efficiency. Therefore, when a group becomes completely full, bb_avg_fragment_size_order is now set to -1. If the old order was not -1, a removal operation is performed; if the new order is not -1, an insertion is performed. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-11-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: merge freed extent with existing extents before insertion	Baokun Li
	Attempt to merge ext4_free_data with already inserted free extents prior to adding new ones. This strategy drastically cuts down the number of times locks are held. For example, if prev, new, and next extents are all mergeable, the existing code (before this patch) requires acquiring the s_md_lock three times: prev merge into new and free prev // hold lock next merge into new and free next // hold lock insert new // hold lock After the patch, it only needs to be acquired once: new merge into next and free new // no lock next merge into prev and free next // hold lock Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 20043 \| 20097 (+0.2%) \| 314331 \| 316141 (+0.5%) \| \|mb_optimize_scan=1 \| 7290 \| 13318 (+87.4%) \| 324226 \| 325273 (+0.3%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 54999 \| 53603 (-2.5%) \| 214380 \| 214243 (-0.06%)\| \|mb_optimize_scan=1 \| 13497 \| 20887 (+54.6%) \| 216276 \| 213632 (-1.2%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-10-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: convert sbi->s_mb_free_pending to atomic_t	Baokun Li
	Previously, s_md_lock was used to protect s_mb_free_pending during modifications, while smp_mb() ensured fresh reads, so s_md_lock just guarantees the atomicity of s_mb_free_pending. Thus we optimized it by converting s_mb_free_pending into an atomic variable, thereby eliminating s_md_lock and minimizing lock contention. This also prepares for future lockless merging of free extents. Following this modification, s_md_lock is exclusively responsible for managing insertions and deletions within s_freed_data_list, along with operations involving list_splice. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 19628 \| 20043 (+2.1%) \| 320885 \| 314331 (-2.0%) \| \|mb_optimize_scan=1 \| 7129 \| 7290 (+2.2%) \| 321275 \| 324226 (+0.9%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53760 \| 54999 (+2.3%) \| 213145 \| 214380 (+0.5%) \| \|mb_optimize_scan=1 \| 12716 \| 13497 (+6.1%) \| 215262 \| 216276 (+0.4%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-9-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: utilize multiple global goals to reduce contention	Baokun Li
	When allocating data blocks, if the first try (goal allocation) fails and stream allocation is on, it tries a global goal starting from the last group we used (s_mb_last_group). This helps cluster large files together to reduce free space fragmentation, and the data block contiguity also accelerates write-back to disk. However, when multiple processes allocate blocks, having just one global goal means they all fight over the same group. This drastically lowers the chances of extents merging and leads to much worse file fragmentation. To mitigate this multi-process contention, we now employ multiple global goals, with the number of goals being the minimum between the number of possible CPUs and one-quarter of the filesystem's total block group count. To ensure a consistent goal for each inode, we select the corresponding goal by taking the inode number modulo the total number of goals. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 9636 \| 19628 (+103%) \| 337597 \| 320885 (-4.9%) \| \|mb_optimize_scan=1 \| 4834 \| 7129 (+47.4%) \| 341440 \| 321275 (-5.9%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 22341 \| 53760 (+140%) \| 219707 \| 213145 (-2.9%) \| \|mb_optimize_scan=1 \| 9177 \| 12716 (+38.5%) \| 215732 \| 215262 (+0.2%) \| Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-6-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: remove unnecessary s_md_lock on update s_mb_last_group	Baokun Li
	After we optimized the block group lock, we found another lock contention issue when running will-it-scale/fallocate2 with multiple processes. The fallocate's block allocation and the truncate's block release were fighting over the s_md_lock. The problem is, this lock protects totally different things in those two processes: the list of freed data blocks (s_freed_data_list) when releasing, and where to start looking for new blocks (mb_last_group) when allocating. Now we only need to track s_mb_last_group and no longer need to track s_mb_last_start, so we don't need the s_md_lock lock to ensure that the two are consistent. Since s_mb_last_group is merely a hint and doesn't require strong synchronization, READ_ONCE/WRITE_ONCE is sufficient. Besides, the s_mb_last_group data type only requires ext4_group_t (i.e., unsigned int), rendering unsigned long superfluous. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 4821 \| 9636 (+99.8%) \| 314065 \| 337597 (+7.4%) \| \|mb_optimize_scan=1 \| 4784 \| 4834 (+1.04%) \| 316344 \| 341440 (+7.9%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 15371 \| 22341 (+45.3%) \| 205851 \| 219707 (+6.7%) \| \|mb_optimize_scan=1 \| 6101 \| 9177 (+50.4%) \| 207373 \| 215732 (+4.0%) \| Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: remove unnecessary s_mb_last_start	Baokun Li
	Since stream allocation does not use ac->ac_f_ex.fe_start, it is set to -1 by default, so the no longer needed sbi->s_mb_last_start is removed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: separate stream goal hits from s_bal_goals for better tracking	Baokun Li
	In ext4_mb_regular_allocator(), after the call to ext4_mb_find_by_goal() fails to achieve the inode goal, allocation continues with the stream allocation global goal. Currently, hits for both are combined in sbi->s_bal_goals, hindering accurate optimization. This commit separates global goal hits into sbi->s_bal_stream_goals. Since stream allocation doesn't use ac->ac_g_ex.fe_start, set fe_start to -1. This prevents stream allocations from being counted in s_bal_goals. Also clear EXT4_MB_HINT_TRY_GOAL to avoid calling ext4_mb_find_by_goal again. After adding `stream_goal_hits`, `/proc/fs/ext4/sdx/mb_stats` will show: mballoc: reqs: 840347 success: 750992 groups_scanned: 1230506 cr_p2_aligned_stats: hits: 21531 groups_considered: 411664 extents_scanned: 21531 useless_loops: 0 bad_suggestions: 6 cr_goal_fast_stats: hits: 111222 groups_considered: 1806728 extents_scanned: 467908 useless_loops: 0 bad_suggestions: 13 cr_best_avail_stats: hits: 36267 groups_considered: 1817631 extents_scanned: 156143 useless_loops: 0 bad_suggestions: 204 cr_goal_slow_stats: hits: 106396 groups_considered: 5671710 extents_scanned: 22540056 useless_loops: 123747 cr_any_free_stats: hits: 138071 groups_considered: 724692 extents_scanned: 23615593 useless_loops: 585 extents_scanned: 46804261 goal_hits: 1307 stream_goal_hits: 236317 len_goal_hits: 155549 2^n_hits: 21531 breaks: 225096 lost: 35062 buddies_generated: 40/40 buddies_time_used: 48004 preallocated: 5962467 discarded: 4847560 Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
8 days	ext4: add ext4_try_lock_group() to skip busy groups	Baokun Li
	When ext4 allocates blocks, we used to just go through the block groups one by one to find a good one. But when there are tons of block groups (like hundreds of thousands or even millions) and not many have free space (meaning they're mostly full), it takes a really long time to check them all, and performance gets bad. So, we added the "mb_optimize_scan" mount option (which is on by default now). It keeps track of some group lists, so when we need a free block, we can just grab a likely group from the right list. This saves time and makes block allocation much faster. But when multiple processes or containers are doing similar things, like constantly allocating 8k blocks, they all try to use the same block group in the same list. Even just two processes doing this can cut the IOPS in half. For example, one container might do 300,000 IOPS, but if you run two at the same time, the total is only 150,000. Since we can already look at block groups in a non-linear way, the first and last groups in the same list are basically the same for finding a block right now. Therefore, add an ext4_try_lock_group() helper function to skip the current group when it is locked by another process, thereby avoiding contention with other processes. This helps ext4 make better use of having multiple block groups. Also, to make sure we don't skip all the groups that have free space when allocating blocks, we won't try to skip busy groups anymore when ac_criteria is CR_ANY_FREE. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| \|Memory: 512GB \|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| \|-------------------\|-------\|-----------------\| \|mb_optimize_scan=0 \| 2667 \| 4821 (+80.7%) \| \|mb_optimize_scan=1 \| 2643 \| 4784 (+81.0%) \| \|CPU: AMD 9654 * 2 \| P96 \| \|Memory: 1536GB \|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| \|-------------------\|-------\|-----------------\| \|mb_optimize_scan=0 \| 3450 \| 15371 (+345%) \| \|mb_optimize_scan=1 \| 3209 \| 6101 (+90.0%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-04-22	fs/ext4: use sleeping version of sb_find_get_block()	Davidlohr Bueso
	Enable ext4_free_blocks() to use it, which has a cond_resched to begin with. Convert to the new nonatomic flavor to benefit from potential performance benefits and adapt in the future vs migration such that semantics are kept. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://kdevops.org/ext4/v6.15-rc2.html # [0] Link: https://lore.kernel.org/all/aAAEvcrmREWa1SKF@bombadil.infradead.org/ # [1] Link: https://lore.kernel.org/20250418015921.132400-7-dave@stgolabs.net Tested-by: kdevops@lists.linux.dev Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-12	ext4: avoid -Wflex-array-member-not-at-end warning	Gustavo A. R. Silva
	-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Use the `DEFINE_RAW_FLEX()` helper for an on-stack definition of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. So, with these changes, fix the following warning: fs/ext4/mballoc.c:3041:40: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/Z-SF97N3AxcIMlSi@kspp Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-18	ext4: update the comment about mb_optimize_scan	Zizhi Wo
	Commit 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") introduces the sysfs control interface "mb_max_linear_groups" to address the problem that rotational devices performance degrades when the "mb_optimize_scan" feature is enabled, which may result in distant block group allocation. However, the name of the interface was incorrect in the comment to the ext4/mballoc.c file, and this patch fixes it, without further changes. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250224012005.689549-1-wozizhi@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-03-13	ext4: add ext4_emergency_state() helper function	Baokun Li
	Since both SHUTDOWN and EMERGENCY_RO are emergency states of the ext4 file system, and they are checked in similar locations, we have added a helper function, ext4_emergency_state(), to determine whether the current file system is in one of these two emergency states. Then, replace calls to ext4_forced_shutdown() with ext4_emergency_state() in those functions that could potentially trigger write operations. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250122114130.229709-4-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-13	ext4: use str_yes_no() helper function	Thorsten Blum
	Remove hard-coded strings by using the str_yes_no() helper function. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://patch.msgid.link/20241021100056.5521-2-thorsten.blum@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12	ext4: use string choices helpers	R Sundar
	Use string choice helpers for better readability and to fix cocci warning Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202410062256.BoynX3c2-lkp@intel.com/ Signed-off-by: R Sundar <prosunofficial@gmail.com> Link: https://patch.msgid.link/20241007172006.83339-1-prosunofficial@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12	ext4: fix FS_IOC_GETFSMAP handling	Theodore Ts'o
	The original implementation ext4's FS_IOC_GETFSMAP handling only worked when the range of queried blocks included at least one free (unallocated) block range. This is because how the metadata blocks were emitted was as a side effect of ext4_mballoc_query_range() calling ext4_getfsmap_datadev_helper(), and that function was only called when a free block range was identified. As a result, this caused generic/365 to fail. Fix this by creating a new function ext4_getfsmap_meta_helper() which gets called so that blocks before the first free block range in a block group can get properly reported. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2024-09-03	ext4: convert EXT4_B2C(sbi->s_stripe) users to EXT4_NUM_B2C	Ojaswin Mujoo
	Although we have checks to make sure s_stripe is a multiple of cluster size, in case we accidentally end up with a scenario where this is not the case, use EXT4_NUM_B2C() so that we don't end up with unexpected cases where EXT4_B2C(stripe) becomes 0. Also make the is_stripe_aligned check in regular_allocator a bit more robust while we are at it. This should ideally have no functional change unless we have a bug somewhere causing (stripe % cluster_size != 0) Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/e0c0a3b58a40935a1361f668851d041575861411.1725002410.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-09-02	ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard	yangerkun
	Commit 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info") speed up fstrim by skipping trim trimmed group. We also has the chance to clear trimmed once there exists some block free for this group(mount without discard), and the next trim for this group will work well too. For mount with discard, we will issue dicard when we free blocks, so leave trimmed flag keep alive to skip useless trim trigger from userspace seems reasonable. But for some case like ext4 build on dm-thinpool(ext4 blocksize 4K, pool blocksize 128K), discard from ext4 maybe unaligned for dm thinpool, and thinpool will just finish this discard(see process_discard_bio when begein equals to end) without actually process discard. For this case, trim from userspace can really help us to free some thinpool block. So convert to clear trimmed flag for all case no matter mounted with discard or not. Fixes: 3d56b8d2c74c ("ext4: Speed up FITRIM by recording flags in ext4_group_info") Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240817085510.2084444-1-yangerkun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26	ext4: use seq_putc() in two functions	Markus Elfring
	Single characters (line breaks) should be put into a sequence. Thus use the corresponding function “seq_putc”. This issue was transformed by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patch.msgid.link/076974ab-4da3-4176-89dc-0514e020c276@web.de Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-17	ext4: fix error pointer dereference in ext4_mb_load_buddy_gfp()	Dan Carpenter
	This code calls folio_put() on an error pointer which will lead to a crash. Check for both error pointers and NULL pointers before calling folio_put(). Fixes: 5eea586b47f0 ("ext4: convert bd_buddy_page to bd_buddy_folio") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/eaafa1d9-a61c-4af4-9f97-d3ad72c60200@moroto.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-07	ext4: fix potential unnitialized variable	Dan Carpenter
	Smatch complains "err" can be uninitialized in the caller. fs/ext4/indirect.c:349 ext4_alloc_branch() error: uninitialized symbol 'err'. Set the error to zero on the success path. Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/363a4673-0fb8-4adf-b4fb-90a499077276@moroto.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-07	ext4: convert ac_buddy_page to ac_buddy_folio	Matthew Wilcox (Oracle)
	This just carries around the bd_buddy_folio so should also be a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-6-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-07	ext4: convert ac_bitmap_page to ac_bitmap_folio	Matthew Wilcox (Oracle)
	This just carries around the bd_bitmap_folio so should also be a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-5-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-07	ext4: convert ext4_mb_init_cache() to take a folio	Matthew Wilcox (Oracle)
	All callers now have a folio, so convert this function from operating on a page to operating on a folio. The folio is assumed to be a single page. Signe-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-4-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-07	ext4: convert bd_buddy_page to bd_buddy_folio	Matthew Wilcox (Oracle)
	There is no need to make this a multi-page folio, so leave all the infrastructure around it in pages. But since we're locking it, playing with its refcount and checking whether it's uptodate, it needs to move to the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-3-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-07	ext4: convert bd_bitmap_page to bd_bitmap_folio	Matthew Wilcox (Oracle)
	There is no need to make this a multi-page folio, so leave all the infrastructure around it in pages. But since we're locking it, playing with its refcount and checking whether it's uptodate, it needs to move to the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-2-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-03	ext4: open coding repeated check in next_linear_group	Kemeng Shi
	Open coding repeated check in next_linear_group. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240424061904.987525-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-03	ext4: use correct criteria name instead stale integer number in comment	Kemeng Shi
	Use correct criteria name instead stale integer number in comment Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240424061904.987525-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-03	ext4: call ext4_mb_mark_free_simple to free continuous bits in found chunk	Kemeng Shi
	In mb_mark_used, we will find free chunk and mark it inuse. For chunk in mid of passed range, we could simply mark whole chunk inuse. For chunk at end of range, we may need to mark a continuous bits at end of part of chunk inuse and keep rest part of chunk free. To only mark a part of chunk inuse, we firstly mark whole chunk inuse and then mark a continuous range at end of chunk free. Function mb_mark_used does several times of "mb_find_buddy; mb_clear_bit; ..." to mark a continuous range free which can be done by simply calling ext4_mb_mark_free_simple which free continuous bits in a more effective way. Just call ext4_mb_mark_free_simple in mb_mark_used to use existing and effective code to free continuous blocks in chunk at end of passed range. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240424061904.987525-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-03	ext4: keep "prefetch_grp" and "nr" consistent	Kemeng Shi
	Keep "prefetch_grp" and "nr" consistent to avoid to call ext4_mb_prefetch_fini with non-prefetched groups. When we step into next criteria, "prefetch_grp" is set to prefetch start of new criteria while "nr" is number of the prefetched group in previous criteria. If previous criteria and next criteria are both inexpensive (< CR_GOAL_LEN_SLOW) and prefetch_ios reachs sbi->s_mb_prefetch_limit in previous criteria, "prefetch_grp" and "nr" will be inconsistent and may introduce unexpected cost to do ext4_mb_init_group for non-prefetched groups. Reset "nr" to 0 when we reset "prefetch_grp" to goal group to keep them consistent. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240424061904.987525-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-02	ext4: clean up s_mb_rb_lock to fix build warnings with C=1	Baokun Li
	Running sparse (make C=1) on mballoc.c we get the following warning: fs/ext4/mballoc.c:3194:13: warning: context imbalance in 'ext4_mb_seq_structs_summary_start' - wrong count at exit This is because __acquires(&EXT4_SB(sb)->s_mb_rb_lock) was called in ext4_mb_seq_structs_summary_start(), but s_mb_rb_lock was removed in commit 83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree"), so remove the __acquires to silence the warning. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240319113325.3110393-10-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-02	ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()	Baokun Li
	We can trigger a slab-out-of-bounds with the following commands: mkfs.ext4 -F /dev/$disk 10G mount /dev/$disk /tmp/test echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc echo test > /tmp/test/file && sync ================================================================== BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4] Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11 CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521 Call Trace: dump_stack_lvl+0x2c/0x50 kasan_report+0xb6/0xf0 ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4] ext4_mb_regular_allocator+0x19e9/0x2370 [ext4] ext4_mb_new_blocks+0x88a/0x1370 [ext4] ext4_ext_map_blocks+0x14f7/0x2390 [ext4] ext4_map_blocks+0x569/0xea0 [ext4] ext4_do_writepages+0x10f6/0x1bc0 [ext4] [...] ================================================================== The flow of issue triggering is as follows: // Set s_mb_group_prealloc to 2147483647 via sysfs ext4_mb_new_blocks ext4_mb_normalize_request ext4_mb_normalize_group_request ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc ext4_mb_regular_allocator ext4_mb_choose_next_group ext4_mb_choose_next_group_best_avail mb_avg_fragment_size_order order = fls(len) - 2 = 29 ext4_mb_find_good_group_avg_frag_lists frag_list = &sbi->s_mb_avg_fragment_size[order] if (list_empty(frag_list)) // Trigger SOOB! At 4k block size, the length of the s_mb_avg_fragment_size list is 14, but an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds to be triggered by an attempt to access an element at index 29. Add a new attr_id attr_clusters_in_group with values in the range [0, sbi->s_clusters_per_group] and declare mb_group_prealloc as that type to fix the issue. In addition avoid returning an order from mb_avg_fragment_size_order() greater than MB_NUM_ORDERS(sb) and reduce some useless loops. Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)") CC: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20240319113325.3110393-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-03-07	ext4: remove unused parameter biop in ext4_issue_discard()	Wenchao Hao
	all caller of ext4_issue_discard() would set biop to NULL since 'commit 55cdd0af2bc5 ("ext4: get discard out of jbd2 commit kthread contex")', it's unnecessary to keep this parameter any more. Signed-off-by: Wenchao Hao <haowenchao2@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240226081731.3224470-1-haowenchao2@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-03-07	ext4: don't report EOPNOTSUPP errors from discard	Jan Kara
	When ext4 is mounted without journal, with discard mount option, and on a device not supporting trim, we print error for each and every freed extent. This is not only useless but actively harmful. Instead ignore the EOPNOTSUPP error. Trim is only advisory anyway and when the filesystem has journal we silently ignore trim error as well. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240213101601.17463-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-21	ext4: correct best extent lstart adjustment logic	Baokun Li
	When yangerkun review commit 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()"), it was found that the best extent did not completely cover the original request after adjusting the best extent lstart in ext4_mb_new_inode_pa() as follows: original request: 2/10(8) normalized request: 0/64(64) best extent: 0/9(9) When we check if best ex can be kept at start of goal, ac_o_ex.fe_logical is 2 less than the adjusted best extent logical end 9, so we think the adjustment is done. But obviously 0/9(9) doesn't cover 2/10(8), so we should determine here if the original request logical end is less than or equal to the adjusted best extent logical end. In addition, add a comment stating when adjusted best_ex will not cover the original request, and remove the duplicate assertion because adjusting lstart makes no change to b_ex.fe_len. Link: https://lore.kernel.org/r/3630fa7f-b432-7afd-5f79-781bc3b2c5ea@huawei.com Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()") Cc: <stable@kernel.org> Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20240201141845.1879253-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-21	ext4: add a hint for block bitmap corrupt state in mb_groups	Zhang Yi
	If one group is marked as block bitmap corrupted, its free blocks cannot be used and its free count is also deducted from the global sbi->s_freeclusters_counter. User might be confused about the absent free space because we can't query the information about corrupted block groups except unreliable error messages in syslog. So add a hint to show block bitmap corrupted groups in mb_groups. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240119061154.1525781-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-21	ext4: improve error msg for ext4_mb_seq_groups_show	yangerkun
	While cat mb_groups for a mounted ext4 image which has some corrupted group, the string return to userspace was just "I/O error" which confuse me a lot. Improve it with ext4_decode_error. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240118042557.380058-2-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-21	ext4: remove unused buddy_loaded in ext4_mb_seq_groups_show	yangerkun
	We can just first call ext4_mb_unload_buddy, then copy information from ext4_group_info. So remove this unused value. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240118042557.380058-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-18	ext4: remove 'needed' in trace_ext4_discard_preallocations	Kemeng Shi
	As 'needed' to trace_ext4_discard_preallocations is always 0 which is meaningless. Just remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240105092102.496631-10-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-18	ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations	Kemeng Shi
	The "needed" controls the number of ext4_prealloc_space to discard in ext4_discard_preallocations. Function ext4_discard_preallocations is supposed to discard all non-used preallocated blocks when "needed" is 0 and now ext4_discard_preallocations is always called with "needed" = 0. Remove unnecessary parameter "needed" and remove all non-used preallocated spaces in ext4_discard_preallocations to simplify the code. Note: If count of non-used preallocated spaces could be more than UINT_MAX, there was a memory leak as some non-used preallocated spaces are left ununsed and this commit will fix it. Otherwise, there is no behavior change. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240105092102.496631-9-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-18	ext4: remove unused return value of ext4_mb_release_group_pa	Kemeng Shi
	Remove unused return value of ext4_mb_release_group_pa. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240105092102.496631-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-18	ext4: remove unused return value of ext4_mb_release_inode_pa	Kemeng Shi
	Remove unused return value of ext4_mb_release_inode_pa Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240105092102.496631-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-18	ext4: remove unused return value of ext4_mb_release	Kemeng Shi
	Remove unused return value of ext4_mb_release. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240105092102.496631-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-18	ext4: remove unneeded return value of ext4_mb_release_context	Kemeng Shi
	Function ext4_mb_release_context always return 0 and the return value is never used. Just remove unneeded return value of ext4_mb_release_context. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240105092102.496631-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>