Age | Commit message (Collapse) | Author |
|
Patch series "Buddy allocator like (or non-uniform) folio split", v10.
This patchset adds a new buddy allocator like (or non-uniform) large folio
split from a order-n folio to order-m with m < n. It reduces
1. the total number of after-split folios from 2^(n-m) to n-m+1;
2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to
n/6-m/6, assuming XA_CHUNK_SHIFT=6;
3. keep more large folios after a split from all order-m folios to
order-(n-1) to order-m folios.
For example, to split an order-9 to order-0, folio split generates 10 (or
11 for anonymous memory) folios instead of 512, allocates 1 xa_node
instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2
order-0 folios (or 4 order-0 for anonymous memory) instead of 512 order-0
folios.
Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split(). __folio_split() can
support both uniform split and buddy allocator like (or non-uniform)
split. All existing split_huge_page*() users can be gradually converted
to use folio_split() if possible. In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().
xfstests quick group passed for both tmpfs and xfs. I also
semi-replicated Hugh's test[12] and ran it without any issue for almost 24
hours.
This patch (of 8):
A preparation patch for non-uniform folio split, which always split a
folio into half iteratively, and minimal xarray entry split.
Currently, xas_split_alloc() and xas_split() always split all slots from a
multi-index entry. They cost the same number of xa_node as the
to-be-split slots. For example, to split an order-9 entry, which takes
2^(9-6)=8 slots, assuming XA_CHUNK_SHIFT is 6 (!CONFIG_BASE_SMALL), 8
xa_node are needed. Instead xas_try_split() is intended to be used
iteratively to split the order-9 entry into 2 order-8 entries, then split
one order-8 entry, based on the given index, to 2 order-7 entries, ...,
and split one order-1 entry to 2 order-0 entries. When splitting the
order-6 entry and a new xa_node is needed, xas_try_split() will try to
allocate one if possible. As a result, xas_try_split() would only need 1
xa_node instead of 8.
When a new xa_node is needed during the split, xas_try_split() can try to
allocate one but no more. -ENOMEM will be return if a node cannot be
allocated. -EINVAL will be return if a sibling node is split or cascade
split happens, where two or more new nodes are needed, and these are not
supported by xas_try_split().
xas_split_alloc() and xas_split() split an order-9 to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
| | | |
------- --- --- -------
| | ... | |
V V V V
----------- ----------- ----------- -----------
| xa_node | | xa_node | ... | xa_node | | xa_node |
----------- ----------- ----------- -----------
xas_try_split() splits an order-9 to order-0:
---------------------------------
| | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| | | | | | | | |
---------------------------------
|
|
V
-----------
| xa_node |
-----------
Link: https://lkml.kernel.org/r/20250307174001.242794-1-ziy@nvidia.com
Link: https://lkml.kernel.org/r/20250307174001.242794-2-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
- Replace "they" with "you" where "you" is used in the preceding
sentence fragment.
- Mention `xa_erase` in discussion of multi-index entries. Split this
into a separate sentence.
- Add "call" parentheses on "xa_store" for consistency and
linkification.
- Add caveat that `xa_store` and `xa_erase` are not equivalent in the
presence of `XA_FLAGS_ALLOC`.
Link: https://lkml.kernel.org/r/20241105-xarray-documentation-v5-1-8e1702321b41@gmail.com
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It wasn't obvious to all readers that it's unsafe to reuse an xa_state
after dropping the xas_lock() or the rcu_read_lock().
Reported-by: Charan Teja Kalla <charante@codeaurora.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
In order to use multi-index entries for huge pages in the page cache, we
need to be able to split a multi-index entry (eg if a file is truncated in
the middle of a huge page entry). This version does not support splitting
more than one level of the tree at a time. This is an acceptable
limitation for the page cache as we do not expect to support order-12
pages in the near future.
[akpm@linux-foundation.org: export xas_split_alloc() to modules]
[willy@infradead.org: fix xarray split]
Link: https://lkml.kernel.org/r/20200910175450.GV6583@casper.infradead.org
[willy@infradead.org: fix xarray]
Link: https://lkml.kernel.org/r/20201001233943.GW20115@casper.infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Qian Cai <cai@lca.pw>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lkml.kernel.org/r/20200903183029.14930-3-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This function supports iterating over a range of an array. Also add
documentation links for xa_for_each_start().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
Move most of the mark-related documentation to its own section to make
it easier to understand. Add clarification that you can't search for
an unset mark, and you can't yet search for combinations of marks.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
Now that the build system automatically marks up function references, we
don't have to clutter the source files, so take it out.
[Some paragraphs could now benefit from refilling, but that was left out to
avoid obscuring the real changes.]
Acked-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Jason feels this is clearer, and it saves a function and an exported
symbol.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
This differs slightly from the IDR equivalent in five ways.
1. It can allocate up to UINT_MAX instead of being limited to INT_MAX,
like xa_alloc(). Also like xa_alloc(), it will write to the 'id'
pointer before placing the entry in the XArray.
2. The 'next' cursor is allocated separately from the XArray instead
of being part of the IDR. This saves memory for all the users which
do not use the cyclic allocation API and suits some users better.
3. It returns -EBUSY instead of -ENOSPC.
4. It will attempt to wrap back to the minimum value on memory allocation
failure as well as on an -EBUSY error, assuming that a user would
rather allocate a small ID than suffer an ID allocation failure.
5. It reports whether it has wrapped, which is important to some users.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
A lot of places want to allocate IDs starting at 1 instead of 0.
While the xa_alloc() API supports this, it's not very efficient if lots
of IDs are allocated, due to having to walk down to the bottom of the
tree to see if ID 1 is available, then all the way over to the next
non-allocated ID. This method marks ID 0 as being occupied which wastes
one slot in the XArray, but preserves xa_empty() as working.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
Userspace translates EEXIST to "File exists" which isn't a very good
error message for the problem. "Device or resource busy" is a better
indication of what went wrong.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
xa_insert() should treat reserved entries as occupied, not as available.
Also, it should treat requests to insert a NULL pointer as a request
to reserve the slot. Add xa_insert_bh() and xa_insert_irq() for
completeness.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
These convenience wrappers match the other _irq and _bh wrappers we
already have. It turns out I'd already open-coded xa_cmpxchg_irq()
in the shmem code, so convert that.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
Minor fixes.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
For allocating XArrays, it makes sense to distinguish beteen erasing an
entry and storing NULL. Storing NULL keeps the index allocated with a
NULL pointer associated with it while xa_erase() frees the index. Some
existing IDR users rely on this ability.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
These convenience wrappers disable interrupts while taking the spinlock.
A number of drivers would otherwise have to open-code these functions.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
The xa_reserve() function was a little unusual in that it attempted to
be callable for all kinds of locking scenarios. Make it look like the
other APIs with __xa_reserve, xa_reserve_bh and xa_reserve_irq variants.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
This version of xa_store_range() really only supports load and store.
Our only user only needs basic load and store functionality, so there's
no need to do the extra work to support marking and overlapping stores
correctly yet.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
Add the optional ability to track which entries in an XArray are free
and provide xa_alloc() to replace most of the functionality of the IDR.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
This function reserves a slot in the XArray for users which need
to acquire multiple locks before storing their entry in the tree and
so cannot use a plain xa_store().
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|
This is documentation on how to use the XArray, not details about its
internal implementation.
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Josef Bacik <jbacik@fb.com>
|