linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
12 days	btrfs: accessors: rename variable for folio offset	David Sterba
	There used to be 'oip' short for offset in page, which got changed during conversion to folios, the name is a bit confusing so rename it. Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: factor out split memcpy with two sources	David Sterba
	The case of a reading the bytes from 2 folios needs two memcpy()s, the compiler does not emit calls but two inline loops. Factoring out the code makes some improvement (stack, code) and in the future will provide an optimized implementation as well. (The analogical version with two destinations is not done as it increases stack usage but can be done if needed.) The address of the second folio is reordered before the first memcpy, which leads to an optimization reusing the vmemmap_base and page_offset_base (implementing folio_address()). Stack usage reduction: btrfs_get_32 -8 (32 -> 24) btrfs_get_64 -8 (32 -> 24) Code size reduction: text data bss dec hex filename 1454279 115665 16088 1586032 183370 pre/btrfs.ko 1454229 115665 16088 1585982 18333e post/btrfs.ko DELTA: -50 As this is the last patch in this series, here's the overall diff starting and including commit "btrfs: accessors: simplify folio bounds checks": Stack: btrfs_set_16 -72 (88 -> 16) btrfs_get_32 -56 (80 -> 24) btrfs_set_8 -72 (88 -> 16) btrfs_set_64 -64 (88 -> 24) btrfs_get_8 -72 (80 -> 8) btrfs_get_16 -64 (80 -> 16) btrfs_set_32 -64 (88 -> 24) btrfs_get_64 -56 (80 -> 24) NEW (48): report_setget_bounds 48 LOST/NEW DELTA: +48 PRE/POST DELTA: -472 Code: text data bss dec hex filename 1456601 115665 16088 1588354 183c82 pre/btrfs.ko 1454229 115665 16088 1585982 18333e post/btrfs.ko DELTA: -2372 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: set target address at initialization	David Sterba
	The target address for the read/write can be simplified as it's the same expression for the first folio. This improves the generated code as the folio address does not have to be cached on stack. Stack usage reduction: btrfs_set_32 -8 (32 -> 24) btrfs_set_64 -8 (32 -> 24) btrfs_get_16 -8 (24 -> 16) Code size reduction: text data bss dec hex filename 1454459 115665 16088 1586212 183424 pre/btrfs.ko 1454279 115665 16088 1586032 183370 post/btrfs.ko DELTA: -180 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: compile-time fast path for u16	David Sterba
	Reading/writing 2 bytes (u16) may need 2 folios to be written to, each time it's just one byte so using memcpy for that is an overkill. Add a branch for the split case so that memcpy is now used for u32 and u64. Another side effect is that the u16 types now don't need additional stack space, everything fits to registers. Stack usage is reduced: btrfs_get_16 -8 (32 -> 24) btrfs_set_16 -16 (32 -> 16) Code size reduction: text data bss dec hex filename 1454691 115665 16088 1586444 18350c pre/btrfs.ko 1454459 115665 16088 1586212 183424 post/btrfs.ko DELTA: -232 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: compile-time fast path for u8	David Sterba
	Reading/writing 1 byte (u8) is a special case compared to the others as it's always contained in the folio we find, so the split memcpy will never be needed. Turn it to a compile-time check that the memcpy part can be optimized out. The stack usage is reduced: btrfs_set_8 -16 (32 -> 16) btrfs_get_8 -16 (24 -> 8) Code size reduction: text data bss dec hex filename 1454951 115665 16088 1586704 183610 pre/btrfs.ko 1454691 115665 16088 1586444 18350c post/btrfs.ko DELTA: -260 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: inline eb bounds check and factor out the error report	David Sterba
	There's a check in each set/get helper if the requested range is within extent buffer bounds, and if it's not then report it. This was in an ASSERT statement so with CONFIG_BTRFS_ASSERT this crashes right away, on other configs this is only reported but reading out of the bounds is done anyway. There are currently no known reports of this particular condition failing. There are some drawbacks though. The behaviour dependence on the assertions being compiled in or not and a less visible effect of inlining report_setget_bounds() into each helper. As the bounds check is expected to succeed almost always it's ok to inline it but make the report a function and move it out of the helper completely (__cold puts it to a different section). This also skips reading/writing the requested range in case it fails. This improves stack usage significantly: btrfs_get_16 -48 (80 -> 32) btrfs_get_32 -48 (80 -> 32) btrfs_get_64 -48 (80 -> 32) btrfs_get_8 -48 (72 -> 24) btrfs_set_16 -56 (88 -> 32) btrfs_set_32 -56 (88 -> 32) btrfs_set_64 -56 (88 -> 32) btrfs_set_8 -48 (80 -> 32) NEW (48): report_setget_bounds 48 LOST/NEW DELTA: +48 PRE/POST DELTA: -360 Same as .ko size: text data bss dec hex filename 1456079 115665 16088 1587832 183a78 pre/btrfs.ko 1454951 115665 16088 1586704 183610 post/btrfs.ko DELTA: -1128 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: use type sizeof constants directly	David Sterba
	Now unit_size is used only once, so use it directly in 'part' calculation. Don't cache sizeof(type) in a variable. While this is a compile-time constant, forcing the type 'int' generates worse code as it leads to additional conversion from 32 to 64 bit type on x86_64. The sizeof() is used only a few times and it does not make the code that harder to read, so use it directly and let the compiler utilize the immediate constants in the context it needs. The .ko code size slightly increases (+50) but further patches will reduce that again. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: simplify folio bounds checks	David Sterba
	As we can a have non-contiguous range in the eb->folios, any item can be straddling two folios and we need to check if it can be read in one go or in two parts. For that there's a check which is not implemented in the simplest way: offset in folio + size <= folio size With a simple expression transformation: oil + size <= unit_size size <= unit_size - oil sizeof() <= part this can be simplified and reusing existing run-time or compile-time constants. Add likely() annotation for this expression as this is the fast path and compiler sometimes reorders that after the followup block with the memcpy (observed in practice with other simplifications). Overall effect on stack consumption: btrfs_get_8 -8 (80 -> 72) btrfs_set_8 -8 (88 -> 80) And .ko size (due to optimizations making use of the direct constants): text data bss dec hex filename 1456601 115665 16088 1588354 183c82 pre/btrfs.ko 1456093 115665 16088 1587846 183a86 post/btrfs.ko DELTA: -508 Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
12 days	btrfs: accessors: delete token versions of set/get helpers	David Sterba
	Once upon a time there was a need to cache address of extent buffer pages, as it was a costly operation (map_private_extent_buffer(), cfed81a04eb555 ("Btrfs: add the ability to cache a pointer into the eb")). This was not even due to use of HIGHMEM, this had been removed before that due to possible locking issues (a65917156e3459 ("Btrfs: stop using highmem for extent_buffers")). Over time the amount of work in the set/get helpers got reduced and became quite straightforward bounds checking with an unaligned read/write, commit db3756c879773c ("btrfs: remove unused map_private_extent_buffer"). The actual caching of the page_address()/folio_address() in the token was more work for very little gain. This depended on subsequent access into the same page/folio, otherwise the cached pointer had to be updated. For metadata-heavy operations this showed up in the 'perf top' profile where the btrfs_get_token_32() calls were at the top, on my testing machine consuming about 2-3%. The other generic 32/64 bit helpers also appeared in the profile with similar fraction. After removing use of the token helpers we can remove them completely, this leads to reduction of btrfs.ko by 6.7KiB on release config. text data bss dec hex filename 1463289 115665 16088 1595042 1856a2 pre/btrfs.ko 1456601 115665 16088 1588354 183c82 post/btrfs.ko DELTA: -6688 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-02	move asm/unaligned.h to linux/unaligned.h	Al Viro
	asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-03-04	btrfs: remove unused included headers	David Sterba
	With help of neovim, LSP and clangd we can identify header files that are not actually needed to be included in the .c files. This is focused only on removal (with minor fixups), further cleanups are possible but will require doing the header files properly with forward declarations, minimized includes and include-what-you-use care. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-04	btrfs: cache folio size and shift in extent_buffer	Qu Wenruo
	After the conversion to folio interfaces (but without the patch to enable larger folio allocation), there is an LTP report about observable performance drop on metadata heavy operations. https://lore.kernel.org/linux-btrfs/202312221750.571925bd-oliver.sang@intel.com/ This drop is caused by the extra code of calculating the folio_size()/folio_shift(), instead of the old hard coded PAGE_SIZE/PAGE_SHIFT. To slightly reduce the overhead, just cache both folio_size and folio_shift in extent_buffer. The two new members (u32 folio_size and u8 folio_shift) are stored inside the holes of extent_buffer. folio_size is shared with len, which is reduced to u32. The size of eb does not change. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15	btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios	Qu Wenruo
	These two functions are still using the old page based code, which is not going to handle larger folios at all. The migration itself is going to involve the following changes: - PAGE_SIZE -> folio_size() - PAGE_SHIFT -> folio_shift() - get_eb_page_index() -> get_eb_folio_index() - get_eb_offset_in_page() -> get_eb_offset_in_folio() And since we're going to support larger folios, although above straight conversion is good enough, this patch would add extra comments in the involved functions to explain why the same single line code can now cover 3 cases: - folio_size == PAGE_SIZE, sectorsize == PAGE_SIZE, nodesize >= PAGE_SIZE The common, non-subpage case with per-page folio. - folio_size > PAGE_SIZE, sectorsize == PAGE_SIZE, nodesize >= PAGE_SIZE The incoming larger folio, non-subpage case. - folio_size == PAGE_SIZE, sectorsize < PAGE_SIZE, nodesize < PAGE_SIZE The existing subpage case, we won't larger folio anyway. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15	btrfs: migrate extent_buffer::pages[] to folio	Qu Wenruo
	For now extent_buffer::pages[] are still only accepting single page pointer, thus we can migrate to folios pretty easily. As for single page, page and folio are 1:1 mapped, including their page flags. This patch would just do the conversion from struct page to struct folio, providing the first step to higher order folio in the future. This conversion is pretty simple: - extent_buffer::pages[] -> extent_buffer::folios[] - page_address(eb->pages[i]) -> folio_address(eb->pages[i]) - eb->pages[i] -> folio_page(eb->folios[i], 0) There would be more specific cleanups preparing for the incoming higher order folio support. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05	btrfs: add eb to btrfs_node_key_ptr_offset	Josef Bacik
	This is a change needed for extent tree v2, as we will be growing the header size. This exists in btrfs-progs currently, and not having it makes syncing accessors.[ch] more problematic. So make this change to set us up for extent tree v2 and match what btrfs-progs does to make syncing easier. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05	btrfs: move btrfs_map_token to accessors	Josef Bacik
	This is specific to the item-accessor code, move it out of ctree.h into accessor.h/.c and then update the users to include the new header file. This un-inlines btrfs_init_map_token, however this is only called once per function so it's not critical to be inlined. This also saves 904 bytes of code on a release build. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05	btrfs: rename struct-funcs.c to accessors.c	Josef Bacik
	Rename struct-funcs.c to accessors.c so we can move the item accessors out of ctree.h. accessors.c is a better description of the code that is contained in these files. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>