Age | Commit message | Author
2023-10-23scripts/faddr2line: Don't filter out non-function symbols from readelfWill Deacon
As Josh points out in 20230724234734.zy67gm674vl3p3wv@treble:

> Problem is, I think the kernel's symbol printing code prints the
> nearest kallsyms symbol, and there are some valid non-FUNC code
> symbols. For example, syscall_return_via_sysret.

so we shouldn't be considering only 'FUNC'-type symbols in the output from readelf.

Drop the function symbol type filtering from the faddr2line outer loop.

Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230724234734.zy67gm674vl3p3wv@treble
Signed-off-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20231002165750.1661-2-will@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-10-23btrfs: fix unwritten extent buffer after snapshotting a new subvolumeFilipe Manana
When creating a snapshot of a subvolume that was created in the current transaction, we can end up not persisting a dirty extent buffer that is referenced by the snapshot, resulting in IO errors due to checksum failures when trying to read the extent buffer later from disk. A sequence of steps that leads to this is the following:

1) At ioctl.c:create_subvol() we allocate an extent buffer, with logical address 36007936, for the leaf/root of a new subvolume that has an ID of 291. We mark the extent buffer as dirty, and at this point the subvolume tree has a single node/leaf which is also its root (level 0);

2) We no longer commit the transaction used to create the subvolume at create_subvol(). We used to, but that was recently removed in commit 1b53e51a4a8f ("btrfs: don't commit transaction for every subvol create");

3) The transaction used to create the subvolume has an ID of 33, so the extent buffer 36007936 has a generation of 33;

4) Several updates happen to subvolume 291 during transaction 33, several files are created and its tree height changes from 0 to 1, so we end up with a new root at level 1 and the extent buffer 36007936 is now a leaf of that new root node, which is extent buffer 36048896. The commit root remains as 36007936, since we are still at transaction 33;

5) Creation of a snapshot of subvolume 291, with an ID of 292, starts at ioctl.c:create_snapshot(). This triggers a commit of transaction 33 and we end up at transaction.c:create_pending_snapshot(), in the critical section of a transaction commit. There we COW the root of subvolume 291, which is extent buffer 36048896. The COW operation returns extent buffer 36048896, since there's no need to COW because the extent buffer was created in this transaction and it was not written yet. Then we call btrfs_copy_root() against the root node 36048896.
   During this operation we allocate a new extent buffer to turn into the root node of the snapshot, copy the contents of the root node 36048896 into this snapshot root extent buffer, set the owner to 292 (the ID of the snapshot), etc, and then we call btrfs_inc_ref(). This will create a delayed reference for each leaf pointed to by the root node with a reference root of 292 - this includes a reference for the leaf 36007936. After that we set the bit BTRFS_ROOT_FORCE_COW in the root's state.

   Then we call btrfs_insert_dir_item(), to create the directory entry in the tree of subvolume 291 that points to the snapshot. This ends up needing to modify leaf 36007936 to insert the respective directory items. Because the bit BTRFS_ROOT_FORCE_COW is set for the root's state, we need to COW the leaf. We end up at btrfs_force_cow_block() and then at update_ref_for_cow().

   At update_ref_for_cow() we call btrfs_block_can_be_shared() which returns false, despite the fact the leaf 36007936 is shared - the subvolume's root and the snapshot's root point to that leaf. The reason that it incorrectly returns false is because the commit root of the subvolume is extent buffer 36007936 - it was the initial root of the subvolume when we created it. So btrfs_block_can_be_shared() which has the following logic:

   int btrfs_block_can_be_shared(struct btrfs_root *root,
                                 struct extent_buffer *buf)
   {
       if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) &&
           buf != root->node && buf != root->commit_root &&
           (btrfs_header_generation(buf) <=
            btrfs_root_last_snapshot(&root->root_item) ||
            btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
               return 1;

       return 0;
   }

   Returns false (0) since 'buf' (extent buffer 36007936) matches the root's commit root.

   As a result, at update_ref_for_cow(), we don't check for the number of references for extent buffer 36007936, we just assume it's not shared and therefore that it has only 1 reference, so we set the local variable 'refs' to 1.
   Later on, in the final if-else statement at update_ref_for_cow():

   static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
                                          struct btrfs_root *root,
                                          struct extent_buffer *buf,
                                          struct extent_buffer *cow,
                                          int *last_ref)
   {
      (...)
      if (refs > 1) {
          (...)
      } else {
          (...)
          btrfs_clear_buffer_dirty(trans, buf);
          *last_ref = 1;
      }
   }

   So we mark the extent buffer 36007936 as not dirty, and as a result we don't write it to disk later in the transaction commit, despite the fact that the snapshot's root points to it.

Attempting to access the leaf or dumping the tree for example shows that the extent buffer was not written:

  $ btrfs inspect-internal dump-tree -t 292 /dev/sdb
  btrfs-progs v6.2.2
  file tree key (292 ROOT_ITEM 33)
  node 36110336 level 1 items 2 free space 119 generation 33 owner 292
  node 36110336 flags 0x1(WRITTEN) backref revision 1
  checksum stored a8103e3e
  checksum calced a8103e3e
  fs uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79
  chunk uuid e8c9c885-78f4-4d31-85fe-89e5f5fd4a07
          key (256 INODE_ITEM 0) block 36007936 gen 33
          key (257 EXTENT_DATA 0) block 36052992 gen 33
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  total bytes 107374182400
  bytes used 38572032
  uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79

The respective on disk region is full of zeroes as the device was trimmed at mkfs time.

Obviously 'btrfs check' also detects and complains about this:

  $ btrfs check /dev/sdb
  Opening filesystem to check...
  Checking filesystem on /dev/sdb
  UUID: 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79
  generation: 33 (33)
  [1/7] checking root items
  [2/7] checking extents
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  bad tree block 36007936, bytenr mismatch, want=36007936, have=0
  owner ref check failed [36007936 4096]
  ERROR: errors found in extent allocation tree or chunk allocation
  [3/7] checking free space tree
  [4/7] checking fs roots
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
  bad tree block 36007936, bytenr mismatch, want=36007936, have=0
  The following tree block(s) is corrupted in tree 292:
          tree block bytenr: 36110336, level: 1, node key: (256, 1, 0)
  root 292 root dir 256 not found
  ERROR: errors found in fs roots
  found 38572032 bytes used, error(s) found
  total csum bytes: 16048
  total tree bytes: 1265664
  total fs tree bytes: 1118208
  total extent tree bytes: 65536
  btree space waste bytes: 562598
  file data blocks allocated: 65978368
   referenced 36569088

Fix this by updating btrfs_block_can_be_shared() to consider that an extent buffer may be shared if it matches the commit root and if its generation matches the current transaction's generation.

This can be reproduced with the following script:

  $ cat test.sh
  #!/bin/bash

  MNT=/mnt/sdi
  DEV=/dev/sdi

  # Use a filesystem with a 64K node size so that we have the same node
  # size on every machine regardless of its page size (on x86_64 default
  # node size is 16K due to the 4K page size, while on PPC it's 64K by
  # default). This way we can make sure we are able to create a btree for
  # the subvolume with a height of 2.
  mkfs.btrfs -f -n 64K $DEV
  mount $DEV $MNT

  btrfs subvolume create $MNT/subvol

  # Create a few empty files on the subvolume, this bumps its btree
  # height to 2 (root node at level 1 and 2 leaves).
  for ((i = 1; i <= 300; i++)); do
      echo -n > $MNT/subvol/file_$i
  done

  btrfs subvolume snapshot -r $MNT/subvol $MNT/subvol/snap

  umount $DEV

  btrfs check $DEV

Running it on a 6.5 kernel (or any 6.6-rc kernel at the moment):

  $ ./test.sh
  Create subvolume '/mnt/sdi/subvol'
  Create a readonly snapshot of '/mnt/sdi/subvol' in '/mnt/sdi/subvol/snap'
  Opening filesystem to check...
  Checking filesystem on /dev/sdi
  UUID: bbdde2ff-7d02-45ca-8a73-3c36f23755a1
  [1/7] checking root items
  [2/7] checking extents
  parent transid verify failed on 30539776 wanted 7 found 5
  parent transid verify failed on 30539776 wanted 7 found 5
  parent transid verify failed on 30539776 wanted 7 found 5
  Ignoring transid failure
  owner ref check failed [30539776 65536]
  ERROR: errors found in extent allocation tree or chunk allocation
  [3/7] checking free space tree
  [4/7] checking fs roots
  parent transid verify failed on 30539776 wanted 7 found 5
  Ignoring transid failure
  Wrong key of child node/leaf, wanted: (256, 1, 0), have: (2, 132, 0)
  Wrong generation of child node/leaf, wanted: 5, have: 7
  root 257 root dir 256 not found
  ERROR: errors found in fs roots
  found 917504 bytes used, error(s) found
  total csum bytes: 0
  total tree bytes: 851968
  total fs tree bytes: 393216
  total extent tree bytes: 65536
  btree space waste bytes: 736550
  file data blocks allocated: 0
   referenced 0

A test case for fstests will follow soon.

Fixes: 1b53e51a4a8f ("btrfs: don't commit transaction for every subvol create")
CC: stable@vger.kernel.org # 6.5+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
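The sharing decision described in this commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel patch: the `model_*` types and functions are hypothetical, the "old" predicate mirrors the logic quoted in the message, and the "new" predicate is one possible reading of "may be shared if it matches the commit root and its generation matches the current transaction's generation". The example values further assume that the root's last-snapshot generation is already 33 when the check runs, which the message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct extent_buffer / struct btrfs_root. */
struct model_buf {
    uint64_t bytenr;      /* logical address of the tree block */
    uint64_t generation;  /* transaction that allocated the block */
    bool reloc;           /* models BTRFS_HEADER_FLAG_RELOC */
};

struct model_root {
    bool shareable;                       /* models BTRFS_ROOT_SHAREABLE */
    const struct model_buf *node;         /* current root node */
    const struct model_buf *commit_root;  /* root as of the last commit */
    uint64_t last_snapshot;               /* generation of the last snapshot */
};

/* Mirrors the logic quoted in the commit message: a commit root is never shared. */
static bool model_can_be_shared_old(const struct model_root *root,
                                    const struct model_buf *buf)
{
    assert(root != NULL && buf != NULL);
    return root->shareable &&
           buf != root->node && buf != root->commit_root &&
           (buf->generation <= root->last_snapshot || buf->reloc);
}

/*
 * Sketch of the described fix: a block that is still the commit root may be
 * shared when it was created in the running transaction (transid).
 */
static bool model_can_be_shared_new(uint64_t transid,
                                    const struct model_root *root,
                                    const struct model_buf *buf)
{
    assert(root != NULL && buf != NULL);
    if (!root->shareable || buf == root->node)
        return false;
    if (buf->generation > root->last_snapshot && !buf->reloc)
        return false;
    if (buf != root->commit_root)
        return true;
    return buf->generation == transid;
}
```

With leaf 36007936 as the commit root, node 36048896 as the current root, and everything at generation 33, the old predicate reports "not shared" for the leaf while the sketched new predicate reports "may be shared", which is the behavioural difference the message describes.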
2023-10-23drm/amdkfd: reserve a fence slot while locking the BOChristian König
Looks like the KFD still needs this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 8abc1eb2987a ("drm/amdkfd: switch over to using drm_exec v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com
2023-10-23powerpc/mm: Fix boot crash with FLATMEMMichael Ellerman
Erhard reported that his G5 was crashing with v6.6-rc kernels:

  mpic: Setting up HT PICs workarounds for U3/U4
  BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe
  Faulting instruction address: 0xc00000000005dc40
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 6.6.0-rc3-PMacGS #1
  Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac
  NIP: c00000000005dc40 LR: c000000000066660 CTR: c000000000007730
  REGS: c0000000022bf510 TRAP: 0380 Tainted: G T (6.6.0-rc3-PMacGS)
  MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 44004242 XER: 00000000
  IRQMASK: 3
  GPR00: 0000000000000000 c0000000022bf7b0 c0000000010c0b00 00000000000001ac
  GPR04: 0000000003c80000 0000000000000300 c0000000f20001ae 0000000000000300
  GPR08: 0000000000000006 feffbb62ffec65ff 0000000000000001 0000000000000000
  GPR12: 9000000000001032 c000000002362000 c000000000f76b80 000000000349ecd8
  GPR16: 0000000002367ba8 0000000002367f08 0000000000000006 0000000000000000
  GPR20: 00000000000001ac c000000000f6f920 c0000000022cd985 000000000000000c
  GPR24: 0000000000000300 00000003b0a3691d c0003e008030000e 0000000000000000
  GPR28: c00000000000000c c0000000f20001ee feffbb62ffec65fe 00000000000001ac
  NIP hash_page_do_lazy_icache+0x50/0x100
  LR __hash_page_4K+0x420/0x590
  Call Trace:
  hash_page_mm+0x364/0x6f0
  do_hash_fault+0x114/0x2b0
  data_access_common_virt+0x198/0x1f0
  --- interrupt: 300 at mpic_init+0x4bc/0x10c4
  NIP: c000000002020a5c LR: c000000002020a04 CTR: 0000000000000000
  REGS: c0000000022bf9f0 TRAP: 0300 Tainted: G T (6.6.0-rc3-PMacGS)
  MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 24004248 XER: 00000000
  DAR: c0003e008030000e DSISR: 40000000 IRQMASK: 1
  ...
  NIP mpic_init+0x4bc/0x10c4
  LR mpic_init+0x464/0x10c4
  --- interrupt: 300
  pmac_setup_one_mpic+0x258/0x2dc
  pmac_pic_init+0x28c/0x3d8
  init_IRQ+0x90/0x140
  start_kernel+0x57c/0x78c
  start_here_common+0x1c/0x20

A bisect pointed to the breakage beginning with commit 9fee28baa601 ("powerpc: implement the new page table range API").

Analysis of the oops pointed to a struct page with a corrupted compound_head being loaded via page_folio() -> _compound_head() in hash_page_do_lazy_icache().

The access by the mpic code is to an MMIO address, so the expectation is that the struct page for that address would be initialised by init_unavailable_range(), as pointed out by Aneesh.

Instrumentation showed that was not the case, which eventually led to the realisation that pfn_valid() was returning false for that address, causing the struct page to not be initialised.

Because the system is using FLATMEM, the version of pfn_valid() in memory_model.h is used:

  static inline int pfn_valid(unsigned long pfn)
  {
      ...
      return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr;
  }

Which relies on max_mapnr being initialised. Early in boot max_mapnr is zero, meaning no PFNs are valid.

max_mapnr is initialised in mem_init() called via:

  start_kernel()
    mm_core_init()    # init/main.c:928
      mem_init()

But that is too late for the usage in init_unavailable_range() called via:

  start_kernel()
    setup_arch()      # init/main.c:893
      paging_init()
        free_area_init()
          init_unavailable_range()

Although max_mapnr is currently set in mem_init(), the value is actually already available much earlier, as soon as mem_topology_setup() has completed, which is also before paging_init() is called. So move the initialisation there, which causes paging_init() to correctly initialise the struct page and fixes the bug.

This bug seems to have been lurking for years, but went unnoticed because the pre-folio code was inspecting the uninitialised page->flags but not dereferencing it.

Thanks to Erhard and Aneesh for help debugging.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/20230929132750.3cd98452@yea/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231023112500.1550208-1-mpe@ellerman.id.au
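The ordering problem above can be illustrated with a tiny userspace model. This is a sketch under assumptions, not kernel code: `model_pfn_valid()` copies only the comparison quoted in the message, while `model_pfn_offset`, `model_max_mapnr` and `model_init_unavailable_range()` are hypothetical names that merely count how many page frames in a range would be considered valid and therefore initialised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical globals standing in for the kernel's pfn offset and max_mapnr. */
static unsigned long model_pfn_offset = 0;
static unsigned long model_max_mapnr = 0;  /* zero until "boot" sets it */

/* Same comparison as the FLATMEM pfn_valid() quoted in the commit message. */
static bool model_pfn_valid(unsigned long pfn)
{
    return pfn >= model_pfn_offset &&
           (pfn - model_pfn_offset) < model_max_mapnr;
}

/*
 * Counts the page frames in [start, end) that pass the validity check and
 * would therefore have their struct page initialised. Frames that fail the
 * check are skipped, which is how uninitialised struct pages are left behind.
 */
static unsigned long model_init_unavailable_range(unsigned long start,
                                                  unsigned long end)
{
    unsigned long initialised = 0;

    assert(start <= end);
    for (unsigned long pfn = start; pfn < end; pfn++) {
        if (model_pfn_valid(pfn))
            initialised++;
    }
    return initialised;
}
```

While `model_max_mapnr` is still zero, every frame is skipped; once it is set before the range is walked, the frames inside the limit are initialised. That is the effect of moving the assignment ahead of paging_init().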
2023-10-23wifi: mac80211: don't drop all unprotected public action framesAvraham Stern
Not all public action frames have a protected variant. When MFP is enabled drop only public action frames that have a dual protected variant. Fixes: 76a3059cf124 ("wifi: mac80211: drop some unprotected action frames") Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20231016145213.2973e3c8d3bb.I6198b8d3b04cf4a97b06660d346caec3032f232a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-10-23wifi: cfg80211: fix assoc response warning on failed linksJohannes Berg
The warning here shouldn't be done before we even set the bss field (or should've used the input data). Move the assignment before the warning to fix it. We noticed this now because of Wen's bugfix, where the bug fixed there had previously hidden this other bug. Fixes: 53ad07e9823b ("wifi: cfg80211: support reporting failed links") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-10-23wifi: cfg80211: pass correct pointer to rdev_inform_bss()Ben Greear
Confusing struct member names here resulted in passing the wrong pointer, causing crashes. Pass the correct one. Fixes: eb142608e2c4 ("wifi: cfg80211: use a struct for inform_single_bss data") Signed-off-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20231021154827.1142734-1-greearb@candelatech.com [rewrite commit message, add fixes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-10-23Merge tag 'v6.6-rc7' into sched/core, to pick up fixesIngo Molnar
Pick up recent sched/urgent fixes merged upstream. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-10-23isdn: mISDN: hfcsusb: Spelling fix in commentKunwu Chan
protocoll -> protocol Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-22Linux 6.6-rc7v6.6-rc7Linus Torvalds
2023-10-22bcachefs: Refactor memcpy into direct assignmentKees Cook
The memcpy() in bch2_bkey_append_ptr() is operating on an embedded fake flexible array which looks to the compiler like it has 0 size. This causes W=1 builds to emit warnings due to -Wstringop-overflow: In file included from include/linux/string.h:254, from include/linux/bitmap.h:11, from include/linux/cpumask.h:12, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/radix-tree.h:14, from include/linux/backing-dev-defs.h:6, from fs/bcachefs/bcachefs.h:182: fs/bcachefs/extents.c: In function 'bch2_bkey_append_ptr': include/linux/fortify-string.h:57:33: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=] 57 | #define __underlying_memcpy __builtin_memcpy | ^ include/linux/fortify-string.h:648:9: note: in expansion of macro '__underlying_memcpy' 648 | __underlying_##op(p, q, __fortify_size); \ | ^~~~~~~~~~~~~ include/linux/fortify-string.h:693:26: note: in expansion of macro '__fortify_memcpy_chk' 693 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ | ^~~~~~~~~~~~~~~~~~~~ fs/bcachefs/extents.c:235:17: note: in expansion of macro 'memcpy' 235 | memcpy((void *) &k->v + bkey_val_bytes(&k->k), | ^~~~~~ fs/bcachefs/bcachefs_format.h:287:33: note: destination object 'v' of size 0 287 | struct bch_val v; | ^ Avoid making any structure changes and just replace the u64 copy into a direct assignment, side-stepping the entire problem. Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: linux-bcachefs@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309192314.VBsjiIm5-lkp@intel.com/ Link: https://lore.kernel.org/r/20231010235609.work.594-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix drop_alloc_keys()Kent Overstreet
For consistency with the rest of the reconstruct_alloc option, we should be skipping all alloc keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: snapshot_create_lockKent Overstreet
Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix snapshot skiplists during snapshot deletionKent Overstreet
In snapshot deletion, we have to pick new skiplist nodes for entries that point to nodes being deleted. The function that finds a new skiplist node, skipping over entries being deleted, was incorrect: if n = 0, but the parent node is being deleted, we also need to skip over that node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_sb_field_get() refactoringKent Overstreet
Instead of using token pasting to generate methods for each superblock section, just make the type a parameter to bch2_sb_field_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: KEY_TYPE_error now counts towards i_sectorsKent Overstreet
KEY_TYPE_error is used when all replicas in an extent are marked as failed; it indicates that data was present, but has been lost. So that i_sectors doesn't change when replacing extents with KEY_TYPE_error, we now have to count error keys as allocations - this fixes fsck errors later. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix handling of unknown bkey typesKent Overstreet
min_val_size was U8_MAX for unknown key types, causing us to flag any known key as invalid - it should have been 0. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Switch to unsafe_memcpy() in a few placesKent Overstreet
The new fortify checking doesn't work for us in all places; this switches to unsafe_memcpy() where appropriate to silence a few warnings/errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use struct_size()Christophe JAILLET
Use struct_size() instead of hand-writing it. This is less verbose and more robust.

While at it, prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
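For readers unfamiliar with the helper, the idea behind struct_size() can be sketched in userspace as an overflow-checked "header plus N trailing elements" size computation. The names below (`demo_struct_size`, `demo_list`, `demo_list_alloc`) are hypothetical and this is not the kernel macro; it only models the saturate-to-SIZE_MAX behaviour on overflow, which is what makes the helper more robust than a hand-written multiplication.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member. */
struct demo_list {
    size_t nr;
    uint64_t items[];
};

/*
 * Size of a header of 'base' bytes followed by 'count' elements of 'elem'
 * bytes each. Returns SIZE_MAX if the computation would overflow, so a
 * following allocation fails instead of being silently undersized.
 */
static size_t demo_struct_size(size_t base, size_t elem, size_t count)
{
    if (elem != 0 && count > (SIZE_MAX - base) / elem)
        return SIZE_MAX;
    return base + elem * count;
}

/* Allocates a zeroed demo_list with room for 'nr' items, or NULL on failure. */
static struct demo_list *demo_list_alloc(size_t nr)
{
    size_t bytes = demo_struct_size(sizeof(struct demo_list),
                                    sizeof(uint64_t), nr);
    struct demo_list *list;

    if (bytes == SIZE_MAX)
        return NULL;
    list = calloc(1, bytes);
    if (list != NULL) {
        assert(bytes >= sizeof(*list));
        list->nr = nr;
    }
    return list;
}
```

A hand-written `sizeof(*list) + nr * sizeof(uint64_t)` wraps around for a huge `nr`, whereas the checked form reports the overflow explicitly.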
2023-10-22bcachefs: Correctly initialize new buckets on device resizeKent Overstreet
bch2_dev_resize() was never updated for the allocator rewrite with persistent freelists, and it wasn't noticed because the tests weren't running fsck - oops. Fix this by running bch2_dev_freespace_init() for the new buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix another smatch complaintKent Overstreet
This should be harmless, but initialize last_seq anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Use strsep() in split_devs()Kent Overstreet
Minor refactoring to fix a smatch complaint. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add iops fields to bch_memberHunter Shaffer
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1Hunter Shaffer
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: New superblock section members_v2Hunter Shaffer
members_v2 has dynamically resizable entries so that we can extend bch_member. The members can no longer be accessed with simple array indexing. Instead, members_v2_get is used to find a member's exact location within the array and returns a copy of that member. Alternatively, member_v2_get_mut retrieves a mutable pointer to a member.

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
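Why plain array indexing stops working can be shown with a small model of variable-stride entries. Everything here is hypothetical (`demo_member`, `demo_members_v2`, `demo_member_get` and the field layout are invented for illustration and are not the bcachefs on-disk format): the section records the per-entry size, the lookup computes the byte offset from that stride, and the result is copied into a zero-filled in-memory structure so that older, shorter entries still yield a fully initialised value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory member; newer fields sit at the end. */
struct demo_member {
    uint64_t nbuckets;
    uint64_t iops[4];
};

/* Hypothetical section header: entries are 'member_bytes' apart on disk. */
struct demo_members_v2 {
    uint16_t member_bytes;   /* on-disk size of each entry */
    uint16_t nr;             /* number of entries */
    const uint8_t *entries;  /* packed entries, nr * member_bytes bytes */
};

/*
 * Returns a copy of entry 'i'. The offset uses the recorded stride rather
 * than sizeof(struct demo_member), and fields missing from a shorter on-disk
 * entry are left zeroed.
 */
static struct demo_member demo_member_get(const struct demo_members_v2 *m,
                                          unsigned int i)
{
    struct demo_member out;
    size_t n = m->member_bytes < sizeof(out) ? m->member_bytes : sizeof(out);

    assert(i < m->nr);
    memset(&out, 0, sizeof(out));
    memcpy(&out, m->entries + (size_t)i * m->member_bytes, n);
    return out;
}
```

Returning a copy keeps callers independent of the on-disk entry size; a mutable accessor would instead have to hand back a pointer into the packed array at the same computed offset.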
2023-10-22bcachefs: Add new helper to retrieve bch_member from sbHunter Shaffer
Prep work for introducing bch_sb_field_members_v2 - introduce new helpers that use members_v2 if it exists and fall back to v1 otherwise.

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bucket_lock() is now a sleepable lockKent Overstreet
fsck_err() may sleep - it takes a mutex and may allocate memory, so bucket_lock() needs to be a sleepable lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: fix crc32c checksum merge byte order problemBrian Foster
An fsstress task on a big endian system (s390x) quickly produces a bunch of CRC errors in the system logs. Most of these are related to the narrow CRCs path, but the fundamental problem can be reduced to a single write and re-read (after dropping caches) of a previously merged extent. The key merge path that handles extent merges eventually calls into bch2_checksum_merge() to combine the CRCs of the associated extents. This code attempts to avoid a byte order swap by feeding the le64 values into the crc32c code, but the latter casts the resulting u64 value down to a u32, which truncates the high bytes where the actual crc value ends up. This results in a CRC value that does not change (since it is merged with a CRC of 0), and checksum failures ensue. Fix the checksum merge code to swap to cpu byte order on the boundaries to the external crc code such that any value casting is handled properly. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
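The truncation described above can be reproduced on any host by simulating what a big-endian CPU sees when it loads a little-endian 64-bit field without conversion. This is an illustrative sketch, not the bcachefs code: all `demo_*` names are hypothetical, and the byte swap stands in for both the unconverted big-endian load and the conversion to CPU byte order.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "model assumes 64-bit fields");

/* Reverses the byte order of a 64-bit value. */
static uint64_t demo_bswap64(uint64_t v)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/*
 * Value a big-endian CPU obtains when it loads a little-endian 64-bit field
 * holding 'value' without any byte-order conversion.
 */
static uint64_t demo_be_raw_load_of_le64(uint64_t value)
{
    return demo_bswap64(value);
}

/* Buggy path: the raw field is narrowed to 32 bits, dropping the high bytes. */
static uint32_t demo_crc_buggy(uint64_t raw_field)
{
    return (uint32_t)raw_field;
}

/* Fixed path: convert to CPU byte order first, then narrow. */
static uint32_t demo_crc_fixed(uint64_t raw_field)
{
    return (uint32_t)demo_bswap64(raw_field);
}
```

For a CRC of 0x12345678 the unconverted load yields 0x7856341200000000, so narrowing it to 32 bits gives 0, while converting first recovers the original value. This matches the "merged with a CRC of 0" symptom in the message.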
2023-10-22bcachefs: Fix bch2_inode_delete_keys()Kent Overstreet
bch2_inode_delete_keys() was using BTREE_ITER_NOT_EXTENTS, on the assumption that it would never need to split extents. But that caused a race with extents being split by other threads - specifically, the data move path. Extents iterators have the iterator position pointing to the start of the extent, which avoids the race. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make btree root read errors recoverableKent Overstreet
The entire btree will be lost, but that is better than the entire filesystem not being recoverable. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fall back to requesting passphrase directlyKent Overstreet
We can only do this in userspace, unfortunately - but kernel keyrings have never seemed to work reliably, so this is a useful fallback.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix looping around bch2_propagate_key_to_snapshot_leaves()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errorsKent Overstreet
These errors aren't actual errors, and should never be printed - do this in the common helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Silence transaction restart error messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More assertions for nocow lockingKent Overstreet
- assert in shutdown path that no nocow locks are held - check for overflow when taking nocow locks Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: nocow locking: Fix lock leakKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fixes for building in userspaceKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ignore unknown mount optionsKent Overstreet
This makes mount option handling consistent with other filesystems - options may be handled at different layers, so an option we don't know about might not be intended for us. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Always check for invalid bkeys in main commit pathKent Overstreet
Previously, we would check for invalid bkeys at transaction commit time, but only if CONFIG_BCACHEFS_DEBUG=y. This check is important enough to always be on - it appears there's been corruption making it into the journal that would have been caught by it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make sure to initialize equiv when creating new snapshotsKent Overstreet
Previously, equiv was set in the snapshot deletion path, which is where it's needed - equiv, for snapshot ID equivalence classes, would ideally be a private data structure to the snapshot deletion path. But if a new snapshot is created while snapshot deletion is running, move_key_to_correct_snapshot() moves a key to snapshot id 0 - oops. Fixes: https://github.com/koverstreet/bcachefs/issues/593 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a null ptr deref in bch2_get_alloc_in_memory_pos()Kent Overstreet
Reported-by: smatch Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix changing durability using sysfsTorge Matthies
Signed-off-by: Torge Matthies <openglfreak@googlemail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: initial freeze/unfreeze supportBrian Foster
Initial support for the vfs superblock freeze and unfreeze operations. Superblock freeze occurs in stages, where the vfs attempts to quiesce high level write operations, page faults, fs internal operations, and then finally calls into the filesystem for any last stage steps (i.e. log flushing, etc.) before marking the superblock frozen. The majority of write paths are covered by freeze protection (i.e. sb_start_write() and friends) in higher level common code, with the exception of the fs-internal SB_FREEZE_FS stage (i.e. sb_start_intwrite()). This typically maps to active filesystem transactions in a manner that allows the vfs to implement a barrier of internal fs operations during the freeze sequence. This is not a viable model for bcachefs, however, because it utilizes transactions both to populate the journal as well as to perform journal reclaim. This means that mapping intwrite protection to transaction lifecycle or transaction commit is likely to deadlock freeze, as quiescing the journal requires transactional operations blocked by the final stage of freeze. The flipside of this is that bcachefs does already maintain its own internal sets of write references for similar purposes, currently utilized for transitions from read-write to read-only mode. Since this largely mirrors the high level sequence involved with freeze, we can simply invoke this mechanism in the freeze callback to fully quiesce the filesystem in the final stage. This means that while the SB_FREEZE_FS stage is essentially a no-op, the ->freeze_fs() callback that immediately follows begins by performing effectively the same step by quiescing all internal write references. One caveat to this approach is that without integration of internal freeze protection, write operations gated on internal write refs will fail with an internal -EROFS error rather than block on acquiring freeze protection. 
IOW, this is roughly equivalent to only having support for sb_start_intwrite_trylock(), and not the blocking variant. Many of these paths already use non-blocking internal write refs and so would map into an sb_start_intwrite_trylock() anyways. The only instance of this I've been able to uncover that doesn't explicitly rely on a higher level non-blocking write ref is the bch2_rbio_narrow_crcs() path, which updates crcs in certain read cases, and Kent has pointed out isn't critical if it happens to fail due to read-only status. Given that, implement basic freeze support as described above and leave tighter integration with internal freeze protection as a possible future enhancement. There are multiple potential ideas worth exploring here. For example, we could implement a multi-stage freeze callback that might allow bcachefs to quiesce its internal write references without deadlocks, we could integrate intwrite protection with bcachefs' internal write references somehow or another, or perhaps consider implementing blocking support for internal write refs to be used specifically for freeze, etc. In the meantime, this enables functional freeze support and the associated test coverage that comes with it. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: More minor smatch fixesKent Overstreet
- fix a few uninitialized return values - return a proper error code in lookup_lostfound() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Minor bch2_btree_node_get() smatch fixesKent Overstreet
- it's no longer possible for trans to be NULL - also, move "wait for read to complete" to the slowpath, __bch2_btree_node_get(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: snapshots: Use kvfree_rcu_mightsleep()Kent Overstreet
kvfree_rcu() was renamed - not removed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix strndup_user() error checkingKent Overstreet
strndup_user() returns an error pointer, not NULL. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
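The class of bug fixed here is checking for NULL when a function reports failure through an error pointer. The following userspace sketch models that convention; the `demo_*` helpers imitate the idea behind the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() but are hypothetical, and `demo_dup_bounded()` is an invented stand-in for an error-pointer-returning function, not strndup_user() itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_ERRNO 4095

/* Encodes a negative errno value as a pointer in the top of the address space. */
static void *demo_err_ptr(long err)
{
    assert(err < 0 && err >= -DEMO_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* True if 'p' is an encoded error rather than a usable pointer. */
static bool demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recovers the negative errno value from an encoded error pointer. */
static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/*
 * Hypothetical duplicate-with-limit helper that never returns NULL:
 * failures come back as error pointers.
 */
static char *demo_dup_bounded(const char *src, size_t max)
{
    size_t len = strlen(src) + 1;
    char *copy;

    if (len > max)
        return demo_err_ptr(-EINVAL);
    copy = malloc(len);
    if (copy == NULL)
        return demo_err_ptr(-ENOMEM);
    memcpy(copy, src, len);
    return copy;
}
```

A caller that writes `if (!p)` after such a call never sees the failure, because the error pointer is non-NULL; the check has to be `demo_is_err(p)` (IS_ERR() in the kernel).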
2023-10-22bcachefs: drop journal lock before calling journal_writeKent Overstreet
bch2_journal_write() expects process context, it takes journal_lock as needed. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_ioctl_disk_resize_journal(): check for integer truncationKent Overstreet
Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix error checks in bch2_chacha_encrypt_key()Kent Overstreet
crypto_alloc_sync_skcipher() returns an ERR_PTR, not NULL. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>