summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-01-21bcachefs: bch_snapshot::btimeKent Overstreet
Add a field to bch_snapshot for creation time; this will be important when we start exposing the snapshot tree to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: add missing __GFP_NOWARNKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: opts->compression can now also be applied in the backgroundKent Overstreet
The "apply this compression method in the background" paths now use the compression option if background_compression is not set; this means that setting or changing the compression option will cause existing data to be compressed accordingly in the background. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Prep work for variable size btree node buffersKent Overstreet
bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: grab s_umount only if snapshottingSu Yue
When I was testing mongodb over bcachefs with compression, there is a lockdep warning when snapshotting mongodb data volume. $ cat test.sh prog=bcachefs $prog subvolume create /mnt/data $prog subvolume create /mnt/data/snapshots while true;do $prog subvolume snapshot /mnt/data /mnt/data/snapshots/$(date +%s) sleep 1s done $ cat /etc/mongodb.conf systemLog: destination: file logAppend: true path: /mnt/data/mongod.log storage: dbPath: /mnt/data/ lockdep reports: [ 3437.452330] ====================================================== [ 3437.452750] WARNING: possible circular locking dependency detected [ 3437.453168] 6.7.0-rc7-custom+ #85 Tainted: G E [ 3437.453562] ------------------------------------------------------ [ 3437.453981] bcachefs/35533 is trying to acquire lock: [ 3437.454325] ffffa0a02b2b1418 (sb_writers#10){.+.+}-{0:0}, at: filename_create+0x62/0x190 [ 3437.454875] but task is already holding lock: [ 3437.455268] ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.456009] which lock already depends on the new lock. [ 3437.456553] the existing dependency chain (in reverse order) is: [ 3437.457054] -> #3 (&type->s_umount_key#48){.+.+}-{3:3}: [ 3437.457507] down_read+0x3e/0x170 [ 3437.457772] bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.458206] __x64_sys_ioctl+0x93/0xd0 [ 3437.458498] do_syscall_64+0x42/0xf0 [ 3437.458779] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.459155] -> #2 (&c->snapshot_create_lock){++++}-{3:3}: [ 3437.459615] down_read+0x3e/0x170 [ 3437.459878] bch2_truncate+0x82/0x110 [bcachefs] [ 3437.460276] bchfs_truncate+0x254/0x3c0 [bcachefs] [ 3437.460686] notify_change+0x1f1/0x4a0 [ 3437.461283] do_truncate+0x7f/0xd0 [ 3437.461555] path_openat+0xa57/0xce0 [ 3437.461836] do_filp_open+0xb4/0x160 [ 3437.462116] do_sys_openat2+0x91/0xc0 [ 3437.462402] __x64_sys_openat+0x53/0xa0 [ 3437.462701] do_syscall_64+0x42/0xf0 [ 3437.462982] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.463359] -> #1 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}: [ 3437.463843] down_write+0x3b/0xc0 [ 3437.464223] bch2_write_iter+0x5b/0xcc0 [bcachefs] [ 3437.464493] vfs_write+0x21b/0x4c0 [ 3437.464653] ksys_write+0x69/0xf0 [ 3437.464839] do_syscall_64+0x42/0xf0 [ 3437.465009] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.465231] -> #0 (sb_writers#10){.+.+}-{0:0}: [ 3437.465471] __lock_acquire+0x1455/0x21b0 [ 3437.465656] lock_acquire+0xc6/0x2b0 [ 3437.465822] mnt_want_write+0x46/0x1a0 [ 3437.465996] filename_create+0x62/0x190 [ 3437.466175] user_path_create+0x2d/0x50 [ 3437.466352] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.466617] __x64_sys_ioctl+0x93/0xd0 [ 3437.466791] do_syscall_64+0x42/0xf0 [ 3437.466957] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.467180] other info that might help us debug this: [ 3437.469670] 2 locks held by bcachefs/35533: other info that might help us debug this: [ 3437.467507] Chain exists of: sb_writers#10 --> &c->snapshot_create_lock --> &type->s_umount_key#48 [ 3437.467979] Possible unsafe locking scenario: [ 3437.468223] CPU0 CPU1 [ 3437.468405] ---- ---- [ 3437.468585] rlock(&type->s_umount_key#48); [ 3437.468758] lock(&c->snapshot_create_lock); [ 3437.469030] lock(&type->s_umount_key#48); [ 3437.469291] rlock(sb_writers#10); [ 3437.469434] *** DEADLOCK *** [ 3437.469670] 2 locks held by bcachefs/35533: [ 3437.469838] #0: ffffa0a02ce00a88 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_fs_file_ioctl+0x1e3/0xc90 [bcachefs] [ 3437.470294] #1: ffffa0a02b2b10e0 (&type->s_umount_key#48){.+.+}-{3:3}, at: bch2_fs_file_ioctl+0x232/0xc90 [bcachefs] [ 3437.470744] stack backtrace: [ 3437.470922] CPU: 7 PID: 35533 Comm: bcachefs Kdump: loaded Tainted: G E 6.7.0-rc7-custom+ #85 [ 3437.471313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 3437.471694] Call Trace: [ 3437.471795] <TASK> [ 3437.471884] dump_stack_lvl+0x57/0x90 [ 3437.472035] check_noncircular+0x132/0x150 [ 3437.472202] __lock_acquire+0x1455/0x21b0 [ 3437.472369] lock_acquire+0xc6/0x2b0 [ 3437.472518] ? filename_create+0x62/0x190 [ 3437.472683] ? lock_is_held_type+0x97/0x110 [ 3437.472856] mnt_want_write+0x46/0x1a0 [ 3437.473025] ? filename_create+0x62/0x190 [ 3437.473204] filename_create+0x62/0x190 [ 3437.473380] user_path_create+0x2d/0x50 [ 3437.473555] bch2_fs_file_ioctl+0x2ec/0xc90 [bcachefs] [ 3437.473819] ? lock_acquire+0xc6/0x2b0 [ 3437.474002] ? __fget_files+0x2a/0x190 [ 3437.474195] ? __fget_files+0xbc/0x190 [ 3437.474380] ? lock_release+0xc5/0x270 [ 3437.474567] ? __x64_sys_ioctl+0x93/0xd0 [ 3437.474764] ? __pfx_bch2_fs_file_ioctl+0x10/0x10 [bcachefs] [ 3437.475090] __x64_sys_ioctl+0x93/0xd0 [ 3437.475277] do_syscall_64+0x42/0xf0 [ 3437.475454] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 3437.475691] RIP: 0033:0x7f2743c313af ====================================================== In __bch2_ioctl_subvolume_create(), we grab s_umount unconditionally and unlock it at the end of the function. There is a comment "why do we need this lock?" about the lock coming from commit 42d237320e98 ("bcachefs: Snapshot creation, deletion") The reason is that __bch2_ioctl_subvolume_create() calls sync_inodes_sb() which enforce locked s_umount to writeback all dirty nodes before doing snapshot works. Fix it by read locking s_umount for snapshotting only and unlocking s_umount after sync_inodes_sb(). Signed-off-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: kvfree bch_fs::snapshots in bch2_fs_snapshots_exitSu Yue
bch_fs::snapshots is allocated by kvzalloc in __snapshot_t_mut. It should be freed by kvfree not kfree. Or umount will triger: [ 406.829178 ] BUG: unable to handle page fault for address: ffffe7b487148008 [ 406.830676 ] #PF: supervisor read access in kernel mode [ 406.831643 ] #PF: error_code(0x0000) - not-present page [ 406.832487 ] PGD 0 P4D 0 [ 406.832898 ] Oops: 0000 [#1] PREEMPT SMP PTI [ 406.833512 ] CPU: 2 PID: 1754 Comm: umount Kdump: loaded Tainted: G OE 6.7.0-rc7-custom+ #90 [ 406.834746 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 406.835796 ] RIP: 0010:kfree+0x62/0x140 [ 406.836197 ] Code: 80 48 01 d8 0f 82 e9 00 00 00 48 c7 c2 00 00 00 80 48 2b 15 78 9f 1f 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 56 9f 1f 01 <48> 8b 50 08 48 89 c7 f6 c2 01 0f 85 b0 00 00 00 66 90 48 8b 07 f6 [ 406.837810 ] RSP: 0018:ffffb9d641607e48 EFLAGS: 00010286 [ 406.838213 ] RAX: ffffe7b487148000 RBX: ffffb9d645200000 RCX: ffffb9d641607dc4 [ 406.838738 ] RDX: 000065bb00000000 RSI: ffffffffc0d88b84 RDI: ffffb9d645200000 [ 406.839217 ] RBP: ffff9a4625d00068 R08: 0000000000000001 R09: 0000000000000001 [ 406.839650 ] R10: 0000000000000001 R11: 000000000000001f R12: ffff9a4625d4da80 [ 406.840055 ] R13: ffff9a4625d00000 R14: ffffffffc0e2eb20 R15: 0000000000000000 [ 406.840451 ] FS: 00007f0a264ffb80(0000) GS:ffff9a4e2d500000(0000) knlGS:0000000000000000 [ 406.840851 ] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 406.841125 ] CR2: ffffe7b487148008 CR3: 000000018c4d2000 CR4: 00000000000006f0 [ 406.841464 ] Call Trace: [ 406.841583 ] <TASK> [ 406.841682 ] ? __die+0x1f/0x70 [ 406.841828 ] ? page_fault_oops+0x159/0x470 [ 406.842014 ] ? fixup_exception+0x22/0x310 [ 406.842198 ] ? exc_page_fault+0x1ed/0x200 [ 406.842382 ] ? asm_exc_page_fault+0x22/0x30 [ 406.842574 ] ? bch2_fs_release+0x54/0x280 [bcachefs] [ 406.842842 ] ? kfree+0x62/0x140 [ 406.842988 ] ? kfree+0x104/0x140 [ 406.843138 ] bch2_fs_release+0x54/0x280 [bcachefs] [ 406.843390 ] kobject_put+0xb7/0x170 [ 406.843552 ] deactivate_locked_super+0x2f/0xa0 [ 406.843756 ] cleanup_mnt+0xba/0x150 [ 406.843917 ] task_work_run+0x59/0xa0 [ 406.844083 ] exit_to_user_mode_prepare+0x197/0x1a0 [ 406.844302 ] syscall_exit_to_user_mode+0x16/0x40 [ 406.844510 ] do_syscall_64+0x4e/0xf0 [ 406.844675 ] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 406.844907 ] RIP: 0033:0x7f0a2664e4fb Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bios must be 512 byte alginedKent Overstreet
Fixes: 023f9ac9f70f bcachefs: Delete dio read alignment check Reported-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: remove redundant variable tmpColin Ian King
The variable tmp is being assigned a value but it isn't being read afterwards. The assignment is redundant and so tmp can be removed. Cleans up clang scan build warning: warning: Although the value stored to 'ret' is used in the enclosing expression, the value is never actually read from 'ret' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Improve trace_trans_restart_relockKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Fix excess transaction restarts in __bchfs_fallocate()Kent Overstreet
drop_locks_do() should not be used in a fastpath without first trying the do in nonblocking mode - the unlock and relock will cause excessive transaction restarts and potentially livelocking with other threads that are contending for the same locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: extents_to_bp_stateKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bkey_and_val_eq()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Better journal tracepointsKent Overstreet
Factor out bch2_journal_bufs_to_text(), and use it in the journal_entry_full() tracepoint; when we can't get a journal reservation we need to know the outstanding journal entry sizes to know if the problem is due to excessive flushing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Print size of superblock with space allocatedKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Avoid flushing the journal in the discard pathKent Overstreet
When issuing discards, we may need to flush the journal if there's too many buckets that can't be discarded until a journal flush. But the heuristic was bad; we should be comparing the number of buckets that need to flushes against the number of free buckets, not the number of buckets we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Improve move_extent tracepointKent Overstreet
Also print out the data_opts, so that we can see what specifically is being done to an extent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Add missing bch2_moving_ctxt_flush_all()Kent Overstreet
This fixes a bug with rebalance IOs getting stuck with reads completed, but writes never being issued. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Re-add move_extent_write tracepointKent Overstreet
It appears this was accidentally deleted at some point - also, do a bit of cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch2_kthread_io_clock_wait() no longer sleeps until full amountKent Overstreet
Drop t he loop in bch2_kthread_io_clock_wait(): this allows the code that uses it to be woken up for other reasons, and fixes a bug where rebalance wouldn't wake up when a scan was requested. This raises the possibility of spurious wakeups, but callers should always be able to handle that reasonably well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Add .val_to_text() for KEY_TYPE_cookieKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Don't pass memcmp() as a pointerKent Overstreet
Some (buggy!) compilers have issues with this. Fixes: https://github.com/koverstreet/bcachefs/issues/625 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21Merge tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull header fix from Kent Overstreet: "Just one small fixup for the RT build" * tag 'header_cleanup-2024-01-20' of https://evilpiepirate.org/git/bcachefs: spinlock: Fix failing build for PREEMPT_RT
2024-01-21idpf: distinguish vports by the dev_port attributeMichal Schmidt
idpf registers multiple netdevs (virtual ports) for one PCI function, but it does not provide a way for userspace to distinguish them with sysfs attributes. Per Documentation/ABI/testing/sysfs-class-net, it is a bug not to set dev_port for independent ports on the same PCI bus, device and function. Without dev_port set, systemd-udevd's default naming policy attempts to assign the same name ("ens2f0") to all four idpf netdevs on my test system and obviously fails, leaving three of them with the initial eth<N> name. With this patch, systemd-udevd is able to assign unique names to the netdevs (e.g. "ens2f0", "ens2f0d1", "ens2f0d2", "ens2f0d3"). The Intel-provided out-of-tree idpf driver already sets dev_port. In this patch I chose to do it in the same place in the idpf_cfg_netdev function. Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-21udp: fix busy pollingEric Dumazet
Generic sk_busy_loop_end() only looks at sk->sk_receive_queue for presence of packets. Problem is that for UDP sockets after blamed commit, some packets could be present in another queue: udp_sk(sk)->reader_queue In some cases, a busy poller could spin until timeout expiration, even if some packets are available in udp_sk(sk)->reader_queue. v3: - make sk_busy_loop_end() nicer (Willem) v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats. - add a sk_is_inet() check in sk_is_udp() (Willem feedback) - add a sk_is_inet() check in sk_is_tcp(). Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-21bcachefs: Reduce would_deadlock restartsKent Overstreet
We don't have to take locks in any particular ordering - we'll make forward progress just fine - but if we try to stick to an ordering, it can help to avoid excessive would_deadlock transaction restarts. This tweaks the reflink path to take extents btree locks in the right order. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch2_trans_account_disk_usage_change()Kent Overstreet
The disk space accounting rewrite is splitting out accounting for each replicas set - those are moving to btree keys, instead of percpu counters. This breaks bch2_trans_fs_usage_apply() up, splitting out the part we will still need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch_fs_usage_baseKent Overstreet
Split out base filesystem usage into its own type; prep work for breaking up bch2_trans_fs_usage_apply(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: bch2_prt_compression_type()Kent Overstreet
bounds checking helper, since compression types are extensible Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: helpers for printing data typesKent Overstreet
We need bounds checking since new versions may introduce new data types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: BTREE_TRIGGER_ATOMICKent Overstreet
Add a new flag to be explicit about when we're running atomic triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: drop to_text code for obsolete bps in alloc keysKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: eytzinger_for_each() declares loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Don't log errors if BCH_WRITE_ALLOC_NOWAITKent Overstreet
Previously, we added logging in the write path to ensure that any unexpected errors getting reported to userspace have a log message; but BCH_WRITE_ALLOC_NOWAIT is a special case, it's used for promotes where errors are expected and not reported out to userspace - so we need to silence those. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: fix memleak in bch2_split_devsSu Yue
The pointer dev_name can be modified by strseq(), then causes the memleak: unreferenced object 0xffff9d08a2916c80 (size 32): comm "mount.bcachefs", pid 9090, jiffies 4295856224 (age 17.564s) hex dump (first 32 bytes): 2f 64 65 76 2f 6d 61 70 70 65 72 2f 74 65 73 74 /dev/mapper/test 2d 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -0.............. backtrace: [<00000000c5d3be7d>] __kmem_cache_alloc_node+0x1f3/0x2c0 [<0000000052215d26>] __kmalloc_node_track_caller+0x51/0x150 [<0000000069fea956>] kstrdup+0x32/0x60 [<000000000877fcf1>] bch2_split_devs+0x3f/0x150 [bcachefs] [<000000007ee93204>] bch2_mount+0xcb/0x640 [bcachefs] [<000000002dd1e04b>] legacy_get_tree+0x30/0x60 [<000000006afc31d3>] vfs_get_tree+0x28/0xf0 [<000000007b0c538e>] path_mount+0x475/0xb60 [<0000000092de5882>] __x64_sys_mount+0x105/0x140 [<0000000054fc05d8>] do_syscall_64+0x42/0xf0 [<00000000df584910>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 Fix it by copy pointer dev_name at beginning and free the copied pointer at end. Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21fbdev: sis: Error out if pixclock equals zeroFullway Wang
The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error. In sisfb_check_var(), var->pixclock is used as a divisor to caculate drate before it is checked against zero. Fix this by checking it at the beginning. This is similar to CVE-2022-3061 in i740fb which was fixed by commit 15cf0b8. Signed-off-by: Fullway Wang <fullwaywang@outlook.com> Signed-off-by: Helge Deller <deller@gmx.de>
2024-01-21fbdev: savage: Error out if pixclock equals zeroFullway Wang
The userspace program could pass any values to the driver through ioctl() interface. If the driver doesn't check the value of pixclock, it may cause divide-by-zero error. Although pixclock is checked in savagefb_decode_var(), but it is not checked properly in savagefb_probe(). Fix this by checking whether pixclock is zero in the function savagefb_check_var() before info->var.pixclock is used as the divisor. This is similar to CVE-2022-3061 in i740fb which was fixed by commit 15cf0b8. Signed-off-by: Fullway Wang <fullwaywang@outlook.com> Signed-off-by: Helge Deller <deller@gmx.de>
2024-01-21fbdev: vt8500lcdfb: Remove unnecessary print function dev_err()Jiapeng Chong
The print function dev_err() is redundant because platform_get_irq() already prints an error. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7824 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Helge Deller <deller@gmx.de>
2024-01-20Merge tag 'v6.8-rc-part2-smb-client' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull smb client updates from Steve French: "Various smb client fixes, including multichannel and for SMB3.1.1 POSIX extensions: - debugging improvement (display start time for stats) - two reparse point handling fixes - various multichannel improvements and fixes - SMB3.1.1 POSIX extensions open/create parsing fix - retry (reconnect) improvement including new retrans mount parm, and handling of two additional return codes that need to be retried on - two minor cleanup patches and another to remove duplicate query info code - two documentation cleanup, and one reviewer email correction" * tag 'v6.8-rc-part2-smb-client' of git://git.samba.org/sfrench/cifs-2.6: cifs: update iface_last_update on each query-and-update cifs: handle servers that still advertise multichannel after disabling cifs: new mount option called retrans cifs: reschedule periodic query for server interfaces smb: client: don't clobber ->i_rdev from cached reparse points smb: client: get rid of smb311_posix_query_path_info() smb: client: parse owner/group when creating reparse points smb: client: fix parsing of SMB3.1.1 POSIX create context cifs: update known bugs mentioned in kernel docs for cifs cifs: new nt status codes from MS-SMB2 cifs: pick channel for tcon and tdis cifs: open_cached_dir should not rely on primary channel smb3: minor documentation updates Update MAINTAINERS email address cifs: minor comment cleanup smb3: show beginning time for per share stats cifs: remove redundant variable tcon_exist
2024-01-20Merge tag 'dmaengine-fix-6.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New support: - Loongson LS2X APB DMA controller - sf-pdma: mpfs-pdma support - Qualcomm X1E80100 GPI dma controller support Updates: - Xilinx XDMA updates to support interleaved DMA transfers - TI PSIL threads for AM62P and J722S and cfg register regions description - axi-dmac Improving the cyclic DMA transfers - Tegra Support dma-channel-mask property - Remaining platform remove callback returning void conversions Driver fixes for: - Xilinx xdma driver operator precedence and initialization fix - Excess kernel-doc warning fix in imx-sdma xilinx xdma drivers - format-overflow warning fix for rz-dmac, sh usb dmac drivers - 'output may be truncated' fix for shdma, fsl-qdma and dw-edma drivers" * tag 'dmaengine-fix-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (58 commits) dmaengine: dw-edma: increase size of 'name' in debugfs code dmaengine: fsl-qdma: increase size of 'irq_name' dmaengine: shdma: increase size of 'dev_id' dmaengine: xilinx: xdma: Fix kernel-doc warnings dmaengine: usb-dmac: Avoid format-overflow warning dmaengine: sh: rz-dmac: Avoid format-overflow warning dmaengine: imx-sdma: fix Excess kernel-doc warnings dmaengine: xilinx: xdma: Fix initialization location of desc in xdma_channel_isr() dmaengine: xilinx: xdma: Fix operator precedence in xdma_prep_interleaved_dma() dmaengine: xilinx: xdma: statify xdma_prep_interleaved_dma dmaengine: xilinx: xdma: Workaround truncation compilation error dmaengine: pl330: issue_pending waits until WFP state dmaengine: xilinx: xdma: Implement interleaved DMA transfers dmaengine: xilinx: xdma: Prepare the introduction of interleaved DMA transfers dmaengine: xilinx: xdma: Add transfer error reporting dmaengine: xilinx: xdma: Add error checking in xdma_channel_isr() dmaengine: xilinx: xdma: Rework xdma_terminate_all() dmaengine: xilinx: xdma: Ease dma_pool alignment requirements dmaengine: xilinx: xdma: Add necessary macro definitions dmaengine: xilinx: xdma: Get rid of unused code ...
2024-01-20Merge tag 'coccinelle-for-6.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux Pull coccinelle updates from Julia Lawall: "Updates to the device_attr_show semantic patch to reflect the new guidelines of the Linux kernel documentation. The problem was identified by Li Zhijian <lizhijian@fujitsu.com>, who proposed an initial fix" * tag 'coccinelle-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux: coccinelle: device_attr_show: simplify patch case coccinelle: device_attr_show: Adapt to the latest Documentation/filesystems/sysfs.rst
2024-01-20media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c)Aurelien Jarno
This patch replaces max(a, min(b, c)) by clamp(b, a, c) in the solo6x10 driver. This improves the readability and more importantly, for the solo6x10-p2m.c file, this reduces on my system (x86-64, gcc 13): - the preprocessed size from 121 MiB to 4.5 MiB; - the build CPU time from 46.8 s to 1.6 s; - the build memory from 2786 MiB to 98MiB. In fine, this allows this relatively simple C file to be built on a 32-bit system. Reported-by: Jiri Slaby <jirislaby@gmail.com> Closes: https://lore.kernel.org/lkml/18c6df0d-45ed-450c-9eda-95160a2bbb8e@gmail.com/ Cc: <stable@vger.kernel.org> # v6.7+ Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: David Laight <David.Laight@ACULAB.COM> Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-20coccinelle: device_attr_show: simplify patch caseJulia Lawall
Replacing the final expression argument by ... allows the format string to have multiple arguments. It also has the advantage of allowing the change to be recognized as a change in a single statement, thus avoiding adding unneeded braces. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
2024-01-20execve: open the executable file before doing anything elseLinus Torvalds
No point in allocating a new mm, counting arguments and environment variables etc if we're just going to return ENOENT. This patch does expose the fact that 'do_filp_open()' that execve() uses is still unnecessarily expensive in the failure case, because it allocates the 'struct file *' early, even if the path lookup (which is heavily optimized) fails. So that remains an unnecessary cost in the "no such executable" case, but it's a separate issue. Regardless, I do not want to do _both_ a filename_lookup() and a later do_filp_open() like the origin patch by Josh Triplett did in [1]. Reported-by: Josh Triplett <josh@joshtriplett.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/lkml/5c7333ea4bec2fad1b47a8fa2db7c31e4ffc4f14.1663334978.git.josh@joshtriplett.org/ [1] Link: https://lore.kernel.org/lkml/202209161637.9EDAF6B18@keescook/ Link: https://lore.kernel.org/lkml/CAHk-=wgznerM-xs+x+krDfE7eVBiy_HOam35rbsFMMOwvYuEKQ@mail.gmail.com/ Link: https://lore.kernel.org/lkml/CAHk-=whf9qLO8ipps4QhmS0BkM8mtWJhvnuDSdtw5gFjhzvKNA@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-01-20Merge tag 'riscv-for-linus-6.8-mw4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for tuning for systems with fast misaligned accesses. - Support for SBI-based suspend. - Support for the new SBI debug console extension. - The T-Head CMOs now use PA-based flushes. - Support for enabling the V extension in kernel code. - Optimized IP checksum routines. - Various ftrace improvements. - Support for archrandom, which depends on the Zkr extension. - The build is no longer broken under NET=n, KUNIT=y for ports that don't define their own ipv6 checksum. * tag 'riscv-for-linus-6.8-mw4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (56 commits) lib: checksum: Fix build with CONFIG_NET=n riscv: lib: Check if output in asm goto supported riscv: Fix build error on rv32 + XIP riscv: optimize ELF relocation function in riscv RISC-V: Implement archrandom when Zkr is available riscv: Optimize hweight API with Zbb extension riscv: add dependency among Image(.gz), loader(.bin), and vmlinuz.efi samples: ftrace: Add RISC-V support for SAMPLE_FTRACE_DIRECT[_MULTI] riscv: ftrace: Add DYNAMIC_FTRACE_WITH_DIRECT_CALLS support riscv: ftrace: Make function graph use ftrace directly riscv: select FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY lib/Kconfig.debug: Update AS_HAS_NON_CONST_LEB128 comment and name riscv: Restrict DWARF5 when building with LLVM to known working versions riscv: Hoist linker relaxation disabling logic into Kconfig kunit: Add tests for csum_ipv6_magic and ip_fast_csum riscv: Add checksum library riscv: Add checksum header riscv: Add static key for misaligned accesses asm-generic: Improve csum_fold RISC-V: selftests: cbo: Ensure asm operands match constraints ...
2024-01-20Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Final round of fixes that came in too late to send in the first request. It's nine bug fixes and one version update (because of a bug fix) and one set of PCI ID additions. There's one bug fix in the core which is really a one liner (except that an additional sdev pointer was added for convenience) and the rest are in drivers" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: core: Add TMF to tmr_list handling scsi: core: Kick the requeue list after inserting when flushing scsi: fnic: unlock on error path in fnic_queuecommand() scsi: fcoe: Fix unsigned comparison with zero in store_ctlr_mode() scsi: mpi3mr: Fix mpi3mr_fw.c kernel-doc warnings scsi: smartpqi: Bump driver version to 2.1.26-030 scsi: smartpqi: Fix logical volume rescan race condition scsi: smartpqi: Add new controller PCI IDs scsi: ufs: qcom: Remove unnecessary goto statement from ufs_qcom_config_esi() scsi: ufs: core: Remove the ufshcd_hba_exit() call from ufshcd_async_scan() scsi: ufs: core: Simplify power management during async scan
2024-01-20Merge tag 'sh-for-v6.8-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: "Since the large patch series to convert arch/sh to device tree support has not been finalized yet due to various maintainers still asking for changes to the series, this ended up being rather small consisting of just two fixes. The first patch by Geert Uytterhoeven addresses a build failure in the EcoVec platform code. And the second patch by Masahiro Yamada removes an unnecessary $(foreach ...) found in a Makefile of the vsyscall code. - Rename missed backlight field from fbdev to dev - Remove unnecessary $(foreach ...)" * tag 'sh-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: vsyscall: Remove unnecessary $(foreach ...) sh: ecovec24: Rename missed backlight field from fbdev to dev
2024-01-20Merge tag 'fbdev-for-6.8-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev fix from Helge Deller: "There were various reports from people without any graphics output on the screen and it turns out one commit triggers the problem. - Revert 'firmware/sysfb: Clear screen_info state after consuming it'" * tag 'fbdev-for-6.8-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: Revert "firmware/sysfb: Clear screen_info state after consuming it"
2024-01-19llc: Drop support for ETH_P_TR_802_2.Kuniyuki Iwashima
syzbot reported an uninit-value bug below. [0] llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2 (0x0011), and syzbot abused the latter to trigger the bug. write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16) llc_conn_handler() initialises local variables {saddr,daddr}.mac based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes them to __llc_lookup(). However, the initialisation is done only when skb->protocol is htons(ETH_P_802_2), otherwise, __llc_lookup_established() and __llc_lookup_listener() will read garbage. The missing initialisation existed prior to commit 211ed865108e ("net: delete all instances of special processing for token ring"). It removed the part to kick out the token ring stuff but forgot to close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv(). Let's remove llc_tr_packet_type and complete the deprecation. [0]: BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90 __llc_lookup_established+0xe9d/0xf90 __llc_lookup net/llc/llc_conn.c:611 [inline] llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206 __netif_receive_skb_one_core net/core/dev.c:5527 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641 netif_receive_skb_internal net/core/dev.c:5727 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5786 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555 tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x1490 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b Local variable daddr created at: llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206 CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023 Fixes: 211ed865108e ("net: delete all instances of special processing for token ring") Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-19llc: make llc_ui_sendmsg() more robust against bonding changesEric Dumazet
syzbot was able to trick llc_ui_sendmsg(), allocating an skb with no headroom, but subsequently trying to push 14 bytes of Ethernet header [1] Like some others, llc_ui_sendmsg() releases the socket lock before calling sock_alloc_send_skb(). Then it acquires it again, but does not redo all the sanity checks that were performed. This fix: - Uses LL_RESERVED_SPACE() to reserve space. - Check all conditions again after socket lock is held again. - Do not account Ethernet header for mtu limitation. [1] skbuff: skb_under_panic: text:ffff800088baa334 len:1514 put:14 head:ffff0000c9c37000 data:ffff0000c9c36ff2 tail:0x5dc end:0x6c0 dev:bond0 kernel BUG at net/core/skbuff.c:193 ! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6875 Comm: syz-executor.0 Not tainted 6.7.0-rc8-syzkaller-00101-g0802e17d9aca-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_panic net/core/skbuff.c:189 [inline] pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 lr : skb_panic net/core/skbuff.c:189 [inline] lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 sp : ffff800096f97000 x29: ffff800096f97010 x28: ffff80008cc8d668 x27: dfff800000000000 x26: ffff0000cb970c90 x25: 00000000000005dc x24: ffff0000c9c36ff2 x23: ffff0000c9c37000 x22: 00000000000005ea x21: 00000000000006c0 x20: 000000000000000e x19: ffff800088baa334 x18: 1fffe000368261ce x17: ffff80008e4ed000 x16: ffff80008a8310f8 x15: 0000000000000001 x14: 1ffff00012df2d58 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000001 x10: 0000000000ff0100 x9 : e28a51f1087e8400 x8 : e28a51f1087e8400 x7 : ffff80008028f8d0 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff800082b78714 x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000089 Call trace: skb_panic net/core/skbuff.c:189 [inline] skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 skb_push+0xf0/0x108 net/core/skbuff.c:2451 eth_header+0x44/0x1f8 net/ethernet/eth.c:83 dev_hard_header include/linux/netdevice.h:3188 [inline] llc_mac_hdr_init+0x110/0x17c net/llc/llc_output.c:33 llc_sap_action_send_xid_c+0x170/0x344 net/llc/llc_s_ac.c:85 llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline] llc_sap_next_state net/llc/llc_sap.c:182 [inline] llc_sap_state_process+0x1ec/0x774 net/llc/llc_sap.c:209 llc_build_and_send_xid_pkt+0x12c/0x1c0 net/llc/llc_sap.c:270 llc_ui_sendmsg+0x7bc/0xb1c net/llc/af_llc.c:997 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_sendmsg+0x194/0x274 net/socket.c:767 splice_to_socket+0x7cc/0xd58 fs/splice.c:881 do_splice_from fs/splice.c:933 [inline] direct_splice_actor+0xe4/0x1c0 fs/splice.c:1142 splice_direct_to_actor+0x2a0/0x7e4 fs/splice.c:1088 do_splice_direct+0x20c/0x348 fs/splice.c:1194 do_sendfile+0x4bc/0xc70 fs/read_write.c:1254 __do_sys_sendfile64 fs/read_write.c:1322 [inline] __se_sys_sendfile64 fs/read_write.c:1308 [inline] __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1308 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:595 Code: aa1803e6 aa1903e7 a90023f5 94792f6a (d4210000) Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+2a7024e9502df538e8ef@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240118183625.4007013-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-19vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPINGLin Ma
In the vlan_changelink function, a loop is used to parse the nested attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to obtain the struct ifla_vlan_qos_mapping. These two nested attributes are checked in the vlan_validate_qos_map function, which calls nla_validate_nested_deprecated with the vlan_map_policy. However, this deprecated validator applies a LIBERAL strictness, allowing the presence of an attribute with the type IFLA_VLAN_QOS_UNSPEC. Consequently, the loop in vlan_changelink may parse an attribute of type IFLA_VLAN_QOS_UNSPEC and believe it carries a payload of struct ifla_vlan_qos_mapping, which is not necessarily true. To address this issue and ensure compatibility, this patch introduces two type checks that skip attributes whose type is not IFLA_VLAN_QOS_MAPPING. Fixes: 07b5b17e157b ("[VLAN]: Use rtnl_link API") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240118130306.1644001-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>