summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-09-27bcachefs: dirent_points_to_inode() now warns on mismatchKent Overstreet
if an inode backpointer points to a dirent that doesn't point back, that's an error we should warn about. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix lost wake upAlan Huang
If the reader acquires the read lock and then the writer enters the slow path, while the reader proceeds to the unlock path, the following scenario can occur without the change: writer: pcpu_read_count(lock) return 1 (so __do_six_trylock will return 0) reader: this_cpu_dec(*lock->readers) reader: smp_mb() reader: state = atomic_read(&lock->state) (there is no waiting flag set) writer: six_set_bitmask() then the writer will sleep forever. Signed-off-by: Alan Huang <mmpgouride@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Check for logged ops when cleanKent Overstreet
If we shut down successfully, there shouldn't be any logged ops to resume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: BCH_FS_clean_recoveryKent Overstreet
Add a filesystem flag to indicate whether we did a clean recovery - using c->sb.clean after we've got rw is incorrect, since c->sb is updated whenever we write the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Convert disk accounting BUG_ON() to WARN_ON()Kent Overstreet
We had a bug where disk accounting keys didn't always have their version field set in journal replay; change the BUG_ON() to a WARN(), and exclude this case since it's now checked for elsewhere (in the bkey validate function). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_applyKent Overstreet
This was added to avoid double-counting accounting keys in journal replay. But applied incorrectly (easily done since it applies to the transaction commit, not a particular update), it leads to skipping in-mem accounting for real accounting updates, and failure to give them a version number - which leads to journal replay becoming very confused the next time around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Check for accounting keys with bversion=0Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: rename version -> bversionKent Overstreet
give bversions a more distinct name, to aid in grepping Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Don't delete unlinked inodes before logged op resumeKent Overstreet
Previously, check_inode() would delete unlinked inodes if they weren't on the deleted list - this code dating from before there was a deleted list. But, if we crash during a logged op (truncate or finsert/fcollapse) of an unlinked file, logged op resume will get confused if the inode has already been deleted - instead, just add it to the deleted list if it needs to be there; delete_dead_inodes runs after logged op resume. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix BCH_SB_ERRS() so we can reorderKent Overstreet
BCH_SB_ERRS() has a field for the actual enum val so that we can reorder to reorganize, but the way BCH_SB_ERR_MAX was defined didn't allow for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix fsck warnings from bkey validationKent Overstreet
__bch2_fsck_err() warns if the current task has a btree_trans object and it wasn't passed in, because if it has to prompt for user input it has to be able to unlock it. But plumbing the btree_trans through bkey_validate(), as well as transaction restarts, is problematic - so instead make bkey fsck errors FSCK_AUTOFIX, which doesn't need to warn. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Move transaction commit path validation to as late as possibleKent Overstreet
In order to check for accounting keys with version=0, we need to run validation after they've been assigned version numbers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix disk accounting attempting to mark invalid replicas entryKent Overstreet
This fixes the following bug, where a disk accounting key has an invalid replicas entry, and we attempt to add it to the superblock: bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): starting version 1.12: rebalance_work_acct_fix opts=metadata_replicas=2,data_replicas=2,foreground_target=ssd,background_target=hdd,nopromote_whole_extents,verbose,fsck,fix_errors=yes bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): recovering from clean shutdown, journal seq 15211644 bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): accounting_read... accounting not marked in superblock replicas replicas cached: 1/1 [0], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0] replicas_v0 (size 88): user: 2 [3 5] user: 2 [1 4] cached: 1 [2] btree: 2 [1 2] user: 2 [2 5] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 2 [1 2] user: 2 [2 3] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 3] user: 2 [1 5] user: 2 [2 4] bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): inconsistency detected - emergency read only at journal seq 15211644 accounting not marked in superblock replicas replicas user: 1/1 [3], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 0 in entry cached: 1/1 [0] replicas_v0 (size 96): user: 2 [3 5] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] accounting not marked in superblock replicas replicas user: 1/2 [3 7], fixing bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): sb invalid before write: Invalid superblock section replicas_v0: invalid device 7 in entry user: 1/2 [3 7] replicas_v0 (size 96): user: 2 [3 7] user: 2 [1 3] cached: 1 [2] btree: 2 [1 2] user: 2 [2 4] cached: 1 [0] cached: 1 [4] journal: 2 [1 5] user: 1 [3] user: 2 [1 5] user: 2 [3 4] user: 2 [4 5] cached: 1 [1] cached: 1 [3] cached: 1 [5] journal: 2 [1 2] journal: 2 [2 5] btree: 2 [2 5] user: 2 [1 2] user: 2 [1 4] user: 2 [2 3] user: 2 [2 5] user: 2 [3 5] done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): alloc_read... done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): stripes_read... done bcachefs (3c0860e8-07ca-4276-8954-11c1774be868): snapshots_read... done Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix accounting read + device removalKent Overstreet
accounting read was checking if accounting replicas entries were marked in the superblock prior to applying accounting from the journal, which meant that a recently removed device could spuriously trigger a "not marked in superblocked" error (when journal entries zero out the offending counter). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: bch_accounting_modeKent Overstreet
Minor refactoring - replace multiple bool arguments with an enum; prep work for fixing a bug in accounting read. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: fix transaction restart handling in check_extents(), check_dirents()Kent Overstreet
Dealing with outside state within a btree transaction is always tricky. check_extents() and check_dirents() have to accumulate counters for i_sectors and i_nlink (for subdirectories). There were two bugs: - transaction commit may return a restart; therefore we have to commit before accumulating to those counters - get_inode_all_snapshots() may return a transaction restart, before updating w->last_pos; then, on the restart, check_i_sectors()/check_subdir_count() would see inodes that were not for w->last_pos Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: kill inode_walker_entry.seen_this_posKent Overstreet
dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix incorrect IS_ERR_OR_NULL usageKent Overstreet
Returning a positive integer instead of an error code causes error paths to become very confused. Closes: syzbot+c0360e8367d6d8d04a66@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: fix the memory leak in exception caseHongbo Li
The pointer clean points the memory allocated by kmemdup, when the return value of bch2_sb_clean_validate_late is not zero. The memory pointed by clean is leaked. So we should free it in this case. Fixes: a37ad1a3aba9 ("bcachefs: sb-clean.c") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: fast exit when darray_make_room failedHongbo Li
In downgrade_table_extra, the return value is needed. When it return failed, we should exit immediately. Fixes: 7773df19c35f ("bcachefs: metadata version bucket_stripe_sectors") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix iterator leak in check_subvol()Kent Overstreet
A couple small error handling fixes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Add snapshot to bch_inode_unpackedKent Overstreet
this allows for various cleanups in fsck Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: assign return error when iterating through layoutDiogo Jahchan Koike
syzbot reported a null ptr deref in __copy_user [0] In __bch2_read_super, when a corrupt backup superblock matches the default opts offset, no error is assigned to ret and the freed superblock gets through, possibly being assigned as the best sb in bch2_fs_open and being later dereferenced, causing a fault. Assign EINVALID to ret when iterating through layout. [0]: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876 Reported-by: syzbot+18a5c5e8a9c856944876@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=18a5c5e8a9c856944876 Signed-off-by: Diogo Jahchan Koike <djahchankoike@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix srcu warning in check_topologyKent Overstreet
check_topology doesn't need the srcu lock and doesn't use normal btree transactions - we can just drop the srcu lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Fix error path in check_dirent_inode_dirent()Kent Overstreet
fsck_err() jumps to the fsck_err label when bailing out; need to make sure bp_iter was initialized... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: memset bounce buffer portion to 0 after key_sort_fix_overlappingPiotr Zalewski
Zero-initialize part of allocated bounce buffer which wasn't touched by subsequent bch2_key_sort_fix_overlapping to mitigate later uinit-value use KMSAN bug[1]. After applying the patch reproducer still triggers stack overflow[2] but it seems unrelated to the uninit-value use warning. After further investigation it was found that stack overflow occurs because KMSAN adds too many function calls[3]. Backtrace of where the stack magic number gets smashed was added as a reply to syzkaller thread[3]. It was confirmed that task's stack magic number gets smashed after the code path where KSMAN detects uninit-value use is executed, so it can be assumed that it doesn't contribute in any way to uninit-value use detection. [1] https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718 [2] https://lore.kernel.org/lkml/66e57e46.050a0220.115905.0002.GAE@google.com [3] https://lore.kernel.org/all/rVaWgPULej8K7HqMPNIu8kVNyXNjjCiTB-QBtItLFBmk0alH6fV2tk4joVPk97Evnuv4ZRDd8HB5uDCkiFG6u81xKdzDj-KrtIMJSlF6Kt8=@proton.me Reported-by: syzbot+6f655a60d3244d0c6718@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6f655a60d3244d0c6718 Fixes: ec4edd7b9d20 ("bcachefs: Prep work for variable size btree node buffers") Suggested-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Improve bch2_is_inode_open() warning messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Add extra padding in bkey_make_mut_noupdate()Kent Overstreet
This fixes a kasan splat in propagate_key_to_snapshot_leaves() - varint_decode_fast() does reads (that it never uses) up to 7 bytes past the end of the integer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-27bcachefs: Mark inode errors as autofixKent Overstreet
Most or all errors will be autofix in the future, we're currently just doing the ones that we know are well tested. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-28Merge tag 'amd-drm-fixes-6.12-2024-09-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-fixes-6.12-2024-09-27: amdgpu: - MES 12 fix - KFD fence sync fix - SR-IOV fixes - VCN 4.0.6 fix - SDMA 7.x fix - Bump driver version to note cleared VRAM support - SWSMU fix amdgpu: - CU occupancy logic fix - SDMA queue fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927202819.2978109-1-alexander.deucher@amd.com
2024-09-28fbdev: sisfb: Fix strbuf array overflowAndrey Shumilin
The values of the variables xres and yres are placed in strbuf. These variables are obtained from strbuf1. The strbuf1 array contains digit characters and a space if the array contains non-digit characters. Then, when executing sprintf(strbuf, "%ux%ux8", xres, yres); more than 16 bytes will be written to strbuf. It is suggested to increase the size of the strbuf array to 24. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Andrey Shumilin <shum.sdl@nppct.ru> Signed-off-by: Helge Deller <deller@gmx.de>
2024-09-27Merge tag 'pm-6.12-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix idle states enumeration in the intel_idle driver on platforms supporting multiple flavors of the C6 idle state (Artem Bityutskiy)" * tag 'pm-6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_idle: fix ACPI _CST matching for newer Xeon platforms
2024-09-27sched_ext: Remove redundant p->nr_cpus_allowed checkerZhang Qiao
select_rq_task() already checked that 'p->nr_cpus_allowed > 1', 'p->nr_cpus_allowed == 1' checker in scx_select_cpu_dfl() is redundant. Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27sched_ext: Decouple locks in scx_ops_enable()Tejun Heo
The enable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and cpus_read_lock. Currently, the locks are grabbed together which is prone to locking order problems. For example, currently, there is a possible deadlock involving scx_fork_rwsem and cpus_read_lock. cpus_read_lock has to nest inside scx_fork_rwsem due to locking order existing in other subsystems. However, there exists a dependency in the other direction during hotplug if hotplug needs to fork a new task, which happens in some cases. This leads to the following deadlock: scx_ops_enable() hotplug percpu_down_write(&cpu_hotplug_lock) percpu_down_write(&scx_fork_rwsem) block on cpu_hotplug_lock kthread_create() waits for kthreadd kthreadd blocks on scx_fork_rwsem Note that this doesn't trigger lockdep because the hotplug side dependency bounces through kthreadd. With the preceding scx_cgroup_enabled change, this can be solved by decoupling cpus_read_lock, which is needed for static_key manipulations, from the other two locks. - Move the first block of static_key manipulations outside of scx_fork_rwsem and scx_cgroup_rwsem. This is now safe with the preceding scx_cgroup_enabled change. - Drop scx_cgroup_rwsem and scx_fork_rwsem between the two task iteration blocks so that __scx_ops_enabled static_key enabling is outside the two rwsems. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
2024-09-27sched_ext: Decouple locks in scx_ops_disable_workfn()Tejun Heo
The disable path uses three big locks - scx_fork_rwsem, scx_cgroup_rwsem and cpus_read_lock. Currently, the locks are grabbed together which is prone to locking order problems. With the preceding scx_cgroup_enabled change, we can decouple them: - As cgroup disabling no longer requires modifying a static_key which requires cpus_read_lock(), no need to grab cpus_read_lock() before grabbing scx_cgroup_rwsem. - cgroup can now be independently disabled before tasks are moved back to the fair class. Relocate scx_cgroup_exit() invocation before scx_fork_rwsem is grabbed, drop now unnecessary cpus_read_lock() and move static_key operations out of scx_fork_rwsem. This decouples all three locks in the disable path. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Link: http://lkml.kernel.org/r/8cd0ec0c4c7c1bc0119e61fbef0bee9d5e24022d.camel@linux.ibm.com
2024-09-27sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix ↵Tejun Heo
scx_tg_online() If the BPF scheduler does not implement ops.cgroup_init(), scx_tg_online() didn't set SCX_TG_INITED which meant that ops.cgroup_exit(), even if implemented, won't be called from scx_tg_offline(). This is because SCX_HAS_OP(cgroupt_init) is used to test both whether SCX cgroup operations are enabled and ops.cgroup_init() exists. Fix it by introducing a separate bool scx_cgroup_enabled to gate cgroup operations and use SCX_HAS_OP(cgroup_init) only to test whether ops.cgroup_init() exists. Make all cgroup operations consistently use scx_cgroup_enabled to test whether cgroup operations are enabled. scx_cgroup_enabled is added instead of using scx_enabled() to ease planned locking updates. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27sched_ext: Enable scx_ops_init_task() separatelyTejun Heo
scx_ops_init_task() and the follow-up scx_ops_enable_task() in the fork path were gated by scx_enabled() test and thus __scx_ops_enabled had to be turned on before the first scx_ops_init_task() loop in scx_ops_enable(). However, if an external entity causes sched_class switch before the loop is complete, tasks which are not initialized could be switched to SCX. The following can be reproduced by running a program which keeps toggling a process between SCHED_OTHER and SCHED_EXT using sched_setscheduler(2). sched_ext: Invalid task state transition 0 -> 3 for fish[1623] WARNING: CPU: 1 PID: 1650 at kernel/sched/ext.c:3392 scx_ops_enable_task+0x1a1/0x200 ... Sched_ext: simple (enabling) RIP: 0010:scx_ops_enable_task+0x1a1/0x200 ... switching_to_scx+0x13/0xa0 __sched_setscheduler+0x850/0xa50 do_sched_setscheduler+0x104/0x1c0 __x64_sys_sched_setscheduler+0x18/0x30 do_syscall_64+0x7b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fix it by gating scx_ops_init_task() separately using scx_ops_init_task_enabled. __scx_ops_enabled is now set after all tasks are finished with scx_ops_init_task(). Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable()Tejun Heo
scx_ops_enable() has two task iteration loops. The first one calls scx_ops_init_task() on every task and the latter switches the eligible ones into SCX. The first loop left the tasks in SCX_TASK_INIT state and then the second loop switched it into READY before switching the task into SCX. The distinction between INIT and READY is only meaningful in the fork path where it's used to tell whether the task finished forking so that we can tell ops.exit_task() accordingly. Leaving task in INIT state between the two loops is incosistent with the fork path and incorrect. The following can be triggered by running a program which keeps toggling a task between SCHED_OTHER and SCHED_SCX while enabling a task: sched_ext: Invalid task state transition 1 -> 3 for fish[1526] WARNING: CPU: 2 PID: 1615 at kernel/sched/ext.c:3393 scx_ops_enable_task+0x1a1/0x200 ... Sched_ext: qmap (enabling+all) RIP: 0010:scx_ops_enable_task+0x1a1/0x200 ... switching_to_scx+0x13/0xa0 __sched_setscheduler+0x850/0xa50 do_sched_setscheduler+0x104/0x1c0 __x64_sys_sched_setscheduler+0x18/0x30 do_syscall_64+0x7b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fix it by transitioning to READY in the first loop right after scx_ops_init_task() succeeds. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Vernet <void@manifault.com>
2024-09-27sched_ext: Initialize in bypass modeTejun Heo
scx_ops_enable() used preempt_disable() around the task iteration loop to switch tasks into SCX to guarantee forward progress of the task which is running scx_ops_enable(). However, in the gap between setting __scx_ops_enabled and preeempt_disable(), an external entity can put tasks including the enabling one into SCX prematurely, which can lead to malfunctions including stalls. The bypass mode can wrap the entire enabling operation and guarantee forward progress no matter what the BPF scheduler does. Use the bypass mode instead to guarantee forward progress while enabling. While at it, release and regrab scx_tasks_lock between the two task iteration locks in scx_ops_enable() for clarity as there is no reason to keep holding the lock between them. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27sched_ext: Remove SCX_OPS_PREPPINGTejun Heo
The distinction between SCX_OPS_PREPPING and SCX_OPS_ENABLING is not used anywhere and only adds confusion. Drop SCX_OPS_PREPPING. Signed-off-by: Tejun Heo <tj@kernel.org>
2024-09-27sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable()Tejun Heo
check_hotplug_seq() is used to detect CPU hotplug event which occurred while the BPF scheduler is being loaded so that initialization can be retried if CPU hotplug events take place before the CPU hotplug callbacks are online. As such, the best place to call it is in the same cpu_read_lock() section that enables the CPU hotplug ops. Currently, it is called in the next cpus_read_lock() block in scx_ops_enable(). The side effect of this placement is a small window in which hotplug sequence detection can trigger unnecessarily, which isn't critical. Move check_hotplug_seq() invocation to the same cpus_read_lock() block as the hotplug operation enablement to close the window and get the invocation out of the way for planned locking updates. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: David Vernet <void@manifault.com>
2024-09-27Merge tag 'uml-for-linus-6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Removal of dead code (TT mode leftovers, etc) - Fixes for the network vector driver - Fixes for time-travel mode * tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: fix time-travel syscall scheduling hack um: Remove outdated asm/sysrq.h header um: Remove the declaration of user_thread function um: Remove the call to SUBARCH_EXECVE1 macro um: Remove unused mm_fd field from mm_id um: Remove unused fields from thread_struct um: Remove the redundant newpage check in update_pte_range um: Remove unused kpte_clear_flush macro um: Remove obsoleted declaration for execute_syscall_skas user_mode_linux_howto_v2: add VDE vector support in doc vector_user: add VDE support um: remove ARCH_NO_PREEMPT_DYNAMIC um: vector: Fix NAPI budget handling um: vector: Replace locks guarding queue depth with atomics um: remove variable stack array in os_rcv_fd_msg()
2024-09-27ovl: fix file leak in ovl_real_fdget_meta()Amir Goldstein
ovl_open_realfile() is wrongly called twice after conversion to new struct fd. Fixes: 88a2f6468d01 ("struct fd: representation change") Reported-by: syzbot+d9efec94dcbfa0de1c07@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-27Merge tag 'random-6.12-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull more random number generator updates from Jason Donenfeld: - Christophe realized that the LoongArch64 instructions could be scheduled more similar to how GCC generates code, which Ruoyao implemented, for a 5% speedup from basically some rearrangements - An update to MAINTAINERS to match the right files * tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: LoongArch: vDSO: Tune chacha implementation MAINTAINERS: make vDSO getrandom matches more generic
2024-09-27Merge tag 'bitmap-for-6.12' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - switch all bitmamp APIs from inline to __always_inline (Brian Norris) The __always_inline series improves on code generation, and now with the latest compiler versions is required to avoid compilation warnings. It spent enough in my backlog, and I'm thankful to Brian Norris for taking over and moving it forward. - introduce GENMASK_U128() macro (Anshuman Khandual) GENMASK_U128() is a prerequisite needed for arm64 development * tag 'bitmap-for-6.12' of https://github.com/norov/linux: lib/test_bits.c: Add tests for GENMASK_U128() uapi: Define GENMASK_U128 nodemask: Switch from inline to __always_inline cpumask: Switch from inline to __always_inline bitmap: Switch from inline to __always_inline find: Switch from inline to __always_inline
2024-09-27Merge tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyoLinus Torvalds
Pull tomoyo updates from Tetsuo Handa: "One bugfix patch, one preparation patch, and one conversion patch. TOMOYO is useful as an analysis tool for learning how a Linux system works. My boss was hoping that SELinux's policy is generated from what TOMOYO has observed. A translated paper describing it is available at https://master.dl.sourceforge.net/project/tomoyo/docs/nsf2003-en.pdf/nsf2003-en.pdf?viasf=1 Although that attempt failed due to mapping problem between inode and pathname, TOMOYO remains as an access restriction tool due to ability to write custom policy by individuals. I was delivering pure LKM version of TOMOYO (named AKARI) to users who cannot afford rebuilding their distro kernels with TOMOYO enabled. But since the LSM framework was converted to static calls, it became more difficult to deliver AKARI to such users. Therefore, I decided to update TOMOYO so that people can use mostly LKM version of TOMOYO with minimal burden for both distributors and users" * tag 'tomoyo-pr-20240927' of git://git.code.sf.net/p/tomoyo/tomoyo: tomoyo: fallback to realpath if symlink's pathname does not exist tomoyo: allow building as a loadable LSM module tomoyo: preparation step for building as a loadable LSM module
2024-09-27Merge tag 'cxl-for-6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull compute express link (cxl) updates from Dave Jiang: "Major changes address HDM decoder initialization from DVSEC ranges, refactoring the code related to cxl mailboxes to be independent of the memory devices, and adding support for shared upstream link access_coordinate calculation, as well as a change to remove locking from memory notifier callback. In addition, a number of misc cleanups and refactoring of the code are also included. Address HDM decoder initialization from DVSEC ranges: - Only register non-zero DVSEC ranges - Remove duplicate implementation of waiting for memory_info_valid - Simplify the checking of mem_enabled in cxl_hdm_decode_init() Refactor the code related to cxl mailboxes to be independent of the memory devices: - Move cxl headers in include/linux/ to include/cxl - Move all mailbox related data to 'struct cxl_mailbox' - Refactor mailbox APIs with 'struct cxl_mailbox' as input instead of memory device state Add support for shared upstream link access_coordinate calculation for configurations that have multiple targets under a switch or a root port where the aggregated bandwidth can be greater than the upstream link of the switch/RP upstream link: - Preserve the CDAT access_coordinate from an endpoint - Add the support for shared upstream link access_coordinate calculation - Add documentation to explain how the calculations are done Remove locking from memory notifier callback. Misc cleanups: - Convert devm_cxl_add_root() to return using ERR_CAST() - cxl_test use dev_is_platform() instead of open coding - Remove duplicate include of header core.h in core/cdat.c - use scoped resource management to drop put_device() for cxl_port - Use scoped_guard to drop device_lock() for cxl_port - Refactor __devm_cxl_add_port() to drop gotos - Rename cxl_setup_parent_dport to cxl_dport_init_aer and cxl_dport_map_regs() to cxl_dport_map_ras() - Refactor cxl_dport_init_aer() to be more concise - Remove duplicate host_bridge->native_aer checking in cxl_dport_init_ras_reporting() - Fix comment for cxl_query_cmd()" * tag 'cxl-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (21 commits) cxl: Add documentation to explain the shared link bandwidth calculation cxl: Calculate region bandwidth of targets with shared upstream link cxl: Preserve the CDAT access_coordinate for an endpoint cxl: Fix comment regarding cxl_query_cmd() return data cxl: Convert cxl_internal_send_cmd() to use 'struct cxl_mailbox' as input cxl: Move mailbox related bits to the same context cxl: move cxl headers to new include/cxl/ directory cxl/region: Remove lock from memory notifier callback cxl/pci: simplify the check of mem_enabled in cxl_hdm_decode_init() cxl/pci: Check Mem_info_valid bit for each applicable DVSEC cxl/pci: Remove duplicated implementation of waiting for memory_info_valid cxl/pci: Fix to record only non-zero ranges cxl/pci: Remove duplicate host_bridge->native_aer checking cxl/pci: cxl_dport_map_rch_aer() cleanup cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs() cxl/port: Refactor __devm_cxl_add_port() to drop goto pattern cxl/port: Use scoped_guard()/guard() to drop device_lock() for cxl_port cxl/port: Use __free() to drop put_device() for cxl_port cxl: Remove duplicate included header file core.h tools/testing/cxl: Use dev_is_platform() ...
2024-09-27selftests: vDSO: align stack for O2-optimized memcpyJason A. Donenfeld
When switching on -O2, gcc generates SSE2 instructions that assume a 16-byte aligned stack, which the standalone test's start point wasn't aligning. Fix this with the usual alignment sequence. Fixes: ecb8bd70d51 ("selftests: vDSO: build tests with O2 optimization") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409241558.98e13f6f-oliver.sang@intel.com Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-27HID: bpf: fix cfi stubs for hid_bpf_opsBenjamin Tissoires
With the introduction of commit e42ac1418055 ("bpf: Check unsupported ops from the bpf_struct_ops's cfi_stubs"), a HID-BPF struct_ops containing a .hid_hw_request() or a .hid_hw_output_report() was failing to load as the cfi stubs were not defined. Fix that by defining those simple static functions and restore HID-BPF functionality. This was detected with the HID selftests suddenly failing on Linus' tree. Cc: stable@vger.kernel.org # v6.11+ Fixes: 9286675a2aed ("HID: bpf: add HID-BPF hooks for hid_hw_output_report") Fixes: 8bd0488b5ea5 ("HID: bpf: add HID-BPF hooks for hid_hw_raw_requests") Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.com>