summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-15selinux: prevent KMSAN warning in selinux_inet_conn_request()Andrew Kanner
KMSAN reports the following issue: [ 81.822503] ===================================================== [ 81.823222] BUG: KMSAN: uninit-value in selinux_inet_conn_request+0x2c8/0x4b0 [ 81.823891] selinux_inet_conn_request+0x2c8/0x4b0 [ 81.824385] security_inet_conn_request+0xc0/0x160 [ 81.824886] tcp_v4_route_req+0x30e/0x490 [ 81.825343] tcp_conn_request+0xdc8/0x3400 [ 81.825813] tcp_v4_conn_request+0x134/0x190 [ 81.826292] tcp_rcv_state_process+0x1f4/0x3b40 [ 81.826797] tcp_v4_do_rcv+0x9ca/0xc30 [ 81.827236] tcp_v4_rcv+0x3bf5/0x4180 [ 81.827670] ip_protocol_deliver_rcu+0x822/0x1230 [ 81.828174] ip_local_deliver_finish+0x259/0x370 [ 81.828667] ip_local_deliver+0x1c0/0x450 [ 81.829105] ip_sublist_rcv+0xdc1/0xf50 [ 81.829534] ip_list_rcv+0x72e/0x790 [ 81.829941] __netif_receive_skb_list_core+0x10d5/0x1180 [ 81.830499] netif_receive_skb_list_internal+0xc41/0x1190 [ 81.831064] napi_complete_done+0x2c4/0x8b0 [ 81.831532] e1000_clean+0x12bf/0x4d90 [ 81.831983] __napi_poll+0xa6/0x760 [ 81.832391] net_rx_action+0x84c/0x1550 [ 81.832831] __do_softirq+0x272/0xa6c [ 81.833239] __irq_exit_rcu+0xb7/0x1a0 [ 81.833654] irq_exit_rcu+0x17/0x40 [ 81.834044] common_interrupt+0x8d/0xa0 [ 81.834494] asm_common_interrupt+0x2b/0x40 [ 81.834949] default_idle+0x17/0x20 [ 81.835356] arch_cpu_idle+0xd/0x20 [ 81.835766] default_idle_call+0x43/0x70 [ 81.836210] do_idle+0x258/0x800 [ 81.836581] cpu_startup_entry+0x26/0x30 [ 81.837002] __pfx_ap_starting+0x0/0x10 [ 81.837444] secondary_startup_64_no_verify+0x17a/0x17b [ 81.837979] [ 81.838166] Local variable nlbl_type.i created at: [ 81.838596] selinux_inet_conn_request+0xe3/0x4b0 [ 81.839078] security_inet_conn_request+0xc0/0x160 KMSAN warning is reproducible with: * netlabel_mgmt_protocount is 0 (e.g. netlbl_enabled() returns 0) * CONFIG_SECURITY_NETWORK_XFRM may be set or not * CONFIG_KMSAN=y * `ssh USER@HOSTNAME /bin/date` selinux_skb_peerlbl_sid() will call selinux_xfrm_skb_sid(), then fall to selinux_netlbl_skbuff_getsid() which will not initialize nlbl_type, but it will be passed to: err = security_net_peersid_resolve(nlbl_sid, nlbl_type, xfrm_sid, sid); and checked by KMSAN, although it will not be used inside security_net_peersid_resolve() (at least now), since this function will check either (xfrm_sid == SECSID_NULL) or (nlbl_sid == SECSID_NULL) first and return before using uninitialized nlbl_type. Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> [PM: subject line tweak, removed 'fixes' tag as code is not broken] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09selinux: use unsigned iterator in nlmsgtab codeChristian Göttsche
Use an unsigned type as loop iterator. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09selinux: avoid implicit conversions in policydb codeChristian Göttsche
Use the identical type for local variables, e.g. loop counters. Declare members of struct policydb_compat_info unsigned to consistently use unsigned iterators. They hold read-only non-negative numbers in the global variable policydb_compat. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09selinux: avoid implicit conversions in selinuxfs codeChristian Göttsche
Use umode_t as parameter type for sel_make_inode(), which assigns the value to the member i_mode of struct inode. Use identical and unsigned types for loop iterators. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09selinux: make left shifts well definedChristian Göttsche
The loops upper bound represent the number of permissions used (for the current class or in general). The limit for this is 32, thus we might left shift of one less, 31. Shifting a base of 1 results in undefined behavior; use (u32)1 as base. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09selinux: update type for number of class permissions in services codeChristian Göttsche
Security classes have only up to 32 permissions, hence using an u16 is sufficient (while improving padding in struct selinux_mapping). Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09selinux: avoid implicit conversions in avtab codeChristian Göttsche
Return u32 from avtab_hash() instead of int, since the hashing is done on u32 and the result is used as an index on the hash array. Use the type of the limit in for loops. Avoid signed to unsigned conversion of multiplication result in avtab_hash_eval() and perform multiplication in destination type. Use unsigned loop iterator for index operations, to avoid sign extension. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-09selinux: revert SECINITSID_INIT supportPaul Moore
This commit reverts 5b0eea835d4e ("selinux: introduce an initial SID for early boot processes") as it was found to cause problems on distros with old SELinux userspace tools/libraries, specifically Ubuntu 16.04. Hopefully we will be able to re-add this functionality at a later date, but let's revert this for now to help ensure a stable and backwards compatible SELinux tree. Link: https://lore.kernel.org/selinux/87edkseqf8.fsf@mail.lhotse Acked-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-08selinux: use GFP_KERNEL while reading binary policyChristian Göttsche
Use GFP_KERNEL instead of GFP_ATOMIC while reading a binary policy in sens_read() and cat_read(), similar to surrounding code. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-08selinux: update comment on selinux_hooks[]Xiu Jianfeng
After commit f22f9aaf6c3d ("selinux: remove the runtime disable functionality"), the comment on selinux_hooks[] is out-of-date, remove the last paragraph about runtime disable functionality. Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-03selinux: avoid implicit conversions in services codeChristian Göttsche
Use u32 as the output parameter type in security_get_classes() and security_get_permissions(), based on the type of the symtab nprim member. Declare the read-only class string parameter of security_get_permissions() const. Avoid several implicit conversions by using the identical type for the destination. Use the type identical to the source for local variables. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: cleanup extra whitespace in subject] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-03selinux: avoid implicit conversions in mls codeChristian Göttsche
Use u32 for ebitmap bits and sensitivity levels, char for the default range of a class. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: description tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-08-03selinux: use identical iterator type in hashtab_duplicate()Christian Göttsche
Use the identical type u32 for the loop iterator. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: remove extra whitespace in subject] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-28selinux: move debug functions into debug configurationChristian Göttsche
avtab_hash_eval() and hashtab_stat() are only used in policydb.c when the configuration SECURITY_SELINUX_DEBUG is enabled. Move the function definitions under that configuration as well and provide empty definitions in case SECURITY_SELINUX_DEBUG is disabled, to avoid using #ifdef in the callers. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-28selinux: log about VM being executable by defaultChristian Göttsche
In case virtual memory is being marked as executable by default, SELinux checks regarding explicit potential dangerous use are disabled. Inform the user about it. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-20selinux: fix a 0/NULL mistmatch in ad_net_init_from_iif()Paul Moore
Use a NULL instead of a zero to resolve a int/pointer mismatch. Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307210332.4AqFZfzI-lkp@intel.com/ Fixes: dd51fcd42fd6 ("selinux: introduce and use lsm_ad_net_init*() helpers") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-20selinux: introduce SECURITY_SELINUX_DEBUG configurationChristian Göttsche
The policy database code contains several debug output statements related to hashtable utilization. Those are guarded by the macro DEBUG_HASHES, which is neither documented nor set anywhere. Introduce a new Kconfig configuration guarding this and potential other future debugging related code. Disable the setting by default. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: fixed line lengths in the help text] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: introduce and use lsm_ad_net_init*() helpersPaolo Abeni
Perf traces of network-related workload shows a measurable overhead inside the network-related selinux hooks while zeroing the lsm_network_audit struct. In most cases we can delay the initialization of such structure to the usage point, avoiding such overhead in a few cases. Additionally, the audit code accesses the IP address information only for AF_INET* families, and selinux_parse_skb() will fill-out the relevant fields in such cases. When the family field is zeroed or the initialization is followed by the mentioned parsing, the zeroing can be limited to the sk, family and netif fields. By factoring out the audit-data initialization to new helpers, this patch removes some duplicate code and gives small but measurable performance gain under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: update my email addressStephen Smalley
Update my email address; MAINTAINERS was updated some time ago. Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: add missing newlines in pr_err() statementsChristian Göttsche
The kernel print statements do not append an implicit newline to format strings. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: drop avtab_search()Christian Göttsche
avtab_search() shares the same logic with avtab_search_node(), except that it returns, if found, a pointer to the struct avtab_node member datum instead of the node itself. Since the member is an embedded struct, and not a pointer, the returned value of avtab_search() and avtab_search_node() will always in unison either be NULL or non-NULL. Drop avtab_search() and replace its calls by avtab_search_node() to deduplicate logic and adopt the only caller caring for the type of the returned value accordingly. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: de-brand SELinuxStephen Smalley
Change "NSA SELinux" to just "SELinux" in Kconfig help text and comments. While NSA was the original primary developer and continues to help maintain SELinux, SELinux has long since transitioned to a wide community of developers and maintainers. SELinux has been part of the mainline Linux kernel for nearly 20 years now [1] and has received contributions from many individuals and organizations. [1] https://lore.kernel.org/lkml/Pine.LNX.4.44.0308082228470.1852-100000@home.osdl.org/ Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions regarding enforcing statusChristian Göttsche
Use the type bool as parameter type in selinux_status_update_setenforce(). The related function enforcing_enabled() returns the type bool, while the struct selinux_kernel_status member enforcing uses an u32. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: fix implicit conversions in the symtabChristian Göttsche
hashtab_init() takes an u32 as size parameter type. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: use consistent type for AV rule specifierChristian Göttsche
The specifier for avtab keys is always supplied with a type of u16, either as a macro to security_compute_sid() or the member specified of the struct avtab_key. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions in the LSM hooksChristian Göttsche
Use the identical types in assignments of local variables for the destination. Merge tail calls into return statements. Avoid using leading underscores for function local variable. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions in the AVC codeChristian Göttsche
Use a consistent type of u32 for sequence numbers. Use a non-negative and input parameter matching type for the hash result. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions in the netif codeChristian Göttsche
Use the identical type sel_netif_hashfn() returns. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: consistently use u32 as sequence number type in the status codeChristian Göttsche
Align the type with the one used in selinux_notify_policy_change() and the sequence member of struct selinux_kernel_status. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid avtab overflowsChristian Göttsche
Prevent inserting more than the supported U32_MAX number of entries. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: check for multiplication overflow in put_entry()Christian Göttsche
The function is always inlined and most of the time both relevant arguments are compile time constants, allowing compilers to elide the check. Also the function is part of outputting the policy, which is not performance critical. Also convert the type of the third parameter into a size_t, since it should always be a non-negative number of elements. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-10selinux: introduce an initial SID for early boot processesOndrej Mosnacek
Currently, SELinux doesn't allow distinguishing between kernel threads and userspace processes that are started before the policy is first loaded - both get the label corresponding to the kernel SID. The only way a process that persists from early boot can get a meaningful label is by doing a voluntary dyntransition or re-executing itself. Reusing the kernel label for userspace processes is problematic for several reasons: 1. The kernel is considered to be a privileged domain and generally needs to have a wide range of permissions allowed to work correctly, which prevents the policy writer from effectively hardening against early boot processes that might remain running unintentionally after the policy is loaded (they represent a potential extra attack surface that should be mitigated). 2. Despite the kernel being treated as a privileged domain, the policy writer may want to impose certain special limitations on kernel threads that may conflict with the requirements of intentional early boot processes. For example, it is a good hardening practice to limit what executables the kernel can execute as usermode helpers and to confine the resulting usermode helper processes. However, a (legitimate) process surviving from early boot may need to execute a different set of executables. 3. As currently implemented, overlayfs remembers the security context of the process that created an overlayfs mount and uses it to bound subsequent operations on files using this context. If an overlayfs mount is created before the SELinux policy is loaded, these "mounter" checks are made against the kernel context, which may clash with restrictions on the kernel domain (see 2.). To resolve this, introduce a new initial SID (reusing the slot of the former "init" initial SID) that will be assigned to any userspace process started before the policy is first loaded. This is easy to do, as we can simply label any process that goes through the bprm_creds_for_exec LSM hook with the new init-SID instead of propagating the kernel SID from the parent. To provide backwards compatibility for existing policies that are unaware of this new semantic of the "init" initial SID, introduce a new policy capability "userspace_initial_context" and set the "init" SID to the same context as the "kernel" SID unless this capability is set by the policy. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-10selinux: cleanup the policycap accessor functionsPaul Moore
In the process of reverting back to directly accessing the global selinux_state pointer we left behind some artifacts in the selinux_policycap_XXX() helper functions. This patch cleans up some of that left-behind cruft. Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-09Linux 6.5-rc1v6.5-rc1Linus Torvalds
2023-07-09MAINTAINERS 2: Electric BoogalooLinus Torvalds
We just sorted the entries and fields last release, so just out of a perverse sense of curiosity, I decided to see if we can keep things ordered for even just one release. The answer is "No. No we cannot". I suggest that all kernel developers will need weekly training sessions, involving a lot of Big Bird and Sesame Street. And at the yearly maintainer summit, we will all sing the alphabet song together. I doubt I will keep doing this. At some point "perverse sense of curiosity" turns into just a cold dark place filled with sadness and despair. Repeats: 80e62bc8487b ("MAINTAINERS: re-sort all entries and fields") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-09Merge tag 'dma-mapping-6.5-2023-07-09' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - swiotlb area sizing fixes (Petr Tesarik) * tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: reduce the number of areas to match actual memory pool size swiotlb: always set the number of areas before allocating the pool
2023-07-09Merge tag 'irq_urgent_for_v6.5_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq update from Borislav Petkov: - Optimize IRQ domain's name assignment * tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqdomain: Use return value of strreplace()
2023-07-09Merge tag 'x86_urgent_for_v6.5_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu fix from Borislav Petkov: - Do FPU AP initialization on Xen PV too which got missed by the recent boot reordering work * tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/xen: Fix secondary processors' FPU initialization
2023-07-09Merge tag 'x86-core-2023-07-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A single fix for the mechanism to park CPUs with an INIT IPI. On shutdown or kexec, the kernel tries to park the non-boot CPUs with an INIT IPI. But the same code path is also used by the crash utility. If the CPU which panics is not the boot CPU then it sends an INIT IPI to the boot CPU which resets the machine. Prevent this by validating that the CPU which runs the stop mechanism is the boot CPU. If not, leave the other CPUs in HLT" * tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/smp: Don't send INIT to boot CPU
2023-07-09Merge tag 'mips_6.5_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - fixes for KVM - fix for loongson build and cpu probing - DT fixes * tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled MIPS: dts: add missing space before { MIPS: Loongson: Fix build error when make modules_install MIPS: KVM: Fix NULL pointer dereference MIPS: Loongson: Fix cpu_probe_loongson() again
2023-07-09Merge tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Darrick Wong: "Nothing exciting here, just getting rid of a gcc warning that I got tired of seeing when I turn on gcov" * tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix uninit warning in xfs_growfs_data
2023-07-09Merge tag '6.5-rc-smb3-client-fixes-part2' of ↵Linus Torvalds
git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - fix potential use after free in unmount - minor cleanup - add worker to cleanup stale directory leases * tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: Add a laundromat thread for cached directories smb: client: remove redundant pointer 'server' cifs: fix session state transition to avoid use-after-free issue
2023-07-09Merge tag 'ntb-6.5' of https://github.com/jonmason/ntbLinus Torvalds
Pull NTB updates from Jon Mason: "Fixes for pci_clean_master, error handling in driver inits, and various other issues/bugs" * tag 'ntb-6.5' of https://github.com/jonmason/ntb: ntb: hw: amd: Fix debugfs_create_dir error checking ntb.rst: Fix copy and paste error ntb_netdev: Fix module_init problem ntb: intel: Remove redundant pci_clear_master ntb: epf: Remove redundant pci_clear_master ntb_hw_amd: Remove redundant pci_clear_master ntb: idt: drop redundant pci_enable_pcie_error_reporting() MAINTAINERS: git://github -> https://github.com for jonmason NTB: EPF: fix possible memory leak in pci_vntb_probe() NTB: ntb_tool: Add check for devm_kcalloc NTB: ntb_transport: fix possible memory leak while device_register() fails ntb: intel: Fix error handling in intel_ntb_pci_driver_init() NTB: amd: Fix error handling in amd_ntb_pci_driver_init() ntb: idt: Fix error handling in idt_pci_driver_init()
2023-07-08mm: lock newly mapped VMA with corrected orderingHugh Dickins
Lockdep is certainly right to complain about (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f but task is already holding lock: (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db Invert those to the usual ordering. Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible") Cc: stable@vger.kernel.org Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "16 hotfixes. Six are cc:stable and the remainder address post-6.4 issues" The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since it was all hopefully fixed in mainline. * tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: lib: dhry: fix sleeping allocations inside non-preemptable section kasan, slub: fix HW_TAGS zeroing with slub_debug kasan: fix type cast in memory_is_poisoned_n mailmap: add entries for Heiko Stuebner mailmap: update manpage link bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page MAINTAINERS: add linux-next info mailmap: add Markus Schneider-Pargmann writeback: account the number of pages written back mm: call arch_swap_restore() from do_swap_page() squashfs: fix cache race with migration mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison docs: update ocfs2-devel mailing list address MAINTAINERS: update ocfs2-devel mailing list address mm: disable CONFIG_PER_VMA_LOCK until its fixed fork: lock VMAs of the parent process when forking
2023-07-08fork: lock VMAs of the parent process when forkingSuren Baghdasaryan
When forking a child process, the parent write-protects anonymous pages and COW-shares them with the child being forked using copy_present_pte(). We must not take any concurrent page faults on the source vma's as they are being processed, as we expect both the vma and the pte's behind it to be stable. For example, the anon_vma_fork() expects the parents vma->anon_vma to not change during the vma copy. A concurrent page fault on a page newly marked read-only by the page copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the source vma, defeating the anon_vma_clone() that wasn't done because the parent vma originally didn't have an anon_vma, but we now might end up copying a pte entry for a page that has one. Before the per-vma lock based changes, the mmap_lock guaranteed exclusion with concurrent page faults. But now we need to do a vma_start_write() to make sure no concurrent faults happen on this vma while it is being processed. This fix can potentially regress some fork-heavy workloads. Kernel build time did not show noticeable regression on a 56-core machine while a stress test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~5% regression. If such fork time regression is unacceptable, disabling CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations are possible if this regression proves to be problematic. Suggested-by: David Hildenbrand <david@redhat.com> Reported-by: Jiri Slaby <jirislaby@kernel.org> Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/ Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/ Reported-by: Jacob Young <jacobly.alt@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624 Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first") Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08mm: lock newly mapped VMA which can be modified after it becomes visibleSuren Baghdasaryan
mmap_region adds a newly created VMA into VMA tree and might modify it afterwards before dropping the mmap_lock. This poses a problem for page faults handled under per-VMA locks because they don't take the mmap_lock and can stumble on this VMA while it's still being modified. Currently this does not pose a problem since post-addition modifications are done only for file-backed VMAs, which are not handled under per-VMA lock. However, once support for handling file-backed page faults with per-VMA locks is added, this will become a race. Fix this by write-locking the VMA before inserting it into the VMA tree. Other places where a new VMA is added into VMA tree do not modify it after the insertion, so do not need the same locking. Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08mm: lock a vma before stack expansionSuren Baghdasaryan
With recent changes necessitating mmap_lock to be held for write while expanding a stack, per-VMA locks should follow the same rules and be write-locked to prevent page faults into the VMA being expanded. Add the necessary locking. Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "A few late arriving patches that missed the initial pull request. It's mostly bug fixes (the dt-bindings is a fix for the initial pull)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Remove unused function declaration scsi: target: docs: Remove tcm_mod_builder.py scsi: target: iblock: Quiet bool conversion warning with pr_preempt use scsi: dt-bindings: ufs: qcom: Fix ICE phandle scsi: core: Simplify scsi_cdl_check_cmd() scsi: isci: Fix comment typo scsi: smartpqi: Replace one-element arrays with flexible-array members scsi: target: tcmu: Replace strlcpy() with strscpy() scsi: ncr53c8xx: Replace strlcpy() with strscpy() scsi: lpfc: Fix lpfc_name struct packing
2023-07-08Merge tag 'i2c-for-6.5-rc1-part2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull more i2c updates from Wolfram Sang: - xiic patch should have been in the original pull but slipped through - mpc patch fixes a build regression - nomadik cleanup * tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mpc: Drop unused variable i2c: nomadik: Remove a useless call in the remove function i2c: xiic: Don't try to handle more interrupt events after error