path: root/security/selinux
Age | Commit message | Author
3 days | Merge tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull d_name audit update from Al Viro:
 "Simplifying ->d_name audits, easy part.

  Turn dentry->d_name into an anon union of const struct qstr (d_name itself) and a writable alias (__d_name).

  With constification of some struct qstr * arguments of functions that get &dentry->d_name passed to them, that ends up with all modifications provably done only in fs/dcache.c (and a fairly small part of it).

  Any new places doing modifications will be easy to find - grep for __d_name will suffice"

* tag 'pull-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make it easier to catch those who try to modify ->d_name
  generic_ci_validate_strict_name(): constify name argument
  afs_dir_search: constify qstr argument
  afs_edit_dir_{add,remove}(): constify qstr argument
  exfat_find(): constify qstr argument
  security_dentry_init_security(): constify qstr argument
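The anonymous-union arrangement described in this pull message can be sketched in userspace C. This is only an illustration: `struct qstr_sketch`, `struct dentry_sketch`, `name_len()` and `set_name()` are hypothetical names, and the real struct qstr and struct dentry carry more fields than shown here.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct qstr (assumption: only two fields). */
struct qstr_sketch {
	unsigned int len;
	const char *name;
};

/* Hypothetical dentry-like struct: one storage slot, two views of it. */
struct dentry_sketch {
	union {
		const struct qstr_sketch d_name; /* read-only view for all users */
		struct qstr_sketch __d_name;     /* writable alias for the owner */
	};
};

/* A consumer takes a const pointer, so it cannot modify the name. */
static unsigned int name_len(const struct qstr_sketch *q)
{
	return q->len;
}

/* The single writer goes through the __d_name alias. */
static void set_name(struct dentry_sketch *d, const char *s)
{
	d->__d_name.name = s;
	d->__d_name.len = (unsigned int)strlen(s);
}
```

With this layout a write such as `d->d_name.len = 0` is rejected by the compiler, so any remaining writers are the ones that mention `__d_name` by name, which is what makes the grep suggested above sufficient.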
6 days | Merge tag 'lsm-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm | Linus Torvalds
Pull lsm updates from Paul Moore:

 - Move the management of the LSM BPF security blobs into the framework

   In order to enable multiple LSMs we need to allocate and free the various security blobs in the LSM framework and not the individual LSMs as they would end up stepping all over each other.

 - Leverage the lsm_blob_alloc() helper in lsm_bdev_alloc()

   Make better use of our existing helper functions to reduce some code duplication.

 - Update the Rust cred code to use 'sync::aref'

   Part of a larger effort to move the Rust code over to the 'sync' module.

 - Make CONFIG_LSM dependent on CONFIG_SECURITY

   As the CONFIG_LSM Kconfig setting is an ordered list of the LSMs to enable at boot, it obviously doesn't make much sense to enable this when CONFIG_SECURITY is disabled.

 - Update the LSM and CREDENTIALS sections in MAINTAINERS with Rusty bits

   Add the Rust helper files to the associated LSM and CREDENTIALS entries in the MAINTAINERS file. We're trying to improve the communication between the two groups and making sure we're all aware of what is going on via cross-posting to the relevant lists is a good way to start.

* tag 'lsm-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
  lsm: CONFIG_LSM can depend on CONFIG_SECURITY
  MAINTAINERS: add the associated Rust helper to the CREDENTIALS section
  MAINTAINERS: add the associated Rust helper to the LSM section
  rust,cred: update AlwaysRefCounted import to sync::aref
  security: use umax() to improve code
  lsm,selinux: Add LSM blob support for BPF objects
  lsm: use lsm_blob_alloc() in lsm_bdev_alloc()
6 days | Merge tag 'selinux-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux | Linus Torvalds
Pull selinux updates from Paul Moore:

 - Support per-file labeling for functionfs

   Both genfscon and user defined labeling methods are supported. This should help users who want to provide separation between the control endpoint file, "ep0", and other endpoints.

 - Remove our use of get_zeroed_page() in sel_read_bool()

   Update sel_read_bool() to use a four byte stack buffer instead of a memory page fetched via get_zeroed_page(), and fix a memory leak in the process.

   Needless to say we should have done this a long time ago, but it was in a very old chunk of code that "just worked" and I don't think anyone had taken a real look at it in many years.

 - Better use of the netdev skb/sock helper functions

   Convert a sk_to_full_sk(skb->sk) into a skb_to_full_sk(skb) call.

 - Remove some old, dead, and/or redundant code

* tag 'selinux-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: enable per-file labeling for functionfs
  selinux: fix sel_read_bool() allocation and error handling
  selinux: Remove redundant __GFP_NOWARN
  selinux: use a consistent method to get full socket from skb
  selinux: Remove unused function selinux_policycap_netif_wildcard()
6 days | Merge tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit | Linus Torvalds
Pull audit updates from Paul Moore:

 - Proper audit support for multiple LSMs

   As the audit subsystem predated the work to enable multiple LSMs, some additional work was needed to support logging the different LSM labels for the subjects/tasks and objects on the system. Casey's patches add new auxiliary records for subjects and objects that convey the additional labels.

 - Ensure fanotify audit events are always generated

   Generally speaking security relevant subsystems always generate audit events, unless explicitly ignored. However, up to this point fanotify events had been ignored by default, but starting with this pull request fanotify follows convention and generates audit events by default.

 - Replace an instance of strcpy() with strscpy()

 - Minor indentation, style, and comment fixes

* tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix skb leak when audit rate limit is exceeded
  audit: init ab->skb_list earlier in audit_buffer_alloc()
  audit: add record for multiple object contexts
  audit: add record for multiple task security contexts
  lsm: security_lsmblob_to_secctx module selection
  audit: create audit_stamp structure
  audit: add a missing tab
  audit: record fanotify event regardless of presence of rules
  audit: fix typo in auditfilter.c comment
  audit: Replace deprecated strcpy() with strscpy()
  audit: fix indentation in audit_log_exit()
2025-09-15 | security_dentry_init_security(): constify qstr argument | Al Viro
Nothing outside of fs/dcache.c has any business modifying dentry names; passing &dentry->d_name as an argument should have that argument declared as a const pointer. Acked-by: Casey Schaufler <casey@schaufler-ca.com> # smack part Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-09-07 | selinux: enable per-file labeling for functionfs | Neill Kapron
This patch adds support for genfscon per-file labeling of functionfs files as well as support for userspace to apply labels after new functionfs endpoints are created. This allows for separate labels and therefore access control on a per-endpoint basis. An example use case would be for the default endpoint EP0 used as a restricted control endpoint, and additional usb endpoints to be used by other more permissive domains. It should be noted that if there are multiple functionfs mounts on a system, genfs file labels will apply to all mounts, and therefore will not likely be as useful as the userspace relabeling portion of this patch - the addition to selinux_is_genfs_special_handling(). This patch introduces the functionfs_seclabel policycap to maintain existing functionfs genfscon behavior unless explicitly enabled. Signed-off-by: Neill Kapron <nkapron@google.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: trim changelog, apply boolean logic fixup] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-09-03 | selinux: fix sel_read_bool() allocation and error handling | Stephen Smalley
Switch sel_read_bool() from using get_zeroed_page() and free_page() to a stack-allocated buffer. This also fixes a memory leak in the error path when security_get_bool_value() returns an error. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
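The pattern behind this fix can be sketched in userspace C: once the two-flag reply is formatted into a small stack buffer, the early error return no longer has an allocation to release. The helper names below (`get_bool_value_sketch()`, `read_bool_sketch()`) are hypothetical stand-ins and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for security_get_bool_value(): negative is an error. */
static int get_bool_value_sketch(int index)
{
	if (index < 0)
		return -1;
	return index & 1;
}

/*
 * Format "<current> <pending>" into a four byte stack buffer ("0 1" plus
 * the terminating NUL) and copy it to the caller. Nothing is allocated,
 * so the error path below has nothing to free.
 */
static int read_bool_sketch(int index, int pending, char *out, size_t outlen)
{
	char buf[4];
	int cur, len;

	cur = get_bool_value_sketch(index);
	if (cur < 0)
		return cur; /* no page or heap memory to leak here */

	len = snprintf(buf, sizeof(buf), "%d %d", !!cur, !!pending);
	if (len < 0 || (size_t)len >= sizeof(buf) || (size_t)len >= outlen)
		return -1;
	memcpy(out, buf, (size_t)len + 1);
	return len;
}
```

A caller passes a buffer of at least four bytes and receives the string length on success or a negative value on failure.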
2025-09-01 | copy_process: pass clone_flags as u64 across calltree | Simon Schuster
With the introduction of clone3 in commit 7f192e3cd316 ("fork: add clone3") the effective bit width of clone_flags on all architectures was increased from 32-bit to 64-bit, with a new type of u64 for the flags. However, for most consumers of clone_flags the interface was not changed from the previous type of unsigned long. While this works fine as long as none of the new 64-bit flag bits (CLONE_CLEAR_SIGHAND and CLONE_INTO_CGROUP) are evaluated, this is still undesirable in terms of the principle of least surprise. Thus, this commit fixes all relevant interfaces of callees to sys_clone3/copy_process (excluding the architecture-specific copy_thread) to consistently pass clone_flags as u64, so that no truncation to 32-bit integers occurs on 32-bit architectures. Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Link: https://lore.kernel.org/20250901-nios2-implement-clone3-v2-2-53fcf5577d57@siemens-energy.com Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
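The truncation hazard described above can be reproduced in a few lines of userspace C. The constant mirrors the uapi value of CLONE_INTO_CGROUP; the two helper functions are hypothetical and only model a callee with a 32-bit flags parameter (unsigned long on a 32-bit architecture) versus one taking u64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the uapi CLONE_INTO_CGROUP flag, which lives above bit 31. */
#define SKETCH_CLONE_INTO_CGROUP 0x200000000ULL

/* Callee with a 32-bit parameter: the upper flag bits never arrive. */
static bool wants_cgroup_narrow(uint32_t clone_flags)
{
	return (clone_flags & SKETCH_CLONE_INTO_CGROUP) != 0;
}

/* Callee with a 64-bit parameter: the flag survives the call. */
static bool wants_cgroup_wide(uint64_t clone_flags)
{
	return (clone_flags & SKETCH_CLONE_INTO_CGROUP) != 0;
}
```

Passing the same 64-bit value to both helpers shows the difference: the narrow variant silently reports the flag as unset.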
2025-08-30 | audit: add record for multiple object contexts | Casey Schaufler
Create a new audit record AUDIT_MAC_OBJ_CONTEXTS. An example of the MAC_OBJ_CONTEXTS record is:

  type=MAC_OBJ_CONTEXTS msg=audit(1601152467.009:1050): obj_selinux=unconfined_u:object_r:user_home_t:s0

When an audit event includes an AUDIT_MAC_OBJ_CONTEXTS record the "obj=" field in other records in the event will be "obj=?". An AUDIT_MAC_OBJ_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on an object security context.

Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj tweak, audit example readability indents]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-30 | audit: add record for multiple task security contexts | Casey Schaufler
Replace the single skb pointer in an audit_buffer with a list of skb pointers. Add the audit_stamp information to the audit_buffer as there's no guarantee that there will be an audit_context containing the stamp associated with the event. At audit_log_end() time create the auxiliary records that have been added to the list. Functions are created to manage the skb list in the audit_buffer.

Create a new audit record AUDIT_MAC_TASK_CONTEXTS. An example of the MAC_TASK_CONTEXTS record is:

  type=MAC_TASK_CONTEXTS msg=audit(1600880931.832:113) subj_apparmor=unconfined subj_smack=_

When an audit event includes an AUDIT_MAC_TASK_CONTEXTS record the "subj=" field in other records in the event will be "subj=?". An AUDIT_MAC_TASK_CONTEXTS record is supplied when the system has multiple security modules that may make access decisions based on a subject security context.

Refactor audit_log_task_context(), creating a new audit_log_subj_ctx(). This is used in netlabel auditing to provide multiple subject security contexts as necessary.

Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
[PM: subj tweak, audit example readability indents]
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-12 | selinux: Remove redundant __GFP_NOWARN | Qianfeng Rong
Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: fixed horizontal spacing / alignment, line wraps] Signed-off-by: Paul Moore <paul@paul-moore.com>
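Why the explicit flag became redundant is plain bit arithmetic: OR-ing a bit into a mask that already contains it changes nothing. The sketch below uses made-up bit values and hypothetical `SKETCH_*` names; the kernel's real gfp bits are defined differently.

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for illustration. */
#define SKETCH_KSWAPD_RECLAIM 0x1u
#define SKETCH_NOWARN         0x2u

/* Models the composite flag after the referenced change: NOWARN is built in. */
#define SKETCH_NOWAIT (SKETCH_KSWAPD_RECLAIM | SKETCH_NOWARN)

/* Call-site style before the cleanup. */
static unsigned int old_style_flags(void)
{
	return SKETCH_NOWAIT | SKETCH_NOWARN;
}

/* Call-site style after the cleanup. */
static unsigned int new_style_flags(void)
{
	return SKETCH_NOWAIT;
}
```

Both helpers return the same mask, which is the sense in which the commit makes no functional change.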
2025-08-11 | lsm,selinux: Add LSM blob support for BPF objects | Blaise Boscaccy
This patch introduces LSM blob support for BPF maps, programs, and tokens to enable LSM stacking and multiplexing of LSM modules that govern BPF objects. Additionally, the existing BPF hooks used by SELinux have been updated to utilize the new blob infrastructure, removing the assumption of exclusive ownership of the security pointer. Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com> [PM: dropped local variable init, style fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11 | selinux: use a consistent method to get full socket from skb | Tianjia Zhang
In order to maintain code consistency and readability, skb_to_full_sk() is used to get full socket from skb. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11 | selinux: Remove unused function selinux_policycap_netif_wildcard() | Yue Haibing
This is unused since commit a3d3043ef24a ("selinux: get netif_wildcard policycap from policy instead of cache"). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-07-28 | Merge tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux | Linus Torvalds
Pull selinux updates from Paul Moore:

 - Introduce the concept of a SELinux "neveraudit" type which prevents all auditing of the given type/domain.

   Taken by itself, the benefit of marking a SELinux domain with the "neveraudit" tag is likely not very interesting, especially given the significant overlap with the "dontaudit" tag.

   However, given that the "neveraudit" tag applies to *all* auditing of the tagged domain, we can do some fairly interesting optimizations when a SELinux domain is marked as both "permissive" and "neveraudit" (think of the unconfined_t domain).

   While this pull request includes optimized inode permission and getattr hooks, these optimizations require SELinux policy changes, therefore the improvements may not be visible on standard downstream Linux distros for a period of time.

 - Continue the deprecation process of /sys/fs/selinux/user.

   After removing the associated userspace code in 2020, we marked the /sys/fs/selinux/user interface as deprecated in Linux v6.13 with pr_warn() and the usual documentation update.

   This adds a five second sleep after the pr_warn(), following a previous deprecation process pattern that has worked well for us in the past in helping identify any existing users that we haven't yet reached.

 - Add a __GFP_NOWARN flag to our initial hash table allocation.

   Fuzzers such as syzbot often attempt abnormally large SELinux policy loads, which the SELinux code gracefully handles by checking for allocation failures, but not before the allocator emits a warning which causes the automated fuzzing to flag this as an error and report it to the list. While we want to continue to support the work done by the fuzzing teams, we want to focus on proper issues and not an error case that is already handled safely. Add a NOWARN flag to quiet the allocator and prevent syzbot from tripping on this again.

 - Remove some unnecessary selinuxfs cleanup code, courtesy of Al.

 - Update the SELinux in-kernel documentation with pointers to additional information.

* tag 'selinux-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: don't bother with selinuxfs_info_free() on failures
  selinux: add __GFP_NOWARN to hashtab_init() allocations
  selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive
  selinux: introduce neveraudit types
  documentation: add links to SELinux resources
  selinux: add a 5 second sleep to /sys/fs/selinux/user
2025-07-28 | Merge tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds
Pull fileattr updates from Christian Brauner:
 "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions.

  Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR ioctls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations.

  These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects.

  XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit the project ID set on the parent directory.

  The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files.

  In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore"

* tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fs: tighten a sanity check in file_attr_to_fileattr()
  tree-wide: s/struct fileattr/struct file_kattr/g
  fs: introduce file_getattr and file_setattr syscalls
  fs: prepare for extending file_get/setattr()
  fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP
  selinux: implement inode_file_[g|s]etattr hooks
  lsm: introduce new hooks for setting/getting inode fsxattr
  fs: split fileattr related helpers into separate file
2025-07-04 | tree-wide: s/struct fileattr/struct file_kattr/g | Christian Brauner
Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-01 | selinux: implement inode_file_[g|s]etattr hooks | Andrey Albershteyn
These hooks are called on inode extended attribute retrieval/change. Cc: selinux@vger.kernel.org Cc: Paul Moore <paul@paul-moore.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Andrey Albershteyn <aalbersh@kernel.org> Link: https://lore.kernel.org/20250630-xattrat-syscall-v6-3-c4e3bc35227b@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-24 | selinux: don't bother with selinuxfs_info_free() on failures | Al Viro
Failures in sel_fill_super() will be followed by sel_kill_sb(), which will call selinuxfs_info_free() anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Christian Brauner <brauner@kernel.org> [PM: subj and description tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 | selinux: add __GFP_NOWARN to hashtab_init() allocations | Paul Moore
As reported by syzbot, hashtab_init() can be affected by abnormally large policy loads which would cause the kernel's allocator to emit a warning in some configurations. Since the SELinux hashtab_init() code handles the case where the allocation fails, due to a large request or some other reason, we can safely add the __GFP_NOWARN flag to squelch these abnormally large allocation warnings. Reported-by: syzbot+bc2c99c2929c3d219fb3@syzkaller.appspotmail.com Tested-by: syzbot+bc2c99c2929c3d219fb3@syzkaller.appspotmail.com Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 | selinux: optimize selinux_inode_getattr/permission() based on neveraudit|permissive | Stephen Smalley
Extend the task avdcache to also cache whether the task SID is both permissive and neveraudit, and return immediately if so in both selinux_inode_getattr() and selinux_inode_permission().

The same approach could be applied to many of the hook functions although the avdcache would need to be updated for more than directory search checks in order for this optimization to be beneficial for checks on objects other than directories.

To test, apply https://github.com/SELinuxProject/selinux/pull/473 to your selinux userspace, build and install libsepol, and use the following CIL policy module:

  $ cat neverauditpermissive.cil
  (typeneveraudit unconfined_t)
  (typepermissive unconfined_t)

Without this module inserted, running the following commands:

  perf record make -jN # on an already built allmodconfig tree
  perf report --sort=symbol,dso

yields the following percentages (only showing __d_lookup_rcu for reference and only showing relevant SELinux functions):

  1.65% [k] __d_lookup_rcu
  0.53% [k] selinux_inode_permission
  0.40% [k] selinux_inode_getattr
  0.15% [k] avc_lookup
  0.05% [k] avc_has_perm
  0.05% [k] avc_has_perm_noaudit
  0.02% [k] avc_policy_seqno
  0.02% [k] selinux_file_permission
  0.01% [k] selinux_inode_alloc_security
  0.01% [k] selinux_file_alloc_security

for a total of 1.24% for SELinux compared to 1.65% for __d_lookup_rcu().

After running the following command to insert this module:

  semodule -i neverauditpermissive.cil

and then re-running the same perf commands from above yields the following non-zero percentages:

  1.74% [k] __d_lookup_rcu
  0.31% [k] selinux_inode_permission
  0.03% [k] selinux_inode_getattr
  0.03% [k] avc_policy_seqno
  0.01% [k] avc_lookup
  0.01% [k] selinux_file_permission
  0.01% [k] selinux_file_open

for a total of 0.40% for SELinux compared to 1.74% for __d_lookup_rcu().

Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 | selinux: introduce neveraudit types | Stephen Smalley
Introduce neveraudit types i.e. types that should never trigger audit messages. This allows the AVC to skip all audit-related processing for such types. Note that neveraudit differs from dontaudit not only wrt being applied for all checks with a given source type but also in that it disables all auditing, not just permission denials.

When a type is both a permissive type and a neveraudit type, the security server can short-circuit the security_compute_av() logic, allowing all permissions and not auditing any permissions.

This change just introduces the basic support but does not yet further optimize the AVC or hook function logic when a type is both a permissive type and a neveraudit type.

Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19 | selinux: change security_compute_sid to return the ssid or tsid on match | Stephen Smalley
If the end result of a security_compute_sid() computation matches the ssid or tsid, return that SID rather than looking it up again. This avoids the problem of multiple initial SIDs that map to the same context. Cc: stable@vger.kernel.org Reported-by: Guido Trentalancia <guido@trentalancia.com> Fixes: ae254858ce07 ("selinux: introduce an initial SID for early boot processes") Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Tested-by: Guido Trentalancia <guido@trentalancia.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-16 | selinux: fix selinux_xfrm_alloc_user() to set correct ctx_len | Stephen Smalley
We should count the terminating NUL byte as part of the ctx_len. Otherwise, UBSAN logs a warning: UBSAN: array-index-out-of-bounds in security/selinux/xfrm.c:99:14 index 60 is out of range for type 'char [*]' The allocation itself is correct so there is no actual out of bounds indexing, just a warning. Cc: stable@vger.kernel.org Suggested-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/selinux/CAEjxPJ6tA5+LxsGfOJokzdPeRomBHjKLBVR6zbrg+_w3ZZbM3A@mail.gmail.com/ Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
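The length accounting can be sketched in userspace C, assuming (as the UBSAN report suggests) that the bounds checker treats ctx_len as the element count of the trailing array. `struct ctx_sketch` and `ctx_alloc_sketch()` are hypothetical names, not the kernel's structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct ctx_sketch {
	unsigned short ctx_len; /* bytes in ctx_str, including the NUL */
	char ctx_str[];         /* flexible array member */
};

/* Allocate a context whose recorded length covers the terminator too. */
static struct ctx_sketch *ctx_alloc_sketch(const char *label)
{
	size_t str_len = strlen(label);
	struct ctx_sketch *ctx = malloc(sizeof(*ctx) + str_len + 1);

	if (!ctx)
		return NULL;
	ctx->ctx_len = (unsigned short)(str_len + 1);
	memcpy(ctx->ctx_str, label, str_len);
	/* Index str_len is below ctx_len, so the write is inside the counted range. */
	ctx->ctx_str[str_len] = '\0';
	return ctx;
}
```

If ctx_len were set to str_len instead, the allocation would still be large enough, but the NUL store would land at an index equal to the recorded count, which matches the warning quoted above.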
2025-06-16 | selinux: add a 5 second sleep to /sys/fs/selinux/user | Paul Moore
Commit d7b6918e22c7 ("selinux: Deprecate /sys/fs/selinux/user") started the deprecation process for /sys/fs/selinux/user: The selinuxfs "user" node allows userspace to request a list of security contexts that can be reached for a given SELinux user from a given starting context. This was used by libselinux when various login-style programs requested contexts for users, but libselinux stopped using it in 2020. Kernel support will be removed no sooner than Dec 2025. A pr_warn() message has been in place since Linux v6.13, this patch adds a five second sleep to /sys/fs/selinux/user to help make the deprecation and upcoming removal more noticeable. Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-05-28 | Merge tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next | Linus Torvalds
Pull networking updates from Paolo Abeni: "Core: - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is prep work to remove the current per-CPU lock in local_bh_disable() on PREEMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter: - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF: - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols: - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum throughput on 200Gbs link by over 60%. 
- Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API: - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling: - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers: - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. 
Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the steering table handling to significantly reduce the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. 
- WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature" * tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits) selftests/bpf: Fix bpf selftest build warning selftests: netfilter: Fix skip of wildcard interface test net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames net: openvswitch: Fix the dead loop of MPLS parse calipso: Don't call calipso functions for AF_INET sk. 
selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem net_sched: hfsc: Address reentrant enqueue adding class to eltree twice octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback octeontx2-pf: QOS: Perform cache sync on send queue teardown net: mana: Add support for Multi Vports on Bare metal net: devmem: ncdevmem: remove unused variable net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add ipv4 support net: devmem: preserve sockc_err page_pool: fix ugly page_pool formatting net: devmem: move list_add to net_devmem_bind_dmabuf. selftests: netfilter: nft_queue.sh: include file transfer duration in log message net: phy: mscc: Fix memory leak when using one step timestamping ...
2025-05-28Merge tag 'selinux-pr-20250527' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Reduce the SELinux impact on path walks. Add a small directory access cache to the per-task SELinux state. This cache allows SELinux to cache the most recently used directory access decisions in order to avoid repeatedly querying the AVC on path walks where the majority of the directories have similar security contexts/labels. My performance measurements are crude, but prior to this patch the time spent in SELinux code on a 'make allmodconfig' run was 103% that of __d_lookup_rcu(), and with this patch the time spent in SELinux code dropped to 63% of __d_lookup_rcu(), a ~40% improvement. Additional improvements can be expected in the future, but those will require additional SELinux policy/toolchain support. - Add support for wildcards in genfscon policy statements. This patch allows for wildcards in the genfscon path matching logic as opposed to the prefix matching that was used prior to this change. Adding wildcard support allows for more expressive and efficient path matching in the policy which is especially helpful for sysfs, and has resulted in a ~15% boot time reduction in Android. SELinux policies can opt into wildcard matching by using the "genfs_seclabel_wildcard" policy capability. - Unify the error/OOM handling of the SELinux network caches. A failure to allocate memory for the SELinux network caches isn't fatal as the object label can still be safely returned to the caller, it simply means that we cannot add the new data to the cache, at least temporarily. This patch corrects this behavior for the InfiniBand cache and does some minor cleanup. - Minor improvements around constification, 'likely' annotations, and removal of bogus comments.
* tag 'selinux-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix the kdoc header for task_avdcache_update selinux: remove a duplicated include selinux: reduce path walk overhead selinux: support wildcard match in genfscon selinux: drop copy-paste comment selinux: unify OOM handling in network hashtables selinux: add likely hints for fast paths selinux: contify network namespace pointer selinux: constify network address pointer
2025-04-12selinux: fix the kdoc header for task_avdcache_updatePaul Moore
The kdoc header incorrectly references an older parameter, update it to reference what is currently used in the function. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504122308.Ch8PzJdD-lkp@intel.com/ Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-12selinux: remove a duplicated includePaul Moore
The "linux/parser.h" header was included twice, we only need it once. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504121945.Q0GDD0sG-lkp@intel.com/ Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-11net: Retire DCCP socket.Kuniyuki Iwashima
DCCP was orphaned in 2021 by commit 054c4610bd05 ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"), which noted that the last maintainer had been inactive for five years. In recent years, it has become a playground for syzbot, and most changes to DCCP have been odd bug fixes triggered by syzbot. Apart from that, the only changes have been driven by treewide or networking API updates or adjustments related to TCP. Thus, in 2023, we announced we would remove DCCP in 2025 via commit b144fcaf46d4 ("dccp: Print deprecation notice."). Since then, only one individual has contacted the netdev mailing list. [0] There is ongoing research for Multipath DCCP. The repository is hosted on GitHub [1], and development is not taking place through the upstream community. While the repository is published under the GPLv2 license, the scheduling part remains proprietary, with a LICENSE file [2] stating: "This is not Open Source software." The researcher mentioned a plan to address the licensing issue, upstream the patches, and step up as a maintainer, but there has been no further communication since then. Maintaining DCCP for a decade without any real users has become a burden. Therefore, it's time to remove it. Removing DCCP will also provide significant benefits to TCP. It allows us to freely reorganize the layout of struct inet_connection_sock, which is currently shared with DCCP, and optimize it to reduce the number of cachelines accessed in the TCP fast path. Note that we keep DCCP netfilter modules as requested. 
[3] Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0] Link: https://github.com/telekom/mp-dccp #[1] Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2] Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux) Acked-by: Casey Schaufler <casey@schaufler-ca.com> Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-11selinux: reduce path walk overheadPaul Moore
Reduce the SELinux performance overhead during path walks through the use of a per-task directory access cache and some minor code optimizations.

The directory access cache is per-task because it allows for a lockless cache while also fitting well with a common application pattern of heavily accessing a relatively small number of SELinux directory labels. The cache is inherited by child processes when the child runs with the same SELinux domain as the parent, and invalidated on changes to the task's SELinux domain or the loaded SELinux policy. A cache of four entries was chosen based on testing with the Fedora "targeted" policy, a SELinux Reference Policy variant, and 'make allmodconfig' on Linux v6.14.

Code optimizations include better use of inline functions to reduce function calls in the common case, especially in the inode revalidation code paths, and elimination of redundant checks between the LSM and SELinux layers.

As mentioned briefly above, aside from general use and regression testing with the selinux-testsuite, performance was measured using 'make allmodconfig' with Linux v6.14 as a base reference. As expected, there were variations from one test run to another, but the measurements below are a good representation of the test results seen on my test system.

  * Linux v6.14
    REF
      1.26%  [k] __d_lookup_rcu
    SELINUX (1.31%)
      0.58%  [k] selinux_inode_permission
      0.29%  [k] avc_lookup
      0.25%  [k] avc_has_perm_noaudit
      0.19%  [k] __inode_security_revalidate

  * Linux v6.14 + patch
    REF
      1.41%  [k] __d_lookup_rcu
    SELINUX (0.89%)
      0.65%  [k] selinux_inode_permission
      0.15%  [k] avc_lookup
      0.05%  [k] avc_has_perm_noaudit
      0.04%  [k] avc_policy_seqno
      X.XX%  [k] __inode_security_revalidate (now inline)

In both cases the __d_lookup_rcu() function was used as a reference point to establish a context for the SELinux related functions. On an unpatched Linux v6.14 system we see the time spent in the combined SELinux functions exceeded that of __d_lookup_rcu(), 1.31% compared to 1.26%. However, with this patch applied the time spent in the combined SELinux functions dropped to roughly 65% of the time spent in __d_lookup_rcu(), 0.89% compared to 1.41%. Aside from the significant decrease in time spent in the SELinux AVC, it appears that any additional time spent searching and updating the cache is offset by other code improvements, e.g. time spent in selinux_inode_permission() + __inode_security_revalidate() + avc_policy_seqno() is less on the patched kernel than the unpatched kernel.

It is worth noting that in this patch the use of the per-task cache is limited to the security_inode_permission() LSM callback, selinux_inode_permission(), but future work could expand the cache into inode_has_perm(), likely through consolidation of the two functions. While this would likely have little to no impact on path walks, it may benefit other operations.

Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
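The per-task directory access cache described in the "selinux: reduce path walk overhead" commit can be illustrated with a small userspace sketch. This is not the kernel implementation: the struct names, field names, round-robin replacement, and function names below are all hypothetical, and only the four-entry size and the two invalidation conditions (domain change, policy reload) come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DIR_CACHE_SIZE 4 /* four entries, as stated in the commit message */

/* Hypothetical entry: directory label (SID) -> cached allowed permissions. */
struct dir_cache_entry {
	uint32_t dir_sid;
	uint32_t allowed;
	bool valid;
};

/* Hypothetical per-task state; only the owning task touches it, so no locks. */
struct task_dir_cache {
	uint32_t task_sid;     /* task domain the entries were computed for */
	uint32_t policy_seqno; /* policy generation the entries belong to */
	unsigned int next;     /* round-robin replacement slot (an assumption) */
	struct dir_cache_entry slot[DIR_CACHE_SIZE];
};

/* Drop every entry and bind the cache to a domain and policy generation. */
static void dir_cache_reset(struct task_dir_cache *c, uint32_t task_sid,
			    uint32_t seqno)
{
	memset(c, 0, sizeof(*c));
	c->task_sid = task_sid;
	c->policy_seqno = seqno;
}

/* Return true on a hit and store the cached decision in *allowed. */
static bool dir_cache_lookup(struct task_dir_cache *c, uint32_t task_sid,
			     uint32_t seqno, uint32_t dir_sid,
			     uint32_t *allowed)
{
	assert(c != NULL && allowed != NULL);

	/* A domain change or policy reload invalidates the whole cache. */
	if (c->task_sid != task_sid || c->policy_seqno != seqno) {
		dir_cache_reset(c, task_sid, seqno);
		return false;
	}
	for (unsigned int i = 0; i < DIR_CACHE_SIZE; i++) {
		if (c->slot[i].valid && c->slot[i].dir_sid == dir_sid) {
			*allowed = c->slot[i].allowed;
			return true;
		}
	}
	return false;
}

/* Record a decision obtained from the slow path (the AVC in the kernel). */
static void dir_cache_insert(struct task_dir_cache *c, uint32_t dir_sid,
			     uint32_t allowed)
{
	struct dir_cache_entry *e = &c->slot[c->next];

	e->dir_sid = dir_sid;
	e->allowed = allowed;
	e->valid = true;
	c->next = (c->next + 1) % DIR_CACHE_SIZE;
}
```

A caller would try dir_cache_lookup() first and fall back to the full access check only on a miss, inserting the result afterwards. Because the state belongs to a single task, no locking is needed, which is the property the commit message highlights.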
2025-04-11selinux: support wildcard match in genfsconTakaya Saeki
Currently, genfscon only supports string prefix match to label files. Thus, labeling numerous dynamic sysfs entries requires many specific path rules. For example, labeling device paths such as `/sys/devices/pci0000:00/0000:00:03.1/<...>/0000:04:00.1/wakeup` requires listing all specific PCI paths, which is challenging to maintain. While user-space restorecon can handle these paths with regular expression rules, relabeling thousands of paths under sysfs after it is mounted is inefficient compared to using genfscon. This commit adds wildcard matching to genfscon to make rules more efficient and expressive. This new behavior is enabled by genfs_seclabel_wildcard capability. With this capability, genfscon does wildcard matching instead of prefix matching. When multiple wildcard rules match against a path, then the longest rule (determined by the length of the rule string) will be applied. If multiple rules of the same length match, the first matching rule encountered in the given genfscon policy will be applied. Users are encouraged to write longer, more explicit path rules to avoid relying on this behavior. This change resulted in nice real-world performance improvements. For example, boot times on test Android devices were reduced by 15%. This improvement is due to the elimination of the "restorecon -R /sys" step during boot, which takes more than two seconds in the worst case. Signed-off-by: Takaya Saeki <takayas@chromium.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
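The rule selection described in the "selinux: support wildcard match in genfscon" commit (longest matching rule wins, first rule wins among equal lengths) can be sketched in userspace C. The matcher, the struct, the labels, and the rule strings below are hypothetical illustrations; in particular, the assumptions that '*' may span '/' and that a rule must match the whole path belong to this sketch, not to the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical glob matcher: '*' matches any run of characters
 * (including '/'), '?' matches exactly one character.
 */
static int wildcard_match(const char *pat, const char *str)
{
	const char *star = NULL;  /* position of the last '*' seen in pat */
	const char *retry = NULL; /* where in str that '*' started matching */

	assert(pat != NULL && str != NULL);
	while (*str) {
		if (*pat == '*') {
			star = pat++;
			retry = str;
		} else if (*pat == '?' || *pat == *str) {
			pat++;
			str++;
		} else if (star) {
			/* Backtrack: let the last '*' absorb one more character. */
			pat = star + 1;
			str = ++retry;
		} else {
			return 0;
		}
	}
	while (*pat == '*')
		pat++;
	return *pat == '\0';
}

/* Hypothetical rule: a path pattern and the label it assigns. */
struct genfs_rule {
	const char *path;
	const char *label;
};

/*
 * Pick the longest matching rule. The strict '>' comparison keeps the
 * first rule encountered when several matches have the same length.
 */
static const struct genfs_rule *pick_rule(const struct genfs_rule *rules,
					  size_t n, const char *path)
{
	const struct genfs_rule *best = NULL;
	size_t best_len = 0;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(rules[i].path);

		if (!wildcard_match(rules[i].path, path))
			continue;
		if (!best || len > best_len) {
			best = &rules[i];
			best_len = len;
		}
	}
	return best;
}
```

With rules such as "/devices/*" and "/devices/*/wakeup", a wakeup path under any PCI device selects the second, longer rule, so no per-device path needs to be listed.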
2025-04-11selinux: drop copy-paste commentChristian Göttsche
Port labeling is based on port number and protocol (TCP/UDP/...) but not based on network family (IPv4/IPv6). Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-11selinux: unify OOM handling in network hashtablesChristian Göttsche
For network objects, like interfaces, nodes, port and InfiniBands, the object to SID lookup is cached in hashtables. OOM during such hashtable additions of new objects is considered non-fatal and the computed SID is simply returned without adding the computed result into the hash table. Actually ignore OOM in the InfiniBand code, despite the comment already suggesting to do so. This reverts commit c350f8bea271 ("selinux: Fix error return code in sel_ib_pkey_sid_slow()"). Add comments in the other places. Use kmalloc() instead of kzalloc(), since all members are initialized on success and the data is only used in internal hash tables, so no risk of information leakage to userspace. Fixes: c350f8bea271 ("selinux: Fix error return code in sel_ib_pkey_sid_slow()") Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
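The non-fatal OOM behaviour described in the "selinux: unify OOM handling in network hashtables" commit can be sketched in userspace as follows. Everything here is hypothetical: the table layout, the compute_port_sid() stand-in for the policy lookup, and the cache_alloc hook used for fault injection are illustration only. The point is only that a failed allocation skips the cache insert while the computed SID is still returned successfully.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PORT_HASH_SIZE 16

/* Hypothetical cache node: port number -> SID. */
struct port_node {
	uint16_t port;
	uint32_t sid;
	struct port_node *next;
};

static struct port_node *port_hash[PORT_HASH_SIZE];

/* Allocator hook so a caller can inject failures; defaults to malloc(). */
static void *(*cache_alloc)(size_t size) = malloc;

/* Fault-injection allocator that always fails. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/* Stand-in for the slow policy lookup (hypothetical mapping). */
static uint32_t compute_port_sid(uint16_t port)
{
	return 1000u + port;
}

/* Report whether a port currently has a cache entry. */
static int port_is_cached(uint16_t port)
{
	const struct port_node *n;

	for (n = port_hash[port % PORT_HASH_SIZE]; n; n = n->next)
		if (n->port == port)
			return 1;
	return 0;
}

/* Look up the SID for a port; always returns 0 in this sketch. */
static int port_sid(uint16_t port, uint32_t *sid)
{
	struct port_node **bucket = &port_hash[port % PORT_HASH_SIZE];
	struct port_node *n;

	assert(sid != NULL);

	/* Fast path: the SID is already cached. */
	for (n = *bucket; n; n = n->next) {
		if (n->port == port) {
			*sid = n->sid;
			return 0;
		}
	}

	/* Slow path: compute the SID, then try to cache it. */
	*sid = compute_port_sid(port);
	n = cache_alloc(sizeof(*n));
	if (n) {
		/* Every member is set, so zeroed memory is not required. */
		n->port = port;
		n->sid = *sid;
		n->next = *bucket;
		*bucket = n;
	}
	/* Allocation failure is ignored: the result is simply not cached. */
	return 0;
}
```

Setting cache_alloc to failing_alloc before a lookup shows the intended behaviour: port_sid() still succeeds with the computed SID, and port_is_cached() reports that nothing was stored.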
2025-04-11selinux: add likely hints for fast pathsChristian Göttsche
In the network hashtable lookup code add likely() compiler hints in the fast path, like already done in sel_netif_sid(). Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
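As a rough illustration of the "selinux: add likely hints for fast paths" commit, the sketch below marks a cache hit as the expected branch. The likely()/unlikely() macros use the GCC/Clang __builtin_expect builtin; the table layout, slow_path_sid(), and netif_sid() are hypothetical names for this example and are not the kernel's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hints: map to __builtin_expect where available, else no-ops. */
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical cache entry: interface index -> SID. */
struct netif_entry {
	int ifindex;
	uint32_t sid;
};

/* Linear search stands in for the hashtable lookup. */
static const struct netif_entry *table_find(const struct netif_entry *table,
					    size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].ifindex == ifindex)
			return &table[i];
	return NULL;
}

/* Stand-in for the slow policy lookup (hypothetical mapping). */
static uint32_t slow_path_sid(int ifindex)
{
	return 5000u + (uint32_t)ifindex;
}

static uint32_t netif_sid(const struct netif_entry *table, size_t n,
			  int ifindex)
{
	const struct netif_entry *e;

	assert(table != NULL || n == 0);
	e = table_find(table, n, ifindex);
	/* Cache hits are the common case, so hint the compiler accordingly. */
	if (likely(e != NULL))
		return e->sid;
	return slow_path_sid(ifindex);
}
```

The hint does not change the result of either branch; it only tells the compiler which path to lay out as the fall-through case.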
2025-04-11selinux: contify network namespace pointerChristian Göttsche
The network namespace is not modified. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-11selinux: constify network address pointerChristian Göttsche
The network address, either an IPv4 or IPv6 one, is not modified. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-08Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFSNeilBrown
try_lookup_noperm() and d_hash_and_lookup() are nearly identical. The former does some validation of the name where the latter doesn't. Outside of the VFS that validation is likely valuable, and having only one exported function for this task is certainly a good idea. So make d_hash_and_lookup() local to VFS files and change all other callers to try_lookup_noperm(). Note that the arguments are swapped. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250319031545.2999807-6-neil@brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-01Merge tag 'driver-core-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core updates for 6.15-rc1. Lots of stuff happened this development cycle, including: - kernfs scaling changes to make it even faster thanks to rcu - bin_attribute constify work in many subsystems - faux bus minor tweaks for the rust bindings - rust binding updates for driver core, pci, and platform busses, making more functionality available to rust drivers. These are all due to people actually trying to use the bindings that were in 6.14. - make Rafael and Danilo full co-maintainers of the driver core codebase - other minor fixes and updates" * tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits) rust: platform: require Send for Driver trait implementers rust: pci: require Send for Driver trait implementers rust: platform: impl Send + Sync for platform::Device rust: pci: impl Send + Sync for pci::Device rust: platform: fix unrestricted &mut platform::Device rust: pci: fix unrestricted &mut pci::Device rust: device: implement device context marker rust: pci: use to_result() in enable_device_mem() MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers rust/kernel/faux: mark Registration methods inline driver core: faux: only create the device if probe() succeeds rust/faux: Add missing parent argument to Registration::new() rust/faux: Drop #[repr(transparent)] from faux::Registration rust: io: fix devres test with new io accessor functions rust: io: rename `io::Io` accessors kernfs: Move dput() outside of the RCU section. efi: rci2: mark bin_attribute as __ro_after_init rapidio: constify 'struct bin_attribute' firmware: qemu_fw_cfg: constify 'struct bin_attribute' powerpc/perf/hv-24x7: Constify 'struct bin_attribute' ...
2025-03-30Merge tag 'bpf-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "For this merge window we're splitting BPF pull request into three for higher visibility: main changes, res_spin_lock, try_alloc_pages. These are the main BPF changes: - Add DFA-based live registers analysis to improve verification of programs with loops (Eduard Zingerman) - Introduce load_acquire and store_release BPF instructions and add x86, arm64 JIT support (Peilin Ye) - Fix loop detection logic in the verifier (Eduard Zingerman) - Drop unnecessary lock in bpf_map_inc_not_zero() (Eric Dumazet) - Add kfunc for populating cpumask bits (Emil Tsalapatis) - Convert various shell based tests to selftests/bpf/test_progs format (Bastien Curutchet) - Allow passing referenced kptrs into struct_ops callbacks (Amery Hung) - Add a flag to LSM bpf hook to facilitate bpf program signing (Blaise Boscaccy) - Track arena arguments in kfuncs (Ihor Solodrai) - Add copy_remote_vm_str() helper for reading strings from remote VM and bpf_copy_from_user_task_str() kfunc (Jordan Rome) - Add support for timed may_goto instruction (Kumar Kartikeya Dwivedi) - Allow bpf_get_netns_cookie() in cgroup_skb programs (Mahe Tardy) - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup local storage (Martin KaFai Lau) - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko) - Allow retrieving BTF data with BTF token (Mykyta Yatsenko) - Add BPF kfuncs to set and get xattrs with 'security.bpf.' 
prefix (Song Liu) - Reject attaching programs to noreturn functions (Yafang Shao) - Introduce pre-order traversal of cgroup bpf programs (Yonghong Song)" * tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits) selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid bpf: Fix out-of-bounds read in check_atomic_load/store() libbpf: Add namespace for errstr making it libbpf_errstr bpf: Add struct_ops context information to struct bpf_prog_aux selftests/bpf: Sanitize pointer prior fclose() selftests/bpf: Migrate test_xdp_vlan.sh into test_progs selftests/bpf: test_xdp_vlan: Rename BPF sections bpf: clarify a misleading verifier error message selftests/bpf: Add selftest for attaching fexit to __noreturn functions bpf: Reject attaching fexit/fmod_ret to __noreturn functions bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage bpf: Make perf_event_read_output accessible in all program types. bpftool: Using the right format specifiers bpftool: Add -Wformat-signedness flag to detect format errors selftests/bpf: Test freplace from user namespace libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID bpf: Return prog btf_id without capable check bpf: BPF token support for BPF_BTF_GET_FD_BY_ID bpf, x86: Fix objtool warning for timed may_goto bpf: Check map->record at the beginning of check_and_free_fields() ...
2025-03-25Merge tag 'selinux-pr-20250323' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add additional SELinux access controls for kernel file reads/loads The SELinux kernel file read/load access controls were never updated beyond the initial kernel module support, this pull request adds support for firmware, kexec, policies, and x.509 certificates. - Add support for wildcards in network interface names There are a number of userspace tools which auto-generate network interface names using some pattern of <XXXX>-<NN> where <XXXX> is a fixed string, e.g. "podman", and <NN> is an increasing counter. Supporting wildcards in the SELinux policy for network interfaces simplifies the policy associated with these interfaces. - Fix a potential problem in the kernel read file SELinux code SELinux should always check the file label in the security_kernel_read_file() LSM hook, regardless of if the file is being read in chunks. Unfortunately, the existing code only considered the file label on the first chunk; this pull request fixes this problem. There is more detail in the individual commit, but thankfully the existing code didn't expose a bug due to multi-stage reads only taking place in one driver, and that driver loading a file type that isn't targeted by the SELinux policy. - Fix the subshell error handling in the example policy loader Minor fix to SELinux example policy loader in scripts/selinux due to an undesired interaction with subshells and errexit. * tag 'selinux-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: get netif_wildcard policycap from policy instead of cache selinux: support wildcard network interface names selinux: Chain up tool resolving errors in install_policy.sh selinux: add permission checks for loading other kinds of kernel files selinux: always check the file label in selinux_kernel_read_file() selinux: fix spelling error
2025-03-25Merge tag 'lsm-pr-20250323' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Various minor updates to the LSM Rust bindings Changes include marking trivial Rust bindings as inlines and comment tweaks to better reflect the LSM hooks. - Add LSM/SELinux access controls to io_uring_allowed() Similar to the io_uring_disabled sysctl, add a LSM hook to io_uring_allowed() to enable LSMs a simple way to enforce security policy on the use of io_uring. This pull request includes SELinux support for this new control using the io_uring/allowed permission. - Remove an unused parameter from the security_perf_event_open() hook The perf_event_attr struct parameter was not used by any currently supported LSMs, remove it from the hook. - Add an explicit MAINTAINERS entry for the credentials code We've seen problems in the past where patches to the credentials code sent by non-maintainers would often languish on the lists for multiple months as there was no one explicitly tasked with the responsibility of reviewing and/or merging credentials related code. Considering that most of the code under security/ has a vested interest in ensuring that the credentials code is well maintained, I'm volunteering to look after the credentials code and Serge Hallyn has also volunteered to step up as an official reviewer. I posted the MAINTAINERS update as a RFC to LKML in hopes that someone else would jump up with an "I'll do it!", but beyond Serge it was all crickets. - Update Stephen Smalley's old email address to prevent confusion This includes a corresponding update to the mailmap file. 
* tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: mailmap: map Stephen Smalley's old email addresses lsm: remove old email address for Stephen Smalley MAINTAINERS: add Serge Hallyn as a credentials reviewer MAINTAINERS: add an explicit credentials entry cred,rust: mark Credential methods inline lsm,rust: reword "destroy" -> "release" in SecurityCtx lsm,rust: mark SecurityCtx methods inline perf: Remove unnecessary parameter of security check lsm: fix a missing security_uring_allowed() prototype io_uring,lsm,selinux: add LSM hooks for io_uring_setup() io_uring: refactor io_uring_allowed()
2025-03-17selinux: get netif_wildcard policycap from policy instead of cacheChristian Göttsche
Retrieve the netif_wildcard policy capability in security_netif_sid() from the locked active policy instead of the cached value in selinux_state. Fixes: 8af43b61c17e ("selinux: support wildcard network interface names") Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: /netlabel/netif/ due to a typo in the description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-03-15security: Propagate caller information in bpf hooksBlaise Boscaccy
Certain bpf syscall subcommands are available for usage from both userspace and the kernel. LSM modules or eBPF gatekeeper programs may need to take a different course of action depending on whether or not a BPF syscall originated from the kernel or userspace. Additionally, some of the bpf_attr struct fields contain pointers to arbitrary memory. Currently the functionality to determine whether or not a pointer refers to kernel memory or userspace memory is exposed to the bpf verifier, but that information is missing from various LSM hooks. Here we augment the LSM hooks to provide this data, by simply passing a boolean flag indicating whether or not the call originated in the kernel, in any hook that contains a bpf_attr struct that corresponds to a subcommand that may be called from the kernel. Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/r/20250310221737.821889-2-bboscaccy@linux.microsoft.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-03-07selinux: support wildcard network interface namesChristian Göttsche
Add support for wildcard matching of network interface names. This is useful for auto-generated interfaces, for example podman creates network interfaces for containers with the naming scheme podman0, podman1, podman2, ... To maintain backward compatibility guard this feature with a new policy capability 'netif_wildcard'. Netifcon definitions are compared against in the order given by the policy, so userspace tools should sort them in a reasonable order. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-27selinux: add FILE__WATCH_MOUNTNSMiklos Szeredi
Watching mount namespaces for changes (mount, umount, move mount) was added by previous patches. This patch adds the file/watch_mountns permission that can be applied to nsfs files (/proc/$$/ns/mnt), making it possible to allow or deny watching a particular namespace for changes. Suggested-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/all/CAHC9VhTOmCjCSE2H0zwPOmpFopheexVb6jyovz92ZtpKtoVv6A@mail.gmail.com/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250224154836.958915-1-mszeredi@redhat.com Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26selinux: add permission checks for loading other kinds of kernel files"Kipp N. Davis"
Although the LSM hooks for loading kernel modules were later generalized to cover loading other kinds of files, SELinux didn't implement corresponding permission checks, leaving only the module case covered. Define and add new permission checks for these other cases. Signed-off-by: Cameron K. Williams <ckwilliams.work@gmail.com> Signed-off-by: Kipp N. Davis <kippndavis.work@gmx.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: merge fuzz, line length, and spacing fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-26perf: Remove unnecessary parameter of security checkLuo Gengkun
It seems that the attr parameter has never been used in security checks since it was first introduced by: commit da97e18458fb ("perf_event: Add support for LSM and SELinux checks") so remove it. Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-15kernfs: Use RCU to access kernfs_node::name.Sebastian Andrzej Siewior
Using RCU lifetime rules to access kernfs_node::name can avoid the trouble with kernfs_rename_lock in kernfs_name() and kernfs_path_from_node() if the fs was created with KERNFS_ROOT_INVARIANT_PARENT. This is useful as it allows kernfs_path_from_node() to be implemented with only RCU protection, avoiding kernfs_rename_lock. The lock is only required if the __parent node can be changed and the function requires an unchanged hierarchy while it iterates from the node to its parent. The change is needed to allow the lookup of the node's path (kernfs_path_from_node()) from contexts which always run with preemption and/or interrupts disabled, even on PREEMPT_RT. The problem is that kernfs_rename_lock becomes a sleeping lock on PREEMPT_RT. I went through all ::name users and added the required access for the lookup with a few extensions: - rdtgroup_pseudo_lock_create() drops all locks and then uses the name later on. resctrl supports rename with different parents. Here I made a temporary copy of the name while it is used outside of the lock. - kernfs_rename_ns() accepts NULL as new_parent. This simplifies sysfs_move_dir_ns() where it can set NULL in order to reuse the current name. - kernfs_rename_ns() is only using kernfs_rename_lock if the parents are different. All users use either kernfs_rwsem (for stable path view) or just RCU for the lookup. The ::name is always freed via RCU. Use RCU lifetime guarantees to access kernfs_node::name. Suggested-by: Tejun Heo <tj@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Reported-by: syzbot+6ea37e2e6ffccf41a7e6@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/67251dc6.050a0220.529b6.015e.GAE@google.com/ Reported-by: Hillf Danton <hdanton@sina.com> Closes: https://lore.kernel.org/20241102001224.2789-1-hdanton@sina.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20250213145023.2820193-7-bigeasy@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-07io_uring,lsm,selinux: add LSM hooks for io_uring_setup()Hamza Mahfooz
It is desirable to allow LSMs to configure accessibility to io_uring because it is a coarse yet very simple way to restrict access to it. So, add an LSM hook to io_uring_allowed() to guard access to io_uring. Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Acked-by: Jens Axboe <axboe@kernel.dk> [PM: merge fuzz due to changes in preceding patches, subj tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>