Age | Commit message (Collapse) | Author |
|
Use the BIT() and BIT_ULL() macros in the new audit code instead of
explicit shifts to improve readability. Use bitmask instead of modulo
operation to simplify code.
Add test_range1_rand15() and test_range2_rand15() KUnit tests to improve
get_id_range() coverage.
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20250512093732.1408485-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This reverts commit f5c68a4e84f9feca3be578199ec648b676db2030.
It is again possible to build "allmodconfig" with the randstruct GCC
plugin, so enable it for COMPILE_TEST to catch future bugs.
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
A KUnit test checking boundaries triggers a canary warning, which may be
disturbing. Let's remove this test for now. Hopefully, KUnit will soon
get support for suppressing warning backtraces [1].
Cc: Alessandro Carminati <acarmina@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Günther Noack <gnoack@google.com>
Reported-by: Tingmao Wang <m@maowtm.org>
Closes: https://lore.kernel.org/r/20250327213807.12964-1-m@maowtm.org
Link: https://lore.kernel.org/r/20250425193249.78b45d2589575c15f483c3d8@linux-foundation.org [1]
Link: https://lore.kernel.org/r/20250503065359.3625407-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc5).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
gcc-12 and higher support the -ftrivial-auto-var-init= flag, after
gcc-8 is the minimum version, this is half of the supported ones, and
the vast majority of the versions that users are actually likely to
have, so it seems like a good time to stop having the fallback
plugin implementation
Older toolchains are still able to build kernels normally without
this plugin, but won't be able to use variable initialization..
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The amount of memory allocated at kexec load, even with the extra memory
allocated, might not be large enough for the entire measurement list. The
indeterminate interval between kexec 'load' and 'execute' could exacerbate
this problem.
Define two new IMA events, 'kexec_load' and 'kexec_execute', to be
measured as critical data at kexec 'load' and 'execute' respectively.
Report the allocated kexec segment size, IMA binary log size and the
runtime measurements count as part of those events.
These events, and the values reported through them, serve as markers in
the IMA log to verify the IMA events are captured during kexec soft
reboot. The presence of a 'kexec_load' event in between the last two
'boot_aggregate' events in the IMA log implies this is a kexec soft
reboot, and not a cold-boot. And the absence of 'kexec_execute' event
after kexec soft reboot implies missing events in that window which
results in inconsistency with TPM PCR quotes, necessitating a cold boot
for a successful remote attestation.
These critical data events are displayed as hex encoded ascii in the
ascii_runtime_measurement_list. Verifying the critical data hash requires
calculating the hash of the decoded ascii string.
For example, to verify the 'kexec_load' data hash:
sudo cat /sys/kernel/security/integrity/ima/ascii_runtime_measurements
| grep kexec_load | cut -d' ' -f 6 | xxd -r -p | sha256sum
To verify the 'kexec_execute' data hash:
sudo cat /sys/kernel/security/integrity/ima/ascii_runtime_measurements
| grep kexec_execute | cut -d' ' -f 6 | xxd -r -p | sha256sum
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
The extra memory allocated for carrying the IMA measurement list across
kexec is hard-coded as half a PAGE. Make it configurable.
Define a Kconfig option, IMA_KEXEC_EXTRA_MEMORY_KB, to configure the
extra memory (in kb) to be allocated for IMA measurements added during
kexec soft reboot. Ensure the default value of the option is set such
that extra half a page of memory for additional measurements is allocated
for the additional measurements.
Update ima_add_kexec_buffer() function to allocate memory based on the
Kconfig option value, rather than the currently hard-coded one.
Suggested-by: Stefan Berger <stefanb@linux.ibm.com>
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
kexec 'load' may be called multiple times. Free and realloc the buffer
only if the segment_size is changed from the previous kexec 'load' call.
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
The IMA log is currently copied to the new kernel during kexec 'load' using
ima_dump_measurement_list(). However, the IMA measurement list copied at
kexec 'load' may result in loss of IMA measurements records that only
occurred after the kexec 'load'. Move the IMA measurement list log copy
from kexec 'load' to 'execute'
Make the kexec_segment_size variable a local static variable within the
file, so it can be accessed during both kexec 'load' and 'execute'.
Define kexec_post_load() as a wrapper for calling ima_kexec_post_load() and
machine_kexec_post_load(). Replace the existing direct call to
machine_kexec_post_load() with kexec_post_load().
When there is insufficient memory to copy all the measurement logs, copy as
much of the measurement list as possible.
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
The IMA log is currently copied to the new kernel during kexec 'load'
using ima_dump_measurement_list(). However, the log copied at kexec
'load' may result in loss of IMA measurements that only occurred after
kexec "load'. Setup the needed infrastructure to move the IMA log copy
from kexec 'load' to 'execute'.
Define a new IMA hook ima_update_kexec_buffer() as a stub function.
It will be used to call ima_dump_measurement_list() during kexec 'execute'.
Implement ima_kexec_post_load() function to be invoked after the new
Kernel image has been loaded for kexec. ima_kexec_post_load() maps the
IMA buffer to a segment in the newly loaded Kernel. It also registers
the reboot notifier_block to trigger ima_update_kexec_buffer() at
kexec 'execute'.
Set the priority of register_reboot_notifier to INT_MIN to ensure that the
IMA log copy operation will happen at the end of the operation chain, so
that all the IMA measurement records extended into the TPM are copied
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
Currently, the function kexec_calculate_store_digests() calculates and
stores the digest of the segment during the kexec_file_load syscall,
where the IMA segment is also allocated.
Later, the IMA segment will be updated with the measurement log at the
kexec execute stage when a kexec reboot is initiated. Therefore, the
digests should be updated for the IMA segment in the normal case. The
problem is that the content of memory segments carried over to the new
kernel during the kexec systemcall can be changed at kexec 'execute'
stage, but the size and the location of the memory segments cannot be
changed at kexec 'execute' stage.
To address this, skip the calculation and storage of the digest for the
IMA segment in kexec_calculate_store_digests() so that it is not added
to the purgatory_sha_regions.
With this change, the IMA segment is not included in the digest
calculation, storage, and verification.
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
[zohar@linux.ibm.com: Fixed Signed-off-by tag to match author's email ]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
In the current implementation, the ima_dump_measurement_list() API is
called during the kexec "load" phase, where a buffer is allocated and
the measurement records are copied. Due to this, new events added after
kexec load but before kexec execute are not carried over to the new kernel
during kexec operation
Carrying the IMA measurement list across kexec requires allocating a
buffer and copying the measurement records. Separate allocating the
buffer and copying the measurement records into separate functions in
order to allocate the buffer at kexec 'load' and copy the measurements
at kexec 'execute'.
After moving the vfree() here at this stage in the patch set, the IMA
measurement list fails to verify when doing two consecutive "kexec -s -l"
with/without a "kexec -s -u" in between. Only after "ima: kexec: move
IMA log copy from kexec load to execute" the IMA measurement list verifies
properly with the vfree() here.
Co-developed-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
Before making the function local seq_file "file" variable file static
global, rename it to "ima_kexec_file".
Signed-off-by: Steven Chen <chenste@linux.microsoft.com>
Acked-by: Baoquan He <bhe@redhat.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com> # ppc64/kvm
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"Fix some Landlock audit issues, add related tests, and updates
documentation"
* tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Update log documentation
landlock: Fix documentation for landlock_restrict_self(2)
landlock: Fix documentation for landlock_create_ruleset(2)
selftests/landlock: Add PID tests for audit records
selftests/landlock: Factor out audit fixture in audit_test
landlock: Log the TGID of the domain creator
landlock: Remove incorrect warning
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc4).
This pull includes wireless and a fix to vxlan which isn't
in Linus's tree just yet. The latter creates with a silent conflict
/ build breakage, so merging it now to avoid causing problems.
drivers/net/vxlan/vxlan_vnifilter.c
094adad91310 ("vxlan: Use a single lock to protect the FDB table")
087a9eb9e597 ("vxlan: vnifilter: Fix unlocked deletion of default FDB entry")
https://lore.kernel.org/20250423145131.513029-1-idosch@nvidia.com
No "normal" conflicts, or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
security_netlink_send() is a networking hook, so it fits better under
CONFIG_SECURITY_NETWORK.
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
On IMA policy update, if a measure rule exists in the policy,
IMA_MEASURE is set for ima_policy_flags which makes the violation_check
variable always true. Coupled with a no-action on MAY_READ for a
FILE_CHECK call, we're always taking the inode_lock().
This becomes a performance problem for extremely heavy read-only workloads.
Therefore, prevent this only in the case there's no action to be taken.
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Acked-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
Fix, deduplicate, and improve rendering of landlock_restrict_self(2)'s
flags documentation.
The flags are now rendered like the syscall's parameters and
description.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Move and fix the flags documentation, and improve formatting.
It makes more sense and it eases maintenance to document syscall flags
in landlock.h, where they are defined. This is already the case for
landlock_restrict_self(2)'s flags.
The flags are now rendered like the syscall's parameters and
description.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
There is a GCC crash bug in the randstruct for latest GCC versions that
is being tickled by landlock[1]. Temporarily disable GCC randstruct for
COMPILE_TEST builds to unbreak CI systems for the coming -rc2. This can
be restored once the bug is fixed.
Suggested-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/all/20250407-kbuild-disable-gcc-plugins-v1-1-5d46ae583f5e@kernel.org/ [1]
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250409151154.work.872-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The kdoc header incorrectly references an older parameter, update it
to reference what is currently used in the function.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504122308.Ch8PzJdD-lkp@intel.com/
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
The "linux/parser.h" header was included twice, we only need it once.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202504121945.Q0GDD0sG-lkp@intel.com/
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
DCCP was orphaned in 2021 by commit 054c4610bd05 ("MAINTAINERS: dccp:
move Gerrit Renker to CREDITS"), which noted that the last maintainer
had been inactive for five years.
In recent years, it has become a playground for syzbot, and most changes
to DCCP have been odd bug fixes triggered by syzbot. Apart from that,
the only changes have been driven by treewide or networking API updates
or adjustments related to TCP.
Thus, in 2023, we announced we would remove DCCP in 2025 via commit
b144fcaf46d4 ("dccp: Print deprecation notice.").
Since then, only one individual has contacted the netdev mailing list. [0]
There is ongoing research for Multipath DCCP. The repository is hosted
on GitHub [1], and development is not taking place through the upstream
community. While the repository is published under the GPLv2 license,
the scheduling part remains proprietary, with a LICENSE file [2] stating:
"This is not Open Source software."
The researcher mentioned a plan to address the licensing issue, upstream
the patches, and step up as a maintainer, but there has been no further
communication since then.
Maintaining DCCP for a decade without any real users has become a burden.
Therefore, it's time to remove it.
Removing DCCP will also provide significant benefits to TCP. It allows
us to freely reorganize the layout of struct inet_connection_sock, which
is currently shared with DCCP, and optimize it to reduce the number of
cachelines accessed in the TCP fast path.
Note that we keep DCCP netfilter modules as requested. [3]
Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0]
Link: https://github.com/telekom/mp-dccp #[1]
Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2]
Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux)
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Reduce the SELinux performance overhead during path walks through the
use of a per-task directory access cache and some minor code
optimizations. The directory access cache is per-task because it allows
for a lockless cache while also fitting well with a common application
pattern of heavily accessing a relatively small number of SELinux
directory labels. The cache is inherited by child processes when the
child runs with the same SELinux domain as the parent, and invalidated
on changes to the task's SELinux domain or the loaded SELinux policy.
A cache of four entries was chosen based on testing with the Fedora
"targeted" policy, a SELinux Reference Policy variant, and
'make allmodconfig' on Linux v6.14.
Code optimizations include better use of inline functions to reduce
function calls in the common case, especially in the inode revalidation
code paths, and elimination of redundant checks between the LSM and
SELinux layers.
As mentioned briefly above, aside from general use and regression
testing with the selinux-testsuite, performance was measured using
'make allmodconfig' with Linux v6.14 as a base reference. As expected,
there were variations from one test run to another, but the measurements
below are a good representation of the test results seen on my test
system.
* Linux v6.14
REF
1.26% [k] __d_lookup_rcu
SELINUX (1.31%)
0.58% [k] selinux_inode_permission
0.29% [k] avc_lookup
0.25% [k] avc_has_perm_noaudit
0.19% [k] __inode_security_revalidate
* Linux v6.14 + patch
REF
1.41% [k] __d_lookup_rcu
SELINUX (0.89%)
0.65% [k] selinux_inode_permission
0.15% [k] avc_lookup
0.05% [k] avc_has_perm_noaudit
0.04% [k] avc_policy_seqno
X.XX% [k] __inode_security_revalidate (now inline)
In both cases the __d_lookup_rcu() function was used as a reference
point to establish a context for the SELinux related functions. On a
unpatched Linux v6.14 system we see the time spent in the combined
SELinux functions exceeded that of __d_lookup_rcu(), 1.31% compared to
1.26%. However, with this patch applied the time spent in the combined
SELinux functions dropped to roughly 65% of the time spent in
__d_lookup_rcu(), 0.89% compared to 1.41%. Aside from the significant
decrease in time spent in the SELinux AVC, it appears that any additional
time spent searching and updating the cache is offset by other code
improvements, e.g. time spent in selinux_inode_permission() +
__inode_security_revalidate() + avc_policy_seqno() is less on the
patched kernel than the unpatched kernel.
It is worth noting that in this patch the use of the per-task cache is
limited to the security_inode_permission() LSM callback,
selinux_inode_permission(), but future work could expand the cache into
inode_has_perm(), likely through consolidation of the two functions.
While this would likely have little to no impact on path walks, it
may benefit other operations.
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Currently, genfscon only supports string prefix match to label files.
Thus, labeling numerous dynamic sysfs entries requires many specific
path rules. For example, labeling device paths such as
`/sys/devices/pci0000:00/0000:00:03.1/<...>/0000:04:00.1/wakeup`
requires listing all specific PCI paths, which is challenging to
maintain. While user-space restorecon can handle these paths with
regular expression rules, relabeling thousands of paths under sysfs
after it is mounted is inefficient compared to using genfscon.
This commit adds wildcard matching to genfscon to make rules more
efficient and expressive. This new behavior is enabled by
genfs_seclabel_wildcard capability. With this capability, genfscon does
wildcard matching instead of prefix matching. When multiple wildcard
rules match against a path, then the longest rule (determined by the
length of the rule string) will be applied. If multiple rules of the
same length match, the first matching rule encountered in the given
genfscon policy will be applied. Users are encouraged to write longer,
more explicit path rules to avoid relying on this behavior.
This change resulted in nice real-world performance improvements. For
example, boot times on test Android devices were reduced by 15%. This
improvement is due to the elimination of the "restorecon -R /sys" step
during boot, which takes more than two seconds in the worst case.
Signed-off-by: Takaya Saeki <takayas@chromium.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
Port labeling is based on port number and protocol (TCP/UDP/...) but not
based on network family (IPv4/IPv6).
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
For network objects, like interfaces, nodes, port and InfiniBands, the
object to SID lookup is cached in hashtables. OOM during such hashtable
additions of new objects is considered non-fatal and the computed SID is
simply returned without adding the compute result into the hash table.
Actually ignore OOM in the InfiniBand code, despite the comment already
suggesting to do so. This reverts commit c350f8bea271 ("selinux: Fix
error return code in sel_ib_pkey_sid_slow()").
Add comments in the other places.
Use kmalloc() instead of kzalloc(), since all members are initialized on
success and the data is only used in internbal hash tables, so no risk
of information leakage to userspace.
Fixes: c350f8bea271 ("selinux: Fix error return code in sel_ib_pkey_sid_slow()")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
In the network hashtable lookup code add likely() compiler hints in the
fast path, like already done in sel_netif_sid().
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
The network namespace is not modified.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
The network address, either an IPv4 or IPv6 one, is not modified.
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
As for other Audit's "pid" fields, Landlock should use the task's TGID
instead of its TID. Fix this issue by keeping a reference to the TGID
of the domain creator.
Existing tests already check for the PID but only with the thread group
leader, so always the TGID. A following patch adds dedicated tests for
non-leader thread.
Remove the current_real_cred() check which does not make sense because
we only reference a struct pid, whereas a previous version did reference
a struct cred instead.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20250410171725.1265860-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
landlock_put_hierarchy() can be called when an error occurs in
landlock_merge_ruleset() due to insufficient memory. In this case, the
domain's audit details might not have been allocated yet, which would
cause landlock_free_hierarchy_details() to print a warning (but still
safely handle this case).
We could keep the WARN_ON_ONCE(!hierarchy) but it's not worth it for
this kind of function, so let's remove it entirely.
Cc: Paul Moore <paul@paul-moore.com>
Reported-by: syzbot+8bca99e91de7e060e4ea@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20250331104709.897062-1-mic@digikod.net
Reviewed-by: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
try_lookup_noperm() and d_hash_and_lookup() are nearly identical. The
former does some validation of the name where the latter doesn't.
Outside of the VFS that validation is likely valuable, and having only
one exported function for this task is certainly a good idea.
So make d_hash_and_lookup() local to VFS files and change all other
callers to try_lookup_noperm(). Note that the arguments are swapped.
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250319031545.2999807-6-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The lookup_one_len family of functions is (now) only used internally by
a filesystem on itself either
- in a context where permission checking is irrelevant such as by a
virtual filesystem populating itself, or xfs accessing its ORPHANAGE
or dquota accessing the quota file; or
- in a context where a permission check (MAY_EXEC on the parent) has just
been performed such as a network filesystem finding in "silly-rename"
file in the same directory. This is also the context after the
_parentat() functions where currently lookup_one_qstr_excl() is used.
So the permission check is pointless.
The name "one_len" is unhelpful in understanding the purpose of these
functions and should be changed. Most of the callers pass the len as
"strlen()" so using a qstr and QSTR() can simplify the code.
This patch renames these functions (include lookup_positive_unlocked()
which is part of the family despite the name) to have a name based on
"lookup_noperm". They are changed to receive a 'struct qstr' instead
of separate name and len. In a few cases the use of QSTR() results in a
new call to strlen().
try_lookup_noperm() takes a pointer to a qstr instead of the whole
qstr. This is consistent with d_hash_and_lookup() (which is nearly
identical) and useful for lookup_noperm_unlocked().
The new lookup_noperm_common() doesn't take a qstr yet. That will be
tidied up in a subsequent patch.
Signed-off-by: NeilBrown <neil@brown.name>
Link: https://lore.kernel.org/r/20250319031545.2999807-5-neil@brown.name
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Patch series "mseal system mappings", v9.
As discussed during mseal() upstream process [1], mseal() protects the
VMAs of a given virtual memory range against modifications, such as the
read/write (RW) and no-execute (NX) bits. For complete descriptions of
memory sealing, please see mseal.rst [2].
The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.
The system mappings are readonly only, memory sealing can protect them
from ever changing to writable or unmmap/remapped as different attributes.
System mappings such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode), are created by the kernel during
program initialization, and could be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not established
during program startup. However, its lifetime is the same as the
process's lifetime [3]. It could be sealed from creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000), which
is outside the mm managed range. This means mprotect, munmap, and mremap
won't work on the vsyscall. Since sealing doesn't enhance the vsyscall's
security, it is skipped in this patch. If we ever seal the vsyscall, it
is probably only for decorative purpose, i.e. showing the 'sl' flag in
the /proc/pid/smaps. For this patch, it is ignored.
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mappings calls to pass mseal flag. Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime. The existence of this flag for an architecture implies that it
does not require the remapping of thest system mappings during process
lifetime, so sealing these mappings is safe from a kernel perspective.
This version covers x86-64 and arm64 archiecture as minimum viable feature.
While no specific CPU hardware features are required for enable this
feature on an archiecture, memory sealing requires a 64-bit kernel. Other
architectures can choose whether or not to adopt this feature. Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace. The PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain if this will ever fail for PPC - this needs to be
investigated by PPC in the future [4]. The UML kernel can add this
support when KUnit tests require it [5].
In this version, we've improved the handling of system mapping sealing
from previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it. This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.
Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for vdso, (smaller
than its actual size). The dynamic linker, which relies on PT_LOAD
information to determine mapping size, would then split and partially
seal the vdso mapping. Since each architecture has its own vdso/vvar
code, fixing this in the kernel would require going through each
archiecture. Our initial goal was to enable sealing readonly mappings,
e.g. .text, across all architectures, sealing vdso from kernel since
creation appears to be simpler than sealing vdso at glibc.
- The [vvar] mapping header only contains address information, not
length information. Similar issues might exist for other special
mappings.
- Mappings like uprobe are not covered by the dynamic linker, and there
is no effective solution for them.
This feature's security enhancements will benefit ChromeOS, Android, and
other high security systems.
Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
Android to ensure the sealing doesn't affect the functionality of
Chromebook and Android phone.
I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
normal operations such as browsing the web, open/edit doc are OK.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/ [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
This patch (of 7):
Provide infrastructure to mseal system mappings. Establish two kernel
configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future
patches.
Link: https://lkml.kernel.org/r/20250305021711.3867874-1-jeffxu@google.com
Link: https://lkml.kernel.org/r/20250305021711.3867874-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updatesk from Greg KH:
"Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
happened this development cycle, including:
- kernfs scaling changes to make it even faster thanks to rcu
- bin_attribute constify work in many subsystems
- faux bus minor tweaks for the rust bindings
- rust binding updates for driver core, pci, and platform busses,
making more functionaliy available to rust drivers. These are all
due to people actually trying to use the bindings that were in
6.14.
- make Rafael and Danilo full co-maintainers of the driver core
codebase
- other minor fixes and updates"
* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
rust: platform: require Send for Driver trait implementers
rust: pci: require Send for Driver trait implementers
rust: platform: impl Send + Sync for platform::Device
rust: pci: impl Send + Sync for pci::Device
rust: platform: fix unrestricted &mut platform::Device
rust: pci: fix unrestricted &mut pci::Device
rust: device: implement device context marker
rust: pci: use to_result() in enable_device_mem()
MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
rust/kernel/faux: mark Registration methods inline
driver core: faux: only create the device if probe() succeeds
rust/faux: Add missing parent argument to Registration::new()
rust/faux: Drop #[repr(transparent)] from faux::Registration
rust: io: fix devres test with new io accessor functions
rust: io: rename `io::Io` accessors
kernfs: Move dput() outside of the RCU section.
efi: rci2: mark bin_attribute as __ro_after_init
rapidio: constify 'struct bin_attribute'
firmware: qemu_fw_cfg: constify 'struct bin_attribute'
powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"For this merge window we're splitting BPF pull request into three for
higher visibility: main changes, res_spin_lock, try_alloc_pages.
These are the main BPF changes:
- Add DFA-based live registers analysis to improve verification of
programs with loops (Eduard Zingerman)
- Introduce load_acquire and store_release BPF instructions and add
x86, arm64 JIT support (Peilin Ye)
- Fix loop detection logic in the verifier (Eduard Zingerman)
- Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)
- Add kfunc for populating cpumask bits (Emil Tsalapatis)
- Convert various shell based tests to selftests/bpf/test_progs
format (Bastien Curutchet)
- Allow passing referenced kptrs into struct_ops callbacks (Amery
Hung)
- Add a flag to LSM bpf hook to facilitate bpf program signing
(Blaise Boscaccy)
- Track arena arguments in kfuncs (Ihor Solodrai)
- Add copy_remote_vm_str() helper for reading strings from remote VM
and bpf_copy_from_user_task_str() kfunc (Jordan Rome)
- Add support for timed may_goto instruction (Kumar Kartikeya
Dwivedi)
- Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)
- Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
local storage (Martin KaFai Lau)
- Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)
- Allow retrieving BTF data with BTF token (Mykyta Yatsenko)
- Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
(Song Liu)
- Reject attaching programs to noreturn functions (Yafang Shao)
- Introduce pre-order traversal of cgroup bpf programs (Yonghong
Song)"
* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
bpf: Fix out-of-bounds read in check_atomic_load/store()
libbpf: Add namespace for errstr making it libbpf_errstr
bpf: Add struct_ops context information to struct bpf_prog_aux
selftests/bpf: Sanitize pointer prior fclose()
selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
selftests/bpf: test_xdp_vlan: Rename BPF sections
bpf: clarify a misleading verifier error message
selftests/bpf: Add selftest for attaching fexit to __noreturn functions
bpf: Reject attaching fexit/fmod_ret to __noreturn functions
bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
bpf: Make perf_event_read_output accessible in all program types.
bpftool: Using the right format specifiers
bpftool: Add -Wformat-signedness flag to detect format errors
selftests/bpf: Test freplace from user namespace
libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
bpf: Return prog btf_id without capable check
bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
bpf, x86: Fix objtool warning for timed may_goto
bpf: Check map->record at the beginning of check_and_free_fields()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Remove legacy compression interface
- Improve scatterwalk API
- Add request chaining to ahash and acomp
- Add virtual address support to ahash and acomp
- Add folio support to acomp
- Remove NULL dst support from acomp
Algorithms:
- Library options are fuly hidden (selected by kernel users only)
- Add Kerberos5 algorithms
- Add VAES-based ctr(aes) on x86
- Ensure LZO respects output buffer length on compression
- Remove obsolete SIMD fallback code path from arm/ghash-ce
Drivers:
- Add support for PCI device 0x1134 in ccp
- Add support for rk3588's standalone TRNG in rockchip
- Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93
- Fix bugs in tegra uncovered by multi-threaded self-test
- Fix corner cases in hisilicon/sec2
Others:
- Add SG_MITER_LOCAL to sg miter
- Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp"
* tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (187 commits)
crypto: testmgr - Add multibuffer acomp testing
crypto: acomp - Fix synchronous acomp chaining fallback
crypto: testmgr - Add multibuffer hash testing
crypto: hash - Fix synchronous ahash chaining fallback
crypto: arm/ghash-ce - Remove SIMD fallback code path
crypto: essiv - Replace memcpy() + NUL-termination with strscpy()
crypto: api - Call crypto_alg_put in crypto_unregister_alg
crypto: scompress - Fix incorrect stream freeing
crypto: lib/chacha - remove unused arch-specific init support
crypto: remove obsolete 'comp' compression API
crypto: compress_null - drop obsolete 'comp' implementation
crypto: cavium/zip - drop obsolete 'comp' implementation
crypto: zstd - drop obsolete 'comp' implementation
crypto: lzo - drop obsolete 'comp' implementation
crypto: lzo-rle - drop obsolete 'comp' implementation
crypto: lz4hc - drop obsolete 'comp' implementation
crypto: lz4 - drop obsolete 'comp' implementation
crypto: deflate - drop obsolete 'comp' implementation
crypto: 842 - drop obsolete 'comp' implementation
crypto: nx - Migrate to scomp API
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This brings two main changes to Landlock:
- A signal scoping fix with a new interface for user space to know if
it is compatible with the running kernel.
- Audit support to give visibility on why access requests are denied,
including the origin of the security policy, missing access rights,
and description of object(s). This was designed to limit log spam
as much as possible while still alerting about unexpected blocked
access.
With these changes come new and improved documentation, and a lot of
new tests"
* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
landlock: Add audit documentation
selftests/landlock: Add audit tests for network
selftests/landlock: Add audit tests for filesystem
selftests/landlock: Add audit tests for abstract UNIX socket scoping
selftests/landlock: Add audit tests for ptrace
selftests/landlock: Test audit with restrict flags
selftests/landlock: Add tests for audit flags and domain IDs
selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
selftests/landlock: Add test for invalid ruleset file descriptor
samples/landlock: Enable users to log sandbox denials
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
landlock: Log scoped denials
landlock: Log TCP bind and connect denials
landlock: Log truncate and IOCTL denials
landlock: Factor out IOCTL hooks
landlock: Log file-related denials
landlock: Log mount-related denials
landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux
Pull capabilities update from Serge Hallyn:
"This contains just one patch that removes a helper function whose last
user (smack) stopped using it in 2018"
* tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
capability: Remove unused has_capability
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull ima updates from Mimi Zohar:
"Two performance improvements, which minimize the number of integrity
violations"
* tag 'integrity-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: limit the number of ToMToU integrity violations
ima: limit the number of open-writers integrity violations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull ipe update from Fan Wu:
"This contains just one commit from Randy Dunlap, which fixes
kernel-doc warnings in the IPE subsystem"
* tag 'ipe-pr-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: policy_fs: fix kernel-doc warnings
|
|
Each time a file in policy, that is already opened for read, is opened
for write, a Time-of-Measure-Time-of-Use (ToMToU) integrity violation
audit message is emitted and a violation record is added to the IMA
measurement list. This occurs even if a ToMToU violation has already
been recorded.
Limit the number of ToMToU integrity violations per file open for read.
Note: The IMA_MAY_EMIT_TOMTOU atomic flag must be set from the reader
side based on policy. This may result in a per file open for read
ToMToU violation.
Since IMA_MUST_MEASURE is only used for violations, rename the atomic
IMA_MUST_MEASURE flag to IMA_MAY_EMIT_TOMTOU.
Cc: stable@vger.kernel.org # applies cleanly up to linux-6.6
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
Each time a file in policy, that is already opened for write, is opened
for read, an open-writers integrity violation audit message is emitted
and a violation record is added to the IMA measurement list. This
occurs even if an open-writers violation has already been recorded.
Limit the number of open-writers integrity violations for an existing
file open for write to one. After the existing file open for write
closes (__fput), subsequent open-writers integrity violations may be
emitted.
Cc: stable@vger.kernel.org # applies cleanly up to linux-6.6
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move vm_table members out of kernel/sysctl.c
All vm_table array members have moved to their respective subsystems
leading to the removal of vm_table from kernel/sysctl.c. This
increases modularity by placing the ctl_tables closer to where they
are actually used and at the same time reducing the chances of merge
conflicts in kernel/sysctl.c.
- ctl_table range fixes
Replace the proc_handler function that checks variable ranges in
coredump_sysctls and vdso_table with the one that actually uses the
extra{1,2} pointers as min/max values. This tightens the range of the
values that users can pass into the kernel effectively preventing
{under,over}flows.
- Misc fixes
Correct grammar errors and typos in test messages. Update sysctl
files in MAINTAINERS. Constified and removed array size in
declaration for alignment_tbl
* tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits)
selftests/sysctl: fix wording of help messages
selftests: fix spelling/grammar errors in sysctl/sysctl.sh
MAINTAINERS: Update sysctl file list in MAINTAINERS
sysctl: Fix underflow value setting risk in vm_table
coredump: Fixes core_pipe_limit sysctl proc_handler
sysctl: remove unneeded include
sysctl: remove the vm_table
sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c
x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c
fs: dcache: move the sysctl to fs/dcache.c
sunrpc: simplify rpcauth_cache_shrink_count()
fs: drop_caches: move sysctl to fs/drop_caches.c
fs: fs-writeback: move sysctl to fs/fs-writeback.c
mm: nommu: move sysctl to mm/nommu.c
security: min_addr: move sysctl to security/min_addr.c
mm: mmap: move sysctl to mm/mmap.c
mm: util: move sysctls to mm/util.c
mm: vmscan: move vmscan sysctls to mm/vmscan.c
mm: swap: move sysctl to mm/swap.c
mm: filemap: move sysctl to mm/filemap.c
...
|
|
Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF for the case of sandboxer
tools, init systems, or runtime containers launching programs sandboxing
themselves in an inconsistent way. Setting this flag should only
depends on runtime configuration (i.e. not hardcoded).
We don't create a new ruleset's option because this should not be part
of the security policy: only the task that enforces the policy (not the
one that create it) knows if itself or its children may request denied
actions.
This is the first and only flag that can be set without actually
restricting the caller (i.e. without providing a ruleset).
Extend struct landlock_cred_security with a u8 log_subdomains_off.
struct landlock_file_security is still 16 bytes.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Closes: https://github.com/landlock-lsm/linux/issues/3
Link: https://lore.kernel.org/r/20250320190717.2287696-19-mic@digikod.net
[mic: Fix comment]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Most of the time we want to log denied access because they should not
happen and such information helps diagnose issues. However, when
sandboxing processes that we know will try to access denied resources
(e.g. unknown, bogus, or malicious binary), we might want to not log
related access requests that might fill up logs.
By default, denied requests are logged until the task call execve(2).
If the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF flag is set, denied
requests will not be logged for the same executed file.
If the LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON flag is set, denied
requests from after an execve(2) call will be logged.
The rationale is that a program should know its own behavior, but not
necessarily the behavior of other programs.
Because LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF is set for a specific
Landlock domain, it makes it possible to selectively mask some access
requests that would be logged by a parent domain, which might be handy
for unprivileged processes to limit logs. However, system
administrators should still use the audit filtering mechanism. There is
intentionally no audit nor sysctl configuration to re-enable these logs.
This is delegated to the user space program.
Increment the Landlock ABI version to reflect this interface change.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-18-mic@digikod.net
[mic: Rename variables and fix __maybe_unused]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support for unix_stream_connect, unix_may_send, task_kill, and
file_send_sigiotask hooks.
The related blockers are:
- scope.abstract_unix_socket
- scope.signal
Audit event sample for abstract unix socket:
type=LANDLOCK_DENY msg=audit(1729738800.268:30): domain=195ba459b blockers=scope.abstract_unix_socket path=00666F6F
Audit event sample for signal:
type=LANDLOCK_DENY msg=audit(1729738800.291:31): domain=195ba459b blockers=scope.signal opid=1 ocomm="systemd"
Refactor and simplify error handling in LSM hooks.
Extend struct landlock_file_security with fown_layer and use it to log
the blocking domain. The struct aligned size is still 16 bytes.
Cc: Günther Noack <gnoack@google.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-17-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support to socket_bind and socket_connect hooks.
The related blockers are:
- net.bind_tcp
- net.connect_tcp
Audit event sample:
type=LANDLOCK_DENY msg=audit(1729738800.349:44): domain=195ba459b blockers=net.connect_tcp daddr=127.0.0.1 dest=80
Cc: Günther Noack <gnoack@google.com>
Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Cc: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-16-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support to the file_truncate and file_ioctl hooks.
Add a deny_masks_t type and related helpers to store the domain's layer
level per optional access rights (i.e. LANDLOCK_ACCESS_FS_TRUNCATE and
LANDLOCK_ACCESS_FS_IOCTL_DEV) when opening a file, which cannot be
inferred later. In practice, the landlock_file_security aligned blob size is
still 16 bytes because this new one-byte deny_masks field follows the
existing two-bytes allowed_access field and precede the packed
fown_subject.
Implementing deny_masks_t with a bitfield instead of a struct enables a
generic implementation to store and extract layer levels.
Add KUnit tests to check the identification of a layer level from a
deny_masks_t, and the computation of a deny_masks_t from an access right
with its layer level or a layer_mask_t array.
Audit event sample:
type=LANDLOCK_DENY msg=audit(1729738800.349:44): domain=195ba459b blockers=fs.ioctl_dev path="/dev/tty" dev="devtmpfs" ino=9 ioctlcmd=0x5401
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-15-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|