|
Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag
is that the FS ->iomap_begin callback could check if this flag is set to
decide whether to do a SW (FS-based) atomic write. But the FS can set
which ->iomap_begin callback it wants when deciding to do a FS-based
atomic write.
Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as
the block driver can use SW methods to emulate an atomic write. So change
back to IOMAP_ATOMIC.
The ->iomap_begin callback does, however, need to indicate to the iomap
core that REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.
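As a rough sketch (example_map_blocks() is a hypothetical stand-in for the
real FS mapping code), an ->iomap_begin implementation opting in to
HW-based atomic writes might look like:

  static int example_iomap_begin(struct inode *inode, loff_t pos,
                                 loff_t length, unsigned flags,
                                 struct iomap *iomap, struct iomap *srcmap)
  {
          int ret = example_map_blocks(inode, pos, length, iomap); /* assumed */

          if (ret)
                  return ret;

          /* Ask the iomap core to issue the bio with REQ_ATOMIC set. */
          if (flags & IOMAP_ATOMIC)
                  iomap->flags |= IOMAP_F_ATOMIC_BIO;
          return 0;
  }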
These changes were suggested by Christoph Hellwig and Dave Chinner.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Help explain the code.
Also clarify the comment for the bio size check.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-3-john.g.garry@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
It is neater to build blk_opf_t fully in one place, so inline
iomap_dio_bio_opflags() in iomap_dio_bio_iter().
Also tidy up the logic in dealing with IOMAP_DIO_CALLER_COMP and, in
general, separate the logic dealing with flags associated with reads
from that for writes.
Originally-from: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-2-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
This also fixes wrong definitions for SCM_TS_OPT_ID & SO_RCVPRIORITY.
Accidentally found while working on another patchset.
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>
Cc: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Fixes: a89568e9be75 ("selftests: txtimestamp: add SCM_TS_OPT_ID test")
Fixes: e45469e594b2 ("sock: Introduce SO_RCVPRIORITY socket option")
Link: https://lore.kernel.org/netdev/20250314195257.34854-1-kuniyu@amazon.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://patch.msgid.link/20250314214155.16046-1-aleksandr.mikhalitsyn@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Because of the size restriction in the TCP options space, the MPTCP
ADD_ADDR option is exclusive and cannot be sent with other MPTCP ones.
For this reason, in the linked mptcp_out_options structure, groups of
fields linked to different options are part of the same union.
There is a case where the mptcp_pm_add_addr_signal() function can modify
opts->addr, but not end up sending an ADD_ADDR.
mptcp_established_options, other options will be sent, but with
unexpected data written in other fields due to the union, e.g. in
opts->ext_copy. This could lead to a data stream corruption in the next
packet.
Using an intermediate variable prevents corrupting the previously
established DSS option. The assignment of the ADD_ADDR option
parameters is now done once we are sure this ADD_ADDR option can be set
in the packet, e.g. after having dropped other suboptions.
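A minimal sketch of the fix pattern; the helpers marked "assumed" are
simplified stand-ins for the real path-manager code:

  static bool example_add_addr_signal(struct mptcp_sock *msk, u16 remaining,
                                      struct mptcp_out_options *opts)
  {
          struct mptcp_addr_info addr;    /* intermediate, not the union */
          u16 len;

          if (!mptcp_pm_should_signal_addr(msk))  /* assumed predicate */
                  return false;

          mptcp_pm_fill_addr(msk, &addr);         /* assumed: fills local copy */
          len = mptcp_add_addr_len(&addr);        /* assumed size helper */
          if (remaining < len)
                  return false;   /* opts->addr and the union stay untouched */

          opts->addr = addr;      /* commit only once the option will be sent */
          return true;
  }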
Fixes: 1bff1e43a30e ("mptcp: optimize out option generation")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Arthur Mongodin <amongodin@randorisec.fr>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ Matt: the commit message has been updated: long lines splits and some
clarifications. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-1-122dbb249db3@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
An irq allocated with devm_request_irq() will be freed in
devm_irq_release(); using free_irq() in ->remove() causes a dangling
pointer and a subsequent double free. So remove the free_irq() calls in
the error path and the remove path.
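A minimal sketch of the bug and the fix, with illustrative device and
handler names:

  static int example_probe(struct platform_device *pdev, int irq)
  {
          /* devres owns the irq: it is freed automatically on detach */
          return devm_request_irq(&pdev->dev, irq, example_handler, 0,
                                  "example", pdev);
  }

  static void example_remove_buggy(struct platform_device *pdev, int irq)
  {
          free_irq(irq, pdev);    /* WRONG: devm_irq_release() frees it again */
  }

  /* Fixed ->remove(): no free_irq() at all; devres frees the irq once. */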
Fixes: 969864efae78 ("i2c: amd-mp2: use msix/msi if the hardware supports")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20221103121146.99836-1-yangyingliang@huawei.com
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
|
|
There is an issue in the kernel:
In tmpfs, when using the "ls" command to list the contents
of a directory with a large number of files, glibc performs
the getdents call in multiple rounds. If a concurrent unlink
occurs between these getdents calls, it may lead to duplicate
directory entries in the ls output. One possible reproduction
scenario is as follows:
Create 1026 files and execute ls and rm concurrently:

  for i in {1..1026}; do
      echo "This is file $i" > /tmp/dir/file$i
  done

  ls /tmp/dir                          rm /tmp/dir/file4
  ->getdents(file1026-file5)
                                       ->unlink(file4)
  ->getdents(file5,file3,file2,file1)
The second getdents call is expected to return file3 through file1,
but instead it also returns an extra file5.
The root cause of this problem is in the offset_dir_lookup
function. It uses mas_find to determine the starting position
for the current getdents call. Since mas_find locates the first
position that is greater than or equal to mas->index, when file4
is deleted, it ends up returning file5.
It can be fixed by replacing mas_find with mas_find_rev, which
finds the first position that is less than or equal to mas->index.
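A sketch of the change (maple tree setup simplified, range limits
illustrative):

  MA_STATE(mas, &octx->mt, offset, offset);

  /* Before: first entry at or *after* offset -- if that entry was just
   * unlinked, its successor (file5) is returned and emitted again. */
  child = mas_find(&mas, ULONG_MAX);

  /* After: first entry at or *before* offset, so a concurrent unlink can
   * no longer push the cursor past entries that were already emitted. */
  child = mas_find_rev(&mas, 0);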
Fixes: b9b588f22a0c ("libfs: Use d_children list to iterate simple_offset directories")
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Link: https://lore.kernel.org/r/20250320034417.555810-1-sunyongjian@huaweicloud.com
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
As both locks are highly contended during significant inode churn,
holding the inode hash lock while waiting for the sb list lock
exacerbates the problem.
Why moving it out is safe: the inode at hand still has I_NEW set and
anyone who finds it through legitimate means waits for the bit to clear,
by which time inode_sb_list_add() is guaranteed to have finished.
This significantly drops hash lock contention for me when stating 20
separate trees in parallel, each with 1000 directories * 1000 files.
However, no speed up was observed as contention increased on the other
locks, notably dentry LRU.
Even so, removing the lock ordering will help make this faster later.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250320004643.1903287-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Justin Iurman says:
====================
net: fix lwtunnel reentry loops
When the destination is the same after the transformation, we enter a
lwtunnel loop. This is true for most lwt users: ioam6, rpl, seg6,
seg6_local, ila_lwt, and lwt_bpf. It can happen in their input() and
output() handlers respectively, where either dst_input() or dst_output()
is called at the end. It can also happen in xmit() handlers.
Here is an example for rpl_input():
dump_stack_lvl+0x60/0x80
rpl_input+0x9d/0x320
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
[...]
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
lwtunnel_input+0x64/0xa0
ip6_sublist_rcv_finish+0x85/0x90
ip6_sublist_rcv+0x236/0x2f0
... until rpl_do_srh() fails, which means skb_cow_head() failed.
This series provides a fix at the core level of lwtunnel to catch such
loops when they're not caught by the respective lwtunnel users, and
handle the loop case in ioam6 which is one of the users. This series
also comes with a new selftest to detect some dst cache reference loops
in lwtunnel users.
====================
Link: https://patch.msgid.link/20250314120048.12569-1-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As recently specified by commit 0ea09cbf8350 ("docs: netdev: add a note
on selftest posting") in net-next, the selftest is therefore shipped in
this series. However, this selftest does not really test this series. It
needs this series to avoid crashing the kernel. What it really tests,
thanks to kmemleak, is what was fixed by the following commits:
- commit c71a192976de ("net: ipv6: fix dst refleaks in rpl, seg6 and
ioam6 lwtunnels")
- commit 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and
ioam6 lwtunnels")
- commit c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6
lwt")
- commit 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl
lwt")
- commit 0e7633d7b95b ("net: ipv6: fix dst ref loop in ila lwtunnel")
- commit 5da15a9c11c1 ("net: ipv6: fix missing dst ref drop in ila
lwtunnel")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-4-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix the lwtunnel_output() reentry loop in ioam6_iptunnel when the
destination is the same after transformation. Note that a check on the
destination address was already performed, but it was not enough. This
is an example of a lwtunnel user taking care of loops without relying
only on the last-resort detection offered by lwtunnel.
Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-3-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This patch acts as a parachute, a catch-all solution, by detecting
recursion loops in lwtunnel users and taking care of them (e.g., a loop
between routes, a loop within the same route, etc). In general, such
loops are the consequence of pathological configurations. Each lwtunnel
user is still free to catch such loops early and do whatever they want
with them. It will be the case in a separate patch for, e.g., seg6 and
seg6_local, in order to provide drop reasons and update statistics.
Another example of a lwtunnel user taking care of loops is ioam6, which
has valid use cases that include loops (e.g., inline mode), and which is
addressed by the next patch in this series. Overall, this patch acts as
a last resort to catch loops and drop packets, since we don't want to
leak something unintentionally because of a pathological configuration
in lwtunnels.
The solution in this patch reuses dev_xmit_recursion(),
dev_xmit_recursion_inc(), and dev_xmit_recursion_dec(), which seems fine
considering the context.
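A simplified sketch of the guard (lwtunnel_do_input() stands in for the
existing per-encap dispatch; the real patch guards the input, output and
xmit paths):

  static int lwtunnel_input_guarded(struct sk_buff *skb)
  {
          int ret;

          if (dev_xmit_recursion()) {
                  net_crit_ratelimited("lwtunnel: loop detected, dropping packet\n");
                  kfree_skb(skb);
                  return -ELOOP;
          }

          dev_xmit_recursion_inc();
          ret = lwtunnel_do_input(skb);   /* assumed stand-in */
          dev_xmit_recursion_dec();
          return ret;
  }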
Closes: https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/
Closes: https://lore.kernel.org/netdev/Z7NKYMY7fJT5cYWu@shredder/
Fixes: ffce41962ef6 ("lwtunnel: support dst output redirect function")
Fixes: 2536862311d2 ("lwt: Add support to redirect dst.input")
Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Link: https://patch.msgid.link/20250314120048.12569-2-justin.iurman@uliege.be
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently the API emac_update_hardware_stats() reads different ICSSG
stats without any lock protection.
This API gets called by .ndo_get_stats64() which is only under RCU
protection and nothing else. Add a lock to this API so that the
statistics are read under the lock.
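A minimal sketch of the serialization, assuming a driver-private
stats_lock spinlock, an icssg_read_stat() accessor, and an
ICSSG_NUM_STATS count (all illustrative names):

  spin_lock(&emac->stats_lock);
  for (i = 0; i < ICSSG_NUM_STATS; i++)
          emac->stats[i] += icssg_read_stat(emac, i);
  spin_unlock(&emac->stats_lock);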
Fixes: c1e10d5dc7a1 ("net: ti: icssg-prueth: Add ICSSG Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250314102721.1394366-1-danishanwar@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ->send() operation frees skb so save the length before calling
->send() to avoid a use after free.
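The generic shape of the fix, with illustrative names:

  static void example_send(struct example_dev *dev, struct sk_buff *skb)
  {
          unsigned int len = skb->len;    /* capture before ownership transfer */

          dev->ops->send(dev, skb);       /* may free the skb */
          dev->stats.tx_bytes += len;     /* do NOT read skb->len here */
  }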
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/c751531d-4af4-42fe-affe-6104b34b791d@stanley.mountain
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Otherwise gcc 13 generates conditional forward jumps (aka a branch
mispredicted by default) for build_open_flags() being successful.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250320092331.1921700-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Merge the slab feature branch kfree_rcu_tiny for 6.15:
- Move the TINY_RCU kvfree_rcu() implementation from RCU to SLAB
subsystem and clean up its integration.
|
|
In architectures that use the polling bit, current_clr_polling() employs
smp_mb() to ensure that the clearing of the polling bit is visible to
other cores before checking TIF_NEED_RESCHED.
However, smp_mb() can be costly. Given that clear_bit() is an atomic
operation, replacing smp_mb() with smp_mb__after_atomic() is appropriate.
Many architectures implement smp_mb__after_atomic() as a lighter-weight
barrier compared to smp_mb(), leading to performance improvements.
For instance, on x86, smp_mb__after_atomic() is a no-op. This change
eliminates a smp_mb() instruction in the cpuidle wake-up path, saving
several CPU cycles and thereby reducing wake-up latency.
Architectures that do not use the polling bit will retain the original
smp_mb() behavior to ensure that existing dependencies remain unaffected.
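A simplified sketch of the resulting current_clr_polling() fast path on
polling-bit architectures:

  clear_thread_flag(TIF_POLLING_NRFLAG);  /* atomic RMW */
  smp_mb__after_atomic();                 /* was: smp_mb(); a no-op on x86 */
  preempt_fold_need_resched();            /* then re-check TIF_NEED_RESCHED */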
Signed-off-by: Yujun Dong <yujundong@pascal-lab.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241230141624.155356-1-yujundong@pascal-lab.net
|
|
1. predict the file was found
2. explicitly compare the ref to "one", ignoring the dead zone
The latter arguably improves the behavior to begin with. Suppose the
count turned bad -- the previously used ref routine is going to check
for it and return 0, indicating the count does not necessitate taking
->f_pos_lock. But there may very well be several users. That is, not
paying for special-casing the dead zone improves the semantics.
While here spell out each condition in a dedicated if statement. This
has no effect on generated code.
Sizes are as follows (in bytes; gcc 13, x86-64):
stock: 321
likely(): 298
likely()+ref: 280
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250319215801.1870660-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Since the i and pool->chunk_size variables are of type 'u32',
their product can wrap around and then be cast to 'u64'.
This can lead to two different XDP buffers pointing to the same
memory area.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
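The bug class in isolation, assuming i and pool->chunk_size are both u32
as in the commit:

  static u64 chunk_addr_buggy(u32 i, u32 chunk_size)
  {
          return i * chunk_size;          /* 32-bit product wraps, then widens */
  }

  static u64 chunk_addr_fixed(u32 i, u32 chunk_size)
  {
          return (u64)i * chunk_size;     /* 64-bit multiply, no wraparound */
  }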
Fixes: 94033cd8e73b ("xsk: Optimize for aligned case")
Cc: stable@vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru>
Link: https://patch.msgid.link/20250313085007.3116044-1-Ilia.Gavrilov@infotecs.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
For more than a decade, CONFIG_SCHED_DEBUG=y has been enabled
in all the major Linux distributions:
/boot/config-6.11.0-19-generic:CONFIG_SCHED_DEBUG=y
The reason is that while originally CONFIG_SCHED_DEBUG started
out as a debugging feature, over the years (decades ...) it has
grown various bits of statistics, instrumentation and
control knobs that are useful for sysadmin and general software
development purposes as well.
But within the kernel we still pretend that there's a choice,
and sometimes code that is seemingly 'debug only' creates overhead
that should be optimized in reality.
So make it all official and make CONFIG_SCHED_DEBUG unconditional.
Now that all uses of CONFIG_SCHED_DEBUG are removed from
the code by previous patches, remove the Kconfig option as well.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-6-mingo@kernel.org
|
|
We leave most of the defconfigs alone (there's over 70 of them),
but let's remove CONFIG_SCHED_DEBUG from the scheduler self-test
Kconfig files.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/Z9szt3MpQmQ56TRd@gmail.com
|
|
Remove CONFIG_SCHED_DEBUG from the documentation.
Since it's enabled unconditionally now, remove all references to it.
(Left out languages I cannot read.)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-5-mingo@kernel.org
|
|
All the big Linux distros enable CONFIG_SCHED_DEBUG, because
the various features it provides help not just with kernel
development, but with system administration and user-space
software development as well.
Reflect this reality and enable this functionality
unconditionally.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-4-mingo@kernel.org
|
|
With CONFIG_SCHED_DEBUG becoming unconditional, remove the
extra 'const_debug' indirection towards __read_mostly.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-3-mingo@kernel.org
|
|
The scheduler has this special SCHED_WARN() facility that
depends on CONFIG_SCHED_DEBUG.
Since CONFIG_SCHED_DEBUG is getting removed, convert
SCHED_WARN() to WARN_ON_ONCE().
Note that the warning output isn't 100% equivalent:
#define SCHED_WARN_ON(x) WARN_ONCE(x, #x)
Because SCHED_WARN_ON() would output the 'x' condition
as well, while WARN_ON_ONCE() will only show a backtrace.
Hopefully these are rare enough to not really matter.
If it does, we should probably introduce a new WARN_ON()
variant that outputs the condition in stringified form,
or improve WARN_ON() itself.
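One possible shape of such a variant (illustrative, not part of this
patch):

  #define WARN_ON_ONCE_STR(x)     WARN_ONCE(x, "%s\n", #x)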
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250317104257.3496611-2-mingo@kernel.org
|
|
Track whether pages were unmapped from any MM (even ones with a currently
empty mm_cpumask) by the reclaim code, to figure out whether or not
broadcast TLB flush should be done when reclaim finishes.
The reason any MM must be tracked, and not only ones contributing to the
tlbbatch cpumask, is that broadcast ASIDs are expected to be kept up to
date even on CPUs where the MM is not currently active.
This change allows reclaim to avoid doing TLB flushes when only clean page
cache pages and/or slab memory were reclaimed, which is fairly common.
( This is a simpler alternative to the code that was in my INVLPGB series
before, and it seems to capture most of the benefit due to how common
it is to reclaim only page cache. )
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250319132520.6b10ad90@fangorn
|
|
Introduce a name for an old Pentium 4 model and replace the x86_model
checks with VFM ones. This gets rid of one of the last remaining
Intel-specific x86_model checks.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250318223828.2945651-3-sohil.mehta@intel.com
|
|
Architectural Perfmon was introduced on the Family 6 "Core" processors
starting with Yonah. Processors before Yonah need their own customized
PMU initialization.
p6_pmu_init() is expected to provide that initialization for early
Family 6 processors. But, currently, it could get called for any Family
6 processor if the architectural perfmon feature is disabled on that
processor. To simplify, restrict the P6 PMU initialization to early
Family 6 processors that do not have architectural perfmon support and
truly need the special handling.
As a result, the "unsupported" console print becomes practically
unreachable because all the released P6 processors are covered by the
switch cases. Move the console print to a common location where it can
cover all modern processors (including Family >15) that may not have
architectural perfmon support enumerated.
Also, use this opportunity to get rid of the unnecessary switch cases in
P6 initialization. Only the Pentium Pro processor needs a quirk, and the
rest of the processors do not need any special handling. The gaps in the
case numbers are only due to no processor with those model numbers being
released.
Use decimal numbers to represent Intel Family numbers. Also, convert one
of the last few Intel x86_model comparisons to a VFM-based one.
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250318223828.2945651-2-sohil.mehta@intel.com
|
|
When the rseq UAPI header is included, 'union rseq' clashes with 'struct
rseq'. It's not the case in the rseq selftests but it does break the KVM
selftests that also include this file.
Rename 'union rseq' to 'union rseq_tls' to fix this.
Fixes: e6644c967d3c ("rseq/selftests: Ensure the rseq ABI TLS is actually 1024 bytes")
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/20250319202144.1141542-1-mjeanson@efficios.com
|
|
Merge gitolite.kernel.org:pub/scm/linux/kernel/git/alexs/linux into docs-mw
Chinese translation docs for 6.15-rc1
This is the Chinese translation subtree for 6.15-rc1. It just
includes a few changes:
- Chinese disclaimer change
- add a new translation doc: snp-tdx-threat-model
- fix a typo
Above patches are tested by 'make htmldocs/pdfdocs'
Signed-off-by: Alex Shi <alexs@kernel.org>
|
|
Merge tag 'for-net-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_event: Fix connection regression between LE and non-LE adapters
- Fix error code in chan_alloc_skb_cb()
* tag 'for-net-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_event: Fix connection regression between LE and non-LE adapters
Bluetooth: Fix error code in chan_alloc_skb_cb()
====================
Link: https://patch.msgid.link/20250314163847.110069-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Merge tag 'hwmon-fixes-for-v6.14-rc8/6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix an entry in MAINTAINERS to avoid sending hwmon review requests to
the i2c mailing list
- Fix an out-of-bounds access in nct6775 driver
* tag 'hwmon-fixes-for-v6.14-rc8/6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775-core) Fix out of bounds access for NCT679{8,9}
MAINTAINERS: correct list and scope of LTC4286 HARDWARE MONITOR
|
|
Everywhere else, the driver uses devm_kzalloc() when allocating the
AXI data, so there is no kfree() of this structure. However,
dwc-qos-eth uses kzalloc(), which leads to this memory being leaked.
Switch to use devm_kzalloc().
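The fix pattern in short (error handling simplified):

  struct stmmac_axi *axi;

  axi = devm_kzalloc(&pdev->dev, sizeof(*axi), GFP_KERNEL); /* was: kzalloc() */
  if (!axi)
          return ERR_PTR(-ENOMEM);
  /* no kfree() needed: devres frees the allocation on device detach */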
Fixes: d8256121a91a ("stmmac: adding new glue driver dwmac-dwc-qos-eth")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1tsRyv-0064nU-O9@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Previously with tegra-smmu, even with CONFIG_IOMMU_DMA, the default domain
could have been left as NULL. The NULL domain is specially recognized by
host1x_iommu_attach() as meaning it is not the DMA domain and
should be replaced with the special shared domain.
This happened prior to the below commit because tegra-smmu was using the
NULL domain to mean IDENTITY.
Now that the domain is properly labelled, the test in DRM doesn't see NULL.
Check for IDENTITY as well to enable the special domains.
This is the same issue and basic fix as seen in
commit fae6e669cdc5 ("drm/tegra: Do not assume that a NULL domain means no
DMA IOMMU").
Fixes: c8cc2655cc6c ("iommu/tegra-smmu: Implement an IDENTITY domain")
Reported-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Closes: https://lore.kernel.org/all/c6a6f114-3acd-4d56-a13b-b88978e927dc@tecnico.ulisboa.pt/
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0-v1-10dcc8ce3869+3a7-host1x_identity_jgg@nvidia.com
|
|
cgroup_rstat_flush_locked() grabs the irq safe cgroup_rstat_lock while
iterating all possible cpus. It only drops the lock if there is
scheduler or spin lock contention. If neither, then interrupts can be
disabled for a long time. On large machines this can disable interrupts
for a long enough time to drop network packets. On 400+ CPU machines
I've seen interrupts disabled for over 40 msec.
Prevent rstat from disabling interrupts while processing all possible
cpus. Instead drop and reacquire cgroup_rstat_lock for each cpu. This
approach was previously discussed in
https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
though this was in the context of a non-irq rstat spin lock.
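A simplified sketch of the reworked flush loop
(cgroup_rstat_flush_cpu() is an illustrative stand-in for the per-cpu
processing):

  for_each_possible_cpu(cpu) {
          spin_lock_irq(&cgroup_rstat_lock);      /* irqs off for one cpu only */
          cgroup_rstat_flush_cpu(cgrp, cpu);
          spin_unlock_irq(&cgroup_rstat_lock);    /* irqs back on between cpus */
  }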
Benchmark this change with:
1) a single stat_reader process with 400 threads, each reading a test
memcg's memory.stat repeatedly for 10 seconds.
2) 400 memory hog processes running in the test memcg and repeatedly
charging memory until oom killed. Then they repeat charging and oom
killing.
v6.14-rc6 with CONFIG_IRQSOFF_TRACER with stat_reader and hogs, finds
interrupts are disabled by rstat for 45341 usec:
# => started at: _raw_spin_lock_irq
# => ended at: cgroup_rstat_flush
#
#
# _------=> CPU#
# / _-----=> irqs-off/BH-disabled
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / _-=> migrate-disable
# ||||| / delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
stat_rea-96532 52d.... 0us*: _raw_spin_lock_irq
stat_rea-96532 52d.... 45342us : cgroup_rstat_flush
stat_rea-96532 52d.... 45342us : tracer_hardirqs_on <-cgroup_rstat_flush
stat_rea-96532 52d.... 45343us : <stack trace>
=> memcg1_stat_format
=> memory_stat_format
=> memory_stat_show
=> seq_read_iter
=> vfs_read
=> ksys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
With this patch the CONFIG_IRQSOFF_TRACER doesn't find rstat to be the
longest holder. The longest irqs-off holder has irqs disabled for
4142 usec, a huge reduction from previous 45341 usec rstat finding.
Running stat_reader memory.stat reader for 10 seconds:
- without memory hogs: 9.84M accesses => 12.7M accesses
- with memory hogs: 9.46M accesses => 11.1M accesses
The throughput of memory.stat access improves.
The mode of memory.stat access latency, after grouping into power-of-2 buckets:
- without memory hogs: 64 usec => 16 usec
- with memory hogs: 64 usec => 8 usec
The memory.stat latency improves.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Tested-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Make sure the test cleans up after itself. The XDP off statements
at the end of the test may not be reached.
Fixes: 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312131040.660386-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Merge tag 'ata-6.14-final' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Niklas Cassel:
- Fix a regression on ATI AHCI controllers, where certain Samsung
drives fail to be detected on a warm boot when LPM is enabled.
LPM on ATI AHCI works fine with other drives. Likewise, the
Samsung drives work fine with LPM on other AHCI controllers.
Thus, just like the weirdo ATA_QUIRK_NO_NCQ_ON_ATI quirk, add a
new ATA_QUIRK_NO_LPM_ON_ATI quirk to disable LPM only on ATI
AHCI controllers.
* tag 'ata-6.14-final' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Add ATA_QUIRK_NO_LPM_ON_ATI for certain Samsung SSDs
|
|
When we currently create a pidfd we check that the task hasn't been
reaped right before we create the pidfd. But it is of course possible
that by the time we return the pidfd to userspace the task has already
been reaped since we don't check again after having created a dentry for
it.
This was fine until now because that race was meaningless. But now that
we provide PIDFD_INFO_EXIT it is a problem because it is possible that
the kernel returns a reaped pidfd and it depends on the race whether
PIDFD_INFO_EXIT information is available. This depends on if the task
gets reaped before or after a dentry has been attached to struct pid.
Make this consistent and only return pidfds for reaped tasks if
PIDFD_INFO_EXIT information is available. This is done by performing
another check whether the task has been reaped right after we attached a
dentry to struct pid.
Since pidfs_exit() is called before struct pid's task linkage is removed
the case where the task got reaped but a dentry was already attached to
struct pid and exit information was recorded and published can be
handled correctly. In that case we do return a pidfd for a reaped task
like we would've before.
Link: https://lore.kernel.org/r/20250316-kabel-fehden-66bdb6a83436@brauner
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Eliminates a jump over a call to capable() in the common case.
By default the limit is not even set, in which case the check can't even
fail to begin with. It remains unset at least on Debian and Ubuntu.
For these cases this can probably become a static branch instead.
In the meantime tidy it up.
I note the check separate from the bump makes the entire thing racy.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20250319124923.1838719-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Previously, iomap_readpage_iter() returning 0 would break out of the
loops of iomap_readahead_iter(), which is what iomap_read_inline_data()
relies on.
However, commit d9dc477ff6a2 ("iomap: advance the iter directly on
buffered read") changes this behavior without calling
iomap_iter_advance(), which causes EROFS to get stuck in
iomap_readpage_iter().
It seems iomap_iter_advance() cannot be called in
iomap_read_inline_data() because of the iomap_write_begin() path, so
handle this in iomap_readpage_iter() instead.
Reported-and-tested-by: Bo Liu <liubo03@inspur.com>
Fixes: d9dc477ff6a2 ("iomap: advance the iter directly on buffered read")
Cc: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250319085125.4039368-1-hsiangkao@linux.alibaba.com
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Merge https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Holding the pte lock for the page that is being converted to secure is
needed to avoid races. A previous commit removed the locking, which
caused issues. Fix by locking the pte again.
|
|
When mounting a user-space filesystem using io_uring, the initialization
of the rings is done separately on the server side. If for some reason
(e.g. a server bug) this step is not performed it will be impossible to
unmount the filesystem if there are already requests waiting.
This issue is easily reproduced with the libfuse passthrough_ll example,
if the queue depth is set to '0' and a request is queued before trying to
unmount the filesystem. When trying to force the unmount, fuse_abort_conn()
will try to wake up all tasks waiting in fc->blocked_waitq, but because the
rings were never initialized, fuse_uring_ready() will never return 'true'.
Fixes: 3393ff964e0f ("fuse: block request allocation until io-uring init is complete")
Signed-off-by: Luis Henriques <luis@igalia.com>
Link: https://lore.kernel.org/r/20250306111218.13734-1-luis@igalia.com
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix a reference count leak in slave_show() by properly putting the device
reference obtained from device_find_any_child().
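A simplified sketch of the fixed read side (attribute plumbing as in a
typical sysfs show callback):

  static ssize_t example_slave_show(struct device *dev,
                                    struct device_attribute *attr, char *buf)
  {
          struct device *child = device_find_any_child(dev);  /* takes a ref */
          ssize_t ret = sysfs_emit(buf, "%s\n",
                                   child ? to_spi_device(child)->modalias : "");

          put_device(child);  /* the missing put; put_device(NULL) is safe */
          return ret;
  }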
Fixes: 6c364062bfed ("spi: core: Add support for registering SPI slave controllers")
Fixes: c21b0837983d ("spi: Use device_find_any_child() instead of custom approach")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250319032305.70340-1-linmq006@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.
This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with UAPI headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.
This is mostly a mechanical patch (done with a simple "sed -i"
statement), with some manual tweaks in <asm/frame.h>, <asm/hw_irq.h>
and <asm/setup.h> that mentioned this macro in comments with some
missing underscores.
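The typical before/after in a header (struct name illustrative):

  #ifndef __ASSEMBLER__           /* was: #ifndef __ASSEMBLY__ */
  struct example_regs {
          unsigned long ip;
          unsigned long sp;
  };
  #endif /* __ASSEMBLER__ */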
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250314071013.1575167-38-thuth@redhat.com
|
|
__ASSEMBLY__ is only defined by the Makefile of the kernel, so
this is not really useful for UAPI headers (unless the userspace
Makefile defines it, too). Let's switch to __ASSEMBLER__ which
gets set automatically by the compiler when compiling assembly
code.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Brian Gerst <brgerst@gmail.com>
Link: https://lore.kernel.org/r/20250310104256.123527-1-thuth@redhat.com
|
|
Use asm_inline() for atomic locking instructions.
According to:
https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html
the usage of asm pseudo directives in the asm template can confuse
the compiler to wrongly estimate the size of the generated
code.
The LOCK_PREFIX macro expands to several asm pseudo directives, so
its usage in atomic locking insns causes instruction length estimates
to fail significantly (the specially instrumented compiler reports
the estimated length of these asm templates to be 6 instructions long).
This incorrect estimate further causes suboptimal inlining decisions,
suboptimal instruction scheduling and suboptimal code block alignments
for functions that use these locking primitives.
Use asm_inline instead:
https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html
which is a feature that makes GCC pretend some inline assembler code
is tiny (while it would think it is huge), instead of just asm.
For code size estimation, the size of the asm is then taken as
the minimum size of one instruction, ignoring how many instructions
compiler thinks it is.
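The change pattern, shown on a locked increment (this mirrors the shape
of arch_atomic_inc(), reproduced here only as a sketch):

  static __always_inline void example_atomic_inc(atomic_t *v)
  {
          asm_inline volatile(LOCK_PREFIX "incl %0"  /* was: asm volatile() */
                              : "+m" (v->counter));
  }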
bloat-o-meter reports the following code size increase
(x86_64 defconfig, gcc-14.2.1):
add/remove: 82/283 grow/shrink: 870/372 up/down: 76272/-43618 (32654)
Total: Before=22770320, After=22802974, chg +0.14%
with top grows (>500 bytes):
Function old new delta
----------------------------------------------------------------
copy_process 6465 10191 +3726
balance_dirty_pages_ratelimited_flags 237 2949 +2712
icl_plane_update_noarm 5800 7969 +2169
samsung_input_mapping 3375 5170 +1795
ext4_do_update_inode.isra - 1526 +1526
__schedule 2416 3472 +1056
__i915_vma_resource_unhold - 946 +946
sched_mm_cid_after_execve 175 1097 +922
__do_sys_membarrier - 862 +862
filemap_fault 2666 3462 +796
nl80211_send_wiphy 11185 11874 +689
samsung_input_mapping.cold 900 1500 +600
virtio_gpu_queue_fenced_ctrl_buffer 839 1410 +571
ilk_update_pipe_csc 1201 1735 +534
enable_step - 525 +525
icl_color_commit_noarm 1334 1847 +513
tg3_read_bc_ver - 501 +501
and top shrinks (>500 bytes):
Function old new delta
----------------------------------------------------------------
nl80211_send_iftype_data 580 - -580
samsung_gamepad_input_mapping.isra.cold 604 - -604
virtio_gpu_queue_ctrl_sgs 724 - -724
tg3_get_invariants 9218 8376 -842
__i915_vma_resource_unhold.part 899 - -899
ext4_mark_iloc_dirty 1735 106 -1629
samsung_gamepad_input_mapping.isra 2046 - -2046
icl_program_input_csc 2203 - -2203
copy_mm 2242 - -2242
balance_dirty_pages 2657 - -2657
These code size changes can be grouped into 4 groups:
a) some functions now include once-called functions in full or
in part. These are:
Function old new delta
----------------------------------------------------------------
copy_process 6465 10191 +3726
balance_dirty_pages_ratelimited_flags 237 2949 +2712
icl_plane_update_noarm 5800 7969 +2169
samsung_input_mapping 3375 5170 +1795
ext4_do_update_inode.isra - 1526 +1526
that now include:
Function old new delta
----------------------------------------------------------------
copy_mm 2242 - -2242
balance_dirty_pages 2657 - -2657
icl_program_input_csc 2203 - -2203
samsung_gamepad_input_mapping.isra 2046 - -2046
ext4_mark_iloc_dirty 1735 106 -1629
b) ISRA [interprocedural scalar replacement of aggregates,
interprocedural pass that removes unused function return values
(turning functions returning a value which is never used into void
functions) and removes unused function parameters. It can also
replace an aggregate parameter by a set of other parameters
representing part of the original, turning those passed by reference
into new ones which pass the value directly.]
Top grows and shrinks of this group are listed below:
Function old new delta
----------------------------------------------------------------
ext4_do_update_inode.isra - 1526 +1526
nfs4_begin_drain_session.isra - 249 +249
nfs4_end_drain_session.isra - 168 +168
__guc_action_register_multi_lrc_v70.isra 335 500 +165
__i915_gem_free_objects.isra - 144 +144
...
membarrier_register_private_expedited.isra 108 - -108
syncobj_eventfd_entry_func.isra 445 314 -131
__ext4_sb_bread_gfp.isra 140 - -140
class_preempt_notrace_destructor.isra 145 - -145
p9_fid_put.isra 151 - -151
__mm_cid_try_get.isra 238 - -238
membarrier_global_expedited.isra 294 - -294
mm_cid_get.isra 295 - -295
samsung_gamepad_input_mapping.isra.cold 604 - -604
samsung_gamepad_input_mapping.isra 2046 - -2046
c) different split points of hot/cold split that just move code around:
Top grows and shrinks of this group are listed below:
Function old new delta
----------------------------------------------------------------
samsung_input_mapping.cold 900 1500 +600
__i915_request_reset.cold 311 389 +78
nfs_update_inode.cold 77 153 +76
__do_sys_swapon.cold 404 455 +51
copy_process.cold - 45 +45
tg3_get_invariants.cold 73 115 +42
...
hibernate.cold 671 643 -28
copy_mm.cold 31 - -31
software_resume.cold 249 207 -42
io_poll_wake.cold 106 54 -52
samsung_gamepad_input_mapping.isra.cold 604 - -604
d) full inline of small functions with locking insn (~150 cases).
These bring in most of the code size increase because the removed
function code is now inlined in multiple places. E.g.:
0000000000a50e10 <release_devnum>:
a50e10: 48 63 07 movslq (%rdi),%rax
a50e13: 85 c0 test %eax,%eax
a50e15: 7e 10 jle a50e27 <release_devnum+0x17>
a50e17: 48 8b 4f 50 mov 0x50(%rdi),%rcx
a50e1b: f0 48 0f b3 41 50 lock btr %rax,0x50(%rcx)
a50e21: c7 07 ff ff ff ff movl $0xffffffff,(%rdi)
a50e27: e9 00 00 00 00 jmp a50e2c <release_devnum+0x1c>
a50e28: R_X86_64_PLT32 __x86_return_thunk-0x4
a50e2c: 0f 1f 40 00 nopl 0x0(%rax)
is now fully inlined into the caller function. This is desirable due
to the per function overhead of CPU bug mitigations like retpolines.
FTR a) with -Os (where generated code size really matters) x86_64
defconfig object file decreases by 24.388 kbytes, representing 0.1%
code size decrease:
text data bss dec hex filename
23883860 4617284 814212 29315356 1bf511c vmlinux-old.o
23859472 4615404 814212 29289088 1beea80 vmlinux-new.o
FTR b) clang recognizes "asm inline", but there was no difference in
code sizes:
text data bss dec hex filename
27577163 4503078 807732 32887973 1f5d4a5 vmlinux-clang-patched.o
27577181 4503078 807732 32887991 1f5d4b7 vmlinux-clang-unpatched.o
The performance impact of the patch was assessed by recompiling
fedora-41 6.13.5 kernel and running lmbench with old and new kernel.
The most noticeable improvements were:
Process fork+exit: 270.0952 microseconds
Process fork+execve: 2620.3333 microseconds
Process fork+/bin/sh -c: 6781.0000 microseconds
File /usr/tmp/XXX write bandwidth: 1780350 KB/sec
Pagefaults on /usr/tmp/XXX: 0.3875 microseconds
to:
Process fork+exit: 298.6842 microseconds
Process fork+execve: 1662.7500 microseconds
Process fork+/bin/sh -c: 2127.6667 microseconds
File /usr/tmp/XXX write bandwidth: 1950077 KB/sec
Pagefaults on /usr/tmp/XXX: 0.1958 microseconds
and from:
Socket bandwidth using localhost
0.000001 2.52 MB/sec
0.000064 163.02 MB/sec
0.000128 321.70 MB/sec
0.000256 630.06 MB/sec
0.000512 1207.07 MB/sec
0.001024 2004.06 MB/sec
0.001437 2475.43 MB/sec
10.000000 5817.34 MB/sec
Avg xfer: 3.2KB, 41.8KB in 1.2230 millisecs, 34.15 MB/sec
AF_UNIX sock stream bandwidth: 9850.01 MB/sec
Pipe bandwidth: 4631.28 MB/sec
to:
Socket bandwidth using localhost
0.000001 3.13 MB/sec
0.000064 187.08 MB/sec
0.000128 324.12 MB/sec
0.000256 618.51 MB/sec
0.000512 1137.13 MB/sec
0.001024 1962.95 MB/sec
0.001437 2458.27 MB/sec
10.000000 6168.08 MB/sec
Avg xfer: 3.2KB, 41.8KB in 1.0060 millisecs, 41.52 MB/sec
AF_UNIX sock stream bandwidth: 9921.68 MB/sec
Pipe bandwidth: 4649.96 MB/sec
[ mingo: Prettified the changelog a bit. ]
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20250309170955.48919-1-ubizjak@gmail.com
|
|
Use asm_inline() to instruct the compiler that the size of asm()
is the minimum size of one instruction, ignoring how many instructions
the compiler thinks it is. The ALTERNATIVE macro, which expands to
several pseudo directives, causes the instruction length estimate to
count more than 20 instructions.
bloat-o-meter reports slight increase of the code size
for x86_64 defconfig object file, compiled with gcc-14.2:
add/remove: 0/2 grow/shrink: 3/0 up/down: 190/-59 (131)
Function old new delta
__copy_user_flushcache 166 247 +81
__memcpy_flushcache 369 437 +68
arch_wb_cache_pmem 6 47 +41
__pfx_clean_cache_range 16 - -16
clean_cache_range 43 - -43
Total: Before=22807167, After=22807298, chg +0.00%
The compiler now inlines and removes the clean_cache_range() function.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250313102715.333142-2-ubizjak@gmail.com
|
|
Current minimum required version of binutils is 2.25,
which supports CLFLUSHOPT and CLWB instruction mnemonics.
Replace the byte-wise specification of CLFLUSHOPT and
CLWB with these proper mnemonics.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250313102715.333142-1-ubizjak@gmail.com
|
|
Use asm_inline() to instruct the compiler that the size of asm()
is the minimum size of one instruction, ignoring how many instructions
the compiler thinks it is. The ALTERNATIVE macro, which expands to
several pseudo directives, causes the instruction length estimate to
count more than 20 instructions.
bloat-o-meter reports slight reduction of the code size
for x86_64 defconfig object file, compiled with gcc-14.2:
add/remove: 6/12 grow/shrink: 59/50 up/down: 3389/-3560 (-171)
Total: Before=22734393, After=22734222, chg -0.00%
where 29 instances of code blocks involving POPCNT now gets inlined,
resulting in the removal of several functions:
format_is_yuv_semiplanar.part.isra 41 - -41
cdclk_divider 69 - -69
intel_joiner_adjust_timings 140 - -140
nl80211_send_wowlan_tcp_caps 369 - -369
nl80211_send_iftype_data 579 - -579
__do_sys_pidfd_send_signal 809 - -809
One noticeable change is:
pcpu_page_first_chunk 1075 1060 -15
Where the compiler now inlines 4 more instances of POPCNT insns,
but still manages to compile to a function with smaller code size.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250312123905.149298-3-ubizjak@gmail.com
|
|
Use ASM_CALL_CONSTRAINT to prevent inline asm() that includes call
instruction from being scheduled before the frame pointer gets set
up by the containing function. This unconstrained scheduling might
cause objtool to print a "call without frame pointer save/setup"
warning. Current versions of compilers don't seem to trigger this
condition, but without this constraint there's nothing to prevent
the compiler from scheduling the insn in front of frame creation.
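A usage sketch (the callee is illustrative):

  static inline void example_call(void)
  {
          asm volatile("call example_callee"
                       : ASM_CALL_CONSTRAINT  /* ties asm to the stack ptr */
                       : /* no inputs */
                       : "memory");
  }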
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250312123905.149298-2-ubizjak@gmail.com
|