Age | Commit message (Collapse) | Author |
|
Instead of holding the FDB hash lock when traversing the FDB linked list
during garbage collection, use RCU and only acquire the lock for entries
that need to be removed (aged out).
Avoid races by using hlist_unhashed() to check that the entry has not
been removed from the list by another thread.
Note that vxlan_fdb_destroy() uses hlist_del_init_rcu() to remove an
entry from the list which should cause list_unhashed() to return true.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-10-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In preparation for removing the fixed size hash table, convert FDB entry
traversal to use the newly added FDB linked list.
No functional changes intended.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-9-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, FDB entries are stored in a hash table with a fixed number of
buckets. The table is used for both lookups and entry traversal.
Subsequent patches will convert the table to rhashtable which is not
suitable for entry traversal.
In preparation for this conversion, add FDB entries to a linked list.
Subsequent patches will convert the driver to use this list when
traversing entries during dump, flush, etc.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-8-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently, the VXLAN driver stores FDB entries in a hash table with a
fixed number of buckets (256). Subsequent patches are going to convert
this table to rhashtable with a linked list for entry traversal, as
rhashtable is more scalable.
In preparation for this conversion, move from a per-bucket spin lock to
a single spin lock that protects the entire FDB table.
The per-bucket spin locks were introduced by commit fe1e0713bbe8
("vxlan: Use FDB_HASH_SIZE hash_locks to reduce contention") citing
"huge contention when inserting/deleting vxlan_fdbs into the fdb_head".
It is not clear from the commit message which code path was holding the
spin lock for long periods of time, but the obvious suspect is the FDB
cleanup routine (vxlan_cleanup()) that periodically traverses the entire
table in order to delete aged-out entries.
This will be solved by subsequent patches that will convert the FDB
cleanup routine to traverse the linked list of FDB entries using RCU,
only acquiring the spin lock when deleting an aged-out entry.
The change reduces the size of the VXLAN device structure from 3600
bytes to 2576 bytes.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-7-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The default FDB entry can be associated with a net device if a physical
device (i.e., 'dev PHYS_DEV') was specified during the creation of the
VXLAN device.
The assignment of the net device pointer to 'dst->remote_dev' logically
belongs in the if block that resolves the pointer from the specified
ifindex, so move it there.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-6-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit 0241b836732f ("vxlan: fix default fdb entry netlink notify
ordering during netdev create") split the creation of the default FDB
entry from its notification to avoid sending a RTM_NEWNEIGH notification
before RTM_NEWLINK.
Previous patches restructured the code so that the default FDB entry is
created after registering the VXLAN device and the notification about
the new entry immediately follows its creation.
Therefore, simplify the code and revert back to vxlan_fdb_update() which
takes care of both creating the FDB entry and notifying user space
about it.
Hold the FDB hash lock when calling vxlan_fdb_update() like it expects.
A subsequent patch will add a lockdep assertion to make sure this is
indeed the case.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-5-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit 7c31e54aeee5 ("vxlan: do not destroy fdb if register_netdevice()
is failed") split the insertion of FDB entries into the FDB hash table
from the function where they are created.
This was done in order to work around a problem that is no longer
possible after the previous patch. Simplify the code and move the body
of vxlan_fdb_insert() back into vxlan_fdb_create().
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-4-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There is asymmetry in how the default FDB entry (all-zeroes) is created
and destroyed in the VXLAN driver. It is created as part of the driver's
newlink() routine, but destroyed as part of its ndo_uninit() routine.
This caused multiple problems in the past. First, commit 0241b836732f
("vxlan: fix default fdb entry netlink notify ordering during netdev
create") split the notification about the entry from its creation so
that it will not be notified to user space before the VXLAN device is
registered.
Then, commit 6db924687139 ("vxlan: Fix error path in
__vxlan_dev_create()") made the error path in __vxlan_dev_create()
asymmetric by destroying the FDB entry before unregistering the net
device. Otherwise, the FDB entry would have been freed twice: By
ndo_uninit() as part of unregister_netdevice() and by
vxlan_fdb_destroy() in the error path.
Finally, commit 7c31e54aeee5 ("vxlan: do not destroy fdb if
register_netdevice() is failed") split the insertion of the FDB entry
into the hash table from its creation, moving the insertion after the
registration of the net device. Otherwise, like before, the FDB entry
would have been freed twice: By ndo_uninit() as part of
register_netdevice()'s error path and by vxlan_fdb_destroy() in the
error path of __vxlan_dev_create().
The end result is that the code is unnecessarily complex. In addition,
the fixed size hash table cannot be converted to rhashtable as
vxlan_fdb_insert() cannot fail, which will no longer be true with
rhashtable.
Solve this by making the addition and deletion of the default FDB entry
completely symmetric. Namely, as part of newlink() routine, create the
entry, insert it into to the hash table and send a notification to user
space after the net device was registered. Note that at this stage the
net device is still administratively down and cannot transmit / receive
packets.
Move the deletion from ndo_uninit() to the dellink routine(): Flush the
default entry together with all the other entries, before unregistering
the net device.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-3-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The Tx path does not run from an RCU read-side critical section which
makes the current lockless accesses to FDB entries invalid. As far as I
am aware, this has not been a problem in practice, but traces will be
generated once we transition the FDB lookup to rhashtable_lookup().
Add rcu_read_{lock,unlock}() around the handling of FDB entries in the
Tx path. Remove the RCU read-side critical section from vxlan_xmit_nh()
as now the function is always called from an RCU read-side critical
section.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250415121143.345227-2-idosch@nvidia.com
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Use kvzalloc() so that large exit_dump buffer allocations don't fail
easily
- Remove cpu.weight / cpu.idle unimplemented warnings which are more
annoying than helpful.
This makes SCX_OPS_HAS_CGROUP_WEIGHT unnecessary. Mark it for
deprecation
* tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Mark SCX_OPS_HAS_CGROUP_WEIGHT for deprecation
sched_ext: Remove cpu.weight / cpu.idle unimplemented warnings
sched_ext: Use kvzalloc for large exit_dump allocation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix compilation in CONFIG_LOCKDEP && !CONFIG_PROVE_RCU configurations
- Allow "cpuset_v2_mode" mount option for "cpuset" filesystem type to
make life easier for android
* tag 'cgroup-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset-v1: Add missing support for cpuset_v2_mode
cgroup: Fix compilation issue due to cgroup_mutex not being exported
|
|
Vladimir Oltean says:
====================
ENETC bug fixes for bpf_xdp_adjust_head() and bpf_xdp_adjust_tail()
It has been reported that on the ENETC driver, bpf_xdp_adjust_head()
and bpf_xdp_adjust_tail() are broken in combination with the XDP_PASS
verdict. I have constructed a series a simple XDP programs and tested
with various packet sizes and confirmed that this is the case.
Patch 3/3 fixes the core issue, which is that the sk_buff created on
XDP_PASS is created by the driver as if XDP never ran, but in fact the
geometry needs to be adjusted according to the delta applied by the
program on the original xdp_buff. It depends on commit 539c1fba1ac7
("xdp: add generic xdp_build_skb_from_buff()") which is not available in
"stable" but perhaps should be.
Patch 2/3 is a small refactor necessary for 3/3.
Patch 1/3 fixes a related issue I noticed, which is that
bpf_xdp_adjust_tail() with a positive offset works for linear XDP
buffers, but returns an error for non-linear ones, even if there is
plenty of space in the final page fragment.
====================
Link: https://patch.msgid.link/20250417120005.3288549-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Vlatko Markovikj reported that XDP programs attached to ENETC do not
work well if they use bpf_xdp_adjust_head() or bpf_xdp_adjust_tail(),
combined with the XDP_PASS verdict. A typical use case is to add or
remove a VLAN tag.
The resulting sk_buff passed to the stack is corrupted, because the
algorithm used by the driver for XDP_PASS is to unwind the current
buffer pointer in the RX ring and to re-process the current frame with
enetc_build_skb() as if XDP hadn't run. That is incorrect because XDP
may have modified the geometry of the buffer, which we then are
completely unaware of. We are looking at a modified buffer with the
original geometry.
The initial reaction, both from me and from Vlatko, was to shop around
the kernel for code to steal that would calculate a delta between the
old and the new XDP buffer geometry, and apply that to the sk_buff too.
We noticed that veth and generic xdp have such code.
The headroom adjustment is pretty uncontroversial, but what turned out
severely problematic is the tailroom.
veth has this snippet:
__skb_put(skb, off); /* positive on grow, negative on shrink */
which on first sight looks decent enough, except __skb_put() takes an
"unsigned int" for the second argument, and the arithmetic seems to only
work correctly by coincidence. Second issue, __skb_put() contains a
SKB_LINEAR_ASSERT(). It's not a great pattern to make more widespread.
The skb may still be nonlinear at that point - it only becomes linear
later when resetting skb->data_len to zero.
To avoid the above, bpf_prog_run_generic_xdp() does this instead:
skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
skb->len += off; /* positive on grow, negative on shrink */
which is more open-coded, uses lower-level functions and is in general a
bit too much to spread around in driver code.
Then there is the snippet:
if (xdp_buff_has_frags(xdp))
skb->data_len = skb_shinfo(skb)->xdp_frags_size;
else
skb->data_len = 0;
One would have expected __pskb_trim() to be the function of choice for
this task. But it's not used in veth/xdpgeneric because the extraneous
fragments were _already_ freed by bpf_xdp_adjust_tail() ->
bpf_xdp_frags_shrink_tail() -> ... -> __xdp_return() - the backing
memory for the skb frags and the xdp frags is the same, but they don't
keep individual references.
In fact, that is the biggest reason why this snippet cannot be reused
as-is, because ENETC temporarily constructs an skb with the original len
and the original number of frags. Because the extraneous frags are
already freed by bpf_xdp_adjust_tail() and returned to the page
allocator, it means the entire approach of using enetc_build_skb() is
questionable for XDP_PASS. To avoid that, one would need to elevate the
page refcount of all frags before calling bpf_prog_run_xdp() and drop it
after XDP_PASS.
There are other things that are missing in ENETC's handling of XDP_PASS,
like for example updating skb_shinfo(skb)->meta_len.
These are all handled correctly and cleanly in commit 539c1fba1ac7
("xdp: add generic xdp_build_skb_from_buff()"), added to net-next in
Dec 2024, and in addition might even be quicker that way. I have a very
strong preference towards backporting that commit for "stable", and that
is what is used to fix the handling bugs. It is way too messy to go
this deep into the guts of an sk_buff from the code of a device driver.
Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS")
Reported-by: Vlatko Markovikj <vlatko.markovikj@etas.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250417120005.3288549-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This small snippet of code ensures that we do something with the array
of RX software buffer descriptor elements after passing the skb to the
stack. In this case, we see if the other half of the page is reusable,
and if so, we "turn around" the buffers, making them directly usable by
enetc_refill_rx_ring() without going to enetc_new_page().
We will need to perform this kind of buffer flipping from a new code
path, i.e. from XDP_PASS. Currently, enetc_build_skb() does it there
buffer by buffer, but in a subsequent change we will stop using
enetc_build_skb() for XDP_PASS.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250417120005.3288549-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
At the time when bpf_xdp_adjust_tail() gained support for non-linear
buffers, ENETC was already generating this kind of geometry on RX, due
to its use of 2K half page buffers. Frames larger than 1472 bytes
(without FCS) are stored as multi-buffer, presenting a need for multi
buffer support to work properly even in standard MTU circumstances.
Allow bpf_xdp_frags_increase_tail() to know the allocation size of paged
data, so it can safely permit growing the tailroom of the buffer from
XDP programs.
Fixes: bf25146a5595 ("bpf: add frags support to the bpf_xdp_adjust_tail() API")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250417120005.3288549-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The function xdp_convert_buff_to_frame() may return NULL if it fails
to correctly convert the XDP buffer into an XDP frame due to memory
constraints, internal errors, or invalid data. Failing to check for NULL
may lead to a NULL pointer dereference if the result is used later in
processing, potentially causing crashes, data corruption, or undefined
behavior.
On XDP redirect failure, the associated page must be released explicitly
if it was previously retained via get_page(). Failing to do so may result
in a memory leak, as the pages reference count is not decremented.
Cc: stable@vger.kernel.org # v5.9+
Fixes: 6c5aa6fc4def ("xen networking: add basic XDP support for xen-netfront")
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Link: https://patch.msgid.link/20250417122118.1009824-1-sdl@nppct.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Martin KaFai Lau says:
====================
pull-request: bpf-next 2025-04-17
We've added 12 non-merge commits during the last 9 day(s) which contain
a total of 18 files changed, 1748 insertions(+), 19 deletions(-).
The main changes are:
1) bpf qdisc support, from Amery Hung.
A qdisc can be implemented in bpf struct_ops programs and
can be used the same as other existing qdiscs in the
"tc qdisc" command.
2) Add xsk tail adjustment tests, from Tushar Vyavahare.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
selftests/bpf: Test attaching bpf qdisc to mq and non root
selftests/bpf: Add a bpf fq qdisc to selftest
selftests/bpf: Add a basic fifo qdisc test
libbpf: Support creating and destroying qdisc
bpf: net_sched: Disable attaching bpf qdisc to non root
bpf: net_sched: Support updating bstats
bpf: net_sched: Add a qdisc watchdog timer
bpf: net_sched: Add basic bpf qdisc kfuncs
bpf: net_sched: Support implementation of Qdisc_ops in bpf
bpf: Prepare to reuse get_ctx_arg_idx
selftests/xsk: Add tail adjustment tests and support check
selftests/xsk: Add packet stream replacement function
====================
Link: https://patch.msgid.link/20250417184338.3152168-1-martin.lau@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michael Chan says:
====================
bnxt_en: Update for net-next
The first patch changes the FW message timeout threshold for a warning
message. The second patch adjusts the ethtool -w coredump length to
suppress a warning. The last 2 patches are small cleanup patches for
the bnxt_ulp RoCE auxbus code.
v1: https://lore.kernel.org/netdev/20250415174818.1088646-1-michael.chan@broadcom.com/
====================
Link: https://patch.msgid.link/20250417172448.1206107-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
BNXT_ROCE_ULP and BNXT_MAX_ULP are no longer used. Remove them to
clean up the code.
Reviewed-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250417172448.1206107-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The "ref_count" field in struct bnxt_ulp is unused after
commit a43c26fa2e6c ("RDMA/bnxt_re: Remove the sriov config callback").
So we can just remove it now.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250417172448.1206107-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ethtool first calls .get_dump_flags() to get the dump length. For
coredump, the driver calls the FW to get the coredump length (L1). The
min. of L1 and the user specified length is then passed to
.get_dump_data() (L2) to get the coredump. The actual coredump length
retrieved by the FW (L3) during .get_dump_data() may be smaller than L1.
This length discrepancy will trigger a WARN_ON() in
ethtool_get_dump_data().
ethtool has already vzalloc'ed a buffer with size L1. Just report
the coredump length as L2 even though the actual coredump length L3
may be smaller. The extra zero padding does not matter. This will
prevent the warning that may alarm the user.
For correctness, only do the final length update if there is no error.
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Signed-off-by: Shruti Parab <shruti.parab@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250417172448.1206107-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The firmware advertises a "hwrm_cmd_max_timeout" value to the driver
for NVRAM and coredump related functions that can take tens of seconds
to complete. The driver polls for the operation to complete under
mutex and may trigger hung task watchdog warning if the wait is too long.
To warn the user about this, the driver currently prints a warning if
this advertised value exceeds 40 seconds:
Device requests max timeout of %d seconds, may trigger hung task watchdog
Initially, we chose 40 seconds, well below the kernel's default
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT (120 seconds) to avoid triggering
the hung task watchdog. But 60 seconds is the timeout on most
production FW and cannot be reduced further. Change the driver's warning
threshold to 60 seconds to avoid triggering this warning on all
production devices. We also print the warning if the value exceeds
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT which may be set to architecture
specific defaults as low as 10 seconds.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250417172448.1206107-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Russell King says:
====================
net: stmmac: socfpga: fix init ordering and cleanups
This series fixes the init ordering of the socfpga probe function.
The standard rule is to do all setup before publishing any device,
and socfpga violates that. I can see no reason for this, but these
patches have not been tested on hardware.
Address this by moving the initialisation of dwmac->stmmac_rst
along with all the other dwmac initialisers - there's no reason
for this to be late as plat_dat->stmmac_rst has already been
populated.
Next, replace the call to ops->set_phy_mode() with an init function
socfpga_dwmac_init() which will then be linked in to plat_dat->init.
Then, add this to plat_dat->init, and switch to stmmac_pltfr_pm_ops
from the private ops. The runtime suspend/resume socfpga implementations
are identical to the platform ones, but misses the noirq versions
which this will add.
Before we swap the order of socfpga_dwmac_init() and
stmmac_dvr_probe(), we need to change the way the interface is
obtained, as that uses driver data and the struct net_device which
haven't been initialised. Save a pointer to plat_dat in the socfpga
private data, and use that to get the interface mode. We can then swap
the order of the init and probe functions.
Finally, convert to devm_stmmac_pltfr_probe() by moving the call
to ops->set_phy_mode() into an init function appropriately populating
plat_dat->init.
====================
Link: https://patch.msgid.link/aAE2tKlImhwKySq_@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Convert socfpga to use devm_stmmac_pltfr_probe() to further simplify
the probe function, wrapping the call to the set_phy_mode() method
into socfpga_dwmac_init() which can be called from the plat_dat->init()
method. Also call this from socfpga_dwmac_resume() thereby simplifying
that function.
Using the devm variant also means we can remove the call to
stmmac_pltfr_remove().
Unfortunately, we can't also convert to stmmac_pltfr_pm_ops as there is
extra work done in socfpga_dwmac_resume().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1u5Sns-001IJw-OY@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Initialisation/setup after registration is a bug. This is the second
of two patches fixing this in socfpga.
The set_phy_mode() functions do various hardware setup that would
interfere with a netdev that has been published, and thus available to
be opened by the kernel/userspace.
However, set_phy_mode() relies upon the netdev having been initialised
to get at the plat_stmmacenet_data structure, which is probably why it
was placed after stmmac_drv_probe(). We can remove that need by storing
a pointer to struct plat_stmmacenet_data in struct socfpga_dwmac.
Move the call to set_phy_mode() before calling stmmac_dvr_probe().
This also simplifies the probe function as there is no need to
unregister the netdev if set_phy_mode() fails.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1u5Snn-001IJq-L0@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Convert socfpga to use the generic stmmac_pltfr_pm_ops, which can be
achieved by adding an appropriate plat_dat->init function to do the
setup.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1u5Sni-001IJk-Gi@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both the resume and probe path needs to configure the phy mode, so
provide a common function to do this which can later be hooked into
plat_dat->init.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1u5Snd-001IJe-Cx@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Initialisation/setup after registration is a bug. This is the first of
two patches fixing this in socfpga.
dwmac->stmmac_rst is initialised from the stmmac plat_dat's stmmac_rst
member, which is itself initialised by devm_stmmac_probe_config_dt().
Therefore, this can be initialised before we call stmmac_dvr_probe().
Move it there.
dwmac->stmmac_rst is used by the set_phy_mode() method.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/E1u5SnY-001IJY-90@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Breno Leitao says:
====================
net: Adopting nlmsg_payload() (final series)
This patchset marks the final step in converting users to the new
nlmsg_payload() function. It addresses the last two files that were not
converted in previous series, specifically updating the following
functions:
neigh_valid_dump_req
rtnl_valid_dump_ifinfo_req
rtnl_valid_getlink_req
valid_fdb_get_strict
valid_bridge_getlink_req
rtnl_valid_stats_req
rtnl_mdb_valid_dump_req
I would like to extend a big thank you to Kuniyuki Iwashima for his
invaluable help and review of this effort.
====================
Link: https://patch.msgid.link/20250417-nlmsg_v3-v1-0-9b09d9d7e61d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250417-nlmsg_v3-v1-2-9b09d9d7e61d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Leverage the new nlmsg_payload() helper to avoid checking for message
size and then reading the nlmsg data.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250417-nlmsg_v3-v1-1-9b09d9d7e61d@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
GCC 14.2.0 reports that passing a non-string literal as the
format argument of dev_set_name() is potentially insecure.
drivers/s390/net/ism_drv.c: In function 'ism_probe':
drivers/s390/net/ism_drv.c:615:2: warning: format not a string literal and no format arguments [-Wformat-security]
615 | dev_set_name(&ism->dev, dev_name(&pdev->dev));
| ^~~~~~~~~~~~
It seems to me that as pdev is a PCIE device then the dev_name
call above should always return the device's BDF, e.g. 00:12.0.
That this should not contain format escape sequences. And thus
the current usage is safe.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue.
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417-ism-str-fmt-v1-1-9818b029874d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Simon Horman says:
====================
MAINTAINERS: Update entries for s390 network driver files
Update the entries for s390 network driver files to:
* Add include/linux/ism.h to MAINTAINERS
* Add s390 network driver files to the NETWORKING DRIVERS section
This is to aid developers, and tooling such as get_maintainer.pl alike
to CC patches to all the appropriate people and mailing lists. And is
in keeping with an ongoing effort for NETWORKING entries in MAINTAINERS
to more accurately reflect the way code is maintained.
====================
Link: https://patch.msgid.link/20250417-ism-maint-v1-0-b001be8545ce@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These files are already correctly covered by the S390 NETWORKING DRIVERS
section. In practice commits for these drivers feed into the Networking
subsystem. So it seems appropriate to also list them under NETWORKING
DRIVERS.
This aids developers, and tooling such as get_maintainer.pl
alike to CC patches to all the appropriate people and mailing lists.
And is in keeping with an ongoing effort for NETWORKING entries
in MAINTAINERS to more accurately reflect the way code is maintained.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417-ism-maint-v1-2-b001be8545ce@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ism.h appears to be part of s390 networking drivers
so add it to the corresponding section in MAINTAINERS.
This aids developers, and tooling such as get_maintainer.pl
alike to CC patches to the appropriate people and mailing lists.
And is in keeping with an ongoing effort for NETWORKING entries
in MAINTAINERS to more accurately reflect the way code is maintained.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417-ism-maint-v1-1-b001be8545ce@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The combined condition was left as is when we converted
from __dev_get_by_index() to netdev_get_by_index_lock().
There was no need to undo anything with the former, for
the latter we need an unlock.
Fixes: 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250418015317.1954107-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
'net-mlx5-fix-null-dereference-and-memory-leak-in-ttc_table-creation'
Henry Martin says:
====================
net/mlx5: Fix NULL dereference and memory leak in ttc_table creation
This patch series addresses two issues in the
mlx5_create_inner_ttc_table() and mlx5_create_ttc_table() functions:
1. A potential NULL pointer dereference if mlx5_get_flow_namespace()
returns NULL.
2. A memory leak in the error path when ttc_type is invalid (default:
switch case).
====================
Link: https://patch.msgid.link/20250418023814.71789-1-bsdhenrymartin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Relocate the memory allocation for ttc table after the switch statement
that validates params->ns_type in both mlx5_create_inner_ttc_table() and
mlx5_create_ttc_table(). This ensures memory is only allocated after
confirming valid input, eliminating potential memory leaks when invalid
ns_type cases occur.
Fixes: 137f3d50ad2a ("net/mlx5: Support matching on l4_type for ttc_table")
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250418023814.71789-3-bsdhenrymartin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add NULL check for mlx5_get_flow_namespace() returns in
mlx5_create_inner_ttc_table() and mlx5_create_ttc_table() to prevent
NULL pointer dereference.
Fixes: 137f3d50ad2a ("net/mlx5: Support matching on l4_type for ttc_table")
Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250418023814.71789-2-bsdhenrymartin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a spelling mistake in a mlx5_core_dbg and two spelling mistakes
in comment blocks. Fix them.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250418135703.542722-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a spelling mistake in a dev_error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://patch.msgid.link/20250418112447.533746-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Various new families and my recent work on rtnetlink missed
adding dependencies on C headers. If the system headers are
up to date or don't include a given header at all this doesn't
make a difference. But if the system headers are in place but
stale - compilation will break.
Reported-by: Kory Maincent <kory.maincent@bootlin.com>
Fixes: 29d34a4d785b ("tools: ynl: generate code for rt-addr and add a sample")
Link: https://lore.kernel.org/20250418190431.69c10431@kmaincent-XPS-13-7390
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250418234942.2344036-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I had left the warning around but as a non-fatal error to get my gcc-15
builds going, but fixed up some of the most annoying warning cases so
that it wouldn't be *too* verbose.
Because I like the _concept_ of the warning, even if I detested the
implementation to shut it up.
It turns out the implementation to shut it up is even more broken than I
thought, and my "shut up most of the warnings" patch just caused fatal
errors on gcc-14 instead.
I had tested with clang, but when I upgrade my development environment,
I try to do it on all machines because I hate having different systems
to maintain, and hadn't realized that gcc-14 now had issues.
The ACPI case is literally why I wanted to have a *type* that doesn't
trigger the warning (see commit d5d45a7f2619: "gcc-15: make
'unterminated string initialization' just a warning"), instead of
marking individual places as "__nonstring".
But gcc-14 doesn't like that __nonstring location that shut gcc-15 up,
because it's on an array of char arrays, not on one single array:
drivers/acpi/tables.c:399:1: error: 'nonstring' attribute ignored on objects of type 'const char[][4]' [-Werror=attributes]
399 | static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst __nonstring = {
| ^~~~~~
and my attempts to nest it properly with a type had failed, because of
how gcc doesn't like marking the types as having attributes, only
symbols.
There may be some trick to it, but I was already annoyed by the bad
attribute design, now I'm just entirely fed up with it.
I wish gcc had a proper way to say "this type is a *byte* array, not a
string".
The obvious thing would be to distinguish between "char []" and an
explicitly signed "unsigned char []" (as opposed to an implicitly
unsigned char, which is typically an architecture-specific default, but
for the kernel is universal thanks to '-funsigned-char').
But any "we can typedef a 8-bit type to not become a string just because
it's an array" model would be fine.
But "__attribute__((nonstring))" is sadly not that sane model.
Reported-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 4b4bd8c50f48 ("gcc-15: acpi: sprinkle random '__nonstring' crumbles around")
Fixes: d5d45a7f2619 ("gcc-15: make 'unterminated string initialization' just a warning")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
The C sequence points are complicated things, and gcc-15 has apparently
added a warning for the case where an object is both used and modified
multiple times within the same sequence point.
That's a great warning.
Or rather, it would be a great warning, except gcc-15 seems to not
really be very exact about it, and doesn't notice that the modification
are to two entirely different members of the same object: the array
counter and the array entries.
So that seems kind of silly.
That said, the code that gcc complains about is unnecessarily
complicated, so moving the array counter update into a separate
statement seems like the most straightforward fix for these warnings:
drivers/net/wireless/intel/iwlwifi/mld/d3.c: In function ‘iwl_mld_set_netdetect_info’:
drivers/net/wireless/intel/iwlwifi/mld/d3.c:1102:66: error: operation on ‘netdetect_info->n_matches’ may be undefined [-Werror=sequence-point]
1102 | netdetect_info->matches[netdetect_info->n_matches++] = match;
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~
drivers/net/wireless/intel/iwlwifi/mld/d3.c:1120:58: error: operation on ‘match->n_channels’ may be undefined [-Werror=sequence-point]
1120 | match->channels[match->n_channels++] =
| ~~~~~~~~~~~~~~~~~^~
side note: the code at that second warning is actively buggy, and only
works on little-endian machines that don't do strict alignment checks.
The code casts an array of integers into an array of unsigned long in
order to use our bitmap iterators. That happens to work fine on any
sane architecture, but it's still wrong.
This does *not* fix that more serious problem. This only splits the two
assignments into two statements and fixes the compiler warning. I need
to get rid of the new warnings in order to be able to actually do any
build testing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
All of these cases are perfectly valid and good traditional C, but hit
by the "you're not NUL-terminating your byte array" warning.
And none of the cases want any terminating NUL character.
Mark them __nonstring to shut up gcc-15 (and in the case of the ak8974
magnetometer driver, I just removed the explicit array size and let gcc
expand the 3-byte and 6-byte arrays by one extra byte, because it was
the simpler change).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This removes two cases of explicit NUL padding that now causes warnings
because of '-Wunterminated-string-initialization' being part of -Wextra
in gcc-15.
Gcc is being silly in this case when it says that it truncates a NUL
terminator, because in these cases there were _multiple_ NUL characters.
But we can get rid of the warning by just simplifying the two
initializers that trigger the warning for me, so this does exactly that.
I'm not sure why the power supply code did that odd
.attr_name = #_name "\0",
pattern: it was introduced in commit 2cabeaf15129 ("power: supply: core:
Cleanup power supply sysfs attribute list"), but that 'attr_name[]'
field is an explicitly sized character array in a statically initialized
variable, and a string initializer always has a terminating NUL _and_
statically initialized character arrays are zero-padded anyway, so it
really seems to be rather extraneous belt-and-suspenders.
The zero_uuid[16] initialization in drivers/md/bcache/super.c makes
perfect sense, but it isn't necessary for the same reasons, and not
worth the new gcc warning noise.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is not great: I'd much rather introduce a typedef that is a "ACPI
name byte buffer", and use that to mark these special 4-byte ACPI names
that do not use NUL termination.
But as noted in the previous commit ("gcc-15: make 'unterminated string
initialization' just a warning") gcc doesn't actually seem to support
that notion, so instead you have to just mark every single array
declaration individually.
So this is not pretty, but this gets rid of the bulk of the annoying
warnings during an allmodconfig build for me.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
gcc-15 enabling -Wunterminated-string-initialization in -Wextra by
default was done with the best intentions, but the warning is still
quite broken.
What annoys me about the warning is that this is a very traditional AND
CORRECT way to initialize fixed byte arrays in C:
unsigned char hex[16] = "0123456789abcdef";
and we use this all over the kernel. And the warning is fine, but gcc
developers apparently never made a reasonable way to disable it. As is
(sadly) tradition with these things.
Yes, there's "__attribute__((nonstring))", and we have a macro to make
that absolutely disgusting syntax more palatable (ie the kernel syntax
for that monstrosity is just "__nonstring").
But that attribute is misdesigned. What you'd typically want to do is
tell the compiler that you are using a type that isn't a string but a
byte array, but that doesn't work at all:
warning: ‘nonstring’ attribute does not apply to types [-Wattributes]
and because of this fundamental mis-design, you then have to mark each
instance of that pattern.
This is particularly noticeable in our ACPI code, because ACPI has this
notion of a 4-byte "type name" that gets used all over, and is exactly
this kind of byte array.
This is a sad oversight, because the warning is useful, but really would
be so much better if gcc had also given a sane way to indicate that we
really just want a byte array type at a type level, not the broken "each
and every array definition" level.
So now instead of creating a nice "ACPI name" type using something like
typedef char acpi_name_t[4] __nonstring;
we have to do things like
char name[ACPI_NAMESEG_SIZE] __nonstring;
in every place that uses this concept and then happens to have the
typical initializers.
This is annoying me mainly because I think the warning _is_ a good
warning, which is why I'm not just turning it off in disgust. But it is
hampered by this bad implementation detail.
[ And obviously I'm doing this now because system upgrades for me are
something that happen in the middle of the release cycle: don't do it
before or during travel, or just before or during the busy merge
window period. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"16 hotfixes. 2 are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
All patches are basically for MM although five are alterations to
MAINTAINERS"
[ Basic counting skills are clearly not a strictly necessary requirement
for kernel maintainers. - Linus ]
* tag 'mm-hotfixes-stable-2025-04-19-21-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: add section for locking of mm's and VMAs
mm: vmscan: fix kswapd exit condition in defrag_mode
mm: vmscan: restore high-cpu watermark safety in kswapd
MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING section
mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount stabilization
mm, hugetlb: increment the number of pages to be reset on HVO
writeback: fix false warning in inode_to_wb()
docs: ABI: replace mcroce@microsoft.com with new Meta address
mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
MAINTAINERS: add memory advice section
MAINTAINERS: add mmap trace events to MEMORY MAPPING
mm: memcontrol: fix swap counter leak from offline cgroup
MAINTAINERS: add MM subsection for the page allocator
MAINTAINERS: update SLAB ALLOCATOR maintainers
fs/dax: fix folio splitting issue by resetting old folio order + _nr_pages
mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
|