summaryrefslogtreecommitdiff
path: root/net/core/rtnetlink.c
AgeCommit message (Collapse)Author
2025-03-19net: reorder dev_addr_sem lockStanislav Fomichev
Lockdep complains about circular lock in 1 -> 2 -> 3 (see below). Change the lock ordering to be: - rtnl_lock - dev_addr_sem - netdev_ops (only for lower devices!) - team_lock (or other per-upper device lock) 1. rtnl_lock -> netdev_ops -> dev_addr_sem rtnl_setlink rtnl_lock do_setlink IFLA_ADDRESS on lower netdev_ops dev_addr_sem 2. rtnl_lock -> team_lock -> netdev_ops rtnl_newlink rtnl_lock do_setlink IFLA_MASTER on lower do_set_master team_add_slave team_lock team_port_add dev_set_mtu netdev_ops 3. rtnl_lock -> dev_addr_sem -> team_lock rtnl_newlink rtnl_lock do_setlink IFLA_ADDRESS on upper dev_addr_sem netif_set_mac_address team_set_mac_address team_lock 4. rtnl_lock -> netdev_ops -> dev_addr_sem rtnl_lock dev_ifsioc dev_set_mac_address_user __tun_chr_ioctl rtnl_lock dev_set_mac_address_user tap_ioctl rtnl_lock dev_set_mac_address_user dev_set_mac_address_user netdev_lock_ops netif_set_mac_address_user dev_addr_sem v2: - move lock reorder to happen after kmalloc (Kuniyuki) Cc: Kohei Enju <enjuk@amazon.com> Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250312190513.1252045-3-sdf@fomichev.me Tested-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-19Revert "net: replace dev_addr_sem with netdev instance lock"Stanislav Fomichev
This reverts commit df43d8bf10316a7c3b1e47e3cc0057a54df4a5b8. Cc: Kohei Enju <enjuk@amazon.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Fixes: df43d8bf1031 ("net: replace dev_addr_sem with netdev instance lock") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250312190513.1252045-2-sdf@fomichev.me Tested-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-08net: move misc netdev_lock flavors to a separate headerJakub Kicinski
Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers). The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: replace dev_addr_sem with netdev instance lockStanislav Fomichev
Lockdep reports possible circular dependency in [0]. Instead of fixing the ordering, replace global dev_addr_sem with netdev instance lock. Most of the paths that set/get mac are RTNL protected. Two places where it's not, convert to explicit locking: - sysfs address_show - dev_get_mac_address via dev_ioctl 0: https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/993321/24-router-bridge-1d-lag-sh/stderr Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-12-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-06net: hold netdev instance lock during rtnetlink operationsStanislav Fomichev
To preserve the atomicity, hold the lock while applying multiple attributes. The major issue with a full conversion to the instance lock are software nesting devices (bonding/team/vrf/etc). Those devices call into the core stack for their lower (potentially real hw) devices. To avoid explicitly wrapping all those places into instance lock/unlock, introduce new API boundaries: - (some) existing dev_xxx calls are now considered "external" (to drivers) APIs and they transparently grab the instance lock if needed (dev_api.c) - new netif_xxx calls are internal core stack API (naming is sketchy, I've tried netdev_xxx_locked per Jakub's suggestion, but it feels a bit verbose; but happy to get back to this naming scheme if this is the preference) This avoids touching most of the existing ioctl/sysfs/drivers paths. Note the special handling of ndo_xxx_slave operations: I exploit the fact that none of the drivers that call these functions need/use instance lock. At the same time, they use dev_xxx APIs, so the lower device has to be unlocked. Changes in unregister_netdevice_many_notify (to protect dev->state with instance lock) trigger lockdep - the loop over close_list (mostly from cleanup_net) introduces spurious ordering issues. netdev_lock_cmp_fn has a justification on why it's ok to suppress for now. Cc: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250305163732.2766420-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04net: plumb extack in __dev_change_net_namespace()Nicolas Dichtel
It could be hard to understand why the netlink command fails. For example, if dev->netns_immutable is set, the error is "Invalid argument". Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-04net: advertise netns_immutable property via netlinkNicolas Dichtel
Since commit 05c1280a2bcf ("netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local"), there is no way to see if the netns_immutable property s set on a device. Let's add a netlink attribute to advertise it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-21rtnetlink: Create link directly in target net namespaceXiao Liang
Make rtnl_newlink_create() create device in target namespace directly. Avoid extra netns change when link netns is provided. Device drivers has been converted to be aware of link netns, that is not assuming device netns is and link netns is the same when ops->newlink() is called. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-12-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21rtnetlink: Remove "net" from newlink paramsXiao Liang
Now that devices have been converted to use the specific netns instead of ambiguous "net", let's remove it from newlink parameters. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-11-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21rtnetlink: Pack newlink() params into structXiao Liang
There are 4 net namespaces involved when creating links: - source netns - where the netlink socket resides, - target netns - where to put the device being created, - link netns - netns associated with the device (backend), - peer netns - netns of peer device. Currently, two nets are passed to newlink() callback - "src_net" parameter and "dev_net" (implicitly in net_device). They are set as follows, depending on netlink attributes in the request. +------------+-------------------+---------+---------+ | peer netns | IFLA_LINK_NETNSID | src_net | dev_net | +------------+-------------------+---------+---------+ | | absent | source | target | | absent +-------------------+---------+---------+ | | present | link | link | +------------+-------------------+---------+---------+ | | absent | peer | target | | present +-------------------+---------+---------+ | | present | peer | link | +------------+-------------------+---------+---------+ When IFLA_LINK_NETNSID is present, the device is created in link netns first and then moved to target netns. This has some side effects, including extra ifindex allocation, ifname validation and link events. These could be avoided if we create it in target netns from the beginning. On the other hand, the meaning of src_net parameter is ambiguous. It varies depending on how parameters are passed. It is the effective link (or peer netns) by design, but some drivers ignore it and use dev_net instead. To provide more netns context for drivers, this patch packs existing newlink() parameters, along with the source netns, link netns and peer netns, into a struct. The old "src_net" is renamed to "net" to avoid confusion with real source netns, and will be deprecated later. The use of src_net are converted to params->net trivially. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-3-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-21rtnetlink: Lookup device in target netns when creating linkXiao Liang
When creating link, lookup for existing device in target net namespace instead of current one. For example, two links created by: # ip link add dummy1 type dummy # ip link add netns ns1 dummy1 type dummy should have no conflict since they are in different namespaces. Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250219125039.18024-2-shaw.leon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06rtnetlink: fix netns leak with rtnl_setlink()Nicolas Dichtel
A call to rtnl_nets_destroy() is needed to release references taken on netns put in rtnl_nets. CC: stable@vger.kernel.org Fixes: 636af13f213b ("rtnetlink: Register rtnl_dellink() and rtnl_setlink() with RTNL_FLAG_DOIT_PERNET_WIP.") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250205221037.2474426-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-05net-sysfs: remove rtnl_trylock from device attributesAntoine Tenart
There is an ABBA deadlock between net device unregistration and sysfs files being accessed[1][2]. To prevent this from happening all paths taking the rtnl lock after the sysfs one (actually kn->active refcount) use rtnl_trylock and return early (using restart_syscall)[3], which can make syscalls to spin for a long time when there is contention on the rtnl lock[4]. There are not many possibilities to improve the above: - Rework the entire net/ locking logic. - Invert two locks in one of the paths — not possible. But here it's actually possible to drop one of the locks safely: the kernfs_node refcount. More details in the code itself, which comes with lots of comments. Note that we check the device is alive in the added sysfs_rtnl_lock helper to disallow sysfs operations to run after device dismantle has started. This also help keeping the same behavior as before. Because of this calls to dev_isalive in sysfs ops were removed. [1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/ [2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/ [3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/ [4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/ Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250204170314.146022-2-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-07rtnetlink: Add rtnl_net_lock_killable().Kuniyuki Iwashima
rtnl_lock_killable() is used only in register_netdev() and will be converted to per-netns RTNL. Let's unexport it and add the corresponding helper. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.13-rc4). No conflicts. Adjacent changes: drivers/net/ethernet/renesas/rswitch.h 32fd46f5b69e ("net: renesas: rswitch: remove speed from gwca structure") 922b4b955a03 ("net: renesas: rswitch: rework ts tags management") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-17rtnetlink: Try the outer netns attribute in rtnl_get_peer_net().Kuniyuki Iwashima
Xiao Liang reported that the cited commit changed netns handling in newlink() of netkit, veth, and vxcan. Before the patch, if we don't find a netns attribute in the peer device attributes, we tried to find another netns attribute in the outer netlink attributes by passing it to rtnl_link_get_net(). Let's restore the original behaviour. Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()") Reported-by: Xiao Liang <shaw.leon@gmail.com> Closes: https://lore.kernel.org/netdev/CABAhCORBVVU8P6AHcEkENMj+gD2d3ce9t=A_o48E0yOQp8_wUQ@mail.gmail.com/#t Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Xiao Liang <shaw.leon@gmail.com> Link: https://patch.msgid.link/20241216110432.51488-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.13-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10rtnetlink: switch rtnl_fdb_dump() to for_each_netdev_dump()Eric Dumazet
This is the last netdev iterator still using net->dev_index_head[]. Convert to modern for_each_netdev_dump() for better scalability, and use common patterns in our stack. Following patch in this series removes the pad field in struct ndo_fdb_dump_context. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241209100747.2269613-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10rtnetlink: add ndo_fdb_dump_contextEric Dumazet
rtnl_fdb_dump() and various ndo_fdb_dump() helpers share a hidden layout of cb->ctx. Before switching rtnl_fdb_dump() to for_each_netdev_dump() in the following patch, make this more explicit. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241209100747.2269613-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-07rtnetlink: fix error code in rtnl_newlink()Dan Carpenter
If rtnl_get_peer_net() fails, then propagate the error code. Don't return success. Fixes: 48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/a2d20cd4-387a-4475-887c-bb7d0e88e25a@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-03rtnetlink: fix double call of rtnl_link_get_net_ifla()Cong Wang
Currently rtnl_link_get_net_ifla() gets called twice when we create peer devices, once in rtnl_add_peer_net() and once in each ->newlink() implementation. This looks safer, however, it leads to a classic Time-of-Check to Time-of-Use (TOCTOU) bug since IFLA_NET_NS_PID is very dynamic. And because of the lack of checking error pointer of the second call, it also leads to a kernel crash as reported by syzbot. Fix this by getting rid of the second call, which already becomes redudant after Kuniyuki's work. We have to propagate the result of the first rtnl_link_get_net_ifla() down to each ->newlink(). Reported-by: syzbot+21ba4d5adff0b6a7cfc6@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=21ba4d5adff0b6a7cfc6 Fixes: 0eb87b02a705 ("veth: Set VETH_INFO_PEER to veth_link_ops.peer_type.") Fixes: 6b84e558e95d ("vxcan: Set VXCAN_INFO_PEER to vxcan_link_ops.peer_type.") Fixes: fefd5d082172 ("netkit: Set IFLA_NETKIT_PEER_INFO to netkit_link_ops.peer_type.") Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241129212519.825567-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-24rtnetlink: fix rtnl_dump_ifinfo() error pathEric Dumazet
syzbot found that rtnl_dump_ifinfo() could return with a lock held [1] Move code around so that rtnl_link_ops_put() and put_net() can be called at the end of this function. [1] WARNING: lock held when returning to user space! 6.12.0-rc7-syzkaller-01681-g38f83a57aa8e #0 Not tainted syz-executor399/5841 is leaving the kernel with locks still held! 1 lock held by syz-executor399/5841: #0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline] #0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rcu_read_lock include/linux/rcupdate.h:849 [inline] #0: ffffffff8f46c2a0 (&ops->srcu#2){.+.+}-{0:0}, at: rtnl_link_ops_get+0x22/0x250 net/core/rtnetlink.c:555 Fixes: 43c7ce69d28e ("rtnetlink: Protect struct rtnl_link_ops with SRCU.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241121194105.3632507-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15ndo_fdb_del: Add a parameter to report whether notification was sentPetr Machata
In a similar fashion to ndo_fdb_add, which was covered in the previous patch, add the bool *notified argument to ndo_fdb_del. Callees that send a notification on their own set the flag to true. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/06b1acf4953ef0a5ed153ef1f32d7292044f2be6.1731589511.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15ndo_fdb_add: Add a parameter to report whether notification was sentPetr Machata
Currently when FDB entries are added to or deleted from a VXLAN netdevice, the VXLAN driver emits one notification, including the VXLAN-specific attributes. The core however always sends a notification as well, a generic one. Thus two notifications are unnecessarily sent for these operations. A similar situation comes up with bridge driver, which also emits notifications on its own: # ip link add name vx type vxlan id 1000 dstport 4789 # bridge monitor fdb & [1] 1981693 # bridge fdb add de:ad:be:ef:13:37 dev vx self dst 192.0.2.1 de:ad:be:ef:13:37 dev vx dst 192.0.2.1 self permanent de:ad:be:ef:13:37 dev vx self permanent In order to prevent this duplicity, add a paremeter to ndo_fdb_add, bool *notified. The flag is primed to false, and if the callee sends a notification on its own, it sets it to true, thus informing the core that it should not generate another notification. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/cbf6ae8195e85cbf922f8058ce4eba770f3b71ed.1731589511.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Register rtnl_dellink() and rtnl_setlink() with ↵Kuniyuki Iwashima
RTNL_FLAG_DOIT_PERNET_WIP. Currently, rtnl_setlink() and rtnl_dellink() cannot be fully converted to per-netns RTNL due to a lack of handling peer/lower/upper devices in different netns. For example, when we change a device in rtnl_setlink() and need to propagate that to its upper devices, we want to avoid acquiring all netns locks, for which we do not know the upper limit. The same situation happens when we remove a device. rtnl_dellink() could be transformed to remove a single device in the requested netns and delegate other devices to per-netns work, and rtnl_setlink() might be ? Until we come up with a better idea, let's use a new flag RTNL_FLAG_DOIT_PERNET_WIP for rtnl_dellink() and rtnl_setlink(). This will unblock converting RTNL users where such devices are not related. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241108004823.29419-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Convert RTM_NEWLINK to per-netns RTNL.Kuniyuki Iwashima
Now, we are ready to convert rtnl_newlink() to per-netns RTNL; rtnl_link_ops is protected by SRCU and netns is prefetched in rtnl_newlink(). Let's register rtnl_newlink() with RTNL_FLAG_DOIT_PERNET and push RTNL down as rtnl_nets_lock(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241108004823.29419-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Add peer_type in struct rtnl_link_ops.Kuniyuki Iwashima
In ops->newlink(), veth, vxcan, and netkit call rtnl_link_get_net() with a net pointer, which is the first argument of ->newlink(). rtnl_link_get_net() could return another netns based on IFLA_NET_NS_PID and IFLA_NET_NS_FD in the peer device's attributes. We want to get it and fill rtnl_nets->nets[] in advance in rtnl_newlink() for per-netns RTNL. All of the three get the peer netns in the same way: 1. Call rtnl_nla_parse_ifinfomsg() 2. Call ops->validate() (vxcan doesn't have) 3. Call rtnl_link_get_net_tb() Let's add a new field peer_type to struct rtnl_link_ops and prefetch netns in the peer ifla to add it to rtnl_nets in rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241108004823.29419-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Introduce struct rtnl_nets and helpers.Kuniyuki Iwashima
rtnl_newlink() needs to hold 3 per-netns RTNL: 2 for a new device and 1 for its peer. We will add rtnl_nets_lock() later, which performs the nested locking based on struct rtnl_nets, which has an array of struct net pointers. rtnl_nets_add() adds a net pointer to the array and sorts it so that rtnl_nets_lock() can simply acquire per-netns RTNL from array[0] to [2]. Before calling rtnl_nets_add(), get_net() must be called for the net, and rtnl_nets_destroy() will call put_net() for each. Let's apply the helpers to rtnl_newlink(). When CONFIG_DEBUG_NET_SMALL_RTNL is disabled, we do not call rtnl_net_lock() thus do not care about the array order, so rtnl_net_cmp_locks() returns -1 so that the loop in rtnl_nets_add() can be optimised to NOP. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241108004823.29419-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Remove __rtnl_link_register()Kuniyuki Iwashima
link_ops is protected by link_ops_mutex and no longer needs RTNL, so we have no reason to have __rtnl_link_register() separately. Let's remove it and call rtnl_link_register() from ifb.ko and dummy.ko. Note that both modules' init() work on init_net only, so we need not export pernet_ops_rwsem and can use rtnl_net_lock() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Protect link_ops by mutex.Kuniyuki Iwashima
rtnl_link_unregister() holds RTNL and calls synchronize_srcu(), but rtnl_newlink() will acquire SRCU frist and then RTNL. Then, we need to unlink ops and call synchronize_srcu() outside of RTNL to avoid the deadlock. rtnl_link_unregister() rtnl_newlink() ---- ---- lock(rtnl_mutex); lock(&ops->srcu); lock(rtnl_mutex); sync(&ops->srcu); Let's move as such and add a mutex to protect link_ops. Now, link_ops is protected by its dedicated mutex and rtnl_link_register() no longer needs to hold RTNL. While at it, we move the initialisation of ops->dellink and ops->srcu out of the mutex scope. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Remove __rtnl_link_unregister().Kuniyuki Iwashima
rtnl_link_unregister() holds RTNL and calls __rtnl_link_unregister(), where we call synchronize_srcu() to wait inflight RTM_NEWLINK requests for per-netns RTNL. We put synchronize_srcu() in __rtnl_link_unregister() due to ifb.ko and dummy.ko. However, rtnl_newlink() will acquire SRCU before RTNL later in this series. Then, lockdep will detect the deadlock: rtnl_link_unregister() rtnl_newlink() ---- ---- lock(rtnl_mutex); lock(&ops->srcu); lock(rtnl_mutex); sync(&ops->srcu); To avoid the problem, we must call synchronize_srcu() before RTNL in rtnl_link_unregister(). As a preparation, let's remove __rtnl_link_unregister(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: convert to nla_get_*_default()Johannes Berg
Most of the original conversion is from the spatch below, but I edited some and left out other instances that were either buggy after conversion (where default values don't fit into the type) or just looked strange. @@ expression attr, def; expression val; identifier fn =~ "^nla_get_.*"; fresh identifier dfn = fn ## "_default"; @@ ( -if (attr) - val = fn(attr); -else - val = def; +val = dfn(attr, def); | -if (!attr) - val = def; -else - val = fn(attr); +val = dfn(attr, def); | -if (!attr) - return def; -return fn(attr); +return dfn(attr, def); | -attr ? fn(attr) : def +dfn(attr, def) | -!attr ? def : fn(attr) +dfn(attr, def) ) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc6). Conflicts: drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c cbe84e9ad5e2 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd") 188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links") https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/ net/mac80211/cfg.c c4382d5ca1af ("wifi: mac80211: update the right link for tx power") 8dd0498983ee ("wifi: mac80211: Fix setting txpower with emulate_chanctx") drivers/net/ethernet/intel/ice/ice_ptp_hw.h 6e58c3310622 ("ice: fix crash on probe for DPLL enabled E810 LOM") e4291b64e118 ("ice: Align E810T GPIO to other products") ebb2693f8fbd ("ice: Read SDP section from NVM for pin definitions") ac532f4f4251 ("ice: Cleanup unused declarations") https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30rtnetlink: Fix an error handling path in rtnl_newlink()Christophe JAILLET
When some code has been moved in the commit in Fixes, some "return err;" have correctly been changed in goto <some_where_in_the_error_handling_path> but this one was missed. Should "ops->maxtype > RTNL_MAX_TYPE" happen, then some resources would leak. Go through the error handling path to fix these leaks. Fixes: 0d3008d1a9ae ("rtnetlink: Move ops->validate to rtnl_newlink().") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/eca90eeb4d9e9a0545772b68aeaab883d9fe2279.1729952228.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29net: fix crash when config small gso_max_size/gso_ipv4_max_sizeWang Liang
Config a small gso_max_size/gso_ipv4_max_size will lead to an underflow in sk_dst_gso_max_size(), which may trigger a BUG_ON crash, because sk->sk_gso_max_size would be much bigger than device limits. Call Trace: tcp_write_xmit tso_segs = tcp_init_tso_segs(skb, mss_now); tcp_set_skb_tso_segs tcp_skb_pcount_set // skb->len = 524288, mss_now = 8 // u16 tso_segs = 524288/8 = 65535 -> 0 tso_segs = DIV_ROUND_UP(skb->len, mss_now) BUG_ON(!tso_segs) Add check for the minimum value of gso_max_size and gso_ipv4_max_size. Fixes: 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation") Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241023035213.517386-1-wangliang74@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29rtnetlink: Fix kdoc of rtnl_af_register().Kuniyuki Iwashima
Commit 26eebdc4b005 ("rtnetlink: Return int from rtnl_af_register().") made rtnl_af_register() return int again, and kdoc needs to be fixed up. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241022210320.86111-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29rtnetlink: Define rtnl_net_trylock().Kuniyuki Iwashima
We will need the per-netns version of rtnl_trylock(). rtnl_net_trylock() calls __rtnl_net_lock() only when rtnl_trylock() successfully holds RTNL. When RTNL is removed, we will use mutex_trylock() for per-netns RTNL. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Protect struct rtnl_af_ops with SRCU.Kuniyuki Iwashima
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to guarantee that rtnl_af_ops is alive during inflight RTM_SETLINK even when its module is being unloaded. Let's use SRCU to protect ops. rtnl_af_lookup() now iterates rtnl_af_ops under RCU and returns SRCU-protected ops pointer. The caller must call rtnl_af_put() to release the pointer after the use. Also, rtnl_af_unregister() unlinks the ops first and calls synchronize_srcu() to wait for inflight RTM_SETLINK requests to complete. Note that rtnl_af_ops needs to be protected by its dedicated lock when RTNL is removed. Note also that BUG_ON() in do_setlink() is changed to the normal error handling as a different af_ops might be found after validate_linkmsg(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Return int from rtnl_af_register().Kuniyuki Iwashima
The next patch will add init_srcu_struct() in rtnl_af_register(), then we need to handle its error. Let's add the error handling in advance to make the following patch cleaner. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Call rtnl_link_get_net_capable() in do_setlink().Kuniyuki Iwashima
We will push RTNL down to rtnl_setlink(). RTM_SETLINK could call rtnl_link_get_net_capable() in do_setlink() to move a dev to a new netns, but the netns needs to be fetched before holding rtnl_net_lock(). Let's move it to rtnl_setlink() and pass the netns to do_setlink(). Now, RTM_NEWLINK paths (rtnl_changelink() and rtnl_group_changelink()) can pass the prefetched netns to do_setlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Clean up rtnl_setlink().Kuniyuki Iwashima
We will push RTNL down to rtnl_setlink(). Let's unify the error path to make it easy to place rtnl_net_lock(). While at it, keep the variables in reverse xmas order. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Clean up rtnl_dellink().Kuniyuki Iwashima
We will push RTNL down to rtnl_delink(). Let's unify the error path to make it easy to place rtnl_net_lock(). While at it, keep the variables in reverse xmas order. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Fetch IFLA_LINK_NETNSID in rtnl_newlink().Kuniyuki Iwashima
Another netns option for RTM_NEWLINK is IFLA_LINK_NETNSID and is fetched in rtnl_newlink_create(). This must be done before holding rtnl_net_lock(). Let's move IFLA_LINK_NETNSID processing to rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Call rtnl_link_get_net_capable() in rtnl_newlink().Kuniyuki Iwashima
As a prerequisite of per-netns RTNL, we must fetch netns before looking up dev or moving it to another netns. rtnl_link_get_net_capable() is called in rtnl_newlink_create() and do_setlink(), but both of them need to be moved to the RTNL-independent region, which will be rtnl_newlink(). Let's call rtnl_link_get_net_capable() in rtnl_newlink() and pass the netns down to where needed. Note that the latter two have not passed the nets to do_setlink() yet but will do so after the remaining rtnl_link_get_net_capable() is moved to rtnl_setlink() later. While at it, dest_net is renamed to tgt_net in rtnl_newlink_create() to align with rtnl_{del,set}link(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Protect struct rtnl_link_ops with SRCU.Kuniyuki Iwashima
Once RTNL is replaced with rtnl_net_lock(), we need a mechanism to guarantee that rtnl_link_ops is alive during inflight RTM_NEWLINK even when its module is being unloaded. Let's use SRCU to protect ops. rtnl_link_ops_get() now iterates link_ops under RCU and returns SRCU-protected ops pointer. The caller must call rtnl_link_ops_put() to release the pointer after the use. Also, __rtnl_link_unregister() unlinks the ops first and calls synchronize_srcu() to wait for inflight RTM_NEWLINK requests to complete. Note that link_ops needs to be protected by its dedicated lock when RTNL is removed. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Move ops->validate to rtnl_newlink().Kuniyuki Iwashima
ops->validate() does not require RTNL. Let's move it to rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Move rtnl_link_ops_get() and retry to rtnl_newlink().Kuniyuki Iwashima
Currently, if neither dev nor rtnl_link_ops is found in __rtnl_newlink(), we release RTNL and redo the whole process after request_module(), which complicates the logic. The ops will be RTNL-independent later. Let's move the ops lookup to rtnl_newlink() and do the retry earlier. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Move simple validation from __rtnl_newlink() to rtnl_newlink().Kuniyuki Iwashima
We will push RTNL down to rtnl_newlink(). Let's move RTNL-independent validation to rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22rtnetlink: Factorise do_setlink() path from __rtnl_newlink().Kuniyuki Iwashima
__rtnl_newlink() got too long to maintain. For example, netdev_master_upper_dev_get()->rtnl_link_ops is fetched even when IFLA_INFO_SLAVE_DATA is not specified. Let's factorise the single dev do_setlink() path to a separate function. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>