summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-19Merge tag 'loongarch-fixes-6.6-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai ChenL "Fix 4-level pagetable building, disable WUC for pgprot_writecombine() like ioremap_wc(), use correct annotation for exception handlers, and a trivial cleanup" * tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc() LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage() LoongArch: Export symbol invalid_pud_table for modules building LoongArch: Use SYM_CODE_* to annotate exception handlers
2023-10-19Merge tag 'slab-fixes-for-6.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64 due to improperly resolved kmalloc alignment restrictions (Catalin Marinas) * tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
2023-10-19Merge tag 'seccomp-v6.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp fix from Kees Cook: - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby) * tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: perf/benchmark: fix seccomp_unotify benchmark for 32-bit
2023-10-19Merge tag 'v6.6-rc7.vfs.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fix from Christian Brauner: "An openat() call from io_uring triggering an audit call can apparently cause the refcount of struct filename to be incremented from multiple threads concurrently during async execution, triggering a refcount underflow and hitting a BUG_ON(). That bug has been lurking around since at least v5.16 apparently. Switch to an atomic counter to fix that. The underflow check is downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily remove that check altogether tbh" * tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: audit,io_uring: io_uring openat triggers audit reference count underflow
2023-10-19Revert "ethtool: Fix mod state of verbose no_mask bitset"Kory Maincent
This reverts commit 108a36d07c01edbc5942d27c92494d1c6e4d45a0. It was reported that this fix breaks the possibility to remove existing WoL flags. For example: ~$ ethtool lan2 ... Supports Wake-on: pg Wake-on: d ... ~$ ethtool -s lan2 wol gp ~$ ethtool lan2 ... Wake-on: pg ... ~$ ethtool -s lan2 wol d ~$ ethtool lan2 ... Wake-on: pg ... This worked correctly before this commit because we were always updating a zero bitmap (since commit 6699170376ab ("ethtool: fix application of verbose no_mask bitset"), that is) so that the rest was left zero naturally. But now the 1->0 change (old_val is true, bit not present in netlink nest) no longer works. Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Reported-by: Michal Kubecek <mkubecek@suse.cz> Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/ Cc: stable@vger.kernel.org Fixes: 108a36d07c01 ("ethtool: Fix mod state of verbose no_mask bitset") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19Merge tag 'ntfs3_for_6.6' of ↵Linus Torvalds
https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 fixes from Konstantin Komarov: - memory leak - some logic errors, NULL dereferences - some code was refactored - more sanity checks * tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: Avoid possible memory leak fs/ntfs3: Fix directory element type detection fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e() fs/ntfs3: Fix OOB read in ntfs_init_from_boot fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea() fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame() fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr() fs/ntfs3: Do not allow to change label if volume is read-only fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo fs/ntfs3: Refactoring and comments fs/ntfs3: Fix alternative boot searching fs/ntfs3: Allow repeated call to ntfs3_put_sbi fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super fs/ntfs3: fix deadlock in mark_as_free_ex fs/ntfs3: Add more attributes checks in mi_enum_attr() fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN) fs/ntfs3: Write immediately updated ntfs state fs/ntfs3: Add ckeck in ni_update_parent()
2023-10-19Merge branch 'mptcp-fixes-for-v6-6'Jakub Kicinski
Mat Martineau says: ==================== mptcp: Fixes for v6.6 Patch 1 corrects the logic for MP_JOIN tests where 0 RSTs are expected. Patch 2 ensures MPTCP packets are not incorrectly coalesced in the TCP backlog queue. Patch 3 avoids a zero-window probe and associated WARN_ON_ONCE() in an expected MPTCP reinjection scenario. Patches 4 & 5 allow an initial MPTCP subflow to be closed cleanly instead of always sending RST. Associated selftest is updated. ==================== Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-0-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19selftests: mptcp: join: no RST when rm subflow/addrMatthieu Baerts
Recently, we noticed that some RST were wrongly generated when removing the initial subflow. This patch makes sure RST are not sent when removing any subflows or any addresses. Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19mptcp: avoid sending RST when closing the initial subflowGeliang Tang
When closing the first subflow, the MPTCP protocol unconditionally calls tcp_disconnect(), which in turn generates a reset if the subflow is established. That is unexpected and different from what MPTCP does with MPJ subflows, where resets are generated only on FASTCLOSE and other edge scenarios. We can't reuse for the first subflow the same code in place for MPJ subflows, as MPTCP clean them up completely via a tcp_close() call, while must keep the first subflow socket alive for later re-usage, due to implementation constraints. This patch adds a new helper __mptcp_subflow_disconnect() that encapsulates, a logic similar to tcp_close, issuing a reset only when the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown otherwise. Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures") Cc: stable@vger.kernel.org Reviewed-by: Matthieu Baerts <matttbe@kernel.org> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19mptcp: more conservative check for zero probesPaolo Abeni
Christoph reported that the MPTCP protocol can find the subflow-level write queue unexpectedly not empty while crafting a zero-window probe, hitting a warning: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 188 at net/mptcp/protocol.c:1312 mptcp_sendmsg_frag+0xc06/0xe70 Modules linked in: CPU: 0 PID: 188 Comm: kworker/0:2 Not tainted 6.6.0-rc2-g1176aa719d7a #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:mptcp_sendmsg_frag+0xc06/0xe70 net/mptcp/protocol.c:1312 RAX: 47d0530de347ff6a RBX: 47d0530de347ff6b RCX: ffff8881015d3c00 RDX: ffff8881015d3c00 RSI: 47d0530de347ff6b RDI: 47d0530de347ff6b RBP: 47d0530de347ff6b R08: ffffffff8243c6a8 R09: ffffffff82042d9c R10: 0000000000000002 R11: ffffffff82056850 R12: ffff88812a13d580 R13: 0000000000000001 R14: ffff88812b375e50 R15: ffff88812bbf3200 FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000695118 CR3: 0000000115dfc001 CR4: 0000000000170ef0 Call Trace: <TASK> __subflow_push_pending+0xa4/0x420 net/mptcp/protocol.c:1545 __mptcp_push_pending+0x128/0x3b0 net/mptcp/protocol.c:1614 mptcp_release_cb+0x218/0x5b0 net/mptcp/protocol.c:3391 release_sock+0xf6/0x100 net/core/sock.c:3521 mptcp_worker+0x6e8/0x8f0 net/mptcp/protocol.c:2746 process_scheduled_works+0x341/0x690 kernel/workqueue.c:2630 worker_thread+0x3a7/0x610 kernel/workqueue.c:2784 kthread+0x143/0x180 kernel/kthread.c:388 ret_from_fork+0x4d/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:304 </TASK> The root cause of the issue is that expectations are wrong: e.g. due to MPTCP-level re-injection we can hit the critical condition. Explicitly avoid the zero-window probe when the subflow write queue is not empty and drop the related warnings. Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/444 Fixes: f70cad1085d1 ("mptcp: stop relying on tcp_tx_skb_cache") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-3-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19tcp: check mptcp-level constraints for backlog coalescingPaolo Abeni
The MPTCP protocol can acquire the subflow-level socket lock and cause the tcp backlog usage. When inserting new skbs into the backlog, the stack will try to coalesce them. Currently, we have no check in place to ensure that such coalescing will respect the MPTCP-level DSS, and that may cause data stream corruption, as reported by Christoph. Address the issue by adding the relevant admission check for coalescing in tcp_add_backlog(). Note the issue is not easy to reproduce, as the MPTCP protocol tries hard to avoid acquiring the subflow-level socket lock. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/420 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-2-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19selftests: mptcp: join: correctly check for no RSTMatthieu Baerts
The commit mentioned below was more tolerant with the number of RST seen during a test because in some uncontrollable situations, multiple RST can be generated. But it was not taking into account the case where no RST are expected: this validation was then no longer reporting issues for the 0 RST case because it is not possible to have less than 0 RST in the counter. This patch fixes the issue by adding a specific condition. Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-1-17ecb002e41d@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19net: ti: icssg-prueth: Fix r30 CMDs bitmasksMD Danish Anwar
The bitmasks for EMAC_PORT_DISABLE and EMAC_PORT_FORWARD r30 commands are wrong in the driver. Update the bitmasks of these commands to the correct ones as used by the ICSSG firmware. These bitmasks are backwards compatible and work with any ICSSG firmware version. Fixes: e9b4ece7d74b ("net: ti: icssg-prueth: Add Firmware config and classification APIs.") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20231018150715.3085380-1-danishanwar@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19i40e: Align devlink info versions with ice driver and add docsIvan Vecera
Align devlink info versions with ice driver so change 'fw.mgmt' version to be 2-digit version [major.minor], add 'fw.mgmt.build' that reports mgmt firmware build number and use '"fw.psid.api' for NVM format version instead of incorrect '"fw.psid'. Additionally add missing i40e devlink documentation. Fixes: 5a423552e0d9 ("i40e: Add handler for devlink .info_get") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231018123558.552453-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19Merge branch 'dt-bindings-net-child-node-schema-cleanups'Jakub Kicinski
Rob Herring says: ==================== dt-bindings: net: Child node schema cleanups This is a series of clean-ups related to ensuring that child node schemas are constrained to not allow undefined properties. Typically, that means just adding additionalProperties or unevaluatedProperties as appropriate. The DSA/switch schemas turned out to be a bit more involved, so there's some more fixes and a bit of restructuring in them. ==================== Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-0-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: dsa: Drop 'ethernet-ports' node propertiesRob Herring
Constraints on 'ethernet-ports' node properties are already defined by the reference to ethernet-switch.yaml, so they can be dropped from the DSA schema. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-8-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: mscc,vsc7514-switch: Simplify DSA and switch referencesRob Herring
The mscc,vsc7514-switch schema doesn't add any custom port properties, so it can just reference ethernet-switch.yaml#/$defs/base and dsa.yaml#/$defs/ethernet-ports instead of the base file and can skip defining port nodes. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-7-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: mscc,vsc7514-switch: Clean-up example indentationRob Herring
The indentation for the example is completely messed up for 'ethernet-ports'. Fix it. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-6-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: ethernet-switch: Rename $defs "base" to 'ethernet-ports'Rob Herring
The name "base" is misleading as the definition is for a complete schema definition without additional properties allowed, not a "base class". Align the same to be the same as dsa.yaml. This schema file without any json pointer path is the base schema which can be extended. There are not yet any references to $defs/base to update. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-5-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: ethernet-switch: Add missing 'ethernet-ports' levelRob Herring
The '$defs/ethernet-ports' schema is referenced by schemas defining a child node 'ethernet-ports', but this schema misses the 'ethernet-ports' node. It would work if referring schemas made a reference like this: properties: ethernet-ports: $ref: ethernet-switch.yaml#/$defs/ethernet-ports However, that would be different from how dsa.yaml works. For consistency, align the schema definition with dsa.yaml and add the missing level. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-4-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: dsa/switch: Make 'ethernet-port' node addresses hexRob Herring
'ethernet-port' node unit-addresses should be in hexadecimal. Some instances have it correct, but fix the ones that don't. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-3-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: renesas: Drop ethernet-phy node schemaRob Herring
What's connected on the MDIO bus is outside the scope of the binding for ethernet controller's MDIO bus unless it's a fixed internal device, so drop the node name and reference to ethernet-phy.yaml. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-2-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19dt-bindings: net: Add missing (unevaluated|additional)Properties on child ↵Rob Herring
node schemas Just as unevaluatedProperties or additionalProperties are required at the top level of schemas, they should (and will) also be required for child node schemas. That ensures only documented properties are present for any node. Add unevaluatedProperties or additionalProperties as appropriate. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20231016-dt-net-cleanups-v1-1-a525a090b444@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-19Merge tag 'for-6.6-rc6-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "Fix a bug in chunk size decision that could lead to suboptimal placement and filling patterns" * tag 'for-6.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix stripe length calculation for non-zoned data chunk allocation
2023-10-19Merge branch 'net-fix-bugs-in-device-netns-move-and-rename'Paolo Abeni
Jakub Kicinski says: ==================== net: fix bugs in device netns-move and rename Daniel reported issues with the uevents generated during netdev namespace move, if the netdev is getting renamed at the same time. While the issue that he actually cares about is not fixed here, there is a bunch of seemingly obvious other bugs in this code. Fix the purely networking bugs while the discussion around the uevent fix is still ongoing. ==================== Link: https://lore.kernel.org/r/20231018013817.2391509-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19selftests: net: add very basic test for netdev names and namespacesJakub Kicinski
Add selftest for fixes around naming netdevs and namespaces. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: move altnames together with the netdeviceJakub Kicinski
The altname nodes are currently not moved to the new netns when netdevice itself moves: [ ~]# ip netns add test [ ~]# ip -netns test link add name eth0 type dummy [ ~]# ip -netns test link property add dev eth0 altname some-name [ ~]# ip -netns test link show dev some-name 2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff altname some-name [ ~]# ip -netns test link set dev eth0 netns 1 [ ~]# ip link ... 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff altname some-name [ ~]# ip li show dev some-name Device "some-name" does not exist. Remove them from the hash table when device is unlisted and add back when listed again. Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: avoid UAF on deleted altnameJakub Kicinski
Altnames are accessed under RCU (dev_get_by_name_rcu()) but freed by kfree() with no synchronization point. Each node has one or two allocations (node and a variable-size name, sometimes the name is netdev->name). Adding rcu_heads here is a bit tedious. Besides most code which unlists the names already has rcu barriers - so take the simpler approach of adding synchronize_rcu(). Note that the one on the unregistration path (which matters more) is removed by the next fix. Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: check for altname conflicts when changing netdev's netnsJakub Kicinski
It's currently possible to create an altname conflicting with an altname or real name of another device by creating it in another netns and moving it over: [ ~]$ ip link add dev eth0 type dummy [ ~]$ ip netns add test [ ~]$ ip -netns test link add dev ethX netns test type dummy [ ~]$ ip -netns test link property add dev ethX altname eth0 [ ~]$ ip -netns test link set dev ethX netns 1 [ ~]$ ip link ... 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff ... 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff altname eth0 Create a macro for walking the altnames, this hopefully makes it clearer that the list we walk contains only altnames. Which is otherwise not entirely intuitive. Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: fix ifname in netlink ntf during netns moveJakub Kicinski
dev_get_valid_name() overwrites the netdev's name on success. This makes it hard to use in prepare-commit-like fashion, where we do validation first, and "commit" to the change later. Factor out a helper which lets us save the new name to a buffer. Use it to fix the problem of notification on netns move having incorrect name: 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff [ ~]# ip link set dev eth0 netns 1 name eth1 ip monitor inside netns: Deleted inet eth0 Deleted inet6 eth0 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7 Name is reported as eth1 in old netns for ifindex 5, already renamed. Fixes: d90310243fd7 ("net: device name allocation cleanups") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19Merge branch 'net-stmmac-improve-tx-timer-logic'Paolo Abeni
Christian Marangi says: ==================== net: stmmac: improve tx timer logic This series comes with the intention of restoring original performance of stmmac on some router/device that used the stmmac driver to handle gigabit traffic. More info are present in patch 3. This cover letter is to show results and improvements of the following change. The move to hr_timer for tx timer and commit 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races") caused big performance regression on these kind of device. This was observed on ipq806x that after kernel 4.19 couldn't handle gigabit speed anymore. The following series is currently applied and tested in OpenWrt SNAPSHOT and have great performance increase. (the scenario is qca8k switch + stmmac dwmac1000) Some good comparison can be found here [1]. The difference is from a swconfig scenario (where dsa tagging is not used so very low CPU impact in handling traffic) and DSA scenario where tagging is used and there is a minimal impact in the CPU. As can be notice even with DSA in place we have better perf. It was observed by other user that also SQM scenario with cake scheduler were improved in the order of 100mbps (this scenario is CPU limited and any increase of perf is caused by removing load on the CPU) Been at least 15 days that this is in use without any complain or bug reported about queue timeout. (was the case with v1 before the additional patch was added, only appear on real world tests and not on iperf tests) [1] https://forum.openwrt.org/t/netgear-r7800-exploration-ipq8065-qca9984/285/3427?u=ansuel ==================== Link: https://lore.kernel.org/r/20231018123550.27110-1-ansuelsmth@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: increase TX coalesce timer to 5msChristian Marangi
Commit 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races") decreased the TX coalesce timer from 40ms to 1ms. This caused some performance regression on some target (regression was reported at least on ipq806x) in the order of 600mbps dropping from gigabit handling to only 200mbps. The problem was identified in the TX timer getting armed too much time. While this was fixed and improved in another commit, performance can be improved even further by increasing the timer delay a bit moving from 1ms to 5ms. The value is a good balance between battery saving by prevending too much interrupt to be generated and permitting good performance for internet oriented devices. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: move TX timer arm after DMA enableChristian Marangi
Move TX timer arm call after DMA interrupt is enabled again. The TX timer arm function changed logic and now is skipped if a napi is already scheduled. By moving the TX timer arm call after DMA is enabled, we permit to correctly skip if a DMA interrupt has been fired and a napi has been scheduled again. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: improve TX timer arm logicChristian Marangi
There is currently a problem with the TX timer getting armed multiple unnecessary times causing big performance regression on some device that suffer from heavy handling of hrtimer rearm. The use of the TX timer is an old implementation that predates the napi implementation and the interrupt enable/disable handling. Due to stmmac being a very old code, the TX timer was never evaluated again with this new implementation and was kept there causing performance regression. The performance regression started to appear with kernel version 4.19 with 8fce33317023 ("net: stmmac: Rework coalesce timer and fix multi-queue races") where the timer was reduced to 1ms causing it to be armed 40 times more than before. Decreasing the timer made the problem more present and caused the regression in the other of 600-700mbps on some device (regression where this was notice is ipq806x). The problem is in the fact that handling the hrtimer on some target is expensive and recent kernel made the timer armed much more times. A solution that was proposed was reverting the hrtimer change and use mod_timer but such solution would still hide the real problem in the current implementation. To fix the regression, apply some additional logic and skip arming the timer when not needed. Arm the timer ONLY if a napi is not already scheduled. Running the timer is redundant since the same function (stmmac_tx_clean) will run in the napi TX poll. Also try to cancel any timer if a napi is scheduled to prevent redundant run of TX call. With the following new logic the original performance are restored while keeping using the hrtimer. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: introduce napi_is_scheduled helperChristian Marangi
We currently have napi_if_scheduled_mark_missed that can be used to check if napi is scheduled but that does more thing than simply checking it and return a bool. Some driver already implement custom function to check if napi is scheduled. Drop these custom function and introduce napi_is_scheduled that simply check if napi is scheduled atomically. Update any driver and code that implement a similar check and instead use this new helper. Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19iavf: delete unused iavf_mac_info fieldsMichal Schmidt
'san_addr' and 'mac_fcoeq' members of struct iavf_mac_info are unused. 'type' is write-only. Delete all three. The function iavf_set_mac_type that sets 'type' also checks if the PCI vendor ID is Intel. This is unnecessary. Delete the whole function. If in the future there's a need for the MAC type (or other PCI ID-dependent data), I would prefer to use .driver_data in iavf_pci_tbl[] for this purpose. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231018111527.78194-1-mschmidt@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19inet: lock the socket in ip_sock_set_tos()Eric Dumazet
Christoph Paasch reported a panic in TCP stack [1] Indeed, we should not call sk_dst_reset() without holding the socket lock, as __sk_dst_get() callers do not all rely on bare RCU. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 12bad6067 P4D 12bad6067 PUD 12bad5067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 1 PID: 2750 Comm: syz-executor.5 Not tainted 6.6.0-rc4-g7a5720a344e7 #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:tcp_get_metrics+0x118/0x8f0 net/ipv4/tcp_metrics.c:321 Code: c7 44 24 70 02 00 8b 03 89 44 24 48 c7 44 24 4c 00 00 00 00 66 c7 44 24 58 02 00 66 ba 02 00 b1 01 89 4c 24 04 4c 89 7c 24 10 <49> 8b 0f 48 8b 89 50 05 00 00 48 89 4c 24 30 33 81 00 02 00 00 69 RSP: 0018:ffffc90000af79b8 EFLAGS: 00010293 RAX: 000000000100007f RBX: ffff88812ae8f500 RCX: ffff88812b5f8f01 RDX: 0000000000000002 RSI: ffffffff8300f080 RDI: 0000000000000002 RBP: 0000000000000002 R08: 0000000000000003 R09: ffffffff8205eca0 R10: 0000000000000002 R11: ffff88812b5f8f00 R12: ffff88812a9e0580 R13: 0000000000000000 R14: ffff88812ae8fbd2 R15: 0000000000000000 FS: 00007f70a006b640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000012bad7003 CR4: 0000000000170ee0 Call Trace: <TASK> tcp_fastopen_cache_get+0x32/0x140 net/ipv4/tcp_metrics.c:567 tcp_fastopen_cookie_check+0x28/0x180 net/ipv4/tcp_fastopen.c:419 tcp_connect+0x9c8/0x12a0 net/ipv4/tcp_output.c:3839 tcp_v4_connect+0x645/0x6e0 net/ipv4/tcp_ipv4.c:323 __inet_stream_connect+0x120/0x590 net/ipv4/af_inet.c:676 tcp_sendmsg_fastopen+0x2d6/0x3a0 net/ipv4/tcp.c:1021 tcp_sendmsg_locked+0x1957/0x1b00 net/ipv4/tcp.c:1073 tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1336 __sock_sendmsg+0x83/0xd0 net/socket.c:730 __sys_sendto+0x20a/0x2a0 net/socket.c:2194 __do_sys_sendto net/socket.c:2206 [inline] Fixes: e08d0b3d1723 ("inet: implement lockless IP_TOS") Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231018090014.345158-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19Merge branch 'net-stmmac-use-correct-pps-input-indexing'Paolo Abeni
Johannes Zink says: ==================== net: stmmac: use correct PPS input indexing The stmmac can have 0 to 4 auxiliary snapshot in channels, which can be used for capturing external triggers with respect to the eqos PTP timer. Previously when enabling the auxiliary snapshot, an invalid request was written to the hardware register, except for the Intel variant of this driver, where the only snapshot available was hardcoded. Patch 1 of this series cleans up the debug netdev_dbg message indicating the auxiliary snapshot being {en,dis}abled. No functional changes here Patch 2 of this series writes the correct PPS input indexing to the hardware registers instead of a previously used fixed value Patch 3 of this series removes a field member from plat_stmmacnet_data that is no longer needed Patch 4 of this series prepares Patch 5 by protecting the snapshot enabled flag by the aux_ts_lock mutex Patch 5 of this series adds a temporary workaround, since at the moment the driver can handle only one single auxiliary snapshot at a time. Previously the driver silently dropped the previous configuration and enabled the new one. Now, if a snapshot is already enabled and userspace tries to enable another without previously disabling the snapshot currently enabled: issue a netdev_err and return an errorcode indicating the device is busy. This series is a "never worked, doesn't hurt anyone" touchup to the PPS capture for non-intel variants of the dwmac driver. ==================== Link: https://lore.kernel.org/r/20231010-stmmac_fix_auxiliary_event_capture-v2-0-51d5f56542d7@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: do not silently change auxiliary snapshot capture channelJohannes Zink
Even though the hardware theoretically supports up to 4 simultaneous auxiliary snapshot capture channels, the stmmac driver does support only a single channel to be active at a time. Previously in case of a PTP_CLK_REQ_EXTTS request, previously active auxiliary snapshot capture channels were silently dropped and the new channel was activated. Instead of silently changing the state for all consumers, log an error and return -EBUSY if a channel is already in use in order to signal to userspace to disable the currently active channel before enabling another one. Signed-off-by: Johannes Zink <j.zink@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: ptp: stmmac_enable(): move change of plat->flags into mutexJohannes Zink
This is a preparation patch. The next patch will check if an external TS is active and return with an error. So we have to move the change of the plat->flags that tracks if external timestamping is enabled after that check. Prepare for this change and move the plat->flags change into the mutex and the if (on). Signed-off-by: Johannes Zink <j.zink@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: intel: remove unnecessary field struct ↵Johannes Zink
plat_stmmacenet_data::ext_snapshot_num Do not store bitmask for enabling AUX_SNAPSHOT0. The previous commit ("net: stmmac: fix PPS capture input index") takes care of calculating the proper bit mask from the request data's extts.index field, which is 0 if not explicitly specified otherwise. Signed-off-by: Johannes Zink <j.zink@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: use correct PPS capture input indexJohannes Zink
The stmmac supports up to 4 auxiliary snapshots that can be enabled by setting the appropriate bits in the PTP_ACR bitfield. Previously as of commit f4da56529da6 ("net: stmmac: Add support for external trigger timestamping") instead of setting the bits, a fixed value was written to this bitfield instead of passing the appropriate bitmask. Now the correct bit is set according to the ptp_clock_request.extts_index passed as a parameter to stmmac_enable(). Signed-off-by: Johannes Zink <j.zink@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: simplify debug message on stmmac_enable()Johannes Zink
Simplify the netdev_dbg() call in stmmac_enable() in order to reduce code duplication. No functional change. Signed-off-by: Johannes Zink <j.zink@pengutronix.de> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: ethernet: ti: Fix mixed module-builtin objectMD Danish Anwar
With CONFIG_TI_K3_AM65_CPSW_NUSS=y and CONFIG_TI_ICSSG_PRUETH=m, k3-cppi-desc-pool.o is linked to a module and also to vmlinux even though the expected CFLAGS are different between builtins and modules. The build system is complaining about the following: k3-cppi-desc-pool.o is added to multiple modules: icssg-prueth ti-am65-cpsw-nuss Introduce the new module, k3-cppi-desc-pool, to provide the common functions to ti-am65-cpsw-nuss and icssg-prueth. Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://lore.kernel.org/r/20231018064936.3146846-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: stmmac: Remove redundant checking for rx_coalesce_usecsGan Yi Fang
The datatype of rx_coalesce_usecs is u32, always larger or equal to zero. Previous checking does not include value 0, this patch removes the checking to handle the value 0. This change in behaviour making the value of 0 cause an error is not a problem because 0 is out of range of rx_coalesce_usecs. Signed-off-by: Gan Yi Fang <yi.fang.gan@intel.com> Link: https://lore.kernel.org/r/20231018030802.741923-1-yi.fang.gan@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19docs: networking: document multi-RSS contextJakub Kicinski
There seems to be no docs for the concept of multiple RSS contexts and how to configure it. I had to explain it three times recently, the last one being the charm, document it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20231018010758.2382742-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19Merge branch 'rswitch-add-pm-ops'Paolo Abeni
Yoshihiro Shimoda says: ==================== rswitch: Add PM ops This patch is based on the latest net-next.git / next branch. After applied this patch with the following patches, the system can enter/exit Suspend to Idle without any error: https://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git/commit/?h=next&id=aa4c0bbf820ddb9dd8105a403aa12df57b9e5129 https://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy.git/commit/?h=next&id=1a5361189b7acac15b9b086b2300a11b7aa84c06 ==================== Link: https://lore.kernel.org/r/20231017113402.849735-1-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19rswitch: Add PM opsYoshihiro Shimoda
Add PM ops for Suspend to Idle. When the system suspended, the Ethernet Serdes's clock will be stopped. So, this driver needs to re-initialize the Ethernet Serdes by phy_init() in renesas_eth_sw_resume(). Otherwise, timeout happened in phy_power_on(). Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19rswitch: Use unsigned int for port related array indexYoshihiro Shimoda
Array index should not be negative, so modify the condition of rswitch_for_each_enabled_port_continue_reverse() macro, and then use unsigned int instead. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-18Merge tag 'nf-23-10-18' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net First patch, from Phil Sutter, reduces number of audit notifications when userspace requests to re-set stateful objects. This change also comes with a selftest update. Second patch, also from Phil, moves the nftables audit selftest to its own netns to avoid interference with the init netns. Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree" set backend: When set element X has expired, a request to delete element X should fail (like with all other backends). Finally, patch four, also from Pablo, reverts a recent attempt to speed up abort of a large pending update with the "pipapo" set backend. It could cause stray references to remain in the set, which then results in a double-free. * tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: revert do not remove elements if set backend implements .abort netfilter: nft_set_rbtree: .deactivate fails if element has expired selftests: netfilter: Run nft_audit.sh in its own netns netfilter: nf_tables: audit log object reset once per table ==================== Link: https://lore.kernel.org/r/20231018125605.27299-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>