summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-23net: ipa: make ipa_table_hash_support() a real functionAlex Elder
With the exception of ipa_table_hash_support(), nothing defined in "ipa_table.h" requires the full definition of the IPA structure. Change that function to be a "real" function rather than an inline, to avoid requring the IPA structure to be defined. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: remove unneeded FILT_ROUT_HASH_EN definitionsAlex Elder
The FILT_ROUT_HASH_EN register is only used for IPA v4.2. There, routing and filter table hashing are not supported, and so the register must be written to disable the feature. No other version uses this register, so its definition can be removed. If we need to use these some day (for example, explicitly enable the feature) this commit can be reverted. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: call device_init_wakeup() earlierAlex Elder
Currently, enabling wakeup for the IPA device doesn't occur until the setup phase of initialization (in ipa_power_setup()). There is no need to delay doing that, however. We can conveniently do it during the config phase, in ipa_interrupt_config(), where we enable power management wakeup mode for the IPA interrupt. Moving the device_init_wakeup() out of ipa_power_setup() leaves that function empty, so it can just be eliminated. Similarly, rearrange all of the matching inverse calls, disabling device wakeup in ipa_interrupt_deconfig() and removing that function as well. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: only enable the SUSPEND IPA interrupt when neededAlex Elder
Only enable the SUSPEND IPA interrupt type when at least one endpoint has that interrupt enabled. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ipa: maintain bitmap of suspend-enabled endpointsAlex Elder
Keep track of which endpoints have the SUSPEND IPA interrupt enabled in a variable-length bitmap. This will be used in the next patch to allow the SUSPEND interrupt type to be disabled except when needed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23Merge branch 'net-stmmac-fix-mac-capabilities-procedure'Paolo Abeni
Serge Semin says: ==================== net: stmmac: Fix MAC-capabilities procedure The series got born as a result of the discussions around the recent Yanteng' series adding the Loongson LS7A1000, LS2K1000, LS7A2000, LS2K2000 MACs support: Link: https://lore.kernel.org/netdev/fu3f6uoakylnb6eijllakeu5i4okcyqq7sfafhp5efaocbsrwe@w74xe7gb6x7p In particular the Yanteng' patchset needed to implement the Loongson MAC-specific constraints applied to the link speed and link duplex mode. As a result of the discussion with Russel the next preliminary patch was born: Link: https://lore.kernel.org/netdev/df31e8bcf74b3b4ddb7ddf5a1c371390f16a2ad5.1712917541.git.siyanteng@loongson.cn The patch above was a temporal solution utilized by Yanteng for further developments and to move on with the on-going review. This patchset is a refactored version of that single patch with formatting required for the fixes patches. The main part of the series has already been merged in on v1 stage. The leftover is the cleanup patches which rename stmmac_ops::phylink_get_caps() callback to stmmac_ops::update_caps() and move the MAC-capabilities init/re-init to the phylink MAC-capabilities getter. Link: https://lore.kernel.org/netdev/20240412180340.7965-1-fancer.lancer@gmail.com/ Changelog v2: - Add a new patch (Romain): [PATCH net-next v2 1/2] net: stmmac: Rename phylink_get_caps() callback to update_caps() - Resubmit the leftover patches to net-next tree (Paolo). Link: https://lore.kernel.org/netdev/20240417140013.12575-1-fancer.lancer@gmail.com/ Changelog v3: - Just resubmit (Jakub). Signed-off-by: Serge Semin <fancer.lancer@gmail.com> ==================== Link: https://lore.kernel.org/r/20240419090357.5547-1-fancer.lancer@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: stmmac: Move MAC caps init to phylink MAC caps getterSerge Semin
After a set of recent fixes the stmmac_phy_setup() and stmmac_reinit_queues() methods have turned to having some duplicated code. Let's get rid from the duplication by moving the MAC-capabilities initialization to the PHYLINK MAC-capabilities getter. The getter is called during each network device interface open/close cycle. So the MAC-capabilities will be initialized in generic device open procedure and in case of the Tx/Rx queues re-initialization as the original code semantics implies. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Romain Gantois <romain.gantois@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: stmmac: Rename phylink_get_caps() callback to update_caps()Serge Semin
Since recent commits the stmmac_ops::phylink_get_caps() callback has no longer been responsible for the phylink MAC capabilities getting, but merely updates the MAC capabilities in the mac_device_info::link::caps field. Rename the callback to comply with the what the method does now. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Romain Gantois <romain.gantois@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23Merge branch 'enable-rx-hw-timestamp-for-ptp-packets-using-cpts-fifo'Paolo Abeni
Chintan Vankar says: ==================== Enable RX HW timestamp for PTP packets using CPTS FIFO The CPSW offers two mechanisms for communicating packet ingress timestamp information to the host. The first mechanism is via the CPTS Event FIFO which records timestamp when triggered by certain events. One such event is the reception of an Ethernet packet with a specified EtherType field. This is used to capture ingress timestamps for PTP packets. With this mechanism the host must read the timestamp (from the CPTS FIFO) separately from the packet payload which is delivered via DMA. In the second mechanism of timestamping, CPSW driver enables hardware timestamping for all received packets by setting the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS module to timestamp all received packets, followed by passing timestamp via DMA descriptors. This mechanism is responsible for triggering errata i2401: "CPSW: Host Timestamps Cause CPSW Port to Lock up." The errata affects all K3 SoCs. Link to errata for AM64x: https://www.ti.com/lit/er/sprz457h/sprz457h.pdf As a workaround we can use first mechanism to timestamp received packets. ==================== Link: https://lore.kernel.org/r/20240419082626.57225-1-c-vankar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ethernet: ti: am65-cpsw/ethtool: Enable RX HW timestamp only for PTP ↵Chintan Vankar
packets In the current mechanism of timestamping, am65-cpsw-nuss driver enables hardware timestamping for all received packets by setting the TSTAMP_EN bit in CPTS_CONTROL register, which directs the CPTS module to timestamp all received packets, followed by passing timestamp via DMA descriptors. This mechanism causes CPSW Port to Lock up. To prevent port lock up, don't enable rx packet timestamping by setting TSTAMP_EN bit in CPTS_CONTROL register. The workaround for timestamping received packets is to utilize the CPTS Event FIFO that records timestamps corresponding to certain events. The CPTS module is configured to generate timestamps for Multicast Ethernet, UDP/IPv4 and UDP/IPv6 PTP packets. Update supported hwtstamp_rx_filters values for CPSW's timestamping capability. Fixes: b1f66a5bee07 ("net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support") Signed-off-by: Chintan Vankar <c-vankar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: ethernet: ti: am65-cpts: Enable RX HW timestamp for PTP packets using ↵Chintan Vankar
CPTS FIFO Add a new function "am65_cpts_rx_timestamp()" which checks for PTP packets from header and timestamps them. Add another function "am65_cpts_find_rx_ts()" which finds CPTS FIFO Event to get the timestamp of received PTP packet. Signed-off-by: Chintan Vankar <c-vankar@ti.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23ax25: Fix netdev refcount issueDuoming Zhou
The dev_tracker is added to ax25_cb in ax25_bind(). When the ax25 device is detaching, the dev_tracker of ax25_cb should be deallocated in ax25_kill_by_device() instead of the dev_tracker of ax25_dev. The log reported by ref_tracker is shown below: [ 80.884935] ref_tracker: reference already released. [ 80.885150] ref_tracker: allocated in: [ 80.885349] ax25_dev_device_up+0x105/0x540 [ 80.885730] ax25_device_event+0xa4/0x420 [ 80.885730] notifier_call_chain+0xc9/0x1e0 [ 80.885730] __dev_notify_flags+0x138/0x280 [ 80.885730] dev_change_flags+0xd7/0x180 [ 80.885730] dev_ifsioc+0x6a9/0xa30 [ 80.885730] dev_ioctl+0x4d8/0xd90 [ 80.885730] sock_do_ioctl+0x1c2/0x2d0 [ 80.885730] sock_ioctl+0x38b/0x4f0 [ 80.885730] __se_sys_ioctl+0xad/0xf0 [ 80.885730] do_syscall_64+0xc4/0x1b0 [ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 80.885730] ref_tracker: freed in: [ 80.885730] ax25_device_event+0x272/0x420 [ 80.885730] notifier_call_chain+0xc9/0x1e0 [ 80.885730] dev_close_many+0x272/0x370 [ 80.885730] unregister_netdevice_many_notify+0x3b5/0x1180 [ 80.885730] unregister_netdev+0xcf/0x120 [ 80.885730] sixpack_close+0x11f/0x1b0 [ 80.885730] tty_ldisc_kill+0xcb/0x190 [ 80.885730] tty_ldisc_hangup+0x338/0x3d0 [ 80.885730] __tty_hangup+0x504/0x740 [ 80.885730] tty_release+0x46e/0xd80 [ 80.885730] __fput+0x37f/0x770 [ 80.885730] __x64_sys_close+0x7b/0xb0 [ 80.885730] do_syscall_64+0xc4/0x1b0 [ 80.885730] entry_SYSCALL_64_after_hwframe+0x67/0x6f [ 80.893739] ------------[ cut here ]------------ [ 80.894030] WARNING: CPU: 2 PID: 140 at lib/ref_tracker.c:255 ref_tracker_free+0x47b/0x6b0 [ 80.894297] Modules linked in: [ 80.894929] CPU: 2 PID: 140 Comm: ax25_conn_rel_6 Not tainted 6.9.0-rc4-g8cd26fd90c1a #11 [ 80.895190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qem4 [ 80.895514] RIP: 0010:ref_tracker_free+0x47b/0x6b0 [ 80.895808] Code: 83 c5 18 4c 89 eb 48 c1 eb 03 8a 04 13 84 c0 0f 85 df 01 00 00 41 83 7d 00 00 75 4b 4c 89 ff 9 [ 80.896171] RSP: 0018:ffff888009edf8c0 EFLAGS: 00000286 [ 80.896339] RAX: 1ffff1100141ac00 RBX: 1ffff1100149463b RCX: dffffc0000000000 [ 80.896502] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff88800a0d6518 [ 80.896925] RBP: ffff888009edf9b0 R08: ffff88806d3288d3 R09: 1ffff1100da6511a [ 80.897212] R10: dffffc0000000000 R11: ffffed100da6511b R12: ffff88800a4a31d4 [ 80.897859] R13: ffff88800a4a31d8 R14: dffffc0000000000 R15: ffff88800a0d6518 [ 80.898279] FS: 00007fd88b7fe700(0000) GS:ffff88806d300000(0000) knlGS:0000000000000000 [ 80.899436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 80.900181] CR2: 00007fd88c001d48 CR3: 000000000993e000 CR4: 00000000000006f0 ... [ 80.935774] ref_tracker: sp%d@000000000bb9df3d has 1/1 users at [ 80.935774] ax25_bind+0x424/0x4e0 [ 80.935774] __sys_bind+0x1d9/0x270 [ 80.935774] __x64_sys_bind+0x75/0x80 [ 80.935774] do_syscall_64+0xc4/0x1b0 [ 80.935774] entry_SYSCALL_64_after_hwframe+0x67/0x6f Change ax25_dev->dev_tracker to the dev_tracker of ax25_cb in order to mitigate the bug. Fixes: feef318c855a ("ax25: fix UAF bugs of net_device caused by rebinding operation") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20240419020456.29826-1-duoming@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23Merge branch ↵Paolo Abeni
'read-phy-address-of-switch-from-device-tree-on-mt7530-dsa-subdriver' Arınç ÜNAL says: ==================== Read PHY address of switch from device tree on MT7530 DSA subdriver This patch series makes the driver read the PHY address the switch listens on from the device tree which, in result, brings support for MT7530 switches listening on a different PHY address than 31. And the patch series simplifies the core operations. Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> ==================== Link: https://lore.kernel.org/r/20240418-b4-for-netnext-mt7530-phy-addr-from-dt-and-simplify-core-ops-v3-0-3b5fb249b004@arinc9.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: dsa: mt7530: simplify core operationsArınç ÜNAL
The core_rmw() function calls core_read_mmd_indirect() to read the requested register, and then calls core_write_mmd_indirect() to write the requested value to the register. Because Clause 22 is used to access Clause 45 registers, some operations on core_write_mmd_indirect() are unnecessarily run. Get rid of core_read_mmd_indirect() and core_write_mmd_indirect(), and run only the necessary operations on core_write() and core_rmw(). Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-23net: dsa: mt7530-mdio: read PHY address of switch from device treeArınç ÜNAL
Read the PHY address the switch listens on from the reg property of the switch node on the device tree. This change brings support for MT7530 switches on boards with such bootstrapping configuration where the switch listens on a different PHY address than the hardcoded PHY address on the driver, 31. As described on the "MT7621 Programming Guide v0.4" document, the MT7530 switch and its PHYs can be configured to listen on the range of 7-12, 15-20, 23-28, and 31 and 0-4 PHY addresses. There are operations where the switch PHY registers are used. For the PHY address of the control PHY, transform the MT753X_CTRL_PHY_ADDR constant into a macro and use it. The PHY address for the control PHY is 0 when the switch listens on 31. In any other case, it is one greater than the PHY address the switch listens on. Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-04-22net: ethernet: mtk_eth_soc: flower: validate control flagsAsbjørn Sloth Tønnesen
This driver currently doesn't support any control flags. Use flow_rule_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418161821.189263-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22dpaa2-switch: flower: validate control flagsAsbjørn Sloth Tønnesen
This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20240418161802.189247-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22cxgb4: flower: validate control flagsAsbjørn Sloth Tønnesen
This driver currently doesn't support any control flags. Use flow_rule_match_has_control_flags() to check for control flags, such as can be set through `tc flower ... ip_flags frag`. In case any control flags are masked, flow_rule_match_has_control_flags() sets a NL extended error message, and we return -EOPNOTSUPP. Only compile-tested. Only compile tested, no hardware available. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240418161751.189226-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: openvswitch: Check vport netdev nameJun Gu
Ensure that the provided netdev name is not one of its aliases to prevent unnecessary creation and destruction of the vport by ovs-vswitchd. Signed-off-by: Jun Gu <jun.gu@easystack.cn> Acked-by: Eelco Chaudron <echaudro@redhat.com> Link: https://lore.kernel.org/r/20240419061425.132723-1-jun.gu@easystack.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge branch 'netlink-add-nftables-spec-w-multi-messages'Jakub Kicinski
Donald Hunter says: ==================== netlink: Add nftables spec w/ multi messages This series adds a ynl spec for nftables and extends ynl with a --multi command line option that makes it possible to send transactional batches for nftables. This series includes a patch for nfnetlink which adds ACK processing for batch begin/end messages. If you'd prefer that to be sent separately to nf-next then I can do so, but I included it here so that it gets seen in context. An example of usage is: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi batch-begin '{"res-id": 10}' \ --multi newtable '{"name": "test", "nfgen-family": 1}' \ --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --multi batch-end '{"res-id": 10}' [None, None, None, None] It can also be used for bundling get requests: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi gettable '{"name": "test", "nfgen-family": 1}' \ --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --output-json [{"name": "test", "use": 1, "handle": 1, "flags": [], "nfgen-family": 1, "version": 0, "res-id": 2}, {"table": "test", "name": "chain", "handle": 1, "use": 0, "nfgen-family": 1, "version": 0, "res-id": 2}] There are 2 issues that may be worth resolving: - ynl reports errors by raising an NlError exception so only the first error gets reported. This could be changed to add errors to the list of responses so that multiple errors could be reported. - If any message does not get a response (e.g. batch-begin w/o patch 2) then ynl waits indefinitely. A recv timeout could be added which would allow ynl to terminate. ==================== Link: https://lore.kernel.org/r/20240418104737.77914-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22netfilter: nfnetlink: Handle ACK flags for batch messagesDonald Hunter
The NLM_F_ACK flag is ignored for nfnetlink batch begin and end messages. This is a problem for ynl which wants to receive an ack for every message it sends, not just the commands in between the begin/end messages. Add processing for ACKs for begin/end messages and provide responses when requested. I have checked that iproute2, pyroute2 and systemd are unaffected by this change since none of them use NLM_F_ACK for batch begin/end. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-5-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tools/net/ynl: Add multi message support to ynlDonald Hunter
Add a "--multi <do-op> <json>" command line to ynl that makes it possible to add several operations to a single netlink request payload. The --multi command line option is repeated for each operation. This is used by the nftables family for transaction batches. For example: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi batch-begin '{"res-id": 10}' \ --multi newtable '{"name": "test", "nfgen-family": 1}' \ --multi newchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --multi batch-end '{"res-id": 10}' [None, None, None, None] It can also be used for bundling get requests: ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/nftables.yaml \ --multi gettable '{"name": "test", "nfgen-family": 1}' \ --multi getchain '{"name": "chain", "table": "test", "nfgen-family": 1}' \ --output-json [{"name": "test", "use": 1, "handle": 1, "flags": [], "nfgen-family": 1, "version": 0, "res-id": 2}, {"table": "test", "name": "chain", "handle": 1, "use": 0, "nfgen-family": 1, "version": 0, "res-id": 2}] Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-4-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tools/net/ynl: Fix extack decoding for directional opsDonald Hunter
NetlinkProtocol.decode() was looking up ops by response value which breaks when it is used for extack decoding of directional ops. Instead, pass the op to decode(). Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-3-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22doc/netlink/specs: Add draft nftables specDonald Hunter
Add a spec for nftables that has nearly complete coverage of the ops, but limited coverage of rule types and subexpressions. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20240418104737.77914-2-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge branch 'for-uring-ubufops' into HEADJakub Kicinski
Pavel Begunkov says: ==================== implement io_uring notification (ubuf_info) stacking (net part) To have per request buffer notifications each zerocopy io_uring send request allocates a new ubuf_info. However, as an skb can carry only one uarg, it may force the stack to create many small skbs hurting performance in many ways. The patchset implements notification, i.e. an io_uring's ubuf_info extension, stacking. It attempts to link ubuf_info's into a list, allowing to have multiple of them per skb. liburing/examples/send-zerocopy shows up 6 times performance improvement for TCP with 4KB bytes per send, and levels it with MSG_ZEROCOPY. Without the patchset it requires much larger sends to utilise all potential. bytes | before | after (Kqps) 1200 | 195 | 1023 4000 | 193 | 1386 8000 | 154 | 1058 ==================== Link: https://lore.kernel.org/all/cover.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: "Five ksmbd server fixes, most also for stable: - rename fix - two fixes for potential out of bounds - fix for connections from MacOS (padding in close response) - fix for when to enable persistent handles" * tag '6.9-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: add continuous availability share parameter ksmbd: common: use struct_group_attr instead of struct_group for network_open_info ksmbd: clear RENAME_NOREPLACE before calling vfs_rename ksmbd: validate request buffer size in smb2_allocate_rsp_buf() ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
2024-04-22net: add callback for setting a ubuf_info to skbPavel Begunkov
At the moment an skb can only have one ubuf_info associated with it, which might be a performance problem for zerocopy sends in cases like TCP via io_uring. Add a callback for assigning ubuf_info to skb, this way we will implement smarter assignment later like linking ubuf_info together. Note, it's an optional callback, which should be compatible with skb_zcopy_set(), that's because the net stack might potentially decide to clone an skb and take another reference to ubuf_info whenever it wishes. Also, a correct implementation should always be able to bind to an skb without prior ubuf_info, otherwise we could end up in a situation when the send would not be able to progress. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/b7918aadffeb787c84c9e72e34c729dc04f3a45d.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: extend ubuf_info callback to ops structurePavel Begunkov
We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring notification management introduced in next patches, it can be used to generalise msg_zerocopy_put_abort() and also store ->sg_from_iter, which is currently passed in struct msghdr. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/all/a62015541de49c0e2a8a0377a1d5d0a5aeb07016.1713369317.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge branch 'tcp-avoid-sending-too-small-packets'Jakub Kicinski
Eric Dumazet says: ==================== tcp: avoid sending too small packets tcp_sendmsg() cooks 'large' skbs, that are later split if needed from tcp_write_xmit(). After a split, the leftover skb size is smaller than the optimal size, and this causes a performance drop. In this series, tcp_grow_skb() helper is added to shift payload from the second skb in the write queue to the first skb to always send optimal sized skbs. This increases TSO efficiency, and decreases number of ACK packets. ==================== Link: https://lore.kernel.org/r/20240418214600.1291486-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tcp: try to send bigger TSO packetsEric Dumazet
While investigating TCP performance, I found that TCP would sometimes send big skbs followed by a single MSS skb, in a 'locked' pattern. For instance, BIG TCP is enabled, MSS is set to have 4096 bytes of payload per segment. gso_max_size is set to 181000. This means that an optimal TCP packet size should contain 44 * 4096 = 180224 bytes of payload, However, I was seeing packets sizes interleaved in this pattern: 172032, 8192, 172032, 8192, 172032, 8192, <repeat> tcp_tso_should_defer() heuristic is defeated, because after a split of a packet in write queue for whatever reason (this might be a too small CWND or a small enough pacing_rate), the leftover packet in the queue is smaller than the optimal size. It is time to try to make 'leftover packets' bigger so that tcp_tso_should_defer() can give its full potential. After this patch, we can see the following output: 14:13:34.009273 IP6 sender > receiver: Flags [P.], seq 4048380:4098360, ack 1, win 256, options [nop,nop,TS val 3425678144 ecr 1561784500], length 49980 14:13:34.010272 IP6 sender > receiver: Flags [P.], seq 4098360:4148340, ack 1, win 256, options [nop,nop,TS val 3425678145 ecr 1561784501], length 49980 14:13:34.011271 IP6 sender > receiver: Flags [P.], seq 4148340:4198320, ack 1, win 256, options [nop,nop,TS val 3425678146 ecr 1561784502], length 49980 14:13:34.012271 IP6 sender > receiver: Flags [P.], seq 4198320:4248300, ack 1, win 256, options [nop,nop,TS val 3425678147 ecr 1561784503], length 49980 14:13:34.013272 IP6 sender > receiver: Flags [P.], seq 4248300:4298280, ack 1, win 256, options [nop,nop,TS val 3425678148 ecr 1561784504], length 49980 14:13:34.014271 IP6 sender > receiver: Flags [P.], seq 4298280:4348260, ack 1, win 256, options [nop,nop,TS val 3425678149 ecr 1561784505], length 49980 14:13:34.015272 IP6 sender > receiver: Flags [P.], seq 4348260:4398240, ack 1, win 256, options [nop,nop,TS val 3425678150 ecr 1561784506], length 49980 14:13:34.016270 IP6 sender > receiver: Flags [P.], seq 4398240:4448220, ack 1, win 256, options [nop,nop,TS val 3425678151 ecr 1561784507], length 49980 14:13:34.017269 IP6 sender > receiver: Flags [P.], seq 4448220:4498200, ack 1, win 256, options [nop,nop,TS val 3425678152 ecr 1561784508], length 49980 14:13:34.018276 IP6 sender > receiver: Flags [P.], seq 4498200:4548180, ack 1, win 256, options [nop,nop,TS val 3425678153 ecr 1561784509], length 49980 14:13:34.019259 IP6 sender > receiver: Flags [P.], seq 4548180:4598160, ack 1, win 256, options [nop,nop,TS val 3425678154 ecr 1561784510], length 49980 With 200 concurrent flows on a 100Gbit NIC, we can see a reduction of TSO packets (and ACK packets) of about 30 %. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tcp: call tcp_set_skb_tso_segs() from tcp_write_xmit()Eric Dumazet
tcp_write_xmit() calls tcp_init_tso_segs() to set gso_size and gso_segs on the packet. tcp_init_tso_segs() requires the stack to maintain an up to date tcp_skb_pcount(), and this makes sense for packets in rtx queue. Not so much for packets still in the write queue. In the following patch, we don't want to deal with tcp_skb_pcount() when moving payload from 2nd skb to 1st skb in the write queue. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22tcp: remove dubious FIN exception from tcp_cwnd_test()Eric Dumazet
tcp_cwnd_test() has a special handing for the last packet in the write queue if it is smaller than one MSS and has the FIN flag. This is in violation of TCP RFC, and seems quite dubious. This packet can be sent only if the current CWND is bigger than the number of packets in flight. Making tcp_cwnd_test() result independent of the first skb in the write queue is needed for the last patch of the series. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240418214600.1291486-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge branch 'mlx5e-per-queue-coalescing'Jakub Kicinski
Tariq Toukan says: ==================== mlx5e per-queue coalescing This patchset adds ethtool per-queue coalescing support for the mlx5e driver. The series introduce some changes needed as preparations for the final patch which adds the support and implements the callbacks. Main changes: - DIM code movements into its own header file. - Switch to dynamic allocation of the DIM struct in the RQs/SQs. - Allow coalescing config change without channels reset when possible. ==================== Link: https://lore.kernel.org/r/20240419080445.417574-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Implement ethtool callbacks for supporting per-queue coalescingRahul Rameshbabu
Use mlx5 on-the-fly coalescing configuration support to enable individual channel configuration. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Support updating coalescing configuration without resetting channelsRahul Rameshbabu
When CQE mode or DIM state is changed, gracefully reconfigure channels to handle new configuration. Previously, would create new channels that would reflect the changes rather than update the original channels. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Dynamically allocate DIM structure for SQs/RQsRahul Rameshbabu
Make it possible for the DIM structure to be torn down while an SQ or RQ is still active. Changing the CQ period mode is an example where the previous sampling done with the DIM structure would need to be invalidated. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Use DIM constants for CQ period mode parameterRahul Rameshbabu
Use core DIM CQ period mode enum values for the CQ parameter for the period mode. Translate the value to the specific mlx5 device constant for the selected period mode when creating a CQ. Avoid needing to translate mlx5 device constants to DIM constants for core DIM functionality. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net/mlx5e: Move DIM function declarations to en/dim.hRahul Rameshbabu
Create a header specifically for DIM-related declarations. Move existing DIM-specific functionality from en.h. Future DIM-related functionality will be declared in en/dim.h in subsequent patches. Co-developed-by: Nabil S. Alramli <dev@nalramli.com> Signed-off-by: Nabil S. Alramli <dev@nalramli.com> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240419080445.417574-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge branch 'net-dsa-vsc73xx-convert-to-phylink-and-do-some-cleanup'Jakub Kicinski
Pawel Dembicki says: ==================== net: dsa: vsc73xx: convert to PHYLINK and do some cleanup This patch series is a result of splitting a larger patch series [0], where some parts needed to be refactored. The first patch switches from a poll loop to read_poll_timeout. The second patch is a simple conversion to phylink because adjust_link won't work anymore. The third patch is preparation for future use. Using the "phy_interface_mode_is_rgmii" macro allows for the proper recognition of all RGMII modes. Patches 4-5 involve some cleanup: The fourth patch introduces a definition with the maximum number of ports to avoid using magic numbers. The next one fills in documentation. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=841034&state=%2A&archive=both ==================== Link: https://lore.kernel.org/r/20240417205048.3542839-1-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: dsa: vsc73xx: add structure descriptionsPawel Dembicki
This commit adds updates to the documentation describing the structures used in vsc73xx. This will help prevent kdoc-related issues in the future. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://lore.kernel.org/r/20240417205048.3542839-6-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: dsa: vsc73xx: Add define for max num of portsPawel Dembicki
This patch introduces a new define: VSC73XX_MAX_NUM_PORTS, which can be used in the future instead of a hardcoded value. Currently, the only hardcoded value is vsc->ds->num_ports. It is being replaced with the new define. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240417205048.3542839-5-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: dsa: vsc73xx: use macros for rgmii recognitionPawel Dembicki
It's preparation for future use. At this moment, the RGMII port is used only for a connection to the MAC interface, but in the future, someone could connect a PHY to it. Using the "phy_interface_mode_is_rgmii" macro allows for the proper recognition of all RGMII modes. Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240417205048.3542839-4-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: dsa: vsc73xx: convert to PHYLINKPawel Dembicki
This patch replaces the adjust_link api with the phylink apis that provide equivalent functionality. The remaining functionality from the adjust_link is now covered in the mac_link_* and mac_config from phylink_mac_ops structure. Removes: .adjust_link Adds phylink_mac_ops structure: .mac_config .mac_link_up .mac_link_down Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://lore.kernel.org/r/20240417205048.3542839-3-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22net: dsa: vsc73xx: use read_poll_timeout instead delay loopPawel Dembicki
Switch the delay loop during the Arbiter empty check from vsc73xx_adjust_link() to use read_poll_timeout(). Functionally, one msleep() call is eliminated at the end of the loop in the timeout case. As Russell King suggested: "This [change] avoids the issue that on the last iteration, the code reads the register, tests it, finds the condition that's being waiting for is false, _then_ waits and end up printing the error message - that last wait is rather useless, and as the arbiter state isn't checked after waiting, it could be that we had success during the last wait." Suggested-by: Russell King <linux@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Link: https://lore.kernel.org/r/20240417205048.3542839-2-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22NFC: trf7970a: disable all regulators on removalPaul Geurts
During module probe, regulator 'vin' and 'vdd-io' are used and enabled, but the vdd-io regulator overwrites the 'vin' regulator pointer. During remove, only the vdd-io is disabled, as the vin regulator pointer is not available anymore. When regulator_put() is called during resource cleanup a kernel warning is given, as the regulator is still enabled. Store the two regulators in separate pointers and disable both the regulators on module remove. Fixes: 49d22c70aaf0 ("NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage") Signed-off-by: Paul Geurts <paul_geurts@live.nl> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/DB7PR09MB26847A4EBF88D9EDFEB1DA0F950E2@DB7PR09MB2684.eurprd09.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22MAINTAINERS: eth: mark IBM eHEA as an OrphanDavid Christensen
Current maintainer Douglas Miller has left IBM and no replacement has been assigned for the driver. The eHEA hardware was last used on IBM POWER7 systems, the last of which reached end-of-support at the end of 2020. Signed-off-by: David Christensen <drc@linux.ibm.com> Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20240418195517.528577-1-drc@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22Merge tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Nothing too crazy in this one, and it looks like (fingers crossed) the recovery and repair issues are settling down - although there's going to be a long tail there, as we've still yet to really ramp up on error injection or syzbot. - fix a few more deadlocks in recovery - fix u32/u64 issues in mi_btree_bitmap - btree key cache shrinker now actually frees, with more instrumentation coming so we can verify that it's working correctly more easily in the future" * tag 'bcachefs-2024-04-22' of https://evilpiepirate.org/git/bcachefs: bcachefs: If we run merges at a lower watermark, they must be nonblocking bcachefs: Fix inode early destruction path bcachefs: Fix deadlock in journal write path bcachefs: Tweak btree key cache shrinker so it actually frees bcachefs: bkey_cached.btree_trans_barrier_seq needs to be a ulong bcachefs: Fix missing call to bch2_fs_allocator_background_exit() bcachefs: Check for journal entries overruning end of sb clean section bcachefs: Fix bio alloc in check_extent_checksum() bcachefs: fix leak in bch2_gc_write_reflink_key bcachefs: KEY_TYPE_error is allowed for reflink bcachefs: Fix bch2_dev_btree_bitmap_marked_sectors() shift bcachefs: make sure to release last journal pin in replay bcachefs: node scan: ignore multiple nodes with same seq if interior bcachefs: Fix format specifier in validate_bset_keys() bcachefs: Fix null ptr deref in twf from BCH_IOCTL_FSCK_OFFLINE
2024-04-22net: dsa: mv88e6xx: fix supported_interfaces setup in ↵Matthias Schiffer
mv88e6250_phylink_get_caps() With the recent PHYLINK changes requiring supported_interfaces to be set, MV88E6250 family switches like the 88E6020 fail to probe - cmode is never initialized on these devices, so mv88e6250_phylink_get_caps() does not set any supported_interfaces flags. Instead of a cmode, on 88E6250 we have a read-only port mode value that encodes similar information. There is no reason to bother mapping port mode to the cmodes of other switch models; instead we introduce a mv88e6250_setup_supported_interfaces() that is called directly from mv88e6250_phylink_get_caps(). Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Link: https://lore.kernel.org/r/20240417103737.166651-1-matthias.schiffer@ew.tq-group.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22ice: Document tx_scheduling_layers parameterMichal Wilczynski
New driver specific parameter 'tx_scheduling_layers' was introduced. Describe parameter in the documentation. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-22ice: Add tx_scheduling_layers devlink paramLukasz Czapnik
It was observed that Tx performance was inconsistent across all queues and/or VSIs and that it was directly connected to existing 9-layer topology of the Tx scheduler. Introduce new private devlink param - tx_scheduling_layers. This parameter gives user flexibility to choose the 5-layer transmit scheduler topology which helps to smooth out the transmit performance. Allowed parameter values are 5 and 9. Example usage: Show: devlink dev param show pci/0000:4b:00.0 name tx_scheduling_layers pci/0000:4b:00.0: name tx_scheduling_layers type driver-specific values: cmode permanent value 9 Set: devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 5 cmode permanent devlink dev param set pci/0000:4b:00.0 name tx_scheduling_layers value 9 cmode permanent Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>