summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-02-27ipv6: raw: remove useless input parameter in rawv6_errZhengchao Shao
The input parameter 'opt' in rawv6_err() is not used. Therefore, remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240224084121.2479603-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27netlink: use kvmalloc() in netlink_alloc_large_skb()Eric Dumazet
This is a followup of commit 234ec0b6034b ("netlink: fix potential sleeping issue in mqueue_flush_file"), because vfree_atomic() overhead is unfortunate for medium sized allocations. 1) If the allocation is smaller than PAGE_SIZE, do not bother with vmalloc() at all. Some arches have 64KB PAGE_SIZE, while NLMSG_GOODSIZE is smaller than 8KB. 2) Use kvmalloc(), which might allocate one high order page instead of vmalloc if memory is not too fragmented. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20240224090630.605917-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27bnxt_en: fix accessing vnic_info before allocating itAlexander Lobakin
bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration block, but allocates it much later. As a result, the following crash happens on my setup: BUG: kernel NULL pointer dereference, address: 0000000000000090 fbcon: Taking over console #PF: supervisor write access in kernel mode #PF: error_code (0x0002) - not-present page PGD 12f382067 P4D 0 Oops: 8002 [#1] PREEMPT SMP NOPTI CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ #49 Hardware name: Intel Corporation M50CYP2SBSTD/M58CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023 RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 Oc 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 Od 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74 RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000 RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540 FS: 000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CRO: 0000000080050033 CR2: 0000000000000090 CR3: 0000000109efc00Z CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DRZ: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x68/0xb0 ? page_fault_oops+0x3a6/0x400 ? exc_page_fault+0x7a/0x1b0 ? asm_exc_page_fault+0x26/8x30 ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] ? bnxt_alloc_mem+0x1389/8x1918 [bnxt_en] _bnxt_open_nic+0x198/0xa50 [bnxt_en] ? bnxt_hurm_if_change+0x287/0x3d0 [bnxt_en] bnxt_open+0xeb/0x1b0 [bnxt_en] _dev_open+0x12e/0x1f0 _dev_change_flags+0xb0/0x200 dev_change_flags+0x25/0x60 do_setlink+0x463/0x1260 ? sock_def_readable+0x14/0xc0 ? rtnl_getlink+0x4b9/0x590 ? _nla_validate_parse+0x91/0xfa0 rtnl_newlink+0xbac/0xe40 <...> Don't create a variable and dereference the first array member directly since it's used only once in the code. Fixes: ef4ee64e9990 ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index") Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240226144911.1297336-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27selftests: netdevsim: be less selective for FW for the devlink testJakub Kicinski
Commit 6151ff9c7521 ("selftests: netdevsim: use suitable existing dummy file for flash test") introduced a nice trick to the devlink flashing test. Instead of user having to create a file under /lib/firmware we just pick the first one that already exists. Sadly, in AWS Linux there are no files directly under /lib/firmware, only in subdirectories. Don't limit the search to -maxdepth 1. We can use the %P print format to get the correct path for files inside subdirectories: $ find /lib/firmware -type f -printf '%P\n' | head -1 intel-ucode/06-1a-05 The full path is /lib/firmware/intel-ucode/06-1a-05 This works in GNU find, busybox doesn't have printf at all, so we're not making it worse. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240224050658.930272-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: stmmac: mmc_core: Drop interrupt registers from statsJesper Nilsson
The MMC IPC interrupt status and interrupt mask registers are of little use as Ethernet statistics, but incrementing counters based on the current interrupt and interrupt mask registers makes them actively misleading. For example, if the interrupt mask is set to 0x08420842, the current code will increment by that amount each iteration, leading to the following sequence of nonsense: mmc_rx_ipc_intr_mask: 969816526 mmc_rx_ipc_intr_mask: 1108361744 These registers have been included in the Ethernet statistics since the first version of MMC back in 2011 (commit 1c901a46d57). That commit also mentions the MMC interrupts as "something to add later (if actually useful)". If the registers are actually useful, they should probably be part of the Ethernet register dump instead of statistics, but for now, drop the counters for mmc_rx_ipc_intr and mmc_rx_ipc_intr_mask completely. Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223-stmmac_stats-v3-1-5d483c2a071a@axis.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: ethernet: adi: adin1110: Reduce the MDIO_TRDONE poll intervalCiprian Regus
In order to do a clause 22 access to the PHY registers of the ADIN1110, we have to write the MDIO frame to the ADIN1110_MDIOACC register, and then poll the MDIO_TRDONE bit (for a 1) in the same register. The device will set this bit to 1 once the internal MDIO transaction is done. In practice, this bit takes ~50 - 60 us to be set. The first attempt to poll the bit is right after the ADIN1110_MDIOACC register is written, so it will always be read as 0. The next check will only be done after 10 ms, which will result in the MDIO transactions taking a long time to complete. Reduce this polling interval to 100 us. Since this interval is short enough, switch the poll function to readx_poll_timeout_atomic() instead. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: Ciprian Regus <ciprian.regus@analog.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240223162129.154114-1-ciprian.regus@analog.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27Merge branch 'net-ipa-don-t-abort-system-suspend'Paolo Abeni
Alex Elder says: ==================== net: ipa: don't abort system suspend Currently the IPA code aborts an in-progress system suspend if an IPA interrupt arrives before the suspend completes. There is no need to do that though, because the IPA driver handles a forced suspend correctly, quiescing any hardware activity before finally turning off clocks and interconnects. This series drops the call to pm_wakeup_dev_event() if an IPA SUSPEND interrupt arrives during system suspend. Doing this makes the two remaining IPA power flags unnecessary, and allows some additional code to be cleaned up--and best of all, removed. The result is much simpler (and I'm really glad not to be using these flags any more). The first patch implements the main change. The second and third remove the flags that were used to determine whether to call pm_wakeup_dev_event(). The next two remove a function that becomes a trivial wrapper, and the last one just avoids writing a register unnecessarily. Note that the first two patches will have checkpatch warnings, because checkpatch disagrees with my compiler on what to do when a block contains only a semicolon. I went with what the compiler recommends. clang says: warning: suggest braces around empty body checkpatch: WARNING: braces {} are not necessary for single statement blocks ==================== Link: https://lore.kernel.org/r/20240223133930.582041-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: ipa: don't bother zeroing an already zero registerAlex Elder
In ipa_interrupt_suspend_clear_all(), if the SUSPEND_INFO register read contains no set bits, there's no interrupt condition to clear. Skip the write to the clear register in that case. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: ipa: kill ipa_power_suspend_handler()Alex Elder
Now that ipa_power_suspend_handler() is a trivial wrapper around ipa_interrupt_suspend_clear_all(), we can open-code it in the one place it's used, and get rid of the function. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: ipa: move ipa_interrupt_suspend_clear_all() upAlex Elder
The next patch makes ipa_interrupt_suspend_clear_all() static, calling it only within "ipa_interrupt.c". Move its definition higher in the file so no declaration is needed. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: ipa: kill the IPA_POWER_FLAG_RESUMED flagAlex Elder
The IPA_POWER_FLAG_RESUMED was originally used to avoid calling pm_wakeup_dev_event() more than once when handling a SUSPEND interrupt. This call is no longer made, so there' no need for the flag, so get rid of it. That leaves no more IPA power flags usefully defined, so just get rid of the bitmap in the IPA power structure and the definition of the ipa_power_flag enumerated type. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: ipa: kill IPA_POWER_FLAG_SYSTEMAlex Elder
The SYSTEM IPA power flag is set, cleared, and tested. But nothing happens based on its value when tested, so it serves no purpose. Get rid of this flag. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: ipa: don't bother aborting system resumeAlex Elder
The IPA interrupt can fire if there is data to be delivered to a GSI channel that is suspended. This condition occurs in three scenarios. First, runtime power management automatically suspends the IPA hardware after half a second of inactivity. This has nothing to do with system suspend, so a SYSTEM IPA power flag is used to avoid calling pm_wakeup_dev_event() when runtime suspended. Second, if the system is suspended, the receipt of an IPA interrupt should trigger a system resume. Configuring the IPA interrupt for wakeup accomplishes this. Finally, if system suspend is underway and the IPA interrupt fires, we currently call pm_wakeup_dev_event() to abort the system suspend. The IPA driver correctly handles quiescing the hardware before suspending it, so there's really no need to abort a suspend in progress in the third case. We can simply quiesce and suspend things, and be done. Incoming data can still wake the system after it's suspended. The IPA interrupt has wakeup mode enabled, so if it fires *after* we've suspended, it will trigger a wakeup (if not disabled via sysfs). Stop calling pm_wakeup_dev_event() to abort a system suspend in progress in ipa_power_suspend_handler(). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-27net: phy: simplify genphy_c45_ethtool_set_eeeHeiner Kallweit
Simplify the function, no functional change intended. - Remove not needed variable unsupp, I think code is even better readable now. - Move setting phydev->eee_enabled out of the if clause - Simplify return value handling Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/442277c7-7431-4542-80b5-1d3d691714d7@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-26Merge branch 'mptcp-various-small-improvements'Jakub Kicinski
Matthieu Baerts says: ==================== mptcp: various small improvements This series brings various small improvements to MPTCP and its selftests: Patch 1 prints an error if there are duplicated subtests names. It is important to have unique (sub)tests names in TAP, because some CI environments drop (sub)tests with duplicated names. Patch 2 is a preparation for patches 3 and 4, which check the protocol in tcp_sk() and mptcp_sk() with DEBUG_NET, only in code from net/mptcp/. We recently had the case where an MPTCP socket was wrongly treated as a TCP one, and fuzzers and static checkers never spot the issue. This would prevent such issues in the future. Patches 5 to 7 are some cleanup for the MPTCP selftests. These patches are not supposed to change the behaviour. Patch 8 sets the poll timeout in diag selftest to the same value as the one used in the other selftests. ==================== Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-0-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: diag: change timeout_poll to 30Geliang Tang
Even if it is set to 100ms from the beginning with commit df62f2ec3df6 ("selftests/mptcp: add diag interface tests"), there is no reason not to have it to 30ms like all the other tests. "diag.sh" is not supposed to be slower than the other ones. To maintain consistency with other scripts, this patch changes it to 30. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: join: change capture/checksum as boolGeliang Tang
To maintain consistency with other scripts, this patch changes vars 'capture' and 'checksum' as bool vars in mptcp_join. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-7-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: simult flows: define missing varsGeliang Tang
The variables 'large', 'small', 'sout', 'cout', 'capout' and 'size' are used in multiple functions, so they should be clearly defined as global variables at the top of the file. This patch redefines them at the beginning of simult_flows.sh. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-6-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: netlink: drop duplicate var retGeliang Tang
The variable 'ret' are defined twice in pm_netlink.sh. This patch drops this duplicate one that has been defined from the beginning, with commit eedbc685321b ("selftests: add PM netlink functional tests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-5-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: check the protocol in mptcp_sk() with DEBUG_NETMatthieu Baerts (NGI0)
Fuzzers and static checkers might not detect when mptcp_sk() is used with a non mptcp_sock structure. This is similar to the parent commit, where it is easy to use mptcp_sk() with a TCP sock, e.g. with a subflow sk. So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell kernel devs when a non-MPTCP socket is being used as an MPTCP one. 'mptcp_sk()' macro is then defined differently: with an extra WARN to complain when an unexpected socket is being used. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-4-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: check the protocol in tcp_sk() with DEBUG_NETMatthieu Baerts (NGI0)
Fuzzers and static checkers might not detect when tcp_sk() is used with a non tcp_sock structure. This kind of mistake already happened a few times with MPTCP: when wrongly using TCP-specific helpers with mptcp_sock pointers. On the other hand, there are many 'tcp_xxx()' helpers that are taking a 'struct sock' pointer as arguments, and some of them are only looking at fields from 'struct sock', and nothing from 'struct tcp_sock'. It is then tempting to use them with a 'struct mptcp_sock'. So a new simple check is done when CONFIG_DEBUG_NET is enabled to tell kernel devs when a non-TCP socket is being used as a TCP one. 'tcp_sk()' macro is then re-defined to add a WARN when an unexpected socket is being used. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-3-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26mptcp: token kunit: set protocolMatthieu Baerts (NGI0)
As it would be done when initiating an MPTCP sock. This is not strictly needed for this test, but it will be when a later patch will check if the right protocol is being used when calling mptcp_sk(). Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-2-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26selftests: mptcp: lib: catch duplicated subtest entriesMatthieu Baerts (NGI0)
It is important to have a unique (sub)test name in TAP, because some CI environments drop tests with duplicated name. When adding a new subtest entry, an error message is printed in case of duplicated entries. If there were duplicated entries and if all features were expected to work, the script exits with an error at the end, after having printed all subtests in the TAP format. Thanks to that, the MPTCP CI will catch such issues early. Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-1-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26ipv6: anycast: complete RCU handling of struct ifacaddr6Eric Dumazet
struct ifacaddr6 are already freed after RCU grace period. Add __rcu qualifier to aca_next pointer, and idev->ac_list Add relevant rcu_assign_pointer() and dereference accessors. ipv6_chk_acast_dev() no longer needs to acquire idev->lock. /proc/net/anycast6 is now purely RCU protected, it no longer acquires idev->lock. Similarly in6_dump_addrs() can use RCU protection to iterate through anycast addresses. It was relying on a mixture of RCU and RTNL but next patches will get rid of RTNL there. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240223201054.220534-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26dt-bindings: net: cdns,macb: add sam9x7 ethernet interfaceVarshini Rajendran
Add documentation for sam9x7 ethernet interface. Signed-off-by: Varshini Rajendran <varshini.rajendran@microchip.com> Acked-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20240223172228.671553-1-varshini.rajendran@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26net/vsockmon: Do not set zeroed statisticsBreno Leitao
Do not set rtnl_link_stats64 fields to zero, since they are zeroed before ops->ndo_get_stats64 is called in core dev_get_stats() function. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240223115839.3572852-2-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26net/vsockmon: Leverage core stats allocatorBreno Leitao
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the vsockmon driver and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20240223115839.3572852-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-26Merge branch 'pcs-xpcs-cleanups'David S. Miller
Serge Semin says: ==================== net: pcs: xpcs: Cleanups before adding MMIO dev support As stated in the subject this series is a short prequel before submitting the main patches adding the memory-mapped DW XPCS support to the DW XPCS and DW *MAC (STMMAC) drivers. Originally it was a part of the bigger patchset (see the changelog v2 link below) but was detached to a preparation set to shrink down the main series thus simplifying it' review. The patchset' content is straightforward: drop the redundant sentinel entry and the header files; return EINVAL errno from the soft-reset method and make sure that the interface validation method return EINVAL straight away if the requested interface isn't supported by the XPCS device instance. All of these changes are required to simplify the changes being introduced a bit later in the framework of the memory-mapped DW XPCS support patches. Link: https://lore.kernel.org/netdev/20231205103559.9605-1-fancer.lancer@gmail.com Changelog v2: - Move the preparation patches to a separate series. - Simplify the commit messages (@Russell, @Vladimir). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Explicitly return error on caps validationSerge Semin
If an unsupported interface is passed to the PCS validation callback there is no need in further link-modes calculations since the resultant array will be initialized with zeros which will be perceived by the phylink subsystem as error anyway (see phylink_validate_mac_and_pcs()). Instead let's explicitly return the -EINVAL error to inform the caller about the unsupported interface as it's done in the rest of the pcs_validate callbacks. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Return EINVAL in the internal methodsSerge Semin
In particular the xpcs_soft_reset() and xpcs_do_config() functions currently return -1 if invalid auto-negotiation mode is specified. That value might be then passed to the generic kernel subsystems which require a standard kernel errno value. Even though the erroneous conditions are very specific (memory corruption or buggy driver implementation) using a hard-coded -1 literal doesn't seem correct anyway especially when it comes to passing it higher to the network subsystem or printing to the system log. Convert the hard-coded error values to -EINVAL then. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Drop redundant workqueue.h include directiveSerge Semin
There is nothing CM workqueue-related in the driver. So the respective include directive can be dropped. While at it add an empty line delimiter between the generic and local path include directives to visually separate them. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: pcs: xpcs: Drop sentinel entry from 2500basex ifaces listSerge Semin
There are currently only two methods (xpcs_find_compat() and xpcs_get_interfaces()) defined in the driver which loop over the available interfaces. All of them rely on the xpcs_compat::num_interfaces field value to get the total number of supported interfaces. Thus the interface arrays are supposed to be filled with actual interface IDs and there is no need in the dummy terminating ID placed at the end of the arrays. Based on the above drop the PHY_INTERFACE_MODE_MAX entry from the xpcs_2500basex_interfaces array and the PHY_INTERFACE_MODE_MAX-based conditional statement from the xpcs_get_interfaces() method as redundant. Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26Merge branch 'rtnetlink-reduce-rtnl-pressure'David S. Miller
Eric Dumazet says: ==================== rtnetlink: reduce RTNL pressure for dumps This series restarts the conversion of rtnl dump operations to RCU protection, instead of requiring RTNL. In this new attempt (prior one failed in 2011), I chose to allow a gradual conversion of selected operations. After this series, "ip -6 addr" and "ip -4 ro" no longer need to acquire RTNL. I refrained from changing inet_dump_ifaddr() and inet6_dump_addr() to avoid merge conflicts because of two fixes in net tree. I also started the work for "ip link" future conversion. v2: rtnl_fill_link_ifmap() always emit IFLA_MAP (Jiri Pirko) Added "nexthop: allow nexthop_mpath_fill_node() to be called without RTNL" to avoid a lockdep splat (Ido Schimmel) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: provide RCU protection to rtnl_fill_prop_list()Eric Dumazet
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. dev->name_node items are already rcu protected. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: make rtnl_fill_link_ifmap() RCU readyEric Dumazet
Use READ_ONCE() to read the following device fields: dev->mem_start dev->mem_end dev->base_addr dev->irq dev->dma dev->if_port Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26inet: switch inet_dump_fib() to RCU protectionEric Dumazet
No longer hold RTNL while calling inet_dump_fib(). Also change return value for a completed dump: Returning 0 instead of skb->len allows NLMSG_DONE to be appended to the skb. User space does not have to call us again to get a standalone NLMSG_DONE marker. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26nexthop: allow nexthop_mpath_fill_node() to be called without RTNLEric Dumazet
nexthop_mpath_fill_node() will be potentially called from contexts holding rcu_lock instead of RTNL. Suggested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/all/ZdZDWVdjMaQkXBgW@shredder/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26inet: allow ip_valid_fib_dump_req() to be called with RTNL or RCUEric Dumazet
Add a new field into struct fib_dump_filter, to let callers tell if they use RTNL locking or RCU. This is used in the following patch, when inet_dump_fib() no longer holds RTNL. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: switch inet6_dump_ifinfo() to RCU protectionEric Dumazet
No longer hold RTNL while calling inet6_dump_ifinfo() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: add RTNL_FLAG_DUMP_UNLOCKED flagEric Dumazet
Similarly to RTNL_FLAG_DOIT_UNLOCKED, this new flag allows dump operations registered via rtnl_register() or rtnl_register_module() to opt-out from RTNL protection. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: change nlk->cb_mutex roleEric Dumazet
In commit af65bdfce98d ("[NETLINK]: Switch cb_lock spinlock to mutex and allow to override it"), Patrick McHardy used a common mutex to protect both nlk->cb and the dump() operations. The override is used for rtnl dumps, registered with rntl_register() and rntl_register_module(). We want to be able to opt-out some dump() operations to not acquire RTNL, so we need to protect nlk->cb with a per socket mutex. This patch renames nlk->cb_def_mutex to nlk->nl_cb_mutex The optional pointer to the mutex used to protect dump() call is stored in nlk->dump_cb_mutex Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26netlink: hold nlk->cb_mutex longer in __netlink_dump_start()Eric Dumazet
__netlink_dump_start() releases nlk->cb_mutex right before calling netlink_dump() which grabs it again. This seems dangerous, even if KASAN did not bother yet. Add a @lock_taken parameter to netlink_dump() to let it grab the mutex if called from netlink_recvmsg() only. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26netlink: fix netlink_diag_dump() return valueEric Dumazet
__netlink_diag_dump() returns 1 if the dump is not complete, zero if no error occurred. If err variable is zero, this means the dump is complete: We should not return skb->len in this case, but 0. This allows NLMSG_DONE to be appended to the skb. User space does not have to call us again only to get NLMSG_DONE. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: use xarray iterator to implement inet6_dump_ifinfo()Eric Dumazet
Prepare inet6_dump_ifinfo() to run with RCU protection instead of RTNL and use for_each_netdev_dump() interface. Also properly return 0 at the end of a dump, avoiding an extra recvmsg() system call and RTNL acquisition. Note that RTNL-less dumps need core changes, coming later in the series. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: prepare inet6_fill_ifinfo() for RCU protectionEric Dumazet
We want to use RCU protection instead of RTNL for inet6_fill_ifinfo(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26ipv6: prepare inet6_fill_ifla6_attrs() for RCUEric Dumazet
We want to no longer hold RTNL while calling inet6_fill_ifla6_attrs() in the future. Add needed READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26rtnetlink: prepare nla_put_iflink() to run under RCUEric Dumazet
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. This patch prepares dev_get_iflink() and nla_put_iflink() to run either with RTNL or RCU held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26Merge branch 'dp83826'David S. Miller
Jérémie Dautheribes says: ==================== Add support for TI DP83826 configuration This short patch series introduces the possibility of overriding some parameters which are latched by default by hardware straps on the TI DP83826 PHY. The settings that can be overridden include: - Configuring the PHY in either MII mode or RMII mode. - When in RMII mode, configuring the PHY in RMII slave mode or RMII master mode. The RMII master/slave mode is TI-specific and determines whether the PHY operates from a 25MHz reference clock (master mode) or from a 50MHz reference clock (slave mode). While these features should be supported by all the TI DP8382x family, I have only been able to test them on TI DP83826 hardware. Therefore, support has been added specifically for this PHY in this patch series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: phy: dp83826: support configuring RMII master/slave operation modeJérémie Dautheribes
The TI DP83826 PHY can operate between two RMII modes: - master mode (PHY operates from a 25MHz clock reference) - slave mode (PHY operates from a 50MHz clock reference) By default, the operation mode is configured by hardware straps. Add support to configure the operation mode from within the driver. Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-26net: phy: dp83826: Add support for phy-mode configurationJérémie Dautheribes
The TI DP83826 PHY can operate in either MII mode or RMII mode. By default, it is configured by straps. It can also be configured by writing to the bit 5 of register 0x17 - RMII and Status Register (RCSR). When phydev->interface is rmii, rmii mode must be enabled, otherwise mii mode must be set. This prevents misconfiguration of hw straps. Signed-off-by: Jérémie Dautheribes <jeremie.dautheribes@bootlin.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>