summaryrefslogtreecommitdiff
path: root/Documentation/networking
AgeCommit message (Collapse)Author
2024-06-21octeontx2-pf: Add ucast filter count configurability via devlink.Sai Krishna
The existing method of reserving unicast filter count leads to wasted MCAM entries if the functionality is not used or fewer entries are used. Furthermore, the amount of MCAM entries differs amongst Octeon SoCs. We implemented a means to adjust the UC filter count via devlink, allowing for better use of MCAM entries across Netdev apps. commands: To get the current unicast filter count # devlink dev param show pci/0002:02:00.0 name unicast_filter_count To change/set the unicast filter count # devlink dev param set pci/0002:02:00.0 name unicast_filter_count value 5 cmode runtime Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21docs: net: document guidance of implementing the SR-IOV NDOsJakub Kicinski
New drivers were prevented from adding ndo_set_vf_* callbacks over the last few years. This was expected to result in broader switchdev adoption, but seems to have had little effect. Based on recent netdev meeting there is broad support for allowing adding those ops. There is a problem with the current API supporting a limited number of VFs (100+, which is less than some modern HW supports). We can try to solve it by adding similar functionality on devlink ports, but that'd be another API variation to maintain. So a netlink attribute reshuffling is a more likely outcome. Document the guidance, make it clear that the API is frozen. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-20Documentation: networking: document ISO 15765-2Francesco Valla
Document basic concepts, APIs and behaviour of the ISO 15675-2 (ISO-TP) CAN stack. Signed-off-by: Francesco Valla <valla.francesco@gmail.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://lore.kernel.org/all/20240501092413.414700-2-valla.francesco@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-06-18net: phy: introduce core support for phy-mode = "10g-qxgmii"Vladimir Oltean
10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport specification. It uses the same signaling as USXGMII, but it multiplexes 4 ports over the link, resulting in a maximum speed of 2.5G per port. Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean either the single-port USXGMII or the quad-port 10G-QXGMII variant, and they could get away just fine with that thus far. But there is a need to distinguish between the 2 as far as SerDes drivers are concerned. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-12net: ipv4: Add a sysctl to set multipath hash seedPetr Machata
When calculating hashes for the purpose of multipath forwarding, both IPv4 and IPv6 code currently fall back on flow_hash_from_keys(). That uses a randomly-generated seed. That's a fine choice by default, but unfortunately some deployments may need a tighter control over the seed used. In this patch, make the seed configurable by adding a new sysctl key, net.ipv4.fib_multipath_hash_seed to control the seed. This seed is used specifically for multipath forwarding and not for the other concerns that flow_hash_from_keys() is used for, such as queue selection. Expose the knob as sysctl because other such settings, such as headers to hash, are also handled that way. Like those, the multipath hash seed is a per-netns variable. Despite being placed in the net.ipv4 namespace, the multipath seed sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of TCP variables. The seed used by flow_hash_from_keys() is a 128-bit quantity. However it seems that usually the seed is a much more modest value. 32 bits seem typical (Cisco, Cumulus), some systems go even lower. For that reason, and to decouple the user interface from implementation details, go with a 32-bit quantity, which is then quadruplicated to form the siphash key. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240607151357.421181-3-petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-12Documentation/tcp-ao: Add a few lines on tracepointsDmitry Safonov
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/pensando/ionic/ionic_txrx.c d9c04209990b ("ionic: Mark error paths in the data path as unlikely") 491aee894a08 ("ionic: fix kernel panic in XDP_TX action") net/ipv6/ip6_fib.c b4cb4a1391dc ("net: use unrcu_pointer() helper") b01e1c030770 ("ipv6: fix possible race in __fib6_drop_pcpu_from()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05net/mlx5e: SHAMPO, Add header-only ethtool counters for header data splitTariq Toukan
Count the number of header-only packets and bytes from SHAMPO. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05net/mlx5e: SHAMPO, Drop rx_gro_match_packets counterDragos Tatulea
After modifying rx_gro_packets to be more accurate, the rx_gro_match_packets counter is redundant. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05net/mlx5e: SHAMPO, Make GRO counters more preciseDragos Tatulea
Don't count non GRO packets. A non GRO packet is a packet with a GRO cb count of 1. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240603212219.1037656-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05tcp: add sysctl_tcp_rto_min_usKevin Yang
Adding a sysctl knob to allow user to specify a default rto_min at socket init time, other than using the hard coded 200ms default rto_min. Note that the rto_min route option has the highest precedence for configuring this setting, followed by the TCP_BPF_RTO_MIN socket option, followed by the tcp_rto_min_us sysctl. Signed-off-by: Kevin Yang <yyd@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05Revert "xsk: Document ability to redirect to any socket bound to the same umem"Magnus Karlsson
This reverts commit 968595a93669b6b4f6d1fcf80cf2d97956b6868f. Reported-by: Yuval El-Hanany <YuvalE@radware.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com Link: https://lore.kernel.org/bpf/20240604122927.29080-3-magnus.karlsson@gmail.com
2024-06-01doc: new 'mptcp' page in 'networking'Matthieu Baerts (NGI0)
A general documentation about MPTCP was missing since its introduction in v5.6. Most of what is there comes from our recently updated mptcp.dev website, with additional links to resources from the kernel documentation. This is a first version, mainly targeting app developers and users. Link: https://www.mptcp.dev Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-3-e94cdd9f2673@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01doc: mptcp: alphabetical orderMatthieu Baerts (NGI0)
Similar to what is done in other 'sysctl' pages: it looks clearer from a readability perspective. This might cause troubles in the short/mid-term with the backports, but by not putting new entries at the end, this can help to reduce conflicts in case of backports in the long term. We don't change the information here too often, so it looks OK to do that. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-2-e94cdd9f2673@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-01doc: mptcp: add missing 'available_schedulers' entryMatthieu Baerts (NGI0)
This sysctl knob has been added recently, but the documentation has not been updated. This knob is used to show the available schedulers choices that are registered, similar to 'net.ipv4.tcp_available_congestion_control'. Fixes: 73c900aa3660 ("mptcp: add net.mptcp.available_schedulers") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240530-upstream-net-20240520-mptcp-doc-v3-1-e94cdd9f2673@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13net: revert partially applied PHY topology seriesJakub Kicinski
The series is causing issues with PHY drivers built as modules. Since it was only partially applied and the merge window has opened let's revert and try again for v6.11. Revert 6916e461e793 ("net: phy: Introduce ethernet link topology representation") Revert 0ec5ed6c130e ("net: sfp: pass the phy_device when disconnecting an sfp module's PHY") Revert e75e4e074c44 ("net: phy: add helpers to handle sfp phy connect/disconnect") Revert fdd353965b52 ("net: sfp: Add helper to return the SFP bus name") Revert 841942bc6212 ("net: ethtool: Allow passing a phy index for some commands") Link: https://lore.kernel.org/all/171242462917.4000.9759453824684907063.git-patchwork-notify@kernel.org/ Link: https://lore.kernel.org/all/20240507102822.2023826-1-maxime.chevallier@bootlin.com/ Link: https://lore.kernel.org/r/20240513154156.104281-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-05-13 We've added 119 non-merge commits during the last 14 day(s) which contain a total of 134 files changed, 9462 insertions(+), 4742 deletions(-). The main changes are: 1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi. 2) Add BPF range computation improvements to the verifier in particular around XOR and OR operators, refactoring of checks for range computation and relaxing MUL range computation so that src_reg can also be an unknown scalar, from Cupertino Miranda. 3) Add support to attach kprobe BPF programs through kprobe_multi link in a session mode, meaning, a BPF program is attached to both function entry and return, the entry program can decide if the return program gets executed and the entry program can share u64 cookie value with return program. Session mode is a common use-case for tetragon and bpftrace, from Jiri Olsa. 4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf as well as BPF selftest's struct_ops handling, from Andrii Nakryiko. 5) Improvements to BPF selftests in context of BPF gcc backend, from Jose E. Marchesi & David Faust. 6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test- -style in order to retire the old test, run it in BPF CI and additionally expand test coverage, from Jordan Rife. 7) Big batch for BPF selftest refactoring in order to remove duplicate code around common network helpers, from Geliang Tang. 8) Another batch of improvements to BPF selftests to retire obsolete bpf_tcp_helpers.h as everything is available vmlinux.h, from Martin KaFai Lau. 9) Fix BPF map tear-down to not walk the map twice on free when both timer and wq is used, from Benjamin Tissoires. 10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL, from Alexei Starovoitov. 11) Change BTF build scripts to using --btf_features for pahole v1.26+, from Alan Maguire. 12) Small improvements to BPF reusing struct_size() and krealloc_array(), from Andy Shevchenko. 13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions, from Ilya Leoshkevich. 14) Extend TCP ->cong_control() callback in order to feed in ack and flag parameters and allow write-access to tp->snd_cwnd_stamp from BPF program, from Miao Xu. 15) Add support for internal-only per-CPU instructions to inline bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs, from Puranjay Mohan. 16) Follow-up to remove the redundant ethtool.h from tooling infrastructure, from Tushar Vyavahare. 17) Extend libbpf to support "module:<function>" syntax for tracing programs, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) bpf: make list_for_each_entry portable bpf: ignore expected GCC warning in test_global_func10.c bpf: disable strict aliasing in test_global_func9.c selftests/bpf: Free strdup memory in xdp_hw_metadata selftests/bpf: Fix a few tests for GCC related warnings. bpf: avoid gcc overflow warning in test_xdp_vlan.c tools: remove redundant ethtool.h from tooling infra selftests/bpf: Expand ATTACH_REJECT tests selftests/bpf: Expand getsockname and getpeername tests sefltests/bpf: Expand sockaddr hook deny tests selftests/bpf: Expand sockaddr program return value tests selftests/bpf: Retire test_sock_addr.(c|sh) selftests/bpf: Remove redundant sendmsg test cases selftests/bpf: Migrate ATTACH_REJECT test cases selftests/bpf: Migrate expected_attach_type tests selftests/bpf: Migrate wildcard destination rewrite test selftests/bpf: Migrate sendmsg6 v4 mapped address tests selftests/bpf: Migrate sendmsg deny test cases selftests/bpf: Migrate WILDCARD_IP test selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases ... ==================== Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13Merge tag 'nf-next-24-05-12' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: Patch #1 skips transaction if object type provides no .update interface. Patch #2 skips NETDEV_CHANGENAME which is unused. Patch #3 enables conntrack to handle Multicast Router Advertisements and Multicast Router Solicitations from the Multicast Router Discovery protocol (RFC4286) as untracked opposed to invalid packets. From Linus Luessing. Patch #4 updates DCCP conntracker to mark invalid as invalid, instead of dropping them, from Jason Xing. Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0, also from Jason. Patch #6 removes reference in netfilter's sysctl documentation on pickup entries which were already removed by Florian Westphal. Patch #7 removes check for IPS_OFFLOAD flag to disable early drop which allows to evict entries from the conntrack table, also from Florian. Patches #8 to #16 updates nf_tables pipapo set backend to allocate the datastructure copy on-demand from preparation phase, to better deal with OOM situations where .commit step is too late to fail. Series from Florian Westphal. Patch #17 adds a selftest with packetdrill to cover conntrack TCP state transitions, also from Florian. Patch #18 use GFP_KERNEL to clone elements from control plane to avoid quick atomic reserves exhaustion with large sets, reporter refers to million entries magnitude. * tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow clone callbacks to sleep selftests: netfilter: add packetdrill based conntrack tests netfilter: nft_set_pipapo: remove dirty flag netfilter: nft_set_pipapo: move cloning of match info to insert/removal path netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone netfilter: nft_set_pipapo: merge deactivate helper into caller netfilter: nft_set_pipapo: prepare walk function for on-demand clone netfilter: nft_set_pipapo: prepare destroy function for on-demand clone netfilter: nft_set_pipapo: make pipapo_clone helper return NULL netfilter: nft_set_pipapo: move prove_locking helper around netfilter: conntrack: remove flowtable early-drop test netfilter: conntrack: documentation: remove reference to non-existent sysctl netfilter: use NF_DROP instead of -NF_DROP netfilter: conntrack: dccp: try not to drop skb in conntrack netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler netfilter: nf_tables: skip transaction if update object is not implemented ==================== Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-12ARC: Add eBPF JIT supportShahab Vahedi
This will add eBPF JIT support to the 32-bit ARCv2 processors. The implementation is qualified by running the BPF tests on a Synopsys HSDK board with "ARC HS38 v2.1c at 500 MHz" as the 4-core CPU. The test_bpf.ko reports 2-10 fold improvements in execution time of its tests. For instance: test_bpf: #33 tcpdump port 22 jited:0 704 1766 2104 PASS test_bpf: #33 tcpdump port 22 jited:1 120 224 260 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:0 238 PASS test_bpf: #141 ALU_DIV_X: 4294967295 / 4294967295 = 1 jited:1 23 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:0 2034681 PASS test_bpf: #776 JMP32_JGE_K: all ... magnitudes jited:1 1020022 PASS Deployment and structure ------------------------ The related codes are added to "arch/arc/net": - bpf_jit.h -- The interface that a back-end translator must provide - bpf_jit_core.c -- Knows how to handle the input eBPF byte stream - bpf_jit_arcv2.c -- The back-end code that knows the translation logic The bpf_int_jit_compile() at the end of bpf_jit_core.c is the entrance to the whole process. Normally, the translation is done in one pass, namely the "normal pass". In case some relocations are not known during this pass, some data (arc_jit_data) is allocated for the next pass to come. This possible next (and last) pass is called the "extra pass". 1. Normal pass # The necessary pass 1a. Dry run # Get the whole JIT length, epilogue offset, etc. 1b. Emit phase # Allocate memory and start emitting instructions 2. Extra pass # Only needed if there are relocations to be fixed 2a. Patch relocations Support status -------------- The JIT compiler supports BPF instructions up to "cpu=v4". However, it does not yet provide support for: - Tail calls - Atomic operations - 64-bit division/remainder - BPF_PROBE_MEM* (exception table) The result of "test_bpf" test suite on an HSDK board is: hsdk-lnx# insmod test_bpf.ko test_suite=test_bpf test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] All the failing test cases are due to the ones that were not JIT'ed. Categorically, they can be represented as: .-----------.------------.-------------. | test type | opcodes | # of cases | |-----------+------------+-------------| | atomic | 0xC3, 0xDB | 149 | | div64 | 0x37, 0x3F | 22 | | mod64 | 0x97, 0x9F | 15 | `-----------^------------+-------------| | (total) 186 | `-------------' Setup: build config ------------------- The following configs must be set to have a working JIT test: CONFIG_BPF_JIT=y CONFIG_BPF_JIT_ALWAYS_ON=y CONFIG_TEST_BPF=m The following options are not necessary for the tests module, but are good to have: CONFIG_DEBUG_INFO=y # prerequisite for below CONFIG_DEBUG_INFO_BTF=y # so bpftool can generate vmlinux.h CONFIG_FTRACE=y # CONFIG_BPF_SYSCALL=y # all these options lead to CONFIG_KPROBE_EVENTS=y # having CONFIG_BPF_EVENTS=y CONFIG_PERF_EVENTS=y # Some BPF programs provide data through /sys/kernel/debug: CONFIG_DEBUG_FS=y arc# mount -t debugfs debugfs /sys/kernel/debug Setup: elfutils --------------- The libdw.{so,a} library that is used by pahole for processing the final binary must come from elfutils 0.189 or newer. The support for ARCv2 [1] has been added since that version. [1] https://sourceware.org/git/?p=elfutils.git;a=commit;h=de3d46b3e7 Setup: pahole ------------- The line below in linux/scripts/Makefile.btf must be commented out: pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats Or else, the build will fail: $ make V=1 ... BTF .btf.vmlinux.bin.o pahole -J --btf_gen_floats \ -j --lang_exclude=rust \ --skip_encoding_btf_inconsistent_proto \ --btf_gen_optimized .tmp_vmlinux.btf Complex, interval and imaginary float types are not supported Encountered error while encoding BTF. ... BTFIDS vmlinux ./tools/bpf/resolve_btfids/resolve_btfids vmlinux libbpf: failed to find '.BTF' ELF section in vmlinux FAILED: load BTF from vmlinux: No data available This is due to the fact that the ARC toolchains generate "complex float" DIE entries in libgcc and at the moment, pahole can't handle such entries. Running the tests ----------------- host$ scp /bld/linux/lib/test_bpf.ko arc: arc # sysctl net.core.bpf_jit_enable=1 arc # insmod test_bpf.ko test_suite=test_bpf ... test_bpf: #1048 Staggered jumps: JMP32_JSLE_X jited:1 697811 PASS test_bpf: Summary: 863 PASSED, 186 FAILED, [851/851 JIT'ed] Acknowledgments --------------- - Claudiu Zissulescu for his unwavering support - Yuriy Kolerov for testing and troubleshooting - Vladimir Isaev for the pahole workaround - Sergey Matyukevich for paving the road by adding the interpreter support Signed-off-by: Shahab Vahedi <shahab@synopsys.com> Link: https://lore.kernel.org/r/20240430145604.38592-1-list+bpf@vahedi.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-06Merge tag 'ipsec-next-2024-05-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-05-03 1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support. This was defined by an early version of an IETF draft that did not make it to a standard. 2) Introduce direction attribute for xfrm states. xfrm states have a direction, a stsate can be used either for input or output packet processing. Add a direction to xfrm states to make it clear for what a xfrm state is used. * tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Restrict SA direction attribute to specific netlink message types xfrm: Add dir validation to "in" data path lookup xfrm: Add dir validation to "out" data path lookup xfrm: Add Direction to the SA in or out udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support ==================== Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-06netfilter: conntrack: documentation: remove reference to non-existent sysctlFlorian Westphal
The referenced sysctl doesn't exist anymore. Fixes: 4592ee7f525c ("netfilter: conntrack: remove offload_pickup sysctl again") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-01xfrm: Add dir validation to "in" data path lookupAntony Antony
Introduces validation for the x->dir attribute within the XFRM input data lookup path. If the configured direction does not match the expected direction, input, increment the XfrmInStateDirError counter and drop the packet to ensure data integrity and correct flow handling. grep -vw 0 /proc/net/xfrm_stat XfrmInStateDirError 1 Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-01xfrm: Add dir validation to "out" data path lookupAntony Antony
Introduces validation for the x->dir attribute within the XFRM output data lookup path. If the configured direction does not match the expected direction, output, increment the XfrmOutStateDirError counter and drop the packet to ensure data integrity and correct flow handling. grep -vw 0 /proc/net/xfrm_stat XfrmOutPolError 1 XfrmOutStateDirError 1 Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-04-22ice: Document tx_scheduling_layers parameterMichal Wilczynski
New driver specific parameter 'tx_scheduling_layers' was introduced. Describe parameter in the documentation. Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-18net: pse-pd: Add support for PSE PIsKory Maincent (Dent Project)
The Power Sourcing Equipment Power Interface (PSE PI) plays a pivotal role in the architecture of Power over Ethernet (PoE) systems. It is essentially a blueprint that outlines how one or multiple power sources are connected to the eight-pin modular jack, commonly known as the Ethernet RJ45 port. This connection scheme is crucial for enabling the delivery of power alongside data over Ethernet cables. This patch adds support for getting the PSE controller node through PSE PI device subnode. This supports adds a way to get the PSE PI id from the pse_pi devicetree subnode of a PSE controller node simply by reading the reg property. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://lore.kernel.org/r/20240417-feature_poe-v9-7-242293fd1900@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-18net: ethtool: pse-pd: Expand pse commands with the PSE PoE interfaceKory Maincent (Dent Project)
Add PSE PoE interface support in the ethtool pse command. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://lore.kernel.org/r/20240417-feature_poe-v9-3-242293fd1900@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-18ethtool: Expand Ethernet Power Equipment with c33 (PoE) alongside PoDLKory Maincent (Dent Project)
In the current PSE interface for Ethernet Power Equipment, support is limited to PoDL. This patch extends the interface to accommodate the objects specified in IEEE 802.3-2022 145.2 for Power sourcing Equipment (PSE). The following objects are now supported and considered mandatory: - IEEE 802.3-2022 30.9.1.1.5 aPSEPowerDetectionStatus - IEEE 802.3-2022 30.9.1.1.2 aPSEAdminState - IEEE 802.3-2022 30.9.1.2.1 aPSEAdminControl To avoid confusion between "PoDL PSE" and "PoE PSE", which have similar names but distinct values, we have followed the suggestion of Oleksij Rempel and Andrew Lunn to maintain separate naming schemes for each, using c33 (clause 33) prefix for "PoE PSE". You can find more details in the discussion threads here: https://lore.kernel.org/netdev/20230912110637.GI780075@pengutronix.de/ https://lore.kernel.org/netdev/2539b109-72ad-470a-9dae-9f53de4f64ec@lunn.ch/ Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://lore.kernel.org/r/20240417-feature_poe-v9-1-242293fd1900@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-12net: hns3: add support to query scc version by devlink infoHao Chen
Add support to query scc version by devlink info for device V3. Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://lore.kernel.org/r/20240410125354.2177067-5-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-12nfp: update devlink device info outputFei Qin
Newer NIC will introduce a new part number, now add it into devlink device info. This patch also updates the information of "board.id" in nfp.rst to match the devlink-info.rst. Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-12devlink: add a new info version tagFei Qin
Add definition and documentation for the new generic info "board.part_number". The new one is for part number specific use, and board.id is modified to match the documentation in devlink-info. Signed-off-by: Fei Qin <fei.qin@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-10ethtool: update tsinfo statistics attribute docs with correct typeRahul Rameshbabu
nla_put_uint can either write a u32 or u64 netlink attribute value. The size depends on whether the value can be represented with a u32 or requires a u64. Use a uint annotation in various documentation to represent this. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Link: https://lore.kernel.org/r/20240409232520.237613-2-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-08devlink: Support setting max_io_eqsParav Pandit
Many devices send event notifications for the IO queues, such as tx and rx queues, through event queues. Enable a privileged owner, such as a hypervisor PF, to set the number of IO event queues for the VF and SF during the provisioning stage. example: Get maximum IO event queues of the VF device:: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10 Set maximum IO event queues of the VF device:: $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32 Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-06net: ethtool: Allow passing a phy index for some commandsMaxime Chevallier
Some netlink commands are target towards ethernet PHYs, to control some of their features. As there's several such commands, add the ability to pass a PHY index in the ethnl request, which will populate the generic ethnl_req_info with the relevant phydev when the command targets a PHY. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-05net/mlx5e: Introduce timestamps statistic counter for Tx DMA layerRahul Rameshbabu
Count number of transmitted packets that were hardware timestamped at the device DMA layer. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-4-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05net/mlx5e: Introduce lost_cqe statistic counter for PTP Tx port timestamping CQRahul Rameshbabu
Track the number of times a CQE was expected to not be delivered on PTP Tx port timestamping CQ. A CQE is expected to not be delivered if a certain amount of time passes since the corresponding CQE containing the DMA timestamp information has arrived. Increment the late_cqe counter when such a CQE does manage to be delivered to the CQ. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-3-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05ethtool: add interface to read Tx hardware timestamping statisticsRahul Rameshbabu
Multiple network devices that support hardware timestamping appear to have common behavior with regards to timestamp handling. Implement common Tx hardware timestamping statistics in a tx_stats struct_group. Common Rx hardware timestamping statistics can subsequently be implemented in a rx_stats struct_group for ethtool_ts_stats. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-2-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28Documentation: Add documentation for eswitch attributeWilliam Tu
Provide devlink documentation for three eswitch attributes: mode, inline-mode, and encap-mode. Signed-off-by: William Tu <witu@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240325181228.6244-1-witu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-26dns_resolver: correct module name in dns resolver documentationBharath SM
Fix an incorrect module name and sysfs path in dns resolver documentation. Signed-off-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240324104338.44083-1-bharathsm@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-20ionic: update documentation for XDP supportShannon Nelson
Add information to our documentation for the XDP features and related ethtool stats. While we're here, we also add the missing timestamp stats. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240319163534.38796-1-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19net: move dev->state into net_device_read_txrx groupEric Dumazet
dev->state can be read in rx and tx fast paths. netif_running() which needs dev->state is called from - enqueue_to_backlog() [RX path] - __dev_direct_xmit() [TX path] Fixes: 43a71cd66b9c ("net-device: reorganize net_device fast path variables") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Coco Li <lixiaoyan@google.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240314200845.3050179-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-14docs: networking: fix indentation errors in multi-pf-netdevJakub Kicinski
Stephen reports new warnings in the docs: Documentation/networking/multi-pf-netdev.rst:94: ERROR: Unexpected indentation. Documentation/networking/multi-pf-netdev.rst:106: ERROR: Unexpected indentation. Fixes: 77d9ec3f6c8c ("Documentation: networking: Add description for multi-pf netdev") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/all/20240312153304.0ef1b78e@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240313032329.3919036-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-12Merge tag 'net-next-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ...
2024-03-12Merge tag 'docs-6.9' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A moderatly busy cycle for development this time around. - Some cleanup of the main index page for easier navigation - Rework some of the other top-level pages for better readability and, with luck, fewer merge conflicts in the future. - Submit-checklist improvements, hopefully the first of many. - New Italian translations - A fair number of kernel-doc fixes and improvements. We have also dropped the recommendation to use an old version of Sphinx. - A new document from Thorsten on bisection ... and lots of fixes and updates" * tag 'docs-6.9' of git://git.lwn.net/linux: (54 commits) docs: verify/bisect: fixes, finetuning, and support for Arch docs: Makefile: Add dependency to $(YNL_INDEX) for targets other than htmldocs docs: Move ja_JP/howto.rst to ja_JP/process/howto.rst docs: submit-checklist: use subheadings docs: submit-checklist: structure by category docs: new text on bisecting which also covers bug validation docs: drop the version constraints for sphinx and dependencies docs: kerneldoc-preamble.sty: Remove code for Sphinx <2.4 docs: Restore "smart quotes" for quotes docs/zh_CN: accurate translation of "function" docs: Include simplified link titles in main index docs: Correct formatting of title in admin-guide/index.rst docs: kernel_feat.py: fix build error for missing files MAINTAINERS: Set the field name for subsystem profile section kasan: Add documentation for CONFIG_KASAN_EXTRA_INFO Fixed case issue with 'fault-injection' in documentation kernel-doc: handle #if in enums as well Documentation: update mailing list addresses doc: kerneldoc.py: fix indentation scripts/kernel-doc: simplify signature printing ...
2024-03-11net: netconsole: Add continuation line prefix to userdata messagesMatthew Wood
Add a space (' ') prefix to every userdata line to match docs for dev-kmsg. To account for this extra character in each userdata entry, reduce userdata entry names (directory name) from 54 characters to 53. According to the dev-kmsg docs, a space is used for subsequent lines to mark them as continuation lines. > A line starting with ' ', is a continuation line, adding > key/value pairs to the log message, which provide the machine > readable context of the message, for reliable processing in > userspace. Testing for this patch:: cd /sys/kernel/config/netconsole && mkdir cmdline0 cd cmdline0 mkdir userdata/test && echo "hello" > userdata/test/value mkdir userdata/test2 && echo "hello2" > userdata/test2/value echo "message" > /dev/kmsg Outputs:: 6.8.0-rc5-virtme,12,493,231373579,-;message test=hello test2=hello2 And I confirmed all testing works as expected from the original patchset Fixes: df03f830d099 ("net: netconsole: cache userdata formatted string in netconsole_target") Signed-off-by: Matthew Wood <thepacketgeek@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240308002525.248672-1-thepacketgeek@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ethtool: ice: Support for RSS settings to GTP Takeru Hayasaka enables RSS functionality for GTP packets on ice driver with ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08Merge tag 'mlx5-socket-direct-v3' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Support Multi-PF netdev (Socket Direct) This series adds support for combining multiple devices (PFs) of the same port under one netdev instance. Passing traffic through different devices belonging to different NUMA sockets saves cross-numa traffic and allows apps running on the same netdev from different numas to still feel a sense of proximity to the device and achieve improved performance. We achieve this by grouping PFs together, and creating the netdev only once all group members are probed. Symmetrically, we destroy the netdev once any of the PFs is removed. The channels are distributed between all devices, a proper configuration would utilize the correct close numa when working on a certain app/cpu. We pick one device to be a primary (leader), and it fills a special role. The other devices (secondaries) are disconnected from the network in the chip level (set to silent mode). All RX/TX traffic is steered through the primary to/from the secondaries. Currently, we limit the support to PFs only, and up to two devices (sockets). * tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: Documentation: networking: Add description for multi-pf netdev net/mlx5: Enable SD feature net/mlx5e: Block TLS device offload on combined SD netdev net/mlx5e: Support per-mdev queue counter net/mlx5e: Support cross-vhca RSS net/mlx5e: Let channels be SD-aware net/mlx5e: Create EN core HW resources for all secondary devices net/mlx5e: Create single netdev per SD group net/mlx5: SD, Add debugfs net/mlx5: SD, Add informative prints in kernel log net/mlx5: SD, Implement steering for primary and secondaries net/mlx5: SD, Implement devcom communication and primary election net/mlx5: SD, Implement basic query and instantiation net/mlx5: SD, Introduce SD lib net/mlx5: Add MPIR bit in mcam_access_reg ==================== Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07netdev: add per-queue statisticsJakub Kicinski
The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07doc: sfp-phylink: update the porting guide with PCS handlingMaxime Chevallier
Now that phylink has a comprehensive PCS support, update the porting guide to explain the process of supporting the PCS configuration. This also removed outdated references to phylink_config fields that no longer exists. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-07Documentation: networking: Add description for multi-pf netdevTariq Toukan
Add documentation for the multi-pf netdev feature. Describe the mlx5 implementation and design decisions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>