summaryrefslogtreecommitdiff
path: root/Documentation/networking/af_xdp.rst
AgeCommit message (Collapse)Author
2025-07-10net: xsk: introduce XDP_MAX_TX_SKB_BUDGET setsockoptJason Xing
This patch provides a setsockopt method to let applications leverage to adjust how many descs to be handled at most in one send syscall. It mitigates the situation where the default value (32) that is too small leads to higher frequency of triggering send syscall. Considering the prosperity/complexity the applications have, there is no absolutely ideal suggestion fitting all cases. So keep 32 as its default value like before. The patch does the following things: - Add XDP_MAX_TX_SKB_BUDGET socket option. - Set max_tx_budget to 32 by default in the initialization phase as a per-socket granular control. - Set the range of max_tx_budget as [32, xs->tx->nentries]. The idea behind this comes out of real workloads in production. We use a user-level stack with xsk support to accelerate sending packets and minimize triggering syscalls. When the packets are aggregated, it's not hard to hit the upper bound (namely, 32). The moment user-space stack fetches the -EAGAIN error number passed from sendto(), it will loop to try again until all the expected descs from tx ring are sent out to the driver. Enlarging the XDP_MAX_TX_SKB_BUDGET value contributes to less frequency of sendto() and higher throughput/PPS. Here is what I did in production, along with some numbers as follows: For one application I saw lately, I suggested using 128 as max_tx_budget because I saw two limitations without changing any default configuration: 1) XDP_MAX_TX_SKB_BUDGET, 2) socket sndbuf which is 212992 decided by net.core.wmem_default. As to XDP_MAX_TX_SKB_BUDGET, the scenario behind this was I counted how many descs are transmitted to the driver at one time of sendto() based on [1] patch and then I calculated the possibility of hitting the upper bound. Finally I chose 128 as a suitable value because 1) it covers most of the cases, 2) a higher number would not bring evident results. After twisting the parameters, a stable improvement of around 4% for both PPS and throughput and less resources consumption were found to be observed by strace -c -p xxx: 1) %time was decreased by 7.8% 2) error counter was decreased from 18367 to 572 [1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/ Signed-off-by: Jason Xing <kernelxing@tencent.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250704160138.48677-1-kerneljasonxing@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-09Documentation: xsk: correct the obsolete references and examplesJason Xing
The modified lines are mainly related to the following commits[1][2] which remove those tests and examples. Since samples/bpf has been deprecated, we can refer to more examples that are easily searched in the various xdp-projects, like the following link: https://github.com/xdp-project/bpf-examples/tree/main/AF_XDP-example [1] commit f36600634282 ("libbpf: move xsk.{c,h} into selftests/bpf") [2] commit cfb5a2dbf141 ("bpf, samples: Remove AF_XDP samples") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250708062907.11557-1-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05Revert "xsk: Document ability to redirect to any socket bound to the same umem"Magnus Karlsson
This reverts commit 968595a93669b6b4f6d1fcf80cf2d97956b6868f. Reported-by: Yuval El-Hanany <YuvalE@radware.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com Link: https://lore.kernel.org/bpf/20240604122927.29080-3-magnus.karlsson@gmail.com
2024-02-05xsk: document ability to redirect to any socket bound to the same umemMagnus Karlsson
Document the ability to redirect to any socket bound to the same umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20240205123553.22180-3-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19xsk: add multi-buffer documentationMagnus Karlsson
Add AF_XDP multi-buffer support documentation including two pseudo-code samples. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-18-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-04xsk: Honor SO_BINDTODEVICE on bindIlya Maximets
Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. A privileged process might create the socket and pass it to a non-privileged process for later use. However, that process will be able to bind the socket to any network interface. Even though it will not be able to receive any traffic without modification of the BPF map, the situation is not ideal. Sockets already have a mechanism that can be used to restrict what interface they can be attached to. That is SO_BINDTODEVICE. To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW. Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer workflow when non-privileged process is using AF_XDP. The intended workflow is following: 1. First process creates a bare socket with socket(AF_XDP, ...). 2. First process loads the XSK program to the interface. 3. First process adds the socket fd to a BPF map. 4. First process ties socket fd to a particular interface using SO_BINDTODEVICE. 5. First process sends socket fd to a second process. 6. Second process allocates UMEM. 7. Second process binds socket to the interface with bind(...). 8. Second process sends/receives the traffic. All the steps above are possible today if the first process is privileged and the second one has sufficient RLIMIT_MEMLOCK and no capabilities. However, the second process will be able to bind the socket to any interface it wants on step 7 and send traffic from it. With the proposed change, the second process will be able to bind the socket only to a specific interface chosen by the first process at step 4. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/20230703175329.3259672-1-i.maximets@ovn.org
2023-01-31Documentation: networking: correct spellingRandy Dunlap
Correct spelling problems for Documentation/networking/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Jiri Pirko <jiri@nvidia.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20230129231053.20863-5-rdunlap@infradead.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2021-07-12doc, af_xdp: Fix bind flags option typoBaruch Siach
Fix XDP_ZERO_COPY flag typo since it is actually named XDP_ZEROCOPY instead as per if_xdp.h uapi header. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/1656fdf94704e9e735df0f8b97667d8f26dd098b.1625550240.git.baruch@tkos.co.il
2021-06-23docs, af_xdp: Consistent indentation in examplesIlya Maximets
Examples in this document use all kinds of indentation from 3 to 5 spaces and even mixed with tabs. Making them all even and equal to 4 spaces. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20210622185647.3705104-1-i.maximets@ovn.org
2020-08-31xsk: Documentation for XDP_SHARED_UMEM between queues and netdevsMagnus Karlsson
Add documentation for the XDP_SHARED_UMEM feature when a UMEM is shared between different queues and/or netdevs. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-16-git-send-email-magnus.karlsson@intel.com
2019-11-10xsk: Extend documentation for Rx|Tx-only sockets and shared umemsMagnus Karlsson
Add more documentation about the new Rx-only and Tx-only sockets in libbpf and also how libbpf can now support shared umems. Also found two pieces that could be improved in the text, that got fixed in this commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: William Tu <u9012063@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1573148860-30254-6-git-send-email-magnus.karlsson@intel.com
2019-10-23xsk: Improve documentation for AF_XDPMagnus Karlsson
Added sections on all the bind flags, libbpf, all the setsockopts and all the getsockopts. Also updated the document to reflect the latest features and to correct some spelling errors. v1 -> v2: * Updated XDP program with latest BTF map format * Added one more FAQ entry * Some minor edits and corrections v2 -> v3: * Simplified XDP_SHARED_UMEM example XDP program Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1571648224-16889-1-git-send-email-magnus.karlsson@intel.com
2019-08-31doc/af_xdp: include unaligned chunk caseKevin Laatz
The addition of unaligned chunks mode, the documentation needs to be updated to indicate that the incoming addr to the fill ring will only be masked if the user application is run in the aligned chunk mode. This patch also adds a line to explicitly indicate that the incoming addr will not be masked if running the user application in the unaligned chunk mode. Signed-off-by: Kevin Laatz <kevin.laatz@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-24xsk: sample kernel code is now in libbpfEric Leblond
Fix documentation that mention xdpsock_kern.c which has been replaced by code embedded in libbpf. Signed-off-by: Eric Leblond <eric@regit.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-21Documentation/networking: fix af_xdp.rst Sphinx warningsRandy Dunlap
Fix Sphinx warnings in Documentation/networking/af_xdp.rst by adding indentation: Documentation/networking/af_xdp.rst:319: WARNING: Literal block expected; none found. Documentation/networking/af_xdp.rst:326: WARNING: Literal block expected; none found. Fixes: 0f4a9b7d4ecb ("xsk: add FAQ to facilitate for first time users") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25xsk: add FAQ to facilitate for first time usersMagnus Karlsson
Added an FAQ section in Documentation/networking/af_xdp.rst to help first time users with common problems. As problems are getting identified, entries will be added to the FAQ. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-05bpf: typo fix in Documentation/networking/af_xdp.rstKonrad Djimeli
Fix a simple typo: Completetion -> Completion Signed-off-by: Konrad Djimeli <kdjimeli@igalia.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-04xsk: new descriptor addressing schemeBjörn Töpel
Currently, AF_XDP only supports a fixed frame-size memory scheme where each frame is referenced via an index (idx). A user passes the frame index to the kernel, and the kernel acts upon the data. Some NICs, however, do not have a fixed frame-size model, instead they have a model where a memory window is passed to the hardware and multiple frames are filled into that window (referred to as the "type-writer" model). By changing the descriptor format from the current frame index addressing scheme, AF_XDP can in the future be extended to support these kinds of NICs. In the index-based model, an idx refers to a frame of size frame_size. Addressing a frame in the UMEM is done by offseting the UMEM starting address by a global offset, idx * frame_size + offset. Communicating via the fill- and completion-rings are done by means of idx. In this commit, the idx is removed in favor of an address (addr), which is a relative address ranging over the UMEM. To convert an idx-based address to the new addr is simply: addr = idx * frame_size + offset. We also stop referring to the UMEM "frame" as a frame. Instead it is simply called a chunk. To transfer ownership of a chunk to the kernel, the addr of the chunk is passed in the fill-ring. Note, that the kernel will mask addr to make it chunk aligned, so there is no need for userspace to do that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or 3000 to the fill-ring will refer to the same chunk. On the completion-ring, the addr will match that of the Tx descriptor, passed to the kernel. Changing the descriptor format to use chunks/addr will allow for future changes to move to a type-writer based model, where multiple frames can reside in one chunk. In this model passing one single chunk into the fill-ring, would potentially result in multiple Rx descriptors. This commit changes the uapi of AF_XDP sockets, and updates the documentation. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-03samples/bpf: sample application and documentation for AF_XDP socketsMagnus Karlsson
This is a sample application for AF_XDP sockets. The application supports three different modes of operation: rxdrop, txonly and l2fwd. To show-case a simple round-robin load-balancing between a set of sockets in an xskmap, set the RR_LB compile time define option to 1 in "xdpsock.h". v2: The entries variable was calculated twice in {umem,xq}_nb_avail. Co-authored-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>