summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-10Merge branch 'bpf, mm: introduce cgroup.memory=nobpf'Alexei Starovoitov
Yafang Shao says: ==================== The bpf memory accouting has some known problems in contianer environment, - The container memory usage is not consistent if there's pinned bpf program After the container restart, the leftover bpf programs won't account to the new generation, so the memory usage of the container is not consistent. This issue can be resolved by introducing selectable memcg, but we don't have an agreement on the solution yet. See also the discussions at https://lwn.net/Articles/905150/ . - The leftover non-preallocated bpf map can't be limited The leftover bpf map will be reparented, and thus it will be limited by the parent, rather than the container itself. Furthermore, if the parent is destroyed, it be will limited by its parent's parent, and so on. It can also be resolved by introducing selectable memcg. - The memory dynamically allocated in bpf prog is charged into root memcg only Nowdays the bpf prog can dynamically allocate memory, for example via bpf_obj_new(), but it only allocate from the global bpf_mem_alloc pool, so it will charge into root memcg only. That needs to be addressed by a new proposal. So let's give the container user an option to disable bpf memory accouting. The idea of "cgroup.memory=nobpf" is originally by Tejun[1]. [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/ Changes, v1->v2: - squash patches (Roman) - commit log improvement in patch #2. (Johannes) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10bpf: allow to disable bpf prog memory accountingYafang Shao
We can simply disable the bpf prog memory accouting by not setting the GFP_ACCOUNT. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-5-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10bpf: allow to disable bpf map memory accountingYafang Shao
We can simply set root memcg as the map's memcg to disable bpf memory accounting. bpf_map_area_alloc is a little special as it gets the memcg from current rather than from the map, so we need to disable GFP_ACCOUNT specifically for it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-4-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10bpf: use bpf_map_kvcalloc in bpf_local_storageYafang Shao
Introduce new helper bpf_map_kvcalloc() for the memory allocation in bpf_local_storage(). Then the allocation will charge the memory from the map instead of from current, though currently they are the same thing as it is only used in map creation path now. By charging map's memory into the memcg from the map, it will be more clear. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10mm: memcontrol: add new kernel parameter cgroup.memory=nobpfYafang Shao
Add new kernel parameter cgroup.memory=nobpf to allow user disable bpf memory accounting. This is a preparation for the followup patch. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20230210154734.4416-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10docs, bpf: Ensure IETF's BPF mailing list gets copied for ISA doc changesDaniel Borkmann
Given BPF is increasingly being used beyond just the Linux kernel, with implementations in NICs and other hardware, Windows, etc, there is an ongoing effort to document and standardize parts of the existing BPF infrastructure such as its ISA. As "source of truth" we decided some time ago to rely on the in-tree documentation, in particular, starting out with the Documentation/bpf/instruction-set.rst as a base for later RFC drafts on the ISA. Therefore, we want to ensure that changes to that document have bpf@ietf.org in Cc, so add a MAINTAINERS file entry with a section on documents related to standardization efforts. For now, this only relates to instruction-set.rst, and later additional files will be added. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Thaler <dthaler@microsoft.com> Cc: bpf@ietf.org Link: https://datatracker.ietf.org/doc/bofreq-thaler-bpf-ebpf/ Link: https://lore.kernel.org/r/57619c0dd8e354d82bf38745f99405e3babdc970.1676068387.git.daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-10Daniel Borkmann says:Jakub Kicinski
==================== pull-request: bpf-next 2023-02-11 We've added 96 non-merge commits during the last 14 day(s) which contain a total of 152 files changed, 4884 insertions(+), 962 deletions(-). There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c between commit 5b246e533d01 ("ice: split probe into smaller functions") from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev() is otherwise there a 2nd time, and add XDP features to the existing ice_cfg_netdev() one: [...] ice_set_netdev_features(netdev); netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT | NETDEV_XDP_ACT_XSK_ZEROCOPY; ice_set_ops(netdev); [...] Stephen's merge conflict mail: https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/ The main changes are: 1) Add support for BPF trampoline on s390x which finally allows to remove many test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich. 2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski. 3) Add capability to export the XDP features supported by the NIC. Along with that, add a XDP compliance test tool, from Lorenzo Bianconi & Marek Majtyka. 4) Add __bpf_kfunc tag for marking kernel functions as kfuncs, from David Vernet. 5) Add a deep dive documentation about the verifier's register liveness tracking algorithm, from Eduard Zingerman. 6) Fix and follow-up cleanups for resolve_btfids to be compiled as a host program to avoid cross compile issues, from Jiri Olsa & Ian Rogers. 7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted when testing on different NICs, from Jesper Dangaard Brouer. 8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang. 9) Extend libbpf to add an option for when the perf buffer should wake up, from Jon Doron. 10) Follow-up fix on xdp_metadata selftest to just consume on TX completion, from Stanislav Fomichev. 11) Extend the kfuncs.rst document with description on kfunc lifecycle & stability expectations, from David Vernet. 12) Fix bpftool prog profile to skip attaching to offline CPUs, from Tonghao Zhang. ==================== Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Documentation: isdn: correct spellingRandy Dunlap
Correct spelling problems for Documentation/isdn/ as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20230209071400.31476-9-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10Merge branch ↵Jakub Kicinski
'net-move-more-duplicate-code-of-ovs-and-tc-conntrack-into-nf_conntrack_ovs' Xin Long says: ==================== net: move more duplicate code of ovs and tc conntrack into nf_conntrack_ovs We've moved some duplicate code into nf_nat_ovs in: "net: eliminate the duplicate code in the ct nat functions of ovs and tc" This patchset addresses more code duplication in the conntrack of ovs and tc then creates nf_conntrack_ovs for them, and four functions will be extracted and moved into it: nf_ct_handle_fragments() nf_ct_skb_network_trim() nf_ct_helper() nf_ct_add_helper() ==================== Link: https://lore.kernel.org/r/cover.1675810210.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: extract nf_ct_handle_fragments to nf_conntrack_ovsXin Long
Now handle_fragments() in OVS and TC have the similar code, and this patch removes the duplicate code by moving the function to nf_conntrack_ovs. Note that skb_clear_hash(skb) or skb->ignore_df = 1 should be done only when defrag returns 0, as it does in other places in kernel. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: sched: move frag check and tc_skb_cb update out of handle_fragmentsXin Long
This patch has no functional changes and just moves frag check and tc_skb_cb update out of handle_fragments, to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10openvswitch: move key and ovs_cb update out of handle_fragmentsXin Long
This patch has no functional changes and just moves key and ovs_cb update out of handle_fragments, and skb_clear_hash() and skb->ignore_df change into handle_fragments(), to make it easier to move the duplicate code from handle_fragments() into nf_conntrack_ovs later. Note that it changes to pass info->family to handle_fragments() instead of key for the packet type check, as info->family is set according to key->eth.type in ovs_ct_copy_action() when creating the action. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: extract nf_ct_skb_network_trim function to nf_conntrack_ovsXin Long
There are almost the same code in ovs_skb_network_trim() and tcf_ct_skb_network_trim(), this patch extracts them into a function nf_ct_skb_network_trim() and moves the function to nf_conntrack_ovs. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10net: create nf_conntrack_ovs for ovs and tc useXin Long
Similar to nf_nat_ovs created by Commit ebddb1404900 ("net: move the nat function to nf_nat_ovs for ovs and tc"), this patch is to create nf_conntrack_ovs to get these functions shared by OVS and TC only. There are nf_ct_helper() and nf_ct_add_helper() from nf_conntrak_helper in this patch, and will be more in the following patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-10libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()Ilya Leoshkevich
The code assumes that everything that comes after nlmsgerr are nlattrs. When calculating their size, it does not account for the initial nlmsghdr. This may lead to accessing uninitialized memory. Fixes: bbf48c18ee0c ("libbpf: add error reporting in XDP") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-8-iii@linux.ibm.com
2023-02-10selftests/bpf: Attach to fopen()/fclose() in attach_probeIlya Leoshkevich
malloc() and free() may be completely replaced by sanitizers, use fopen() and fclose() instead. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-7-iii@linux.ibm.com
2023-02-10selftests/bpf: Attach to fopen()/fclose() in uprobe_autoattachIlya Leoshkevich
malloc() and free() may be completely replaced by sanitizers, use fopen() and fclose() instead. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-6-iii@linux.ibm.com
2023-02-10selftests/bpf: Forward SAN_CFLAGS and SAN_LDFLAGS to runqslower and libbpfIlya Leoshkevich
To get useful results from the Memory Sanitizer, all code running in a process needs to be instrumented. When building tests with other sanitizers, it's not strictly necessary, but is also helpful. So make sure runqslower and libbpf are compiled with SAN_CFLAGS and linked with SAN_LDFLAGS. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-5-iii@linux.ibm.com
2023-02-10selftests/bpf: Split SAN_CFLAGS and SAN_LDFLAGSIlya Leoshkevich
Memory Sanitizer requires passing different options to CFLAGS and LDFLAGS: besides the mandatory -fsanitize=memory, one needs to pass header and library paths, and passing -L to a compilation step triggers -Wunused-command-line-argument. So introduce a separate variable for linker flags. Use $(SAN_CFLAGS) as a default in order to avoid complicating the ASan usage. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-4-iii@linux.ibm.com
2023-02-10tools: runqslower: Add EXTRA_CFLAGS and EXTRA_LDFLAGS supportIlya Leoshkevich
This makes it possible to add sanitizer flags. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-3-iii@linux.ibm.com
2023-02-10selftests/bpf: Quote host toolsIlya Leoshkevich
Using HOSTCC="ccache clang" breaks building the tests, since, when it's forwarded to e.g. bpftool, the child make sees HOSTCC=ccache and "clang" is considered a target. Fix by quoting it, and also HOSTLD and HOSTAR for consistency. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230210001210.395194-2-iii@linux.ibm.com
2023-02-10net: skbuff: drop the word head from skb cacheJakub Kicinski
skbuff_head_cache is misnamed (perhaps for historical reasons?) because it does not hold heads. Head is the buffer which skb->data points to, and also where shinfo lives. struct sk_buff is a metadata structure, not the head. Eric recently added skb_small_head_cache (which allocates actual head buffers), let that serve as an excuse to finally clean this up :) Leave the user-space visible name intact, it could possibly be uAPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10Merge branch 'net-ipa-GSI'David S. Miller
Alex Elder says: ==================== net: ipa: prepare for GSI register updtaes An upcoming series (or two) will convert the definitions of GSI registers used by IPA so they use the "IPA reg" mechanism to specify register offsets and their fields. This will simplify implementing the fairly large number of changes required in GSI registers to support more than 32 GSI channels (introduced in IPA v5.0). A few minor problems and inconsistencies were found, and they're fixed here. The last three patches in this series change the "ipa_reg" code to separate the IPA-specific part (the base virtual address, basically) from the generic register part, and the now- generic code is renamed to use just "reg_" or "REG_" as a prefix rather than "ipa_reg" or "IPA_REG_". ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: generalize register field functionsAlex Elder
Rename functions related to register fields so they don't appear to be IPA-specific, and move their definitions into "reg.h": ipa_reg_fmask() -> reg_fmask() ipa_reg_bit() -> reg_bit() ipa_reg_field_max() -> reg_field_max() ipa_reg_encode() -> reg_encode() ipa_reg_decode() -> reg_decode() Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: generalize register offset functionsAlex Elder
Rename ipa_reg_offset() to be reg_offset() and move its definition to "reg.h". Rename ipa_reg_n_offset() to be reg_n_offset() also. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: start generalizing "ipa_reg"Alex Elder
IPA register definitions have evolved with each new version. The changes required to support more than 32 endpoints in IPA v5.0 made it best to define a unified mechanism for defining registers and their fields. GSI register definitions, meanwhile, have remained fairly stable. And even as the total number of IPA endpoints goes beyond 32, the number of GSI channels on a given EE that underly endpoints still remains 32 or less. Despite that, GSI v3.0 (which is used with IPA v5.0) extends the number of channels (and events) it supports to be about 256, and as a result, many GSI register definitions must change significantly. To address this, we'll use the same "ipa_reg" mechanism to define the GSI registers. As a first step in generalizing the "ipa_reg" to also support GSI registers, isolate the definitions of the "ipa_reg" and "ipa_regs" structure types (and some supporting macros) into a new header file, and remove the "ipa_" and "IPA_" from symbol names. Separate the IPA register ID validity checking from the generic check that a register ID is in range. Aside from that, this is intended to have no functional effect on the code. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: GSI register cleanupAlex Elder
Move some static inline function definitions out of "gsi_reg.h" and into "gsi.c", which is the only place they're used. Rename them so their names identify the register they're associated with. Move the gsi_channel_type enumerated type definition below the offset and field definitions for the CH_C_CNTXT_0 register where it's used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: use bitmasks for GSI IRQ valuesAlex Elder
There are seven GSI interrupt types that can be signaled by a single GSI IRQ. These are represented in a bitmask, and the gsi_irq_type_id enumerated type defines what each bit position represents. Similarly, the global and general GSI interrupt types each has a set of conditions it signals, and both types have an enumerated type that defines which bit that represents each condition. When used, these enumerated values are passed as an argument to BIT() in *all* cases. So clean up the code a little bit by defining the enumerated type values as one-bit masks rather than bit positions. Rename gsi_general_id to be gsi_general_irq_id for consistency. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: tighten up IPA register validity checkingAlex Elder
When checking the validity of an IPA register ID, compare it against all possible ipa_reg_id values. Rename the function ipa_reg_id_valid() to be specific about what's being checked. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: add some new IPA versionsAlex Elder
Soon IPA v5.0+ will be supported, and when that happens we will be able to enable support for the SDX65 (IPA v5.0), SM8450 (IPA v5.1), and SM8550 (IPA v5.5). Fix the comment about the GSI version used for IPA v3.1. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: get rid of ipa->reg_addrAlex Elder
The reg_addr field in the IPA structure is set but never used. Get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: ipa: generic command param fixAlex Elder
Starting at IPA v4.11, the GSI_GENERIC_COMMAND GSI register got a new PARAMS field. The code that encodes a value into that field sets it unconditionally, which is wrong. We currently only provide 0 as the field's value, so this error has no real effect. Still, it's a bug, so let's fix it. Fix an (unrelated) incorrect comment as well. Fields in the ERROR_LOG GSI register actually *are* defined for IPA versions prior to v3.5.1. Fixes: fe68c43ce388 ("net: ipa: support enhanced channel flow control") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10net: microchip: vcap: Add tc flower keys for lan966xHoratiu Vultur
Add the following TC flower filter keys to lan966x for IS2: - ipv4_addr (sip and dip) - ipv6_addr (sip and dip) - control (IPv4 fragments) - portnum (tcp and udp port numbers) - basic (L3 and L4 protocol) - vlan (outer vlan tag info) - tcp (tcp flags) - ip (tos field) As the parsing of these keys is similar between lan966x and sparx5, move the code in a separate file to be shared by these 2 chips. And put the specific parsing outside of the common functions. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-10Merge tag 'rxrpc-next-20230208' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc development Here are some miscellaneous changes for rxrpc: (1) Use consume_skb() rather than kfree_skb_reason(). (2) Fix unnecessary waking when poking and already-poked call. (3) Add ack.rwind to the rxrpc_tx_ack tracepoint as this indicates how many incoming DATA packets we're telling the peer that we are currently willing to accept on this call. (4) Reduce duplicate ACK transmission. We send ACKs to let the peer know that we're increasing the receive window (ack.rwind) as we consume packets locally. Normal ACK transmission is triggered in three places and that leads to duplicates being sent. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-09net: pcs: rzn1-miic: remove unused struct members and use miic variableClément Léger
Remove unused bulk clocks struct from the miic state and use an already existing miic variable in miic_config(). Signed-off-by: Clément Léger <clement.leger@bootlin.com> Link: https://lore.kernel.org/r/20230208161249.329631-1-clement.leger@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09openvswitch: Use string_is_terminated() helperAndy Shevchenko
Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09genetlink: Use string_is_terminated() helperAndy Shevchenko
Use string_is_terminated() helper instead of cpecific memchr() call. This shows better the intention of the call. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09string_helpers: Move string_is_valid() to the headerAndy Shevchenko
Move string_is_valid() to the header for wider use. While at it, rename to string_is_terminated() to be precise about its semantics. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230208133153.22528-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net: micrel: Cable Diagnostics feature for lan8841 PHYHoratiu Vultur
Add support for cable diagnostics in lan8841 PHY. It has the same registers layout as lan8814 PHY, therefore reuse the functionality. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230208114406.1666671-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09nfp: support IPsec offloading for NFP3800Huanhuan Wang
Add IPsec offloading support for NFP3800. Include data plane and control plane. Data plane: add IPsec packet process flow in NFP3800 datapath (NFDk). Control plane: add an algorithm support distinction flow in xfrm hook function xdo_dev_state_add(), as NFP3800 has a different set of IPsec algorithm support. This matches existing support for the NFP6000/NFP4000 and their NFD3 datapath. In addition, fixup the md_bytes calculation for NFD3 datapath to make sure the two datapahts are keept in sync. Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230208091000.4139974-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Merge branch 'net-introduce-rps_default_mask'Jakub Kicinski
Paolo Abeni says: ==================== net: introduce rps_default_mask Real-time setups try hard to ensure proper isolation between time critical applications and e.g. network processing performed by the network stack in softirq and RPS is used to move the softirq activity away from the isolated core. If the network configuration is dynamic, with netns and devices routinely created at run-time, enforcing the correct RPS setting on each newly created device allowing to transient bad configuration became complex. Additionally, when multi-queue devices are involved, configuring rps in user-space on each queue easily becomes very expensive, e.g. some setups use veths with per cpu queues. These series try to address the above, introducing a new sysctl knob: rps_default_mask. The new sysctl entry allows configuring a netns-wide RPS mask, to be enforced since receive queue creation time without any fourther per device configuration required. Additionally, a simple self-test is introduced to check the rps_default_mask behavior. ==================== Link: https://lore.kernel.org/r/cover.1675789134.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09self-tests: introduce self-tests for RPS default maskPaolo Abeni
Ensure that RPS default mask changes take place on all newly created netns/devices and don't affect existing ones. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net: introduce default_rps_mask netns attributePaolo Abeni
If RPS is enabled, this allows configuring a default rps mask, which is effective since receive queue creation time. A default RPS mask allows the system admin to ensure proper isolation, avoiding races at network namespace or device creation time. The default RPS mask is initially empty, and can be modified via a newly added sysctl entry. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net-sysctl: factor-out rpm mask manipulation helpersPaolo Abeni
Will simplify the following patch. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09net-sysctl: factor out cpumask parsing helperPaolo Abeni
Will be used by the following patch to avoid code duplication. No functional changes intended. The only difference is that now flow_limit_cpu_sysctl() will always compute the flow limit mask on each read operation, even when read() will not return any byte to user-space. Note that the new helper is placed under a new #ifdef at the file start to better fit the usage in the later patch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
net/devlink/leftover.c / net/core/devlink.c: 565b4824c39f ("devlink: change port event netdev notifier from per-net to global") f05bd8ebeb69 ("devlink: move code to a dedicated directory") 687125b5799c ("devlink: split out core code") https://lore.kernel.org/all/20230208094657.379f2b1a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09tools/resolve_btfids: Pass HOSTCFLAGS as EXTRA_CFLAGS to prepare targetsJiri Olsa
Thorsten reported build issue with command line that defined extra HOSTCFLAGS that were not passed into 'prepare' targets, but were used to build resolve_btfids objects. This results in build fail when these objects are linked together: /usr/bin/ld: /build.../tools/bpf/resolve_btfids//libbpf/libbpf.a(libbpf-in.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE \ object; recompile with -fPIE Fixing this by passing HOSTCFLAGS in EXTRA_CFLAGS as part of HOST_OVERRIDES variable for prepare targets. [1] https://lore.kernel.org/bpf/f7922132-6645-6316-5675-0ece4197bfff@leemhuis.info/ Fixes: 56a2df7615fa ("tools/resolve_btfids: Compile resolve_btfids as host program") Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/bpf/20230209143735.4112845-1-jolsa@kernel.org
2023-02-09net: enable usercopy for skb_small_head_cacheEric Dumazet
syzbot and other bots reported that we have to enable user copy to/from skb->head. [1] We can prevent access to skb_shared_info, which is a nice improvement over standard kmem_cache. Layout of these kmem_cache objects is: < SKB_SMALL_HEAD_HEADROOM >< struct skb_shared_info > usercopy: Kernel memory overwrite attempt detected to SLUB object 'skbuff_small_head' (offset 32, size 20)! ------------[ cut here ]------------ kernel BUG at mm/usercopy.c:102 ! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6-syzkaller-01425-gcb6b2e11a42d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 RIP: 0010:usercopy_abort+0xbd/0xbf mm/usercopy.c:102 Code: e8 ee ad ba f7 49 89 d9 4d 89 e8 4c 89 e1 41 56 48 89 ee 48 c7 c7 20 2b 5b 8a ff 74 24 08 41 57 48 8b 54 24 20 e8 7a 17 fe ff <0f> 0b e8 c2 ad ba f7 e8 7d fb 08 f8 48 8b 0c 24 49 89 d8 44 89 ea RSP: 0000:ffffc90000067a48 EFLAGS: 00010286 RAX: 000000000000006b RBX: ffffffff8b5b6ea0 RCX: 0000000000000000 RDX: ffff8881401c0000 RSI: ffffffff8166195c RDI: fffff5200000cf3b RBP: ffffffff8a5b2a60 R08: 000000000000006b R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000000 R12: ffffffff8bf2a925 R13: ffffffff8a5b29a0 R14: 0000000000000014 R15: ffffffff8a5b2960 FS: 0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000c48e000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __check_heap_object+0xdd/0x110 mm/slub.c:4761 check_heap_object mm/usercopy.c:196 [inline] __check_object_size mm/usercopy.c:251 [inline] __check_object_size+0x1da/0x5a0 mm/usercopy.c:213 check_object_size include/linux/thread_info.h:199 [inline] check_copy_size include/linux/thread_info.h:235 [inline] copy_from_iter include/linux/uio.h:186 [inline] copy_from_iter_full include/linux/uio.h:194 [inline] memcpy_from_msg include/linux/skbuff.h:3977 [inline] qrtr_sendmsg+0x65f/0x970 net/qrtr/af_qrtr.c:965 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745 say_hello+0xf6/0x170 net/qrtr/ns.c:325 qrtr_ns_init+0x220/0x2b0 net/qrtr/ns.c:804 qrtr_proto_init+0x59/0x95 net/qrtr/af_qrtr.c:1296 do_one_initcall+0x141/0x790 init/main.c:1306 do_initcall_level init/main.c:1379 [inline] do_initcalls init/main.c:1395 [inline] do_basic_setup init/main.c:1414 [inline] kernel_init_freeable+0x6f9/0x782 init/main.c:1634 kernel_init+0x1e/0x1d0 init/main.c:1522 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308 </TASK> Fixes: bf9f1baa279f ("net: add dedicated kmem_cache for typical/small skb->head") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Link: https://lore.kernel.org/linux-next/CA+G9fYs-i-c2KTSA7Ai4ES_ZESY1ZnM=Zuo8P1jN00oed6KHMA@mail.gmail.com Link: https://lore.kernel.org/r/20230208142508.3278406-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-09Merge tag 'net-6.2-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can and ipsec subtrees. Current release - regressions: - sched: fix off by one in htb_activate_prios() - eth: mana: fix accessing freed irq affinity_hint - eth: ice: fix out-of-bounds KASAN warning in virtchnl Current release - new code bugs: - eth: mtk_eth_soc: enable special tag when any MAC uses DSA Previous releases - always broken: - core: fix sk->sk_txrehash default - neigh: make sure used and confirmed times are valid - mptcp: be careful on subflow status propagation on errors - xfrm: prevent potential spectre v1 gadget in xfrm_xlate32_attr() - phylink: move phy_device_free() to correctly release phy device - eth: mlx5: - fix crash unsetting rx-vlan-filter in switchdev mode - fix hang on firmware reset - serialize module cleanup with reload and remove" * tag 'net-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) selftests: forwarding: lib: quote the sysctl values net: mscc: ocelot: fix all IPv6 getting trapped to CPU when PTP timestamping is used rds: rds_rm_zerocopy_callback() use list_first_entry() net: txgbe: Update support email address selftests: Fix failing VXLAN VNI filtering test selftests: mptcp: stop tests earlier selftests: mptcp: allow more slack for slow test-case mptcp: be careful on subflow status propagation on errors mptcp: fix locking for in-kernel listener creation mptcp: fix locking for setsockopt corner-case mptcp: do not wait for bare sockets' timeout net: ethernet: mtk_eth_soc: fix DSA TX tag hwaccel for switch port 0 nfp: ethtool: fix the bug of setting unsupported port speed txhash: fix sk->sk_txrehash default net: ethernet: mtk_eth_soc: fix wrong parameters order in __xdp_rxq_info_reg() net: ethernet: mtk_eth_soc: enable special tag when any MAC uses DSA net: sched: sch: Fix off by one in htb_activate_prios() igc: Add ndo_tx_timeout support net: mana: Fix accessing freed irq affinity_hint hv_netvsc: Allocate memory in netvsc_dma_map() with GFP_ATOMIC ...
2023-02-09Merge tag 'for-linus-2023020901' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - fix potential infinite loop with a badly crafted HID device (Xin Zhao) - fix regression from 6.1 in USB logitech devices potentially making their mouse wheel not working (Bastien Nocera) - clean up in AMD sensors, which fixes a long time resume bug (Mario Limonciello) - few device small fixes and quirks * tag 'for-linus-2023020901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: Ignore battery for ELAN touchscreen 29DF on HP HID: amd_sfh: if no sensors are enabled, clean up HID: logitech: Disable hi-res scrolling on USB HID: core: Fix deadloop in hid_apply_multiplier. HID: Ignore battery for Elan touchscreen on Asus TP420IA HID: elecom: add support for TrackBall 056E:011C