summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-03libbpf: Factor out low-level BPF program loading helperAndrii Nakryiko
Refactor low-level API for BPF program loading to not rely on public API types. This allows painless extension without constant efforts to cleverly not break backwards compatibility. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-12-andrii@kernel.org
2020-12-03bpf: Allow to specify kernel module BTFs when attaching BPF programsAndrii Nakryiko
Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-11-andrii@kernel.org
2020-12-03bpf: Remove hard-coded btf_vmlinux assumption from BPF verifierAndrii Nakryiko
Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows to gradually add support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames btf_id() function to btf_obj_id() to minimize naming clash with using btf_id to denote BTF *type* ID, rather than BTF *object*'s ID. Also, altough btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-10-andrii@kernel.org
2020-12-03selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTFAndrii Nakryiko
Add a self-tests validating libbpf is able to perform CO-RE relocations against the type defined in kernel module BTF. if bpf_testmod.o is not supported by the kernel (e.g., due to version mismatch), skip tests, instead of failing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-9-andrii@kernel.org
2020-12-03selftests/bpf: Add support for marking sub-tests as skippedAndrii Nakryiko
Previously skipped sub-tests would be counted as passing with ":OK" appened in the log. Change that to be accounted as ":SKIP". Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-8-andrii@kernel.org
2020-12-03selftests/bpf: Add bpf_testmod kernel module for testingAndrii Nakryiko
Add bpf_testmod module, which is conceptually out-of-tree module and provides ways for selftests/bpf to test various kernel module-related functionality: raw tracepoint, fentry/fexit/fmod_ret, etc. This module will be auto-loaded by test_progs test runner and expected by some of selftests to be present and loaded. Pahole currently isn't able to generate BTF for static functions in kernel modules, so make sure traced function is global. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201203204634.1325171-7-andrii@kernel.org
2020-12-03libbpf: Add kernel module BTF support for CO-RE relocationsAndrii Nakryiko
Teach libbpf to search for candidate types for CO-RE relocations across kernel modules BTFs, in addition to vmlinux BTF. If at least one candidate type is found in vmlinux BTF, kernel module BTFs are not iterated. If vmlinux BTF has no matching candidates, then find all kernel module BTFs and search for all matching candidates across all of them. Kernel's support for module BTFs are inferred from the support for BTF name pointer in BPF UAPI. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-6-andrii@kernel.org
2020-12-03libbpf: Refactor CO-RE relocs to not assume a single BTF objectAndrii Nakryiko
Refactor CO-RE relocation candidate search to not expect a single BTF, rather return all candidate types with their corresponding BTF objects. This will allow to extend CO-RE relocations to accommodate kernel module BTFs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201203204634.1325171-5-andrii@kernel.org
2020-12-03libbpf: Add internal helper to load BTF data by FDAndrii Nakryiko
Add a btf_get_from_fd() helper, which constructs struct btf from in-kernel BTF data by FD. This is used for loading module BTFs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-4-andrii@kernel.org
2020-12-03bpf: Keep module's btf_data_size intact after loadAndrii Nakryiko
Having real btf_data_size stored in struct module is benefitial to quickly determine which kernel modules have associated BTF object and which don't. There is no harm in keeping this info, as opposed to keeping invalid pointer. Fixes: 607c543f939d ("bpf: Sanitize BTF data pointer after module is loaded") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-3-andrii@kernel.org
2020-12-03bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()Andrii Nakryiko
__module_address() needs to be called with preemption disabled or with module_mutex taken. preempt_disable() is enough for read-only uses, which is what this fix does. Also, module_put() does internal check for NULL, so drop it as well. Fixes: a38d1107f937 ("bpf: support raw tracepoints in modules") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201203204634.1325171-2-andrii@kernel.org
2020-12-03Merge branch 'Add support to set window_clamp from bpf setsockops'Alexei Starovoitov
Prankur gupta says: ==================== This patch contains support to set tcp window_field field from bpf setsockops. v2: Used TCP_WINDOW_CLAMP setsockopt logic for bpf_setsockopt (review comment addressed) v3: Created a common function for duplicated code (review comment addressed) v4: Removing logic to pass struct sock and struct tcp_sock together (review comment addressed) ==================== Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-12-03selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMPPrankur gupta
Adding selftests for new added functionality to set TCP_WINDOW_CLAMP from bpf setsockopt. Signed-off-by: Prankur gupta <prankgup@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202213152.435886-3-prankgup@fb.com
2020-12-03bpf: Adds support for setting window clampPrankur gupta
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_WINDOW_CLAMP, which sets the maximum receiver window size. It will be useful for limiting receiver window based on RTT. Signed-off-by: Prankur gupta <prankgup@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202213152.435886-2-prankgup@fb.com
2020-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03Merge tag 'net-5.10-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc7, including fixes from bpf, netfilter, wireless drivers, wireless mesh and can. Current release - regressions: - mt76: usb: fix crash on device removal Current release - always broken: - xsk: Fix umem cleanup from wrong context in socket destruct Previous release - regressions: - net: ip6_gre: set dev->hard_header_len when using header_ops - ipv4: Fix TOS mask in inet_rtm_getroute() - net, xsk: Avoid taking multiple skbuff references Previous release - always broken: - net/x25: prevent a couple of overflows - netfilter: ipset: prevent uninit-value in hash_ip6_add - geneve: pull IP header before ECN decapsulation - mpls: ensure LSE is pullable in TC and openvswitch paths - vxlan: respect needed_headroom of lower device - batman-adv: Consider fragmentation for needed packet headroom - can: drivers: don't count arbitration loss as an error - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal - inet_ecn: Fix endianness of checksum update when setting ECT(1) - ibmvnic: fix various corner cases around reset handling - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering - net/mlx5: Enforce HW TX csum offload with kTLS" * tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled net/mlx5: Fix wrong address reclaim when command interface is down net/sched: act_mpls: ensure LSE is pullable before reading it net: openvswitch: ensure LSE is pullable before reading it net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl net: mvpp2: Fix error return code in mvpp2_open() chelsio/chtls: fix a double free in chtls_setkey() rtw88: debug: Fix uninitialized memory in debugfs code vxlan: fix error return code in __vxlan_dev_create() net: pasemi: fix error return code in pasemi_mac_open() cxgb3: fix error return code in t3_sge_alloc_qset() net/x25: prevent a couple of overflows dpaa_eth: copy timestamp fields to new skb in A-050385 workaround net: ip6_gre: set dev->hard_header_len when using header_ops mt76: usb: fix crash on device removal iwlwifi: pcie: add some missing entries for AX210 iwlwifi: pcie: invert values of NO_160 device config entries iwlwifi: pcie: add one missing entry for AX210 ...
2020-12-03Merge branch 'mptcp-reject-invalid-mp_join-requests-right-away'Jakub Kicinski
Florian Westphal says: ==================== mptcp: reject invalid mp_join requests right away At the moment MPTCP can detect an invalid join request (invalid token, max number of subflows reached, and so on) right away but cannot reject the connection until the 3WHS has completed. Instead the connection will complete and the subflow is reset afterwards. To send the reset most information is already available, but we don't have good spot where the reset could be sent: 1. The ->init_req callback is too early and also doesn't allow to return an error that could be used to inform the TCP stack that the SYN should be dropped. 2. The ->route_req callback lacks the skb needed to send a reset. 3. The ->send_synack callback is the best fit from the available hooks, but its called after the request socket has been inserted into the queue already. This means we'd have to remove it again right away. From a technical point of view, the second hook would be best: 1. Its before insertion into listener queue. 2. If it returns NULL TCP will drop the packet for us. Problem is that we'd have to pass the skb to the function just for MPTCP. Paolo suggested to merge init_req and route_req callbacks instead: This makes all info available to MPTCP -- a return value of NULL drops the packet and MPTCP can send the reset if needed. Because 'route_req' has a 'const struct sock *', this means either removal of const qualifier, or a bit of code churn to pass 'const' in security land. This does the latter; I did not find any spots that need write access to struct sock. To recap, the two alternatives are: 1. Solve it entirely in MPTCP: use the ->send_synack callback to unlink the request socket from the listener & drop it. 2. Avoid 'security' churn by removing the const qualifier. ==================== Link: https://lore.kernel.org/r/20201130153631.21872-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03mptcp: emit tcp reset when a join request failsFlorian Westphal
RFC 8684 says: If the token is unknown or the host wants to refuse subflow establishment (for example, due to a limit on the number of subflows it will permit), the receiver will send back a reset (RST) signal, analogous to an unknown port in TCP, containing an MP_TCPRST option (Section 3.6) with an "MPTCP specific error" reason code. mptcp-next doesn't support MP_TCPRST yet, this can be added in another change. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03tcp: merge 'init_req' and 'route_req' functionsFlorian Westphal
The Multipath-TCP standard (RFC 8684) says that an MPTCP host should send a TCP reset if the token in a MP_JOIN request is unknown. At this time we don't do this, the 3whs completes and the 'new subflow' is reset afterwards. There are two ways to allow MPTCP to send the reset. 1. override 'send_synack' callback and emit the rst from there. The drawback is that the request socket gets inserted into the listeners queue just to get removed again right away. 2. Send the reset from the 'route_req' function instead. This avoids the 'add&remove request socket', but route_req lacks the skb that is required to send the TCP reset. Instead of just adding the skb to that function for MPTCP sake alone, Paolo suggested to merge init_req and route_req functions. This saves one indirection from syn processing path and provides the skb to the merged function at the same time. 'send reset on unknown mptcp join token' is added in next patch. Suggested-by: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03security: add const qualifier to struct sock in various placesFlorian Westphal
A followup change to tcp_request_sock_op would have to drop the 'const' qualifier from the 'route_req' function as the 'security_inet_conn_request' call is moved there - and that function expects a 'struct sock *'. However, it turns out its also possible to add a const qualifier to security_inet_conn_request instead. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03samples/bpf: Fix spelling mistake "recieving" -> "receiving"Colin Ian King
There is a spelling mistake in an error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201203114452.1060017-1-colin.king@canonical.com
2020-12-03bpf: Fix cold build of test_progs-no_alu32Brendan Jackman
This object lives inside the trunner output dir, i.e. tools/testing/selftests/bpf/no_alu32/btf_data.o At some point it gets copied into the parent directory during another part of the build, but that doesn't happen when building test_progs-no_alu32 from clean. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20201203120850.859170-1-jackmanb@google.com
2020-12-03libbpf: Cap retries in sys_bpf_prog_loadStanislav Fomichev
I've seen a situation, where a process that's under pprof constantly generates SIGPROF which prevents program loading indefinitely. The right thing to do probably is to disable signals in the upper layers while loading, but it still would be nice to get some error from libbpf instead of an endless loop. Let's add some small retry limit to the program loading: try loading the program 5 (arbitrary) times and give up. v2: * 10 -> 5 retires (Andrii Nakryiko) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201202231332.3923644-1-sdf@google.com
2020-12-03Merge tag 's390-5.10-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: "One commit is fixing lockdep irq state tracing which broke with -rc6. The other one fixes logical vs physical CPU address mixup in our PCI code. Summary: - fix lockdep irq state tracing - fix logical vs physical CPU address confusion in PCI code" * tag 's390-5.10-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: fix irq state tracing s390/pci: fix CPU address in MSI for directed IRQ
2020-12-03libbpf: Sanitise map names before pinningToke Høiland-Jørgensen
When we added sanitising of map names before loading programs to libbpf, we still allowed periods in the name. While the kernel will accept these for the map names themselves, they are not allowed in file names when pinning maps. This means that bpf_object__pin_maps() will fail if called on an object that contains internal maps (such as sections .rodata). Fix this by replacing periods with underscores when constructing map pin paths. This only affects the paths generated by libbpf when bpf_object__pin_maps() is called with a path argument. Any pin paths set by bpf_map__set_pin_path() are unaffected, and it will still be up to the caller to avoid invalid characters in those. Fixes: 113e6b7e15e2 ("libbpf: Sanitise internal map names so they are not rejected by the kernel") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201203093306.107676-1-toke@redhat.com
2020-12-03Merge tag '9p-for-5.10-rc7' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p fixes from Dominique Martinet: "Restore splice functionality for 9p" * tag '9p-for-5.10-rc7' of git://github.com/martinetd/linux: fs: 9p: add generic splice_write file operation fs: 9p: add generic splice_read file operations
2020-12-03libbpf: Fail early when loading programs with unspecified typeAndrei Matei
Before this patch, a program with unspecified type (BPF_PROG_TYPE_UNSPEC) would be passed to the BPF syscall, only to have the kernel reject it with an opaque invalid argument error. This patch makes libbpf reject such programs with a nicer error message - in particular libbpf now tries to diagnose bad ELF section names at both open time and load time. Signed-off-by: Andrei Matei <andreimatei1@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201203043410.59699-1-andreimatei1@gmail.com
2020-12-03Merge branch 'Fixes for ima selftest'Andrii Nakryiko
KP Singh says: ==================== From: KP Singh <kpsingh@google.com> # v3 -> v4 * Fix typos. * Update commit message for the indentation patch. * Added Andrii's acks. # v2 -> v3 * Added missing tags. * Indentation fixes + some other fixes suggested by Andrii. * Re-indent file to tabs. The selftest for the bpf_ima_inode_hash helper uses a shell script to setup the system for ima. While this worked without an issue on recent desktop distros, it failed on environments with stripped out shells like busybox which is also used by the bpf CI. This series fixes the assumptions made on the availablity of certain command line switches and the expectation that securityfs being mounted by default. It also adds the missing kernel config dependencies in tools/testing/selftests/bpf and, lastly, changes the indentation of ima_setup.sh to use tabs. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2020-12-03selftests/bpf: Indent ima_setup.sh with tabs.KP Singh
The file was formatted with spaces instead of tabs and went unnoticed as checkpatch.pl did not complain (probably because this is a shell script). Re-indent it with tabs to be consistent with other scripts. Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201203191437.666737-5-kpsingh@chromium.org
2020-12-03selftests/bpf: Add config dependency on BLK_DEV_LOOPKP Singh
The ima selftest restricts its scope to a test filesystem image mounted on a loop device and prevents permanent ima policy changes for the whole system. Fixes: 34b82d3ac105 ("bpf: Add a selftest for bpf_ima_inode_hash") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201203191437.666737-4-kpsingh@chromium.org
2020-12-03selftests/bpf: Ensure securityfs mount before writing ima policyKP Singh
SecurityFS may not be mounted even if it is enabled in the kernel config. So, check if the mount exists in /proc/mounts by parsing the file and, if not, mount it on /sys/kernel/security. Fixes: 34b82d3ac105 ("bpf: Add a selftest for bpf_ima_inode_hash") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201203191437.666737-3-kpsingh@chromium.org
2020-12-03selftests/bpf: Update ima_setup.sh for busyboxKP Singh
losetup on busybox does not output the name of loop device on using -f with --show. It also doesn't support -j to find the loop devices for a given backing file. losetup is updated to use "-a" which is available on busybox. blkid does not support options (-s and -o) to only display the uuid, so parse the output instead. Not all environments have mkfs.ext4, the test requires a loop device with a backing image file which could formatted with any filesystem. Update to using mkfs.ext2 which is available on busybox. Fixes: 34b82d3ac105 ("bpf: Add a selftest for bpf_ima_inode_hash") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: KP Singh <kpsingh@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201203191437.666737-2-kpsingh@chromium.org
2020-12-03Merge branch 'mlx5-fixes-2020-12-01'Jakub Kicinski
Saeed Mahameed says: ==================== mlx5 fixes 2020-12-01 This series introduces some fixes to mlx5 driver. ==================== Link: https://lore.kernel.org/r/20201203043946.235385-1-saeedm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steeringYevgeny Kliteynik
STEs format for Connect-X5 and Connect-X6DX different. Currently, on Connext-X6DX the SW steering would break at some point when building STEs w/o giving a proper error message. Fix this by checking the STE format of the current device when initializing domain: add mlx5_ifc definitions for Connect-X6DX SW steering, read FW capability to get the current format version, and check this version when domain is being created. Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net/mlx5e: kTLS, Enforce HW TX csum offload with kTLSTariq Toukan
Checksum calculation cannot be done in SW for TX kTLS HW offloaded packets. Offload it to the device, disregard the declared state of the TX csum offload feature. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Boris Pismenny <borisp@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net: mlx5e: fix fs_tcp.c build when IPV6 is not enabledRandy Dunlap
Fix build when CONFIG_IPV6 is not enabled by making a function be built conditionally. Fixes these build errors and warnings: ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c: In function 'accel_fs_tcp_set_ipv6_flow': ../include/net/sock.h:380:34: error: 'struct sock_common' has no member named 'skc_v6_daddr'; did you mean 'skc_daddr'? 380 | #define sk_v6_daddr __sk_common.skc_v6_daddr | ^~~~~~~~~~~~ ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:55:14: note: in expansion of macro 'sk_v6_daddr' 55 | &sk->sk_v6_daddr, 16); | ^~~~~~~~~~~ At top level: ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c:47:13: warning: 'accel_fs_tcp_set_ipv6_flow' defined but not used [-Wunused-function] 47 | static void accel_fs_tcp_set_ipv6_flow(struct mlx5_flow_spec *spec, struct sock *sk) Fixes: 5229a96e59ec ("net/mlx5e: Accel, Expose flow steering API for rules add/del") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net/mlx5: Fix wrong address reclaim when command interface is downEran Ben Elisha
When command interface is down, driver to reclaim all 4K page chucks that were hold by the Firmeware. Fix a bug for 64K page size systems, where driver repeatedly released only the first chunk of the page. Define helper function to fill 4K chunks for a given Firmware pages. Iterate over all unreleased Firmware pages and call the hepler per each. Fixes: 5adff6a08862 ("net/mlx5: Fix incorrect page count when in internal error") Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net/sched: act_mpls: ensure LSE is pullable before reading itDavide Caratti
when 'act_mpls' is used to mangle the LSE, the current value is read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is contained in the skb "linear" area. Found by code inspection. v2: - use MPLS_HLEN instead of sizeof(new_lse), thanks to Jakub Kicinski Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/3243506cba43d14858f3bd21ee0994160e44d64a.1606987058.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net: openvswitch: ensure LSE is pullable before reading itDavide Caratti
when openvswitch is configured to mangle the LSE, the current value is read from the packet dereferencing 4 bytes at mpls_hdr(): ensure that the label is contained in the skb "linear" area. Found by code inspection. Fixes: d27cf5c59a12 ("net: core: add MPLS update core helper and use in OvS") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/aa099f245d93218b84b5c056b67b6058ccf81a66.1606987185.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net: skbuff: ensure LSE is pullable before decrementing the MPLS ttlDavide Caratti
skb_mpls_dec_ttl() reads the LSE without ensuring that it is contained in the skb "linear" area. Fix this calling pskb_may_pull() before reading the current ttl. Found by code inspection. Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/53659f28be8bc336c113b5254dc637cc76bbae91.1606987074.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03Merge tag 'wireless-drivers-2020-12-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.10 Second, and most likely final, set of fixes for v5.10. Small fixes and PCI id addtions. iwlwifi * PCI id additions mt76 * fix a kernel crash during device removal rtw88 * fix uninitialized memory in debugfs code * tag 'wireless-drivers-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: rtw88: debug: Fix uninitialized memory in debugfs code mt76: usb: fix crash on device removal iwlwifi: pcie: add some missing entries for AX210 iwlwifi: pcie: invert values of NO_160 device config entries iwlwifi: pcie: add one missing entry for AX210 iwlwifi: update MAINTAINERS entry ==================== Link: https://lore.kernel.org/r/20201203183408.EE88AC43461@smtp.codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03net: mvpp2: Fix error return code in mvpp2_open()Wang Hai
Fix to return negative error code -ENOENT from invalid configuration error handling case instead of 0, as done elsewhere in this function. Fixes: 4bb043262878 ("net: mvpp2: phylink support") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20201203141806.37966-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03chelsio/chtls: fix a double free in chtls_setkey()Dan Carpenter
The "skb" is freed by the transmit code in cxgb4_ofld_send() and we shouldn't use it again. But in the current code, if we hit an error later on in the function then the clean up code will call kfree_skb(skb) and so it causes a double free. Set the "skb" to NULL and that makes the kfree_skb() a no-op. Fixes: d25f2f71f653 ("crypto: chtls - Program the TLS session Key") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8ilb6PtBRLWiSHp@mwanda Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03Merge branch 'libbpf: add support for privileged/unprivileged control ↵Alexei Starovoitov
separation' Mariusz Dudek says: ==================== From: Mariusz Dudek <mariuszx.dudek@intel.com> This patch series adds support for separation of eBPF program load and xsk socket creation. In for example a Kubernetes environment you can have an AF_XDP CNI or daemonset that is responsible for launching pods that execute an application using AF_XDP sockets. It is desirable that the pod runs with as low privileges as possible, CAP_NET_RAW in this case, and that all operations that require privileges are contained in the CNI or daemonset. In this case, you have to be able separate ePBF program load from xsk socket creation. Currently, this will not work with the xsk_socket__create APIs because you need to have CAP_NET_ADMIN privileges to load eBPF program and CAP_SYS_ADMIN privileges to create update xsk_bpf_maps. To be exact xsk_set_bpf_maps does not need those privileges but it takes the prog_fd and xsks_map_fd and those are known only to process that was loading eBPF program. The api bpf_prog_get_fd_by_id that looks up the fd of the prog using an prog_id and bpf_map_get_fd_by_id that looks for xsks_map_fd usinb map_id both requires CAP_SYS_ADMIN. With this patch, the pod can be run with CAP_NET_RAW capability only. In case your umem is larger or equal process limit for MEMLOCK you need either increase the limit or CAP_IPC_LOCK capability. Without this patch in case of insufficient rights ENOPERM is returned by xsk_socket__create. To resolve this privileges issue two new APIs are introduced: - xsk_setup_xdp_prog - loads the built in XDP program. It can also return xsks_map_fd which is needed by unprivileged process to update xsks_map with AF_XDP socket "fd" - xsk_sokcet__update_xskmap - inserts an AF_XDP socket into an xskmap for a particular xsk_socket Usage example: int xsk_setup_xdp_prog(int ifindex, int *xsks_map_fd) int xsk_socket__update_xskmap(struct xsk_socket *xsk, int xsks_map_fd); Inserts AF_XDP socket "fd" into the xskmap. The first patch introduces the new APIs. The second patch provides a new sample applications working as control and modification to existing xdpsock application to work with less privileges. This patch set is based on bpf-next commit 97306be45fbe (Merge branch 'switch to memcg-based memory accounting') Since v6 - rebase on 97306be45fbe to resolve RLIMIT conflicts Since v5 - fixed sample/bpf/xdpsock_user.c to resolve merge conflicts Since v4 - sample/bpf/Makefile issues fixed Since v3: - force_set_map flag removed - leaking of xsk struct fixed - unified function error returning policy implemented Since v2: - new APIs moved itto LIBBPF_0.3.0 section - struct bpf_prog_cfg_opts removed - loading own eBPF program via xsk_setup_xdp_prog functionality removed Since v1: - struct bpf_prog_cfg improved for backward/forward compatibility - API xsk_update_xskmap renamed to xsk_socket__update_xskmap - commit message formatting fixed ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-12-03samples/bpf: Sample application for eBPF load and socket creation splitMariusz Dudek
Introduce a sample program to demonstrate the control and data plane split. For the control plane part a new program called xdpsock_ctrl_proc is introduced. For the data plane part, some code was added to xdpsock_user.c to act as the data plane entity. Application xdpsock_ctrl_proc works as control entity with sudo privileges (CAP_SYS_ADMIN and CAP_NET_ADMIN are sufficient) and the extended xdpsock as data plane entity with CAP_NET_RAW capability only. Usage example: sudo ./samples/bpf/xdpsock_ctrl_proc -i <interface> sudo ./samples/bpf/xdpsock -i <interface> -q <queue_id> -n <interval> -N -l -R Signed-off-by: Mariusz Dudek <mariuszx.dudek@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201203090546.11976-3-mariuszx.dudek@intel.com
2020-12-03libbpf: Separate XDP program load with xsk socket creationMariusz Dudek
Add support for separation of eBPF program load and xsk socket creation. This is needed for use-case when you want to privide as little privileges as possible to the data plane application that will handle xsk socket creation and incoming traffic. With this patch the data entity container can be run with only CAP_NET_RAW capability to fulfill its purpose of creating xsk socket and handling packages. In case your umem is larger or equal process limit for MEMLOCK you need either increase the limit or CAP_IPC_LOCK capability. To resolve privileges issue two APIs are introduced: - xsk_setup_xdp_prog - loads the built in XDP program. It can also return xsks_map_fd which is needed by unprivileged process to update xsks_map with AF_XDP socket "fd" - xsk_socket__update_xskmap - inserts an AF_XDP socket into an xskmap for a particular xsk_socket Signed-off-by: Mariusz Dudek <mariuszx.dudek@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20201203090546.11976-2-mariuszx.dudek@intel.com
2020-12-03tools/resolve_btfids: Fix some error messagesBrendan Jackman
Add missing newlines and fix polarity of strerror argument. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20201203102234.648540-1-jackmanb@google.com
2020-12-03selftests/bpf: Copy file using read/write in local storage testStanislav Fomichev
Splice (copy_file_range) doesn't work on all filesystems. I'm running test kernels on top of my read-only disk image and it uses plan9 under the hood. This prevents test_local_storage from successfully passing. There is really no technical reason to use splice, so lets do old-school read/write to copy file; this should work in all environments. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202174947.3621989-1-sdf@google.com
2020-12-03Merge branch 'bpftool: improve split BTF support'Alexei Starovoitov
Andrii Nakryiko says: ==================== Few follow up improvements to bpftool for split BTF support: - emit "name <anon>" for non-named BTFs in `bpftool btf show` command; - when dumping /sys/kernel/btf/<module> use /sys/kernel/btf/vmlinux as the base BTF, unless base BTF is explicitly specified with -B flag. This patch set also adds btf__base_btf() getter to access base BTF of the struct btf. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-12-03tools/bpftool: Auto-detect split BTFs in common casesAndrii Nakryiko
In case of working with module's split BTF from /sys/kernel/btf/*, auto-substitute /sys/kernel/btf/vmlinux as the base BTF. This makes using bpftool with module BTFs faster and simpler. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202065244.530571-4-andrii@kernel.org