linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2023-07-05	selftests/bpf: Add selftest for check_stack_max_depth bug	Kumar Kartikeya Dwivedi
	Use the bpf_timer_set_callback helper to mark timer_cb as an async callback, and put a direct call to timer_cb in the main subprog. As the check_stack_max_depth happens after the do_check pass, the order does not matter. Without the previous fix, the test passes successfully. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230705144730.235802-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-06-28	Merge tag 'net-next-6.5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking changes from Jakub Kicinski: "WiFi 7 and sendpage changes are the biggest pieces of work for this release. The latter will definitely require fixes but I think that we got it to a reasonable point. Core: - Rework the sendpage & splice implementations Instead of feeding data into sockets page by page extend sendmsg handlers to support taking a reference on the data, controlled by a new flag called MSG_SPLICE_PAGES Rework the handling of unexpected-end-of-file to invoke an additional callback instead of trying to predict what the right combination of MORE/NOTLAST flags is Remove the MSG_SENDPAGE_NOTLAST flag completely - Implement SCM_PIDFD, a new type of CMSG type analogous to SCM_CREDENTIALS, but it contains pidfd instead of plain pid - Enable socket busy polling with CONFIG_RT - Improve reliability and efficiency of reporting for ref_tracker - Auto-generate a user space C library for various Netlink families Protocols: - Allow TCP to shrink the advertised window when necessary, prevent sk_rcvbuf auto-tuning from growing the window all the way up to tcp_rmem[2] - Use per-VMA locking for "page-flipping" TCP receive zerocopy - Prepare TCP for device-to-device data transfers, by making sure that payloads are always attached to skbs as page frags - Make the backoff time for the first N TCP SYN retransmissions linear. Exponential backoff is unnecessarily conservative - Create a new MPTCP getsockopt to retrieve all info (MPTCP_FULL_INFO) - Avoid waking up applications using TLS sockets until we have a full record - Allow using kernel memory for protocol ioctl callbacks, paving the way to issuing ioctls over io_uring - Add nolocalbypass option to VxLAN, forcing packets to be fully encapsulated even if they are destined for a local IP address - Make TCPv4 use consistent hash in TIME_WAIT and SYN_RECV. Ensure in-kernel ECMP implementation (e.g. Open vSwitch) select the same link for all packets. Support L4 symmetric hashing in Open vSwitch - PPPoE: make number of hash bits configurable - Allow DNS to be overwritten by DHCPACK in the in-kernel DHCP client (ipconfig) - Add layer 2 miss indication and filtering, allowing higher layers (e.g. ACL filters) to make forwarding decisions based on whether packet matched forwarding state in lower devices (bridge) - Support matching on Connectivity Fault Management (CFM) packets - Hide the "link becomes ready" IPv6 messages by demoting their printk level to debug - HSR: don't enable promiscuous mode if device offloads the proto - Support active scanning in IEEE 802.15.4 - Continue work on Multi-Link Operation for WiFi 7 BPF: - Add precision propagation for subprogs and callbacks. This allows maintaining verification efficiency when subprograms are used, or in fact passing the verifier at all for complex programs, especially those using open-coded iterators - Improve BPF's {g,s}setsockopt() length handling. Previously BPF assumed the length is always equal to the amount of written data. But some protos allow passing a NULL buffer to discover what the output buffer should be, without writing anything - Accept dynptr memory as memory arguments passed to helpers - Add routing table ID to bpf_fib_lookup BPF helper - Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands - Drop bpf_capable() check in BPF_MAP_FREEZE command (used to mark maps as read-only) - Show target_{obj,btf}_id in tracing link fdinfo - Addition of several new kfuncs (most of the names are self-explanatory): - Add a set of new dynptr kfuncs: bpf_dynptr_adjust(), bpf_dynptr_is_null(), bpf_dynptr_is_rdonly(), bpf_dynptr_size() and bpf_dynptr_clone(). - bpf_task_under_cgroup() - bpf_sock_destroy() - force closing sockets - bpf_cpumask_first_and(), rework bpf_cpumask_any() kfuncs Netfilter: - Relax set/map validation checks in nf_tables. Allow checking presence of an entry in a map without using the value - Increase ip_vs_conn_tab_bits range for 64BIT builds - Allow updating size of a set - Improve NAT tuple selection when connection is closing Driver API: - Integrate netdev with LED subsystem, to allow configuring HW "offloaded" blinking of LEDs based on link state and activity (i.e. packets coming in and out) - Support configuring rate selection pins of SFP modules - Factor Clause 73 auto-negotiation code out of the drivers, provide common helper routines - Add more fool-proof helpers for managing lifetime of MDIO devices associated with the PCS layer - Allow drivers to report advanced statistics related to Time Aware scheduler offload (taprio) - Allow opting out of VF statistics in link dump, to allow more VFs to fit into the message - Split devlink instance and devlink port operations New hardware / drivers: - Ethernet: - Synopsys EMAC4 IP support (stmmac) - Marvell 88E6361 8 port (5x1GE + 3x2.5GE) switches - Marvell 88E6250 7 port switches - Microchip LAN8650/1 Rev.B0 PHYs - MediaTek MT7981/MT7988 built-in 1GE PHY driver - WiFi: - Realtek RTL8192FU, 2.4 GHz, b/g/n mode, 2T2R, 300 Mbps - Realtek RTL8723DS (SDIO variant) - Realtek RTL8851BE - CAN: - Fintek F81604 Drivers: - Ethernet NICs: - Intel (100G, ice): - support dynamic interrupt allocation - use meta data match instead of VF MAC addr on slow-path - nVidia/Mellanox: - extend link aggregation to handle 4, rather than just 2 ports - spawn sub-functions without any features by default - OcteonTX2: - support HTB (Tx scheduling/QoS) offload - make RSS hash generation configurable - support selecting Rx queue using TC filters - Wangxun (ngbe/txgbe): - add basic Tx/Rx packet offloads - add phylink support (SFP/PCS control) - Freescale/NXP (enetc): - report TAPRIO packet statistics - Solarflare/AMD: - support matching on IP ToS and UDP source port of outer header - VxLAN and GENEVE tunnel encapsulation over IPv4 or IPv6 - add devlink dev info support for EF10 - Virtual NICs: - Microsoft vNIC: - size the Rx indirection table based on requested configuration - support VLAN tagging - Amazon vNIC: - try to reuse Rx buffers if not fully consumed, useful for ARM servers running with 16kB pages - Google vNIC: - support TCP segmentation of >64kB frames - Ethernet embedded switches: - Marvell (mv88e6xxx): - enable USXGMII (88E6191X) - Microchip: - lan966x: add support for Egress Stage 0 ACL engine - lan966x: support mapping packet priority to internal switch priority (based on PCP or DSCP) - Ethernet PHYs: - Broadcom PHYs: - support for Wake-on-LAN for BCM54210E/B50212E - report LPI counter - Microsemi PHYs: support RGMII delay configuration (VSC85xx) - Micrel PHYs: receive timestamp in the frame (LAN8841) - Realtek PHYs: support optional external PHY clock - Altera TSE PCS: merge the driver into Lynx PCS which it is a variant of - CAN: Kvaser PCIEcan: - support packet timestamping - WiFi: - Intel (iwlwifi): - major update for new firmware and Multi-Link Operation (MLO) - configuration rework to drop test devices and split the different families - support for segmented PNVM images and power tables - new vendor entries for PPAG (platform antenna gain) feature - Qualcomm 802.11ax (ath11k): - Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID Advertisement (EMA) support in AP mode - support factory test mode - RealTek (rtw89): - add RSSI based antenna diversity - support U-NII-4 channels on 5 GHz band - RealTek (rtl8xxxu): - AP mode support for 8188f - support USB RX aggregation for the newer chips" tag 'net-next-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1602 commits) net: scm: introduce and use scm_recv_unix helper af_unix: Skip SCM_PIDFD if scm->pid is NULL. net: lan743x: Simplify comparison netlink: Add __sock_i_ino() for __netlink_diag_dump(). net: dsa: avoid suspicious RCU usage for synced VLAN-aware MAC addresses Revert "af_unix: Call scm_recv() only after scm_set_cred()." phylink: ReST-ify the phylink_pcs_neg_mode() kdoc libceph: Partially revert changes to support MSG_SPLICE_PAGES net: phy: mscc: fix packet loss due to RGMII delays net: mana: use vmalloc_array and vcalloc net: enetc: use vmalloc_array and vcalloc ionic: use vmalloc_array and vcalloc pds_core: use vmalloc_array and vcalloc gve: use vmalloc_array and vcalloc octeon_ep: use vmalloc_array and vcalloc net: usb: qmi_wwan: add u-blox 0x1312 composition perf trace: fix MSG_SPLICE_PAGES build error ipvlan: Fix return value of ipvlan_queue_xmit() netfilter: nf_tables: fix underflow in chain reference counter netfilter: nf_tables: unbind non-anonymous set if rule construction fails ...
2023-06-28	Merge tag 'v6.5-rc1-modules-next' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The changes queued up for modules are pretty tame, mostly code removal of moving of code. Only two minor functional changes are made, the only one which stands out is Sebastian Andrzej Siewior's simplification of module reference counting by removing preempt_disable() and that has been tested on linux-next for well over a month without no regressions. I'm now, I guess, also a kitchen sink for some kallsyms changes" [ There was a mis-communication about the concurrent module load changes that I had expected to come through Luis despite me authoring the patch. So some of the module updates were left hanging in the email ether, and I just committed them separately. It's my bad - I should have made it more clear that I expected my own patches to come through the module tree too. Now they missed linux-next, but hopefully that won't cause any issues - Linus ] * tag 'v6.5-rc1-modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: make kallsyms_show_value() as generic function kallsyms: move kallsyms_show_value() out of kallsyms.c kallsyms: remove unsed API lookup_symbol_attrs kallsyms: remove unused arch_get_kallsym() helper module: Remove preempt_disable() from module reference counting.
2023-06-24	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-06-23 We've added 49 non-merge commits during the last 24 day(s) which contain a total of 70 files changed, 1935 insertions(+), 442 deletions(-). The main changes are: 1) Extend bpf_fib_lookup helper to allow passing the route table ID, from Louis DeLosSantos. 2) Fix regsafe() in verifier to call check_ids() for scalar registers, from Eduard Zingerman. 3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and() and a rework of bpf_cpumask_any() kfuncs. Additionally, add selftests, from David Vernet. 4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings, from Gilad Sever. 5) Change bpf_link_put() to use workqueue unconditionally to fix it under PREEMPT_RT, from Sebastian Andrzej Siewior. 6) Follow-ups to address issues in the bpf_refcount shared ownership implementation, from Dave Marchevsky. 7) A few general refactorings to BPF map and program creation permissions checks which were part of the BPF token series, from Andrii Nakryiko. 8) Various fixes for benchmark framework and add a new benchmark for BPF memory allocator to BPF selftests, from Hou Tao. 9) Documentation improvements around iterators and trusted pointers, from Anton Protopopov. 10) Small cleanup in verifier to improve allocated object check, from Daniel T. Lee. 11) Improve performance of bpf_xdp_pointer() by avoiding access to shared_info when XDP packet does not have frags, from Jesper Dangaard Brouer. 12) Silence a harmless syzbot-reported warning in btf_type_id_size(), from Yonghong Song. 13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper, from Jarkko Sakkinen. 14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS, from Viktor Malik. tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits) bpf, docs: Document existing macros instead of deprecated bpf, docs: BPF Iterator Document selftests/bpf: Fix compilation failure for prog vrf_socket_lookup selftests/bpf: Add vrf_socket_lookup tests bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint bpf: Factor out socket lookup functions for the TC hookpoint. selftests/bpf: Set the default value of consumer_cnt as 0 selftests/bpf: Ensure that next_cpu() returns a valid CPU number selftests/bpf: Output the correct error code for pthread APIs selftests/bpf: Use producer_cnt to allocate local counter array xsk: Remove unused inline function xsk_buff_discard() bpf: Keep BPF_PROG_LOAD permission checks clear of validations bpf: Centralize permissions checks for all BPF map types bpf: Inline map creation logic in map_create() function bpf: Move unprivileged checks into map_create() and bpf_prog_load() bpf: Remove in_atomic() from bpf_link_put(). selftests/bpf: Verify that check_ids() is used for scalars in regsafe() bpf: Verify scalar ids mapping in regsafe() using check_ids() selftests/bpf: Check if mark_chain_precision() follows scalar ids ... ==================== Link: https://lore.kernel.org/r/20230623211256.8409-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. Conflicts: tools/testing/selftests/net/fcnal-test.sh d7a2fc1437f7 ("selftests: net: fcnal-test: check if FIPS mode is enabled") dd017c72dde6 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.") https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/ https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-22	selftests/bpf: Fix compilation failure for prog vrf_socket_lookup	Yonghong Song
	When building the latest kernel/selftest with clang17 compiler: make LLVM=1 -j <== for kernel make -C tools/testing/selftests/bpf LLVM=1 -j <== for selftest I hit the following compilation error: [...] In file included from progs/vrf_socket_lookup.c:3: In file included from /usr/include/linux/ip.h:21: In file included from /usr/include/asm/byteorder.h:5: In file included from /usr/include/linux/byteorder/little_endian.h:13: /usr/include/linux/swab.h:136:8: error: unknown type name '__always_inline' 136 \| static __always_inline unsigned long __swab(const unsigned long y) \| ^ /usr/include/linux/swab.h:171:8: error: unknown type name '__always_inline' 171 \| static __always_inline __u16 __swab16p(const __u16 p) \| ^ /usr/include/linux/swab.h:171:29: error: expected ';' after top level declarator 171 \| static __always_inline __u16 __swab16p(const __u16 p) \| ^ [...] Basically, with header files in my local host which is based on 5.12 kernel, __always_inline is not defined and this caused compilation failure. Since __always_inline is defined in bpf_helpers.h, let us move bpf_helpers.h to an early position which fixed the problem. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230622061921.816772-1-yhs@fb.com
2023-06-21	selftests/bpf: Add vrf_socket_lookup tests	Gilad Sever
	Verify that socket lookup via TC/XDP with all BPF APIs is VRF aware. Signed-off-by: Gilad Sever <gilad9366@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Eyal Birger <eyal.birger@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230621104211.301902-5-gilad9366@gmail.com
2023-06-13	selftests/bpf: Verify that check_ids() is used for scalars in regsafe()	Eduard Zingerman
	Verify that the following example is rejected by verifier: r9 = ... some pointer with range X ... r6 = ... unbound scalar ID=a ... r7 = ... unbound scalar ID=b ... if (r6 > r7) goto +1 r7 = r6 if (r7 > X) goto exit r9 += r6 (u64 )r9 = Y Also add test cases to: - check that check_alu_op() for BPF_MOV instruction does not allocate scalar ID if source register is a constant; - check that unique scalar IDs are ignored when new verifier state is compared to cached verifier state; - check that two different scalar IDs in a verified state can't be mapped to the same scalar ID in current state. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230613153824.3324830-5-eddyz87@gmail.com
2023-06-13	selftests/bpf: Check if mark_chain_precision() follows scalar ids	Eduard Zingerman
	Check __mark_chain_precision() log to verify that scalars with same IDs are marked as precise. Use several scenarios to test that precision marks are propagated through: - registers of scalar type with the same ID within one state; - registers of scalar type with the same ID cross several states; - registers of scalar type with the same ID cross several stack frames; - stack slot of scalar type with the same ID; - multiple scalar IDs are tracked independently. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230613153824.3324830-3-eddyz87@gmail.com
2023-06-13	selftests/bpf: add a test for subprogram extables	Krister Johansen
	In certain situations a program with subprograms may have a NULL extable entry. This should not happen, and when it does, it turns a single trap into multiple. Add a test case for further debugging and to prevent regressions. The test-case contains three essentially identical versions of the same test because just one program may not be sufficient to trigger the oops. This is due to the fact that the items are stored in a binary tree and have identical values so it's possible to sometimes find the ksym with the extable. With 3 copies, this has been reliable on this author's test systems. When triggered out of this test case, the oops looks like this: BUG: kernel NULL pointer dereference, address: 000000000000000c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 1132 Comm: test_progs Tainted: G OE 6.4.0-rc3+ #2 RIP: 0010:cmp_ex_search+0xb/0x30 Code: cc cc cc cc e8 36 cb 03 00 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 55 48 89 e5 48 8b 07 <48> 63 0e 48 01 f1 31 d2 48 39 c8 19 d2 48 39 c8 b8 01 00 00 00 0f RSP: 0018:ffffb30c4291f998 EFLAGS: 00010006 RAX: ffffffffc00b49da RBX: 0000000000000002 RCX: 000000000000000c RDX: 0000000000000002 RSI: 000000000000000c RDI: ffffb30c4291f9e8 RBP: ffffb30c4291f998 R08: ffffffffab1a42d0 R09: 0000000000000001 R10: 0000000000000000 R11: ffffffffab1a42d0 R12: ffffb30c4291f9e8 R13: 000000000000000c R14: 000000000000000c R15: 0000000000000000 FS: 00007fb5d9e044c0(0000) GS:ffff92e95ee00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000c CR3: 000000010c3a2005 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> bsearch+0x41/0x90 ? __pfx_cmp_ex_search+0x10/0x10 ? bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c search_extable+0x3b/0x60 ? bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c search_bpf_extables+0x10d/0x190 ? bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c search_exception_tables+0x5d/0x70 fixup_exception+0x3f/0x5b0 ? look_up_lock_class+0x61/0x110 ? __lock_acquire+0x6b8/0x3560 ? __lock_acquire+0x6b8/0x3560 ? __lock_acquire+0x6b8/0x3560 kernelmode_fixup_or_oops+0x46/0x110 __bad_area_nosemaphore+0x68/0x2b0 ? __lock_acquire+0x6b8/0x3560 bad_area_nosemaphore+0x16/0x20 do_kern_addr_fault+0x81/0xa0 exc_page_fault+0xd6/0x210 asm_exc_page_fault+0x2b/0x30 RIP: 0010:bpf_prog_45a7907e7114d0ff_handle_fexit_ret_subprogs3+0x2a/0x6c Code: f3 0f 1e fa 0f 1f 44 00 00 66 90 55 48 89 e5 f3 0f 1e fa 48 8b 7f 08 49 bb 00 00 00 00 00 80 00 00 4c 39 df 73 04 31 f6 eb 04 <48> 8b 77 00 49 bb 00 00 00 00 00 80 00 00 48 81 c7 7c 00 00 00 4c RSP: 0018:ffffb30c4291fcb8 EFLAGS: 00010282 RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000 RDX: 00000000cddf1af1 RSI: 000000005315a00d RDI: ffffffffffffffea RBP: ffffb30c4291fcb8 R08: ffff92e644bf38a8 R09: 0000000000000000 R10: 0000000000000000 R11: 0000800000000000 R12: ffff92e663652690 R13: 00000000000001c8 R14: 00000000000001c8 R15: 0000000000000003 bpf_trampoline_251255721842_2+0x63/0x1000 bpf_testmod_return_ptr+0x9/0xb0 [bpf_testmod] ? bpf_testmod_test_read+0x43/0x2d0 [bpf_testmod] sysfs_kf_bin_read+0x60/0x90 kernfs_fop_read_iter+0x143/0x250 vfs_read+0x240/0x2a0 ksys_read+0x70/0xe0 __x64_sys_read+0x1f/0x30 do_syscall_64+0x68/0xa0 ? syscall_exit_to_user_mode+0x77/0x1f0 ? do_syscall_64+0x77/0xa0 ? irqentry_exit+0x35/0xa0 ? sysvec_apic_timer_interrupt+0x4d/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fb5da00a392 Code: ac 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 RSP: 002b:00007ffc5b3cab68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 000055bee7b8b100 RCX: 00007fb5da00a392 RDX: 00000000000001c8 RSI: 0000000000000000 RDI: 0000000000000009 RBP: 00007ffc5b3caba0 R08: 0000000000000000 R09: 0000000000000037 R10: 000055bee7b8c2a7 R11: 0000000000000246 R12: 000055bee78f1f60 R13: 00007ffc5b3cae90 R14: 0000000000000000 R15: 0000000000000000 </TASK> Modules linked in: bpf_testmod(OE) nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common intel_uncore_frequency_common ppdev nfit crct10dif_pclmul crc32_pclmul psmouse ghash_clmulni_intel sha512_ssse3 aesni_intel parport_pc crypto_simd cryptd input_leds parport rapl ena i2c_piix4 mac_hid serio_raw ramoops reed_solomon pstore_blk drm pstore_zone efi_pstore autofs4 [last unloaded: bpf_testmod(OE)] CR2: 000000000000000c Though there may be some variation, depending on which suprogram triggers the bug. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/4ebf95ec857cd785b81db69f3e408c039ad8408b.1686616663.git.kjlx@templeofstupid.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-06-12	selftests/bpf: Update bpf_cpumask_any* tests to use bpf_cpumask_any_distribute*	David Vernet
	In a prior patch, we removed the bpf_cpumask_any() and bpf_cpumask_any_and() kfuncs, and replaced them with bpf_cpumask_any_distribute() and bpf_cpumask_any_distribute_and(). The advertised semantics between the two kfuncs were identical, with the former always returning the first CPU, and the latter actually returning any CPU. This patch updates the selftests for these kfuncs to use the new names. Signed-off-by: David Vernet <void@manifault.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230610035053.117605-4-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-06-12	selftests/bpf: Add test for new bpf_cpumask_first_and() kfunc	David Vernet
	A prior patch added a new kfunc called bpf_cpumask_first_and() which wraps cpumask_first_and(). This patch adds a selftest to validate its behavior. Signed-off-by: David Vernet <void@manifault.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230610035053.117605-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-06-08	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. Conflicts: net/sched/sch_taprio.c d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping") dced11ef84fb ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()") net/ipv4/sysctl_net_ipv4.c e209fee4118f ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294") ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear") https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-08	selftests/bpf: Add test cases to assert proper ID tracking on spill	Maxim Mikityanskiy
	The previous commit fixed a verifier bypass by ensuring that ID is not preserved on narrowing spills. Add the test cases to check the problematic patterns. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230607123951.558971-3-maxtram95@gmail.com
2023-06-05	selftests/bpf: Add test for non-NULLable PTR_TO_BTF_IDs	David Vernet
	In a recent patch, we taught the verifier that trusted PTR_TO_BTF_ID can never be NULL. This prevents the verifier from incorrectly failing to load certain programs where it gets confused and thinks a reference isn't dropped because it incorrectly assumes that a branch exists in which a NULL PTR_TO_BTF_ID pointer is never released. This patch adds a testcase that verifies this cannot happen. Signed-off-by: David Vernet <void@manifault.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230602150112.1494194-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-06-05	bpf: Make bpf_refcount_acquire fallible for non-owning refs	Dave Marchevsky
	This patch fixes an incorrect assumption made in the original bpf_refcount series [0], specifically that the BPF program calling bpf_refcount_acquire on some node can always guarantee that the node is alive. In that series, the patch adding failure behavior to rbtree_add and list_push_{front, back} breaks this assumption for non-owning references. Consider the following program: n = bpf_kptr_xchg(&mapval, NULL); /* skip error checking / bpf_spin_lock(&l); if(bpf_rbtree_add(&t, &n->rb, less)) { bpf_refcount_acquire(n); / Failed to add, do something else with the node */ } bpf_spin_unlock(&l); It's incorrect to assume that bpf_refcount_acquire will always succeed in this scenario. bpf_refcount_acquire is being called in a critical section here, but the lock being held is associated with rbtree t, which isn't necessarily the lock associated with the tree that the node is already in. So after bpf_rbtree_add fails to add the node and calls bpf_obj_drop in it, the program has no ownership of the node's lifetime. Therefore the node's refcount can be decr'd to 0 at any time after the failing rbtree_add. If this happens before the refcount_acquire above, the node might be free'd, and regardless refcount_acquire will be incrementing a 0 refcount. Later patches in the series exercise this scenario, resulting in the expected complaint from the kernel (without this patch's changes): refcount_t: addition on 0; use-after-free. WARNING: CPU: 1 PID: 207 at lib/refcount.c:25 refcount_warn_saturate+0xbc/0x110 Modules linked in: bpf_testmod(O) CPU: 1 PID: 207 Comm: test_progs Tainted: G O 6.3.0-rc7-02231-g723de1a718a2-dirty #371 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:refcount_warn_saturate+0xbc/0x110 Code: 6f 64 f6 02 01 e8 84 a3 5c ff 0f 0b eb 9d 80 3d 5e 64 f6 02 00 75 94 48 c7 c7 e0 13 d2 82 c6 05 4e 64 f6 02 01 e8 64 a3 5c ff <0f> 0b e9 7a ff ff ff 80 3d 38 64 f6 02 00 0f 85 6d ff ff ff 48 c7 RSP: 0018:ffff88810b9179b0 EFLAGS: 00010082 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 RDX: 0000000000000202 RSI: 0000000000000008 RDI: ffffffff857c3680 RBP: ffff88810027d3c0 R08: ffffffff8125f2a4 R09: ffff88810b9176e7 R10: ffffed1021722edc R11: 746e756f63666572 R12: ffff88810027d388 R13: ffff88810027d3c0 R14: ffffc900005fe030 R15: ffffc900005fe048 FS: 00007fee0584a700(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005634a96f6c58 CR3: 0000000108ce9002 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> bpf_refcount_acquire_impl+0xb5/0xc0 (rest of output snipped) The patch addresses this by changing bpf_refcount_acquire_impl to use refcount_inc_not_zero instead of refcount_inc and marking bpf_refcount_acquire KF_RET_NULL. For owning references, though, we know the above scenario is not possible and thus that bpf_refcount_acquire will always succeed. Some verifier bookkeeping is added to track "is input owning ref?" for bpf_refcount_acquire calls and return false from is_kfunc_ret_null for bpf_refcount_acquire on owning refs despite it being marked KF_RET_NULL. Existing selftests using bpf_refcount_acquire are modified where necessary to NULL-check its return value. [0]: https://lore.kernel.org/bpf/20230415201811.343116-1-davemarchevsky@fb.com/ Fixes: d2dcc67df910 ("bpf: Migrate bpf_rbtree_add and bpf_list_push_{front,back} to possibly fail") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230602022647.1571784-5-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-06-02	selftests/bpf: Add access_inner_map selftest	Rhys Rustad-Elliott
	Add a selftest that accesses a BPF_MAP_TYPE_ARRAY (at a nonzero index) nested within a BPF_MAP_TYPE_HASH_OF_MAPS to flex a previously buggy case. Signed-off-by: Rhys Rustad-Elliott <me@rhysre.net> Link: https://lore.kernel.org/r/20230602190110.47068-3-me@rhysre.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-05-26	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-05-26 We've added 54 non-merge commits during the last 10 day(s) which contain a total of 76 files changed, 2729 insertions(+), 1003 deletions(-). The main changes are: 1) Add the capability to destroy sockets in BPF through a new kfunc, from Aditi Ghag. 2) Support O_PATH fds in BPF_OBJ_PIN and BPF_OBJ_GET commands, from Andrii Nakryiko. 3) Add capability for libbpf to resize datasec maps when backed via mmap, from JP Kobryn. 4) Move all the test kfuncs for CI out of the kernel and into bpf_testmod, from Jiri Olsa. 5) Big batch of xsk selftest improvements to prep for multi-buffer testing, from Magnus Karlsson. 6) Show the target_{obj,btf}_id in tracing link's fdinfo and dump it via bpftool, from Yafang Shao. 7) Various misc BPF selftest improvements to work with upcoming LLVM 17, from Yonghong Song. 8) Extend bpftool to specify netdevice for resolving XDP hints, from Larysa Zaremba. 9) Document masking in shift operations for the insn set document, from Dave Thaler. 10) Extend BPF selftests to check xdp_feature support for bond driver, from Lorenzo Bianconi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits) bpf: Fix bad unlock balance on freeze_mutex libbpf: Ensure FD >= 3 during bpf_map__reuse_fd() libbpf: Ensure libbpf always opens files with O_CLOEXEC selftests/bpf: Check whether to run selftest libbpf: Change var type in datasec resize func bpf: drop unnecessary bpf_capable() check in BPF_MAP_FREEZE command libbpf: Selftests for resizing datasec maps libbpf: Add capability for resizing datasec maps selftests/bpf: Add path_fd-based BPF_OBJ_PIN and BPF_OBJ_GET tests libbpf: Add opts-based bpf_obj_pin() API and add support for path_fd bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands libbpf: Start v1.3 development cycle bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM bpftool: Specify XDP Hints ifname when loading program selftests/bpf: Add xdp_feature selftest for bond device selftests/bpf: Test bpf_sock_destroy selftests/bpf: Add helper to get port using getsockname bpf: Add bpf_sock_destroy kfunc bpf: Add kfunc filter function to 'struct btf_kfunc_id_set' bpf: udp: Implement batching for sockets iterator ... ==================== Link: https://lore.kernel.org/r/20230526222747.17775-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-26	kallsyms: remove unused arch_get_kallsym() helper	Arnd Bergmann
	The arch_get_kallsym() function was introduced so that x86 could override it, but that override was removed in bf904d2762ee ("x86/pti/64: Remove the SYSCALL64 entry trampoline"), so now this does nothing except causing a warning about a missing prototype: kernel/kallsyms.c:662:12: error: no previous prototype for 'arch_get_kallsym' [-Werror=missing-prototypes] 662 \| int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value, Restore the old behavior before d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines") to simplify the code and avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Alan Maguire <alan.maguire@oracle.com> [mcgrof: fold in bpf selftest fix] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-25	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/raw.c 3632679d9e4f ("ipv{4,6}/raw: fix output xfrm lookup wrt protocol") c85be08fc4fa ("raw: Stop using RTO_ONLINK.") https://lore.kernel.org/all/20230525110037.2b532b83@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/freescale/fec_main.c 9025944fddfe ("net: fec: add dma_wmb to ensure correct descriptor values") 144470c88c5d ("net: fec: using the standard return codes when xdp xmit errors") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-05-24	libbpf: Selftests for resizing datasec maps	JP Kobryn
	This patch adds test coverage for resizing datasec maps. The first two subtests resize the bss and custom data sections. In both cases, an initial array (of length one) has its element set to one. After resizing the rest of the array is filled with ones as well. A BPF program is then run to sum the respective arrays and back on the userspace side the sum is checked to be equal to the number of elements. The third subtest attempts to perform resizing under conditions that will result in either the resize failing or the BTF info being cleared. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230524004537.18614-3-inwardvessel@gmail.com
2023-05-23	bpf, sockmap: Test progs verifier error with latest clang	John Fastabend
	With a relatively recent clang (7090c10273119) and with this commit to fix warnings in selftests (c8ed668593972) that uses __sink(err) to resolve unused variables. We get the following verifier error. root@6e731a24b33a:/host/tools/testing/selftests/bpf# ./test_sockmap libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG -- 0: R1=ctx(off=0,imm=0) R10=fp0 ; op = (int) skops->op; 0: (61) r2 = (u32 )(r1 +0) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; switch (op) { 1: (16) if w2 == 0x4 goto pc+5 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 2: (56) if w2 != 0x5 goto pc+15 ; R2_w=5 ; lport = skops->local_port; 3: (61) r2 = (u32 )(r1 +68) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (lport == 10000) { 4: (56) if w2 != 0x2710 goto pc+13 18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 ; __sink(err); 18: (bc) w1 = w0 R0 !read_ok processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1 -- END PROG LOAD LOG -- libbpf: prog 'bpf_sockmap': failed to load: -13 libbpf: failed to load object 'test_sockmap_kern.bpf.o' load_bpf_file: (-1) No such file or directory ERROR: (-1) load bpf failed libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG -- 0: R1=ctx(off=0,imm=0) R10=fp0 ; op = (int) skops->op; 0: (61) r2 = (u32 )(r1 +0) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; switch (op) { 1: (16) if w2 == 0x4 goto pc+5 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 2: (56) if w2 != 0x5 goto pc+15 ; R2_w=5 ; lport = skops->local_port; 3: (61) r2 = (u32 )(r1 +68) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (lport == 10000) { 4: (56) if w2 != 0x2710 goto pc+13 18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 ; __sink(err); 18: (bc) w1 = w0 R0 !read_ok processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1 -- END PROG LOAD LOG -- libbpf: prog 'bpf_sockmap': failed to load: -13 libbpf: failed to load object 'test_sockhash_kern.bpf.o' load_bpf_file: (-1) No such file or directory ERROR: (-1) load bpf failed libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG -- 0: R1=ctx(off=0,imm=0) R10=fp0 ; op = (int) skops->op; 0: (61) r2 = (u32 )(r1 +0) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; switch (op) { 1: (16) if w2 == 0x4 goto pc+5 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 2: (56) if w2 != 0x5 goto pc+15 ; R2_w=5 ; lport = skops->local_port; 3: (61) r2 = (u32 )(r1 +68) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (lport == 10000) { 4: (56) if w2 != 0x2710 goto pc+13 18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 ; __sink(err); 18: (bc) w1 = w0 R0 !read_ok processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1 -- END PROG LOAD LOG -- To fix simply remove the err value because its not actually used anywhere in the testing. We can investigate the root cause later. Future patch should probably actually test the err value as well. Although if the map updates fail they will get caught eventually by userspace. Fixes: c8ed668593972 ("selftests/bpf: fix lots of silly mistakes pointed out by compiler") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-15-john.fastabend@gmail.com
2023-05-23	bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops	John Fastabend
	When BPF program drops pkts the sockmap logic 'eats' the packet and updates copied_seq. In the PASS case where the sk_buff is accepted we update copied_seq from recvmsg path so we need a new test to handle the drop case. Original patch series broke this resulting in test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256 After updated patch with fix. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-14-john.fastabend@gmail.com
2023-05-23	bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0	John Fastabend
	When session gracefully shutdowns epoll needs to wake up and any recv() readers should return 0 not the -EAGAIN they previously returned. Note we use epoll instead of select to test the epoll wake on shutdown event as well. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-12-john.fastabend@gmail.com
2023-05-19	selftests/bpf: Test bpf_sock_destroy	Aditi Ghag
	The test cases for destroying sockets mirror the intended usages of the bpf_sock_destroy kfunc using iterators. The destroy helpers set `ECONNABORTED` error code that we can validate in the test code with client sockets. But UDP sockets have an overriding error code from `disconnect()` called during abort, so the error code validation is only done for TCP sockets. The failure test cases validate that the `bpf_sock_destroy` kfunc is not allowed from program attach types other than BPF trace iterator, and such programs fail to load. Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-10-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-05-17	selftests/bpf: Fix dynptr/test_dynptr_is_null	Yonghong Song
	With latest llvm17, dynptr/test_dynptr_is_null subtest failed in my testing VM. The failure log looks like below: All error logs: tester_init:PASS:tester_log_buf 0 nsec process_subtest:PASS:obj_open_mem 0 nsec process_subtest:PASS:Can't alloc specs array 0 nsec verify_success:PASS:dynptr_success__open 0 nsec verify_success:PASS:bpf_object__find_program_by_name 0 nsec verify_success:PASS:dynptr_success__load 0 nsec verify_success:PASS:bpf_program__attach 0 nsec verify_success:FAIL:err unexpected err: actual 4 != expected 0 #65/9 dynptr/test_dynptr_is_null:FAIL The error happens for bpf prog test_dynptr_is_null in dynptr_success.c: if (bpf_dynptr_is_null(&ptr2)) { err = 4; goto exit; } The bpf_dynptr_is_null(&ptr) unexpectedly returned a non-zero value and the control went to the error path. Digging further, I found the root cause is due to function signature difference between kernel and user space. In kernel, we have ... __bpf_kfunc bool bpf_dynptr_is_null(struct bpf_dynptr_kern ptr) ... while in bpf_kfuncs.h we have: extern int bpf_dynptr_is_null(const struct bpf_dynptr ptr) __ksym; The kernel bpf_dynptr_is_null disasm code: ffffffff812f1a90 <bpf_dynptr_is_null>: ffffffff812f1a90: f3 0f 1e fa endbr64 ffffffff812f1a94: 0f 1f 44 00 00 nopl (%rax,%rax) ffffffff812f1a99: 53 pushq %rbx ffffffff812f1a9a: 48 89 fb movq %rdi, %rbx ffffffff812f1a9d: e8 ae 29 17 00 callq 0xffffffff81464450 <__asan_load8_noabort> ffffffff812f1aa2: 48 83 3b 00 cmpq $0x0, (%rbx) ffffffff812f1aa6: 0f 94 c0 sete %al ffffffff812f1aa9: 5b popq %rbx ffffffff812f1aaa: c3 retq Note that only 1-byte register %al is set and the other 7-bytes are not touched. In bpf program, the asm code for the above bpf_dynptr_is_null(&ptr2): 266: 85 10 00 00 ff ff ff ff call -0x1 267: b4 01 00 00 04 00 00 00 w1 = 0x4 268: 16 00 03 00 00 00 00 00 if w0 == 0x0 goto +0x3 <LBB9_8> Basically, 4-byte subregister is tested. This might cause error as the value other than the lowest byte might not be 0. This patch fixed the issue by using the identical func prototype across kernel and selftest user space. The fixed bpf asm code: 267: 85 10 00 00 ff ff ff ff call -0x1 268: 54 00 00 00 01 00 00 00 w0 &= 0x1 269: b4 01 00 00 04 00 00 00 w1 = 0x4 270: 16 00 03 00 00 00 00 00 if w0 == 0x0 goto +0x3 <LBB9_8> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230517040404.4023912-1-yhs@fb.com
2023-05-16	selftests/bpf: Move kfunc exports to bpf_testmod/bpf_testmod_kfunc.h	Jiri Olsa
	Move all kfunc exports into separate bpf_testmod_kfunc.h header file and include it in tests that need it. We will move all test kfuncs into bpf_testmod in following change, so it's convenient to have declarations in single place. The bpf_testmod_kfunc.h is included by both bpf_testmod and bpf programs that use test kfuncs. As suggested by David, the bpf_testmod_kfunc.h includes vmlinux.h and bpf/bpf_helpers.h for bpf programs build, so the declarations have proper __ksym attribute and we can resolve all the structs. Note in kfunc_call_test_subprog.c we can no longer use the sk_state define from bpf_tcp_helpers.h (because it clashed with vmlinux.h) and we need to address __sk_common.skc_state field directly. Acked-by: David Vernet <void@manifault.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230515133756.1658301-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-16	selftests/bpf: Fix s390 sock_field test failure	Yonghong Song
	llvm patch [1] enabled cross-function optimization for func arguments (ArgumentPromotion) at -O2 level. And this caused s390 sock_fields test failure ([2]). The failure is gone right now as patch [1] was reverted in [3]. But it is possible that patch [3] will be reverted again and then the test failure in [2] will show up again. So it is desirable to fix the failure regardless. The following is an analysis why sock_field test fails with llvm patch [1]. The main problem is in static __noinline bool sk_dst_port__load_word(struct bpf_sock sk) { __u32 word = (__u32 )&sk->dst_port; return word[0] == bpf_htons(0xcafe); } static __noinline bool sk_dst_port__load_half(struct bpf_sock sk) { __u16 half = (__u16 )&sk->dst_port; return half[0] == bpf_htons(0xcafe); } ... int read_sk_dst_port(struct __sk_buff skb) { ... sk = skb->sk; ... if (!sk_dst_port__load_word(sk)) RET_LOG(); if (!sk_dst_port__load_half(sk)) RET_LOG(); ... } Through some cross-function optimization by ArgumentPromotion optimization, the compiler does: static __noinline bool sk_dst_port__load_word(__u32 word_val) { return word_val == bpf_htons(0xcafe); } static __noinline bool sk_dst_port__load_half(__u16 half_val) { return half_val == bpf_htons(0xcafe); } ... int read_sk_dst_port(struct __sk_buff skb) { ... sk = skb->sk; ... __u32 word = (__u32 )&sk->dst_port; __u32 word_val = word[0]; ... if (!sk_dst_port__load_word(word_val)) RET_LOG(); __u16 half_val = word_val >> 16; if (!sk_dst_port__load_half(half_val)) RET_LOG(); ... } In current uapi bpf.h, we have struct bpf_sock { ... __be16 dst_port; /* network byte order / __u16 :16; / zero padding / ... }; But the old kernel (e.g., 5.6) we have struct bpf_sock { ... __u32 dst_port; / network byte order */ ... }; So for backward compatability reason, 4-byte load of dst_port is converted to 2-byte load internally. Specifically, 'word_val = word[0]' is replaced by 2-byte load by the verifier and this caused the trouble for later sk_dst_port__load_half() where half_val becomes 0. Typical usr program won't have such a code pattern tiggering the above bug, so let us fix the test failure with source code change. Adding an empty asm volatile statement seems enough to prevent undesired transformation. [1] https://reviews.llvm.org/D148269 [2] https://lore.kernel.org/bpf/e7f2c5e8-a50c-198d-8f95-388165f1e4fd@meta.com/ [3] https://reviews.llvm.org/rG141be5c062ecf22bd287afffd310e8ac4711444a Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230516214945.1013578-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-13	selftests/bpf: Correctly handle optlen > 4096	Stanislav Fomichev
	Even though it's not relevant in selftests, the people might still copy-paste from them. So let's take care of optlen > 4096 cases explicitly. Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230511170456.1759459-4-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-05-06	selftests/bpf: Accept mem from dynptr in helper funcs	Daniel Rosenberg
	This ensures that buffers retrieved from dynptr_data are allowed to be passed in to helpers that take mem, like bpf_strncmp Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20230506013134.2492210-6-drosen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-06	selftests/bpf: Check overflow in optional buffer	Daniel Rosenberg
	This ensures we still reject invalid memory accesses in buffers that are marked optional. Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20230506013134.2492210-4-drosen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-06	selftests/bpf: Test allowing NULL buffer in dynptr slice	Daniel Rosenberg
	bpf_dynptr_slice(_rw) no longer requires a buffer for verification. If the buffer is needed, but not present, the function will return NULL. Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20230506013134.2492210-3-drosen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-06	selftests/bpf: Add testcase for bpf_task_under_cgroup	Feng Zhou
	test_progs: Tests new kfunc bpf_task_under_cgroup(). The bpf program saves the new task's pid within a given cgroup to the remote_pid, which is convenient for the user-mode program to verify the test correctness. The user-mode program creates its own mount namespace, and mounts the cgroupsv2 hierarchy in there, call the fork syscall, then check if remote_pid and local_pid are unequal. Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230506031545.35991-3-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-04	selftests/bpf: revert iter test subprog precision workaround	Andrii Nakryiko
	Now that precision propagation is supported fully in the presence of subprogs, there is no need to work around iter test. Revert original workaround. This reverts be7dbd275dc6 ("selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests"). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-11-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-04	selftests/bpf: add precision propagation tests in the presence of subprogs	Andrii Nakryiko
	Add a bunch of tests validating verifier's precision backpropagation logic in the presence of subprog calls and/or callback-calling helpers/kfuncs. We validate the following conditions: - subprog_result_precise: static subprog r0 result precision handling; - global_subprog_result_precise: global subprog r0 precision shortcutting, similar to BPF helper handling; - callback_result_precise: similarly r0 marking precise for callback-calling helpers; - parent_callee_saved_reg_precise, parent_callee_saved_reg_precise_global: propagation of precision for callee-saved registers bypassing static/global subprogs; - parent_callee_saved_reg_precise_with_callback: same as above, but in the presence of callback-calling helper; - parent_stack_slot_precise, parent_stack_slot_precise_global: similar to above, but instead propagating precision of stack slot (spilled SCALAR reg); - parent_stack_slot_precise_with_callback: same as above, but in the presence of callback-calling helper; - subprog_arg_precise: propagation of precision of static subprog's input argument back to caller; - subprog_spill_into_parent_stack_slot_precise: negative test validating that verifier currently can't support backtracking of stack access with non-r10 register, we validate that we fallback to forcing precision for all SCALARs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-10-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-27	selftests/bpf: Fix selftest test_global_funcs/global_func1 failure with ↵	Yonghong Song
	latest clang The selftest test_global_funcs/global_func1 failed with the latest clang17. The reason is due to upstream ArgumentPromotionPass ([1]), which may manipulate static function parameters and cause inlining although the funciton is marked as noinline. The original code: static __attribute__ ((noinline)) int f0(int var, struct __sk_buff skb) { return skb->len; } __attribute__ ((noinline)) int f1(struct __sk_buff skb) { ... return f0(0, skb) + skb->len; } ... SEC("tc") __failure __msg("combined stack size of 4 calls is 544") int global_func1(struct __sk_buff skb) { return f0(1, skb) + f1(skb) + f2(2, skb) + f3(3, skb, 4); } After ArgumentPromotionPass, the code is translated to static __attribute__ ((noinline)) int f0(int var, int skb_len) { return skb_len; } __attribute__ ((noinline)) int f1(struct __sk_buff skb) { ... return f0(0, skb->len) + skb->len; } ... SEC("tc") __failure __msg("combined stack size of 4 calls is 544") int global_func1(struct __sk_buff skb) { return f0(1, skb->len) + f1(skb) + f2(2, skb) + f3(3, skb, 4); } And later llvm InstCombine phase recognized that f0() simplify returns the value of the second argument and removed f0() completely and the final code looks like: __attribute__ ((noinline)) int f1(struct __sk_buff skb) { ... return skb->len + skb->len; } ... SEC("tc") __failure __msg("combined stack size of 4 calls is 544") int global_func1(struct __sk_buff *skb) { return skb->len + f1(skb) + f2(2, skb) + f3(3, skb, 4); } If f0() is not inlined, the verification will fail with stack size 544 for a particular callchain. With f0() inlined, the maximum stack size is 512 which is in the limit. Let us add a `asm volatile ("")` in f0() to prevent ArgumentPromotionPass from hoisting the code to its caller, and this fixed the test failure. [1] https://reviews.llvm.org/D148269 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230425174744.1758515-1-yhs@fb.com
2023-04-27	selftests/bpf: xdp_hw_metadata track more timestamps	Jesper Dangaard Brouer
	To correlate the hardware RX timestamp with something, add tracking of two software timestamps both clock source CLOCK_TAI (see description in man clock_gettime(2)). XDP metadata is extended with xdp_timestamp for capturing when XDP received the packet. Populated with BPF helper bpf_ktime_get_tai_ns(). I could not find a BPF helper for getting CLOCK_REALTIME, which would have been preferred. In userspace when AF_XDP sees the packet another software timestamp is recorded via clock_gettime() also clock source CLOCK_TAI. Example output shortly after loading igc driver: poll: 1 (0) skip=1 fail=0 redir=2 xsk_ring_cons__peek: 1 0x12557a8: rx_desc[1]->addr=100000000009000 addr=9100 comp_addr=9000 rx_hash: 0x82A96531 with RSS type:0x1 rx_timestamp: 1681740540304898909 (sec:1681740540.3049) XDP RX-time: 1681740577304958316 (sec:1681740577.3050) delta sec:37.0001 (37000059.407 usec) AF_XDP time: 1681740577305051315 (sec:1681740577.3051) delta sec:0.0001 (92.999 usec) 0x12557a8: complete idx=9 addr=9000 The first observation is that the 37 sec difference between RX HW vs XDP timestamps, which indicate hardware is likely clock source CLOCK_REALTIME, because (as of this writing) CLOCK_TAI is initialised with a 37 sec offset. The 93 usec (microsec) difference between XDP vs AF_XDP userspace is the userspace wakeup time. On this hardware it was caused by CPU idle sleep states, which can be reduced by tuning /dev/cpu_dma_latency. View current requested/allowed latency bound via: hexdump --format '"%d\n"' /dev/cpu_dma_latency More explanation of the output and how this can be used to identify clock drift for the HW clock can be seen here[1]: [1] https://github.com/xdp-project/xdp-project/blob/master/areas/hints/xdp_hints_kfuncs02_driver_igc.org Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Yoong Siang <yoong.siang.song@intel.com> Link: https://lore.kernel.org/bpf/168182466298.616355.2544377890818617459.stgit@firesoul
2023-04-27	selftests/bpf: Add tests for dynptr convenience helpers	Joanne Koong
	Add various tests for the added dynptr convenience helpers. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230420071414.570108-6-joannelkoong@gmail.com
2023-04-24	selftests/bpf: avoid mark_all_scalars_precise() trigger in one of iter tests	Andrii Nakryiko
	iter_pass_iter_ptr_to_subprog subtest is relying on actual array size being passed as subprog parameter. This combined with recent fixes to precision tracking in conditional jumps ([0]) is now causing verifier to backtrack all the way to the point where sum() and fill() subprogs are called, at which point precision backtrack bails out and forces all the states to have precise SCALAR registers. This in turn causes each possible value of i within fill() and sum() subprogs to cause a different non-equivalent state, preventing iterator code to converge. For now, change the test to assume fixed size of passed in array. Once BPF verifier supports precision tracking across subprogram calls, these changes will be reverted as unnecessary. [0] 71b547f56124 ("bpf: Fix incorrect verifier pruning due to missing register precision taints") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230424235128.1941726-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-22	selftests/bpf: verifier/prevent_map_lookup converted to inline assembly	Eduard Zingerman
	Test verifier/prevent_map_lookup automatically converted to use inline assembly. This was a part of a series [1] but could not be applied becuase another patch from a series had to be witheld. [1] https://lore.kernel.org/bpf/20230421174234.2391278-1-eddyz87@gmail.com/ Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421204514.2450907-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/value_ptr_arith converted to inline assembly	Eduard Zingerman
	Test verifier/value_ptr_arith automatically converted to use inline assembly. Test cases "sanitation: alu with different scalars 2" and "sanitation: alu with different scalars 3" are updated to avoid -ENOENT as return value, as __retval() annotation only supports numeric literals. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-25-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/value_illegal_alu converted to inline assembly	Eduard Zingerman
	Test verifier/value_illegal_alu automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-24-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/unpriv converted to inline assembly	Eduard Zingerman
	Test verifier/unpriv semi-automatically converted to use inline assembly. The verifier/unpriv.c had to be split in two parts: - the bulk of the tests is in the progs/verifier_unpriv.c; - the single test that needs `struct bpf_perf_event_data` definition is in the progs/verifier_unpriv_perf.c. The tests above can't be in a single file because: - first requires inclusion of the filter.h header (to get access to BPF_ST_MEM macro, inline assembler does not support this isntruction); - the second requires vmlinux.h, which contains definitions conflicting with filter.h. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-23-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/subreg converted to inline assembly	Eduard Zingerman
	Test verifier/subreg automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-22-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/spin_lock converted to inline assembly	Eduard Zingerman
	Test verifier/spin_lock automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-21-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/sock converted to inline assembly	Eduard Zingerman
	Test verifier/sock automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-20-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/search_pruning converted to inline assembly	Eduard Zingerman
	Test verifier/search_pruning automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-19-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/runtime_jit converted to inline assembly	Eduard Zingerman
	Test verifier/runtime_jit automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-18-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/regalloc converted to inline assembly	Eduard Zingerman
	Test verifier/regalloc automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-17-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21	selftests/bpf: verifier/ref_tracking converted to inline assembly	Eduard Zingerman
	Test verifier/ref_tracking automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-16-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>