summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-04Merge tag 'linux_kselftest-fixes-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Fixes to build warnings in several tests and fixes to ftrace tests" * tag 'linux_kselftest-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/futex: don't pass a const char* to asprintf(3) selftests/futex: don't redefine .PHONY targets (all, clean) selftests/tracing: Fix event filter test to retry up to 10 times selftests/futex: pass _GNU_SOURCE without a value to the compiler selftests/overlayfs: Fix build error on ppc64 selftests/openat2: Fix build warnings on ppc64 selftests: cachestat: Fix build warnings on ppc64 tracing/selftests: Fix kprobe event name test for .isra. functions selftests/ftrace: Update required config selftests/ftrace: Fix to check required event file kselftest/alsa: Ensure _GNU_SOURCE is defined
2024-06-04Merge branch 'efi/next' into efi/urgentArd Biesheuvel
2024-06-04openvswitch: Remove generic .ndo_get_stats64Breno Leitao
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20240531111552.3209198-2-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04openvswitch: Move stats allocation to coreBreno Leitao
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Move openvswitch driver to leverage the core allocation. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240531111552.3209198-1-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04KVM: arm64: Ensure that SME controls are disabled in protected modeFuad Tabba
KVM (and pKVM) do not support SME guests. Therefore KVM ensures that the host's SME state is flushed and that SME controls for enabling access to ZA storage and for streaming are disabled. pKVM needs to protect against a buggy/malicious host. Ensure that it wouldn't run a guest when protected mode is enabled should any of the SME controls be enabled. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx formatFuad Tabba
When setting/clearing CPACR bits for EL0 and EL1, use the ELx format of the bits, which covers both. This makes the code clearer, and reduces the chances of accidentally missing a bit. No functional change intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVMFuad Tabba
Now that we have introduced finalize_init_hyp_mode(), lets consolidate the initializing of the host_data fpsimd_state and sve state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Eagerly restore host fpsimd/sve state in pKVMFuad Tabba
When running in protected mode we don't want to leak protected guest state to the host, including whether a guest has used fpsimd/sve. Therefore, eagerly restore the host state on guest exit when running in protected mode, which happens only if the guest has used fpsimd/sve. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVMFuad Tabba
Protected mode needs to maintain (save/restore) the host's sve state, rather than relying on the host kernel to do that. This is to avoid leaking information to the host about guests and the type of operations they are performing. As a first step towards that, allocate memory mapped at hyp, per cpu, for the host sve state. The following patch will use this memory to save/restore the host state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Specialize handling of host fpsimd state on trapFuad Tabba
In subsequent patches, n/vhe will diverge on saving the host fpsimd/sve state when taking a guest fpsimd/sve trap. Add a specialized helper to handle it. No functional change intended. Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helperFuad Tabba
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be toggled in different parts of the code, but the exact bits and their polarity differ between these two formats and the mode (vhe/nvhe/hvhe). To reduce the amount of duplicated code and the chance of getting the wrong bit/polarity or missing a field, abstract the set/clear of CPTR_EL2 bits behind a helper. Since (h)VHE is the way of the future, use the CPACR_EL1 format, which is a subset of the VHE CPTR_EL2, as a reference. No functional change intended. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_stateFuad Tabba
Since the prototypes for __sve_save_state/__sve_restore_state at hyp were added, the underlying macro has acquired a third parameter for saving/restoring ffr. Fix the prototypes to account for the third parameter, and restore the ffr for the guest since it is saved. Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Reintroduce __sve_save_stateFuad Tabba
Now that the hypervisor is handling the host sve state in protected mode, it needs to be able to save it. This reverts commit e66425fc9ba3 ("KVM: arm64: Remove unused __sve_save_state"). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04Merge branch 'tcp-refactor-skb_cmp_decrypted-checks'Paolo Abeni
Jakub Kicinski says: ==================== tcp: refactor skb_cmp_decrypted() checks Refactor the input patch coalescing checks and wrap "EOR forcing" logic into a helper. This will hopefully make the code easier to follow. While at it throw some DEBUG_NET checks into skb_shift(). ==================== Link: https://lore.kernel.org/r/20240530233616.85897-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net: skb: add compatibility warnings to skb_shift()Jakub Kicinski
According to current semantics we should never try to shift data between skbs which differ on decrypted or pp_recycle status. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04tcp: add a helper for setting EOR on tail skbJakub Kicinski
TLS (and hopefully soon PSP will) use EOR to prevent skbs with different decrypted state from getting merged, without adding new tests to the skb handling. In both cases once the connection switches to an "encrypted" state, all subsequent skbs will be encrypted, so a single "EOR fence" is sufficient to prevent mixing. Add a helper for setting the EOR bit, to make this arrangement more explicit. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04tcp: wrap mptcp and decrypted checks into tcp_skb_can_collapse_rx()Jakub Kicinski
tcp_skb_can_collapse() checks for conditions which don't make sense on input. Because of this we ended up sprinkling a few pairs of mptcp_skb_can_collapse() and skb_cmp_decrypted() calls on the input path. Group them in a new helper. This should make it less likely that someone will check mptcp and not decrypted or vice versa when adding new code. This implicitly adds a decrypted check early in tcp_collapse(). AFAIU this will very slightly increase our ability to collapse packets under memory pressure, not a real bug. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net: tls: fix marking packets as decryptedJakub Kicinski
For TLS offload we mark packets with skb->decrypted to make sure they don't escape the host without getting encrypted first. The crypto state lives in the socket, so it may get detached by a call to skb_orphan(). As a safety check - the egress path drops all packets with skb->decrypted and no "crypto-safe" socket. The skb marking was added to sendpage only (and not sendmsg), because tls_device injected data into the TCP stack using sendpage. This special case was missed when sendpage got folded into sendmsg. Fixes: c5c37af6ecad ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04Merge branch 'net-allow-dissecting-matching-tunnel-control-flags'Paolo Abeni
Davide Caratti says: ==================== net: allow dissecting/matching tunnel control flags Ilya says: "for correct matching on decapsulated packets, we should match on not only tunnel id and headers, but also on tunnel configuration flags like TUNNEL_NO_CSUM and TUNNEL_DONT_FRAGMENT. This is done to distinguish similar tunnels with slightly different configs. And it is important since tunnel configuration is flow based, i.e. can be different for every packet, even though the main tunnel port is the same." - patch 1 extends the kernel's flow dissector to extract these flags from the packet's tunnel metadata. - patch 2 extends TC flower to match on any combination of TUNNEL_NO_CSUM, TUNNEL_DONT_FRAGMENT, TUNNEL_OAM, TUNNEL_CRIT_OPT v4: - fix kernel-doc warning in flow_dissector.h (thanks Jakub) v3: - rebase on top of new uAPI bits and internals after commit 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps"). Use of network byte order is no more needed, since these bits match on metadata: convert netlink attributes to be u32. - also include TUNNEL_CRIT_OPT v2: - use NL_REQ_ATTR_CHECK() where possible (thanks Jamal) - don't overwrite 'ret' in the error path of fl_set_key_flags() ==================== Link: https://lore.kernel.org/r/cover.1717088241.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net/sched: cls_flower: add support for matching tunnel control flagsDavide Caratti
extend cls_flower to match TUNNEL_FLAGS_PRESENT bits in tunnel metadata. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04flow_dissector: add support for tunnel control flagsDavide Caratti
Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata. This is a prerequisite for matching these control flags using TC flower. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net: count drops due to missing qdisc as dev->tx_dropsJakub Kicinski
Catching and debugging missing qdiscs is pretty tricky. When qdisc is deleted we replace it with a noop qdisc, which silently drops all the packets. Since the noop qdisc has a single static instance we can't count drops at the qdisc level. Count them as dev->tx_drops. ip netns add red ip link add type veth peer netns red ip link set dev veth0 up ip -netns red link set dev veth0 up ip a a dev veth0 10.0.0.1/24 ip -netns red a a dev veth0 10.0.0.2/24 ping -c 2 10.0.0.2 # 2 packets transmitted, 2 received, 0% packet loss, time 1031ms ip -s link show dev veth0 # TX: bytes packets errors dropped carrier collsns # 1314 17 0 0 0 0 tc qdisc replace dev veth0 root handle 1234: mq tc qdisc replace dev veth0 parent 1234:1 pfifo tc qdisc del dev veth0 parent 1234:1 ping -c 2 10.0.0.2 # 2 packets transmitted, 0 received, 100% packet loss, time 1034ms ip -s link show dev veth0 # TX: bytes packets errors dropped carrier collsns # 1314 17 0 3 0 0 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240529162527.3688979-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-03Merge branch ↵Alexei Starovoitov
'enable-bpf-programs-to-declare-arrays-of-kptr-bpf_rb_root-and-bpf_list_head' Kui-Feng Lee says: ==================== Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head. Some types, such as type kptr, bpf_rb_root, and bpf_list_head, are treated in a special way. Previously, these types could not be the type of a field in a struct type that is used as the type of a global variable. They could not be the type of a field in a struct type that is used as the type of a field in the value type of a map either. They could not even be the type of array elements. This means that they can only be the type of global variables or of direct fields in the value type of a map. The patch set aims to enable the use of these specific types in arrays and struct fields, providing flexibility. It examines the types of global variables or the value types of maps, such as arrays and struct types, recursively to identify these special types and generate field information for them. For example, ... struct task_struct __kptr *ptr[3]; ... it will create 3 instances of "struct btf_field" in the "btf_record" of the data section. [..., btf_field(offset=0x100, type=BPF_KPTR_REF), btf_field(offset=0x108, type=BPF_KPTR_REF), btf_field(offset=0x110, type=BPF_KPTR_REF), ... ] It creates a record of each of three elements. These three records are almost identical except their offsets. Another example is ... struct A { ... struct task_struct __kptr *task; struct bpf_rb_root root; ... } struct A foo[2]; it will create 4 records. [..., btf_field(offset=0x7100, type=BPF_KPTR_REF), btf_field(offset=0x7108, type=BPF_RB_ROOT:), btf_field(offset=0x7200, type=BPF_KPTR_REF), btf_field(offset=0x7208, type=BPF_RB_ROOT:), ... ] Assuming that the size of an element/struct A is 0x100 and "foo" starts at 0x7000, it includes two kptr records at 0x7100 and 0x7200, and two rbtree root records at 0x7108 and 0x7208. All these field information will be flatten, for struct types, and repeated, for arrays. --- Changes from v6: - Return BPF_KPTR_REF from btf_get_field_type() only if var_type is a struct type. - Pass btf and type to btf_get_field_type(). Changes from v5: - Ensure field->offset values of kptrs are advanced correctly from one nested struct/or array to another. Changes from v4: - Return -E2BIG for i == MAX_RESOLVE_DEPTH. Changes from v3: - Refactor the common code of btf_find_struct_field() and btf_find_datasec_var(). - Limit the number of levels looking into a struct types. Changes from v2: - Support fields in nested struct type. - Remove nelems and duplicate field information with offset adjustments for arrays. Changes from v1: - Move the check of element alignment out of btf_field_cmp() to btf_record_find(). - Change the order of the previous patch 4 "bpf: check_map_kptr_access() compute the offset from the reg state" as the patch 7 now. - Reject BPF_RB_NODE and BPF_LIST_NODE with nelems > 1. - Rephrase the commit log of the patch "bpf: check_map_access() with the knowledge of arrays" to clarify the alignment on elements. v6: https://lore.kernel.org/all/20240520204018.884515-1-thinker.li@gmail.com/ v5: https://lore.kernel.org/all/20240510011312.1488046-1-thinker.li@gmail.com/ v4: https://lore.kernel.org/all/20240508063218.2806447-1-thinker.li@gmail.com/ v3: https://lore.kernel.org/all/20240501204729.484085-1-thinker.li@gmail.com/ v2: https://lore.kernel.org/all/20240412210814.603377-1-thinker.li@gmail.com/ v1: https://lore.kernel.org/bpf/20240410004150.2917641-1-thinker.li@gmail.com/ Kui-Feng Lee (9): bpf: Remove unnecessary checks on the offset of btf_field. bpf: Remove unnecessary call to btf_field_type_size(). bpf: refactor btf_find_struct_field() and btf_find_datasec_var(). bpf: create repeated fields for arrays. bpf: look into the types of the fields of a struct type recursively. bpf: limit the number of levels of a nested struct type. selftests/bpf: Test kptr arrays and kptrs in nested struct fields. selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types. selftests/bpf: Test global bpf_list_head arrays. kernel/bpf/btf.c | 310 ++++++++++++------ kernel/bpf/verifier.c | 4 +- .../selftests/bpf/prog_tests/cpumask.c | 5 + .../selftests/bpf/prog_tests/linked_list.c | 12 + .../testing/selftests/bpf/prog_tests/rbtree.c | 47 +++ .../selftests/bpf/progs/cpumask_success.c | 171 ++++++++++ .../testing/selftests/bpf/progs/linked_list.c | 42 +++ tools/testing/selftests/bpf/progs/rbtree.c | 77 +++++ 8 files changed, 558 insertions(+), 110 deletions(-) ==================== Link: https://lore.kernel.org/r/20240523174202.461236-1-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03selftests/bpf: Test global bpf_list_head arrays.Kui-Feng Lee
Make sure global arrays of bpf_list_heads and fields of bpf_list_heads in nested struct types work correctly. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-10-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.Kui-Feng Lee
Make sure global arrays of bpf_rb_root and fields of bpf_rb_root in nested struct types work correctly. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-9-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03selftests/bpf: Test kptr arrays and kptrs in nested struct fields.Kui-Feng Lee
Make sure that BPF programs can declare global kptr arrays and kptr fields in struct types that is the type of a global variable or the type of a nested descendant field in a global variable. An array with only one element is special case, that it treats the element like a non-array kptr field. Nested arrays are also tested to ensure they are handled properly. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-8-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03bpf: limit the number of levels of a nested struct type.Kui-Feng Lee
Limit the number of levels looking into struct types to avoid running out of stack space. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-7-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03bpf: look into the types of the fields of a struct type recursively.Kui-Feng Lee
The verifier has field information for specific special types, such as kptr, rbtree root, and list head. These types are handled differently. However, we did not previously examine the types of fields of a struct type variable. Field information records were not generated for the kptrs, rbtree roots, and linked_list heads that are not located at the outermost struct type of a variable. For example, struct A { struct task_struct __kptr * task; }; struct B { struct A mem_a; } struct B var_b; It did not examine "struct A" so as not to generate field information for the kptr in "struct A" for "var_b". This patch enables BPF programs to define fields of these special types in a struct type other than the direct type of a variable or in a struct type that is the type of a field in the value type of a map. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-6-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03bpf: create repeated fields for arrays.Kui-Feng Lee
The verifier uses field information for certain special types, such as kptr, rbtree root, and list head. These types are treated differently. However, we did not previously support these types in arrays. This update examines arrays and duplicates field information the same number of times as the length of the array if the element type is one of the special types. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-5-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03bpf: refactor btf_find_struct_field() and btf_find_datasec_var().Kui-Feng Lee
Move common code of the two functions to btf_find_field_one(). Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-4-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03bpf: Remove unnecessary call to btf_field_type_size().Kui-Feng Lee
field->size has been initialized by bpf_parse_fields() with the value returned by btf_field_type_size(). Use it instead of calling btf_field_type_size() again. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-3-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03bpf: Remove unnecessary checks on the offset of btf_field.Kui-Feng Lee
reg_find_field_offset() always return a btf_field with a matching offset value. Checking the offset of the returned btf_field is unnecessary. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-2-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03Merge tag 'wireless-2024-06-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.10-rc3 The first fixes for v6.10. And we have a big one, I suspect the biggest wireless pull request we ever had. There are fixes all over, both in stack and drivers. Likely the most important here are mt76 not working on mt7615 devices, ath11k not being able to connect to 6 GHz networks and rtlwifi suffering from packet loss. But of course there's much more. * tag 'wireless-2024-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (37 commits) wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS wifi: mt76: mt7615: add missing chanctx ops wifi: wilc1000: document SRCU usage instead of SRCU Revert "wifi: wilc1000: set atomic flag on kmemdup in srcu critical section" Revert "wifi: wilc1000: convert list management to RCU" wifi: mac80211: fix UBSAN noise in ieee80211_prep_hw_scan() wifi: mac80211: correctly parse Spatial Reuse Parameter Set element wifi: mac80211: fix Spatial Reuse element size check wifi: iwlwifi: mvm: don't read past the mfuart notifcation wifi: iwlwifi: mvm: Fix scan abort handling with HW rfkill wifi: iwlwifi: mvm: check n_ssids before accessing the ssids wifi: iwlwifi: mvm: properly set 6 GHz channel direct probe option wifi: iwlwifi: mvm: handle BA session teardown in RF-kill wifi: iwlwifi: mvm: Handle BIGTK cipher in kek_kck cmd wifi: iwlwifi: mvm: remove stale STA link data during restart wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef wifi: iwlwifi: mvm: set properly mac header wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64 wifi: iwlwifi: mvm: d3: fix WoWLAN command version lookup wifi: iwlwifi: mvm: fix a crash on 7265 ... ==================== Link: https://lore.kernel.org/r/20240603115129.9494CC2BD10@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03lib/test_rhashtable: add missing MODULE_DESCRIPTION() macroJeff Johnson
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_rhashtable.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240531-md-lib-test_rhashtable-v1-1-cd6d4138f1b6@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03Merge branch 'dst_cache-fix-possible-races'Jakub Kicinski
Eric Dumazet says: ==================== dst_cache: fix possible races This series is inspired by various undisclosed syzbot reports hinting at corruptions in dst_cache structures. It seems at least four users of dst_cache are racy against BH reentrancy. Last patch is adding a DEBUG_NET check to catch future misuses. ==================== Link: https://lore.kernel.org/r/20240531132636.2637995-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03net: dst_cache: add two DEBUG_NET warningsEric Dumazet
After fixing four different bugs involving dst_cache users, it might be worth adding a check about BH being blocked by dst_cache callers. DEBUG_NET_WARN_ON_ONCE(!in_softirq()); It is not fatal, if we missed valid case where no BH deadlock is to be feared, we might change this. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03ila: block BH in ila_output()Eric Dumazet
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. ila_output() is called from lwtunnel_output() possibly from process context, and under rcu_read_lock(). We might be interrupted by a softirq, re-enter ila_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable(). Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03ipv6: sr: block BH in seg6_output_core() and seg6_input_core()Eric Dumazet
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in seg6_output_core() is not good enough, because seg6_output_core() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter seg6_output_core() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Apply a similar change in seg6_input_core(). Fixes: fa79581ea66c ("ipv6: sr: fix several BUGs when preemption is enabled") Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Lebrun <dlebrun@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input()Eric Dumazet
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in rpl_output() is not good enough, because rpl_output() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter rpl_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Apply a similar change in rpl_input(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Aring <aahringo@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03ipv6: ioam: block BH from ioam6_output()Eric Dumazet
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in ioam6_output() is not good enough, because ioam6_output() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter ioam6_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Justin Iurman <justin.iurman@uliege.be> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03vmxnet3: disable rx data ring on dma allocation failureMatthias Stocker
When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base, the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset rq->data_ring.desc_size for the data ring that failed, which presumably causes the hypervisor to reference it on packet reception. To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell the hypervisor to disable this feature. [ 95.436876] kernel BUG at net/core/skbuff.c:207! [ 95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1 [ 95.441558] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018 [ 95.443481] RIP: 0010:skb_panic+0x4d/0x4f [ 95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50 ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9 ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24 [ 95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246 [ 95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f [ 95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f [ 95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60 [ 95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000 [ 95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0 [ 95.455682] FS: 0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000 [ 95.457178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0 [ 95.459791] Call Trace: [ 95.460515] <IRQ> [ 95.461180] ? __die_body.cold+0x19/0x27 [ 95.462150] ? die+0x2e/0x50 [ 95.462976] ? do_trap+0xca/0x110 [ 95.463973] ? do_error_trap+0x6a/0x90 [ 95.464966] ? skb_panic+0x4d/0x4f [ 95.465901] ? exc_invalid_op+0x50/0x70 [ 95.466849] ? skb_panic+0x4d/0x4f [ 95.467718] ? asm_exc_invalid_op+0x1a/0x20 [ 95.468758] ? skb_panic+0x4d/0x4f [ 95.469655] skb_put.cold+0x10/0x10 [ 95.470573] vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3] [ 95.471853] vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3] [ 95.473185] __napi_poll+0x2b/0x160 [ 95.474145] net_rx_action+0x2c6/0x3b0 [ 95.475115] handle_softirqs+0xe7/0x2a0 [ 95.476122] __irq_exit_rcu+0x97/0xb0 [ 95.477109] common_interrupt+0x85/0xa0 [ 95.478102] </IRQ> [ 95.478846] <TASK> [ 95.479603] asm_common_interrupt+0x26/0x40 [ 95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ 95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 [ 95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246 [ 95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000 [ 95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001 [ 95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3 [ 95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260 [ 95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000 [ 95.495035] acpi_safe_halt+0x14/0x20 [ 95.496127] acpi_idle_do_entry+0x2f/0x50 [ 95.497221] acpi_idle_enter+0x7f/0xd0 [ 95.498272] cpuidle_enter_state+0x81/0x420 [ 95.499375] cpuidle_enter+0x2d/0x40 [ 95.500400] do_idle+0x1e5/0x240 [ 95.501385] cpu_startup_entry+0x29/0x30 [ 95.502422] start_secondary+0x11c/0x140 [ 95.503454] common_startup_64+0x13e/0x141 [ 95.504466] </TASK> [ 95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse [ 95.516536] ---[ end trace 0000000000000000 ]--- Fixes: 6f4833383e85 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()") Signed-off-by: Matthias Stocker <mstocker@barracuda.com> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Ronak Doshi <ronak.doshi@broadcom.com> Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03r8152: Wake up the system if the we need a resetDouglas Anderson
If we get to the end of the r8152's suspend() routine and we find that the USB device is INACCESSIBLE then it means that some of our preparation for suspend didn't take place. We need a USB reset to get ourselves back in a consistent state so we can try again and that can't happen during system suspend. Call pm_wakeup_event() to wake the system up in this case. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/66590f25.170a0220.8b5ad.1752@mx.google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03r8152: If inaccessible at resume time, issue a resetDouglas Anderson
If we happened to get a USB transfer error during the transition to suspend then the usb_queue_reset_device() that r8152_control_msg() calls will get dropped on the floor. This is because usb_lock_device_for_reset() (which usb_queue_reset_device() uses) silently fails if it's called when a device is suspended or if too much time passes. Let's resolve this by resetting the device ourselves in r8152's resume() function. NOTE: due to timing, it's _possible_ that we could end up with two USB resets: the one queued previously and the one called from the resume() patch. This didn't happen in test cases I ran, though it's conceivably possible. We can't easily know if this happened since usb_queue_reset_device() can just silently drop the reset request. In any case, it's not expected that this is a problem since the two resets can't run at the same time (because of the device lock) and it should be OK to reset the device twice. If somehow the double-reset causes problems we could prevent resets from being queued up while suspend is running. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/66590f22.170a0220.8b5ad.1750@mx.google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03Merge tag 'cxl-fixes-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: - Compile fix for cxl-test from missing linux/vmalloc.h - Fix for memregion leaks in devm_cxl_add_region() * tag 'cxl-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Fix memregion leaks in devm_cxl_add_region() cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c
2024-06-03selftests/bpf: Drop duplicate bpf_map_lookup_elem in test_sockmapGeliang Tang
bpf_map_lookup_elem is invoked in bpf_prog3() already, no need to invoke it again. This patch drops it. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/ea8458462b876ee445173e3effb535fd126137ed.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Check length of recv in test_sockmapGeliang Tang
The value of recv in msg_loop may be negative, like EWOULDBLOCK, so it's necessary to check if it is positive before accumulating it to bytes_recvd. Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/5172563f7c7b2a2e953cef02e89fc34664a7b190.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Fix size of map_fd in test_sockmapGeliang Tang
The array size of map_fd[] is 9, not 8. This patch changes it as a more general form: ARRAY_SIZE(map_fd). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/0972529ee01ebf8a8fd2b310bdec90831c94be77.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Drop prog_fd array in test_sockmapGeliang Tang
The program fds can be got by using bpf_program__fd(progs[]), then prog_fd becomes useless. This patch drops it. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/9a6335e4d8dbab23c0d8906074457ceddd61e74b.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Replace tx_prog_fd with tx_prog in test_sockmapGeliang Tang
bpf_program__attach_sockmap() needs to take a parameter of type bpf_program instead of an fd, so tx_prog_fd becomes useless. This patch uses a pointer tx_prog to point to an item in progs[] array. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/23b37f932c547dd1ebfe154bbc0b0e957be21ee6.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Use bpf_link attachments in test_sockmapGeliang Tang
Switch attachments to bpf_link using bpf_program__attach_sockmap() instead of bpf_prog_attach(). This patch adds a new array progs[] to replace prog_fd[] array, set in populate_progs() for each program in bpf object. And another new array links[] to save the attached bpf_link. It is initalized as NULL in populate_progs, set as the return valuses of bpf_program__attach_sockmap(), and detached by bpf_link__detach(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/32cf8376a810e2e9c719f8e4cfb97132ed2d1f9c.1716446893.git.tanggeliang@kylinos.cn