summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-05Merge branch 'devlink-const'David S. Miller
Christophe JAILLET says: ==================== devlink: Constify struct devlink_dpipe_table_ops Patch 1 updates devl_dpipe_table_register() and struct devlink_dpipe_table to accept "const struct devlink_dpipe_table_ops". Then patch 2 updates the only user of this function. This is compile tested only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05mlxsw: spectrum_router: Constify struct devlink_dpipe_table_opsChristophe JAILLET
'struct devlink_dpipe_table_ops' are not modified in this driver. Constifying these structures moves some data to a read-only section, so increase overall security. On a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 15557 712 0 16269 3f8d drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.o After: ===== text data bss dec hex filename 15789 488 0 16277 3f95 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05devlink: Constify the 'table_ops' parameter of devl_dpipe_table_register()Christophe JAILLET
"struct devlink_dpipe_table_ops" only contains some function pointers. Update "struct devlink_dpipe_table" and the 'table_ops' parameter of devl_dpipe_table_register() so that structures in drivers can be constified. Constifying these structures will move some data to a read-only section, so increase overall security. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net: phy: aquantia: add support for PHY LEDsDaniel Golle
Aquantia Ethernet PHYs got 3 LED output pins which are typically used to indicate link status and activity. Add a minimal LED controller driver supporting the most common uses with the 'netdev' trigger as well as software-driven forced control of the LEDs. Signed-off-by: Daniel Golle <daniel@makrotopia.org> [ rework indentation, fix checkpatch error and improve some functions ] Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net: phy: aquantia: move priv and hw stat to headerChristian Marangi
In preparation for LEDs support, move priv and hw stat to header to reference priv struct also in other .c outside aquantia.main Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net: ethtool: remove unused struct 'cable_test_tdr_req_info'Dr. David Alan Gilbert
'cable_test_tdr_req_info' is unused since the original commit f2bc8ad31a7f ("net: ethtool: Allow PHY cable test TDR data to configured"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net: caif: remove unused structsDr. David Alan Gilbert
'cfpktq' has been unused since commit 73d6ac633c6c ("caif: code cleanup"). 'caif_packet_funcs' is declared but never defined. Remove both of them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net: remove NULL-pointer net parameter in ip_metrics_convertJason Xing
When I was doing some experiments, I found that when using the first parameter, namely, struct net, in ip_metrics_convert() always triggers NULL pointer crash. Then I digged into this part, realizing that we can remove this one due to its uselessness. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net: bridge: fix an inconsistent indentationChen Hanxiao
Smatch complains: net/bridge/br_netlink_tunnel.c: 318 br_process_vlan_tunnel_info() warn: inconsistent indenting Fix it with a proper indenting Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net: phy: Micrel KSZ8061: fix errata solution not taking effect problemTristram Ha
KSZ8061 needs to write to a MMD register at driver initialization to fix an errata. This worked in 5.0 kernel but not in newer kernels. The issue is the main phylib code no longer resets PHY at the very beginning. Calling phy resuming code later will reset the chip if it is already powered down at the beginning. This wipes out the MMD register write. Solution is to implement a phy resume function for KSZ8061 to take care of this problem. Fixes: 232ba3a51cc2 ("net: phy: Micrel KSZ8061: link failure after cable connect") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05dt-bindings: dsa: Rewrite Vitesse VSC73xx in schemaLinus Walleij
This rewrites the Vitesse VSC73xx DSA switches DT binding in schema. It was a bit tricky since I needed to come up with some way of applying the SPI properties only on SPI devices and not platform devices, but I figured something out that works. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05net/smc: avoid overwriting when adjusting sock bufsizesWen Gu
When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf to sysctl_tcp_wmem[1], since this may overwrite the value set by tcp_sndbuf_expand() in TCP connection establishment. And the other setting sk_{snd|rcv}buf to sysctl value in smc_adjust_sock_bufsizes() can also be omitted since the initialization of smc sock and clcsock has set sk_{snd|rcv}buf to smc.sysctl_{w|r}mem or ipv4_sysctl_tcp_{w|r}mem[1]. Fixes: 30c3c4a4497c ("net/smc: Use correct buffer sizes when switching between TCP and SMC") Link: https://lore.kernel.org/r/5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba.com Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>, too. Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05octeontx2-af: Always allocate PF entries from low prioriy zoneSubbaraya Sundeep
PF mcam entries has to be at low priority always so that VF can install longest prefix match rules at higher priority. This was taken care currently but when priority allocation wrt reference entry is requested then entries are allocated from mid-zone instead of low priority zone. Fix this and always allocate entries from low priority zone for PFs. Fixes: 7df5b4b260dd ("octeontx2-af: Allocate low priority entries for PF") Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05efi: Add missing __nocfi annotations to runtime wrappersArd Biesheuvel
The EFI runtime wrappers are a sandbox for calling into EFI runtime services, which are invoked using indirect calls. When running with kCFI enabled, the compiler will require the target of any indirect call to be type annotated. Given that the EFI runtime services prototypes and calling convention are governed by the EFI spec, not the Linux kernel, adding such type annotations for firmware routines is infeasible, and so the compiler must be informed that prototype validation should be omitted. Add the __nocfi annotation at the appropriate places in the EFI runtime wrapper code to achieve this. Note that this currently only affects 32-bit ARM, given that other architectures that support both kCFI and EFI use an asm wrapper to call EFI runtime services, and this hides the indirect call from the compiler. Fixes: 1a4fec49efe5 ("ARM: 9392/2: Support CLANG CFI") Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-06-05Revert "xsk: Document ability to redirect to any socket bound to the same umem"Magnus Karlsson
This reverts commit 968595a93669b6b4f6d1fcf80cf2d97956b6868f. Reported-by: Yuval El-Hanany <YuvalE@radware.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com Link: https://lore.kernel.org/bpf/20240604122927.29080-3-magnus.karlsson@gmail.com
2024-06-05Revert "xsk: Support redirect to any socket bound to the same umem"Magnus Karlsson
This reverts commit 2863d665ea41282379f108e4da6c8a2366ba66db. This patch introduced a potential kernel crash when multiple napi instances redirect to the same AF_XDP socket. By removing the queue_index check, it is possible for multiple napi instances to access the Rx ring at the same time, which will result in a corrupted ring state which can lead to a crash when flushing the rings in __xsk_flush(). This can happen when the linked list of sockets to flush gets corrupted by concurrent accesses. A quick and small fix is not possible, so let us revert this for now. Reported-by: Yuval El-Hanany <YuvalE@radware.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/xdp-newbies/8100DBDC-0B7C-49DB-9995-6027F6E63147@radware.com Link: https://lore.kernel.org/bpf/20240604122927.29080-2-magnus.karlsson@gmail.com
2024-06-05bpf: Set run context for rawtp test_run callbackJiri Olsa
syzbot reported crash when rawtp program executed through the test_run interface calls bpf_get_attach_cookie helper or any other helper that touches task->bpf_ctx pointer. Setting the run context (task->bpf_ctx pointer) for test_run callback. Fixes: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value") Reported-by: syzbot+3ab78ff125b7979e45f9@syzkaller.appspotmail.com Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Closes: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9 Link: https://lore.kernel.org/bpf/20240604150024.359247-1-jolsa@kernel.org
2024-06-05tpm: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Link: https://lore.kernel.org/all/20240520224620.9480-4-tony.luck@intel.com/ Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-06-05tpm_tis: Do *not* flush uninitialized workJan Beulich
tpm_tis_core_init() may fail before tpm_tis_probe_irq_single() is called, in which case tpm_tis_remove() unconditionally calling flush_work() is triggering a warning for .func still being NULL. Cc: stable@vger.kernel.org # v6.5+ Fixes: 481c2d14627d ("tpm,tpm_tis: Disable interrupts after 1000 unhandled IRQs") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2024-06-04Merge tag 'devicetree-fixes-for-6.10-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix regression in 'interrupt-map' handling affecting Apple M1 mini (at least) - Fix binding example warning in stm32 st,mlahb binding - Fix schema error in Allwinner platform binding causing lots of spurious warnings - Add missing MODULE_DESCRIPTION() to DT kunit tests * tag 'devicetree-fixes-for-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: of: property: Fix fw_devlink handling of interrupt-map of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw() dt-bindings: arm: stm32: st,mlahb: Drop spurious "reg" property from example dt-bindings: arm: sunxi: Fix incorrect '-' usage of: of_test: add MODULE_DESCRIPTION()
2024-06-04selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()Yonghong Song
I hit the following failure when running selftests with internal backported upstream kernel: test_ksyms:PASS:kallsyms_fopen 0 nsec test_ksyms:FAIL:ksym_find symbol 'bpf_link_fops' not found #123 ksyms:FAIL In /proc/kallsyms, we have $ cat /proc/kallsyms | grep bpf_link_fops ffffffff829f0cb0 d bpf_link_fops.llvm.12608678492448798416 The CONFIG_LTO_CLANG_THIN is enabled in the kernel which is responsible for bpf_link_fops.llvm.12608678492448798416 symbol name. In prog_tests/ksyms.c we have kallsyms_find("bpf_link_fops", &link_fops_addr) and kallsyms_find() compares "bpf_link_fops" with symbols in /proc/kallsyms in order to find the entry. With bpf_link_fops.llvm.<hash> in /proc/kallsyms, the kallsyms_find() failed. To fix the issue, in kallsyms_find(), if a symbol has suffix .llvm.<hash>, that suffix will be ignored for comparison. This fixed the test failure. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240604180034.1356016-1-yonghong.song@linux.dev
2024-06-04selftests/bpf: Fix bpf_cookie and find_vma in nested VMSong Liu
bpf_cookie and find_vma are flaky in nested VMs, which is used by some CI systems. It turns out these failures are caused by unreliable perf event in nested VM. Fix these by: 1. Use PERF_COUNT_SW_CPU_CLOCK in find_vma; 2. Increase sample_freq in bpf_cookie. Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org
2024-06-04Revert "ethernet: octeontx2: avoid linking objects into multiple modules"Jakub Kicinski
This reverts commit 727c94c9539aa8865cdbf6a783da6a6585f1fec2. Stephen reports that this commit causes a circular module dependency for him. Revert, and we'll try to address the problem, again. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/all/20240531152223.25591c8e@canb.auug.org.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-04Merge tag 'linux_kselftest-fixes-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "Fixes to build warnings in several tests and fixes to ftrace tests" * tag 'linux_kselftest-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/futex: don't pass a const char* to asprintf(3) selftests/futex: don't redefine .PHONY targets (all, clean) selftests/tracing: Fix event filter test to retry up to 10 times selftests/futex: pass _GNU_SOURCE without a value to the compiler selftests/overlayfs: Fix build error on ppc64 selftests/openat2: Fix build warnings on ppc64 selftests: cachestat: Fix build warnings on ppc64 tracing/selftests: Fix kprobe event name test for .isra. functions selftests/ftrace: Update required config selftests/ftrace: Fix to check required event file kselftest/alsa: Ensure _GNU_SOURCE is defined
2024-06-04Merge branch 'efi/next' into efi/urgentArd Biesheuvel
2024-06-04openvswitch: Remove generic .ndo_get_stats64Breno Leitao
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20240531111552.3209198-2-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04openvswitch: Move stats allocation to coreBreno Leitao
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Move openvswitch driver to leverage the core allocation. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240531111552.3209198-1-leitao@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04KVM: arm64: Ensure that SME controls are disabled in protected modeFuad Tabba
KVM (and pKVM) do not support SME guests. Therefore KVM ensures that the host's SME state is flushed and that SME controls for enabling access to ZA storage and for streaming are disabled. pKVM needs to protect against a buggy/malicious host. Ensure that it wouldn't run a guest when protected mode is enabled should any of the SME controls be enabled. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx formatFuad Tabba
When setting/clearing CPACR bits for EL0 and EL1, use the ELx format of the bits, which covers both. This makes the code clearer, and reduces the chances of accidentally missing a bit. No functional change intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVMFuad Tabba
Now that we have introduced finalize_init_hyp_mode(), lets consolidate the initializing of the host_data fpsimd_state and sve state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Eagerly restore host fpsimd/sve state in pKVMFuad Tabba
When running in protected mode we don't want to leak protected guest state to the host, including whether a guest has used fpsimd/sve. Therefore, eagerly restore the host state on guest exit when running in protected mode, which happens only if the guest has used fpsimd/sve. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVMFuad Tabba
Protected mode needs to maintain (save/restore) the host's sve state, rather than relying on the host kernel to do that. This is to avoid leaking information to the host about guests and the type of operations they are performing. As a first step towards that, allocate memory mapped at hyp, per cpu, for the host sve state. The following patch will use this memory to save/restore the host state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Specialize handling of host fpsimd state on trapFuad Tabba
In subsequent patches, n/vhe will diverge on saving the host fpsimd/sve state when taking a guest fpsimd/sve trap. Add a specialized helper to handle it. No functional change intended. Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helperFuad Tabba
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be toggled in different parts of the code, but the exact bits and their polarity differ between these two formats and the mode (vhe/nvhe/hvhe). To reduce the amount of duplicated code and the chance of getting the wrong bit/polarity or missing a field, abstract the set/clear of CPTR_EL2 bits behind a helper. Since (h)VHE is the way of the future, use the CPACR_EL1 format, which is a subset of the VHE CPTR_EL2, as a reference. No functional change intended. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_stateFuad Tabba
Since the prototypes for __sve_save_state/__sve_restore_state at hyp were added, the underlying macro has acquired a third parameter for saving/restoring ffr. Fix the prototypes to account for the third parameter, and restore the ffr for the guest since it is saved. Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Reintroduce __sve_save_stateFuad Tabba
Now that the hypervisor is handling the host sve state in protected mode, it needs to be able to save it. This reverts commit e66425fc9ba3 ("KVM: arm64: Remove unused __sve_save_state"). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04Merge branch 'tcp-refactor-skb_cmp_decrypted-checks'Paolo Abeni
Jakub Kicinski says: ==================== tcp: refactor skb_cmp_decrypted() checks Refactor the input patch coalescing checks and wrap "EOR forcing" logic into a helper. This will hopefully make the code easier to follow. While at it throw some DEBUG_NET checks into skb_shift(). ==================== Link: https://lore.kernel.org/r/20240530233616.85897-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net: skb: add compatibility warnings to skb_shift()Jakub Kicinski
According to current semantics we should never try to shift data between skbs which differ on decrypted or pp_recycle status. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04tcp: add a helper for setting EOR on tail skbJakub Kicinski
TLS (and hopefully soon PSP will) use EOR to prevent skbs with different decrypted state from getting merged, without adding new tests to the skb handling. In both cases once the connection switches to an "encrypted" state, all subsequent skbs will be encrypted, so a single "EOR fence" is sufficient to prevent mixing. Add a helper for setting the EOR bit, to make this arrangement more explicit. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04tcp: wrap mptcp and decrypted checks into tcp_skb_can_collapse_rx()Jakub Kicinski
tcp_skb_can_collapse() checks for conditions which don't make sense on input. Because of this we ended up sprinkling a few pairs of mptcp_skb_can_collapse() and skb_cmp_decrypted() calls on the input path. Group them in a new helper. This should make it less likely that someone will check mptcp and not decrypted or vice versa when adding new code. This implicitly adds a decrypted check early in tcp_collapse(). AFAIU this will very slightly increase our ability to collapse packets under memory pressure, not a real bug. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net: tls: fix marking packets as decryptedJakub Kicinski
For TLS offload we mark packets with skb->decrypted to make sure they don't escape the host without getting encrypted first. The crypto state lives in the socket, so it may get detached by a call to skb_orphan(). As a safety check - the egress path drops all packets with skb->decrypted and no "crypto-safe" socket. The skb marking was added to sendpage only (and not sendmsg), because tls_device injected data into the TCP stack using sendpage. This special case was missed when sendpage got folded into sendmsg. Fixes: c5c37af6ecad ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04Merge branch 'net-allow-dissecting-matching-tunnel-control-flags'Paolo Abeni
Davide Caratti says: ==================== net: allow dissecting/matching tunnel control flags Ilya says: "for correct matching on decapsulated packets, we should match on not only tunnel id and headers, but also on tunnel configuration flags like TUNNEL_NO_CSUM and TUNNEL_DONT_FRAGMENT. This is done to distinguish similar tunnels with slightly different configs. And it is important since tunnel configuration is flow based, i.e. can be different for every packet, even though the main tunnel port is the same." - patch 1 extends the kernel's flow dissector to extract these flags from the packet's tunnel metadata. - patch 2 extends TC flower to match on any combination of TUNNEL_NO_CSUM, TUNNEL_DONT_FRAGMENT, TUNNEL_OAM, TUNNEL_CRIT_OPT v4: - fix kernel-doc warning in flow_dissector.h (thanks Jakub) v3: - rebase on top of new uAPI bits and internals after commit 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps"). Use of network byte order is no more needed, since these bits match on metadata: convert netlink attributes to be u32. - also include TUNNEL_CRIT_OPT v2: - use NL_REQ_ATTR_CHECK() where possible (thanks Jamal) - don't overwrite 'ret' in the error path of fl_set_key_flags() ==================== Link: https://lore.kernel.org/r/cover.1717088241.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net/sched: cls_flower: add support for matching tunnel control flagsDavide Caratti
extend cls_flower to match TUNNEL_FLAGS_PRESENT bits in tunnel metadata. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04flow_dissector: add support for tunnel control flagsDavide Caratti
Dissect [no]csum, [no]dontfrag, [no]oam, [no]crit flags from skb metadata. This is a prerequisite for matching these control flags using TC flower. Suggested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-04net: count drops due to missing qdisc as dev->tx_dropsJakub Kicinski
Catching and debugging missing qdiscs is pretty tricky. When qdisc is deleted we replace it with a noop qdisc, which silently drops all the packets. Since the noop qdisc has a single static instance we can't count drops at the qdisc level. Count them as dev->tx_drops. ip netns add red ip link add type veth peer netns red ip link set dev veth0 up ip -netns red link set dev veth0 up ip a a dev veth0 10.0.0.1/24 ip -netns red a a dev veth0 10.0.0.2/24 ping -c 2 10.0.0.2 # 2 packets transmitted, 2 received, 0% packet loss, time 1031ms ip -s link show dev veth0 # TX: bytes packets errors dropped carrier collsns # 1314 17 0 0 0 0 tc qdisc replace dev veth0 root handle 1234: mq tc qdisc replace dev veth0 parent 1234:1 pfifo tc qdisc del dev veth0 parent 1234:1 ping -c 2 10.0.0.2 # 2 packets transmitted, 0 received, 100% packet loss, time 1034ms ip -s link show dev veth0 # TX: bytes packets errors dropped carrier collsns # 1314 17 0 3 0 0 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240529162527.3688979-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-03Merge branch ↵Alexei Starovoitov
'enable-bpf-programs-to-declare-arrays-of-kptr-bpf_rb_root-and-bpf_list_head' Kui-Feng Lee says: ==================== Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head. Some types, such as type kptr, bpf_rb_root, and bpf_list_head, are treated in a special way. Previously, these types could not be the type of a field in a struct type that is used as the type of a global variable. They could not be the type of a field in a struct type that is used as the type of a field in the value type of a map either. They could not even be the type of array elements. This means that they can only be the type of global variables or of direct fields in the value type of a map. The patch set aims to enable the use of these specific types in arrays and struct fields, providing flexibility. It examines the types of global variables or the value types of maps, such as arrays and struct types, recursively to identify these special types and generate field information for them. For example, ... struct task_struct __kptr *ptr[3]; ... it will create 3 instances of "struct btf_field" in the "btf_record" of the data section. [..., btf_field(offset=0x100, type=BPF_KPTR_REF), btf_field(offset=0x108, type=BPF_KPTR_REF), btf_field(offset=0x110, type=BPF_KPTR_REF), ... ] It creates a record of each of three elements. These three records are almost identical except their offsets. Another example is ... struct A { ... struct task_struct __kptr *task; struct bpf_rb_root root; ... } struct A foo[2]; it will create 4 records. [..., btf_field(offset=0x7100, type=BPF_KPTR_REF), btf_field(offset=0x7108, type=BPF_RB_ROOT:), btf_field(offset=0x7200, type=BPF_KPTR_REF), btf_field(offset=0x7208, type=BPF_RB_ROOT:), ... ] Assuming that the size of an element/struct A is 0x100 and "foo" starts at 0x7000, it includes two kptr records at 0x7100 and 0x7200, and two rbtree root records at 0x7108 and 0x7208. All these field information will be flatten, for struct types, and repeated, for arrays. --- Changes from v6: - Return BPF_KPTR_REF from btf_get_field_type() only if var_type is a struct type. - Pass btf and type to btf_get_field_type(). Changes from v5: - Ensure field->offset values of kptrs are advanced correctly from one nested struct/or array to another. Changes from v4: - Return -E2BIG for i == MAX_RESOLVE_DEPTH. Changes from v3: - Refactor the common code of btf_find_struct_field() and btf_find_datasec_var(). - Limit the number of levels looking into a struct types. Changes from v2: - Support fields in nested struct type. - Remove nelems and duplicate field information with offset adjustments for arrays. Changes from v1: - Move the check of element alignment out of btf_field_cmp() to btf_record_find(). - Change the order of the previous patch 4 "bpf: check_map_kptr_access() compute the offset from the reg state" as the patch 7 now. - Reject BPF_RB_NODE and BPF_LIST_NODE with nelems > 1. - Rephrase the commit log of the patch "bpf: check_map_access() with the knowledge of arrays" to clarify the alignment on elements. v6: https://lore.kernel.org/all/20240520204018.884515-1-thinker.li@gmail.com/ v5: https://lore.kernel.org/all/20240510011312.1488046-1-thinker.li@gmail.com/ v4: https://lore.kernel.org/all/20240508063218.2806447-1-thinker.li@gmail.com/ v3: https://lore.kernel.org/all/20240501204729.484085-1-thinker.li@gmail.com/ v2: https://lore.kernel.org/all/20240412210814.603377-1-thinker.li@gmail.com/ v1: https://lore.kernel.org/bpf/20240410004150.2917641-1-thinker.li@gmail.com/ Kui-Feng Lee (9): bpf: Remove unnecessary checks on the offset of btf_field. bpf: Remove unnecessary call to btf_field_type_size(). bpf: refactor btf_find_struct_field() and btf_find_datasec_var(). bpf: create repeated fields for arrays. bpf: look into the types of the fields of a struct type recursively. bpf: limit the number of levels of a nested struct type. selftests/bpf: Test kptr arrays and kptrs in nested struct fields. selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types. selftests/bpf: Test global bpf_list_head arrays. kernel/bpf/btf.c | 310 ++++++++++++------ kernel/bpf/verifier.c | 4 +- .../selftests/bpf/prog_tests/cpumask.c | 5 + .../selftests/bpf/prog_tests/linked_list.c | 12 + .../testing/selftests/bpf/prog_tests/rbtree.c | 47 +++ .../selftests/bpf/progs/cpumask_success.c | 171 ++++++++++ .../testing/selftests/bpf/progs/linked_list.c | 42 +++ tools/testing/selftests/bpf/progs/rbtree.c | 77 +++++ 8 files changed, 558 insertions(+), 110 deletions(-) ==================== Link: https://lore.kernel.org/r/20240523174202.461236-1-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03selftests/bpf: Test global bpf_list_head arrays.Kui-Feng Lee
Make sure global arrays of bpf_list_heads and fields of bpf_list_heads in nested struct types work correctly. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-10-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.Kui-Feng Lee
Make sure global arrays of bpf_rb_root and fields of bpf_rb_root in nested struct types work correctly. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-9-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03selftests/bpf: Test kptr arrays and kptrs in nested struct fields.Kui-Feng Lee
Make sure that BPF programs can declare global kptr arrays and kptr fields in struct types that is the type of a global variable or the type of a nested descendant field in a global variable. An array with only one element is special case, that it treats the element like a non-array kptr field. Nested arrays are also tested to ensure they are handled properly. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-8-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-06-03bpf: limit the number of levels of a nested struct type.Kui-Feng Lee
Limit the number of levels looking into struct types to avoid running out of stack space. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240523174202.461236-7-thinker.li@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>