Age | Commit message (Collapse) | Author |
|
The send engine should be ready to handle more opcodes
in addition to RDMA_WRITE/RDMA_READ.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add new WQE type: FLOW_TBL_ACCESS, which will be used for
writing modify header arguments.
This type has specific control segment and special data segment.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add enum value for modify-header argument object and mlx5_bits
for the related capabilities.
Signed-off-by: Muhammad Sammar <muhammads@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In STEv1 counter action can be set either by filling counter ID on STE, in
which case it is executed before other actions on this STE, or as a single
action, in which case it is executed in accordance with the actions order.
FW steering on STEv1 devices implements counter as counter ID on STE, and
this counter is set on the last STE.
Fix SMFS to be consistent with this behaviour - move TX counter to the
last STE, this way the counter will include all actions of the previous STEs
that might have changed packet headers length, e.g. encap, vlan push, etc.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Create a new profile for SFs in order to disable the command cache.
Each function command cache consumes ~500KB of memory, when using a
large number of SFs this savings is notable on memory constarined
systems.
Use a new profile to provide for future differences between SFs and PFs.
The mr_cache not used for non-PF functions, so it is excluded from the
new profile.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Pass target struct net_device to mdb attach/detach handler in order to
expose the port name to the new tracepoints. Implemented following
tracepoints:
- Attach mdb to port.
- Detach mdb from port.
Usage example:
># cd /sys/kernel/debug/tracing
># echo mlx5:mlx5_esw_bridge_port_mdb_attach >> set_event
># cat trace
...
kworker/0:0-19071 [000] ..... 259004.253848: mlx5_esw_bridge_port_mdb_attach: net_device=enp8s0f0_0 addr=33:33:ff:00:00:01 vid=0 num_ports=1 offloaded=1
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Implement support for add/del SWITCHDEV_OBJ_ID_PORT_MDB events. For mdb
destination addresses configure egress table rules to replicate to per-port
multicast tables of all ports that are member of the multicast group as
illustrated by 'MDB1' rule in the following diagram:
+--------+--+
+---------------------------------------> Port 1 | |
| +-^------+--+
| |
| |
+-----------------------------------------+ | +---------------------------+ |
| EGRESS table | | +--> PORT 1 multicast table | |
+----------------------------------+ +-----------------------------------------+ | | +---------------------------+ |
| INGRESS table | | | | | | | |
+----------------------------------+ | dst_mac=P1,vlan=X -> pop vlan, goto P1 +--+ | | FG0: | |
| | | dst_mac=P1,vlan=Y -> pop vlan, goto P1 | | | src_port=dst_port -> drop | |
| src_mac=M1,vlan=X -> goto egress +---> dst_mac=P2,vlan=X -> pop vlan, goto P2 +--+ | | FG1: | |
| ... | | dst_mac=P2,vlan=Y -> goto P2 | | | | VLAN X -> pop, goto port | |
| | | dst_mac=MDB1,vlan=Y -> goto mcast P1,P2 +-----+ | ... | |
+----------------------------------+ | | | | | VLAN Y -> pop, goto port +-------+
+-----------------------------------------+ | | | FG3: |
| | | matchall -> goto port |
| | | |
| | +---------------------------+
| |
| |
| | +--------+--+
+---------------------------------------> Port 2 | |
| +-^------+--+
| |
| |
| +---------------------------+ |
+--> PORT 2 multicast table | |
+---------------------------+ |
| | |
| FG0: | |
| src_port=dst_port -> drop | |
| FG1: | |
| VLAN X -> pop, goto port | |
| ... | |
| | |
| FG3: | |
| matchall -> goto port +-------+
| |
+---------------------------+
MDB is managed by extending mlx5 bridge to store an entry in
mlx5_esw_bridge->mdb_list linked list (used to iterate over all offloaded
MDBs) and mlx5_esw_bridge->mdb_ht hash table (used to lookup existing MDB
by MAC+VLAN). Every MDB entry can be attached to arbitrary amount of bridge
ports that are stored in mlx5_esw_bridge_mdb_entry->ports xarray in order
to allow both efficient lookup of the port and also iteration over all
ports that the entry is attached to. Every time MDB is attached/detached
to/from a port, the hardware rule is recreated with list of destinations
corresponding to all attached ports. When the entry is detached from the
last port it is removed from mdb and destroyed which means that the ports
xarray also acts as implicit reference counting mechanism.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When VLAN with 'untagged' flag is created on port also provision the
per-port multicast table rule to pop the VLAN during packet replication.
This functionality must be in per-port table because some subset of ports
that are member of multicast group can require just a match on VLAN (trunk
mode) while other subset can be configured to remove the VLAN tag from
packets received on the ports (access mode).
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Multicast replication requires adding one more level of FDB_BR_OFFLOAD
priority flow tables. The new level is used for per-port multicast-specific
tables that have following flow groups structure (flow highest to lowest
priority):
- Flow group of size one that matches on source port metadata. This will
have a static single rule that prevent packets from being replicated to
their source port.
- Flow group of size one that matches all packets and forwards them to the
port that owns the table.
Initialize the table dynamically on all bridge ports when adding a port to
the bridge that has multicast enabled and on all existing bridge ports when
receiving multicast enable notification.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Handle SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED attribute notification to
dynamically toggle bridge multicast offload. Set new
MLX5_ESW_BRIDGE_MCAST_FLAG bridge flag when multicast offload is enabled.
Put multicast-specific code into new bridge_mcast.c file.
When initializing bridge multicast pipeline create a static rule for
snooping on IGMP traffic and three rules for snooping on MLD traffic (for
query, report and done message types). Note that matching MLD traffic
requires having flexparser MLX5_FLEX_PROTO_ICMPV6 capability enabled.
By default Linux bridge is created with multicast enabled which can be
modified by 'mcast_snooping' argument:
$ ip link set name my_bridge type bridge mcast_snooping 0
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The pattern when function looks up a port by vport_num+vhca_id tuple in
order to just obtain its parent bridge is repeated multiple times in
bridge.c file. Further commits in this series use the pattern even more.
Extract the pattern to standalone mlx5_esw_bridge_from_port_lookup()
function to improve code readability.
This commits doesn't change functionality.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Following patches in series will require accessing flow tables and groups
sizes, table levels and struct mlx5_esw_bridge from new the new source file
dedicated to multicast code. Expose these data in bridge_priv.h to reduce
clutter in following patches that will implement the actual functionality.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Bridge ingress and egress tables got more flow groups recently for QinQ
support and will get more in following patches of this series. Increase the
sizes of the tables to allow offloading more flows in each mode.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add the required hardware definitions to mlx5_ifc: fdb_uplink_hairpin,
fdb_multi_path_any_table_limit_regc, fdb_multi_path_any_table.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This reverts commit b26cd9325be4c1fcd331b77f10acb627c560d4d7.
This patch introduces a regression on Lenovo Z13, which can't wake
from the lid with it applied; and some unspecified AMD based Dell
platforms are unable to wake from hitting the power button
Signed-off-by: Kornel Dulęba <korneld@chromium.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230411134932.292287-1-korneld@chromium.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
When loading a DT overlay that creates a device, the device is not
probed, unless the DT overlay is unloaded and reloaded again.
After the recent refactoring to improve fw_devlink, it no longer depends
on the "compatible" property to identify which device tree nodes will
become struct devices. fw_devlink now picks up dangling consumers
(consumers pointing to descendent device tree nodes of a device that
aren't converted to child devices) when a device is successfully bound
to a driver. See __fw_devlink_pickup_dangling_consumers().
However, during DT overlay, a device's device tree node can have
sub-nodes added/removed without unbinding/rebinding the driver. This
difference in behavior between the normal device instantiation and
probing flow vs. the DT overlay flow has a bunch of implications that
are pointed out elsewhere[1]. One of them is that the fw_devlink logic
to pick up dangling consumers is never exercised.
This patch solves the fw_devlink issue by marking all DT nodes added by
DT overlays with FWNODE_FLAG_NOT_DEVICE (fwnode that won't become
device), and by clearing the flag when a struct device is actually
created for the DT node. This way, fw_devlink knows not to have
consumers waiting on these newly added DT nodes, and to propagate the
dependency to an ancestor DT node that has the corresponding struct
device.
Based on a patch by Saravana Kannan, which covered only platform and spi
devices.
[1] https://lore.kernel.org/r/CAGETcx_bkuFaLCiPrAWCPQz+w79ccDp6=9e881qmK=vx3hBMyg@mail.gmail.com
Fixes: 4a032827daa89350 ("of: property: Simplify of_link_to_phandle()")
Link: https://lore.kernel.org/r/CAGETcx_+rhHvaC_HJXGrr5_WAd2+k5f=rWYnkCZ6z5bGX-wj4w@mail.gmail.com
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C
Acked-by: Shawn Guo <shawnguo@kernel.org>
Acked-by: Saravana Kannan <saravanak@google.com>
Tested-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Link: https://lore.kernel.org/r/e1fa546682ea4c8474ff997ab6244c5e11b6f8bc.1680182615.git.geert+renesas@glider.be
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The "compatible" doesn't match what the kernel is using. Fix it as
kernel using.
Fixes: 6b2748ada244 ("dt-bindings: interrupt-controller: add yaml for LoongArch CPU interrupt controller")
Reported-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/all/20221208020954.GA3368836-robh@kernel.org/
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Liu Peibao <liupeibao@loongson.cn>
Link: https://lore.kernel.org/r/20230401091304.12633-1-liupeibao@loongson.cn
[robh: Rename file to match compatible, fix subject typo]
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Provide pci_msix_can_alloc_dyn() stub when CONFIG_PCI_MSI unset to
avoid build errors (Reinette Chatre)
- Quirk AMD XHCI controller that loses MSI-X state in D3hot to avoid
broken USB after hotplug or suspend/resume (Basavaraj Natikar)
- Fix use-after-free in pci_bus_release_domain_nr() (Rob Herring)
* tag 'pci-v6.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: Fix use-after-free in pci_bus_release_domain_nr()
x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
PCI/MSI: Provide missing stub for pci_msix_can_alloc_dyn()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix fast checksum detection, this affects filesystems with non-crc32c
checksum, calculation would not be offloaded to worker threads
- restore thread_pool mount option behaviour for endio workers, the new
value for maximum active threads would not be set to the actual work
queues
* tag 'for-6.3-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix fast csum implementation detection
btrfs: restore the thread_pool= behavior in remount for the end I/O workqueues
|
|
Adding verifier test for accessing u32 pointer argument in
tracing programs.
The test program loads 1nd argument of bpf_fentry_test9 function
which is u32 pointer and checks that verifier allows that.
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230410085908.98493-3-zhoufeng.zf@bytedance.com
|
|
'priv' is a managed resource, so there is no need to free it explicitly or
there will be a double free().
Fixes: 90ad200b4cbc ("drm/armada: Use devm_drm_dev_alloc")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/c4f3c9207a9fce35cb6dd2cc60e755275961588a.1640536364.git.christophe.jaillet@wanadoo.fr
|
|
When tracing a kernel function with arg type is u32*, btf_ctx_access()
would report error: arg2 type INT is not a struct.
The commit bb6728d75611 ("bpf: Allow access to int pointer arguments
in tracing programs") added support for int pointer, but did not skip
modifiers before checking it's type. This patch fixes it.
Fixes: bb6728d75611 ("bpf: Allow access to int pointer arguments in tracing programs")
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230410085908.98493-2-zhoufeng.zf@bytedance.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Core fix:
- mtdblock: Tolerate corrected bit-flips
Raw NAND fixes:
- meson: Fix bitmask for length in command word
- stm32_fmc2:
- Remove unsupported EDO mode
- Use timings.mode instead of checking tRC_min.
The first patch is the real fix but nowadays we use
timings.mode instead of bare timings, so in order to ease the
backports, the fix was split into two steps, the first one easy
to backport on older kernels, the second one just as a
follow-up so recent stable kernels would look like the
mainline"
* tag 'mtd/fixes-for-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: meson: fix bitmask for length in command word
mtdblock: tolerate corrected bit-flips
mtd: rawnand: stm32_fmc2: use timings.mode instead of checking tRC_min
mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fix from Damien Le Moal:
- Update my email address in the MAINTAINERS file
* tag 'ata-6.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
MAINTAINERS: Change ata maintainer email addresses
|
|
Pull kvm fixes from Paolo Bonzini:
"Two ARM fixes:
- Ensure the guest PMU context is restored before the first KVM_RUN,
fixing an issue where EL0 event counting is broken after vCPU
save/restore
- Actually initialize ID_AA64PFR0_EL1.{CSV2,CSV3} based on the
sanitized, system-wide values for protected VMs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMs
KVM: arm64: PMU: Restore the guest's EL0 event counting after migration
|
|
Andrii Nakryiko says:
====================
This patch set changes BPF verifier log behavior to behave as a rotating log,
by default. If user-supplied log buffer is big enough to contain entire
verifier log output, there is no effective difference. But where previously
user supplied too small log buffer and would get -ENOSPC error result and the
beginning part of the verifier log, now there will be no error and user will
get ending part of verifier log filling up user-supplied log buffer. Which
is, in absolute majority of cases, is exactly what's useful, relevant, and
what users want and need, as the ending of the verifier log is containing
details of verifier failure and relevant state that got us to that failure.
So this rotating mode is made default, but for some niche advanced debugging
scenarios it's possible to request old behavior by specifying additional
BPF_LOG_FIXED (8) flag.
This patch set adjusts libbpf to allow specifying flags beyond 1 | 2 | 4. We
also add --log-size and --log-fixed options to veristat to be able to both
test this functionality manually, but also to be used in various debugging
scenarios. We also add selftests that tries many variants of log buffer size
to stress-test correctness of internal verifier log bookkeeping code.
Further, this patch set is merged with log_size_actual v1 patchset ([0]),
which adds ability to get required log buffer size to fit entire verifier
log output.
This addresses a long-standing limitation, which causes users and BPF loader
library writers to guess and pre-size log buffer, often allocating unnecessary
extra memory for this or doing extra program verifications just to size logs
better, ultimately wasting resources. This was requested most recently by Go
BPF library maintainers ([1]).
See respective patches for details. A bunch of them some drive-by fixes
detecting during working with the code. Some other further refactor and
compratmentalize verifier log handling code into kernel/bpf/log.c, which
should also make it simpler to integrate such verbose log for other
complicated bpf() syscall commands, if necessary. The rest are actual logic
to calculate maximum log buffer size needed and return it to user-space.
Few patches wire this on libbpf side, and the rest add selftests to test
proper log truncation and log_buf==NULL handling.
This turned into a pretty sizable patch set with lots of arithmetics, but
hopefully the set of features added to verifier log in this patch set are
both useful for BPF users and are self-contained and isolated enough to not
cause troubles going forward.
v3->v4:
- s/log_size_actual/log_true_size/ (Alexei);
- log_buf==NULL && log_size==0 don't trigger -ENOSPC (Lorenz);
- added WARN_ON_ONCE if we try bpf_vlog_reset() forward (Lorenz);
- added selftests for truncation in BPF_LOG_FIXED mode;
- fixed edge case in BPF_LOG_FIXED when log_size==1, leaving buf not zero
terminated;
v2->v3:
- typos and comment improvement (Lorenz);
- merged with log_size_actual v1 ([0]) patch set (Alexei);
- added log_buf==NULL condition allowed (Lorenz);
- added BPF_BTF_LOAD logs tests (Lorenz);
- more clean up and refactoring of internal verifier log API;
v1->v2:
- return -ENOSPC even in rotating log mode for preserving backwards
compatibility (Lorenz);
[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=735213&state=*
[1] https://lore.kernel.org/bpf/CAN+4W8iNoEbQzQVbB_o1W0MWBDV4xCJAq7K3f6psVE-kkCfMqg@mail.gmail.com/
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Add verifier log tests for BPF_BTF_LOAD command, which are very similar,
conceptually, to BPF_PROG_LOAD tests. These are two separate commands
dealing with verbose verifier log, so should be both tested separately.
Test that log_buf==NULL condition *does not* return -ENOSPC.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-20-andrii@kernel.org
|
|
Add few extra test conditions to validate that it's ok to pass
log_buf==NULL and log_size==0 to BPF_PROG_LOAD command with the intent
to get log_true_size without providing a buffer.
Test that log_buf==NULL condition *does not* return -ENOSPC.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-19-andrii@kernel.org
|
|
Add additional test cases validating that log_true_size is consistent
between fixed and rotating log modes, and that log_true_size can be
used *exactly* without causing -ENOSPC, while using just 1 byte shorter
log buffer would cause -ENOSPC.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-18-andrii@kernel.org
|
|
Similar to what we did for bpf_prog_load() in previous patch, wire
returning of log_true_size value from kernel back to the user through
OPTS out field.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-17-andrii@kernel.org
|
|
Add output-only log_true_size field to bpf_prog_load_opts to return
bpf_attr->log_true_size value back from bpf() syscall.
Note, that we have to drop const modifier from opts in bpf_prog_load().
This could potentially cause compilation error for some users. But
the usual practice is to define bpf_prog_load_ops
as a local variable next to bpf_prog_load() call and pass pointer to it,
so const vs non-const makes no difference and won't even come up in most
(if not all) cases.
There are no runtime and ABI backwards/forward compatibility issues at all.
If user provides old struct bpf_prog_load_opts, libbpf won't set new
fields. If old libbpf is provided new bpf_prog_load_opts, nothing will
happen either as old libbpf doesn't yet know about this new field.
Adding a new variant of bpf_prog_load() just for this seems like a big
and unnecessary overkill. As a corroborating evidence is the fact that
entire selftests/bpf code base required not adjustment whatsoever.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-16-andrii@kernel.org
|
|
Drop the log_size>0 and log_buf!=NULL condition when log_level>0. This
allows users to request log_true_size of a full log without providing
actual (even if small) log buffer. Verifier log handling code was mostly
ready to handle NULL log->ubuf, so only few small changes were necessary
to prevent NULL log->ubuf from causing problems.
Note, that if user provided NULL log_buf with log_level>0 we don't
consider this a log truncation, and thus won't return -ENOSPC.
We also enforce that either (log_buf==NULL && log_size==0) or
(log_buf!=NULL && log_size>0).
Suggested-by: Lorenz Bauer <lmb@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-15-andrii@kernel.org
|
|
Simplify internal verifier log API down to bpf_vlog_init() and
bpf_vlog_finalize(). The former handles input arguments validation in
one place and makes it easier to change it. The latter subsumes -ENOSPC
(truncation) and -EFAULT handling and simplifies both caller's code
(bpf_check() and btf_parse()).
For btf_parse(), this patch also makes sure that verifier log
finalization happens even if there is some error condition during BTF
verification process prior to normal finalization step.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-14-andrii@kernel.org
|
|
Add output-only log_true_size and btf_log_true_size field to
BPF_PROG_LOAD and BPF_BTF_LOAD commands, respectively. It will return
the size of log buffer necessary to fit in all the log contents at
specified log_level. This is very useful for BPF loader libraries like
libbpf to be able to size log buffer correctly, but could be used by
users directly, if necessary, as well.
This patch plumbs all this through the code, taking into account actual
bpf_attr size provided by user to determine if these new fields are
expected by users. And if they are, set them from kernel on return.
We refactory btf_parse() function to accommodate this, moving attr and
uattr handling inside it. The rest is very straightforward code, which
is split from the logging accounting changes in the previous patch to
make it simpler to review logic vs UAPI changes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-13-andrii@kernel.org
|
|
Change how we do accounting in BPF_LOG_FIXED mode and adopt log->end_pos
as *logical* log position. This means that we can go beyond physical log
buffer size now and be able to tell what log buffer size should be to
fit entire log contents without -ENOSPC.
To do this for BPF_LOG_FIXED mode, we need to remove a short-circuiting
logic of not vsnprintf()'ing further log content once we filled up
user-provided buffer, which is done by bpf_verifier_log_needed() checks.
We modify these checks to always keep going if log->level is non-zero
(i.e., log is requested), even if log->ubuf was NULL'ed out due to
copying data to user-space, or if entire log buffer is physically full.
We adopt bpf_verifier_vlog() routine to work correctly with
log->ubuf == NULL condition, performing log formatting into temporary
kernel buffer, doing all the necessary accounting, but just avoiding
copying data out if buffer is full or NULL'ed out.
With these changes, it's now possible to do this sort of determination of
log contents size in both BPF_LOG_FIXED and default rolling log mode.
We need to keep in mind bpf_vlog_reset(), though, which shrinks log
contents after successful verification of a particular code path. This
log reset means that log->end_pos isn't always increasing, so to return
back to users what should be the log buffer size to fit all log content
without causing -ENOSPC even in the presence of log resetting, we need
to keep maximum over "lifetime" of logging. We do this accounting in
bpf_vlog_update_len_max() helper.
A related and subtle aspect is that with this logical log->end_pos even in
BPF_LOG_FIXED mode we could temporary "overflow" buffer, but then reset
it back with bpf_vlog_reset() to a position inside user-supplied
log_buf. In such situation we still want to properly maintain
terminating zero. We will eventually return -ENOSPC even if final log
buffer is small (we detect this through log->len_max check). This
behavior is simpler to reason about and is consistent with current
behavior of verifier log. Handling of this required a small addition to
bpf_vlog_reset() logic to avoid doing put_user() beyond physical log
buffer dimensions.
Another issue to keep in mind is that we limit log buffer size to 32-bit
value and keep such log length as u32, but theoretically verifier could
produce huge log stretching beyond 4GB. Instead of keeping (and later
returning) 64-bit log length, we cap it at UINT_MAX. Current UAPI makes
it impossible to specify log buffer size bigger than 4GB anyways, so we
don't really loose anything here and keep everything consistently 32-bit
in UAPI. This property will be utilized in next patch.
Doing the same determination of maximum log buffer for rolling mode is
trivial, as log->end_pos and log->start_pos are already logical
positions, so there is nothing new there.
These changes do incidentally fix one small issue with previous logging
logic. Previously, if use provided log buffer of size N, and actual log
output was exactly N-1 bytes + terminating \0, kernel logic coun't
distinguish this condition from log truncation scenario which would end
up with truncated log contents of N-1 bytes + terminating \0 as well.
But now with log->end_pos being logical position that could go beyond
actual log buffer size, we can distinguish these two conditions, which
we do in this patch. This plays nicely with returning log_size_actual
(implemented in UAPI in the next patch), as we can now guarantee that if
user takes such log_size_actual and provides log buffer of that exact
size, they will not get -ENOSPC in return.
All in all, all these changes do conceptually unify fixed and rolling
log modes much better, and allow a nice feature requested by users:
knowing what should be the size of the buffer to avoid -ENOSPC.
We'll plumb this through the UAPI and the code in the next patch.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-12-andrii@kernel.org
|
|
Move log->level == 0 check into bpf_vlog_truncated() instead of doing it
explicitly. Also remove unnecessary goto in kernel/bpf/verifier.c.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-11-andrii@kernel.org
|
|
If verifier log is in BPF_LOG_KERNEL mode, no log->ubuf is expected and
it stays NULL throughout entire verification process. Don't erroneously
return -EFAULT in such case.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-10-andrii@kernel.org
|
|
btf_parse() is missing -EFAULT error return if log->ubuf was NULL-ed out
due to error while copying data into user-provided buffer. Add it, but
handle a special case of BPF_LOG_KERNEL in which log->ubuf is always NULL.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-9-andrii@kernel.org
|
|
Verifier log position reset is meaningless in BPF_LOG_KERNEL mode, so
just exit early in bpf_vlog_reset() if log->level is BPF_LOG_KERNEL.
This avoid meaningless put_user() into NULL log->ubuf.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-8-andrii@kernel.org
|
|
Add selftests validating BPF_LOG_FIXED behavior, which used to be the
only behavior, and now default rotating BPF verifier log, which returns
just up to last N bytes of full verifier log, instead of returning
-ENOSPC.
To stress test correctness of in-kernel verifier log logic, we force it
to truncate program's verifier log to all lengths from 1 all the way to
its full size (about 450 bytes today). This was a useful stress test
while developing the feature.
For both fixed and rotating log modes we expect -ENOSPC if log contents
doesn't fit in user-supplied log buffer.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-7-andrii@kernel.org
|
|
Add --log-size to be able to customize log buffer sent to bpf() syscall
for BPF program verification logging.
Add --log-fixed to enforce BPF_LOG_FIXED behavior for BPF verifier log.
This is useful in unlikely event that beginning of truncated verifier
log is more important than the end of it (which with rotating verifier
log behavior is the default now).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230406234205.323208-6-andrii@kernel.org
|
|
This basically prevents any forward compatibility. And we either way
just return -EINVAL, which would otherwise be returned from bpf()
syscall anyways.
Similarly, drop enforcement of non-NULL log_buf when log_level > 0. This
won't be true anymore soon.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-5-andrii@kernel.org
|
|
Currently, if user-supplied log buffer to collect BPF verifier log turns
out to be too small to contain full log, bpf() syscall returns -ENOSPC,
fails BPF program verification/load, and preserves first N-1 bytes of
the verifier log (where N is the size of user-supplied buffer).
This is problematic in a bunch of common scenarios, especially when
working with real-world BPF programs that tend to be pretty complex as
far as verification goes and require big log buffers. Typically, it's
when debugging tricky cases at log level 2 (verbose). Also, when BPF program
is successfully validated, log level 2 is the only way to actually see
verifier state progression and all the important details.
Even with log level 1, it's possible to get -ENOSPC even if the final
verifier log fits in log buffer, if there is a code path that's deep
enough to fill up entire log, even if normally it would be reset later
on (there is a logic to chop off successfully validated portions of BPF
verifier log).
In short, it's not always possible to pre-size log buffer. Also, what's
worse, in practice, the end of the log most often is way more important
than the beginning, but verifier stops emitting log as soon as initial
log buffer is filled up.
This patch switches BPF verifier log behavior to effectively behave as
rotating log. That is, if user-supplied log buffer turns out to be too
short, verifier will keep overwriting previously written log,
effectively treating user's log buffer as a ring buffer. -ENOSPC is
still going to be returned at the end, to notify user that log contents
was truncated, but the important last N bytes of the log would be
returned, which might be all that user really needs. This consistent
-ENOSPC behavior, regardless of rotating or fixed log behavior, allows
to prevent backwards compatibility breakage. The only user-visible
change is which portion of verifier log user ends up seeing *if buffer
is too small*. Given contents of verifier log itself is not an ABI,
there is no breakage due to this behavior change. Specialized tools that
rely on specific contents of verifier log in -ENOSPC scenario are
expected to be easily adapted to accommodate old and new behaviors.
Importantly, though, to preserve good user experience and not require
every user-space application to adopt to this new behavior, before
exiting to user-space verifier will rotate log (in place) to make it
start at the very beginning of user buffer as a continuous
zero-terminated string. The contents will be a chopped off N-1 last
bytes of full verifier log, of course.
Given beginning of log is sometimes important as well, we add
BPF_LOG_FIXED (which equals 8) flag to force old behavior, which allows
tools like veristat to request first part of verifier log, if necessary.
BPF_LOG_FIXED flag is also a simple and straightforward way to check if
BPF verifier supports rotating behavior.
On the implementation side, conceptually, it's all simple. We maintain
64-bit logical start and end positions. If we need to truncate the log,
start position will be adjusted accordingly to lag end position by
N bytes. We then use those logical positions to calculate their matching
actual positions in user buffer and handle wrap around the end of the
buffer properly. Finally, right before returning from bpf_check(), we
rotate user log buffer contents in-place as necessary, to make log
contents contiguous. See comments in relevant functions for details.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-4-andrii@kernel.org
|
|
It's not clear why we have 128 as minimum size, but it makes testing
harder and seems unnecessary, as we carefully handle truncation
scenarios and use proper snprintf variants. So remove this limitation
and just enforce positive length for log buffer.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-3-andrii@kernel.org
|
|
kernel/bpf/verifier.c file is large and growing larger all the time. So
it's good to start splitting off more or less self-contained parts into
separate files to keep source code size (somewhat) somewhat under
control.
This patch is a one step in this direction, moving some of BPF verifier log
routines into a separate kernel/bpf/log.c. Right now it's most low-level
and isolated routines to append data to log, reset log to previous
position, etc. Eventually we could probably move verifier state
printing logic here as well, but this patch doesn't attempt to do that
yet.
Subsequent patches will add more logic to verifier log management, so
having basics in a separate file will make sure verifier.c doesn't grow
more with new changes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/bpf/20230406234205.323208-2-andrii@kernel.org
|
|
There is an extra whitespace in the SPDX tag, before the license name,
in the script for generating man pages for the bpf() syscall and the
helpers. It has caused problems in Debian packaging, in the tool that
autodetects licenses. Let's clean it up.
Fixes: 5cb62b7598f2 ("bpf, docs: Use SPDX license identifier in bpf_doc.py")
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230411144747.66734-1-quentin@isovalent.com
|
|
This is an oversight from dc5bdb68b5b3 ("drm/fb-helper: Fix vt
restore") - I failed to realize that nasty userspace could set this.
It's not pretty to mix up kernel-internal and userspace uapi flags
like this, but since the entire fb_var_screeninfo structure is uapi
we'd need to either add a new parameter to the ->fb_set_par callback
and fb_set_par() function, which has a _lot_ of users. Or some other
fairly ugly side-channel int fb_info. Neither is a pretty prospect.
Instead just correct the issue at hand by filtering out this
kernel-internal flag in the ioctl handling code.
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Fixes: dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: shlomo@fastmail.com
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Noralf Trønnes <noralf@tronnes.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.7+
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Qiujun Huang <hqjagain@gmail.com>
Cc: Peter Rosin <peda@axentia.se>
Cc: linux-fbdev@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shigeru Yoshida <syoshida@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230404193934.472457-1-daniel.vetter@ffwll.ch
|
|
The two constants accomplish the same thing.
Signed-off-by: Shailend Chand <shailend@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230407184830.309398-1-shailend@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
cli.py currently throws a pure KeyError if kernel doesn't support
a netlink family. Users who did not write ynl (hah) may waste
their time investigating what's wrong with the Python code.
Improve the error message:
Traceback (most recent call last):
File "/home/kicinski/devel/linux/tools/net/ynl/lib/ynl.py", line 362, in __init__
self.family = GenlFamily(self.yaml['name'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kicinski/devel/linux/tools/net/ynl/lib/ynl.py", line 331, in __init__
self.genl_family = genl_family_name_to_id[family_name]
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'netdev'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kicinski/devel/linux/./tools/net/ynl/cli.py", line 52, in <module>
main()
File "/home/kicinski/devel/linux/./tools/net/ynl/cli.py", line 31, in main
ynl = YnlFamily(args.spec, args.schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kicinski/devel/linux/tools/net/ynl/lib/ynl.py", line 364, in __init__
raise Exception(f"Family '{self.yaml['name']}' not supported by the kernel")
Exception: Family 'netdev' not supported by the kernel
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230407145609.297525-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Unregister PTP clock when the driver is removed.
Purge the RX and TX skb queues.
Fixes: 514def5dd339 ("phy: nxp-c45-tja11xx: add timestamping support")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Radu Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230406095904.75456-1-radu-nicolae.pirea@oss.nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|