|
Steffen Klassert says:
====================
Support fraglist GRO/GSO
This patchset adds support to do GRO/GSO by chaining packets
of the same flow at the SKB frag_list pointer. This avoids
the overhead to merge payloads into one big packet, and
on the other end, if GSO is needed it avoids the overhead
of splitting the big packet back to the native form.
Patch 1 adds netdev feature flags to enable fraglist GRO,
this implements one of the configuration options discussed
at netconf 2019.
Patch 2 adds a netdev software feature set that defaults to off
and assigns the new fraglist GRO feature flag to it.
Patch 3 adds the core infrastructure to do fraglist GRO/GSO.
Patch 4 enables UDP to use fraglist GRO/GSO if configured.
I only have meaningful performance measurements for forwarding.
I did some tests for the local receive path with netperf and iperf,
but in this case the sender that generates the packets is the
bottleneck. So the benchmarks are not that meaningful for the
receive path.
Paolo Abeni did some benchmarks of the local receive path for the
RFC v2 version of this patchset; results can be found here:
https://www.spinics.net/lists/netdev/msg551158.html
I used my IPsec forwarding test setup for the performance measurements:
   ------------         ------------
-->| router 1 |-------->| router 2 |--
|  ------------         ------------  |
|                                     |
|       --------------------          |
--------|Spirent Testcenter|<----------
        --------------------
net-next (September 7th 2019):
Single stream UDP frame size 1460 Bytes: 1.161.000 fps (13.5 Gbps).
----------------------------------------------------------------------
net-next (September 7th 2019) + standard UDP GRO/GSO (not implemented
in this patchset):
Single stream UDP frame size 1460 Bytes: 1.801.000 fps (21 Gbps).
----------------------------------------------------------------------
net-next (September 7th 2019) + fraglist UDP GRO/GSO:
Single stream UDP frame size 1460 Bytes: 2.860.000 fps (33.4 Gbps).
=======================================================================
net-next (January 23rd 2020):
Single stream UDP frame size 1460 Bytes: 919.000 fps (10.73 Gbps).
----------------------------------------------------------------------
net-next (January 23rd 2020) + fraglist UDP GRO/GSO:
Single stream UDP frame size 1460 Bytes: 2.430.000 fps (28.38 Gbps).
-----------------------------------------------------------------------
Changes from RFC v1:
- Add IPv6 support.
- Split patchset to enable UDP GRO by default before adding
fraglist GRO support.
- Mark fraglist GRO packets as CHECKSUM_NONE.
- Take a refcount on the first segment skb when doing fraglist
segmentation. With this we can use the same error handling
path as with standard segmentation.
Changes from RFC v2:
- Add a netdev feature flag to configure listified GRO.
- Fix UDP GRO enabling for IPv6.
- Fix a rcu_read_lock() imbalance.
- Fix error path in skb_segment_list().
Changes from RFC v3:
- Rename NETIF_F_GRO_LIST to NETIF_F_GRO_FRAGLIST and add
NETIF_F_GSO_FRAGLIST.
- Move introduction of SKB_GSO_FRAGLIST to patch 2.
- Use udpv6_encap_needed_key instead of udp_encap_needed_key in IPv6.
- Move some misplaced code from patch 5 to patch 1, where it belongs.
Changes from RFC v4:
- Drop the 'UDP: enable GRO by default' patch for now. Standard UDP GRO
is not changed with this patchset.
- Rebase to net-next current.
Changes from v1 (December 18th):
- Do a full __copy_skb_header instead of trying to find the really
needed subset of header fields. This can be done later.
- Mark all fraglist GRO packets with CHECKSUM_UNNECESSARY.
- Rebase to net-next current.
Changes from v2 (January 24th):
- Do the CHECKSUM_UNNECESSARY setting from IPv4 for IPv6 too.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch extends UDP GRO to support fraglist GRO/GSO
by using the previously introduced infrastructure.
If the feature is enabled, all UDP packets are going to
fraglist GRO (local input and forward).
After validating the csum, we mark ip_summed as
CHECKSUM_UNNECESSARY for fraglist GRO packets to
make sure that the csum is not touched.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the core functions to chain/unchain
GSO skbs at the frag_list pointer. This also adds
a new GSO type SKB_GSO_FRAGLIST and an is_flist
flag to napi_gro_cb which indicates that this
flow will be GROed by fraglist chaining.
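The chaining idea can be illustrated with a small userspace sketch. This is a toy model only: `struct toy_pkt`, `toy_gro_chain()` and `toy_gso_unchain()` are hypothetical names, not the kernel's sk_buff API, and the real code also handles headers, checksums and reference counts. The point it shows is that aggregation only links packets together, and segmentation only unlinks them, so no payload is copied in either direction.

```c
#include <stddef.h>

/* Toy stand-in for an skb; all names here are hypothetical. */
struct toy_pkt {
    size_t len;                 /* payload bytes in this packet */
    size_t total_len;           /* head only: bytes in the whole chain */
    struct toy_pkt *next;       /* next packet in a list */
    struct toy_pkt *frag_list;  /* head only: first chained packet */
    struct toy_pkt *last;       /* head only: tail of the chain */
};

/* GRO side: hang pkt off the head's frag_list instead of merging payloads. */
void toy_gro_chain(struct toy_pkt *head, struct toy_pkt *pkt)
{
    pkt->next = NULL;
    if (!head->frag_list) {
        head->frag_list = pkt;
        head->total_len = head->len;
    } else {
        head->last->next = pkt;
    }
    head->last = pkt;
    head->total_len += pkt->len;
}

/* GSO side: turn the chain back into a plain list linked through ->next.
 * Returns the number of packets in the resulting list. */
size_t toy_gso_unchain(struct toy_pkt *head)
{
    size_t n = 1;

    head->next = head->frag_list;
    head->frag_list = NULL;
    head->last = NULL;
    head->total_len = head->len;
    for (struct toy_pkt *p = head->next; p; p = p->next)
        n++;
    return n;
}
```

Chaining three 1460-byte packets in this model gives a head with a total length of 4380 bytes, and unchaining yields the original three packets again.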
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous patch added the NETIF_F_GRO_FRAGLIST feature.
This is a software feature that should default to off.
Current software features default to on, so add a new
feature set that defaults to off.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds new Fraglist GRO/GSO feature flags. They will be used
to configure fraglist GRO/GSO, which will be implemented in some
follow-up patches.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recently XDP Support was added to the mvneta driver
for software buffer management only.
It is still possible to attach an XDP program if
hardware buffer management is used, but the program
does nothing in that case.
The patch disallows attaching XDP programs to mvneta
if hardware buffer management is used.
I am sorry about that. It is my first submission and I am having
some troubles with the format of my emails.
v4 -> v5:
- Remove extra tabs
v3 -> v4:
- Please ignore v3; I accidentally submitted
my other patch with git-send-email. v4 is correct.
v2 -> v3:
- My mailserver corrupted the patch
resubmission with git-send-email
v1 -> v2:
- Fixing the patches indentation
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The subpacket scanning loop in rxrpc_receive_data() references the
subpacket count in the private data part of the sk_buff in the loop
termination condition. However, when the final subpacket is pasted into
the ring buffer, the function no longer has a ref on the sk_buff and
should not be looking at sp->* any more. This point is actually marked in
the code when skb is cleared (but sp is not - which is an error).
Fix this by caching sp->nr_subpackets in a local variable and using that
instead.
Also clear 'sp' to catch accesses after that point.
This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets
trashed by the sk_buff getting freed and reused in the meantime.
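The pattern behind the fix can be sketched in plain C. This is a simplified model, not the rxrpc code: `struct toy_skb`, `toy_put()` and `toy_receive()` are hypothetical, and "freeing" is modelled by overwriting the count so the example stays well-defined. It shows why the loop bound has to be read once, while the reference is still held.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the sk_buff and its private data. */
struct toy_priv { unsigned nr_subpackets; };
struct toy_skb  { struct toy_priv priv; int refs; };

/* Dropping the last ref "frees" the buffer; the overwrite models reuse. */
static void toy_put(struct toy_skb *skb)
{
    if (--skb->refs == 0)
        skb->priv.nr_subpackets = 0xdeadu;
}

/* Returns the number of subpackets handled. */
unsigned toy_receive(struct toy_skb *skb)
{
    struct toy_priv *sp = &skb->priv;
    unsigned nr = sp->nr_subpackets;   /* cached while we still hold a ref */
    unsigned handled = 0;

    /* Using sp->nr_subpackets as the bound here would read trashed data
     * after the final iteration has given the buffer away. */
    for (unsigned j = 0; j < nr; j++) {
        handled++;
        if (j == nr - 1) {
            toy_put(skb);              /* ownership passes on */
            skb = NULL;
            sp = NULL;                 /* catch any later access */
        }
    }
    return handled;
}
```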
Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is possible for malicious userspace to set TCF_EM_SIMPLE bit
even for matches that should not have this bit set.
This can fool two places using tcf_em_is_simple():
1) tcf_em_tree_destroy() -> memory leak of em->data
if ops->destroy() is NULL
2) tcf_em_tree_dump() wrongly reports/leaks 4 low-order bytes
of a kernel pointer.
BUG: memory leak
unreferenced object 0xffff888121850a40 (size 32):
comm "syz-executor927", pid 7193, jiffies 4294941655 (age 19.840s)
hex dump (first 32 bytes):
00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000f67036ea>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<00000000f67036ea>] slab_post_alloc_hook mm/slab.h:586 [inline]
[<00000000f67036ea>] slab_alloc mm/slab.c:3320 [inline]
[<00000000f67036ea>] __do_kmalloc mm/slab.c:3654 [inline]
[<00000000f67036ea>] __kmalloc_track_caller+0x165/0x300 mm/slab.c:3671
[<00000000fab0cc8e>] kmemdup+0x27/0x60 mm/util.c:127
[<00000000d9992e0a>] kmemdup include/linux/string.h:453 [inline]
[<00000000d9992e0a>] em_nbyte_change+0x5b/0x90 net/sched/em_nbyte.c:32
[<000000007e04f711>] tcf_em_validate net/sched/ematch.c:241 [inline]
[<000000007e04f711>] tcf_em_tree_validate net/sched/ematch.c:359 [inline]
[<000000007e04f711>] tcf_em_tree_validate+0x332/0x46f net/sched/ematch.c:300
[<000000007a769204>] basic_set_parms net/sched/cls_basic.c:157 [inline]
[<000000007a769204>] basic_change+0x1d7/0x5f0 net/sched/cls_basic.c:219
[<00000000e57a5997>] tc_new_tfilter+0x566/0xf70 net/sched/cls_api.c:2104
[<0000000074b68559>] rtnetlink_rcv_msg+0x3b2/0x4b0 net/core/rtnetlink.c:5415
[<00000000b7fe53fb>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
[<00000000e83a40d0>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
[<00000000d62ba933>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
[<00000000d62ba933>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
[<0000000088070f72>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
[<00000000f70b15ea>] sock_sendmsg_nosec net/socket.c:639 [inline]
[<00000000f70b15ea>] sock_sendmsg+0x54/0x70 net/socket.c:659
[<00000000ef95a9be>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
[<00000000b650f1ab>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
[<0000000055bfa74a>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
[<000000002abac183>] __do_sys_sendmsg net/socket.c:2426 [inline]
[<000000002abac183>] __se_sys_sendmsg net/socket.c:2424 [inline]
[<000000002abac183>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+03c4738ed29d5d366ddf@syzkaller.appspotmail.com
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the seq_file .next function does not change the position index,
read after some lseek can generate an unexpected output.
See also: https://bugzilla.kernel.org/show_bug.cgi?id=206283
v1 -> v2: removed missed increment in end of function
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/eca84fdd-c374-a154-d874-6c7b55fc3bc4@virtuozzo.com
|
|
Include the size of struct nhmsg when calculating
how much of a payload to allocate in a new netlink nexthop
notification message.
Without this, we will fail to fill the skbuff at certain nexthop
group sizes.
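As a rough illustration of the size accounting, the sketch below adds up a family header plus two attributes using netlink's 4-byte alignment. It is deliberately simplified: the `toy_` structs mirror the 8-byte layouts of the UAPI structures but are defined locally, the attribute list is an assumption and is shorter than what the kernel really emits, and `toy_group_payload_size()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Netlink pads headers and attributes to 4-byte boundaries. */
#define TOY_ALIGN4(len)  (((size_t)(len) + 3u) & ~(size_t)3u)
#define TOY_ATTR_HDRLEN  4u  /* 2-byte length + 2-byte type */

/* Local mirrors of the family header and one group member (8 bytes each). */
struct toy_nhmsg  { uint8_t family, scope, protocol, resvd; uint32_t flags; };
struct toy_nh_grp { uint32_t id; uint8_t weight, resvd1; uint16_t resvd2; };

static size_t toy_attr_size(size_t payload)
{
    return TOY_ALIGN4(TOY_ATTR_HDRLEN + payload);
}

/* Payload bytes needed for a group notification with `members` entries.
 * include_header = 0 models the accounting before the fix. */
size_t toy_group_payload_size(size_t members, int include_header)
{
    size_t sz = include_header ? TOY_ALIGN4(sizeof(struct toy_nhmsg)) : 0;

    sz += toy_attr_size(sizeof(uint32_t));                     /* nexthop id */
    sz += toy_attr_size(members * sizeof(struct toy_nh_grp));  /* group */
    return sz;
}
```

In this model the two variants always differ by the 8-byte header. Whether that shortfall matters in practice depends on how much slack the allocation happens to have, which is consistent with the failure only appearing at certain group sizes.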
You can reproduce the failure with the following iproute2 commands:
ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link add dummy3 type dummy
ip link add dummy4 type dummy
ip link add dummy5 type dummy
ip link add dummy6 type dummy
ip link add dummy7 type dummy
ip link add dummy8 type dummy
ip link add dummy9 type dummy
ip link add dummy10 type dummy
ip link add dummy11 type dummy
ip link add dummy12 type dummy
ip link add dummy13 type dummy
ip link add dummy14 type dummy
ip link add dummy15 type dummy
ip link add dummy16 type dummy
ip link add dummy17 type dummy
ip link add dummy18 type dummy
ip link add dummy19 type dummy
ip ro add 1.1.1.1/32 dev dummy1
ip ro add 1.1.1.2/32 dev dummy2
ip ro add 1.1.1.3/32 dev dummy3
ip ro add 1.1.1.4/32 dev dummy4
ip ro add 1.1.1.5/32 dev dummy5
ip ro add 1.1.1.6/32 dev dummy6
ip ro add 1.1.1.7/32 dev dummy7
ip ro add 1.1.1.8/32 dev dummy8
ip ro add 1.1.1.9/32 dev dummy9
ip ro add 1.1.1.10/32 dev dummy10
ip ro add 1.1.1.11/32 dev dummy11
ip ro add 1.1.1.12/32 dev dummy12
ip ro add 1.1.1.13/32 dev dummy13
ip ro add 1.1.1.14/32 dev dummy14
ip ro add 1.1.1.15/32 dev dummy15
ip ro add 1.1.1.16/32 dev dummy16
ip ro add 1.1.1.17/32 dev dummy17
ip ro add 1.1.1.18/32 dev dummy18
ip ro add 1.1.1.19/32 dev dummy19
ip next add id 1 via 1.1.1.1 dev dummy1
ip next add id 2 via 1.1.1.2 dev dummy2
ip next add id 3 via 1.1.1.3 dev dummy3
ip next add id 4 via 1.1.1.4 dev dummy4
ip next add id 5 via 1.1.1.5 dev dummy5
ip next add id 6 via 1.1.1.6 dev dummy6
ip next add id 7 via 1.1.1.7 dev dummy7
ip next add id 8 via 1.1.1.8 dev dummy8
ip next add id 9 via 1.1.1.9 dev dummy9
ip next add id 10 via 1.1.1.10 dev dummy10
ip next add id 11 via 1.1.1.11 dev dummy11
ip next add id 12 via 1.1.1.12 dev dummy12
ip next add id 13 via 1.1.1.13 dev dummy13
ip next add id 14 via 1.1.1.14 dev dummy14
ip next add id 15 via 1.1.1.15 dev dummy15
ip next add id 16 via 1.1.1.16 dev dummy16
ip next add id 17 via 1.1.1.17 dev dummy17
ip next add id 18 via 1.1.1.18 dev dummy18
ip next add id 19 via 1.1.1.19 dev dummy19
ip next add id 1111 group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
ip next del id 1111
Fixes: 430a049190de ("nexthop: Add support for nexthop groups")
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
tools/testing/selftests/bpf/Makefile supports overriding clang, llc and
other tools so that custom ones can be used instead of those from PATH.
It's convenient and heavily used by some users.
Apply same rules to runqslower/Makefile.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200124224142.1833678-1-rdna@fb.com
|
|
In a complex TC class hierarchy like this:
tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \
avpkt 1000 cell 8
tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \
rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \
avpkt 1000 bounded
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
sport 80 0xffff flowid 1:3
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
sport 25 0xffff flowid 1:4
tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \
rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \
avpkt 1000
tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \
rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \
avpkt 1000
Here the filters are installed on qdisc 1:0, so we can't merely
search from class 1:1 when creating class 1:3 and class 1:4. We have
to walk through all the child classes of the direct parent qdisc.
Otherwise we would miss filters that need reverse binding.
Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current implementations of ops->bind_class() are merely
searching for classid and updating class in the struct tcf_result,
without invoking either of cl_ops->bind_tcf() or
cl_ops->unbind_tcf(). This breaks their design, as qdiscs
like cbq use them to count filters too. This is why syzbot triggered
the warning in cbq_destroy_class().
In order to fix this, we have to call cl_ops->bind_tcf() and
cl_ops->unbind_tcf() like the filter binding path. This patch does
so by refactoring out two helper functions __tcf_bind_filter()
and __tcf_unbind_filter(), which are lockless and accept a Qdisc
pointer, then teaching each implementation to call them correctly.
Note, we merely pass the Qdisc pointer as an opaque pointer to
each filter, they only need to pass it down to the helper
functions without understanding it at all.
Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class")
Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
This batch contains Netfilter updates for net-next:
1) Add nft_setelem_parse_key() helper function.
2) Add NFTA_SET_ELEM_KEY_END to specify a range with one single element.
3) Add NFTA_SET_DESC_CONCAT to describe the set element concatenation,
from Stefano Brivio.
4) Add bitmap_cut() to copy n-bits from source to destination,
from Stefano Brivio.
5) Add set to match on arbitrary concatenations, from Stefano Brivio.
6) Add selftest for this new set type. An extract of Stefano's
description follows:
"Existing nftables set implementations allow matching entries with
interval expressions (rbtree), e.g. 192.0.2.1-192.0.2.4, entries
specifying field concatenation (hash, rhash), e.g. 192.0.2.1:22,
but not both.
In other words, none of the set types allows matching on range
expressions for more than one packet field at a time, such as ipset
does with types bitmap:ip,mac, and, to a more limited extent
(netmasks, not arbitrary ranges), with types hash:net,net,
hash:net,port, hash:ip,port,net, and hash:net,port,net.
As a pure hash-based approach is unsuitable for matching on ranges,
and "proxying" the existing red-black tree type looks impractical as
elements would need to be shared and managed across all employed
trees, this new set implementation intends to fill the functionality
gap by employing a relatively novel approach.
The fundamental idea, illustrated in deeper detail in patch 5/9, is to
use lookup tables classifying a small number of grouped bits from each
field, and map the lookup results in a way that yields a verdict for
the full set of specified fields.
The grouping bit aspect is loosely inspired by the Grouper algorithm,
by Jay Ligatti, Josh Kuhn, and Chris Gage (see patch 5/9 for the full
reference).
A reference, stand-alone implementation of the algorithm itself is
available at:
https://pipapo.lameexcu.se
Some notes about possible future optimisations are also mentioned
there. This algorithm reduces the matching problem to, essentially,
a repetitive sequence of simple bitwise operations, and is
particularly suitable to be optimised by leveraging SIMD instruction
sets."
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This test covers functionality and stability of the newly added
nftables set implementation supporting concatenation of ranged
fields.
For some selected set expression types, test:
- correctness, by checking that packets match or don't
- concurrency, by attempting races between insertion, deletion, lookup
- timeout feature, checking that packets don't match expired entries
and (roughly) estimate matching rates, comparing to baselines for
simple drop on netdev ingress hook and for hash and rbtree sets.
In order to send packets, this needs one of sendip, netcat or bash.
To flood with traffic, iperf3, iperf and netperf are supported. For
performance measurements, this relies on the sample pktgen script
pktgen_bench_xmit_mode_netif_receive.sh.
If none of the tools suitable for a given test are available, specific
tests will be skipped.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This new set type allows for intervals in concatenated fields,
which are expressed in the usual way, that is, simple byte
concatenation with padding to 32 bits for single fields, and
given as ranges by specifying start and end elements containing,
each, the full concatenation of start and end values for the
single fields.
Ranges are expanded to composing netmasks, for each field: these
are inserted as rules in per-field lookup tables. Bits to be
classified are divided in 4-bit groups, and for each group, the
lookup table contains 2^4 = 16 buckets, representing all the possible
values of a bit group. This approach was inspired by the Grouper
algorithm:
http://www.cse.usf.edu/~ligatti/projects/grouper/
Matching is performed by a sequence of AND operations between
bucket values, with buckets selected according to the value of
packet bits, for each group. The result of this sequence tells
us which rules matched for a given field.
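A minimal single-field sketch of this lookup scheme follows. It assumes an 8-bit field (two 4-bit groups), at most 32 rules kept in one `uint32_t` bitmap, and one netmask per rule. The names `toy_field_table`, `toy_insert()` and `toy_lookup()` are hypothetical; the real implementation expands ranges into several netmasks and uses much larger bitmaps.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GROUPS  2   /* an 8-bit field split into two 4-bit groups */
#define TOY_BUCKETS 16  /* 2^4 possible values per group */

struct toy_field_table {
    /* bit r set in bucket[g][v]: rule r accepts value v for group g */
    uint32_t bucket[TOY_GROUPS][TOY_BUCKETS];
    uint32_t rules;     /* bitmap of inserted rules */
};

/* Insert rule number `rule` matching value/prefix_len (a netmask). */
void toy_insert(struct toy_field_table *t, unsigned rule,
                uint8_t value, unsigned prefix_len)
{
    assert(rule < 32 && prefix_len <= 8);
    uint8_t mask = (uint8_t)(0xFFu << (8 - prefix_len));

    for (unsigned g = 0; g < TOY_GROUPS; g++) {
        unsigned shift = 4 * (TOY_GROUPS - 1 - g);
        unsigned gmask = (mask >> shift) & 0xFu;
        unsigned gval = (value >> shift) & 0xFu;

        /* Set the rule bit in every bucket compatible with the mask. */
        for (unsigned b = 0; b < TOY_BUCKETS; b++)
            if ((b & gmask) == (gval & gmask))
                t->bucket[g][b] |= UINT32_C(1) << rule;
    }
    t->rules |= UINT32_C(1) << rule;
}

/* AND together the buckets selected by each 4-bit group of the key. */
uint32_t toy_lookup(const struct toy_field_table *t, uint8_t key)
{
    uint32_t res = t->rules;

    for (unsigned g = 0; g < TOY_GROUPS; g++) {
        unsigned shift = 4 * (TOY_GROUPS - 1 - g);
        res &= t->bucket[g][(key >> shift) & 0xFu];
    }
    return res;
}
```

With rule 0 covering 0x40/4 and rule 1 covering 0x48/6, a lookup of 0x49 returns both rule bits, 0x4C returns only rule 0, and 0x50 returns none. The lookup itself is just one table read and one AND per group.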
In order to concatenate several ranged fields, per-field rules
are mapped using mapping arrays, one per field, that specify
which rules should be considered while matching the next field.
The mapping array for the last field contains a reference to
the element originally inserted.
The notes in nft_set_pipapo.c cover the algorithm in deeper
detail.
A pure hash-based approach is of no use here, as ranges need
to be classified. An implementation based on "proxying" the
existing red-black tree set type, creating a tree for each
field, was considered, but deemed impractical due to the fact
that elements would need to be shared between trees, at least
as long as we want to keep UAPI changes to a minimum.
A stand-alone implementation of this algorithm is available at:
https://pipapo.lameexcu.se
together with notes about possible future optimisations
(in pipapo.c).
This algorithm was designed with data locality in mind, and can
be highly optimised for SIMD instruction sets, as the bulk of
the matching work is done with repetitive, simple bitwise
operations.
At this point, without further optimisations, nft_concat_range.sh
reports, for one AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB
L2$):
TEST: performance
net,port [ OK ]
baseline (drop from netdev hook): 10190076pps
baseline hash (non-ranged entries): 6179564pps
baseline rbtree (match on first field only): 2950341pps
set with 1000 full, ranged entries: 2304165pps
port,net [ OK ]
baseline (drop from netdev hook): 10143615pps
baseline hash (non-ranged entries): 6135776pps
baseline rbtree (match on first field only): 4311934pps
set with 100 full, ranged entries: 4131471pps
net6,port [ OK ]
baseline (drop from netdev hook): 9730404pps
baseline hash (non-ranged entries): 4809557pps
baseline rbtree (match on first field only): 1501699pps
set with 1000 full, ranged entries: 1092557pps
port,proto [ OK ]
baseline (drop from netdev hook): 10812426pps
baseline hash (non-ranged entries): 6929353pps
baseline rbtree (match on first field only): 3027105pps
set with 30000 full, ranged entries: 284147pps
net6,port,mac [ OK ]
baseline (drop from netdev hook): 9660114pps
baseline hash (non-ranged entries): 3778877pps
baseline rbtree (match on first field only): 3179379pps
set with 10 full, ranged entries: 2082880pps
net6,port,mac,proto [ OK ]
baseline (drop from netdev hook): 9718324pps
baseline hash (non-ranged entries): 3799021pps
baseline rbtree (match on first field only): 1506689pps
set with 1000 full, ranged entries: 783810pps
net,mac [ OK ]
baseline (drop from netdev hook): 10190029pps
baseline hash (non-ranged entries): 5172218pps
baseline rbtree (match on first field only): 2946863pps
set with 1000 full, ranged entries: 1279122pps
v4:
- fix build for 32-bit architectures: 64-bit division needs
div_u64() (kbuild test robot <lkp@intel.com>)
v3:
- rework interface for field length specification,
NFT_SET_SUBKEY disappears and information is stored in
description
- remove scratch area to store closing element of ranges,
as elements now come with an actual attribute to specify
the upper range limit (Pablo Neira Ayuso)
- also remove pointer to 'start' element from mapping table,
closing key is now accessible via extension data
- use bytes right away instead of bits for field lengths,
this way we can also double the inner loop of the lookup
function to take care of upper and lower bits in a single
iteration (minor performance improvement)
- make it clearer that set operations are actually atomic
API-wise, but we can't e.g. implement flush() as one-shot
action
- fix type for 'dup' in nft_pipapo_insert(), check for
duplicates only in the next generation, and in general take
care of differentiating generation mask cases depending on
the operation (Pablo Neira Ayuso)
- report C implementation matching rate in commit message, so
that AVX2 implementation can be compared (Pablo Neira Ayuso)
v2:
- protect access to scratch maps in nft_pipapo_lookup() with
local_bh_disable/enable() (Florian Westphal)
- drop rcu_read_lock/unlock() from nft_pipapo_lookup(), it's
already implied (Florian Westphal)
- explain why partial allocation failures don't need handling
in pipapo_realloc_scratch(), rename 'm' to clone and update
related kerneldoc to make it clear we're not operating on
the live copy (Florian Westphal)
- add explicit check for priv->start_elem in
nft_pipapo_insert() to avoid ending up in nft_pipapo_walk()
with a NULL start element, and also zero it out in every
operation that might make it invalid, so that insertion
doesn't proceed with an invalid element (Florian Westphal)
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The new bitmap function bitmap_cut() copies bits from source to
destination by removing the region specified by parameters first
and cut, and remapping the bits above the cut region by right
shifting them.
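The operation is easy to picture on a single 64-bit word. The helper below, `toy_cut64()`, is a hypothetical illustration of the described semantics and not the kernel function, which works on arbitrary-length bitmaps with a separate destination.

```c
#include <assert.h>
#include <stdint.h>

/* Remove `cut` bits starting at bit `first`, shifting higher bits down. */
uint64_t toy_cut64(uint64_t src, unsigned first, unsigned cut)
{
    assert(first < 64 && cut < 64);
    uint64_t low = (UINT64_C(1) << first) - 1;  /* bits below the cut region */

    /* Keep the low bits as-is; take everything else from src shifted
     * right by `cut`, which drops the cut region itself. */
    return (src & low) | ((src >> cut) & ~low);
}
```

For example, cutting 3 bits at position 2 out of 0xD6 (binary 11010110) keeps the low bits `10`, drops bits 2 to 4, and moves bits 5 to 7 down, giving 0x1A (binary 11010).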
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Introduce a new nested netlink attribute, NFTA_SET_DESC_CONCAT, used
to specify the length of each field in a set concatenation.
This allows set implementations to support concatenation of multiple
ranged items, as they can divide the input key into matching data for
every single field. Such set implementations would be selected as
they specify support for NFT_SET_INTERVAL and allow desc->field_count
to be greater than one. Explicitly disallow this for nft_set_rbtree.
In order to specify the interval for a set entry, userspace would
include in NFTA_SET_DESC_CONCAT attributes field lengths, and pass
range endpoints as two separate keys, represented by attributes
NFTA_SET_ELEM_KEY and NFTA_SET_ELEM_KEY_END.
While at it, export the number of 32-bit registers available for
packet matching, as nftables will need this to know the maximum
number of field lengths that can be specified.
For example, "packets with an IPv4 address between 192.0.2.0 and
192.0.2.42, with destination port between 22 and 25", can be
expressed as two concatenated elements:
NFTA_SET_ELEM_KEY: 192.0.2.0 . 22
NFTA_SET_ELEM_KEY_END: 192.0.2.42 . 25
and NFTA_SET_DESC_CONCAT attribute would contain:
NFTA_LIST_ELEM
NFTA_SET_FIELD_LEN: 4
NFTA_LIST_ELEM
NFTA_SET_FIELD_LEN: 2
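How the two field lengths map onto a concatenated key can be sketched as follows, assuming each field is padded to a multiple of 32 bits and that values are stored in network byte order. `struct toy_field` and `toy_concat_key()` are hypothetical helpers for illustration, not part of the netlink API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One field of a concatenation: raw bytes plus the unpadded length. */
struct toy_field {
    const uint8_t *data;
    size_t len;
};

/* Concatenate fields into out, padding each one to 4 bytes with zeros.
 * Returns the key length in bytes, or 0 if out is too small. */
size_t toy_concat_key(const struct toy_field *fields, size_t n,
                      uint8_t *out, size_t out_len)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++) {
        size_t padded = (fields[i].len + 3u) & ~(size_t)3u;

        if (padded > out_len - off)
            return 0;
        memcpy(out + off, fields[i].data, fields[i].len);
        memset(out + off + fields[i].len, 0, padded - fields[i].len);
        off += padded;
    }
    return off;
}
```

For the start element "192.0.2.0 . 22", the 4-byte address and the 2-byte port produce an 8-byte key in this model, with the last two bytes being padding. The end element "192.0.2.42 . 25" is built the same way.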
v4: No changes
v3: Complete rework, NFTA_SET_DESC_CONCAT instead of NFTA_SET_SUBKEY
v2: No changes
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add NFTA_SET_ELEM_KEY_END attribute to convey the closing element of the
interval between kernel and userspace.
This patch also adds the NFT_SET_EXT_KEY_END extension to store the
closing element value in this interval.
v4: No changes
v3: New patch
[sbrivio: refactor error paths and labels; add corresponding
nft_set_ext_type for new key; rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Add helper function to parse the set element key netlink attribute.
v4: No changes
v3: New patch
[sbrivio: refactor error paths and labels; use NFT_DATA_VALUE_MAXLEN
instead of sizeof(*key) in helper, value can be longer than that;
rebase]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ath.git patches for v5.6. Major changes:
ar5523
* add support for SMCWUSBT-G2 USB device
|
|
The loop counter addr is a u16, whereas the upper limit of the loop
is an int. In the unlikely event that il->cfg->eeprom_size is
greater than 64K, we end up with an infinite loop, since addr will
wrap around and never reach the upper loop limit. Fix this by making addr
an int.
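The wrap-around is easy to reproduce in userspace. The sketch below is not the driver loop: the function names are hypothetical, the step of 2 is an assumption, and a step cap is added so the otherwise infinite loop can be observed safely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Loop with a 16-bit counter; returns false if it is still running
 * after max_steps iterations. */
bool toy_u16_loop_terminates(int limit, long max_steps)
{
    uint16_t addr = 0;
    long steps = 0;

    while (addr < limit) {
        addr += 2;              /* wraps back to 0 after 65534 */
        if (++steps > max_steps)
            return false;
    }
    return true;
}

/* Same loop with an int counter, as in the fix. */
bool toy_int_loop_terminates(int limit, long max_steps)
{
    int addr = 0;
    long steps = 0;

    while (addr < limit) {
        addr += 2;
        if (++steps > max_steps)
            return false;
    }
    return true;
}
```

With a limit of 70000, the 16-bit counter never gets past 65534 and the loop only stops because of the cap, while the int version finishes after 35000 steps.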
Addresses-Coverity: ("Infinite loop")
Fixes: be663ab67077 ("iwlwifi: split the drivers for agn and legacy devices 3945/4965")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
There is a spelling mistake in one of the fields in the btc_coexist struct.
Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:16:18:
warning: ofdmswing_table defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:56:17:
warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c:92:17:
warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=]
These variables are never used, so remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:15:18:
warning: ofdmswing_table defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:61:17:
warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8192ee/dm.c:97:17:
warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=]
These variables are never used, so remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:142:17:
warning: cckswing_table_ch1ch13 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:178:17:
warning: cckswing_table_ch14 defined but not used [-Wunused-const-variable=]
drivers/net/wireless/realtek/rtlwifi/rtl8821ae/dm.c:96:18:
warning: ofdmswing_table defined but not used [-Wunused-const-variable=]
These variables are never used, so remove them.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Variable cond is being assigned a value that is never
read; it is assigned a new value later on. The assignment is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Add support for 11ax features: TWT responder and spatial reuse.
Add separate structure for spatial reuse parameters and pass this
structure to firmware along with other parameters in start_ap
command. Pass TWT responder value to firmware. Bump qlink
protocol version.
Signed-off-by: Mikhail Karpenko <mkarpenko@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Add HE rates into STA info. Report HE Rx/Tx MCS if STA supports them.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Bridging qtnfmac interfaces is possible only if the following two
conditions are fulfilled:
- firmware advertises proper support with QLINK_HW_CAPAB_HW_BRIDGE
- kernel is built with CONFIG_NET_SWITCHDEV support
Otherwise adding qtnfmac wireless interfaces into the same bridge
should not be allowed since packets flooded by kernel may break
internal forwarding rules between interfaces.
This patch disables adding qtnfmac wireless interfaces into the
same bridge if no support is provided either by card or by kernel.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Firmware may support DFS offload. However the final decision on whether
to use it or not should be up to the user. So even if firmware supports
DFS offload, it should be enabled only if user explicitly requests it.
For this purpose introduce kernel param dfs_offload which is disabled
by default.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Currently this parameter is global, it is not specific to mac.
So this function does not need any input parameters.
Signed-off-by: Sergey Matyukevich <sergey.matyukevich.os@quantenna.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
idx is declared as u32, so it can never be less than 0.
Signed-off-by: yuehaibing <yuehaibing@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
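
To illustrate why the lower-bound test is redundant, here is a small sketch using `uint32_t` in place of the kernel's u32. The table size and the helper name are assumptions made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table size, chosen only for this illustration. */
#define TABLE_SIZE 8u

/*
 * With an unsigned index only the upper bound needs checking.
 * A comparison such as "idx < 0" is always false for uint32_t,
 * and a negative value converted to uint32_t becomes a large
 * positive number that the upper-bound check already rejects.
 */
static bool idx_is_valid(uint32_t idx)
{
	return idx < TABLE_SIZE;
}
```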
|
|
When a TX packet arrives, the driver should leave deep PS state to make
sure the DMA is working. After requesting to leave deep PS state, the
driver needs to poll the PS state to check whether the mode has been
changed successfully. The driver used to check the state of the
hardware every 20 msecs, which means that upon the first failed
state check, the CPU is delayed 20 msecs before the next check. This is
harmful for some time-sensitive applications such as media players.
So, shorten the delay between checks from 20 msecs to 100 usecs.
The state should change within several tries, but we still need
to reserve ~15 msecs in total in case the state takes too
long to change. If the states of the driver and the
hardware are not synchronized, the power state could be locked
forever, which means we could never enter/leave the PS state.
Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Reviewed-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
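
A rough sketch of the polling budget described above: checking every 100 usecs with about 15 msecs in total works out to 150 tries. The `fake_hw` structure and both function names are hypothetical, and the delay is only accumulated in a counter rather than actually slept.

```c
#include <stdbool.h>

#define POLL_DELAY_US  100    /* delay between checks, from the text above */
#define POLL_BUDGET_US 15000  /* ~15 msecs in total, from the text above */
#define POLL_MAX_TRIES (POLL_BUDGET_US / POLL_DELAY_US) /* 150 tries */

/* Hypothetical stand-in for the hardware power-save state. */
struct fake_hw {
	int checks_until_ready;  /* reads that still report "not changed" */
	unsigned int waited_us;  /* total simulated delay so far */
};

static bool fake_hw_state_changed(struct fake_hw *hw)
{
	if (hw->checks_until_ready > 0) {
		hw->checks_until_ready--;
		return false;
	}
	return true;
}

/* Poll with a short delay; give up once the total budget is spent. */
static bool poll_ps_state(struct fake_hw *hw)
{
	for (int tries = 0; tries < POLL_MAX_TRIES; tries++) {
		if (fake_hw_state_changed(hw))
			return true;
		hw->waited_us += POLL_DELAY_US; /* stands in for a real delay */
	}
	return false;
}
```

With a short per-check delay, a state change that completes after a few reads costs only a few hundred microseconds instead of a full 20 msecs.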
|
|
Sometimes the TX queue may be empty, in which case we could
dequeue a NULL pointer and crash the kernel. If the skb is NULL
then there is nothing to do, so just leave the ISR.
The TX queue should not be empty here, so print an error
to help see whether anything is wrong with the DMA ring.
Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver")
Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
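
The guard described above can be sketched with a simple linked queue. This is not the driver's code: `struct pkt`, `struct pkt_queue` and both functions are hypothetical stand-ins for the skb queue and the TX completion handler.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical packet and queue types standing in for skb queues. */
struct pkt {
	struct pkt *next;
	int len;
};

struct pkt_queue {
	struct pkt *head;
};

/* Returns the first packet, or NULL when the queue is empty. */
static struct pkt *pkt_dequeue(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (p) {
		q->head = p->next;
		p->next = NULL;
	}
	return p;
}

/*
 * Completion handler sketch: check the dequeued pointer before use.
 * Returns the packet length, or -1 if the queue was unexpectedly empty.
 */
static int tx_complete_one(struct pkt_queue *q)
{
	struct pkt *p = pkt_dequeue(q);

	if (!p) {
		fprintf(stderr, "TX queue unexpectedly empty\n");
		return -1;
	}
	return p->len;
}
```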
|
|
Report monitor interface availability using cfg80211 and support it in
the add_virtual_intf() and del_virtual_intf() callbacks. This new
feature is conditional and depends on firmware flagging monitor packets.
Receiving monitor frames is already handled by the brcmf_netif_mon_rx().
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Move similar/duplicated code out of combination-specific code blocks.
This simplifies the code a bit and allows adding more combinations later.
The list of combinations remains unchanged.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Commit 262f2b53f679 ("brcmfmac: call brcmf_attach() just before calling
brcmf_bus_started()") changed the initialization order of the brcmfmac
SDIO driver. Unfortunately since brcmf_sdiod_intr_register() is now
called before the sdiodev->bus_if initialization, it reads the wrong
chip ID and fails to initialize the GPIO on brcm43362. Thus the chip
cannot send interrupts and fails to probe:
[ 12.517023] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 12.531214] ieee80211 phy0: brcmf_bus_started: failed: -110
[ 12.536976] ieee80211 phy0: brcmf_attach: dongle is not responding: err=-110
[ 12.566467] brcmfmac: brcmf_sdio_firmware_callback: brcmf_attach failed
Initialize the bus interface earlier to ensure that
brcmf_sdiod_intr_register() properly sets up the OOB interrupt.
BugLink: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908438
Fixes: 262f2b53f679 ("brcmfmac: call brcmf_attach() just before calling brcmf_bus_started()")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Fixes coccicheck warning:
drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c:911:2-24: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
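
For readers unfamiliar with this class of warning, the following sketch shows the style the check asks for: boolean variables are assigned `true` or `false` instead of 1 or 0. The structure and function names are made up for illustration and do not come from fwsignal.c.

```c
#include <stdbool.h>

/* Hypothetical state with a boolean flag. */
struct flow_ctl {
	bool blocked;
};

static void flow_block(struct flow_ctl *fc)
{
	fc->blocked = true;   /* rather than: fc->blocked = 1; */
}

static void flow_unblock(struct flow_ctl *fc)
{
	fc->blocked = false;  /* rather than: fc->blocked = 0; */
}
```

The behavior is identical either way; the change only makes the intent of the assignment explicit.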
|
|
Fixes coccicheck warning:
drivers/net/wireless/st/cw1200/txrx.c:718:6-16: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Fixes coccicheck warning:
drivers/net/wireless/realtek/rtw88/phy.c:1437:1-24: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Some of the functions which were exposed in sw.h are only used in sw.c, so
just make them static. This makes sw.h unnecessary, so remove it.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Some of the functions which were exposed in sw.h are only used in sw.c, so
just make them static. This makes sw.h unnecessary, so remove it.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Some of the functions which were exposed in sw.h are only used in sw.c, so
just make them static. This makes sw.h unnecessary, so remove it.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
It has one define, which is already defined in an include from reg.h.
None of the declared functions are implemented anywhere; sw.c has
ones with similar names, which are already static.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Some of the functions which were exposed in sw.h are only used in sw.c, so
just make them static. This makes sw.h unnecessary, so remove it.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
It has one define, which is already defined in an include from reg.h.
All functions are declared in their own headers and included in *.c
files belonging to them.
This makes sw.h unnecessary, so we can remove it.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Some of the functions which were exposed in sw.h are only used in sw.c, so
just make them static. The rtl92c_init_var_map function is declared in
sw.h but not defined anywhere. Two other functions are also declared
in phy.h (which is included in sw.c) and their definitions are in phy.c.
Overall sw.h is unnecessary and can be removed.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Some of the functions which were exposed in sw.h are only used in sw.c, so
just make them static. This makes sw.h unnecessary, so remove it.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
|
Replace USB_VENDER_ID_REALTEK with USB_VENDOR_ID_REALTEK.
Signed-off-by: Amadeusz Sławiński <amade@asmblr.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|