summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-01-19nfp: add TLV capabilities to the BARJakub Kicinski
NFP is entirely programmable, including the PCI data interface. Using a fixed control BAR layout certainly makes implementations easier, but require careful considerations when space is allocated. Once BAR area is allocated to one feature nothing else can use it. Allocating space statically also requires it to be sized upfront, which leads to either unnecessary limitation or wastage. We currently have a 32bit capability word defined which tells drivers which application FW features are supported. Most of the bits are exhausted. The same bits are also reused for enabling specific features. Bulk of capabilities don't have a need for an enable bit, however, leading to confusion and wastage. TLVs seems like a better fit for expressing capabilities of applications running on programmable hardware. This patch leaves the front of the BAR as is, and declares a TLV capability start at offset 0x58. Most of the space up to 0x0d90 is already allocated, but the used space can be wrapped with RESERVED TLVs. E.g.: Address Type Length 0x0058 RESERVED 0xe00 /* Wrap basic structures */ 0x0e5c FEATURE_A 0x004 0x0e64 FEATURE_B 0x004 0x0e6c RESERVED 0x990 /* Wrap qeueue stats */ 0x1800 FEATURE_C 0x100 Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19nfp: improve app not found messageJakub Kicinski
When driver app matching loaded FW is not found users are faced with: nfp: failed to find app with ID 0x%02x This message does not properly explain that matching driver code is either not built into the driver or the driver is too old. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19nfp: protect each repr pointer individually with RCUJakub Kicinski
Representors are grouped in sets by type. Currently the whole sets are under RCU protection, but individual representor pointers are not. This causes some inconveniences when representors have to be destroyed, because we have to allocate new sets to remove any representors. Protect the individual pointers with RCU. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19nfp: add nfp_reprs_get_locked() helperJakub Kicinski
The write side of repr tables is always done under pf->lock. Add a helper to dereference repr table pointers under protection of that lock. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19nfp: register devlink after app is createdJakub Kicinski
Devlink used to have two global locks: devlink lock and port lock, our lock ordering looked like this: devlink lock -> driver's pf->lock -> devlink port lock After recent changes port lock was replaced with per-instance lock. Unfortunately, new per-instance lock is taken on most operations now. This means we can only grab the pf->lock from the port split/unsplit ops. Lock ordering looks like this: devlink lock -> driver's pf->lock -> devlink instance lock Since we can't take pf->lock from most devlink ops, make sure nfp_apps are prepared to service them as soon as devlink is registered. Locking the pf must be pushed down after nfp_app_init() callback. The init order looks like this: nfp_app_init devlink_register nfp_app_start netdev/port_register As soon as app_init is done nfp_apps must be ready to service devlink-related callbacks. apps can only register their own devlink objects from nfp_app_start. Fixes: 2406e7e546b2 ("devlink: Add per devlink instance lock") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19nfp: release global resources only on the remove pathJakub Kicinski
NFP app is currently shut down as soon as all the vNICs are gone. This means we can't depend on the app existing throughout the lifetime of the device. Free the app only from PCI remove path. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19nfp: core: make scalar CPP helpers fail on short accessesJakub Kicinski
Currently the helpers for accessing 4 or 8 byte values over the CPP bus return the length of IO on success. If the IO was short caller has to deal with error handling. The short IO for 4/8B values is completely impractical. Make the helpers return an error if full access was not possible. Fix the few places which are actually dealing with errors correctly, most call sites already only deal with negative return codes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge branch 'tcp-min-rtt'David S. Miller
Yuchung Cheng says: ==================== tcp: do not use RTT from delayed ACKs for min-RTT This patch set prevents TCP sender from using RTT samples from (suspected) delayed ACKs as the minimum RTT, to avoid unbounded over-estimation of the network path delay. This issue is common when a connection has extended periods of one packet chit-chat beyond the min RTT filter window. The first patch does that for TCP general min RTT estimation. The second patch addresses specifically the BBR congestion control's min RTT filter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBRYuchung Cheng
A persistent connection may send tiny amount of data (e.g. health-check) for a long period of time. BBR's windowed min RTT filter may only see RTT samples from delayed ACKs causing BBR to grossly over-estimate the path delay depending how much the ACK was delayed at the receiver. This patch skips RTT samples that are likely coming from delayed ACKs. Note that it is possible the sender never obtains a valid measure to set the min RTT. In this case BBR will continue to set cwnd to initial window which seems fine because the connection is thin stream. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19tcp: avoid min-RTT overestimation from delayed ACKsYuchung Cheng
This patch avoids having TCP sender or congestion control overestimate the min RTT by orders of magnitude. This happens when all the samples in the windowed filter are one-packet transfer like small request and health-check like chit-chat, which is farily common for applications using persistent connections. This patch tries to conservatively labels and skip RTT samples obtained from this type of workload. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19tipc: fix race between poll() and setsockopt()Jon Maloy
Letting tipc_poll() dereference a socket's pointer to struct tipc_group entails a race risk, as the group item may be deleted in a concurrent tipc_sk_join() or tipc_sk_leave() thread. We now move the 'open' flag in struct tipc_group to struct tipc_sock, and let the former retain only a pointer to the moved field. This will eliminate the race risk. Reported-by: syzbot+799dafde0286795858ac@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19l2tp: remove switch block in l2tp_nl_cmd_session_create()Lorenzo Bianconi
Remove the switch block in l2tp_nl_cmd_session_create() that checks pseudowire-specific parameters since just L2TP_PWTYPE_ETH and L2TP_PWTYPE_PPP are currently supported and no actual checks are performed. Moreover the L2TP_PWTYPE_IP/default case presents a harmless issue in error handling (break instead of goto out_tunnel) Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19cxgb4: IPv6 filter takes 2 tidsGanesh Goudar
on T6, IPv6 filter would occupy 2 tids instead of 4. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge branch 'l2tp-set-l2specific_len-based-on-l2specific_type'David S. Miller
Lorenzo Bianconi says: ==================== l2tp: set l2specific_len based on l2specific_type Do not rely on l2specific_len value provided by userspace but set sublayer length according to l2specific_type. Mark L2TP_ATTR_L2SPEC_LEN attribute as not used Changes since v2: - drop the patch related to a fix in the switch default case in l2tp_nl_cmd_session_create() - use L2SPECTYPE_NONE as default case in l2tp_get_l2specific_len() Changes since v1: - remove l2specific_len parameter - add sanity check on l2specific_type provided by userspace ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19l2tp: mark L2TP_ATTR_L2SPEC_LEN as not usedLorenzo Bianconi
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19l2tp: remove l2specific_len configurable parameterLorenzo Bianconi
Remove l2specific_len configuration parameter since now L2-Specific Sublayer length is computed according to l2specific_type provided by userspace. Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19l2tp: remove l2specific_len dependency in l2tp_coreLorenzo Bianconi
Remove l2specific_len dependency while building l2tpv3 header or parsing the received frame since default L2-Specific Sublayer is always four bytes long and we don't need to rely on a user supplied value. Moreover in l2tp netlink code there are no sanity checks to enforce the relation between l2specific_len and l2specific_type, so sending a malformed netlink message is possible to set l2specific_type to L2TP_L2SPECTYPE_DEFAULT (or even L2TP_L2SPECTYPE_NONE) and set l2specific_len to a value greater than 4 leaking memory on the wire and sending corrupted frames. Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19l2tp: double-check l2specific_type provided by userspaceLorenzo Bianconi
Add sanity check on l2specific_type provided by userspace in l2tp_nl_cmd_session_create() since just L2TP_L2SPECTYPE_DEFAULT and L2TP_L2SPECTYPE_NONE are currently supported. Moreover explicitly set l2specific_type to L2TP_L2SPECTYPE_DEFAULT only if the userspace does not provide a value for it Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Tested-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge branch 'cxgb4-reduce-memory-footprint-for-collecting-firmware-dump'David S. Miller
Rahul Lakkireddy says: ==================== cxgb4: reduce memory footprint for collecting firmware dump Firmware dump can be large (upto 2 GB). In low memory conditions, ethtool fails to allocate such large memory. So, use zlib deflate to compress collected firmware dump. Patch 1 updates collection logic to use compression. Patch 2 adds zlib deflate to compress collected firmware dump. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19cxgb4: use zlib deflate to compress firmware dumpRahul Lakkireddy
Use zlib deflate to compress firmware dump. Collect and compress as much firmware dump as possible into a 32 MB buffer. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19cxgb4: update dump collection logic to use compressionRahul Lakkireddy
Update firmware dump collection logic to use compression when available. Let collection logic attempt to do compression, instead of returning out of memory early. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19net/dim: Fix fixpoint divide exception in net_dim_stats_compareTalat Batheesh
Helmut reported a bug about devision by zero while running traffic and doing physical cable pull test. When the cable unplugged the ppms become zero, so when dividing the current ppms by the previous ppms in the next dim iteration there is devision by zero. This patch prevent this division for both ppms and epms. Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism") Fixes: 4c4dbb4a7363 ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux") Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com> Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19devlink: Make some functions staticWei Yongjun
Fixes the following sparse warnings: net/core/devlink.c:2297:25: warning: symbol 'devlink_resource_find' was not declared. Should it be static? net/core/devlink.c:2322:6: warning: symbol 'devlink_resource_validate_children' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19mlxsw: spectrum: Make function mlxsw_sp_kvdl_part_occ() staticWei Yongjun
Fixes the following sparse warning: drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning: symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19forcedeth: remove unused variableZhu Yanjun
The variable miistat is not used. So it is removed. CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19ipv6: mcast: remove dead codeEric Dumazet
Since commit 41033f029e39 ("snmp: Remove duplicate OUTMCAST stat increment") one line of code became unneeded. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19caif: reduce stack size with KASANArnd Bergmann
When CONFIG_KASAN is set, we can use relatively large amounts of kernel stack space: net/caif/cfctrl.c:555:1: warning: the frame size of 1600 bytes is larger than 1280 bytes [-Wframe-larger-than=] This adds convenience wrappers around cfpkt_extr_head(), which is responsible for most of the stack growth. With those wrapper functions, gcc apparently starts reusing the stack slots for each instance, thus avoiding the problem. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge tag 'linux-can-next-for-4.16-20180119' of ↵David S. Miller
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2018-01-16 this is a pull request for net-next/master consisting of 1 patch. This patch by Arnd Bergmann for the m_can driver silences a compiler warning if CONFIG_PM is not selected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge tag 'wireless-drivers-next-for-davem-2018-01-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.16 Final few patches before the merge window, nothing really special. ath9k * add MSI support (not enabled by default yet) rtlwifi * support A-MSDU in A-MPDU aggregation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19can: m_can: mark runtime-PM handlers as __maybe_unusedArnd Bergmann
Building without CONFIG_PM results in a harmless warning: drivers/net/can/m_can/m_can.c:1763:12: error: 'm_can_runtime_resume' defined but not used [-Werror=unused-function] drivers/net/can/m_can/m_can.c:1752:12: error: 'm_can_runtime_suspend' defined but not used [-Werror=unused-function] Marking the functions as __maybe_unused lets the compiler silently drop them instead. Fixes: cdf8259d6573 ("can: m_can: Add PM Support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-01-18net/sched/sch_prio.c: work around gcc-4.4.4 union initializer issuesAndrew Morton
gcc-4.4.4 has problems witn anon union initializers. Work around this. net/sched/sch_prio.c: In function 'prio_dump_offload': net/sched/sch_prio.c:260: error: unknown field 'stats' specified in initializer net/sched/sch_prio.c:260: warning: initialization makes integer from pointer without a cast net/sched/sch_prio.c:261: error: unknown field 'stats' specified in initializer net/sched/sch_prio.c:261: warning: initialization makes integer from pointer without a cast Fixes: 7fdb61b44c0c95 ("net: sch: prio: Add offload ability to PRIO qdisc") Cc: Nogah Frankel <nogahf@mellanox.com> Cc: Yuval Mintz <yuvalm@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18flow_netlink: Remove unneeded semicolonsChristopher Díaz Riveros
Trivial fix removes unneeded semicolons after if blocks. This issue was detected by using the Coccinelle software. Signed-off-by: Christopher Díaz Riveros <chrisadr@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18net/mlx5e: Fix trailing semicolonLuis de Bethencourt
The trailing semicolon is an empty statement that does no operation. Removing it since it doesn't do anything. Signed-off-by: Luis de Bethencourt <luisbg@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18net: sched: silence uninitialized parent variable warning in tc_dump_tfilterJiri Pirko
When tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK, parent is still passed down but the value is never used. Compiler does not recognize it and issues a warning. Silence it down initializing parent to 0. Fixes: 7960d1daf278 ("net: sched: use block index as a handle instead of qdisc when block is shared") Reported-by: David Miller <davem@davemloft.net> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18rtlwifi: Support A-MSDU in A-MPDU capabilityPing-Ke Shih
Due to the fact that A-MSDU deaggregation is done in software, we set this flag to support the A-MSDU in A-MPDU Signed-off-by: Steven Ting <steventing@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-01-18MAINTAINERS: wireless: update wil6210 maintainer entryMaya Erez
wil6210 maintainer email and mail list has changed, hence update its MAINTAINERS entry accordingly. Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-01-17Merge tag 'linux-can-next-for-4.16-20180116' of ↵David S. Miller
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2018-01-16 this is a pull request for net-next/master consisting of 9 patches. This is a series of patches, some of them initially by Franklin S Cooper Jr, which was picked up by Faiz Abbas. Faiz Abbas added some patches while working on this series, I contributed one as well. The first two patches add support to CAN device infrastructure to limit the bitrate of a CAN adapter if the used CAN-transceiver has a certain maximum bitrate. The remaining patches improve the m_can driver. They add support for bitrate limiting to the driver, clean up the driver and add support for runtime PM. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17vxlan: Fix trailing semicolonLuis de Bethencourt
The trailing semicolon is an empty statement that does no operation. It is completely stripped out by the compiler. Removing it since it doesn't do anything. Fixes: 5f35227ea34b ("net: Generalize ndo_gso_check to ndo_features_check") Signed-off-by: Luis de Bethencourt <luisbg@kernel.org> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17cxgb4: restructure VF mgmt codeGanesh Goudar
restructure the code which adds support for configuring PCIe VF via mgmt netdevice. which was added by commit 7829451c695e ("cxgb4: Add control net_device for configuring PCIe VF") Original work by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: Remove spinlock from get_net_ns_by_id()Kirill Tkhai
idr_find() is safe under rcu_read_lock() and maybe_get_net() guarantees that net is alive. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: Fix possible race in peernet2id_alloc()Kirill Tkhai
peernet2id_alloc() is racy without rtnl_lock() as refcount_read(&peer->count) under net->nsid_lock does not guarantee, peer is alive: rcu_read_lock() peernet2id_alloc() .. spin_lock_bh(&net->nsid_lock) .. refcount_read(&peer->count) (!= 0) .. .. put_net() .. cleanup_net() .. for_each_net(tmp) .. spin_lock_bh(&tmp->nsid_lock) .. __peernet2id(tmp, net) == -1 .. .. .. .. __peernet2id_alloc(alloc == true) .. .. .. rcu_read_unlock() .. .. synchronize_rcu() .. kmem_cache_free(net) After the above situation, net::netns_id contains id pointing to freed memory, and any other dereferencing by the id will operate with this freed memory. Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc() is generic interface, and better we fix it before someone really starts use it in wrong context. v2: Don't place refcount_read(&net->count) under net->nsid_lock as suggested by Eric W. Biederman <ebiederm@xmission.com> v3: Rebase on top of net-next Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge branch 'tun-allow-to-attach-eBPF-filter'David S. Miller
Jason Wang says: ==================== tun: allow to attach eBPF filter This series tries to implement eBPF socket filter for tun. This could be used for implementing efficient virtio-net receive filter for vhost-net. Changes from V2: - fix typo - remove unnecessary double check ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17tun: allow to attach ebpf socket filterJason Wang
This patch allows userspace to attach eBPF filter to tun. This will allow to implement VM dataplane filtering in a more efficient way compared to cBPF filter by allowing either qemu or libvirt to attach eBPF filter to tun. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17tuntap: rename struct tun_steering_prog to struct tun_progJason Wang
To be reused by other eBPF program other than queue selection. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge branch 'net-sched-allow-qdiscs-to-share-filter-block-instances'David S. Miller
Jiri Pirko says: ==================== net: sched: allow qdiscs to share filter block instances Currently the filters added to qdiscs are independent. So for example if you have 2 netdevices and you create ingress qdisc on both and you want to add identical filter rules both, you need to add them twice. This patchset makes this easier and mainly saves resources allowing to share all filters within a qdisc - I call it a "filter block". Also this helps to save resources when we do offload to hw for example to expensive TCAM. So back to the example. First, we create 2 qdiscs. Both will share block number 22. "22" is just an identification: $ tc qdisc add dev ens7 ingress_block 22 ingress ^^^^^^^^^^^^^^^^ $ tc qdisc add dev ens8 ingress_block 22 ingress ^^^^^^^^^^^^^^^^ If we don't specify "block" command line option, no shared block would be created: $ tc qdisc add dev ens9 ingress Now if we list the qdiscs, we will see the block index in the output: $ tc qdisc qdisc ingress ffff: dev ens7 parent ffff:fff1 ingress_block 22 qdisc ingress ffff: dev ens8 parent ffff:fff1 ingress_block 22 qdisc ingress ffff: dev ens9 parent ffff:fff1 To make is more visual, the situation looks like this: ens7 ingress qdisc ens7 ingress qdisc | | | | +----------> block 22 <----------+ Unlimited number of qdiscs may share the same block. Note that this patchset introduces block sharing support also for clsact qdisc: $ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact $ tc qdisc show dev ens10 qdisc clsact ffff: dev ens10 parent ffff:fff1 ingress_block 23 egress_block 24 We can add filter using the block index: $ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop Note we cannot use the qdisc for filter manipulations of shared blocks: $ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 action drop Error: This filter block is shared. Please use the block index to manipulate the filters. We will see the same output if we list filters for ingress qdisc of ens7 and ens8, also for the block 22: $ tc filter show block 22 filter block 22 protocol ip pref 25 flower chain 0 filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 ... $ tc filter show dev ens7 ingress filter block 22 protocol ip pref 25 flower chain 0 filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 ... $ tc filter show dev ens8 ingress filter block 22 protocol ip pref 25 flower chain 0 filter block 22 protocol ip pref 25 flower chain 0 handle 0x1 ... --- v10->v11: - patch 2: - fixed error path when register_pernet_subsys fails pointed out by Cong - patch 9: - rebased on top of the current net-next v9->v10: - patch 7: - fixed ifindex magic in the patch description - userspace patches: - added manpages and patch descriptions v8->v9: - patch "net: sched: add rt netlink message type for block get" was removed, userspace check filter existence using qdisc dump v7->v8: - patch 7: - added comment to ifindex block magic - patch 9: - new patch - patch 10: - base this on the patch that introduces qdisc-generic block index attributes parsing/dumping - patch 13: - rebased on top of current net-next v6->v7: - patch 1: - unsquashed shared block patch that was previously squashed by mistake - fixed error path in block create - freeing chain 0 - patch 2: - new patch - splitted from the previous one as it got accidentaly squashed in the rebasing process in the past - converted to idr extended - removed auto-generating of block indexes. Callers have to explicily tell that the block is shared by passing non-zero block index - fixed error path in block get ext - freeing chain 0 - patch 7: - changed extack message for block index handle as suggested by DaveA - added extack message when block index does not exist - the block ifindex magic is in define and change to 0xffffffff as suggested by Jamal - patch 8: - new patch implementing RTM_GETBLOCK in order to query if the block with some index exists - patch 9: - adjust to the core changes and check block index attributes for being 0 v5->v6: - added patch 6 that introduces block handle v4->v5: - patch 5: - add tracking of binding of devs that are unable to offload and check that before block cbs call. v3->v4: - patch 1: - rebased on top of the current net-next - added some extack strings - patch 3: - rebased on top of the current net-next - patch 5: - propagate netdev_ops->ndo_setup_tc error up to tcf_block_offload_bind caller - patch 7: - rebased on top of the current net-next v2->v3: - removed original patch 1, removing tp->q cls_bpf dependency. Fixed by Jakub in the meantime. - patch 1: - rebased on top of the current net-next - patch 5: - new patch - patch 8: - removed "p_" prefix from block index function args - patch 10: - add tc offload feature handling ==================== Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind opsJiri Pirko
No need to convert from mlxsw_sp_port to net_device and back again. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17mlxsw: spectrum_acl: Implement TC block sharingJiri Pirko
Benefit from the prepared TC and in-driver ACL infrastructure and introduce block sharing offload. For that, a new struct "block" is introduced in spectrum_acl in order to hold a list of specific block-port bindings. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbindJiri Pirko
Instead, pass netdev and ingress flag to ruleset unbind op. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroyJiri Pirko
In order to prepare for follow-up changes, make the bind/unbind helpers very simple. That required move of ht insertion/removal and bind/unbind calls into mlxsw_sp_acl_ruleset_create/destroy. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: allow ingress and clsact qdiscs to share filter blocksJiri Pirko
Benefit from the previously introduced shared filter blocks infrastructure and allow ingress and clsact qdisc instances to share filter blocks. The block index is coming from userspace as qdisc option. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>