summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-10-15tipc: fix unsafe rcu locking when accessing publication listTung Nguyen
The binding table's 'cluster_scope' list is rcu protected to handle races between threads changing the list and those traversing the list at the same moment. We have now found that the function named_distribute() uses the regular list_for_each() macro to traverse the said list. Likewise, the function tipc_named_withdraw() is removing items from the same list using the regular list_del() call. When these two functions execute in parallel we see occasional crashes. This commit fixes this by adding the missing _rcu() suffixes. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15rxrpc: Fix incorrect conditional on IPV6David Howells
The udpv6_encap_enable() function is part of the ipv6 code, and if that is configured as a loadable module and rxrpc is built in then a build failure will occur because the conditional check is wrong: net/rxrpc/local_object.o: In function `rxrpc_lookup_local': local_object.c:(.text+0x2688): undefined reference to `udpv6_encap_enable' Use the correct config symbol (CONFIG_AF_RXRPC_IPV6) in the conditional check rather than CONFIG_IPV6 as that will do the right thing. Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook") Reported-by: kbuild-all@01.org Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15ipv6: rate-limit probes for neighbourless routesSabrina Dubroca
When commit 270972554c91 ("[IPV6]: ROUTE: Add Router Reachability Probing (RFC4191).") introduced router probing, the rt6_probe() function required that a neighbour entry existed. This neighbour entry is used to record the timestamp of the last probe via the ->updated field. Later, commit 2152caea7196 ("ipv6: Do not depend on rt->n in rt6_probe().") removed the requirement for a neighbour entry. Neighbourless routes skip the interval check and are not rate-limited. This patch adds rate-limiting for neighbourless routes, by recording the timestamp of the last probe in the fib6_info itself. Fixes: 2152caea7196 ("ipv6: Do not depend on rt->n in rt6_probe().") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15rxrpc: use correct kvec num when sending BUSY response packetYueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: net/rxrpc/output.c: In function 'rxrpc_reject_packets': net/rxrpc/output.c:527:11: warning: variable 'ioc' set but not used [-Wunused-but-set-variable] 'ioc' is the correct kvec num when sending a BUSY (or an ABORT) response packet. Fixes: ece64fec164f ("rxrpc: Emit BUSY packets when supposed to rather than ABORTs") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15rxrpc: Fix an uninitialised variableDavid Howells
Fix an uninitialised variable introduced by the last patch. This can cause a crash when a new call comes in to a local service, such as when an AFS fileserver calls back to the local cache manager. Fixes: c1e15b4944c9 ("rxrpc: Fix the packet reception routine") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15tipc: initialize broadcast link stale counter correctlyJon Maloy
In the commit referred to below we added link tolerance as an additional criteria for declaring broadcast transmission "stale" and resetting the unicast links to the affected node. Unfortunately, this 'improvement' introduced two bugs, which each and one alone cause only limited problems, but combined lead to seemingly stochastic unicast link resets, depending on the amount of broadcast traffic transmitted. The first issue, a missing initialization of the 'tolerance' field of the receiver broadcast link, was recently fixed by commit 047491ea334a ("tipc: set link tolerance correctly in broadcast link"). Ths second issue, where we omit to reset the 'stale_cnt' field of the same link after a 'stale' period is over, leads to this counter accumulating over time, and in the absence of the 'tolerance' criteria leads to the above described symptoms. This commit adds the missing initialization. Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15llc: set SOCK_RCU_FREE in llc_sap_add_socket()Cong Wang
WHen an llc sock is added into the sk_laddr_hash of an llc_sap, it is not marked with SOCK_RCU_FREE. This causes that the sock could be freed while it is still being read by __llc_lookup_established() with RCU read lock. sock is refcounted, but with RCU read lock, nothing prevents the readers getting a zero refcnt. Fix it by setting SOCK_RCU_FREE in llc_sap_add_socket(). Reported-by: syzbot+11e05f04c15e03be5254@syzkaller.appspotmail.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI ↵Justin.Lee1@Dell.com
command The new command (NCSI_CMD_SEND_CMD) is added to allow user space application to send NC-SI command to the network card. Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response. The work flow is as below. Request: User space application -> Netlink interface (msg) -> new Netlink handler - ncsi_send_cmd_nl() -> ncsi_xmit_cmd() Response: Response received - ncsi_rcv_rsp() -> internal response handler - ncsi_rsp_handler_xxx() -> ncsi_rsp_handler_netlink() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) -> user space application Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> Netlink interface (msg with zero data length) -> user space application Error: Error detected -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> user space application Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15tipc: support binding to specific ip address when activating UDP bearerHoang Le
INADDR_ANY is hard-coded when activating UDP bearer. So, we could not bind to a specific IP address even with replicast mode using - given remote ip address instead of using multicast ip address. In this commit, we fixed it by checking and switch to use appropriate local ip address. before: $netstat -plu Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address udp 0 0 **0.0.0.0:6118** 0.0.0.0:* after: $netstat -plu Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address udp 0 0 **10.0.0.2:6118** 0.0.0.0:* Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net/sched: cls_api: add missing validation of netlink attributesDavide Caratti
Similarly to what has been done in 8b4c3cdd9dd8 ("net: sched: Add policy validation for tc attributes"), fix classifier code to add validation of TCA_CHAIN and TCA_KIND netlink attributes. tested with: # ./tdc.py -c filter v2: Let sch_api and cls_api share nla_policy they have in common, thanks to David Ahern. v3: Avoid EXPORT_SYMBOL(), as validation of those attributes is not done by TC modules, thanks to Cong Wang. While at it, restore the 'Delete / get qdisc' comment to its orginal position, just above tc_get_qdisc() function prototype. Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15FDDI: defza: Support capturing outgoing SMT trafficMaciej W. Rozycki
DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue is queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames which are queued back to the Tx queue for the RMC to send to the ring. When a network tap is attached to an FDDI interface handled by `defza' any incoming SMT frames captured are queued to our usual processing of network data received, which in turn delivers them to any listening taps. However the outgoing SMT frames produced by the firmware bypass our network protocol stack and are therefore not delivered to taps. This in turn means that taps are missing a part of network traffic sent by the adapter, which may make it more difficult to track down network problems or do general traffic analysis. Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that a network tap is attached, with a newly-created `dev_nit_active' helper wrapping the usual condition used in the transmit path. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15ethtool: fix a privilege escalation bugWenwen Wang
In dev_ethtool(), the eth command 'ethcmd' is firstly copied from the use-space buffer 'useraddr' and checked to see whether it is ETHTOOL_PERQUEUE. If yes, the sub-command 'sub_cmd' is further copied from the user space. Otherwise, 'sub_cmd' is the same as 'ethcmd'. Next, according to 'sub_cmd', a permission check is enforced through the function ns_capable(). For example, the permission check is required if 'sub_cmd' is ETHTOOL_SCOALESCE, but it is not necessary if 'sub_cmd' is ETHTOOL_GCOALESCE, as suggested in the comment "Allow some commands to be done by anyone". The following execution invokes different handlers according to 'ethcmd'. Specifically, if 'ethcmd' is ETHTOOL_PERQUEUE, ethtool_set_per_queue() is called. In ethtool_set_per_queue(), the kernel object 'per_queue_opt' is copied again from the user-space buffer 'useraddr' and 'per_queue_opt.sub_command' is used to determine which operation should be performed. Given that the buffer 'useraddr' is in the user space, a malicious user can race to change the sub-command between the two copies. In particular, the attacker can supply ETHTOOL_PERQUEUE and ETHTOOL_GCOALESCE to bypass the permission check in dev_ethtool(). Then before ethtool_set_per_queue() is called, the attacker changes ETHTOOL_GCOALESCE to ETHTOOL_SCOALESCE. In this way, the attacker can bypass the permission check and execute ETHTOOL_SCOALESCE. This patch enforces a check in ethtool_set_per_queue() after the second copy from 'useraddr'. If the sub-command is different from the one obtained in the first copy in dev_ethtool(), an error code EINVAL will be returned. Fixes: f38d138a7da6 ("net/ethtool: support set coalesce per queue") Signed-off-by: Wenwen Wang <wang6495@umn.edu> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15ethtool: fix a missing-check bugWenwen Wang
In ethtool_get_rxnfc(), the eth command 'cmd' is compared against 'ETHTOOL_GRXFH' to see whether it is necessary to adjust the variable 'info_size'. Then the whole structure of 'info' is copied from the user-space buffer 'useraddr' with 'info_size' bytes. In the following execution, 'info' may be copied again from the buffer 'useraddr' depending on the 'cmd' and the 'info.flow_type'. However, after these two copies, there is no check between 'cmd' and 'info.cmd'. In fact, 'cmd' is also copied from the buffer 'useraddr' in dev_ethtool(), which is the caller function of ethtool_get_rxnfc(). Given that 'useraddr' is in the user space, a malicious user can race to change the eth command in the buffer between these copies. By doing so, the attacker can supply inconsistent data and cause undefined behavior because in the following execution 'info' will be passed to ops->get_rxnfc(). This patch adds a necessary check on 'info.cmd' and 'cmd' to confirm that they are still same after the two copies in ethtool_get_rxnfc(). Otherwise, an error code EINVAL will be returned. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15bpf: Fix IPv6 dport byte-order in bpf_sk_lookupJoe Stringer
Commit 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") mistakenly passed the destination port in network byte-order to the IPv6 TCP/UDP socket lookup functions, which meant that BPF writers would need to either manually swap the byte-order of this field or otherwise IPv6 sockets could not be located via this helper. Fix the issue by swapping the byte-order appropriately in the helper. This also makes the API more consistent with the IPv4 version. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15bpf: Allow sk_lookup with IPv6 moduleJoe Stringer
This is a more complete fix than d71019b54bff ("net: core: Fix build with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6 module is loaded (not just if it's compiled in). Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: add bpf support to sk_msg handlingJohn Fastabend
This work adds BPF sk_msg verdict program support to kTLS allowing BPF and kTLS to be combined together. Previously kTLS and sk_msg verdict programs were mutually exclusive in the ULP layer which created challenges for the orchestrator when trying to apply TCP based policy, for example. To resolve this, leveraging the work from previous patches that consolidates the use of sk_msg, we can finally enable BPF sk_msg verdict programs so they continue to run after the kTLS socket is created. No change in behavior when kTLS is not used in combination with BPF, the kselftest suite for kTLS also runs successfully. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: replace poll implementation with read hookJohn Fastabend
Instead of re-implementing poll routine use the poll callback to trigger read from kTLS, we reuse the stream_memory_read callback which is simpler and achieves the same. This helps to align sockmap and kTLS so we can more easily embed BPF in kTLS. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: convert to generic sk_msg interfaceDaniel Borkmann
Convert kTLS over to make use of sk_msg interface for plaintext and encrypted scattergather data, so it reuses all the sk_msg helpers and data structure which later on in a second step enables to glue this to BPF. This also allows to remove quite a bit of open coded helpers which are covered by the sk_msg API. Recent changes in kTLs 80ece6a03aaf ("tls: Remove redundant vars from tls record structure") and 4e6d47206c32 ("tls: Add support for inplace records encryption") changed the data path handling a bit; while we've kept the latter optimization intact, we had to undo the former change to better fit the sk_msg model, hence the sg_aead_in and sg_aead_out have been brought back and are linked into the sk_msg sgs. Now the kTLS record contains a msg_plaintext and msg_encrypted sk_msg each. In the original code, the zerocopy_from_iter() has been used out of TX but also RX path. For the strparser skb-based RX path, we've left the zerocopy_from_iter() in decrypt_internal() mostly untouched, meaning it has been moved into tls_setup_from_iter() with charging logic removed (as not used from RX). Given RX path is not based on sk_msg objects, we haven't pursued setting up a dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it could be an option to prusue in a later step. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15bpf, sockmap: convert to generic sk_msg interfaceDaniel Borkmann
Add a generic sk_msg layer, and convert current sockmap and later kTLS over to make use of it. While sk_buff handles network packet representation from netdevice up to socket, sk_msg handles data representation from application to socket layer. This means that sk_msg framework spans across ULP users in the kernel, and enables features such as introspection or filtering of data with the help of BPF programs that operate on this data structure. Latter becomes in particular useful for kTLS where data encryption is deferred into the kernel, and as such enabling the kernel to perform L7 introspection and policy based on BPF for TLS connections where the record is being encrypted after BPF has run and came to a verdict. In order to get there, first step is to transform open coding of scatter-gather list handling into a common core framework that subsystems can use. The code itself has been split and refactored into three bigger pieces: i) the generic sk_msg API which deals with managing the scatter gather ring, providing helpers for walking and mangling, transferring application data from user space into it, and preparing it for BPF pre/post-processing, ii) the plain sock map itself where sockets can be attached to or detached from; these bits are independent of i) which can now be used also without sock map, and iii) the integration with plain TCP as one protocol to be used for processing L7 application data (later this could e.g. also be extended to other protocols like UDP). The semantics are the same with the old sock map code and therefore no change of user facing behavior or APIs. While pursuing this work it also helped finding a number of bugs in the old sockmap code that we've fixed already in earlier commits. The test_sockmap kselftest suite passes through fine as well. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tcp, ulp: remove ulp bits from sockmapDaniel Borkmann
In order to prepare sockmap logic to be used in combination with kTLS we need to detangle it from ULP, and further split it in later commits into a generic API. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanupDaniel Borkmann
Whenever the ULP data on the socket is mangled, enforce that the caller has the socket lock held as otherwise things may race with initialization and cleanup callbacks from ulp ops as both would mangle internal socket state. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15xfrm: use complete IPv6 addresses for hashMichal Kubecek
In some environments it is common that many hosts share the same lower half of their IPv6 addresses (in particular ::1). As __xfrm6_addr_hash() and __xfrm6_daddr_saddr_hash() calculate the hash only from the lower halves, as much as 1/3 of the hosts ends up in one hashtable chain which harms the performance. Use complete IPv6 addresses when calculating the hashes. Rather than just adding two more words to the xor, use jhash2() for consistency with __xfrm6_pref_hash() and __xfrm6_dpref_spref_hash(). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-10-14 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix xsk map update and delete operation to not call synchronize_net() but to piggy back on SOCK_RCU_FREE for sockets instead as we are not allowed to sleep under RCU, from Björn. 2) Do not change RLIMIT_MEMLOCK in reuseport_bpf selftest if the process already has unlimited RLIMIT_MEMLOCK, from Eric. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-14Bluetooth: Remove redundant check on statusColin Ian King
The check on status is redundant as a status has to be zero at the point it is being checked because of a previous check and return path via label 'unlock'. Remove the redundant check and the deadcode that can never be reached. Detected by CoverityScan, CID#1471710 ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-10-14Bluetooth: Errata Service Release 8, Erratum 3253Mallikarjun Phulari
L2CAP: New result values 0x0006 - Connection refused – Invalid Source CID 0x0007 - Connection refused – Source CID already allocated As per the ESR08_V1.0.0, 1.11.2 Erratum 3253, Page No. 54, "Remote CID invalid Issue". Applies to Core Specification versions: V5.0, V4.2, v4.1, v4.0, and v3.0 + HS Vol 3, Part A, Section 4.2, 4.3, 4.14, 4.15. Core Specification Version 5.0, Page No.1753, Table 4.6 and Page No. 1767, Table 4.14 New result values are added to l2cap connect/create channel response as 0x0006 - Connection refused – Invalid Source CID 0x0007 - Connection refused – Source CID already allocated Signed-off-by: Mallikarjun Phulari <mallikarjun.phulari@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-10-14Bluetooth: Use separate L2CAP LE credit based connection result valuesMallikarjun Phulari
Add the result values specific to L2CAP LE credit based connections and change the old result values wherever they were used. Signed-off-by: Mallikarjun Phulari <mallikarjun.phulari@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-10-13bpf: Fix dev pointer dereference from sk_skbJoe Stringer
Dan Carpenter reports: The patch 6acc9b432e67: "bpf: Add helper to retrieve socket in BPF" from Oct 2, 2018, leads to the following Smatch complaint: net/core/filter.c:4893 bpf_sk_lookup() error: we previously assumed 'skb->dev' could be null (see line 4885) Fix this issue by checking skb->dev before using it. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-13kill TIOCSERGSTRUCTAl Viro
Once upon a time a bunch of serial drivers used to provide that; today it's only amiserial and it's FUBAR - the structure being copied to userland includes kernel pointers, fields with config-dependent size, etc. No userland code using it could possibly survive - e.g. enabling lockdep definitely changes the layout. Besides, it's a massive infoleak. Kill it. If somebody needs that data for debugging purposes, they can bloody well expose it saner ways. Assuming anyone does debugging of amiserial in the first place, that is. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-10-13change semantics of ldisc ->compat_ioctl()Al Viro
First of all, make it return int. Returning long when native method had never allowed that is ridiculous and inconvenient. More importantly, change the caller; if ldisc ->compat_ioctl() is NULL or returns -ENOIOCTLCMD, tty_compat_ioctl() will try to feed cmd and compat_ptr(arg) to ldisc's native ->ioctl(). That simplifies ->compat_ioctl() instances quite a bit - they only need to deal with ioctls that are neither generic tty ones (those would get shunted off to tty_ioctl()) nor simple compat pointer ones. Note that something like TCFLSH won't reach ->compat_ioctl(), even if ldisc ->ioctl() does handle it - it will be recognized earlier and passed to tty_ioctl() (and ultimately - ldisc ->ioctl()). For many ldiscs it means that NULL ->compat_ioctl() does the right thing. Those where it won't serve (see e.g. n_r3964.c) are also easily dealt with - we need to handle the numeric-argument ioctls (calling the native instance) and, if such would exist, the ioctls that need layout conversion, etc. All in-tree ldiscs dealt with. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-10-13rfcomm: get rid of mentioning TIOC[SG]SERIALAl Viro
no support there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge tag 'mac80211-next-for-davem-2018-10-12' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Highlights: * merge net-next, so I can finish the hwsim workqueue removal * fix TXQ NULL pointer issue that was reported multiple times * minstrel cleanups from Felix * simplify lib80211 code by not using skcipher, note that this will conflict with the crypto tree (and this new code here should be used) * use new netlink policy validation in nl80211 * fix up SAE (part of WPA3) in client-mode * FTM responder support in the stack ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: bridge: add support for per-port vlan statsNikolay Aleksandrov
This patch adds an option to have per-port vlan stats instead of the default global stats. The option can be set only when there are no port vlans in the bridge since we need to allocate the stats if it is set when vlans are being added to ports (and respectively free them when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the largest user of options. The current stats design allows us to add these without any changes to the fast-path, it all comes down to the per-vlan stats pointer which, if this option is enabled, will be allocated for each port vlan instead of using the global bridge-wide one. CC: bridge@lists.linux-foundation.org CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: Evict neighbor entries on carrier downDavid Ahern
When a link's carrier goes down it could be a sign of the port changing networks. If the new network has overlapping addresses with the old one, then the kernel will continue trying to use neighbor entries established based on the old network until the entries finally age out - meaning a potentially long delay with communications not working. This patch evicts neighbor entries on carrier down with the exception of those marked permanent. Permanent entries are managed by userspace (either an admin or a routing daemon such as FRR). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net/ipv6: Add knob to skip DELROUTE message on device downDavid Ahern
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE notifications when a device is taken down (admin down) or deleted. IPv4 does not generate a message for routes evicted by the down or delete; IPv6 does. A NOS at scale really needs to avoid these messages and have IPv4 and IPv6 behave similarly, relying on userspace to handle link notifications and evict the routes. At this point existing user behavior needs to be preserved. Since notifications are a global action (not per app) the only way to preserve existing behavior and allow the messages to be skipped is to add a new sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to disable the notificatioons. IPv6 route code already supports the option to skip the message (it is used for multipath routes for example). Besides the new sysctl we need to pass the skip_notify setting through the generic fib6_clean and fib6_walk functions to fib6_clean_node and to set skip_notify on calls to __ip_del_rt for the addrconf_ifdown path. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12mac80211: implement ieee80211_tx_rate_update to update rateAnilkumar Kolli
Current mac80211 has provision to update tx status through ieee80211_tx_status() and ieee80211_tx_status_ext(). But drivers like ath10k updates the tx status from the skb except txrate, txrate will be updated from a different path, peer stats. Using ieee80211_tx_status_ext() in two different paths (one for the stats, one for the tx rate) would duplicate the stats instead. To avoid this stats duplication, ieee80211_tx_rate_update() is implemented. Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org> [minor commit message editing, use initializers in code] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12nl80211: Add per peer statistics to compute FCS error rateAnkita Bajaj
Add support for drivers to report the total number of MPDUs received and the number of MPDUs received with an FCS error from a specific peer. These counters will be incremented only when the TA of the frame matches the MAC address of the peer irrespective of FCS error. It should be noted that the TA field in the frame might be corrupted when there is an FCS error and TA matching logic would fail in such cases. Hence, FCS error counter might not be fully accurate, but it can provide help in detecting bad RX links in significant number of cases. This FCS error counter without full accuracy can be used, e.g., to trigger a kick-out of a connected client with a bad link in AP mode to force such a client to roam to another AP. Signed-off-by: Ankita Bajaj <bankita@codeaurora.org> Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12mac80211: support FTM responder configuration/statisticsPradeep Kumar Chitrapu
New bss param ftm_responder is used to notify the driver to enable fine timing request (FTM) responder role in AP mode. Plumb the new cfg80211 API for FTM responder statistics through to the driver API in mac80211. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
David writes: "Networking 1) RXRPC receive path fixes from David Howells. 2) Re-export __skb_recv_udp(), from Jiri Kosina. 3) Fix refcounting in u32 classificer, from Al Viro. 4) Userspace netlink ABI fixes from Eugene Syromiatnikov. 5) Don't double iounmap on rmmod in ena driver, from Arthur Kiyanovski. 6) Fix devlink string attribute handling, we must pull a copy into a kernel buffer if the lifetime extends past the netlink request. From Moshe Shemesh. 7) Fix hangs in RDS, from Ka-Cheong Poon. 8) Fix recursive locking lockdep warnings in tipc, from Ying Xue. 9) Clear RX irq correctly in socionext, from Ilias Apalodimas. 10) bcm_sf2 fixes from Florian Fainelli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: dsa: bcm_sf2: Call setup during switch resume net: dsa: bcm_sf2: Fix unbind ordering net: phy: sfp: remove sfp_mutex's definition r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips net: socionext: clear rx irq correctly net/mlx4_core: Fix warnings during boot on driverinit param set failures tipc: eliminate possible recursive locking detected by LOCKDEP selftests: udpgso_bench.sh explicitly requires bash selftests: rtnetlink.sh explicitly requires bash. qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface tipc: queue socket protocol error messages into socket receive buffer tipc: set link tolerance correctly in broadcast link net: ipv4: don't let PMTU updates increase route MTU net: ipv4: update fnhe_pmtu when first hop's MTU changes net/ipv6: stop leaking percpu memory in fib6 info rds: RDS (tcp) hangs on sendto() to unresponding address net: make skb_partial_csum_set() more robust against overflows devlink: Add helper function for safely copy string param devlink: Fix param cmode driverinit for string type devlink: Fix param set handling for string type ...
2018-10-11tipc: eliminate possible recursive locking detected by LOCKDEPYing Xue
When booting kernel with LOCKDEP option, below warning info was found: WARNING: possible recursive locking detected 4.19.0-rc7+ #14 Not tainted -------------------------------------------- swapper/0/1 is trying to acquire lock: 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000dcfc0fc8 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 but task is already holding lock: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&list->lock)->rlock#4); lock(&(&list->lock)->rlock#4); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by swapper/0/1: #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at: register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051 #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline] #1: 00000000cbb9b036 (&(&list->lock)->rlock#4){+...}, at: tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849 stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1af/0x295 lib/dump_stack.c:113 print_deadlock_bug kernel/locking/lockdep.c:1759 [inline] check_deadlock kernel/locking/lockdep.c:1803 [inline] validate_chain kernel/locking/lockdep.c:2399 [inline] __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411 lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline] _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168 spin_lock_bh include/linux/spinlock.h:334 [inline] tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850 tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526 tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521 tipc_init_net+0x472/0x610 net/tipc/core.c:82 ops_init+0xf7/0x520 net/core/net_namespace.c:129 __register_pernet_operations net/core/net_namespace.c:940 [inline] register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011 register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052 tipc_init+0x83/0x104 net/tipc/core.c:140 do_one_initcall+0x109/0x70a init/main.c:885 do_initcall_level init/main.c:953 [inline] do_initcalls init/main.c:961 [inline] do_basic_setup init/main.c:979 [inline] kernel_init_freeable+0x4bd/0x57f init/main.c:1144 kernel_init+0x13/0x180 init/main.c:1063 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413 The reason why the noise above was complained by LOCKDEP is because we nested to hold l->wakeupq.lock and l->inputq->lock in tipc_link_reset function. In fact it's unnecessary to move skb buffer from l->wakeupq queue to l->inputq queue while holding the two locks at the same time. Instead, we can move skb buffers in l->wakeupq queue to a temporary list first and then move the buffers of the temporary list to l->inputq queue, which is also safe for us. Fixes: 3f32d0be6c16 ("tipc: lock wakeup & inputq at tipc_link_reset()") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11Merge tag 'alloc-args-v4.19-rc8' of ↵Greg Kroah-Hartman
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Kees writes: "Fix open-coded multiplication arguments to allocators - Fixes several new open-coded multiplications added in the 4.19 merge window." * tag 'alloc-args-v4.19-rc8' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Replace more open-coded allocation size multiplications
2018-10-11mac80211: Extend SAE authentication in infra BSS STA modeJouni Malinen
Previous implementation of SAE authentication in infrastructure BSS was somewhat restricting and not exactly clean way of handling the two auth() operations. This ended up removing and re-adding the STA entry for the AP in the middle of authentication and also messing up authentication state tracking through the sequence of four Authentication frames. Furthermore, this did not work if the AP ended up sending out SAE Confirm (auth trans #2) immediately after SAE Commit (auth trans #1) before the station had time to transmit its SAE Confirm. Clean up authentication state handling for the SAE case to allow two rounds of auth() calls without dropping all state between those operations. Track peer Confirmed status and mark authentication completed only once both ends have confirmed. ieee80211_mgd_auth() check for EBUSY cases is now handling only the pending association (ifmgd->assoc_data) while all pending authentication (ifmgd->auth_data) cases are allowed to proceed to allow user space to start a new connection attempt from scratch even if the previously requested authentication is still waiting completion. This is needed to avoid making SAE error cases with retries take excessive amount of time with no means for the user space to stop that (apart from setting the netdev down). As an extra bonus, the end of ieee80211_rx_mgmt_auth() can be cleaned up to avoid the extra copy of the cfg80211_rx_mlme_mgmt() call for ongoing SAE authentication since the new ieee80211_mark_sta_auth() helper function can handle both completion of authentication and updates to the STA entry under the same condition and there is no need to return from the function between those operations. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: Move ieee80211_mgd_auth() EBUSY check to be before allocationJouni Malinen
This makes it easier to conditionally replace full allocation of auth_data to use reallocation for the case of continuing SAE authentication. Furthermore, there was not really any point in having this check done so late in the function after having already completed number of steps that cannot be used anyway in the error case. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: Helper function for marking STA authenticatedJouni Malinen
Authentication exchange can be completed in both TX and RX paths for SAE, so move this common functionality into a helper function to avoid having to implement practically the same operations in two places when extending SAE implementation in the following commits. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: rc80211_minstrel: remove variance / stddev calculationFelix Fietkau
When there are few packets (e.g. for sampling attempts), the exponentially weighted variance is usually vastly overestimated, making the resulting data essentially useless. As far as I know, there has not been any practical use for this, so let's not waste any cycles on it. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: do not sample rates 3 times slower than max_prob_rateFelix Fietkau
These rates are highly unlikely to be used quickly, even if the link deteriorates rapidly. This improves throughput in cases where CCK rates are not reliable enough to be skipped entirely during sampling. Sampling these rates regularly can cost a lot of airtime. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix sampling/reporting of CCK rates in HT modeFelix Fietkau
Long/short preamble selection cannot be sampled separately, since it depends on the BSS state. Because of that, sampling attempts to currently not used preamble modes are not counted in the statistics, which leads to CCK rates being sampled too often. Fix statistics accounting for long/short preamble by increasing the index where necessary. Fix excessive CCK rate sampling by dropping unsupported sample attempts. This improves throughput on 2.4 GHz channels Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix CCK rate group streams valueFelix Fietkau
Fixes a harmless underflow issue when CCK rates are actively being used Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: fix using short preamble CCK rates on HT clientsFelix Fietkau
mi->supported[MINSTREL_CCK_GROUP] needs to be updated short preamble rates need to be marked as supported regardless of whether it's currently enabled. Its state can change at any time without a rate_update call. Fixes: 782dda00ab8e ("mac80211: minstrel_ht: move short preamble check out of get_rate") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-11mac80211: minstrel: reduce minstrel_mcs_groups sizeFelix Fietkau
By storing a shift value for all duration values of a group, we can reduce precision by a neglegible amount to make it fit into a u16 value. This improves cache footprint and reduces size: Before: text data bss dec hex filename 10024 116 0 10140 279c rc80211_minstrel_ht.o After: text data bss dec hex filename 9368 116 0 9484 250c rc80211_minstrel_ht.o Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>