summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-03-02batman-adv: clarify CFG80211 dependencyArnd Bergmann
The driver calls cfg80211_get_station, which may be part of a module, so we must not enable BATMAN_ADV_BATMAN_V if BATMAN_ADV=y and CFG80211=m: net/built-in.o: In function `batadv_v_elp_get_throughput': (text+0x5c62c): undefined reference to `cfg80211_get_station' This clarifies the dependency to cover all combinations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: c833484e5f38 ("batman-adv: ELP - compute the metric based on the estimated throughput") Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02Merge tag 'mac80211-for-davem-2016-03-02' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Here are a few more fixes for the current cycle: * check GCMP encryption vs. fragmentation properly; we'd found this problem quite a while ago but waited for the 802.11 spec to be updated * fix RTS/CTS logic in minstrel_ht * fix RX of certain public action frames in AP mode * add mac80211_hwsim to MAC80211 in MAINTAINERS, this helps the kbuild robot pick up the right tree for it ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Antonio Quartulli says: ==================== batman-adv 20160229 this is our (hopefully) latest batch of patches intended for net-next. With this patchset we finally introduce B.A.T.M.A.N. V: the latest version of our routing protocol. Technical documentation describing the protocol in more detail can be found in our wiki[1][2][3][4]. For what concerns this pull request, you can find the high level description right below. [1] https://www.open-mesh.org/projects/batman-adv/wiki/BATMAN_V [2] https://www.open-mesh.org/projects/batman-adv/wiki/OGMv2 [3] https://www.open-mesh.org/projects/batman-adv/wiki/ELP [4] https://www.open-mesh.org/projects/batman-adv/wiki/BATMAN_V_Tests ... With this patchset we finally introduce our new routing protocol: B.A.T.M.A.N. V. Its implementation started quite some years ago, but due to the big changes being introduced it took a while to be discussed, designed, worked, re-worked, tested and debugged (well, we're never done with the latest). The entire operation has basically been a team work involving all the core contributors together with other people interested in the project. The new protocol is divided into two main subcomponents, called respectively ELP and OGMv2. The former is in charge of dealing with the neighbour discovery and link quality estimation, while the latter implements the algorithm that spreads the metrics around the network and computes optimal paths. The biggest change introduced with B.A.T.M.A.N. V is the new metric: the protocol won't rely on packet loss anymore, but it will use the estimated throughput extracted directly from the wifi driver (when available) by querying cfg80211. Batman-adv will also send some unicast probing packets when an interface is not used for payload traffic to make sure that such values are current. The new protocol can be compiled-in or not like other features we have and when selected will pull in CFG80211 as dependency for the reason described above. Thanks to the big work brought up in the past by Marek Lindner, batman-adv can easily deal several protocol implementations, therefore compiling in this new version does not exclude the older. This means that the user is offered the option to choose the protocol when creating the mesh interface (default is the old one to keep backward compatibility). Along with the protocol there are some sysfs knobs that are introduced to fine tune some of its behaviours, but users are recommended to keep the default values unless they know what they are doing. The last patch is about advertising our own patchwork platform (thanks to Sven Eckelmann for having set that up!) in the MAINTAINERS file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: pktgen: use reset to set mac headerZhang Shengju
Since offset is zero, it's not necessary to use set function. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01sch_mqprio: Fix build with older gcc.David S. Miller
CC [M] net/sched/sch_mqprio.o net/sched/sch_mqprio.c: In function ?mqprio_init?: net/sched/sch_mqprio.c:145: error: unknown field ?tc? specified in initializer net/sched/sch_mqprio.c:145: warning: missing braces around initializer net/sched/sch_mqprio.c:145: warning: (near initialization for ?tc.<anonymous>?) make[2]: *** [net/sched/sch_mqprio.o] Error 1 make[1]: *** [net/sched] Error 2 make: *** [net] Error 2 Several people reported this, surround the unnamed union member initialization with braces to fix. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: remove skb_sender_cpu_clear()WANG Cong
After commit 52bd2d62ce67 ("net: better skb->sender_cpu and skb->napi_id cohabitation") skb_sender_cpu_clear() becomes empty and can be removed. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01sctp: sctp_remaddr_seq_show use the wrong variable to dump transport infoXin Long
Now in sctp_remaddr_seq_show(), we use variable *tsp to get the param *v. but *tsp is also used to traversal transport_addr_list, which will cover the previous value, and make sctp_transport_put work on the wrong transport. So fix it by adding a new variable to get the param *v. Fixes: fba4c330c5b9 ("sctp: hold transport before we access t->asoc in sctp proc") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01sctp: lack the check for ports in sctp_v6_cmp_addrXin Long
As the member .cmp_addr of sctp_af_inet6, sctp_v6_cmp_addr should also check the port of addresses, just like sctp_v4_cmp_addr, cause it's invoked by sctp_cmp_addr_exact(). Now sctp_v6_cmp_addr just check the port when two addresses have different family, and lack the port check for two ipv6 addresses. that will make sctp_hash_cmp() cannot work well. so fix it by adding ports comparison in sctp_v6_cmp_addr(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: ipv6/l3mdev: Move host route on saved address if necessaryDavid Ahern
Commit f1705ec197e70 allows IPv6 addresses to be retained on a link down. The address can have a cached host route which can point to the wrong FIB table if the L3 enslavement is changed (e.g., route can point to local table instead of VRF table if device is added to an L3 domain). On link up check the table of the cached host route against the FIB table associated with the device and correct if needed. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: sctp: Convert log timestamps to be y2038 safeDeepa Dinamani
SCTP probe log timestamps use struct timespec which is not y2038 safe. Use struct timespec64 which is 2038 safe instead. Use monotonic time instead of real time as only time differences are logged. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: ipv4: tcp_probe: Replace timespec with timespec64Deepa Dinamani
TCP probe log timestamps use struct timespec which is not y2038 safe. Even though timespec might be good enough here as it is used to represent delta time, the plan is to get rid of all uses of timespec in the kernel. Replace with struct timespec64 which is y2038 safe. Prints still use unsigned long format and type. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: ipv4: Convert IP network timestamps to be y2038 safeDeepa Dinamani
ICMP timestamp messages and IP source route options require timestamps to be in milliseconds modulo 24 hours from midnight UT format. Add inet_current_timestamp() function to support this. The function returns the required timestamp in network byte order. Timestamp calculation is also changed to call ktime_get_real_ts64() which uses struct timespec64. struct timespec64 is y2038 safe. Previously it called getnstimeofday() which uses struct timespec. struct timespec is not y2038 safe. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01Support to encoding decoding skb prio on IFE actionJamal Hadi Salim
Example usage: Set the skb priority using skbedit then allow it to be encoded sudo tc qdisc add dev $ETH root handle 1: prio sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action skbedit prio 17 \ action ife encode \ allow prio \ dst 02:15:15:15:15:15 Note: You dont need the skbedit action if you are already encoding the skb priority earlier. A zero skb priority will not be sent Alternative hard code static priority of decimal 33 (unlike skbedit) then mark of 0x12 every time the filter matches sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action ife encode \ type 0xDEAD \ use prio 33 \ use mark 0x12 \ dst 02:15:15:15:15:15 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01Support to encoding decoding skb mark on IFE actionJamal Hadi Salim
Example usage: Set the skb using skbedit then allow it to be encoded sudo tc qdisc add dev $ETH root handle 1: prio sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action skbedit mark 17 \ action ife encode \ allow mark \ dst 02:15:15:15:15:15 Note: You dont need the skbedit action if you are already encoding the skb mark earlier. A zero skb mark, when seen, will not be encoded. Alternative hard code static mark of 0x12 every time the filter matches sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \ u32 match ip protocol 1 0xff flowid 1:2 \ action ife encode \ type 0xDEAD \ use mark 0x12 \ dst 02:15:15:15:15:15 Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01introduce IFE actionJamal Hadi Salim
This action allows for a sending side to encapsulate arbitrary metadata which is decapsulated by the receiving end. The sender runs in encoding mode and the receiver in decode mode. Both sender and receiver must specify the same ethertype. At some point we hope to have a registered ethertype and we'll then provide a default so the user doesnt have to specify it. For now we enforce the user specify it. Lets show example usage where we encode icmp from a sender towards a receiver with an skbmark of 17; both sender and receiver use ethertype of 0xdead to interop. YYYY: Lets start with Receiver-side policy config: xxx: add an ingress qdisc sudo tc qdisc add dev $ETH ingress xxx: any packets with ethertype 0xdead will be subjected to ife decoding xxx: we then restart the classification so we can match on icmp at prio 3 sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \ u32 match u32 0 0 flowid 1:1 \ action ife decode reclassify xxx: on restarting the classification from above if it was an icmp xxx: packet, then match it here and continue to the next rule at prio 4 xxx: which will match based on skb mark of 17 sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \ u32 match ip protocol 1 0xff flowid 1:1 \ action continue xxx: match on skbmark of 0x11 (decimal 17) and accept sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \ handle 0x11 fw flowid 1:1 \ action ok xxx: Lets show the decoding policy sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead xxx: filter pref 2 u32 filter pref 2 u32 fh 800: ht divisor 1 filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 0 success 0) match 00000000/00000000 at 0 (success 0 ) action order 1: ife decode action reclassify index 1 ref 1 bind 1 installed 14 sec used 14 sec type: 0x0 Metadata: allow mark allow hash allow prio allow qmap Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: Observe that above lists all metadatum it can decode. Typically these submodules will already be compiled into a monolithic kernel or loaded as modules YYYY: Lets show the sender side now .. xxx: Add an egress qdisc on the sender netdev sudo tc qdisc add dev $ETH root handle 1: prio xxx: xxx: Match all icmp packets to 192.168.122.237/24, then xxx: tag the packet with skb mark of decimal 17, then xxx: Encode it with: xxx: ethertype 0xdead xxx: add skb->mark to whitelist of metadatum to send xxx: rewrite target dst MAC address to 02:15:15:15:15:15 xxx: sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 u32 \ match ip dst 192.168.122.237/24 \ match ip protocol 1 0xff \ flowid 1:2 \ action skbedit mark 17 \ action ife encode \ type 0xDEAD \ allow mark \ dst 02:15:15:15:15:15 xxx: Lets show the encoding policy sudo tc -s filter ls dev $ETH parent 1: protocol ip xxx: filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2 (rule hit 0 success 0) match c0a87aed/ffffffff at 16 (success 0 ) match 00010000/00ff0000 at 8 (success 0 ) action order 1: skbedit mark 17 index 6 ref 1 bind 1 Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 2: ife encode action pipe index 3 ref 1 bind 1 dst MAC: 02:15:15:15:15:15 type: 0xDEAD Metadata: allow mark Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: test by sending ping from sender to destination Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01Merge tag 'mac80211-next-for-davem-2016-02-26' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Here's another round of updates for -next: * big A-MSDU RX performance improvement (avoid linearize of paged RX) * rfkill changes: cleanups, documentation, platform properties * basic PBSS support in cfg80211 * MU-MIMO action frame processing support * BlockAck reordering & duplicate detection offload support * various cleanups & little fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01bridge: mcast: add support for more router port information dumpingNikolay Aleksandrov
Allow for more multicast router port information to be dumped such as timer and type attributes. For that that purpose we need to extend the MDBA_ROUTER_PORT attribute similar to how it was done for the mdb entries recently. The new format is thus: [MDBA_ROUTER_PORT] = { <- nested attribute u32 ifindex <- router port ifindex for user-space compatibility [MDBA_ROUTER_PATTR attributes] } This way it remains compatible with older users (they'll simply retrieve the u32 in the beginning) and new users can parse the remaining attributes. It would also allow to add future extensions to the router port without breaking compatibility. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01bridge: mcast: add support for temporary port routerNikolay Aleksandrov
Add support for a temporary router port which doesn't depend only on the incoming query. It can be refreshed if set to the same value, which is a no-op for the rest. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01bridge: mcast: do nothing if port's multicast_router is set to the same valNikolay Aleksandrov
This is needed for the upcoming temporary port router. There's no point to go through the logic if the value is the same. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01bridge: mcast: use names for the different multicast_router typesNikolay Aleksandrov
Using raw values makes it difficult to extend and also understand the code, give them names and do explicit per-option manipulation in br_multicast_set_port_router. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: dsa: support VLAN filtering switchdev attrVivien Didelot
When a user explicitly requests VLAN filtering with something like: # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering Switchdev propagates a SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port attribute. Add support for it in the DSA layer with a new port_vlan_filtering function to let drivers toggle 802.1Q filtering on user demand. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01Introduce devlink infrastructureJiri Pirko
Introduce devlink infrastructure for drivers to register and expose to userspace via generic Netlink interface. There are two basic objects defined: devlink - one instance for every "parent device", for example switch ASIC devlink port - one instance for every physical port of the device. This initial portion implements basic get/dump of objects to userspace. Also, port splitter and port type setting is implemented. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01svcrdma: Use new CQ API for RPC-over-RDMA server send CQsChuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Use new CQ API for RPC-over-RDMA server receive CQsChuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Remove close_out exit pathChuck Lever
Clean up: close_out is reached only when ctxt == NULL and XPT_CLOSE is already set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Hook up the logic to return ERR_CHUNKChuck Lever
RFC 5666 Section 4.2 states: > When the peer detects an RPC-over-RDMA header version that it does > not support (currently this document defines only version 1), it > replies with an error code of ERR_VERS, and provides the low and > high inclusive version numbers it does, in fact, support. And: > When other decoding errors are detected in the header or chunks, > either an RPC decode error MAY be returned or the RPC/RDMA error > code ERR_CHUNK MUST be returned. The Linux NFS server does throw ERR_VERS when a client sends it a request whose rdma_version is not "one." But it does not return ERR_CHUNK when a header decoding error occurs. It just drops the request. To improve protocol extensibility, it should reject invalid values in the rdma_proc field instead of treating them all like RDMA_MSG. Otherwise clients can't detect when the server doesn't support new rdma_proc values. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Use correct XID in error repliesChuck Lever
When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Make RDMA_ERROR messages workChuck Lever
Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: svc_rdma_post_recv() should close connection on errorChuck Lever
Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Close connection when a send error occursChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01nfsd: Lower NFSv4.1 callback message size limitChuck Lever
The maximum size of a backchannel message on RPC-over-RDMA depends on the connection's inline threshold. Today that threshold is typically 1024 bytes, making the maximum message size 996 bytes. The Linux server's CREATE_SESSION operation checks that the size of callback Calls can be as large as 1044 bytes, to accommodate RPCSEC_GSS. Thus CREATE_SESSION fails if a client advertises the true message size maximum of 996 bytes. But the server's backchannel currently does not support RPCSEC_GSS. The actual maximum size it needs is much smaller. It is safe to reduce the limit to enable NFSv4.1 on RDMA backchannel operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Do not send Write chunk XDR pad with inline contentChuck Lever
The NFS server's XDR encoders adds an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Do not write xdr_buf::tail in a Write chunkChuck Lever
When the Linux NFS server writes an odd-length data item into a Write chunk, it finishes with XDR pad bytes. If the data item is smaller than the Write chunk, the pad bytes are written at the end of the data item, but still inside the chunk (ie, in the application's buffer). Since this is direct data placement, that exposes the pad bytes. XDR pad bytes are inserted in order to preserve the XDR alignment of the next XDR data item in an XDR stream. But Write chunks do not appear in the payload XDR stream, and only one data item is allowed in each chunk. Thus XDR padding is not needed in a Write chunk. With NFSv4, the Linux NFS server places the results of any operations that follow an NFSv4 READ or READLINK in the xdr_buf's tail. Those results also should never be sent as a part of a Write chunk. The current logic in send_write_chunks() appears to assume that the xdr_buf's tail contains only pad bytes (ie, NFSv3). The server should write only the contents of the xdr_buf's page list in a Write chunk. If there's more than an XDR pad in the tail, that needs to go inline or in the Reply chunk. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01svcrdma: Find client-provided write and reply chunks once per replyChuck Lever
The client provides the location of Write chunks into which the server writes bulk payload. The client provides these when the Upper Layer Protocol wants direct data placement and the Binding allows it. (For NFS, this is READ and READLINK operations). The client also provides the location of a Reply chunk into which the server writes the non-bulk part of an RPC reply. The client provides this chunk whenever it believes the reply can be larger than its receive buffers. The server then uses the presence of these chunks to determine how it will form its reply message. svc_rdma_sendto() was looking for Write and Reply chunks multiple times for every reply message. It would be more efficient to do it just once. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-01net: sched: cls_u32 add bit to specify software only rulesJohn Fastabend
In the initial implementation the only way to stop a rule from being inserted into the hardware table was via the device feature flag. However this doesn't work well when working on an end host system where packets are expect to hit both the hardware and software datapaths. For example we can imagine a rule that will match an IP address and increment a field. If we install this rule in both hardware and software we may increment the field twice. To date we have only added support for the drop action so we have been able to ignore these cases. But as we extend the action support we will hit this example plus more such cases. Arguably these are not even corner cases in many working systems these cases will be common. To avoid forcing the driver to always abort (i.e. the above example) this patch adds a flag to add a rule in software only. A careful user can use this flag to build software and hardware datapaths that work together. One example we have found particularly useful is to use hardware resources to set the skb->mark on the skb when the match may be expensive to run in software but a mark lookup in a hash table is cheap. The idea here is hardware can do in one lookup what the u32 classifier may need to traverse multiple lists and hash tables to compute. The flag is only passed down on inserts. On deletion to avoid stale references in hardware we always try to remove a rule if it exists. The flags field is part of the classifier specific options. Although it is tempting to lift this into the generic structure doing this proves difficult do to how the tc netlink attributes are implemented along with how the dump/change routines are called. There is also precedence for putting seemingly generic pieces in the specific classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So although not ideal I've left FLAGS in the u32 options as well as it simplifies the code greatly and user space has already learned how to manage these bits ala 'tc' tool. Another thing if trying to update a rule we require the flags to be unchanged. This is to force user space, software u32 and the hardware u32 to keep in sync. Thanks to Simon Horman for catching this case. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01net: sched: consolidate offload decision in cls_u32John Fastabend
The offload decision was originally very basic and tied to if the dev implemented the appropriate ndo op hook. The next step is to allow the user to more flexibly define if any paticular rule should be offloaded or not. In order to have this logic in one function lift the current check into a helper routine tc_should_offload(). Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01ovs: propagate per dp max headroom to all vportsPaolo Abeni
This patch implements bookkeeping support to compute the maximum headroom for all the devices in each datapath. When said value changes, the underlying devs are notified via the ndo_set_rx_headroom method. This also increases the internal vports xmit performance. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01bridge: notify enslaved devices of headroom changesPaolo Abeni
On bridge needed_headroom changes, the enslaved devices are notified via the ndo_set_rx_headroom method Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01mac80211: minstrel_ht: fix a logic error in RTS/CTS handlingFelix Fietkau
RTS/CTS needs to be enabled if the rate is a fallback rate *or* if it's a dual-stream rate and the sta is in dynamic SMPS mode. Cc: stable@vger.kernel.org Fixes: a3ebb4e1b763 ("mac80211: minstrel_ht: handle peers in dynamic SMPS") Reported-by: Matías Richart <mrichart@fing.edu.uy> Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-03-01mac80211: Fix Public Action frame RX in AP modeJouni Malinen
Public Action frames use special rules for how the BSSID field (Address 3) is set. A wildcard BSSID is used in cases where the transmitter and recipient are not members of the same BSS. As such, we need to accept Public Action frames with wildcard BSSID. Commit db8e17324553 ("mac80211: ignore frames between TDLS peers when operating as AP") added a rule that drops Action frames to TDLS-peers based on an Action frame having different DA (Address 1) and BSSID (Address 3) values. This is not correct since it misses the possibility of BSSID being a wildcard BSSID in which case the Address 1 would not necessarily match. Fix this by allowing mac80211 to accept wildcard BSSID in an Action frame when in AP mode. Fixes: db8e17324553 ("mac80211: ignore frames between TDLS peers when operating as AP") Cc: stable@vger.kernel.org Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-03-01mac80211: check PN correctly for GCMP-encrypted fragmented MPDUsJohannes Berg
Just like for CCMP we need to check that for GCMP the fragments have PNs that increment by one; the spec was updated to fix this security issue and now has the following text: The receiver shall discard MSDUs and MMPDUs whose constituent MPDU PN values are not incrementing in steps of 1. Adapt the code for CCMP to work for GCMP as well, luckily the relevant fields already alias each other so no code duplication is needed (just check the aliasing with BUILD_BUG_ON.) Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-02-29sch_dsmark: update backlog as wellWANG Cong
Similarly, we need to update backlog too when we update qlen. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29sch_htb: update backlog as wellWANG Cong
We saw qlen!=0 but backlog==0 on our production machine: qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0) backlog 0b 72p requeues 0 The problem is we only count qlen for HTB qdisc but not backlog. We need to update backlog too when we update qlen, so that we can at least know the average packet length. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29net_sched: update hierarchical backlog tooWANG Cong
When the bottom qdisc decides to, for example, drop some packet, it calls qdisc_tree_decrease_qlen() to update the queue length for all its ancestors, we need to update the backlog too to keep the stats on root qdisc accurate. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29net_sched: introduce qdisc_replace() helperWANG Cong
Remove nearly duplicated code and prepare for the following patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-29netfilter: xt_osf: remove unused variableSudip Mukherjee
While building with W=1 we got the warning: net/netfilter/xt_osf.c:265:9: warning: variable 'loop_cont' set but not used The local variable loop_cont was only initialized and then assigned a value but was never used or checked after that. While removing the variable, the case of OSFOPT_TS was not removed so that it will serve as a reminder to us that we can do something in that particular case. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-02-29netfilter: meta: add PRANDOM supportFlorian Westphal
Can be used to randomly match packets e.g. for statistic traffic sampling. See commit 3ad0040573b0c00f8848 ("bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs") for more info why this doesn't use prandom_u32 directly. Unlike bpf nft_meta can be built as a module, so add an EXPORT_SYMBOL for prandom_seed_full_state too. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-02-29netfilter: nfnetlink_acct: validate NFACCT_FILTER parametersPhil Turnbull
nfacct_filter_alloc doesn't validate the NFACCT_FILTER_MASK and NFACCT_FILTER_VALUE parameters which can trigger a NULL pointer dereference. CAP_NET_ADMIN is required to trigger the bug. Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-02-29batman-adv: Start new development cycleSimon Wunderlich
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-29batman-adv: B.A.T.M.A.N. V - implement bat_neigh_print APILinus Luessing
Lists all neighbours detected by the Echo Locating Protocol (ELP) and their throughput metric. Initially Developed by Linus during a 6 months trainee study period in Ascom (Switzerland) AG. Signed-off-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>