summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-06-29cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC headerFelix Fietkau
The PDU length of incoming LLC frames is set to the total skb payload size in __ieee80211_data_to_8023() of net/wireless/util.c which incorrectly includes the length of the IEEE 802.11 header. The resulting LLC frame header has a too large PDU length, causing the llc_fixup_skb() function of net/llc/llc_input.c to reject the incoming skb, effectively breaking STP. Solve the problem by properly substracting the IEEE 802.11 frame header size from the PDU length, allowing the LLC processor to pick up the incoming control messages. Special thanks to Gerry Rozema for tracking down the regression and proposing a suitable patch. Fixes: 2d1c304cb2d5 ("cfg80211: add function for 802.3 conversion with separate output buffer") Cc: stable@vger.kernel.org Reported-by: Gerry Rozema <gerryr@rozeware.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-29net: bridge: fix vlan stats continue counterNikolay Aleksandrov
I made a dumb off-by-one mistake when I added the vlan stats counter dumping code. The increment should happen before the check, not after otherwise we miss one entry when we continue dumping. Fixes: a60c090361ea ("bridge: netlink: export per-vlan stats") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29tcp: do not send too big packets at retransmit timeEric Dumazet
Arjun reported a bug in TCP stack and bisected it to a recent commit. In case where we process SACK, we can coalesce multiple skbs into fat ones (tcp_shift_skb_data()), to lower write queue overhead, because we do not expect to retransmit these packets. However, SACK reneging can happen, forcing the sender to retransmit all these packets. If skb->len is above 64KB, we then send buggy IP packets that could hang TSO engine on cxgb4. Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs so that we cook packets of optimal size vs TCP/pacing. Thanks to Arjun for reporting the bug and running the tests ! Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Arjun V <arjun@chelsio.com> Tested-by: Arjun V <arjun@chelsio.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29tipc: rename udp_port in struct udp_media_addrRichard Alpe
Context implies that port in struct "udp_media_addr" is referring to a UDP port. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29tipc: honor msg2addr return valueRichard Alpe
The UDP msg2addr function tipc_udp_msg2addr() can return -EINVAL which prior to this patch was unhanded in the caller. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29net: the space is required before the open parenthesis '('Wei Tang
The space is missing before the open parenthesis '(', and this will introduce much more noise when checking patch around. Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29batman-adv: Clean up untagged vlan when destroying via rtnl-linkSven Eckelmann
The untagged vlan object is only destroyed when the interface is removed via the legacy sysfs interface. But it also has to be destroyed when the standard rtnl-link interface is used. Fixes: 5d2c05b21337 ("batman-adv: add per VLAN interface attribute framework") Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29batman-adv: Fix ICMP RR ethernet access after skb_linearizeSven Eckelmann
The skb_linearize may reallocate the skb. This makes the calculated pointer for ethhdr invalid. But it the pointer is used later to fill in the RR field of the batadv_icmp_packet_rr packet. Instead re-evaluate eth_hdr after the skb_linearize+skb_cow to fix the pointer and avoid the invalid read. Fixes: da6b8c20a5b8 ("batman-adv: generalize batman-adv icmp packet handling") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29batman-adv: Fix double-put of vlan objectBen Hutchings
Each batadv_tt_local_entry hold a single reference to a batadv_softif_vlan. In case a new entry cannot be added to the hash table, the error path puts the reference, but the reference will also now be dropped by batadv_tt_local_entry_release(). Fixes: a33d970d0b54 ("batman-adv: Fix reference counting of vlan object for tt_local_entry") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29batman-adv: Fix use-after-free/double-free of tt_req_nodeSven Eckelmann
The tt_req_node is added and removed from a list inside a spinlock. But the locking is sometimes removed even when the object is still referenced and will be used later via this reference. For example batadv_send_tt_request can create a new tt_req_node (including add to a list) and later re-acquires the lock to remove it from the list and to free it. But at this time another context could have already removed this tt_req_node from the list and freed it. CPU#0 batadv_batman_skb_recv from net_device 0 -> batadv_iv_ogm_receive -> batadv_iv_ogm_process -> batadv_iv_ogm_process_per_outif -> batadv_tvlv_ogm_receive -> batadv_tvlv_ogm_receive -> batadv_tvlv_containers_process -> batadv_tvlv_call_handler -> batadv_tt_tvlv_ogm_handler_v1 -> batadv_tt_update_orig -> batadv_send_tt_request -> batadv_tt_req_node_new spin_lock(...) allocates new tt_req_node and adds it to list spin_unlock(...) return tt_req_node CPU#1 batadv_batman_skb_recv from net_device 1 -> batadv_recv_unicast_tvlv -> batadv_tvlv_containers_process -> batadv_tvlv_call_handler -> batadv_tt_tvlv_unicast_handler_v1 -> batadv_handle_tt_response spin_lock(...) tt_req_node gets removed from list and is freed spin_unlock(...) CPU#0 <- returned to batadv_send_tt_request spin_lock(...) tt_req_node gets removed from list and is freed MEMORY CORRUPTION/SEGFAULT/... spin_unlock(...) This can only be solved via reference counting to allow multiple contexts to handle the list manipulation while making sure that only the last context holding a reference will free the object. Fixes: a73105b8d4c7 ("batman-adv: improved client announcement mechanism") Signed-off-by: Sven Eckelmann <sven@narfation.org> Tested-by: Martin Weinelt <martin@darmstadt.freifunk.net> Tested-by: Amadeus Alfa <amadeus@chemnitz.freifunk.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29batman-adv: replace WARN with rate limited output on non-existing VLANSimon Wunderlich
If a VLAN tagged frame is received and the corresponding VLAN is not configured on the soft interface, it will splat a WARN on every packet received. This is a quite annoying behaviour for some scenarios, e.g. if bat0 is bridged with eth0, and there are arbitrary VLAN tagged frames from Ethernet coming in without having any VLAN configuration on bat0. The code should probably create vlan objects on the fly and transparently transport these VLAN-tagged Ethernet frames, but until this is done, at least the WARN splat should be replaced by a rate limited output. Fixes: 354136bcc3c4 ("batman-adv: fix kernel crash due to missing NULL checks") Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28Bridge: Fix ipv6 mc snooping if bridge has no ipv6 addressdaniel
The bridge is falsly dropping ipv6 mulitcast packets if there is: 1. No ipv6 address assigned on the brigde. 2. No external mld querier present. 3. The internal querier enabled. When the bridge fails to build mld queries, because it has no ipv6 address, it slilently returns, but keeps the local querier enabled. This specific case causes confusing packet loss. Ipv6 multicast snooping can only work if: a) An external querier is present OR b) The bridge has an ipv6 address an is capable of sending own queries Otherwise it has to forward/flood the ipv6 multicast traffic, because snooping cannot work. This patch fixes the issue by adding a flag to the bridge struct that indicates that there is currently no ipv6 address assinged to the bridge and returns a false state for the local querier in __br_multicast_querier_exists(). Special thanks to Linus Lüssing. Fixes: d1d81d4c3dd8 ("bridge: check return value of ipv6_dev_get_saddr()") Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28mac80211: Fix mesh estab_plinks counting in STA removal caseJouni Malinen
If a user space program (e.g., wpa_supplicant) deletes a STA entry that is currently in NL80211_PLINK_ESTAB state, the number of established plinks counter was not decremented and this could result in rejecting new plink establishment before really hitting the real maximum plink limit. For !user_mpm case, this decrementation is handled by mesh_plink_deactive(). Fix this by decrementing estab_plinks on STA deletion (mesh_sta_cleanup() gets called from there) so that the counter has a correct value and the Beacon frame advertisement in Mesh Configuration element shows the proper value for capability to accept additional peers. Cc: stable@vger.kernel.org Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-28caif: Remove unneeded header fileAmitoj Kaur Chawla
Drop redundant include of moduleparam.h The Coccinelle semantic patch used to make this change is as follows: @ includesmodule @ @@ #include <linux/module.h> @ depends on includesmodule @ @@ - #include <linux/moduleparam.h> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28net: diag: Add support to filter on device indexDavid Ahern
Add support to inet_diag facility to filter sockets based on device index. If an interface index is in the filter only sockets bound to that index (sk_bound_dev_if) are returned. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28ipmr/ip6mr: Initialize the last assert time of mfc entries.Tom Goff
This fixes wrong-interface signaling on 32-bit platforms for entries created when jiffies > 2^31 + MFC_ASSERT_THRESH. Signed-off-by: Tom Goff <thomas.goff@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-28s390/iucv: use basic blocks for iucv inline assembliesHeiko Carstens
Use only simple inline assemblies which consist of a single basic block if the register asm construct is being used. Otherwise gcc would generate broken code if the compiler option --sanitize-coverage=trace-pc would be used. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-27netlabel: Implement CALIPSO config functions for SMACK.Huw Davies
SMACK uses similar functions to control CIPSO, these are the equivalent functions for CALIPSO and follow exactly the same semantics. int netlbl_cfg_calipso_add(struct calipso_doi *doi_def, struct netlbl_audit *audit_info) Adds a CALIPSO doi. void netlbl_cfg_calipso_del(u32 doi, struct netlbl_audit *audit_info) Removes a CALIPSO doi. int netlbl_cfg_calipso_map_add(u32 doi, const char *domain, const struct in6_addr *addr, const struct in6_addr *mask, struct netlbl_audit *audit_info) Creates a mapping between a domain and a CALIPSO doi. If addr and mask are non-NULL this creates an address-selector type mapping. This also extends netlbl_cfg_map_del() to remove IPv6 address-selector mappings. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Add a label cache.Huw Davies
This works in exactly the same way as the CIPSO label cache. The idea is to allow the lsm to cache the result of a secattr lookup so that it doesn't need to perform the lookup for every skbuff. It introduces two sysctl controls: calipso_cache_enable - enables/disables the cache. calipso_cache_bucket_size - sets the size of a cache bucket. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Add validation of CALIPSO option.Huw Davies
Lengths, checksum and the DOI are checked. Checking of the level and categories are left for the socket layer. CRC validation is performed in the calipso module to avoid unconditionally linking crc_ccitt() into ipv6. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Pass a family parameter to netlbl_skbuff_err().Huw Davies
This makes it possible to route the error to the appropriate labelling engine. CALIPSO is far less verbose than CIPSO when encountering a bogus packet, so there is no need for a CALIPSO error handler. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Allow the lsm to label the skbuff directly.Huw Davies
In some cases, the lsm needs to add the label to the skbuff directly. A NF_INET_LOCAL_OUT IPv6 hook is added to selinux to match the IPv4 behaviour. This allows selinux to label the skbuffs that it requires. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27ipv6: constify the skb pointer of ipv6_find_tlv().Huw Davies
Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Allow request sockets to be relabelled by the lsm.Huw Davies
Request sockets need to have a label that takes into account the incoming connection as well as their parent's label. This is used for the outgoing SYN-ACK and for their child full-socket. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27ipv6: Allow request socks to contain IPv6 options.Huw Davies
If set, these will take precedence over the parent's options during both sending and child creation. If they're not set, the parent's options (if any) will be used. This is to allow the security_inet_conn_request() hook to modify the IPv6 options in just the same way that it already may do for IPv4. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27calipso: Set the calipso socket label to match the secattr.Huw Davies
CALIPSO is a hop-by-hop IPv6 option. A lot of this patch is based on the equivalent CISPO code. The main difference is due to manipulating the options in the hop-by-hop header. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Move bitmap manipulation functions to the NetLabel core.Huw Davies
This is to allow the CALIPSO labelling engine to use these. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27ipv6: Add ipv6_renew_options_kern() that accepts a kernel mem pointer.Huw Davies
The functionality is equivalent to ipv6_renew_options() except that the newopt pointer is in kernel, not user, memory The kernel memory implementation will be used by the CALIPSO network labelling engine, which needs to be able to set IPv6 hop-by-hop options. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add support for removing a CALIPSO DOI.Huw Davies
Remove a specified DOI through the NLBL_CALIPSO_C_REMOVE command. It requires the attribute: NLBL_CALIPSO_A_DOI. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add support for creating a CALIPSO protocol domain mapping.Huw Davies
This extends the NLBL_MGMT_C_ADD and NLBL_MGMT_C_ADDDEF commands to accept CALIPSO protocol DOIs. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add support for enumerating the CALIPSO DOI list.Huw Davies
Enumerate the DOI list through the NLBL_CALIPSO_C_LISTALL command. It takes no attributes. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add support for querying a CALIPSO DOI.Huw Davies
Query a specified DOI through the NLBL_CALIPSO_C_LIST command. It requires the attribute: NLBL_CALIPSO_A_DOI. The reply will contain: NLBL_CALIPSO_A_MTYPE Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Initial support for the CALIPSO netlink protocol.Huw Davies
CALIPSO is a packet labelling protocol for IPv6 which is very similar to CIPSO. It is specified in RFC 5570. Much of the code is based on the current CIPSO code. This adds support for adding passthrough-type CALIPSO DOIs through the NLBL_CALIPSO_C_ADD command. It requires attributes: NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS. NLBL_CALIPSO_A_DOI. In passthrough mode the CALIPSO engine will map MLS secattr levels and categories directly to the packet label. At this stage, the major difference between this and the CIPSO code is that IPv6 may be compiled as a module. To allow for this the CALIPSO functions are registered at module init time. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Add an address family to domain hash entries.Huw Davies
The reason is to allow different labelling protocols for different address families with the same domain. This requires the addition of an address family attribute in the netlink communication protocol. It is used in several messages: NLBL_MGMT_C_ADD and NLBL_MGMT_C_ADDDEF take it as an optional attribute for the unlabelled protocol. It may be one of AF_INET, AF_INET6 or AF_UNSPEC (to specify both address families). If it is missing, it defaults to AF_UNSPEC. NLBL_MGMT_C_LISTALL and NLBL_MGMT_C_LISTDEF return it as part of the enumeration of each item. Addtionally, it may be sent to LISTDEF to specify which address family to return. Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27netlabel: Mark rcu pointers with __rcu.Huw Davies
This fixes sparse errors of the form: incompatible types in comparison expression (different address spaces) Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-06-27vsock: make listener child lock ordering explicitStefan Hajnoczi
There are several places where the listener and pending or accept queue child sockets are accessed at the same time. Lockdep is unhappy that two locks from the same class are held. Tell lockdep that it is safe and document the lock ordering. Originally Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> sent a similar patch asking whether this is safe. I have audited the code and also covered the vsock_pending_work() function. Suggested-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-27ipv6: enforce egress device match in per table nexthop lookupsPaolo Abeni
with the commit 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups"), net hop lookup is first performed on route creation in the passed-in table. However device match is not enforced in table lookup, so the found route can be later discarded due to egress device mismatch and no global lookup will be performed. This cause the following to fail: ip link add dummy1 type dummy ip link add dummy2 type dummy ip link set dummy1 up ip link set dummy2 up ip route add 2001:db8:8086::/48 dev dummy1 metric 20 ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy1 metric 20 ip route add 2001:db8:8086::/48 dev dummy2 metric 21 ip route add 2001:db8:d34d::/64 via 2001:db8:8086::2 dev dummy2 metric 21 RTNETLINK answers: No route to host This change fixes the issue enforcing device lookup in ip6_nh_lookup_table() v1->v2: updated commit message title Fixes: 8c14586fc320 ("net: ipv6: Use passed in table for nexthop lookups") Reported-and-tested-by: Beniamino Galvani <bgalvani@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-27Merge tag 'linux-can-next-for-4.8-20160623' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2016-06-17 this is a pull request of 4 patches for net-next/master. Arnd Bergmann's patch fixes a regresseion in af_can introduced in linux-can-next-for-4.8-20160617. There are two patches by Ramesh Shanmugasundaram, which add CAN-2.0 support to the rcar_canfd driver. And a patch by Ed Spiridonov that adds better error diagnoses messages to the Ed Spiridonov driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-27tipc: Use kmemdup instead of kmalloc and memcpyAmitoj Kaur Chawla
Replace calls to kmalloc followed by a memcpy with a direct call to kmemdup. The Coccinelle semantic patch used to make this change is as follows: @@ expression from,to,size,flag; statement S; @@ - to = \(kmalloc\|kzalloc\)(size,flag); + to = kmemdup(from,size,flag); if (to==NULL || ...) S - memcpy(to, from, size); Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-26Merge tag 'rxrpc-rewrite-20160622-2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Get rid of conn bundle and transport structs Here's the next part of the AF_RXRPC rewrite. The primary purpose of this set is to get rid of the rxrpc_conn_bundle and rxrpc_transport structs. This simplifies things for future development of the connection handling. To this end, the following significant changes are made: (1) The rxrpc_connection struct is given pointers to the local and peer endpoints, inside the rxrpc_conn_parameters struct. Pointers to the transport's copy of these pointers are then redirected to the connection struct. (2) Exclusive connection handling is fixed. Exclusive connections should do just one call and then be retired. They are used in security negotiations and, I believe, the idea is to avoid reuse of negotiated security contexts. The current code is doing a single connection per socket and doing all the calls over that. With this change it gets a new connection for each call made. (3) A new sendmsg() control message marker is added to make individual calls operate over exclusive connections. This should be used in future in preference to the sockopt that marks a socket as "exclusive connection". (4) IDs for client connections initiated by a machine are now allocated from a global pool using the IDR facility and are unique across all client connections, no matter their destination. The IDR facility is then used to look up a connection on the connection ID alone. Other parameters are then verified afterwards. Note that the IDR facility may use a lot of memory if the IDs it holds are widely scattered. Given this, in a future commit, client connections will be retired if they are more than a certain distance from the last ID allocated. The client epoch is advanced by 1 each time the client ID counter wraps. Connections outside the current epoch will also be retired in a future commit. (5) The connection bundle concept is removed and the client connection tree is moved into the local endpoint. The queue for waiting for a call channel is moved to the rxrpc_connection struct as there can only be one connection for any particular key going to any particular peer now. (6) The rxrpc_transport struct is removed and the service connection tree is moved into the peer struct. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25net_sched: generalize bulk dequeueEric Dumazet
When qdisc bulk dequeue was added in linux-3.18 (commit 5772e9a3463b "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"), it was constrained to some specific qdiscs. With some extra care, we can extend this to all qdiscs, so that typical traffic shaping solutions can benefit from small batches (8 packets in this patch). For example, HTB is often used on some multi queue device. And bonding/team are multi queue devices... Idea is to bulk-dequeue packets mapping to the same transmit queue. This brings between 35 and 80 % performance increase in HTB setup under pressure on a bonding setup : 1) NUMA node contention : 610,000 pps -> 1,110,000 pps 2) No node contention : 1,380,000 pps -> 1,930,000 pps Now we should work to add batches on the enqueue() side ;) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25net_sched: sch_htb: export class backlog in dumpsEric Dumazet
We already get child qdisc qlen, we also can get its backlog so that class dumps can report it. Also replace qstats by a single drop counter, but move it in a separate cache line so that drops do not dirty useful cache lines. Tested: $ tc -s cl sh dev eth0 class htb 1:1 root leaf 3: prio 0 rate 1Gbit ceil 1Gbit burst 500000b cburst 500000b Sent 2183346912 bytes 9021815 pkt (dropped 2340774, overlimits 0 requeues 0) rate 1001Mbit 517543pps backlog 120758b 499p requeues 0 lended: 9021770 borrowed: 0 giants: 0 tokens: 9 ctokens: 9 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25net_sched: fq_codel: cache skb->truesize into skb->cbEric Dumazet
Now we defer skb drops, it makes sense to keep a copy of skb->truesize in struct codel_skb_cb to avoid one cache line miss per dropped skb in fq_codel_drop(), to reduce latencies a bit further. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25net_sched: drop packets after root qdisc lock is releasedEric Dumazet
Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25net: ircomm, cleanup TIOCGSERIALJiri Slaby
In ircomm_tty_get_serial_info, struct serial_struct is memset to 0 and then some members set to 0 explicitly. Remove the latter as it is obviously superfluous. And remove the retinfo check against NULL. copy_to_user will take care of that. Part of hub6 cleanup series. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Samuel Ortiz <samuel@sortiz.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-25openvswitch: Only set mark and labels with a commit flag.Jarno Rajahalme
Only set conntrack mark or labels when the commit flag is specified. This makes sure we can not set them before the connection has been persisted, as in that case the mark and labels would be lost in an event of an userspace upcall. OVS userspace already requires the commit flag to accept setting ct_mark and/or ct_labels. Validate for this in the kernel API. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-25openvswitch: Set mark and labels before confirming.Jarno Rajahalme
Set conntrack mark and labels right before committing so that the initial conntrack NEW event has the mark and labels. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-24netfilter: nf_tables: add support for inverted logic in nft_lookupArturo Borrero
Introduce a new configuration option for this expression, which allows users to invert the logic of set lookups. In _init() we will now return EINVAL if NFT_LOOKUP_F_INV is in anyway related to a map lookup. The code in the _eval() function has been untangled and updated to sopport the XOR of options, as we should consider 4 cases: * lookup false, invert false -> NFT_BREAK * lookup false, invert true -> return w/o NFT_BREAK * lookup true, invert false -> return w/o NFT_BREAK * lookup true, invert true -> NFT_BREAK Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: nf_tables: get rid of NFT_BASECHAIN_DISABLEDPablo Neira Ayuso
This flag was introduced to restore rulesets from the new netdev family, but since 5ebe0b0eec9d6f7 ("netfilter: nf_tables: destroy basechain and rules on netdevice removal") the ruleset is released once the netdev is gone. This also removes nft_register_basechain() and nft_unregister_basechain() since they have no clients anymore after this rework. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-06-24netfilter: conntrack: allow increasing bucket size via sysctl tooFlorian Westphal
No need to restrict this to module parameter. We export a copy of the real hash size -- when user alters the value we allocate the new table, copy entries etc before we update the real size to the requested one. This is also needed because the real size is used by concurrent readers and cannot be changed without synchronizing the conntrack generation seqcnt. We only allow changing this value from the initial net namespace. Tested using http-client-benchmark vs. httpterm with concurrent while true;do echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets done Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>