summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-11-07netfilter: nf_tables: performance set policy skips size description in selectionPablo Neira Ayuso
Use the complexity and space notations if policy is performance, this results in placing the bitmap set representation over the hashtable for key <= 16 for better performance as we discussed during the last NFWS in Faro, Portugal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-07netfilter: conntrack: use power efficient workqueueVincent Guittot
conntrack uses the bounded system_long_wq workqueue for its works that don't have to run on the cpu they have been queued. Using bounded workqueue prevents the scheduler to make smart decision about the best place to schedule the work. This patch replaces system_long_wq with system_power_efficient_wq. the work stays bounded to a cpu by default unless the CONFIG_WQ_POWER_EFFICIENT is enable. In the latter case, the work can be scheduled on the best cpu from a power or a performance point of view. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: conntrack: move nf_ct_netns_{get,put}() to corePablo Neira Ayuso
So we can call this from other expression that need conntrack in place to work. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2017-11-06netfilter: conntrack: don't cache nlattr_tuple_size result in nla_sizeFlorian Westphal
We currently call ->nlattr_tuple_size() once at register time and cache result in l4proto->nla_size. nla_size is the only member that is written to, avoiding this would allow to make l4proto trackers const. We can use ->nlattr_tuple_size() at run time, and cache result in the individual trackers instead. This is an intermediate step, next patch removes nlattr_size() callback and computes size at compile time, then removes nla_size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: nft_hash: fix nft_hash_deactivateFlorian Westphal
Jindřich Makovička says: The logical OR looks fishy to me. Shouldn't be && there instead? Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1199 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: xt_connlimit: remove mask argumentFlorian Westphal
Instead of passing mask to all the helpers, just fixup the search key early. After rbtree conversion, each rbtree node stores connections of same 'addr & mask', so no need to pass the mask too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ebtables: clean up initialization of bufColin Ian King
buf is initialized to buf_start and then set on the next statement to buf_start + offsets[i]. Clean this up to just initialize buf to buf_start + offsets[i] to clean up the clang build warning: "Value stored to 'buf' during its initialization is never read" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ipvs: Fix inappropriate output of procfsKUWAZAWA Takuya
Information about ipvs in different network namespace can be seen via procfs. How to reproduce: # ip netns add ns01 # ip netns add ns02 # ip netns exec ns01 ip a add dev lo 127.0.0.1/8 # ip netns exec ns02 ip a add dev lo 127.0.0.1/8 # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80 # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80 The ipvsadm displays information about its own network namespace only. # ip netns exec ns01 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.1:80 wlc # ip netns exec ns02 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.2:80 wlc But I can see information about other network namespace via procfs. # ip netns exec ns01 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010101:0050 wlc TCP 0A010102:0050 wlc # ip netns exec ns02 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010102:0050 wlc Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ipvs: Use %pS printk format for direct addressesHelge Deller
The debug and error printk functions in ipvs uses wrongly the %pF instead of the %pS printk format specifier for printing symbols for the address returned by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06NFC: Convert timers to use timer_setup()Allen Pais
Switch to using the new timer_setup() and from_timer() for net/nfc/* Signed-off-by: Allen Pais <allen.pais@oracle.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-11-06NFC: fix device-allocation error returnJohan Hovold
A recent change fixing NFC device allocation itself introduced an error-handling bug by returning an error pointer in case device-id allocation failed. This is clearly broken as the callers still expected NULL to be returned on errors as detected by Dan's static checker. Fix this up by returning NULL in the event that we've run out of memory when allocating a new device id. Note that the offending commit is marked for stable (3.8) so this fix needs to be backported along with it. Fixes: 20777bc57c34 ("NFC: fix broken device allocation") Cc: stable <stable@vger.kernel.org> # 3.8 Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2017-11-05tcp: fix DSACK-based undo on non-duplicate ACKPriyaranjan Jha
Fixes DSACK-based undo when sender is in Open State and an ACK advances snd_una. Example scenario: - Sender goes into recovery and makes some spurious rtx. - It comes out of recovery and enters into open state. - It sends some more packets, let's say 4. - The receiver sends an ACK for the first two, but this ACK is lost. - The sender receives ack for first two, and DSACK for previous spurious rtx. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yousuk Seung <ysseung@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: higher throughput under reordering with adaptive RACK reordering wndPriyaranjan Jha
Currently TCP RACK loss detection does not work well if packets are being reordered beyond its static reordering window (min_rtt/4).Under such reordering it may falsely trigger loss recoveries and reduce TCP throughput significantly. This patch improves that by increasing and reducing the reordering window based on DSACK, which is now supported in major TCP implementations. It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries. - If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded by srtt), since there is possibility that spurious retransmission was due to reordering delay longer than reo_wnd. - Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16) no. of successful recoveries (accounts for full DSACK-based loss recovery undo). After that, reset it to default (min_rtt/4). - At max, reo_wnd is incremented only once per rtt. So that the new DSACK on which we are reacting, is due to the spurious retx (approx) after the reo_wnd has been updated last time. - reo_wnd is tracked in terms of steps (of min_rtt/4), rather than absolute value to account for change in rtt. In our internal testing, we observed significant increase in throughput, in scenarios where reordering exceeds min_rtt/4 (previous static value). Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: resolve tagging protocol at parse timeVivien Didelot
Extend the dsa_port_parse_cpu() function to resolve the tagging protocol at port parsing time, instead of waiting for the whole tree to be complete. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: add one port parsing function per typeVivien Didelot
Add dsa_port_parse_user, dsa_port_parse_dsa and dsa_port_parse_cpu functions to factorize the code shared by both OF and pdata parsing. They don't do much for the moment but will be extended later to support tagging protocol resolution for example. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: only check presence of link propertyVivien Didelot
When parsing a port, simply use of_property_read_bool which checks the presence of a given property, instead of parsing the link phandle. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: rework switch parsingVivien Didelot
When parsing a switch, we have to identify to which tree it belongs and parse its ports. Provide two functions to separate the OF and platform data specific paths. Also use the of_property_read_variable_u32_array function to parse the OF member array instead of calling of_property_read_u32_index twice. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: get tree before parsing portsVivien Didelot
We will need a reference to the dsa_switch_tree when parsing a CPU port, so fetch it right after parsing the member and before parsing ports. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: rework switch addition and removalVivien Didelot
This patch removes the unnecessary index argument from the dsa_dst_add_ds and dsa_dst_del_ds functions and renames them to dsa_tree_add_switch and dsa_tree_remove_switch respectively. In addition to a more explicit scope, we now check the presence of an existing switch with the same index directly within dsa_tree_add_switch. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: provide a find or new tree helperVivien Didelot
Rename dsa_get_dst to dsa_tree_find since it doesn't increment the reference counter, rename dsa_add_dst to dsa_tree_alloc for symmetry with dsa_tree_free, and provide a convenient dsa_tree_touch function to find or allocate a new tree. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: get and put tree reference countingVivien Didelot
Provide convenient dsa_tree_get and dsa_tree_put functions scoping a DSA tree used to increment and decrement its reference counter, instead of poking directly its kref structure. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: simplify tree reference countingVivien Didelot
DSA trees have a refcount used to automatically free the dsa_switch_tree structure once there is no switch devices inside of it. The refcount is incremented when a switch is added to the tree, and decremented when it is removed from it. But because of kref_init, the refcount is also incremented at initialization, and when looking up the tree from the list for symmetry. Thus the current code stores the number of switches plus one, and makes the switch registration more complex. To simplify the switch registration function, we reset the refcount to zero after initialization and don't increment it when looking up a tree. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: make tree index unsignedVivien Didelot
Similarly to a DSA switch and port, rename the tree index from "tree" to "index" and make it an unsigned int because it isn't supposed to be less than 0. u32 is an OF specific data used to retrieve the value and has no need to be propagated up to the tree index. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: remove old offload/analyzerJakub Kicinski
Thanks to the ability to load a program for a specific device, running verifier twice is no longer needed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05cls_bpf: allow attaching programs loaded for specific deviceJakub Kicinski
If TC program is loaded with skip_sw flag, we should allow the device-specific programs to be accepted. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05xdp: allow attaching programs loaded for specific deviceJakub Kicinski
Pass the netdev pointer to bpf_prog_get_type(). This way BPF code can decide whether the device matches what the code was loaded/translated for. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: bpf: rename ndo_xdp to ndo_bpfJakub Kicinski
ndo_xdp is a control path callback for setting up XDP in the driver. We can reuse it for other forms of communication between the eBPF stack and the drivers. Rename the callback and associated structures and definitions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05l2tp: don't use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6Guillaume Nault
Using l2tp_tunnel_find() in l2tp_ip_recv() is wrong for two reasons: * It doesn't take a reference on the returned tunnel, which makes the call racy wrt. concurrent tunnel deletion. * The lookup is only based on the tunnel identifier, so it can return a tunnel that doesn't match the packet's addresses or protocol. For example, a packet sent to an L2TPv3 over IPv6 tunnel can be delivered to an L2TPv2 over UDPv4 tunnel. This is worse than a simple cross-talk: when delivering the packet to an L2TP over UDP tunnel, the corresponding socket is UDP, where ->sk_backlog_rcv() is NULL. Calling sk_receive_skb() will then crash the kernel by trying to execute this callback. And l2tp_tunnel_find() isn't even needed here. __l2tp_ip_bind_lookup() properly checks the socket binding and connection settings. It was used as a fallback mechanism for finding tunnels that didn't have their data path registered yet. But it's not limited to this case and can be used to replace l2tp_tunnel_find() in the general case. Fix l2tp_ip6 in the same way. Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: do not clear again skb->csum in tcp_init_nondata_skb()Eric Dumazet
tcp_init_nondata_skb() is fed with freshly allocated skbs. They already have a cleared csum field, no need to clear it again. This is based on Neal review on commit 3b11775033dc ("tcp: do not mangle skb->cb[] in tcp_make_synack()"), noticing I did not clear skb->csum. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: tcp_mtu_probing() cleanupEric Dumazet
Reduce one indentation level to make code more readable. tcp_sync_mss() can be factorized. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05rtnetlink: use netnsid to query interfaceJiri Benc
Currently, when an application gets netnsid from the kernel (for example as the result of RTM_GETLINK call on one end of the veth pair), it's not much useful. There's no reliable way to get to the netns fd from the netnsid, nor does any kernel API accept netnsid. Extend the RTM_GETLINK call to also accept netnsid. It will operate on the netns with the given netnsid in such case. Of course, the calling process needs to have enough capabilities in the target name space; for now, require CAP_NET_ADMIN. This can be relaxed in the future. To signal to the calling process that the kernel understood the new IFLA_IF_NETNSID attribute in the query, it will include it in the response. This is needed to detect older kernels, as they will just ignore IFLA_IF_NETNSID and query in the current name space. This patch implemetns IFLA_IF_NETNSID only for get and dump. For set operations, this can be extended later. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05openvswitch: reliable interface indentification in port dumpsJiri Benc
This patch allows reliable identification of netdevice interfaces connected to openvswitch bridges. In particular, user space queries the netdev interfaces belonging to the ports for statistics, up/down state, etc. Datapath dump needs to provide enough information for the user space to be able to do that. Currently, only interface names are returned. This is not sufficient, as openvswitch allows its ports to be in different name spaces and the interface name is valid only in its name space. What is needed and generally used in other netlink APIs, is the pair ifindex+netnsid. The solution is addition of the ifindex+netnsid pair (or only ifindex if in the same name space) to vport get/dump operation. On request side, ideally the ifindex+netnsid pair could be used to get/set/del the corresponding vport. This is not implemented by this patch and can be added later if needed. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: export peernet2id_allocJiri Benc
It will be used by openvswitch. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05ipv6: remove IN6_ADDR_HSIZE from addrconf.hEric Dumazet
IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid confusion. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05pktgen: do not abuse IN6_ADDR_HSIZEEric Dumazet
pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an IPv6 address. Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old bug is hitting us. Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04net: sched: cls_u32: use bitwise & rather than logical && on n->flagsColin Ian King
Currently n->flags is being operated on by a logical && operator rather than a bitwise & operator. This looks incorrect as these should be bit flag operations. Fix this. Detected by CoverityScan, CID#1460398 ("Logical vs. bitwise operator") Fixes: 245dc5121a9b ("net: sched: cls_u32: call block callbacks for offload") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04netfilter/ipvs: clear ipvs_property flag when SKB net namespace changedYe Yin
When run ipvs in two different network namespace at the same host, and one ipvs transport network traffic to the other network namespace ipvs. 'ipvs_property' flag will make the second ipvs take no effect. So we should clear 'ipvs_property' when SKB network namespace changed. Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()") Signed-off-by: Ye Yin <hustcat@gmail.com> Signed-off-by: Wei Zhou <chouryzhou@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04tcp_nv: use do_div() instead of expensive div64_u64()Konstantin Khlebnikov
Average RTT is 32-bit thus full 64-bit division is redundant. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04add support of IFF_XMIT_DST_RELEASE bit in vlanVadim Fedorenko
Some time ago Eric Dumazet suggested a "hack the IFF_XMIT_DST_RELEASE flag on the vlan netdev". But the last comment was "does not support properly bonding/team.(If the real_dev->privflags IFF_XMIT_DST_RELEASE bit changes, we want to update all the vlans at the same time )" I've extended that patch to support changes of IFF_XMIT_DST_RELEASE in bonding/team. Both bonding and team call netdev_change_features() after recalculation of features including priv_flags IFF_XMIT_DST_RELEASE bit. So the only thing needed to support is to recheck this bit in vlan_transfer_features(). Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vadim Fedorenko <vfedorenko@yandex-team.ru> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Files removed in 'net-next' had their license header updated in 'net'. We take the remove from 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Hopefully this is the last batch of networking fixes for 4.14 Fingers crossed... 1) Fix stmmac to use the proper sized OF property read, from Bhadram Varka. 2) Fix use after free in net scheduler tc action code, from Cong Wang. 3) Fix SKB control block mangling in tcp_make_synack(). 4) Use proper locking in fib_dump_info(), from Florian Westphal. 5) Fix IPG encodings in systemport driver, from Florian Fainelli. 6) Fix division by zero in NV TCP congestion control module, from Konstantin Khlebnikov. 7) Fix use after free in nf_reject_ipv4, from Tejaswi Tanikella" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: systemport: Correct IPG length settings tcp: do not mangle skb->cb[] in tcp_make_synack() fib: fib_dump_info can no longer use __in_dev_get_rtnl stmmac: use of_property_read_u32 instead of read_u8 net_sched: hold netns refcnt for each action net_sched: acquire RTNL in tc_action_net_exit() net: vrf: correct FRA_L3MDEV encode type tcp_nv: fix division by zero in tcpnv_acked() netfilter: nf_reject_ipv4: Fix use-after-free in send_reset netfilter: nft_set_hash: disable fast_ops for 2-len keys
2017-11-03net: use -ENOSPC for transient busy indicationGilad Ben-Yossef
Replace -EBUSY with -ENOSPC when handling transient busy indication in the absence of backlog. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-11-03net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpathJiri Pirko
In sch_handle_egress and sch_handle_ingress tp->q is used only in order to update stats. So stats and filter list are the only things that are needed in clsact qdisc fastpath processing. Introduce new mini_Qdisc struct to hold those items. Also, introduce a helper to swap the mini_Qdisc structures in case filter list head changes. This removes need for tp->q usage without added overhead. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: sched: introduce chain_head_change callbackJiri Pirko
Add a callback that is to be called whenever head of the chain changes. Also provide a callback for the default case when the caller gets a block using non-extended getter. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net_sched: check NULL in tcf_block_put()Cong Wang
Callers of tcf_block_put() could pass NULL so we can't use block->q before checking if block is NULL or not. tcf_block_put_ext() callers are fine, it is always non-NULL. Fixes: 8c4083b30e56 ("net: sched: add block bind/unbind notif. and extended block_get/put") Reported-by: Dave Taht <dave.taht@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03xfrm: Fix stack-out-of-bounds read in xfrm_state_find.Steffen Klassert
When we do tunnel or beet mode, we pass saddr and daddr from the template to xfrm_state_find(), this is ok. On transport mode, we pass the addresses from the flowi, assuming that the IP addresses (and address family) don't change during transformation. This assumption is wrong in the IPv4 mapped IPv6 case, packet is IPv4 and template is IPv6. Fix this by using the addresses from the template unconditionally. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-11-03xfrm: do unconditional template resolution before pcpu cache checkFlorian Westphal
Stephen Smalley says: Since 4.14-rc1, the selinux-testsuite has been encountering sporadic failures during testing of labeled IPSEC. git bisect pointed to commit ec30d ("xfrm: add xdst pcpu cache"). The xdst pcpu cache is only checking that the policies are the same, but does not validate that the policy, state, and flow match with respect to security context labeling. As a result, the wrong SA could be used and the receiver could end up performing permission checking and providing SO_PEERSEC or SCM_SECURITY values for the wrong security context. This fix makes it so that we always do the template resolution, and then checks that the found states match those in the pcpu bundle. This has the disadvantage of doing a bit more work (lookup in state hash table) if we can reuse the xdst entry (we only avoid xdst alloc/free) but we don't add a lot of extra work in case we can't reuse. xfrm_pol_dead() check is removed, reasoning is that xfrm_tmpl_resolve does all needed checks. Cc: Paul Moore <paul@paul-moore.com> Fixes: ec30d78c14a813db39a647b6a348b428 ("xfrm: add xdst pcpu cache") Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-11-03tcp: tcp_fragment() should not assume rtx skbsEric Dumazet
While stress testing MTU probing, we had crashes in list_del() that we root-caused to the fact that tcp_fragment() is unconditionally inserting the freshly allocated skb into tsorted_sent_queue list. But this list is supposed to contain skbs that were sent. This was mostly harmless until MTU probing was enabled. Fortunately we can use the tcp_queue enum added later (but in same linux version) for rtx-rb-tree to fix the bug. Fixes: e2080072ed2d ("tcp: new list for sent but unacked skbs for RACK recovery") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Priyaranjan Jha <priyarjha@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: sched: cls_bpf: use bitwise & rather than logical && on gen_flagsColin Ian King
Currently gen_flags is being operated on by a logical && operator rather than a bitwise & operator. This looks incorrect as these should be bit flag operations. Fix this. Detected by CoverityScan, CID#1460305 ("Logical vs. bitwise operator") Fixes: 3f7889c4c79b ("net: sched: cls_bpf: call block callbacks for offload) Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03tcp: fix a lockdep issue in tcp_fastopen_reset_cipher()Eric Dumazet
icsk_accept_queue.fastopenq.lock is only fully initialized at listen() time. LOCKDEP is not happy if we attempt a spin_lock_bh() on it, because of missing annotation. (Although kernel runs just fine) Lets use net->ipv4.tcp_fastopen_ctx_lock to protect ctx access. Fixes: 1fba70e5b6be ("tcp: socket option to set TCP fast open key") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Christoph Paasch <cpaasch@apple.com> Reviewed-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>