linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2016-07-11	xprtrdma: Remove rpcrdma_map_one() and friends	Chuck Lever
	Clean up: ALLPHYSICAL is gone and FMR has been converted to use scatterlists. There are no more users of these functions. This patch shrinks the size of struct rpcrdma_req by about 3500 bytes on x86_64. There is one of these structs for each RPC credit (128 credits per transport connection). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Remove ALLPHYSICAL memory registration mode	Chuck Lever
	No HCA or RNIC in the kernel tree requires the use of ALLPHYSICAL. ALLPHYSICAL advertises in the clear on the network fabric an R_key that is good for all of the client's memory. No known exploit exists, but theoretically any user on the server can use that R_key on the client's QP to read or update any part of the client's memory. ALLPHYSICAL exposes the client to server bugs, including: o base/bounds errors causing data outside the i/o buffer to be accessed o RDMA access after reply causing data corruption and/or integrity fail ALLPHYSICAL can't protect application memory regions from server update after a local signal or soft timeout has terminated an RPC. ALLPHYSICAL chunks are no larger than a page. Special cases to handle small chunks and long chunk lists have been a source of implementation complexity and bugs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Do not leak an MW during a DMA map failure	Chuck Lever
	Based on code audit. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Refactor MR recovery work queues	Chuck Lever
	I found that commit ead3f26e359e ("xprtrdma: Add ro_unmap_safe memreg method"), which introduces ro_unmap_safe, never wired up the FMR recovery worker. The FMR and FRWR recovery work queues both do the same thing. Instead of setting up separate individual work queues for this, schedule a delayed worker to deal with them, since recovering MRs is not performance-critical. Fixes: ead3f26e359e ("xprtrdma: Add ro_unmap_safe memreg method") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Use scatterlist for DMA mapping and unmapping under FMR	Chuck Lever
	The use of a scatterlist for handling DMA mapping and unmapping was recently introduced in frwr_ops.c in commit 4143f34e01e9 ("xprtrdma: Port to new memory registration API"). That commit did not make a similar update to xprtrdma's FMR support because the core ib_map_phys_fmr() and ib_unmap_fmr() APIs have not been changed to take a scatterlist argument. However, FMR still needs to do DMA mapping and unmapping. It appears that RDS, for example, uses a scatterlist for this, then builds the DMA addr array for the ib_map_phys_fmr call separately. I see that SRP also utilizes a scatterlist for DMA mapping. xprtrdma can do something similar. This modernization is used immediately to properly defer DMA unmapping during fmr_unmap_safe (a FIXME). It separates the DMA unmapping coordinates from the rl_segments array. This array, being part of an rpcrdma_req, is always re-used immediately when an RPC exits. A scatterlist is allocated in memory independent of the rl_segments array, so it can be preserved indefinitely (ie, until the MR invalidation and DMA unmapping can actually be done by a worker thread). The FRWR and FMR DMA mapping code are slightly different from each other now, and will diverge further when the "Check for holes" logic can be removed from FRWR (support for SG_GAP MRs). So I chose not to create helpers for the common-looking code. Fixes: ead3f26e359e ("xprtrdma: Add ro_unmap_safe memreg method") Suggested-by: Sagi Grimberg <sagi@lightbits.io> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Rename fields in rpcrdma_fmr	Chuck Lever
	Clean up: Use the same naming convention used in other RPC/RDMA-related data structures. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Move init and release helpers	Chuck Lever
	Clean up: Moving these helpers in a separate patch makes later patches more readable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Create common scatterlist fields in rpcrdma_mw	Chuck Lever
	Clean up: FMR is about to replace the rpcrdma_map_one code with scatterlists. Move the scatterlist fields out of the FRWR-specific union and into the generic part of rpcrdma_mw. One minor change: -EIO is now returned if FRWR registration fails. The RPC is terminated immediately, since the problem is likely due to a software bug, thus retrying likely won't help. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	xprtrdma: Remove FMRs from the unmap list after unmapping	Chuck Lever
	ib_unmap_fmr() takes a list of FMRs to unmap. However, it does not remove the FMRs from this list as it processes them. Other ib_unmap_fmr() call sites are careful to remove FMRs from the list after ib_unmap_fmr() returns. Since commit 7c7a5390dc6c8 ("xprtrdma: Add ro_unmap_sync method for FMR") fmr_op_unmap_sync passes more than one FMR to ib_unmap_fmr(), but it didn't bother to remove the FMRs from that list once the call was complete. I've noticed some instability that could be related to list tangling by the new fmr_op_unmap_sync() logic. In an abundance of caution, add some defensive logic to clean up properly after ib_unmap_fmr(). Fixes: 7c7a5390dc6c8 ("xprtrdma: Add ro_unmap_sync method for FMR") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-07-11	udp: prevent bugcheck if filter truncates packet too much	Michal Kubeček
	If socket filter truncates an udp packet below the length of UDP header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if kernel is configured that way) can be easily enforced by an unprivileged user which was reported as CVE-2016-6162. For a reproducer, see http://seclists.org/oss-sec/2016/q3/8 Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Reported-by: Marco Grassi <marco.gra@gmail.com> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11	Merge tag 'batadv-net-for-davem-20160708' of git://git.open-mesh.org/linux-merge	David S. Miller
	Simon Wunderlich says: ==================== Here are a couple batman-adv bugfix patches, all by Sven Eckelmann: - Fix possible NULL pointer dereference for vlan_insert_tag (two patches) - Fix reference handling in some features, which may lead to reference leaks or invalid memory access (four patches) - Fix speedy join: DHCP packets handled by the gateway feature should be sent with 4-address unicast instead of 3-address unicast to make speedy join work. This fixes/speeds up DHCP assignment for clients which join a mesh for the first time. (one patch) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11	netfilter: nf_conntrack_h323: fix off-by-one in DecodeQ931	Toby DiPasquale
	This patch corrects an off-by-one error in the DecodeQ931 function in the nf_conntrack_h323 module. This error could result in reading off the end of a Q.931 frame. Signed-off-by: Toby DiPasquale <toby@cbcg.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	Merge tag 'ipvs-for-v4.8' of ↵	Pablo Neira Ayuso
	https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== IPVS Updates for v4.8 please consider these enhancements to the IPVS. This alters the behaviour of the "least connection" schedulers such that pre-established connections are included in the active connection count. This avoids overloading servers when a large number of new connections arrive in a short space of time - e.g. when clients reconnect after a node or network failure. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: nf_tables: get rid of possible_net_t from set and basechain	Pablo Neira Ayuso
	We can pass the netns pointer as parameter to the functions that need to gain access to it. From basechains, I didn't find any client for this field anymore so let's remove this too. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: nft_ct: make byte/packet expr more friendly	Liping Zhang
	If we want to use ct packets expr, and add a rule like follows: # nft add rule filter input ct packets gt 1 counter We will find that no packets will hit it, because nf_conntrack_acct is disabled by default. So It will not work until we enable it manually via "echo 1 > /proc/sys/net/netfilter/nf_conntrack_acct". This is not friendly, so like xt_connbytes do, if the user want to use ct byte/packet expr, enable nf_conntrack_acct automatically. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: physdev: physdev-is-out should not work with OUTPUT chain	Hangbin Liu
	physdev_mt() will check skb->nf_bridge first, which was alloced in br_nf_pre_routing. So if we want to use --physdev-out and physdev-is-out, we need to match it in FORWARD or POSTROUTING chain. physdev_mt_check() only checked physdev-out and missed physdev-is-out. Fix it and update the debug message to make it clearer. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Marcelo R Leitner <marcelo.leitner@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: nat: convert nat bysrc hash to rhashtable	Florian Westphal
	It did use a fixed-size bucket list plus single lock to protect add/del. Unlike the main conntrack table we only need to add and remove keys. Convert it to rhashtable to get table autosizing and per-bucket locking. The maximum number of entries is -- as before -- tied to the number of conntracks so we do not need another upperlimit. The change does not handle rhashtable_remove_fast error, only possible "error" is -ENOENT, and that is something that can happen legitimetely, e.g. because nat module was inserted at a later time and no src manip took place yet. Tested with http-client-benchmark + httpterm with DNAT and SNAT rules in place. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	Merge tag 'ipvs-fixes2-for-v4.7' of ↵	Pablo Neira Ayuso
	https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== Second Round of IPVS Fixes for v4.7 The fix from Quentin Armitage allows the backup sync daemon to be bound to a link-local mcast IPv6 address as is already the case for IPv4. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: move nat hlist_head to nf_conn	Florian Westphal
	The nat extension structure is 32bytes in size on x86_64: struct nf_conn_nat { struct hlist_node bysource; /* 0 16 / struct nf_conn ct; /* 16 8 / union nf_conntrack_nat_help help; / 24 4 / int masq_index; / 28 4 / / size: 32, cachelines: 1, members: 4 / / last cacheline: 32 bytes */ }; The hlist is needed to quickly check for possible tuple collisions when installing a new nat binding. Storing this in the extension area has two drawbacks: 1. We need ct backpointer to get the conntrack struct from the extension. 2. When reallocation of extension area occurs we need to fixup the bysource hash head via hlist_replace_rcu. We can avoid both by placing the hlist_head in nf_conn and place nf_conn in the bysource hash rather than the extenstion. We can also remove the ->move support; no other extension needs it. Moving the entire nat extension into nf_conn would be possible as well but then we have to add yet another callback for deletion from the bysource hash table rather than just using nat extension ->destroy hook for this. nf_conn size doesn't increase due to aligment, followup patch replaces hlist_node with single pointer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: conntrack: simplify early_drop	Florian Westphal
	We don't need to acquire the bucket lock during early drop, we can use lockless traveral just like ____nf_conntrack_find. The timer deletion serves as synchronization point, if another cpu attempts to evict same entry, only one will succeed with timer deletion. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: nf_ct_helper: unlink helper again when hash resize happen	Liping Zhang
	From: Liping Zhang <liping.zhang@spreadtrum.com> Similar to ctnl_untimeout, when hash resize happened, we should try to do unhelp from the 0# bucket again. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: cttimeout: unlink timeout obj again when hash resize happen	Liping Zhang
	Imagine such situation, nf_conntrack_htable_size now is 4096, we are doing ctnl_untimeout, and iterate on 3000# bucket. Meanwhile, another user try to reduce hash size to 2048, then all nf_conn are removed to the new hashtable. When this hash resize operation finished, we still try to itreate ct begin from 3000# bucket, find nothing to do and just return. We may miss unlinking some timeout objects. And later we will end up with invalid references to timeout object that are already gone. So when we find that hash resize happened, try to unlink timeout objects from the 0# bucket again. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	netfilter: conntrack: fix race between nf_conntrack proc read and hash resize	Liping Zhang
	When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack hash table via /sys/module/nf_conntrack/parameters/hashsize, race will happen, because reader can observe a newly allocated hash but the old size (or vice versa). So oops will happen like follows： BUG: unable to handle kernel NULL pointer dereference at 0000000000000017 IP: [<ffffffffa0418e21>] seq_print_acct+0x11/0x50 [nf_conntrack] Call Trace: [<ffffffffa0412f4e>] ? ct_seq_show+0x14e/0x340 [nf_conntrack] [<ffffffff81261a1c>] seq_read+0x2cc/0x390 [<ffffffff812a8d62>] proc_reg_read+0x42/0x70 [<ffffffff8123bee7>] __vfs_read+0x37/0x130 [<ffffffff81347980>] ? security_file_permission+0xa0/0xc0 [<ffffffff8123cf75>] vfs_read+0x95/0x140 [<ffffffff8123e475>] SyS_read+0x55/0xc0 [<ffffffff817c2572>] entry_SYSCALL_64_fastpath+0x1a/0xa4 It is very easy to reproduce this kernel crash. 1. open one shell and input the following cmds: while : ; do echo $RANDOM > /sys/module/nf_conntrack/parameters/hashsize done 2. open more shells and input the following cmds: while : ; do cat /proc/net/nf_conntrack done 3. just wait a monent, oops will happen soon. The solution in this patch is based on Florian's Commit 5e3c61f98175 ("netfilter: conntrack: fix lookup race during hash resize"). And add a wrapper function nf_conntrack_get_ht to get hash and hsize suggested by Florian Westphal. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-11	NFC: digital: Fix RTOX supervisor PDU handling	Thierry Escande
	When the target needs more time to process the received PDU, it sends Response Timeout Extension (RTOX) PDU. When the initiator receives a RTOX PDU, it must reply with a RTOX PDU and extends the current rwt value with the formula: rwt_int = rwt * rtox This patch takes care of the rtox value passed by the target in the RTOX PDU and extends the timeout for the next response accordingly. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-11	NFC: digital: Add support for NFC DEP Response Waiting Time	Thierry Escande
	When sending an ATR_REQ, the initiator must wait for the ATR_RES at least 'RWT(nfcdep,activation) + dRWT(nfcdep)' and no more than 'RWT(nfcdep,activation) + dRWT(nfcdep) + dT(nfcdep,initiator)'. This gives a timeout value between 1237 ms and 1337 ms. This patch defines DIGITAL_ATR_RES_RWT to 1337 used for the timeout value of ATR_REQ command. For other DEP PDUs, the initiator must wait between 'RWT + dRWT(nfcdep)' and 'RWT + dRWT(nfcdep) + dT(nfcdep,initiator)' where RWT is given by the following formula: '(256 * 16 / f(c)) * 2^wt' where wt is the value of the TO field in the ATR_RES response and is in the range between 0 and 14. This patch declares a mapping table for wt values and gives RWT max values between 100 ms and 5049 ms. This patch also defines DIGITAL_ATR_RES_TO_WT, the maximum wt value in target mode, to 8. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-11	NFC: digital: Free supervisor PDUs	Thierry Escande
	This patch frees the RTOX resp sk_buff in initiator mode. It also makes use of the free_resp exit point for ATN supervisor PDUs in both initiator and target mode. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-11	NFC: digital: Rework ACK PDU handling in initiator mode	Thierry Escande
	With this patch, ACK PDU sk_buffs are now freed and code has been refactored for better errors handling. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-11	NFC: digital: Fix ACK & NACK PDUs handling in target mode	Thierry Escande
	When the target receives a NACK PDU, it re-sends the last sent PDU. ACK PDUs are received by the target as a reply from the initiator to chained I-PDUs. There are 3 cases to handle: - If the target has previously received 1 or more ATN PDUs and the PNI in the ACK PDU is equal to the target PNI - 1, then it means that the initiator did not received the last issued PDU from the target. In this case it re-sends this PDU. - If the target has received 1 or more ATN PDUs but the ACK PNI is not the target PNI - 1, then this means that this ACK is the reply of the previous chained I-PDU sent by the target. The target did not received it on the first attempt and it is being re-sent by the initiator. The process continues as usual. - No ATN PDU received before this ACK PDU. This is the reply of a chained I-PDU. The target keeps on processing its chained I-PDU. The code has been refactored to avoid too many indentation levels. Also, ACK and NACK PDUs were not freed. This is now fixed. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-11	NFC: digital: Fix target DEP_REQ I-PDU handling after ATN PDU	Thierry Escande
	When the initiator sends a DEP_REQ I-PDU, the target device may not reply in a timely manner. In this case the initiator device must send an attention PDU (ATN) and if the recipient replies with an ATN PDU in return, then the last I-PDU must be sent again by the initiator. This patch fixes how the target handles I-PDU received after an ATN PDU has been received. There are 2 possible cases: - The target has received the initial DEP_REQ and sends back the DEP_RES but the initiator did not receive it. In this case, after the initiator has sent an ATN PDU and the target replied it (with an ATN as well), the initiator sends the saved skb of the initial DEP_REQ again and the target replies with the saved skb of the initial DEP_RES. - Or the target did not even received the initial DEP_REQ. In this case, after the ATN PDUs exchange, the initiator sends the saved skb and the target simply passes it up, just as usual. This behavior is controlled using the atn_count and the PNI field of the digital device structure. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-11	NFC: digital: Remove useless call to skb_reserve()	Thierry Escande
	When allocating chained I-PDUs, there is no need to call skb_reserve() since it's already done by digital_alloc_skb() and contains enough room for the driver head and tail data. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-11	NFC: digital: Fix handling of saved PDU sk_buff pointers	Thierry Escande
	This patch fixes the way an I-PDU is saved in case it needs to be sent again. It is now copied using pskb_copy() and not simply referenced using skb_get() since it could be modified by the driver. digital_in_send_saved_skb() and digital_tg_send_saved_skb() still get a reference on the saved skb which is re-sent but release it if the send operation fails. That way the caller doesn't have to take care about skb ref in case of error. RTOX supervisor PDU must not be saved as this can override a previously saved I-PDU that should be re-sent later on. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-09	dccp: avoid deadlock in dccp_v4_ctl_send_reset	Eric Dumazet
	In the prep work I did before enabling BH while handling socket backlog, I missed two points in DCCP : 1) dccp_v4_ctl_send_reset() uses bh_lock_sock(), assuming BH were blocked. It is not anymore always true. 2) dccp_v4_route_skb() was using __IP_INC_STATS() instead of IP_INC_STATS() A similar fix was done for TCP, in commit 47dcc20a39d0 ("ipv4: tcp: ip_send_unicast_reply() is not BH safe") Fixes: 7309f8821fd6 ("dccp: do not assume DCCP code is non preemptible") Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	ipv6: do not abuse GFP_ATOMIC in inet6_netconf_notify_devconf()	Eric Dumazet
	All inet6_netconf_notify_devconf() callers are in process context, so we can use GFP_KERNEL allocations if we take care of not holding a rwlock while not needed in ip6mr (we hold RTNL there) Fixes: d67b8c616b48 ("netconf: advertise mc_forwarding status") Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding status") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf()	Eric Dumazet
	inet_forward_change() runs with RTNL held. We are allowed to sleep if required. If we use __in_dev_get_rtnl() instead of __in_dev_get_rcu(), we no longer have to use GFP_ATOMIC allocations in inet_netconf_notify_devconf(), meaning we are less likely to miss notifications under memory pressure, and wont touch precious memory reserves either and risk dropping incoming packets. inet_netconf_get_devconf() can also use GFP_KERNEL allocation. Fixes: edc9e748934c ("rtnl/ipv4: use netconf msg to advertise forwarding status") Fixes: 9e5511106f99 ("rtnl/ipv4: add support of RTM_GETNETCONF") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	net: tracepoint napi:napi_poll add work and budget	Jesper Dangaard Brouer
	An important information for the napi_poll tracepoint is knowing the work done (packets processed) by the napi_poll() call. Add both the work done and budget, as they are related. Handle trace_napi_poll() param change in dropwatch/drop_monitor and in python perf script netdev-times.py in backward compat way, as python fortunately supports optional parameter handling. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	mpls: allow routes on ipip and sit devices	Simon Horman
	Allow MPLS routes on IPIP and SIT devices now that they support forwarding MPLS packets. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	ipip: support MPLS over IPv4	Simon Horman
	Extend the IPIP driver to support MPLS over IPv4. The implementation is an extension of existing support for IPv4 over IPv4 and is based of multiple inner-protocol support for the SIT driver. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	sit: support MPLS over IPv4	Simon Horman
	Extend the SIT driver to support MPLS over IPv4. This implementation extends existing support for IPv6 over IPv4 and IPv4 over IPv4. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	tunnels: support MPLS over IPv4 tunnels	Simon Horman
	Extend tunnel support to MPLS over IPv4. The implementation extends the existing differentiation between IPIP and IPv6 over IPv4 to also cover MPLS over IPv4. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	net: bridge: extend MLD/IGMP query stats	Nikolay Aleksandrov
	As was suggested this patch adds support for the different versions of MLD and IGMP query types. Since the user visible structure is still in net-next we can augment it instead of adding netlink attributes. The distinction between the different IGMP/MLD query types is done as suggested in Section 7.1, RFC 3376 [1] and Section 8.1, RFC 3810 [2] based on query payload size and code for IGMP. Since all IGMP packets go through multicast_rcv() and it uses ip_mc_check_igmp/ipv6_mc_check_mld we can be sure that at least the ip/ipv6 header can be directly used. [1] https://tools.ietf.org/html/rfc3376#section-7 [2] https://tools.ietf.org/html/rfc3810#section-8.1 Suggested-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-09	Bluetooth: Rename HCI_BREDR into HCI_PRIMARY	Marcel Holtmann
	The HCI_BREDR naming is confusing since it actually stands for Primary Bluetooth Controller. Which is a term that has been used in the latest standard. However from a legacy point of view there only really have been Basic Rate (BR) and Enhanced Data Rate (EDR). Recent versions of Bluetooth introduced Low Energy (LE) and made this terminology a little bit confused since Dual Mode Controllers include BR/EDR and LE. To simplify this the name HCI_PRIMARY stands for the Primary Controller which can be a single mode or dual mode controller. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-09	Bluetooth: Remove controller device attributes	Marcel Holtmann
	The controller device attributes are not used and expose no valuable information. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-09	Bluetooth: Remove connection link attributes	Marcel Holtmann
	The connection link attributes are not used and expose no valuable information. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2016-07-09	sctp: fix panic when sending auth chunks	Marcelo Ricardo Leitner
	When we introduced GSO support, if using auth the auth chunk was being left queued on the packet even after the final segment was generated. Later on sctp_transmit_packet it calls sctp_packet_reset, which zeroed the packet len while not accounting for this left-over. This caused more space to be used the next packet due to the chunk still being queued, but space which wasn't allocated as its size wasn't accounted. The fix is to only queue it back when we know that we are going to generate another segment. Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08	net: dsa: initialize the routing table	Vivien Didelot
	The routing table of every switch in a tree is currently initialized to all zeros. This is an issue since 0 is a valid port number. Add a DSA_RTABLE_NONE=-1 constant to initialize the signed values of the routing table pointing to other switches. This fixes the device mapping of the mv88e6xxx driver where the port pointing to the switch itself and to non-existent switches was wrongly configured to be 0. It is now set to the expected 0xf value. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08	Merge tag 'mac80211-for-davem-2016-07-06' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Two more fixes: * handle allocation failures in new(ish) A-MSDU decapsulation * don't leak memory on nl80211 ACL parse errors ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08	Merge tag 'rxrpc-rewrite-20160706' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Improve conn/call lookup and fix call number generation [ver #3] I've fixed a couple of patch descriptions and excised the patch that duplicated the connections list for reconsideration at a later date. For reference, the excised patch is sitting on the rxrpc-experimental branch of my git tree, based on top of the rxrpc-rewrite branch. Diffing it against yesterday's tag shows no differences. Would you prefer the patch set to be emailed afresh instead of a git-pull request? David --- Here's the next part of the AF_RXRPC rewrite. The two main purposes of this set are to fix the call number handling and to make use of RCU when looking up the connection or call to pass a received packet to. Important changes in this set include: (1) Avoidance of placing stack data into SG lists in rxkad so that kernel stacks can become vmalloc'd (Herbert Xu). (2) Calls cease pinning the connection they used as soon as possible, which allows the connection to be discarded sooner and allows the call channel on that connection to be reused earlier. (3) Make each call channel on a connection have a separate and independent call number space rather than having a shared number space for the connection. Call numbers should increment monotonically per channel on the client, and the server should ignore a call with a lower call number for that channel than the latest it has seen. The RESPONSE packet sets the minimum values of each call ID counter on a connection. (4) Look up calls by indexing the channel array on a connection rather than by keeping calls in an rbtree on that connection. Also look up calls using the channel array rather than using a hashtable. The call hashtable can then be removed. (5) Call terminal statuses are cached in the channel array for the last call. It is assumed that if we the server have seen call N, then the client no longer cares about call N-1 on the same channel. This will allow retransmission of the terminal status in future without the need to keep the rxrpc_call struct around. (6) Peer lookups are moved out of common connection handling code and into service connection handling code as client connections (a) must point to a peer before they can be used and (b) are looked up by a machine-unique connection ID directly, so we only need to look up the peer first if we're going to deal with a service call. (7) The reference count on a connection is held elevated by 1 whilst it is alive (ie. idle unused connections have a refcount of 1). The reaper will attempt to change the refcount from 1->0 and skip if this cannot be done, whilst look ups only increment the refcount if it's non-zero. This makes the implementation of RCU lookups easier as we don't have to get a ref on the connection or a lock on the connection list to prevent a connection being reaped whilst we're contemplating queueing a packet that initiates a new service call upon it. If we need to get a connection, but there's a dead connection in the tree, we use rb_replace_node() to replace the dead one with a new one. (8) Use a seqlock to validate the walk over the service connection rbtree attached to a peer when it's being walked in RCU mode. (9) Make the incoming call/connection packet handling code use RCU mode and locks and make it only take a reference if the call/connection gets queued on a workqueue. The intention is that the next set will introduce the connection lifetime management and capacity limits to prevent clients from overloading the server. There are some fixes too: (1) Verifying that a packet coming in to a client connection came from the expected source. (2) Fix handling of connection failure in client call creation where we don't reinitialise the list linkage block and a second attempt to unlink the failed connection oopses and also we don't set the state correctly, which causes an assertion failure. (3) New service calls were being added to the socket's accept queue under the wrong lock. Changes: (V2) In rxrpc_find_service_conn_rcu() initialised the sequence number to 0. Fixed the RCU handling in conn_service.c by introducing and using rb_replace_node_rcu() as an RCU-safe alternative in rxrpc_publish_service_conn(). Modified and used rcu_dereference_raw() to avoid RCU sparse warnings in rxrpc_find_service_conn_rcu(). Added in some missing RCU dereference wrappers. It seems to be necessary to turn on CONFIG_PROVE_RCU_REPEATEDLY as well as CONFIG_SPARSE_RCU_POINTER to get the static __rcu annotation checking to happen. Fixed some other sparse warnings, including a missing ntohs() in jumbo packet processing. (V3) Fixed some commit descriptions. Excised the patch that duplicated the connection list to separate out the procfs list for reconsideration at a later date. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08	hfsc: reduce hfsc_sched to 14 cachelines	Florian Westphal
	hfsc_sched is huge (size: 920, cachelines: 15), but we can get it to 14 cachelines by placing level after filter_cnt (covering 4 byte hole) and reducing period/nactive/flags to u32 (period is just a counter, incremented when class becomes active -- 2**32 is plenty for this purpose, also, long is only 32bit wide on 32bit platforms anyway). cl_vtperiod is exported to userspace via tc_hfsc_stats, but its period member is already u32, so no precision is lost there either. Cc: Michal Soltys <soltys@ziu.info> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-08	netfilter: nft_ct: fix expiration getter	Florian Westphal
	We need to compute timeout.expires - jiffies, not the other way around. Add a helper, another patch can then later change more places in conntrack code where we currently open-code this. Will allow us to only change one place later when we remove per-ct timer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-08	6lowpan: ndisc: set invalid unicast short addr to unspec	Alexander Aring
	When receiving neighbour information with short address option field we should check the complete range of invalid short addresses and set it to one invalid address setting which is the unspecified address. This address is also used when by creating at first a new neighbour entry to indicate no short address is set. Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>