summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-07-20tipc: clean up definitions and usage of link flagsJon Paul Maloy
The status flag LINK_STOPPED is not needed any more, since the mechanism for delayed deletion of links has been removed. Likewise, LINK_STARTED and LINK_START_EVT are unnecessary, because we can just as well start the link timer directly from inside tipc_link_create(). We eliminate these flags in this commit. Instead of the above flags, we now introduce three new link modes, TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages the link is allowed to receive in this state. TIPC_LINK_BLOCKED also blocks timer-driven protocol messages to be sent out, and any change to the link FSM. Since the modes are mutually exclusive, we convert them to state values, and rename the 'flags' field in struct tipc_link to 'exec_mode'. Finally, we move the #defines for link FSM states and events from link.h into enums inside the file link.c, which is the real usage scope of these definitions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20tipc: make media xmit call outside node spinlock contextJon Paul Maloy
Currently, message sending is performed through a deep call chain, where the node spinlock is grabbed and held during a significant part of the transmission time. This is clearly detrimental to overall throughput performance; it would be better if we could send the message after the spinlock has been released. In this commit, we do instead let the call revert on the stack after the buffer chain has been added to the transmission queue, whereafter clones of the buffers are transmitted to the device layer outside the spinlock scope. As a further step in our effort to separate the roles of the node and link entities we also move the function tipc_link_xmit() to node.c, and rename it to tipc_node_xmit(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20tipc: change sk_buffer handling in tipc_link_xmit()Jon Paul Maloy
When the function tipc_link_xmit() is given a buffer list for transmission, it currently consumes the list both when transmission is successful and when it fails, except for the special case when it encounters link congestion. This behavior is inconsistent, and needs to be corrected if we want to avoid problems in later commits in this series. In this commit, we change this to let the function consume the list only when transmission is successful, and leave the list with the sender in all other cases. We also modifiy the socket code so that it adapts to this change, i.e., purges the list when a non-congestion error code is returned. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20tipc: use bearer index when looking up active linksJon Paul Maloy
struct tipc_node currently holds two arrays of link pointers; one, indexed by bearer identity, which contains all links irrespective of current state, and one two-slot array for the currently active link or links. The latter array contains direct pointers into the elements of the former. This has the effect that we cannot know the bearer id of a link when accessing it via the "active_links[]" array without actually dereferencing the pointer, something we want to avoid in some cases. In this commit, we do instead store the bearer identity in the "active_links" array, and use this as an index to find the right element in the overall link entry array. This change should be seen as a preparation for the later commits in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20tipc: move link input queue to tipc_nodeJon Paul Maloy
At present, the link input queue and the name distributor receive queues are fields aggregated in struct tipc_link. This is a hazard, because a link might be deleted while a receiving socket still keeps reference to one of the queues. This commit fixes this bug. However, rather than adding yet another reference counter to the critical data path, we move the two queues to safe ground inside struct tipc_node, which is already protected, and let the link code only handle references to the queues. This is also in line with planned later changes in this area. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20tipc: move link creation from neighbor discoverer to nodeJon Paul Maloy
As a step towards turning links into node internal entities, we move the creation of links from the neighbor discovery logics to the node's link control logics. We also create an additional entry for the link's media address in the newly introduced struct tipc_link_entry, since this is where it is needed in the upcoming commits. The current copy in struct tipc_link is kept for now, but will be removed later. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20tipc: introduce link entry structure to struct tipc_nodeJon Paul Maloy
struct 'tipc_node' currently contains two arrays for link attributes, one for the link pointers, and one for the usable link MTUs. We now group those into a new struct 'tipc_link_entry', and intoduce one single array consisting of such enties. Apart from being a cosmetic improvement, this is a starting point for the strict master-slave relation between node and link that we will introduce in the following commits. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20switchdev: add offload_fwd_mark generator helperScott Feldman
skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be unique for device and may even be unique for a sub-set of ports within device, so add switchdev helper function to generate unique marks based on port's switch ID and group_ifindex. group_ifindex would typically be the container dev's ifindex, such as the bridge's ifindex. The generator uses a global hash table to store offload_fwd_marks hashed by {switch ID, group_ifindex} key. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20net: add phys ID compare helper to test if two IDs are the sameScott Feldman
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20net: don't reforward packets already forwarded by offload deviceScott Feldman
Just before queuing skb for xmit on port, check if skb has been marked by switchdev port driver as already fordwarded by device. If so, drop skb. A non-zero skb->offload_fwd_mark field is set by the switchdev port driver/device on ingress to indicate the skb has already been forwarded by the device to egress ports with matching dev->skb_mark. The switchdev port driver would assign a non-zero dev->offload_skb_mark for each device port netdev during registration, for example. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20Revert "sit: Add gro callbacks to sit_offload"Herbert Xu
This patch reverts 19424e052fb44da2f00d1a868cbb51f3e9f4bbb5 ("sit: Add gro callbacks to sit_offload") because it generates packets that cannot be handled even by our own GSO. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bridge: mcast: fix br_multicast_dev_del warn when igmp snooping is not definedNikolay Aleksandrov
Fix: net/bridge/br_if.c: In function 'br_dev_delete': >> net/bridge/br_if.c:284:2: error: implicit declaration of function >> 'br_multicast_dev_del' [-Werror=implicit-function-declaration] br_multicast_dev_del(br); ^ cc1: some warnings being treated as errors when igmp snooping is not defined. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20net/ipv6: update flowi6_oif in ip6_dst_lookup_flow if not setPhil Sutter
Newly created flows don't have flowi6_oif set (at least if the associated socket is not interface-bound). This leads to a mismatch in __xfrm6_selector_match() for policies which specify an interface in the selector (sel->ifindex != 0). Backtracing shows this happens in code-paths originating from e.g. ip6_datagram_connect(), rawv6_sendmsg() or tcp_v6_connect(). (UDP was not tested for.) In summary, this patch fixes policy matching on outgoing interface for locally generated packets. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bridge: multicast: fix handling of temp and perm entriesSatish Ashok
When the bridge (or port) is brought down/up flush only temp entries and leave the perm ones. Flush perm entries only when deleting the bridge device or the associated port. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20bridge: multicast: notify on group deleteNikolay Aleksandrov
Group notifications were not sent when a group expired or was deleted due to bridge/port device being deleted. So add br_mdb_notify() to br_multicast_del_pg(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20ebpf: add helper to retrieve net_cls's classid cookieDaniel Borkmann
It would be very useful to retrieve the net_cls's classid from an eBPF program to allow for a more fine-grained classification, it could be directly used or in conjunction with additional policies. I.e. docker, but also tooling such as cgexec, can easily run applications via net_cls cgroups: cgcreate -g net_cls:/foo echo 42 > foo/net_cls.classid cgexec -g net_cls:foo <prog> Thus, their respecitve classid cookie of foo can then be looked up on the egress path to apply further policies. The helper is desigend such that a non-zero value returns the cgroup id. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20cls_cgroup: factor out classid retrievalDaniel Borkmann
Split out retrieving the cgroups net_cls classid retrieval into its own function, so that it can be reused later on from other parts of the traffic control subsystem. If there's no skb->sk, then the small helper returns 0 as well, which in cls_cgroup terms means 'could not classify'. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-17cfg80211: use RTNL locked reg_can_beacon for IR-relaxationArik Nemtsov
The RTNL is required to check for IR-relaxation conditions that allow more channels to beacon. Export an RTNL locked version of reg_can_beacon and use it where possible in AP/STA interface type flows, where IR-relaxation may be applicable. Fixes: 06f207fc5418 ("cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-17mac80211: add missing length check for confirm framesBob Copeland
Although mesh_rx_plink_frame() already checks that frames have enough bytes for the action code plus another two bytes for capability/reason code, it doesn't take into account that confirm frames also have an additional two-byte aid. As a result, a corrupt frame could cause a subsequent subtraction to wrap around to ill effect. Add another check for this case. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-17mac80211: correct aid location in peering framesBob Copeland
According to 802.11-2012 8.5.16.3.2 AID comes directly after the capability bytes in mesh peering confirm frames. The existing code, however, was adding a 2 byte offset to this location, resulting in garbage data going out over the air. Remove the offset to fix it. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-17wireless: regulatory: reduce log level of CRDA related messagesThomas Petazzoni
With a basic Linux userspace, the messages "Calling CRDA to update world regulatory domain" appears 10 times after boot every second or so, followed by a final "Exceeded CRDA call max attempts. Not calling CRDA". For those of us not having the corresponding userspace parts, having those messages repeatedly displayed at boot time is a bit annoying, so this commit reduces their log level to pr_debug(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-17mac80211: shut down interfaces before destroying interface listJohannes Berg
If the hardware is unregistered while interfaces are up, mac80211 will unregister all interfaces, which in turns causes mac80211 to be called again to remove them all from the driver and eventually shut down the hardware. During this shutdown, however, it's currently already unsafe to iterate the list of interfaces atomically, as the list is manipulated in an unsafe manner. This puts an undue burden on the driver - it must stop all its activities before calling ieee80211_unregister_hw(), while in the normal stop path it can do all cleanup in the stop method. If, for example, it's using the iteration during RX for some reason, it would have to stop RX before unregistering to avoid crashes. Fix this problem by closing all interfaces before unregistering them. This will cause the driver stop to have completed before we manipulate the interface list, and after the driver is stopped *and* has called ieee80211_unregister_hw() it really musn't be iterating any more as the memory will be freed as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-17mac80211: wowlan: enable powersave if suspend while ps-pollingChaitanya T K
If for any reason we're in the middle of PS-polling or awake after TX due to dynamic powersave while going to suspend, go back to save power. This might cause a response frame to get lost, but since we can't really wait for it while going to suspend that's still better than not enabling powersave which would cause higher power usage during (and possibly even after) suspend. Note that this really only affects the very few drivers that use the powersave implementation in mac80211. Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com> [rewrite misleading commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-17mac80211: don't clear all tx flags when requeingMichal Kazior
When acting as AP and a PS-Poll frame is received associated station is marked as one in a Service Period. This state is kept until Tx status for released frame is reported. While a station is in Service Period PS-Poll frames are ignored. However if PS-Poll was received during A-MPDU teardown it was possible to have the to-be released frame re-queued back to pending queue. In such case the frame was stripped of 2 important flags: (a) IEEE80211_TX_CTL_NO_PS_BUFFER (b) IEEE80211_TX_STATUS_EOSP Stripping of (a) led to the frame that was to be released to be queued back to ps_tx_buf queue. If station remained to use only PS-Poll frames the re-queued frame (and new ones) was never actually transmitted because mac80211 would ignore subsequent PS-Poll frames due to station being in Service Period. There was nothing left to clear the Service Period bit (no xmit -> no tx status -> no SP end), i.e. the AP would have the station stuck in Service Period. Beacon TIM would repeatedly prompt station to poll for frames but it would get none. Once (a) is not stripped (b) becomes important because it's the main condition to clear the Service Period bit of the station when Tx status for the released frame is reported back. This problem was observed with ath9k acting as P2P GO in some testing scenarios but isn't limited to it. AP operation with mac80211 based Tx A-MPDU control combined with clients using PS-Poll frames is subject to this race. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-17mac80211: clear subdir_stations when removing debugfsTom Hughes
If we don't do this, and we then fail to recreate the debugfs directory during a mode change, then we will fail later trying to add stations to this now bogus directory: BUG: unable to handle kernel NULL pointer dereference at 0000006c IP: [<c0a92202>] mutex_lock+0x12/0x30 Call Trace: [<c0678ab4>] start_creating+0x44/0xc0 [<c0679203>] debugfs_create_dir+0x13/0xf0 [<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211] Cc: stable@kernel.org Signed-off-by: Tom Hughes <tom@compton.nu> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-07-16ipv6: Remove unused arguments for __ipv6_dev_get_saddr().YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15netlink: changes for setting and clearing protodown via netlink.Anuradha Karuppiah
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15net core: Add protodown support.Anuradha Karuppiah
This patch introduces the proto_down flag that can be used by user space applications to notify switch drivers that errors have been detected on the device. The switch driver can react to protodown notification by doing a phys down on the associated switch port. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15tc: act_bpf: fix memory leakAlexei Starovoitov
prog->bpf_ops is populated when act_bpf is used with classic BPF and prog->bpf_name is optionally used with extended BPF. Fix memory leak when act_bpf is released. Fixes: d23b8ad8ab23 ("tc: add BPF based action") Fixes: a8cb5f556b56 ("act_bpf: add initial eBPF support for actions") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15fq_codel: fix return value of fq_codel_drop()WANG Cong
The ->drop() is supposed to return the number of bytes it dropped, however fq_codel_drop() returns the index of the flow where it drops a packet from. Fix this by introducing a helper to wrap fq_codel_drop(). Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15net_sched: fix a use-after-free in sfqWANG Cong
Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15ipv6: Fix finding best source address in ipv6_dev_get_saddr().YOSHIFUJI Hideaki/吉藤英明
Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") did not properly update best source address available. Plus, it introduced possible NULL pointer dereference. Bug was reported by Erik Kline <ek@google.com>. Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>. Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15ipv6: lock socket in ip6_datagram_connect()Eric Dumazet
ip6_datagram_connect() is doing a lot of socket changes without socket being locked. This looks wrong, at least for udp_lib_rehash() which could corrupt lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15pkt_sched: sch_qfq: remove unused member of struct qfq_schedAndrea Parri
The member (u32) "num_active_agg" of struct qfq_sched has been unused since its introduction in 462dbc9101acd38e92eda93c0726857517a24bbd "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT) there is no active plan to use it; this removes the member. Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15fq_codel: fix a use-after-freeWANG Cong
Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15tcp: don't use F-RTO on non-recurring timeoutsYuchung Cheng
Currently F-RTO may repeatedly send new data packets on non-recurring timeouts in CA_Loss mode. This is a bug because F-RTO (RFC5682) should only be used on either new recovery or recurring timeouts. This exacerbates the recovery progress during frequent timeout & repair, because we prioritize sending new data packets instead of repairing the holes when the bandwidth is already scarce. Fix it by correcting the test of a new recovery episode. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15bridge: mdb: fix double add notificationNikolay Aleksandrov
Since the mdb add/del code was introduced there have been 2 br_mdb_notify calls when doing br_mdb_add() resulting in 2 notifications on each add. Example: Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent Before patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent [MDB]dev br0 port eth1 grp 239.0.0.1 permanent After patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: cfd567543590 ("bridge: add support of adding and deleting mdb entries") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leaveSatish Ashok
A report with INCLUDE/Change_to_include and empty source list should be treated as a leave, specified by RFC 3376, section 3.1: "If the requested filter mode is INCLUDE *and* the requested source list is empty, then the entry corresponding to the requested interface and multicast address is deleted if present. If no such entry is present, the request is ignored." Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Mainly fix-ups for the various 4.2 items" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits) IB/core: Destroy ocrdma_dev_id IDR on module exit IB/core: Destroy multcast_idr on module exit IB/mlx4: Optimize do_slave_init IB/mlx4: Fix memory leak in do_slave_init IB/mlx4: Optimize freeing of items on error unwind IB/mlx4: Fix use of flow-counters for process_mad IB/ipath: Convert use of __constant_<foo> to <foo> IB/ipoib: Set MTU to max allowed by mode when mode changes IB/ipoib: Scatter-Gather support in connected mode IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush IB/ucma: Fix lockdep warning in ucma_lock_files rds: rds_ib_device.refcount overflow RDMA/nes: Fix for incorrect recording of the MAC address RDMA/nes: Fix for resolving the neigh RDMA/core: Fixes for port mapper client registration IB/IPoIB: Fix bad error flow in ipoib_add_port() IB/mlx4: Do not attemp to report HCA clock offset on VFs IB/cm: Do not queue work to a device that's going away IB/srp: Avoid using uninitialized variable ...
2015-07-15net: Fix skb csum races when peekingHerbert Xu
When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15NET: AX.25: Stop heartbeat timer on disconnect.Richard Stearn
This may result in a kernel panic. The bug has always existed but somehow we've run out of luck now and it bites. Signed-off-by: Richard Stearn <richard@rns-stearn.demon.co.uk> Cc: stable@vger.kernel.org # all branches Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15net: Clone skb before setting peeked flagHerbert Xu
Shared skbs must not be modified and this is crucial for broadcast and/or multicast paths where we use it as an optimisation to avoid unnecessary cloning. The function skb_recv_datagram breaks this rule by setting peeked without cloning the skb first. This causes funky races which leads to double-free. This patch fixes this by cloning the skb and replacing the skb in the list when setting skb->peeked. Fixes: a59322be07c9 ("[UDP]: Only increment counter on first peek/recv") Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15rtnetlink: reject non-IFLA_VF_PORT attributes inside IFLA_VF_PORTSDaniel Borkmann
Similarly as in commit 4f7d2cdfdde7 ("rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver"), we have a double nesting of netlink attributes, i.e. IFLA_VF_PORTS only contains IFLA_VF_PORT that is nested itself. While IFLA_VF_PORTS is a verified attribute from ifla_policy[], we only check if the IFLA_VF_PORTS container has IFLA_VF_PORT attributes and then pass the attribute's content itself via nla_parse_nested(). It would be more correct to reject inner types other than IFLA_VF_PORT instead of continuing parsing and also similarly as in commit 4f7d2cdfdde7, to check for a minimum of NLA_HDRLEN. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Scott Feldman <sfeldma@gmail.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-14rds: rds_ib_device.refcount overflowWengang Wang
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device") There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr failed(mr pool running out). this lead to the refcount overflow. A complain in line 117(see following) is seen. From vmcore: s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448. That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely to return ERR_PTR(-EAGAIN). 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev) 116 { 117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0); 118 if (atomic_dec_and_test(&rds_ibdev->refcount)) 119 queue_work(rds_wq, &rds_ibdev->free_work); 120 } fix is to drop refcount when rds_ib_alloc_fmr failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-07-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bridge/br_mdb.c Minor conflict in br_mdb.c, in 'net' we added a memset of the on-stack 'ip' variable whereas in 'net-next' we assign a new member 'vid'. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-13bridge: mdb: add vlan support for user entriesNikolay Aleksandrov
Until now all user mdb entries were added in vlan 0, this patch adds support to allow the user to specify the vlan for the entry. About the uapi change a hole in struct br_mdb_entry is used so the size and offsets are kept the same (verified with pahole and tested with older iproute2). Example: $ bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000 dev br0 port eth1 grp 239.0.0.1 permanent vlan 200 dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-13net: Build IPv6 into kernel by defaultTom Herbert
This patch makes the default to build IPv6 into the kernel. IPv6 now has significant traction and any remaining vestiges of IPv6 not being provided parity with IPv4 should be swept away. IPv6 is now core to the Internet and kernel. Points on IPv6 adoption: - Per Google statistics, IPv6 usage has reached 7% on the Internet and continues to exhibit an exponential growth rate https://www.google.com/intl/en/ipv6/statistics.html - Just a few days ago ARIN officially depleted its IPv4 pool - IPv6 only data centers are being successfully built (e.g. at Facebook) This patch changes the IPv6 Kconfig for IPV6. Default for CONFIG_IPV6 is set to "y" and the text has been updated to reflect the maturity of IPv6. Impact: Under some circumstances building modules in to kernel might have a performance advantage. In my testing, I did notice a very slight improvement. This will obviously increase the size of the kernel image. In my configuration I see: IPv6 as module: text data bss dec hex filename 9703666 1899288 933888 12536842 bf4c0a vmlinux IPv6 built into kernel text data bss dec hex filename 9436490 1879600 913408 12229498 ba9b7a vmlinux Which increases text size by ~270K (2.8% increase in size for me). If image size is an issue, presumably for a device which does not do IP networking (IMO we should be discouraging IPv4-only devices), IPV6 can be disabled or still built as a module. Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Missing list head init in bluetooth hidp session creation, from Tedd Ho-Jeong An. 2) Don't leak SKB in bridge netfilter error paths, from Florian Westphal. 3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien Grall. 4) Fix regression in IP over hamradio bpq encapsulation, from Ralf Baechle. 5) Fix race between rhashtable resize events and table walks, from Phil Sutter. 6) Missing validation of IFLA_VF_INFO netlink attributes, fix from Daniel Borkmann. 7) Missing security layer socket state initialization in tipc code, from Stephen Smalley. 8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from Denys Vlasenko. 9) Missing minor_idr destroy on module unload on macvtap driver, from Johannes Thumshirn. 10) Various pktgen kernel thread races, from Oleg Nesterov. 11) Fix races that can cause packets to be processed in the backlog even after a device attached to that SKB has been fully unregistered. From Julian Anastasov. 12) bcmgenet driver doesn't account packet drops vs. errors properly, fix from Petri Gynther. 13) Array index validation and off by one fix in DSA layer from Florian Fainelli * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits) can: replace timestamp as unique skb attribute ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux can: c_can: Fix default pinmux glitch at init can: rcar_can: unify error messages can: rcar_can: print request_irq() error code can: rcar_can: fix typo in error message can: rcar_can: print signed IRQ # can: rcar_can: fix IRQ check net: dsa: Fix off-by-one in switch address parsing net: dsa: Test array index before use net: switchdev: don't abort unsupported operations net: bcmgenet: fix accounting of packet drops vs errors cdc_ncm: update specs URL Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets bridge: mdb: allow the user to delete mdb entry if there's a querier net: call rcu_read_lock early in process_backlog net: do not process device backlog during unregistration bridge: fix potential crash in __netdev_pick_tx() net: axienet: Fix devm_ioremap_resource return value check ...
2015-07-12Merge tag 'linux-can-fixes-for-4.2-20150712' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2015-07-12 this is a pull request of 8 patchs for net/master. Sergei Shtylyov contributes 5 patches for the rcar_can driver, fixing the IRQ check and several info and error messages. There are two patches by J.D. Schroeder and Roger Quadros for the c_can driver and dra7x-evm device tree, which precent a glitch in the DCAN1 pinmux. Oliver Hartkopp provides a better approach to make the CAN skbs unique, the timestamp is replaced by a counter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-12can: replace timestamp as unique skb attributeOliver Hartkopp
Commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for overlapping CAN filters" requires the skb->tstamp to be set to check for identical CAN skbs. Without timestamping to be required by user space applications this timestamp was not generated which lead to commit 36c01245eb8 "can: fix loss of CAN frames in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs by introducing several __net_timestamp() calls. This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb() to add __net_timestamp() after skbuff creation to prevent the frame loss fixed in mainline Linux. This patch removes the timestamp dependency and uses an atomic counter to create an unique identifier together with the skbuff pointer. Btw: the new skbcnt element introduced in struct can_skb_priv has to be initialized with zero in out-of-tree drivers which are not using alloc_can{,fd}_skb() too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>