summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-10-16i40e: Removed unused definesGreg Rose
Two defines that are not used are causing customer confusion - remove them. Change-ID: Icef0325aca8e0f4fcdfc519e026bdd375e791200 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16i40e: remove read/write failed messages from nvmupdateShannon Nelson
Allow the nvmupdate application to decide when a read or write error should be exposed to the user. Since the application needs to use write probes to find the ReadOnly sections on a potentially unknown NVM version in the HW and read probes to check the status of the last write, some error messages are expected, but need not be shown to the users. The driver doesn't know which are ignorable from real errors, so needs to let the application make the decision. Change-ID: I78fca8ab672bede11c10c820b83c26adfd536d03 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16i40e/i40evf: Fix compile issue related to const stringJingjing Wu
Add const to functions that return strings that aren't going to be modified. This addresses some reported compile complaints. Change-ID: Ic56b1e814ab4d23a50480e7fdec652445f776ee8 Signed-off-by: Jingjing Wu <jingjing.wu@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16i40e: generate fewer startup messagesShannon Nelson
Cut down on the number of startup log entries by putting a couple behind debug flags and combining a couple others into a single line. Change-ID: I708089f086308f84d43f8b6f0e8a634a02d058fb Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16drivers/net/intel: use napi_complete_done()Jesse Brandeburg
As per Eric Dumazet's previous patches: (see commit (24d2e4a50737) - tg3: use napi_complete_done()) Quoting verbatim: Using napi_complete_done() instead of napi_complete() allows us to use /sys/class/net/ethX/gro_flush_timeout GRO layer can aggregate more packets if the flush is delayed a bit, without having to set too big coalescing parameters that impact latencies. </end quote> Tested configuration: low latency via ethtool -C ethx adaptive-rx off rx-usecs 10 adaptive-tx off tx-usecs 15 workload: streaming rx using netperf TCP_MAERTS igb: MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo ... Interim result: 941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589 Alignment Offset Bytes Bytes Recvs Bytes Sends Local Remote Local Remote Xfered Per Per Recv Send Recv Send Recv (avg) Send (avg) 8 8 0 0 1176930056 1475.36 797726 16384.00 71905 MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo ... Interim result: 941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763 Alignment Offset Bytes Bytes Recvs Bytes Sends Local Remote Local Remote Xfered Per Per Recv Send Recv Send Recv (avg) Send (avg) 8 8 0 0 1175182320 50476.00 23282 16384.00 71816 i40e: Hard to test because the traffic is incoming so fast (24Gb/s) that GRO always receives 87kB, even at the highest interrupt rate. Other drivers were only compile tested. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16i40evf: Add support for netpollAlexander Duyck
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16i40e/i40evf: Drop useless "IN_NETPOLL" flagAlexander Duyck
The code in i40e and i40evf is using an "IN_NETPOLL" flag that has never added any value due to the fact that the Rx clean-up is handled in NAPI. As such the flag was set, the queue was scheduled via NAPI, and then polled from the netpoll controller and if any Rx packets were processed the were processed in the wrong context. In addition the flag itself just added an unneeded conditional to the hot-path so it can safely be dropped and save us a few instructions. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16i40e/i40evf: Fix handling of napi budgetAlexander Duyck
The polling routine for i40e was rounding up the budget for Rx cleanup to 1. This is incorrect as the netpoll poll call is expecting no Rx to be processed as the budget passed was 0. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-10-16genirq/msi: Do not use pci_msi_[un]mask_irq as default methodsMarc Zyngier
When we create a generic MSI domain, that MSI_FLAG_USE_DEF_CHIP_OPS is set, and that any of .mask or .unmask are NULL in the irq_chip structure, we set them to pci_msi_[un]mask_irq. This is a bad idea for at least two reasons: - PCI_MSI might not be selected, kernel fails to build (yes, this is legitimate, at least on arm64!) - This may not be a PCI/MSI domain at all (platform MSI, for example) Either way, this looks wrong. Move the overriding of mask/unmask to the PCI counterpart, and panic is any of these two methods is not set in the core code (they really should be present). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/1444760085-27857-1-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-16net: Fix suspicious RCU usage in fib_rebalanceDavid Ahern
This command: ip route add 192.168.1.0/24 nexthop via 10.2.1.5 dev eth1 nexthop via 10.2.2.5 dev eth2 generated this suspicious RCU usage message: [ 63.249262] [ 63.249939] =============================== [ 63.251571] [ INFO: suspicious RCU usage. ] [ 63.253250] 4.3.0-rc3+ #298 Not tainted [ 63.254724] ------------------------------- [ 63.256401] ../include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage! [ 63.259450] [ 63.259450] other info that might help us debug this: [ 63.259450] [ 63.262297] [ 63.262297] rcu_scheduler_active = 1, debug_locks = 1 [ 63.264647] 1 lock held by ip/2870: [ 63.265896] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff813ebfb7>] rtnl_lock+0x12/0x14 [ 63.268858] [ 63.268858] stack backtrace: [ 63.270409] CPU: 4 PID: 2870 Comm: ip Not tainted 4.3.0-rc3+ #298 [ 63.272478] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 63.275745] 0000000000000001 ffff8800b8c9f8b8 ffffffff8125f73c ffff88013afcf301 [ 63.278185] ffff8800bab7a380 ffff8800b8c9f8e8 ffffffff8107bf30 ffff8800bb728000 [ 63.280634] ffff880139fe9a60 0000000000000000 ffff880139fe9a00 ffff8800b8c9f908 [ 63.283177] Call Trace: [ 63.283959] [<ffffffff8125f73c>] dump_stack+0x4c/0x68 [ 63.285593] [<ffffffff8107bf30>] lockdep_rcu_suspicious+0xfa/0x103 [ 63.287500] [<ffffffff8144d752>] __in_dev_get_rcu+0x48/0x4f [ 63.289169] [<ffffffff8144d797>] fib_rebalance+0x3e/0x127 [ 63.290753] [<ffffffff8144d986>] ? rcu_read_unlock+0x3e/0x5f [ 63.292442] [<ffffffff8144ea45>] fib_create_info+0xaf9/0xdcc [ 63.294093] [<ffffffff8106c12f>] ? sched_clock_local+0x12/0x75 [ 63.295791] [<ffffffff8145236a>] fib_table_insert+0x8c/0x451 [ 63.297493] [<ffffffff8144bf9c>] ? fib_get_table+0x36/0x43 [ 63.299109] [<ffffffff8144c3ca>] inet_rtm_newroute+0x43/0x51 [ 63.300709] [<ffffffff813ef684>] rtnetlink_rcv_msg+0x182/0x195 [ 63.302334] [<ffffffff8107d04c>] ? trace_hardirqs_on+0xd/0xf [ 63.303888] [<ffffffff813ebfb7>] ? rtnl_lock+0x12/0x14 [ 63.305346] [<ffffffff813ef502>] ? __rtnl_unlock+0x12/0x12 [ 63.306878] [<ffffffff81407c4c>] netlink_rcv_skb+0x3d/0x90 [ 63.308437] [<ffffffff813ec00e>] rtnetlink_rcv+0x21/0x28 [ 63.309916] [<ffffffff81407742>] netlink_unicast+0xfa/0x17f [ 63.311447] [<ffffffff81407a5e>] netlink_sendmsg+0x297/0x2dc [ 63.313029] [<ffffffff813c6cd4>] sock_sendmsg_nosec+0x12/0x1d [ 63.314597] [<ffffffff813c835b>] ___sys_sendmsg+0x196/0x21b [ 63.316125] [<ffffffff8100bf9f>] ? native_sched_clock+0x1f/0x3c [ 63.317671] [<ffffffff8106c12f>] ? sched_clock_local+0x12/0x75 [ 63.319185] [<ffffffff8106c397>] ? sched_clock_cpu+0x9d/0xb6 [ 63.320693] [<ffffffff8107e2d7>] ? __lock_is_held+0x32/0x54 [ 63.322145] [<ffffffff81159fcb>] ? __fget_light+0x4b/0x77 [ 63.323541] [<ffffffff813c8726>] __sys_sendmsg+0x3d/0x5b [ 63.324947] [<ffffffff813c8751>] SyS_sendmsg+0xd/0x19 [ 63.326274] [<ffffffff814c8f57>] entry_SYSCALL_64_fastpath+0x12/0x6f It looks like all of the code paths to fib_rebalance are under rtnl. Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath") Cc: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16via-rhine: fix VLAN receive handling regression.Andrej Ota
Because eth_type_trans() consumes ethernet header worth of bytes, a call to read TCI from end of packet using rhine_rx_vlan_tag() no longer works as it's reading from an invalid offset. Tested to be working on PCEngines Alix board. Fixes: 810f19bcb862 ("via-rhine: add consistent memory barrier in vlan receive code.") Signed-off-by: Andrej Ota <andrej@ota.si> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_putTom Herbert
Currently, is only called from __prog_put_rcu in the bpf_prog_release path. Need this to call this from bpf_prog_put also to get correct accounting. Fixes: aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs") Signed-off-by: Tom Herbert <tom@herbertland.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16Merge branch 'robust_listener'David S. Miller
Eric Dumazet says: ==================== tcp/dccp: make our listener code more robust This patch series addresses request sockets leaks and listener dismantle phase. This survives a stress test with listeners being added/removed quite randomly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16tcp/dccp: fix race at listener dismantle phaseEric Dumazet
Under stress, a close() on a listener can trigger the WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop() We need to test if listener is still active before queueing a child in inet_csk_reqsk_queue_add() Create a common inet_child_forget() helper, and use it from inet_csk_reqsk_queue_add() and inet_csk_listen_stop() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helperEric Dumazet
Let's reduce the confusion about inet_csk_reqsk_queue_drop() : In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity. Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16Revert "inet: fix double request socket freeing"Eric Dumazet
This reverts commit c69736696cf3742b37d850289dc0d7ead177bb14. At the time of above commit, tcp_req_err() and dccp_req_err() were dead code, as SYN_RECV request sockets were not yet in ehash table. Real bug was fixed later in a different commit. We need to revert to not leak a refcount on request socket. inet_csk_reqsk_queue_drop_and_put() will be added in following commit to make clean inet_csk_reqsk_queue_drop() does not release the reference owned by caller. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16Merge branch 'ipv6-blackhole-route-fix'David S. Miller
Martin KaFai Lau says: ==================== ipv6: Initialize rt6_info properly in ip6_blackhole_route() This patchset ensures the rt6_info's fields are initialized properly in ip6_blackhole_route() where xfrm_policy is the primarily user. The first patch is a prep work. The second patch is the fix. It fixes d52d3997f843 ("ipv6: Create percpu rt6_info"). Here is the oops reported by Phil Sutter <phil@nwl.cc>: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 IP: [<ffffffff8171a95e>] __ip6_datagram_connect+0x71e/0xa20 PGD c2cb1067 PUD c2d7a067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: cmac nfs lockd grace sunrpc bridge stp llc nvidia(PO) snd_usb_audio snd_usbmidi_lib iTCO_wdt CPU: 1 PID: 2964 Comm: ping6 Tainted: P O 4.2.1-aufs #10 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./4Core1333-Viiv, BIOS P1.60 07/01/2008 task: ffff8800ca62bc00 ti: ffff880129a14000 task.ti: ffff880129a14000 RIP: 0010:[<ffffffff8171a95e>] [<ffffffff8171a95e>] __ip6_datagram_connect+0x71e/0xa20 RSP: 0018:ffff880129a17da8 EFLAGS: 00010296 RAX: 000000000000000b RBX: 0000000000000000 RCX: 0000000000000006 RDX: 0000000000000007 RSI: 0000000000000246 RDI: ffff88012fc8d5a0 RBP: ffff8800cb9a9048 R08: 756e207369207472 R09: 216c6c756e207369 R10: 0000000000000665 R11: 0000000000000006 R12: ffff8800cb9a8cf8 R13: ffff8800cb9a8cf8 R14: 0000000000000000 R15: ffff8800cb9a8cc0 FS: 00007fb76ad74700(0000) GS:ffff88012fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a0 CR3: 00000000c2dba000 CR4: 00000000000406e0 Stack: ffff8800cb9a9048 ffff8800cb9a8de0 ffff8800cb9feb70 ffffffff816b2c41 00007fb70000000b ffffea0000df7200 ffff8800cb9f5cfc ffff8800cb9a8cc0 03fffffffe068a20 ffff8800cb9a8cc0 ffffffff817097c0 0000000100000000 Call Trace: [<ffffffff816b2c41>] ? udp_lib_get_port+0x1a1/0x380 [<ffffffff817097c0>] ? udpv6_rcv+0x20/0x20 [<ffffffff8171ac82>] ? ip6_datagram_connect+0x22/0x40 [<ffffffff8163ae9b>] ? SyS_connect+0x6b/0xb0 [<ffffffff810767ac>] ? __do_page_fault+0x15c/0x380 [<ffffffff8163a8d3>] ? SyS_socket+0x63/0xa0 [<ffffffff81741957>] ? entry_SYSCALL_64_fastpath+0x12/0x6a Code: ba ae 00 00 00 48 c7 c6 7b 71 94 81 48 c7 c7 63 71 94 81 e8 6c 0f 02 00 48 85 db 75 0e 48 c7 c7 9f 71 94 81 31 c0 e8 59 0f 02 00 <48> 83 bb a0 00 00 00 00 75 0e 48 c7 c7 ae 71 94 81 31 c0 e8 41 RIP [<ffffffff8171a95e>] __ip6_datagram_connect+0x71e/0xa20 RSP <ffff880129a17da8> CR2: 00000000000000a0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16ipv6: Initialize rt6_info properly in ip6_blackhole_route()Martin KaFai Lau
ip6_blackhole_route() does not initialize the newly allocated rt6_info properly. This patch: 1. Call rt6_info_init() to initialize rt6i_siblings and rt6i_uncached 2. The current rt->dst._metrics init code is incorrect: - 'rt->dst._metrics = ort->dst._metris' is not always safe - Not sure what dst_copy_metrics() is trying to do here considering ip6_rt_blackhole_cow_metrics() always returns NULL Fix: - Always do dst_copy_metrics() - Replace ip6_rt_blackhole_cow_metrics() with dst_cow_metrics_generic() 3. Mask out the RTF_PCPU bit from the newly allocated blackhole route. This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie(). It is because RTF_PCPU is set while rt->dst.from is NULL. Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reported-by: Phil Sutter <phil@nwl.cc> Tested-by: Phil Sutter <phil@nwl.cc> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16ipv6: Move common init code for rt6_info to a new function rt6_info_init()Martin KaFai Lau
Introduce rt6_info_init() to do the common init work for 'struct rt6_info' (after calling dst_alloc). It is a prep work to fix the rt6_info init logic in the ip6_blackhole_route(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: Phil Sutter <phil@nwl.cc> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16Bluetooth: Fix initializing conn_params in scan phaseJakub Pawlowski
This patch makes sure that conn_params that were created just for explicit_connect, will get properly deleted during cleanup. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-16Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanupJohan Hedberg
After clearing the params->explicit_connect variable the parameters may need to be either added back to the right list or potentially left absent from both the le_reports and the le_conns lists. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-16Bluetooth: Fix remove_device behavior for explicit connectsJohan Hedberg
Devices undergoing an explicit connect should not have their conn_params struct removed by the mgmt Remove Device command. This patch fixes the necessary checks in the command handler to correct the behavior. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-16Bluetooth: Fix LE reconnection logicJohan Hedberg
We can't use hci_explicit_connect_lookup() since that would only cover explicit connections, leaving normal reconnections completely untouched. Not using it in turn means leaving out entries in pend_le_reports. To fix this and simplify the logic move conn params from the reports list to the pend_le_conns list for the duration of an explicit connect. Once the connect is complete move the params back to the pend_le_reports list. This also means that the explicit connect lookup function only needs to look into the pend_le_conns list. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-16Bluetooth: Fix reference counting for LE-scan based connectionsJohan Hedberg
The code should never directly call hci_conn_hash_del since many cleanup & reference counting updates would be lost. Normally hci_conn_del is the right thing to do, but in the case of a connection doing LE scanning this could cause a deadlock due to doing a cancel_delayed_work_sync() on the same work callback that we were called from. Connections in the LE scanning state actually need very little cleanup - just a small subset of hci_conn_del. To solve the issue, refactor out these essential pieces into a new hci_conn_cleanup() function and call that from the two necessary places. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-16Bluetooth: Fix double scan updatesJakub Pawlowski
When disable/enable scan command is issued twice, some controllers will return an error for the second request, i.e. requests with this command will fail on some controllers, and succeed on others. This patch makes sure that unnecessary scan disable/enable commands are not issued. When adding device to the auto connect whitelist when there is pending connect attempt, there is no need to update scan. hci_connect_le_scan_cleanup is conditionally executing hci_conn_params_del, that is calling hci_update_background_scan. Make the other case also update scan, and remove reduntand call from hci_connect_le_scan_remove. When stopping interleaved discovery the state should be set to stopped only when both LE scanning and discovery has stopped. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Acked-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-16drivers/net: get rid of unnecessary initializations in .get_drvinfo()Ivan Vecera
Many drivers initialize uselessly n_priv_flags, n_stats, testinfo_len, eedump_len & regdump_len fields in their .get_drvinfo() ethtool op. It's not necessary as these fields is filled in ethtool_get_drvinfo(). v2: removed unused variable v3: removed another unused variable Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15Merge branch 'tipc-link-improvements'David S. Miller
Jon Maloy says: ==================== tipc: some link level code improvements Extensive testing has revealed some weaknesses and non-optimal solutions in the link level code. This commit series addresses those issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15tipc: update node FSM when peer RESET message is receivedJon Paul Maloy
The change made in the previous commit revealed a small flaw in the way the node FSM is updated. When the function tipc_node_link_down() is called for the last link to a node, we should check whether this was caused by a local reset or by a received RESET message from the peer. In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to the node FSM, so that it is ready to re-establish contact. If this is not done, the peer node will sometimes have to go through a second establish cycle before the link becomes stable. We fix this in this commit by conditionally issuing the mentioned event in the function tipc_node_link_down(). We also move LINK_RESET FSM even away from the link_reset() function and into the caller function, partially because it is easier to follow the code when state changes are gathered at a limited number of locations, partially because there will be cases in future commits where we don't want the link to go RESET mode when link_reset() is called. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15tipc: send out RESET immediately when link goes downJon Paul Maloy
When a link is taken down because of a node local event, such as disabling of a bearer or an interface, we currently leave it to the peer node to discover the broken communication. The default time for such failure discovery is 1.5-2 seconds. If we instead allow the terminating link endpoint to send out a RESET message at the moment it is reset, we can achieve the impression that both endpoints are going down instantly. Since this is a very common scenario, we find it worthwhile to make this small modification. Apart from letting the link produce the said message, we also have to ensure that the interface is able to transmit it before TIPC is detached. We do this by performing the disabling of a bearer in three steps: 1) Disable reception of TIPC packets from the interface in question. 2) Take down the links, while allowing them so send out a RESET message. 3) Disable transmission of TIPC packets on the interface. Apart from this, we now have to react on the NETDEV_GOING_DOWN event, instead of as currently the NEDEV_DOWN event, to ensure that such transmission is possible during the teardown phase. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15tipc: delay ESTABLISH state event when link is establishedJon Paul Maloy
Link establishing, just like link teardown, is a non-atomic action, in the sense that discovering that conditions are right to establish a link, and the actual adding of the link to one of the node's send slots is done in two different lock contexts. The link FSM is designed to help bridging the gap between the two contexts in a safe manner. We have now discovered a weakness in the implementaton of this FSM. Because we directly let the link go from state LINK_ESTABLISHING to state LINK_ESTABLISHED already in the first lock context, we are unable to distinguish between a fully established link, i.e., a link that has been added to its slot, and a link that has not yet reached the second lock context. It may hence happen that a manual intervention, e.g., when disabling an interface, causes the function tipc_node_link_down() to try removing the link from the node slots, decrementing its active link counter etc, although the link was never added there in the first place. We solve this by delaying the actual state change until we reach the second lock context, inside the function tipc_node_link_up(). This makes it possible for potentail callers of __tipc_node_link_down() to know if they should proceed or not, and the problem is solved. Unforunately, the situation described above also has a second problem. Since there by necessity is a tipc_node_link_up() call pending once the node lock has been released, we must defuse that call by setting the link back from LINK_ESTABLISHING to LINK_RESET state. This forces us to make a slight modification to the link FSM, which will now look as follows. +------------------------------------+ |RESET_EVT | | | | +--------------+ | +-----------------| SYNCHING |-----------------+ | |FAILURE_EVT +--------------+ PEER_RESET_EVT| | | A | | | | | | | | | | | | | | |SYNCH_ |SYNCH_ | | | |BEGIN_EVT |END_EVT | | | | | | | V | V V | +-------------+ +--------------+ +------------+ | | RESETTING |<---------| ESTABLISHED |--------->| PEER_RESET | | +-------------+ FAILURE_ +--------------+ PEER_ +------------+ | | EVT | A RESET_EVT | | | | | | | | +----------------+ | | | RESET_EVT| |RESET_EVT | | | | | | | | | | |ESTABLISH_EVT | | | | +-------------+ | | | | | | RESET_EVT | | | | | | | | | | | V V V | | | | +-------------+ +--------------+ RESET_EVT| +--->| RESET |--------->| ESTABLISHING |<----------------+ +-------------+ PEER_ +--------------+ | A RESET_EVT | | | | | | | |FAILOVER_ |FAILOVER_ |FAILOVER_ |BEGIN_EVT |END_EVT |BEGIN_EVT | | | V | | +-------------+ | | FAILINGOVER |<----------------+ +-------------+ Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15tipc: disallow packet duplicates in link deferred queueJon Paul Maloy
After the previous commits, we are guaranteed that no packets of type LINK_PROTOCOL or with illegal sequence numbers will be attempted added to the link deferred queue. This makes it possible to make some simplifications to the sorting algorithm in the function tipc_skb_queue_sorted(). We also alter the function so that it will drop packets if one with the same seqeunce number is already present in the queue. This is necessary because we have identified weird packet sequences, involving duplicate packets, where a legitimate in-sequence packet may advance to the head of the queue without being detected and de-queued. Finally, we make this function outline, since it will now be called only in exceptional cases. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15tipc: improve sequence number checkingJon Paul Maloy
The sequence number of an incoming packet is currently only checked for less than, equality to, or bigger than the next expected number, meaning that the receive window in practice becomes one half sequence number cycle, or U16_MAX/2. This does not make sense, and may not even be safe if there are extreme delays in the network. Any packet sent by the peer during the ongoing cycle must belong inside his current send window, or should otherwise be dropped if possible. Since a link endpoint cannot know its peer's current send window, it has to base this sanity check on a worst-case assumption, i.e., that the peer is using a maximum sized window of 8191 packets. Using this assumption, we now add a check that the sequence number is not bigger than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks done, so that the receive window test is performed before the gap test. This way, we are guaranteed that no packet with illegal sequence numbers are ever added to the deferred queue. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15tipc: simplify tipc_link_rcv() reception loopJon Paul Maloy
Currently, all packets received in tipc_link_rcv() are unconditionally added to the packet deferred queue, whereafter that queue is walked and all its buffers evaluated for delivery. This is both non-optimal and and makes the queue sorting function unnecessary complex. This commit changes the loop so that an arrived packet is evaluated first, and added to the deferred queue only when a sequence number gap is discovered. A non-empty deferred queue is walked until it is empty or until its head's sequence number doesn't fit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15tipc: limit usage of temporary skb list during packet receptionJon Paul Maloy
During packet reception, the function tipc_link_rcv() adds its accepted packets to a temporary buffer queue, before finally splicing this queue into the lock protected input queue that will be delivered up to the socket layer. The purpose is to reduce potential contention on the input queue lock. However, since the vast majority of packets arrive in sequence, they will anyway be added one by one to the input queue, and the use of the temporary queue becomes a sub-optimization. The only case where this queue makes sense is when unpacking buffers from a bundle packet; here we want to avoid dozens of small buffers to be added individually to the lock-protected input queue in a tight loop. In this commit, we remove the general usage of the temporary queue, and keep it only for the packet unbundling case. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlx4: corretly check failed allocationInsu Yun
When allocation fails, mlx4_alloc_cmd_mailbox returns -ENOMEM. Since there is no case that mlx4_alloc_cmd_mailbox returns NULL, it needs to be checked by IS_ERR, not IS_ERR_OR_NULL Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15bonding: support encapsulated ipv6 TSOEric Dumazet
If using a sixtofour device on top of a bonding device, skb segmentation of TCP traffic is done right before calling bonding xmit, because bonding only enables TSO for IPv4. This patch improves single flow performance by about 120 % on my hosts, because segmentation is deferred right before calling slave xmit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15Merge branch 'mlxsw-cleanups'David S. Miller
Jiri Pirko says: ==================== mlxsw: Driver update, cleanups This patchset contains various cleanups and improvements in mlxsw driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: cmd: Update CONFIG_PROFILE command documentationIdo Schimmel
The meaning of certain parameters in the profile passed to the device during initialization has changed, so update their documentation accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: Add trap group for control packetsIdo Schimmel
Previously, we trapped flooded and control packets using the same trap group. This can cause flooded packets to overflow the PCI bus and prevent control packets (e.g. STP, LACP) from getting to the CPU. Solve this by splitting the RX trap group to RX and control, which allows us to configure a policer on the first, thereby preventing it from overflowing the PCI bus. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: Simplify traps creationIdo Schimmel
The Host Trap Group Table (HTGT) register configures trap groups, which are populated with trap IDs using the Host PacKet Trap (HPKT) register. However, a trap ID can only be present inside one trap group (the last configured). Instead of passing both the trap group and ID for the function that packs HPKT, pass only the trap ID and derive from it the trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: Introduce mlxsw_reg_spms_vid_pack helper and use itJiri Pirko
Introduce separate helper for packing SPMS VIDs, as it can be used for multiple VIDs and not only for one as previous SPMS pack function provided. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: reg: Adjust definition of enum mlxsw_reg_sfgc_typeIdo Schimmel
Define max which would be needed later on. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: reg: Remove extra space in SFGC ID defineJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: reg: Uppercase letters in register IDsJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: Use dev_level_ratelimited instead of net_ratelimit & dev_levelJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: core: Do not use EMADs in mlxsw_emad_finiJiri Pirko
Be symmetric with mlxsw_emad_init and don't use EMADs in mlxsw_emad_fini cleanup function. Use command interface instead. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: pci: Limit number of entries being sent in single MAP_FA cmdJiri Pirko
Firmware accepts only limited number of mapping entries for MAP_FA command. In order to prevent overflow, introduce a limit and in case the number of entries is bigger, call MAP_FA multiple times. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: pci: Remove MLXSW_PCI_RDQS/SDQS defines and checksJiri Pirko
Remove strict number check of queues count as various ASICs have different counts. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: pci: Do not use MLXSW_PCI_SDQS_COUNT defineJiri Pirko
Use mlxsw_pci_sdq_count helper instead. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15mlxsw: pci: Use MLXSW_PCI_CQS_MAX instead of MLXSW_PCI_CQS_COUNTJiri Pirko
The count of CQs can be different for various ASICs, so just define maximal value and check for that. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>