summaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp
AgeCommit message (Collapse)Author
2014-10-09IB/iser: Extend iser_free_ib_conn_res()Sagi Grimberg
Put all connection IB related resources release in this routine. One exception is the cm_id which cannot be destroyed as the routine is protected by the state mutex. Also move its position to avoid forward declaration. While at it fix qp NULL assignment. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09IB/iser: Remove unused variables and dead codeRoi Dayan
Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09IB/iser: Re-introduce ib_connSagi Grimberg
Structure that describes the RDMA relates connection objects. Static member of iser_conn. This patch does not change any functionality Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09IB/iser: Rename ib_conn -> iser_connSagi Grimberg
Two reasons why we choose to do this: 1. No point today calling struct iser_conn by another name ib_conn 2. In the next patches we will restructure iser control plane representation - struct iser_conn: connection logical representation - struct ib_conn: connection RDMA layout representation This patch does not change any functionality. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-07net: better IFF_XMIT_DST_RELEASE supportEric Dumazet
Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-05iser-target: Disable TX completion interrupt coalescingNicholas Bellinger
This patch explicitly disables TX completion interrupt coalescing logic in isert_put_response() and isert_put_datain() that was originally added as an efficiency optimization in commit 95b60f07. It has been reported that this change can trigger ABORT_TASK timeouts under certain small block workloads, where disabling coalescing was required for stability. According to Sagi, this doesn't impact overall performance, so go ahead and disable it for now. Reported-by: Moussa Ba <moussaba@micron.com> Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: <stable@vger.kernel.org> # 3.13+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-10-03iser-target: Fix smatch warningSagi Grimberg
Unused return value from down_interruptible Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-23Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma fixes from Roland Dreier: "Last late set of InfiniBand/RDMA fixes for 3.17: - fixes for the new memory region re-registration support - iSER initiator error path fixes - grab bag of small fixes for the qib and ocrdma hardware drivers - larger set of fixes for mlx4, especially in RoCE mode" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits) IB/mlx4: Fix VF mac handling in RoCE IB/mlx4: Do not allow APM under RoCE IB/mlx4: Don't update QP1 in native mode IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses IB/core: When marshaling uverbs path, clear unused fields IB/mlx4: Avoid executing gid task when device is being removed IB/mlx4: Fix lockdep splat for the iboe lock IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up IB/mlx4: Reorder steps in RoCE GID table initialization IB/mlx4: Don't duplicate the default RoCE GID IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() IB/iser: Bump version to 1.4.1 IB/iser: Allow bind only when connection state is UP IB/iser: Fix RX/TX CQ resource leak on error flow RDMA/ocrdma: Use right macro in query AH RDMA/ocrdma: Resolve L2 address when creating user AH mlx4: Correct error flows in rereg_mr IB/qib: Correct reference counting in debugfs qp_stats IPoIB: Remove unnecessary port query ...
2014-09-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) If the user gives us a msg_namelen of 0, don't try to interpret anything pointed to by msg_name. From Ani Sinha. 2) Fix some bnx2i/bnx2fc randconfig compilation errors. The gist of the issue is that we firstly have drivers that span both SCSI and networking. And at the top of that chain of dependencies we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are selected. But since select is a sledgehammer and ignores dependencies, everything to select's SCSI_FC_ATTRS and/or SCSI_NETLINK has to also explicitly select their dependencies and so on and so forth. Generally speaking 'select' is supposed to only be used for child nodes, those which have no dependencies of their own. And this whole chain of dependencies in the scsi layer violates that rather strongly. So just make SCSI_NETLINK depend upon it's dependencies, and so on and so forth for the things selecting it (either directly or indirectly). From Anish Bhatt and Randy Dunlap. 3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert. 4) Actually notice netdev feature changes in rtl_open() code, from Hayes Wang. 5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov. 6) Missing memory barrier in sunvnet driver, from David Stevens. 7) Don't leave anycast addresses around when ipv6 interface is destroyed, from Sabrina Dubroca. 8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is initialized in SFC driver, from Edward Cree. 9) Fix missing DMA error checking in 3c59x, from Neal Horman. 10) Openvswitch doesn't emit OVS_FLOW_CMD_NEW notifications accidently, fix from Samuel Gauthier. 11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a build error. 12) Fix macvlan regression wherein we stopped emitting broadcast/multicast frames over software devices. From Nicolas Dichtel. 13) Fix infiniband bug due to unintended overflow of skb->cb[], from Eric Dumazet. And add an assertion so this doesn't happen again. 14) dm9000_parse_dt() should return error pointers, not NULL. From Tobias Klauser. 15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma net: bcmgenet: fix TX reclaim accounting for fragments ipv4: do not use this_cpu_ptr() in preemptible context dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt() r8169: fix an if condition r8152: disable ALDPS ipoib: validate struct ipoib_cb size net: sched: shrink struct qdisc_skb_cb to 28 bytes tg3: Work around HW/FW limitations with vlan encapsulated frames macvlan: allow to enqueue broadcast pkt on virtual device pch_gbe: 'select' NET_PTP_CLASSIFY. scsi: Use 'depends' with LIBFC instead of 'select'. openvswitch: restore OVS_FLOW_CMD_NEW notifications genetlink: add function genl_has_listeners() lib: rhashtable: remove second linux/log2.h inclusion net: allow macvlans to move to net namespace 3c59x: Fix bad offset spec in skb_frag_dma_map 3c59x: Add dma error checking and recovery sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG can: at91_can: add missing prepare and unprepare of the clock ...
2014-09-22ipoib: validate struct ipoib_cb sizeEric Dumazet
To catch future errors sooner. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-nextRoland Dreier
2014-09-22IB/iser: Bump version to 1.4.1Or Gerlitz
Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/iser: Allow bind only when connection state is UPSagi Grimberg
We need to fail the bind operation if the iser connection state != UP (started teardown) and this should be done under the state lock. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22IB/iser: Fix RX/TX CQ resource leak on error flowRoi Dayan
When failing to allocate TX CQ we already allocated RX CQ, so we need to make sure we release it. Also, when failing to register notification to the RX CQ we currently leak both RX and TX CQs of the current index, fix that too. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-19IPoIB: Remove unnecessary port queryAlex Estrin
There are two queries for port attributes one after another. A second call is not needed since port_attr structure already holds the data. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-15Target/iser: Fix initiator_depth and responder_resourcesSagi Grimberg
The iser target is the RDMA requester and the iser initiator is the RDMA responder. In order to determine the max inflight RDMA READ requests to set on the QP (initiator_depth), it should take the min between the initiator published initiator_depth and the max inflight rdma read requests its local HCA support (max_qp_init_rd_atom). The target will never handle incoming RDMA READ requests so no need to set responder_resources. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15Target/iser: Avoid calling rdma_disconnect twiceSagi Grimberg
rdma_disconnect may be called in 2 code flows: - isert_wait_conn: disconnect initiated be the target - disconnected_handler: disconnect invoked by the initiator In case isert_conn->disconnect is true then rdma_disconnect was called in disconnected handler, no need to call it again from isert_wait_conn. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15Target/iser: Don't put isert_conn inside disconnected handlerSagi Grimberg
disconnected_handler is invoked on several CM events (such as DISCONNECTED, DEVICE_REMOVAL, TIMEWAIT_EXIT...). Since multiple events can occur while before isert_free_conn is invoked, we might put all isert_conn references and free the connection too early. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15Target/iser: Get isert_conn reference once got to connected_handlerSagi Grimberg
In case the connection didn't reach connected state, disconnected handler will never be invoked thus the second kref_put on isert_conn will be missing. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-08-14Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma updates from Roland Dreier: "Main set of InfiniBand/RDMA updates for 3.17 merge window: - MR reregistration support - MAD support for RMPP in userspace - iSER and SRP initiator updates - ocrdma hardware driver updates - other fixes..." * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (52 commits) IB/srp: Fix return value check in srp_init_module() RDMA/ocrdma: report asic-id in query device RDMA/ocrdma: Update sli data structure for endianness RDMA/ocrdma: Obtain SL from device structure RDMA/uapi: Include socket.h in rdma_user_cm.h IB/srpt: Handle GID change events IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0] IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0] RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf() IPoIB: Remove unnecessary test for NULL before debugfs_remove() IB/mad: Add user space RMPP support IB/mad: add new ioctl to ABI to support new registration options IB/mad: Add dev_notice messages for various umad/mad registration failures IB/mad: Update module to [pr|dev]_* style print messages IB/ipoib: Avoid multicast join attempts with invalid P_key IB/umad: Update module to [pr|dev]_* style print messages IB/ipoib: Avoid flushing the workqueue from worker context IB/ipoib: Use P_Key change event instead of P_Key polling mechanism IB/ipath: Add P_Key change event support mlx4_core: Add support for secure-host and SMP firewall ...
2014-08-14Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', ↵Roland Dreier
'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next
2014-08-14IB/srp: Fix return value check in srp_init_module()Wei Yongjun
In case of error, the function create_workqueue() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-12IB/srpt: Handle GID change eventsDoug Ledford
GID change events need a refresh just like LID change events and several others. Handle this the same as the others. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-12IPoIB: Remove unnecessary test for NULL before debugfs_remove()Fabian Frederick
Fix checkpatch warning: WARNING: debugfs_remove(NULL) is safe this check is probably not required Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10IB/mad: add new ioctl to ABI to support new registration optionsIra Weiny
Registrations options are specified through flags. Definitions of flags will be in subsequent patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10IB/ipoib: Avoid multicast join attempts with invalid P_keyAlex Estrin
Currently, the parent interface keeps sending broadcast group join requests even if p_key index 0 is invalid, which is possible/common in virtualized environments where a VF has been probed to VM but the actual P_key configuration has not yet been assigned by the management software. This creates unnecessary noise on the fabric and in the kernel logs: ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22 The original code run the multicast task regardless of the actual P_key value, which can be avoided. The fix is to re-init resources and bring interface up only if P_key index 0 is valid either when starting up or on PKEY_CHANGE event. Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05IB/ipoib: Avoid flushing the workqueue from worker contextErez Shitrit
The error flow of ipoib_ib_dev_open() invokes ipoib_ib_dev_stop() with workqueue flushing enabled, which deadlocks if the open procedure itself was called by a worker thread. Fix this by adding a flush enabled flag to ipoib_ib_dev_open() and set it accordingly from the locations where such a call is made. The call trace was the following: [<ffffffff81095bc4>] ? flush_workqueue+0x54/0x80 [<ffffffffa056c657>] ? ipoib_ib_dev_stop+0x447/0x650 [ib_ipoib] [<ffffffffa056cc34>] ? ipoib_ib_dev_open+0x284/0x430 [ib_ipoib] [<ffffffffa05674a8>] ? ipoib_open+0x78/0x1d0 [ib_ipoib] [<ffffffffa05697b8>] ? ipoib_pkey_open+0x38/0x40 [ib_ipoib] [<ffffffffa056cf3c>] ? __ipoib_ib_dev_flush+0x15c/0x2c0 [ib_ipoib] [<ffffffffa056ce56>] ? __ipoib_ib_dev_flush+0x76/0x2c0 [ib_ipoib] [<ffffffffa056d0a0>] ? ipoib_ib_dev_flush_heavy+0x0/0x20 [ib_ipoib] [<ffffffffa056d0ba>] ? ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib] [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40 Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05IB/ipoib: Use P_Key change event instead of P_Key polling mechanismErez Shitrit
The current code use a dedicated polling logic to determine when the P_Key assigned to the ipoib device is present in HCA port table and act accordingly. Move to use the code which acts upon getting PKEY_CHANGE event to handle this task and remove the P_Key polling logic/thread as they add extra complexity which isn't needed. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/srp: Fix residual handlingBart Van Assche
From Documentation/scsi/scsi_mid_low_api.txt: "resid - an LLD should set this signed integer to the requested transfer length (i.e. 'request_bufflen') less the number of bytes that are actually transferred." This means that resid > 0 in case of an underrun and also that resid < 0 in case of an overrun. Modify the SRP initiator code such that it matches this requirement. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: David Dillow <dave@thedillows.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/srp: Fix deadlock between host removal and multipathdBart Van Assche
If scsi_remove_host() is invoked after a SCSI device has been blocked, if the fast_io_fail_tmo or dev_loss_tmo work gets scheduled on the workqueue executing srp_remove_work() and if an I/O request is scheduled after the SCSI device had been blocked by e.g. multipathd then the following deadlock can occur: kworker/6:1 D ffff880831f3c460 0 195 2 0x00000000 Call Trace: [<ffffffff814aafd9>] schedule+0x29/0x70 [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0 [<ffffffff8105af6f>] msleep+0x2f/0x40 [<ffffffff8123b0ae>] __blk_drain_queue+0x4e/0x180 [<ffffffff8123d2d5>] blk_cleanup_queue+0x225/0x230 [<ffffffffa0010732>] __scsi_remove_device+0x62/0xe0 [scsi_mod] [<ffffffffa000ed2f>] scsi_forget_host+0x6f/0x80 [scsi_mod] [<ffffffffa0002eba>] scsi_remove_host+0x7a/0x130 [scsi_mod] [<ffffffffa07cf5c5>] srp_remove_work+0x95/0x180 [ib_srp] [<ffffffff8106d7aa>] process_one_work+0x1ea/0x6c0 [<ffffffff8106dd9b>] worker_thread+0x11b/0x3a0 [<ffffffff810758bd>] kthread+0xed/0x110 [<ffffffff814b972c>] ret_from_fork+0x7c/0xb0 multipathd D ffff880096acc460 0 5340 1 0x00000000 Call Trace: [<ffffffff814aafd9>] schedule+0x29/0x70 [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0 [<ffffffff814ab79b>] io_schedule_timeout+0x9b/0xf0 [<ffffffff814abe1c>] wait_for_completion_io_timeout+0xdc/0x110 [<ffffffff81244b9b>] blk_execute_rq+0x9b/0x100 [<ffffffff8124f665>] sg_io+0x1a5/0x450 [<ffffffff8124fd21>] scsi_cmd_ioctl+0x2a1/0x430 [<ffffffff8124fef2>] scsi_cmd_blk_ioctl+0x42/0x50 [<ffffffffa00ec97e>] sd_ioctl+0xbe/0x140 [sd_mod] [<ffffffff8124bd04>] blkdev_ioctl+0x234/0x840 [<ffffffff811cb491>] block_ioctl+0x41/0x50 [<ffffffff811a0df0>] do_vfs_ioctl+0x300/0x520 [<ffffffff811a1051>] SyS_ioctl+0x41/0x80 [<ffffffff814b9962>] tracesys+0xd0/0xd5 Fix this by scheduling removal work on another workqueue than the transport layer timers. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: David Dillow <dave@thedillows.org> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Clarify a duplicate counters checkRoi Dayan
This is to prevent someone from thinking that this code section is redundant. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Replace connection waitqueue with completion objectAriel Nahum
Instead of waiting for events and condition changes of the iser connection state, we wait for explicit completion of connection establishment and teardown. Separate connection establishment wait object from the teardown object to avoid a situation where racing connection establishment and teardown may concurrently wakeup each other. ep_poll will wait for up_completion invoked by iser_connected_handler() and iser release worker will wait for flush_completion before releasing the connection. Bound the completion wait with a 30 seconds timeout for cases where iscsid (the user space iscsi daemon) is too slow or gone. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Protect iser state machine with a mutexAriel Nahum
The iser connection state lookups and transitions are not fully protected. Some transitions are protected with a spinlock, and in some cases the state is accessed unprotected due to specific assumptions of the flow. Introduce a new mutex to protect the connection state access. We use a mutex since we need to also include a scheduling operations executed under the state lock. Each state transition/condition and its corresponding action will be protected with the state mutex. The rdma_cm events handler acquires the mutex when handling connection events. Since iser connection state can transition to DOWN concurrently during connection establishment, we bailout from addr/route resolution events when the state is not PENDING. This addresses a scenario where ep_poll retries expire during CMA connection establishment. In this case ep_disconnect is invoked while CMA events keep coming (address/route resolution, connected, etc...). Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Remove redundant return code in iser_free_ib_conn_res()Sagi Grimberg
Make it void. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Seperate iser_conn and iscsi_endpoint storage spaceAriel Nahum
iser connection needs asynchronous cleanup completions which are triggered in ep_disconnect. As a result we are keeping the corresponding iscsi_endpoint structure hanging for no good reason. In order to avoid that, we seperate iser_conn from iscsi_endpoint storage space to have their destruction being independent. iscsi_endpoint will be destroyed at ep_disconnect stage, while the iser connection will wait for asynchronous completions to be released in an orderly fashion. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Fix responder resources advertisementSagi Grimberg
The iser initiator is the RDMA responder so it should publish to the target the max inflight rdma read requests its local HCA can handle in responder_resources (max_qp_rd_atom). The iser target should take the min of that and its local HCA max inflight oustanding rdma read requests (max_qp_init_rd_atom). We keep initiator_depth set to 1 in order to compat with old targets. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Add TIMEWAIT_EXIT event handlingRoi Dayan
In case the DISCONNECTED event is not delivered after rdma_disconnect is called, the CM waits TIMEWAIT seconds and delivers the TIMEWAIT_EXIT local event. We use this as the notification needed to continue in the teardown and release sequence. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01IB/iser: Support IPv6 address familyRoi Dayan
Replace struct sockaddr_in with struct sockaddr which supports both IPv4 and IPv6, and print using the %pIS format directive. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-15net: set name_assign_type in alloc_netdev()Tom Gundersen
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "The highlights this round include: - Add support for T10 PI pass-through between vhost-scsi + virtio-scsi (MST + Paolo + MKP + nab) - Add support for T10 PI in qla2xxx target mode (Quinn + MKP + hch + nab, merged through scsi.git) - Add support for percpu-ida pre-allocation in qla2xxx target code (Quinn + nab) - A number of iser-target fixes related to hardening the network portal shutdown path (Sagi + Slava) - Fix response length residual handling for a number of control CDBs (Roland + Christophe V.) - Various iscsi RFC conformance fixes in the CHAP authentication path (Tejas and Calsoft folks + nab) - Return TASK_SET_FULL status for tcm_fc(FCoE) DataIn + Response failures (Vasu + Jun + nab) - Fix long-standing ABORT_TASK + session reset hang (nab) - Convert iser-initiator + iser-target to include T10 bytes into EDTL (Sagi + Or + MKP + Mike Christie) - Fix NULL pointer dereference regression related to XCOPY introduced in v3.15 + CC'ed to v3.12.y (nab)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (34 commits) target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd vhost-scsi: Include prot_bytes into expected data transfer length TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire libiscsi, iser: Adjust data_length to include protection information scsi_cmnd: Introduce scsi_transfer_length helper target: Report correct response length for some commands target/sbc: Check that the LBA and number of blocks are correct in VERIFY target/sbc: Remove sbc_check_valid_sectors() Target/iscsi: Fix sendtargets response pdu for iser transport Target/iser: Fix a wrong dereference in case discovery session is over iser iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leak target: Use complete_all for se_cmd->t_transport_stop_comp target: Set CMD_T_ACTIVE bit for Task Management Requests target: cleanup some boolean tests target/spc: Simplify INQUIRY EVPD=0x80 tcm_fc: Generate TASK_SET_FULL status for response failures tcm_fc: Generate TASK_SET_FULL status for DataIN failures iscsi-target: Reject mutual authentication with reflected CHAP_C iscsi-target: Remove no-op from iscsit_tpg_del_portal_group iscsi-target: Fix CHAP_A parameter list handling ...
2014-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
2014-06-11libiscsi, iser: Adjust data_length to include protection informationSagi Grimberg
In case protection information exists over the wire iscsi header data length is required to include it. Use protection information aware scsi helpers to set the correct transfer length. In order to avoid breakage, remove iser transfer length checks for each task as they are not always true and somewhat redundant anyway. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11Target/iscsi: Fix sendtargets response pdu for iser transportSagi Grimberg
In case the transport is iser we should not include the iscsi target info in the sendtargets text response pdu. This causes sendtargets response to include the target info twice. Modify iscsit_build_sendtargets_response to filter transport types that don't match. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reported-by: Slava Shwartsman <valyushash@gmail.com> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11Target/iser: Fix a wrong dereference in case discovery session is over iserSagi Grimberg
In case the discovery session is carried over iser, we can't access the assumed network portal since the default portal is used. In this case we don't really need to allocate the fastreg pool, just prepare to the text pdu that will follow. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reported-by: Alex Tabachnik <alext@mellanox.com> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-10Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull main InfiniBand/RDMA updates from Roland Dreier: - add iWARP port mapper to avoid conflicts between RDMA and normal stack TCP connections. - fixes for i386 / x86-64 structure padding differences (ABI compatibility for 32-on-64) from Yann Droneaud. - a pile of SRP initiator fixes from Bart Van Assche. - fixes for a writeback / memory allocation deadlock with NFS over IPoIB connected mode from Jiri Kosina. - the usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level drivers. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (61 commits) RDMA/cxgb4: Add support for iWARP Port Mapper user space service RDMA/nes: Add support for iWARP Port Mapper user space service RDMA/core: Add support for iWARP Port Mapper user space service IB/mlx4: Fix gfp passing in create_qp_common() IB/umad: Fix use-after-free on close IB/core: Fix kobject leak on device register error flow RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp mlx4_core: Fix GFP flags parameters to be gfp_t IB/core: Fix port kobject deletion during error flow IB/core: Remove unneeded kobject_get/put calls IB/core: Fix sparse warnings about redeclared functions IB/mad: Fix sparse warning about gfp_t use IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO IB: Add a QP creation flag to use GFP_NOIO allocations IB: Return error for unsupported QP creation flags IB: Allow build of hw/ and ulp/ subdirectories independently mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp IB/srp: Avoid problems if a header uses pr_fmt IB/umad: Fix error handling ...
2014-06-10Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', ↵Roland Dreier
'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next
2014-06-03iser-target: Add missing target_put_sess_cmd for ImmedateData failureNicholas Bellinger
This patch addresses a bug where an early exception for SCSI WRITE with ImmediateData=Yes was missing the target_put_sess_cmd() call to drop the extra se_cmd->cmd_kref reference obtained during the normal iscsit_setup_scsi_cmd() codepath execution. This bug was manifesting itself during session shutdown within isert_cq_rx_comp_err() where target_wait_for_sess_cmds() would end up waiting indefinately for the last se_cmd->cmd_kref put to occur for the failed SCSI WRITE + ImmediateData descriptors. This fix follows what traditional iscsi-target code already does for the same failure case within iscsit_get_immediate_data(). Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-02IB: Add a QP creation flag to use GFP_NOIO allocationsOr Gerlitz
This addresses a problem where NFS client writes over IPoIB connected mode may deadlock on memory allocation/writeback. The problem is not directly memory reclamation. There is an indirect dependency between network filesystems writing back pages and ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot make forward progress until ipoib_cm_tx_init() succeeds and it is stuck in page reclaim itself waiting for network transmission. Ordinarily this situation may be avoided by having the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that information. To address this, take a general approach and add a new QP creation flag that tells the low-level hardware driver to use GFP_NOIO for the memory allocations related to the new QP. Use the new flag in the ipoib connected mode path, and if the driver doesn't support it, re-issue the QP creation without the flag. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02IB: Allow build of hw/ and ulp/ subdirectories independentlyYann Droneaud
It is not possible to build only the drivers/infiniband/hw/ (or ulp/) subdirectory with command such as: $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/ This fails with following error messages: make[2]: Nothing to be done for `all'. make[2]: Nothing to be done for `relocs'. CHK include/config/kernel.release Using /home/ydroneaud/src/linux as source for kernel GEN /home/ydroneaud/src/linux/obj-x86_64/Makefile CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CALL /home/ydroneaud/src/linux/scripts/checksyscalls.sh /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'. Stop. make[1]: *** [drivers/infiniband/hw/] Error 2 make: *** [sub-make] Error 2 This patch creates a Makefile in hw/ and ulp/ and moves each corresponding parts of drivers/infiniband/Makefile in the new Makefiles. It should not break build except if some hw/ drivers or ulp/ were allowed previously to be built while CONFIG_INFINIBAND is set to 'n', but according to drivers/infiniband/Kconfig, it's not possible. So it should be safe to apply. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29IB/srp: Avoid problems if a header uses pr_fmtJoe Perches
SRP defines pr_fmt(fmt) to be "PFX fmt", and then includes a bunch of header files before it gets around to defining PFX. This causes problems if any of the header files do a pr_... and use pr_fmt(). Fix this by using KBUILD_MODNAME instead of the private PFX. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Roland Dreier <roland@purestorage.com>