summaryrefslogtreecommitdiff
path: root/drivers/infiniband/ulp
AgeCommit message (Collapse)Author
2014-12-20Merge tag 'scsi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI update from James Bottomley: "This is a much shorter set of patches that were on the go but didn't make it in to the early pull request for the merge window. It's really a set of bug fixes plus some final cleanup work on the new tag queue API" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: storvsc: ring buffer failures may result in I/O freeze ipr: set scsi_level correctly for disk arrays ipr: add support for async scanning to speed up boot scsi_debug: fix missing "break;" in SDEBUG_UA_CAPACITY_CHANGED case scsi_debug: take sdebug_host_list_lock when changing capacity scsi_debug: improve driver description in Kconfig scsi_debug: fix compare and write errors qla2xxx: fix race in handling rport deletion during recovery causes panic scsi: blacklist RSOC for Microsoft iSCSI target devices scsi: fix random memory corruption with scsi-mq + T10 PI Revert "[SCSI] mpt3sas: Remove phys on topology change" Revert "[SCSI] mpt2sas: Remove phys on topology change." esas2r: Correct typos of "validate" in a comment fc: FCP_PTA_SIMPLE is 0 ibmvfc: remove unused tag variable scsi: remove MSG_*_TAG defines scsi: remove scsi_set_tag_type scsi: remove scsi_get_tag_type scsi: never drop to untagged mode during queue ramp down scsi: remove ->change_queue_type method
2014-12-19Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "The highlights this merge window include: - Allow target fabric drivers to function as built-in. (Roland) - Fix tcm_loop multi-TPG endpoint nexus bug. (Hannes) - Move per device config_item_type into se_subsystem_api, allowing configfs attributes to be defined at module_init time. (Jerome + nab) - Convert existing IBLOCK/FILEIO/RAMDISK/PSCSI/TCMU drivers to use external configfs attributes. (nab) - A number of iser-target fixes related to active session + network portal shutdown stability during extended stress testing. (Sagi + Slava) - Dynamic allocation of T10-PI contexts for iser-target, fixing a potentially bogus iscsi_np->tpg_np pointer reference in >= v3.14 code. (Sagi) - iser-target performance + scalability improvements. (Sagi) - Fixes for SPC-4 Persistent Reservation AllRegistrants spec compliance. (Ilias + James + nab) - Avoid potential short kern_sendmsg() in iscsi-target for now until Al's conversion to use msghdr iteration is merged post -rc1. (Viro) Also, Sagi has requested a number of iser-target patches (9) that address stability issues he's encountered during extended stress testing be considered for v3.10.y + v3.14.y code. Given the amount of LOC involved, it will certainly require extra backporting effort. Apologies in advance to Greg-KH & Co on this. Sagi and I will be working post-merge to ensure they each get applied correctly" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (53 commits) target: Allow AllRegistrants to re-RESERVE existing reservation uapi/linux/target_core_user.h: fix headers_install.sh badness iscsi-target: Fail connection on short sendmsg writes iscsi-target: nullify session in failed login sequence target: Avoid dropping AllRegistrants reservation during unregister target: Fix R_HOLDER bit usage for AllRegistrants iscsi-target: Drop left-over bogus iscsi_np->tpg_np iser-target: Fix wc->wr_id cast warning iser-target: Remove code duplication iser-target: Adjust log levels and prettify some prints iser-target: Use debug_level parameter to control logging level iser-target: Fix logout sequence iser-target: Don't wait for session commands from completion context iser-target: Reduce CQ lock contention by batch polling iser-target: Introduce isert_poll_budget iser-target: Remove an atomic operation from the IO path iser-target: Remove redundant call to isert_conn_terminate iser-target: Use single CQ for TX and RX iser-target: Centralize completion elements to a context iser-target: Cast wr_id with uintptr_t instead of unsinged long ...
2014-12-18Merge remote-tracking branch 'scsi-queue/drivers-for-3.19' into for-linusJames Bottomley
2014-12-15Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and ↵Roland Dreier
'srp' into for-next
2014-12-15IB/srp: Allow newline separator for connection stringSagi Grimberg
In case the last argument of the connection string is processed as a string (destination GID for example). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Bump version to 1.5Or Gerlitz
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: DIX updateSagi Grimberg
Following few recent Block integrity updates, we align the iSER data integrity offload settings with: - Deprecate pi_guard module param - Expose support for DIX type 0. - Use scsi_transfer_length for the transfer length - Get pi_interval, ref_tag, ref_remap, bg_type and check_mask setting from scsi_cmnd Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Micro-optimize iser_handle_wcSagi Grimberg
Use likely() for wc.status == IB_WC_SUCCESS Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Micro-optimize iser loggingSagi Grimberg
And fix a checkpatch warning. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Use more completion queuesSagi Grimberg
No reason to settle with four, can use the min between device max comp vectors and number of cores. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Remove redundant is_mr indicatorSagi Grimberg
It is enough to check mem_h pointer assignment, mem_h == NULL will indicate that buffer is not registered using mr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Centralize memory region invalidation to a functionSagi Grimberg
Eliminates code duplication. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Terminate connection before cleaning inflight tasksSagi Grimberg
When closing the connection, we should first terminate the connection (in case it was not previously terminated) to guarantee the QP is in error state and we are done with servicing IO. Only then go ahead with tasks cleanup via iscsi_conn_stop. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Fix race between iser connection teardown and scsi TMFsSagi Grimberg
In certain scenarios (target kill with live IO) scsi TMFs may race with iser RDMA teardown, which might cause NULL dereference on iser IB device handle (which might have been freed). In this case we take a conditional lock for TMFs and check the connection state (avoid introducing lock contention in the IO path). This is indeed best effort approach, but sufficient to survive multi targets sudden death while heavy IO is inflight. While we are on it, add a nice kernel-doc style documentation. Reported-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Fix possible NULL derefernce ib_conn->device in session_createAriel Nahum
If rdma_cm error event comes after ep_poll but before conn_bind, we should protect against dereferncing the device (which may have been terminated) in session_create and conn_create (already protected) callbacks. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Fix sparse warningsSagi Grimberg
Use uintptr_t to handle wr_id casting, which was found by Kbuild test robot and smatch. Also remove an internal definition of variable which potentially shadows an external one (and make sparse happy). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Fix possible SQ overflowMax Gurtovoy
Fix a regression was introduced in commit 6df5a128f0fd ("IB/iser: Suppress scsi command send completions"). The sig_count was wrongly set to be static variable, thus it is possible that we won't reach to (sig_count % ISER_SIGNAL_BATCH) == 0 condition (due to races) and the send queue will be overflowed. Instead keep sig_count per connection. We don't need it to be atomic as we are safe under the iscsi session frwd_lock taken by libiscsi on the queuecommand path. Fixes: 6df5a128f0fd ("IB/iser: Suppress scsi command send completions") Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Decrement CQ's active QPs accounting when QP creation failsSagi Grimberg
When creating a connection QP we choose the least used CQ and inc the number of active QPs on that. If we fail to create the QP, we need to decrement the active QPs counter. Reported-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Collapse cleanup and disconnect handlersAriel Nahum
No real need to wait for TIMEWAIT_EXIT before we destroy the RDMA resources (also TIMEAWAIT_EXIT is not guarenteed to always arrive). As for the cma_id, only destroy it if the state is not DOWN where in this case, conn_release is already running and we don't want to compete. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Fix catastrophic error flow hangSagi Grimberg
In case of the HCA going into catasrophic error flow, the beacon post_send is likely to fail, so surely there will be no completion for it. In this case, use a best effort approach and don't wait for beacon completion if we failed to post the send. Reported-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/iser: Re-adjust CQ and QP send ring sizes to HW limitsMinh Tran
Re-adjust max CQEs per CQ and max send_wr per QP according to the resource limits supported by underlying hardware. Signed-off-by: Minh Tran <minhduc.tran@emulex.com> Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: No longer use flush as a parameterDoug Ledford
Various places in the IPoIB code had a deadlock related to flushing the ipoib workqueue. Now that we have per device workqueues and a specific flush workqueue, there is no longer a deadlock issue with flushing the device specific workqueues and we can do so unilaterally. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: Make ipoib_mcast_stop_thread flush the workqueueDoug Ledford
We used to pass a flush variable to mcast_stop_thread to indicate if we should flush the workqueue or not. This was due to some code trying to flush a workqueue that it was currently running on which is a no-no. Now that we have per-device work queues, and now that ipoib_mcast_restart_task has taken the fact that it is queued on a single thread workqueue with all of the ipoib_mcast_join_task's and therefore has no need to stop the join task while it runs, we can do away with the flush parameter and unilaterally flush always. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: Use dedicated workqueues per interfaceDoug Ledford
During my recent work on the rtnl lock deadlock in the IPoIB driver, I saw that even once I fixed the apparent races for a single device, as soon as that device had any children, new races popped up. It turns out that this is because no matter how well we protect against races on a single device, the fact that all devices use the same workqueue, and flush_workqueue() flushes *everything* from that workqueue, we can have one device in the middle of a down and holding the rtnl lock and another totally unrelated device needing to run mcast_restart_task, which wants the rtnl lock and will loop trying to take it unless is sees its own FLAG_ADMIN_UP flag go away. Because the unrelated interface will never see its own ADMIN_UP flag drop, the interface going down will deadlock trying to flush the queue. There are several possible solutions to this problem: Make carrier_on_task and mcast_restart_task try to take the rtnl for some set period of time and if they fail, then bail. This runs the real risk of dropping work on the floor, which can end up being its own separate kind of deadlock. Set some global flag in the driver that says some device is in the middle of going down, letting all tasks know to bail. Again, this can drop work on the floor. I suppose if our own ADMIN_UP flag doesn't go away, then maybe after a few tries on the rtnl lock we can queue our own task back up as a delayed work and return and avoid dropping work on the floor that way. But I'm not 100% convinced that we won't cause other problems. Or the method this patch attempts to use, which is when we bring an interface up, create a workqueue specifically for that interface, so that when we take it back down, we are flushing only those tasks associated with our interface. In addition, keep the global workqueue, but now limit it to only flush tasks. In this way, the flush tasks can always flush the device specific work queues without having deadlock issues. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: change init sequence orderingDoug Ledford
In preparation for using per device work queues, we need to move the start of the neighbor thread task to after ipoib_ib_dev_init and move the destruction of the neighbor task to before ipoib_ib_dev_cleanup. Otherwise we will end up freeing our workqueue with work possibly still on it. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: fix mcast_dev_flush/mcast_restart_task raceDoug Ledford
Our mcast_dev_flush routine and our mcast_restart_task can race against each other. In particular, they both hold the priv->lock while manipulating the rbtree and while removing mcast entries from the multicast_list and while adding entries to the remove_list, but they also both drop their locks prior to doing the actual removes. The mcast_dev_flush routine is run entirely under the rtnl lock and so has at least some locking. The actual race condition is like this: Thread 1 Thread 2 ifconfig ib0 up start multicast join for broadcast multicast join completes for broadcast start to add more multicast joins call mcast_restart_task to add new entries ifconfig ib0 down mcast_dev_flush mcast_leave(mcast A) mcast_leave(mcast A) As mcast_leave calls ib_sa_multicast_leave, and as member in core/multicast.c is ref counted, we run into an unbalanced refcount issue. To avoid stomping on each others removes, take the rtnl lock specifically when we are deleting the entries from the remove list. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: fix MCAST_FLAG_BUSY usageDoug Ledford
Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast objects") added a new flag MCAST_JOIN_STARTED, but was not very strict in how it was used. We didn't always initialize the completion struct before we set the flag, and we didn't always call complete on the completion struct from all paths that complete it. This made it less than totally effective, and certainly made its use confusing. And in the flush function we would use the presence of this flag to signal that we should wait on the completion struct, but we never cleared this flag, ever. This is further muddied by the fact that we overload the MCAST_FLAG_BUSY flag to mean two different things: we have a join in flight, and we have succeeded in getting an ib_sa_join_multicast. In order to make things clearer and aid in resolving the rtnl deadlock bug I've been chasing, I cleaned this up a bit. 1) Remove the MCAST_JOIN_STARTED flag entirely 2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight 3) Test on mcast->mc directly to see if we have completed ib_sa_join_multicast (using IS_ERR_OR_NULL) 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize the mcast->done completion struct 5) Make sure that before calling complete(&mcast->done), we always clear the MCAST_FLAG_BUSY bit 6) Take the mcast_mutex before we call ib_sa_multicast_join and also take the mutex in our join callback. This forces ib_sa_multicast_join to return and set mcast->mc before we process the callback. This way, our callback can safely clear mcast->mc if there is an error on the join and we will do the right thing as a result in mcast_dev_flush. 7) Because we need the mutex to synchronize mcast->mc, we can no longer call mcast_sendonly_join directly from mcast_send and instead must add sendonly join processing to the mcast_join_task A number of different races are resolved with these changes. These races existed with the old MCAST_FLAG_BUSY usage, the MCAST_JOIN_STARTED flag was an attempt to address them, and while it helped, a determined effort could still trip things up. One race looks something like this: Thread 1 Thread 2 ib_sa_join_multicast (as part of running restart mcast task) alloc member call callback ifconfig ib0 down wait_for_completion callback call completes wait_for_completion in mcast_dev_flush completes mcast->mc is PTR_ERR_OR_NULL so we skip ib_sa_leave_multicast return from callback return from ib_sa_join_multicast set mcast->mc = return from ib_sa_multicast We now have a permanently unbalanced join/leave issue that trips up the refcounting in core/multicast.c Another like this: Thread 1 Thread 2 Thread 3 ib_sa_multicast_join ifconfig ib0 down priv->broadcast = NULL join_complete wait_for_completion mcast->mc is not yet set, so don't clear return from ib_sa_join_multicast and set mcast->mc complete return -EAGAIN (making mcast->mc invalid) call ib_sa_multicast_leave on invalid mcast->mc, hang forever By holding the mutex around ib_sa_multicast_join and taking the mutex early in the callback, we force mcast->mc to be valid at the time we run the callback. This allows us to clear mcast->mc if there is an error and the join is going to fail. We do this before we complete the mcast. In this way, mcast_dev_flush always sees consistent state in regards to mcast->mc membership at the time that the wait_for_completion() returns. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: Make the carrier_on_task race awareDoug Ledford
We blindly assume that we can just take the rtnl lock and that will prevent races with downing this interface. Unfortunately, that's not the case. In ipoib_mcast_stop_thread() we will call flush_workqueue() in an attempt to clear out all remaining instances of ipoib_join_task. But, since this task is put on the same workqueue as the join task, the flush_workqueue waits on this thread too. But this thread is deadlocked on the rtnl lock. The better thing here is to use trylock and loop on that until we either get the lock or we see that FLAG_ADMIN_UP has been cleared, in which case we don't need to do anything anyway and we just return. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IPoIB: Consolidate rtnl_lock tasks in workqueueDoug Ledford
Setting the MTU can safely be moved to the carrier_on_task, which keeps us from needing to take the rtnl lock in the join_finish section. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-12iser-target: Fix wc->wr_id cast warningNicholas Bellinger
CC [M] drivers/infiniband/ulp/isert/ib_isert.o drivers/infiniband/ulp/isert/ib_isert.c: In function ‘isert_cq_comp_err’: drivers/infiniband/ulp/isert/ib_isert.c:1979:42: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Remove code duplicationSagi Grimberg
- Fall-through in switch case instead in do_control_comp. - Move rkey invalidation to a function. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Adjust log levels and prettify some printsSagi Grimberg
debug_level 1 (warn): Include warning messages. debug_level 2 (info): Include relevant info for control plane. debug_level 3 (debug): Include relevant info in the IO path. Also, added/removed some logging messages. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Use debug_level parameter to control logging levelSagi Grimberg
Personal preference, easier control of the log level with a single modparam which can be changed dynamically. Allows better saparation of control and IO plains. Replaced throughout ib_isert.c: s/pr_debug/isert_dbg/g s/pr_info/isert_info/g s/pr_warn/isert_warn/g s/pr_err/isert_err/g Plus nit checkpatch warning change. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Fix logout sequenceSagi Grimberg
We don't want to wait for conn_logout_comp from isert_comp_wq context as this blocks further completions from being processed. Instead we wait for it conditionally (if logout response was actually posted) in wait_conn. This wait should normally happen immediately as it occurs after we consumed all the completions (including flush errors) and conn_logout_comp should have been completed. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Don't wait for session commands from completion contextSagi Grimberg
Might result in a deadlock where completion context waits for session commands release where the later might need a final completion for it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Reduce CQ lock contention by batch pollingSagi Grimberg
In order to reduce the contention on CQ locking (present in some LLDDs) we poll in batches of 16 work completion items. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Introduce isert_poll_budgetSagi Grimberg
In case the CQ is packed with completions, we can't just hog the CPU forever. Poll until a sufficient budget (currently hard-coded to 64k completions) and if budget is exhausted, bailout and give a chance to other threads. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Remove an atomic operation from the IO pathSagi Grimberg
In order to know that we consumed all the connection completions we maintain atomic post_send_buf_count for each IO post send. But we can know that if we post a "beacon" (zero length RECV work request) after we move the QP into error state and the target does not serve any new IO. When we consume it, we know we finished all the connection completion and we can go ahead and destroy stuff. In error completion handler we now just need to check for ISERT_BEACON_WRID to arrive and then wait for session commands to cleanup and complete conn_wait_comp_err. We reserve another CQ and QP entries to fit the zero length post recv. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Remove redundant call to isert_conn_terminateSagi Grimberg
We are calling session reinstatement, wait_conn will start connection termination. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Use single CQ for TX and RXSagi Grimberg
Using TX and RX CQs attached to the same vector might create a throttling effect coming from the serial processing of a work-queue. Use one CQ instead, it will do better in interrupt processing and it provides a simpler code. Also, We get rid of redundant isert_rx_wq. Next we can remove the atomic post_send_buf_count from the IO path. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Centralize completion elements to a contextSagi Grimberg
A pre-step before going to a single CQ. Also this makes the code a little more simple to read. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Cast wr_id with uintptr_t instead of unsinged longSagi Grimberg
Nit, uintptr_t is designed for pointer casting, use it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Unite error completion handler for RX and TXSagi Grimberg
As a pre-step to a single CQ, we unite the error completion handlers to a single handler. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Remove interrupt coalescingSagi Grimberg
It is disabled at the moment, we will get that back in once the target is more stable. This reverts commit 95b60f0 "Add support for completion interrupt coalescing" Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Work-around live target stack shutdown resource cleanupSagi Grimberg
Currently we have no way to tell that the target stack is in shutdown sequence. In case we have open connections, the initiator immediately attempts to reconnect in a DDOS attack style, so we may end up terminating the iser enabled network portal while it's np_accept_list still have pending connections. The workaround is simply release all the connections in the list. A proper fix will be to start shutdown sequence by shutting the network portal to avoid initiator immediate reconnect attempts. But the temporary work around seems to work at this point, so I think we can do this for now... Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iscsi,iser-target: Expose supported protection ops according to t10_piSagi Grimberg
iSER will report supported protection operations based on the tpg attribute t10_pi settings and HCA PI offload capabilities. If the HCA does not support PI offload or tpg attribute t10_pi is not set, we fall to SW PI mode. In order to do that, we move iscsit_get_sup_prot_ops after connection tpg assignment. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.14+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Fix NULL dereference in SW mode DIFSagi Grimberg
Fallback to software mode DIF if HCA does not support PI (without crashing obviously). It is still possible to run with backend protection and an unprotected frontend, so looking at the command prot_op is not enough. Check device PI capability on a per-IO basis (isert_prot_cmd inline static) to determine if we need to handle protection information. Trace: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffffa037f8b1>] isert_reg_sig_mr+0x351/0x3b0 [ib_isert] Call Trace: [<ffffffff812b003a>] ? swiotlb_map_sg_attrs+0x7a/0x130 [<ffffffffa038184d>] isert_reg_rdma+0x2fd/0x370 [ib_isert] [<ffffffff8108f2ec>] ? idle_balance+0x6c/0x2c0 [<ffffffffa0382b68>] isert_put_datain+0x68/0x210 [ib_isert] [<ffffffffa02acf5b>] lio_queue_data_in+0x2b/0x30 [iscsi_target_mod] [<ffffffffa02306eb>] target_complete_ok_work+0x21b/0x310 [target_core_mod] [<ffffffff8106ece2>] process_one_work+0x182/0x3b0 [<ffffffff8106fda0>] worker_thread+0x120/0x3c0 [<ffffffff8106fc80>] ? maybe_create_worker+0x190/0x190 [<ffffffff8107594e>] kthread+0xce/0xf0 [<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8159a22c>] ret_from_fork+0x7c/0xb0 [<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70 Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.14+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Allocate PI contexts dynamicallySagi Grimberg
This patch converts to allocate PI contexts dynamically in order avoid a potentially bogus np->tpg_np and associated NULL pointer dereference in isert_connect_request() during iser-target endpoint shutdown with multiple network portals. Also, there is really no need to allocate these at connection establishment since it is not guaranteed that all the IOs on that connection will be to a PI formatted device. We can do it in a lazy fashion so the initial burst will have a transient slow down, but very fast all IOs will allocate a PI context. Squashed: iser-target: Centralize PI context handling code Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.14+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Fix implicit termination of connectionsSagi Grimberg
In situations such as bond failover, The new session establishment implicitly invokes the termination of the old connection. So, we don't want to wait for the old connection wait_conn to completely terminate before we accept the new connection and post a login response. The solution is to deffer the comp_wait completion and the conn_put to a work so wait_conn will effectively be non-blocking (flush errors are assumed to come very fast). We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE to spread the concurrency of release works. Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-12-12iser-target: Handle ADDR_CHANGE event for listener cm_idSagi Grimberg
The np listener cm_id will also get ADDR_CHANGE event upcall (in case it is bound to a specific IP). Handle it correctly by creating a new cm_id and implicitly destroy the old one. Since this is the second event a listener np cm_id may encounter, we move the np cm_id event handling to a routine. Squashed: iser-target: Move cma_id setup to a function Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>