summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-03-27scsi: qedf: Cleanup rrq_work after QEDF_CMD_OUTSTANDING is clearedShyam Sundar
Here is the relevant logs for the problem we are solving: qedf_flush_active_ios:1707]:3: Flush active i/o's num=0x17 fcport=0xffff948168fbcc80 port_id=0x550200 scsi_id=0. qedf_flush_active_ios:1708]:3: Locking flush mutex. qedf_flush_active_ios:1758]:3: Not outstanding, xid=0xaaf, cmd_type=3 refcount=1. qedf_flush_active_ios:1896]:3: Flushed 0x16 I/Os, active=0x1. qedf_flush_active_ios:1901]:3: Flushed 0x16 I/Os, active=0x1 cnt=60. qedf_send_rrq:295]:3: Sending RRQ orig io = ffffb48b8f7d7158, orig_xid = 0xaaf qedf_initiate_els:37]:3: Sending ELS qedf_initiate_els:68]:3: initiate_els els_req = 0xffffb48b8f6d3098 cb_arg = ffff948fd5e4de80 xid = 4c6 qedf_init_mp_req:2172]:3: Entered. qedf_init_mp_task:727]:3: Initializing MP task for cmd_type=4 qedf_initiate_els:134]:3: Ringing doorbell for ELS req qedf_flush_active_ios:1901]:3: Flushed 0x16 I/Os, active=0x2 cnt=20. qedf_cmd_timeout:96]:3: ELS timeout, xid=0x4c6. qedf_rrq_compl:186]:3: Entered. qedf_rrq_compl:204]:3: rrq_compl: orig io = ffffb48b8f7d7158, orig xid = 0xaaf, rrq_xid = 0x4c6, refcount=1 qedf_flush_active_ios:1935]:3: Unlocking flush mutex. qedf_upload_connection:1579]:3: Uploading connection port_id=550200. We found an ABTS command for which CMD_OUTSTANDING was cleared (line 3). For this command, delayed send_rrq was queued, but would take 10 secs to execute. Adding capability to detect that (based on io_req->state that is being introduced), and attempt to cancel rrq_work. If we succeed, we drop the reference and free the io_req. If we cannot, then the els will get sent out and we will wait for 10 secs for it to complete. Signed-off-by: Shyam Sundar <ssundar@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Check for tm_flags instead of cmd_type during cleanupSaurav Kashyap
cmd_type is over written to QEDF_CLEANUP during cleanup, so check for tm_flags. Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Add a flag to help debugging io_req which could not be cleanedShyam Sundar
- The flag will help in to figure out if io_req is cleaned or not. Signed-off-by: Shyam Sundar <ssundar@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Don't send ABTS for under run scenarioSaurav Kashyap
- Command is already completed with underrun so no need to send ABTS. Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Don't queue anything if upload is in progressShyam Sundar
- I/Os, aborts and tmf should not be queued if flush is in progress. Signed-off-by: Shyam Sundar <ssundar@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Check both the FCF and fabric ID before servicing clear virtual linkChad Dupuis
- Check proper values before servicing CVL. Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: fc_rport_priv reference counting fixesHannes Reinecke
The fc_rport_priv structure is reference counted, so we need to ensure that the reference is increased before accessing the structure. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Add missing return in qedf_scsi_done()Chad Dupuis
On completions where we do not have a bad scsi_cmnd pointer we should return before the the label lest we do a double kref_put. Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Wait for upload and link down processing during soft ctx resetChad Dupuis
- Wait for all the connections to get uploaded. Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Add additional checks for io_req->sc_cmd validityChad Dupuis
- Check the validity of various pointers before processing. Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: fixup bit operationsHannes Reinecke
test_bit() is atomic, test_bit() || test_bit() is not. So protect consecutive bit tests with a lock to avoid races. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: fixup locking in qedf_restart_rport()Hannes Reinecke
fc_rport_create() needs to be called with disc_mutex held. And we should re-assign the 'rdata' pointer in case it got changed. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: missing kref_put in qedf_xmit()Hannes Reinecke
qedf_xmit() calls fc_rport_lookup(), but discards the returned rdata structure almost immediately without decreasing the refcount. This leads to a refcount leak and the rdata never to be freed. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Check for link state before processing LL2 packets and send ↵Saurav Kashyap
fipvlan retries - Check if link is UP before sending and processing any packets on wire. Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Add missing fc_disc_init call after allocating lportChad Dupuis
When receiving an unsolicited frame we could crash on a list traversal in fc_rport_lookup while searching the rport which is associated with our lport. Initialize the lport's discovery node after allocating the lport in __qedf_probe(). Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Correct the memory barriers in qedf_ring_doorbellAndrew Vasquez
- Correct memory barriers to make sure all cmnds are flushed. Signed-off-by: Andrew Vasquez <andrewv@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Use a separate completion for cleanup commandsChad Dupuis
- If a TMF and cleanup are issued at the same time they could cause a call trace if issued against the same xid as the io_req->tm_done completion is used for both. - Set and clear cleanup bit in cleanup routine. Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Modify abort and tmf handler to handle edge condition and flushSaurav Kashyap
An I/O can be in any state when flush is called, it can be in abort, waiting for abort, RRQ send and waiting or TMF send. - HZ can be different on different architecture, correctly set abort timeout value. - Flush can complete the I/Os prematurely, handle refcount for aborted I/Os and for which RRQ is pending. - Differentiate LUN/TARGET reset, as cleanup needs to be send to firmware accordingly. - Add flush mutex to sync cleanup call from abort and flush routine. - Clear abort/outstanding bit on timeout. Signed-off-by: Shyam Sundar <shyam.sundar@marvell.com> Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Modify flush routine to handle all I/Os and TMFShyam Sundar
The purpose of flush routine is to cleanup I/Os to the firmware and complete them to scsi middle layer. This routine is invoked before connection is uploaded because of rport going away. - Don't process any I/Os, aborts, TMFs coming when flush in progress. - Add flags to handle cleanup and release of I/Os because flush can prematurely complete I/Os. - Original command can get completed to driver when cleanup for same is posted to firmware, handle this condition. - Modify flush to handle I/Os in all the states like abort, TMF, RRQ and timeouts. Signed-off-by: Shyam Sundar <ssundar@marvell.com> Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Simplify s/g list mappingChad Dupuis
When mapping the pages from a scatter/gather list from the SCSI layer we only need to follow these rules: - Max SGEs for each I/O request is 256 - No size limit on each SGE - No need to split OS provided SGEs to 4K before sending to firmware. - Slow SGE is applicable only when: - There are > 8 SGEs and any middle SGE is less than a page size (4K) Make necessary changes so that driver follows these rules. Applicable only for Write requests (not for Read requests). No need to check SGE address alignment requirements (first, middle or last) before declaring slow SGE. Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Add missing return in qedf_post_io_req() in the fcport offload checkChad Dupuis
Fixes the following crash as the return was missing from the check if an fcport is offloaded. If we hit this code we continue to try to post an invalid task which can lead to the crash: [30259.616411] [0000:61:00.3]:[qedf_post_io_req:989]:3: Session not offloaded yet. [30259.616413] [0000:61:00.3]:[qedf_upload_connection:1340]:3: Uploading connection port_id=490020. [30259.623769] BUG: unable to handle kernel NULL pointer dereference at 0000000000000198 [30259.631645] IP: [<ffffffffc035b1ed>] qedf_init_task.isra.16+0x3d/0x450 [qedf] [30259.638816] PGD 0 [30259.640841] Oops: 0000 [#1] SMP [30259.644098] Modules linked in: fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables devlink ip6table_filter ip6_tables iptable_filter vfat fat ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_umad dm_service_time skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel rpcrdma sunrpc rdma_ucm ib_uverbs lrw gf128mul ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi qedr(OE) glue_helper ablk_helper cryptd ib_core dm_round_robin joydev pcspkr ipmi_ssif ses enclosure ipmi_si ipmi_devintf ipmi_msghandler mei_me [30259.715529] mei sg hpilo hpwdt shpchp wmi lpc_ich acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic uas usb_storage mgag200 qedf(OE) i2c_algo_bit libfcoe drm_kms_helper libfc syscopyarea sysfillrect scsi_transport_fc qede(OE) sysimgblt fb_sys_fops ptp ttm pps_core drm qed(OE) smartpqi crct10dif_pclmul crct10dif_common crc32c_intel i2c_core scsi_transport_sas scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [30259.754237] CPU: 9 PID: 977 Comm: kdmwork-253:7 Kdump: loaded Tainted: G W OE ------------ 3.10.0-862.el7.x86_64 #1 [30259.765664] Hardware name: HPE Synergy 480 Gen10/Synergy 480 Gen10 Compute Module, BIOS I42 04/04/2018 [30259.775000] task: ffff8c801efd0000 ti: ffff8c801efd8000 task.ti: ffff8c801efd8000 [30259.782505] RIP: 0010:[<ffffffffc035b1ed>] [<ffffffffc035b1ed>] qedf_init_task.isra.16+0x3d/0x450 [qedf] [30259.792116] RSP: 0018:ffff8c801efdbbb0 EFLAGS: 00010046 [30259.797444] RAX: 0000000000000000 RBX: ffffa7f1450948d8 RCX: ffff8c7fe5bc40c8 [30259.804600] RDX: ffff8c800715b300 RSI: ffffa7f1450948d8 RDI: ffff8c80169c2480 [30259.811755] RBP: ffff8c801efdbc30 R08: 00000000000000ae R09: ffff8c800a314540 [30259.818911] R10: ffff8c7fe5bc40c8 R11: ffff8c801efdb8ae R12: 0000000000000000 [30259.826068] R13: ffff8c800715b300 R14: ffff8c80169c2480 R15: ffff8c8005da28e0 [30259.833223] FS: 0000000000000000(0000) GS:ffff8c803f840000(0000) knlGS:0000000000000000 [30259.841338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [30259.847100] CR2: 0000000000000198 CR3: 000000081242e000 CR4: 00000000007607e0 [30259.854256] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [30259.861412] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [30259.868568] PKRU: 00000000 [30259.871278] Call Trace: [30259.873737] [<ffffffffc035c948>] qedf_post_io_req+0x148/0x680 [qedf] [30259.880201] [<ffffffffc035d070>] qedf_queuecommand+0x1f0/0x240 [qedf] [30259.886749] [<ffffffffa329b050>] scsi_dispatch_cmd+0xb0/0x240 [30259.892600] [<ffffffffa32a45bc>] scsi_request_fn+0x4cc/0x680 [30259.898364] [<ffffffffa3118ad9>] __blk_run_queue+0x39/0x50 [30259.903954] [<ffffffffa3114393>] __elv_add_request+0xd3/0x260 [30259.909805] [<ffffffffa311baf0>] blk_insert_cloned_request+0xf0/0x1b0 [30259.916358] [<ffffffffc010b622>] map_request+0x142/0x220 [dm_mod] [30259.922560] [<ffffffffc010b716>] map_tio_request+0x16/0x40 [dm_mod] [30259.928932] [<ffffffffa2ebb1f5>] kthread_worker_fn+0x85/0x180 [30259.934782] [<ffffffffa2ebb170>] ? kthread_stop+0xf0/0xf0 [30259.940284] [<ffffffffa2ebae31>] kthread+0xd1/0xe0 [30259.945176] [<ffffffffa2ebad60>] ? insert_kthread_work+0x40/0x40 [30259.951290] [<ffffffffa351f61d>] ret_from_fork_nospec_begin+0x7/0x21 [30259.957750] [<ffffffffa2ebad60>] ? insert_kthread_work+0x40/0x40 [30259.963860] Code: fe 41 55 49 89 d5 41 54 53 48 89 f3 48 83 ec 58 4c 8b 67 28 4c 8b 4e 18 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 4c 8b 7e 58 <49> 8b 84 24 98 01 00 00 48 8b 00 f6 80 31 01 00 00 10 0f 85 0b [30259.983372] RIP [<ffffffffc035b1ed>] qedf_init_task.isra.16+0x3d/0x450 [qedf] [30259.990630] RSP <ffff8c801efdbbb0> [30259.994127] CR2: 0000000000000198 Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Correct xid range overlap between offloaded requests and libfc ↵Chad Dupuis
requests There is currently an overlap where exchange IDs between what is used for offloaded commands and by libfc for ELS commands. Correct this so that exchange ID range is: Offloaded requests: 0 to 0xfff libfc requests: 0x1000 to 0xfffe Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qedf: Do not retry ELS request if qedf_alloc_cmd failsChad Dupuis
If we cannot allocate an ELS middlepath request, simply fail instead of trying to delay and then reallocate. This delay logic is causing soft lockup messages: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:1:7639] Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_service_time vfat fat rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support qedr(OE) ib_core joydev ipmi_ssif pcspkr hpilo hpwdt sg ipmi_si ipmi_devintf ipmi_msghandler ioatdma shpchp lpc_ich wmi dca acpi_power_meter dm_multipath ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic qedf(OE) libfcoe mgag200 libfc i2c_algo_bit drm_kms_helper scsi_transport_fc qede(OE) syscopyarea sysfillrect sysimgblt fb_sys_fops ttm qed(OE) drm crct10dif_pclmul e1000e crct10dif_common crc32c_intel scsi_tgt hpsa i2c_core ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod CPU: 2 PID: 7639 Comm: kworker/2:1 Kdump: loaded Tainted: G OEL ------------ 3.10.0-861.el7.x86_64 #1 Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 07/21/2016 Workqueue: qedf_2_dpc qedf_handle_rrq [qedf] task: ffff959edd628fd0 ti: ffff959ed6f08000 task.ti: ffff959ed6f08000 RIP: 0010:[<ffffffff8355913a>] [<ffffffff8355913a>] delay_tsc+0x3a/0x60 RSP: 0018:ffff959ed6f0bd30 EFLAGS: 00000246 RAX: 000000008ef5f791 RBX: 5f646d635f666465 RCX: 0000025b8ededa2f RDX: 000000000000025b RSI: 0000000000000002 RDI: 0000000000217d1e RBP: ffff959ed6f0bd30 R08: ffffffffc079aae8 R09: 0000000000000200 R10: ffffffffc07952c6 R11: 0000000000000000 R12: 6c6c615f66646571 R13: ffff959ed6f0bcc8 R14: ffff959ed6f0bd08 R15: ffff959e00000028 FS: 0000000000000000(0000) GS:ffff959eff480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4117fa1eb0 CR3: 0000002039e66000 CR4: 00000000003607e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff8355907d>] __const_udelay+0x2d/0x30 [<ffffffffc079444a>] qedf_initiate_els+0x13a/0x450 [qedf] [<ffffffffc0794210>] ? qedf_srr_compl+0x2a0/0x2a0 [qedf] [<ffffffffc0795337>] qedf_send_rrq+0x127/0x230 [qedf] [<ffffffffc078ed55>] qedf_handle_rrq+0x15/0x20 [qedf] [<ffffffff832b2dff>] process_one_work+0x17f/0x440 [<ffffffff832b3ac6>] worker_thread+0x126/0x3c0 [<ffffffff832b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0 [<ffffffff832bae31>] kthread+0xd1/0xe0 [<ffffffff832bad60>] ? insert_kthread_work+0x40/0x40 [<ffffffff8391f637>] ret_from_fork_nospec_begin+0x21/0x21 [<ffffffff832bad60>] ? insert_kthread_work+0x40/0x40 Signed-off-by: Chad Dupuis <cdupuis@marvell.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: MAINTAINERS: Add maintainer for MediaTek UFS driverStanley Chu
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: ibmvfc: Clean up transport eventsTyrel Datwyler
No change to functionality. Simply make transport event messages a little clearer, and rework CRQ format enums such that we have separate enums for INIT messages and XPORT events. [mkp: typo] Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: ibmvfc: Byte swap status and error codes when loggingTyrel Datwyler
Status and error codes are returned in big endian from the VIOS. The values are translated into a human readable format when logged, but the values are also logged. This patch byte swaps those values so that they are consistent between BE and LE platforms. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: ibmvfc: Add failed PRLI to cmd_status lookup arrayTyrel Datwyler
The VIOS uses the SCSI_ERROR class to report PRLI failures. These errors are indicated with the combination of a IBMVFC_FC_SCSI_ERROR return status and 0x8000 error code. Add these codes to cmd_status[] with appropriate human readable error message. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: ibmvfc: Remove "failed" from logged errorsTyrel Datwyler
The text of messages logged with ibmvfc_log_error() always contain the term "failed". In the case of cancelled commands during EH they are reported back by the VIOS using error codes. This can be confusing to somebody looking at these log messages as to whether a command was successfully cancelled. The following real log message for example it is unclear if the transaction was actaully cancelled. <6>sd 0:0:1:1: Cancelling outstanding commands. <3>sd 0:0:1:1: [sde] Command (28) failed: transaction cancelled (2:6) flags: 0 fcp_rsp: 0, resid=0, scsi_status: 0 Remove prefixing of "failed" to all error logged messages. The ibmvfc_log_error() function translates the returned error/status codes to a human readable message already. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: qla2xxx: Simplify conditional check againNathan Chancellor
Clang warns when it sees a logical not on the left side of a conditional statement because it thinks the logical not should be applied to the whole statement, not just the left side: drivers/scsi/qla2xxx/qla_nx.c:3703:7: warning: logical not is only applied to the left hand side of this comparison [-Wlogical-not-parentheses] This particular instance was already fixed by commit 0bfe7d3cae58 ("scsi: qla2xxx: Simplify conditional check") upstream but it was reintroduced by commit 3695310e37b4 ("scsi: qla2xxx: Update flash read/write routine") in the 5.2/scsi-queue. Fixes: 3695310e37b4 ("scsi: qla2xxx: Update flash read/write routine") Link: https://github.com/ClangBuiltLinux/linux/issues/80 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCNSteffen Maier
If an incoming ELS of type RSCN contains more than one element, zfcp suboptimally causes repeated erp trigger NOP trace records for each previously failed port. These could be ports that went away. It loops over each RSCN element, and for each of those in an inner loop over all zfcp_ports. The trigger to recover failed ports should be just the reception of some RSCN, no matter how many elements it has. So we can loop over failed ports separately, and only then loop over each RSCN element to handle the non-failed ports. The call chain was: zfcp_fc_incoming_rscn for (i = 1; i < no_entries; i++) _zfcp_fc_incoming_rscn list_for_each_entry(port, &adapter->port_list, list) if (masked port->d_id match) zfcp_fc_test_link if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <=== In order the reduce the "flooding" of the REC trace area in such cases, we factor out handling the failed ports to be outside of the entries loop: zfcp_fc_incoming_rscn if (no_entries > 1) <=== list_for_each_entry(port, &adapter->port_list, list) <=== if (!port->d_id) zfcp_erp_port_reopen "fcrscn1" <=== for (i = 1; i < no_entries; i++) _zfcp_fc_incoming_rscn list_for_each_entry(port, &adapter->port_list, list) if (masked port->d_id match) zfcp_fc_test_link Abbreviated example trace records before this code change: Tag : fcrscn1 WWPN : 0x500507630310d327 ERP want : 0x02 ERP need : 0x02 Tag : fcrscn1 WWPN : 0x500507630310d327 ERP want : 0x02 ERP need : 0x00 NOP => superfluous trace record The last trace entry repeats if there are more than 2 RSCN elements. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devicesSteffen Maier
Suppose more than one non-NPIV FCP device is active on the same channel. Send I/O to storage and have some of the pending I/O run into a SCSI command timeout, e.g. due to bit errors on the fibre. Now the error situation stops. However, we saw FCP requests continue to timeout in the channel. The abort will be successful, but the subsequent TUR fails. Scsi_eh starts. The LUN reset fails. The target reset fails. The host reset only did an FCP device recovery. However, for non-NPIV FCP devices, this does not close and reopen ports on the SAN-side if other non-NPIV FCP device(s) share the same open ports. In order to resolve the continuing FCP request timeouts, we need to explicitly close and reopen ports on the SAN-side. This was missing since the beginning of zfcp in v2.6.0 history commit ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter."). Note: The FSF requests for forced port reopen could run into FSF request timeouts due to other reasons. This would trigger an internal FCP device recovery. Pending forced port reopen recoveries would get dismissed. So some ports might not get fully reopened during this host reset handler. However, subsequent I/O would trigger the above described escalation and eventually all ports would be forced reopen to resolve any continuing FCP request timeouts due to earlier bit errors. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: <stable@vger.kernel.org> #3.0+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_HostSteffen Maier
An already deleted SCSI device can exist on the Scsi_Host and remain there because something still holds a reference. A new SCSI device with the same H:C:T:L and FCP device, target port WWPN, and FCP LUN can be created. When we try to unblock an rport, we still find the deleted SCSI device and return early because the zfcp_scsi_dev of that SCSI device is not ZFCP_STATUS_COMMON_UNBLOCKED. Hence we miss to unblock the rport, even if the new proper SCSI device would be in good state. Therefore, skip deleted SCSI devices when iterating the sdevs of the shost. [cf. __scsi_device_lookup{_by_target}() or scsi_device_get()] The following abbreviated trace sequence can indicate such problem: Area : REC Tag : ersfs_3 LUN : 0x4045400300000000 WWPN : 0x50050763031bd327 LUN status : 0x40000000 not ZFCP_STATUS_COMMON_UNBLOCKED Ready count : n not incremented yet Running count : 0x00000000 ERP want : 0x01 ERP need : 0xc1 ZFCP_ERP_ACTION_NONE Area : REC Tag : ersfs_3 LUN : 0x4045400300000000 WWPN : 0x50050763031bd327 LUN status : 0x41000000 Ready count : n+1 Running count : 0x00000000 ERP want : 0x01 ERP need : 0x01 ... Area : REC Level : 4 only with increased trace level Tag : ertru_l LUN : 0x4045400300000000 WWPN : 0x50050763031bd327 LUN status : 0x40000000 Request ID : 0x0000000000000000 ERP status : 0x01800000 ERP step : 0x1000 ERP action : 0x01 ERP count : 0x00 NOT followed by a trace record with tag "scpaddy" for WWPN 0x50050763031bd327. Signed-off-by: Steffen Maier <maier@linux.ibm.com> Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery") Cc: <stable@vger.kernel.org> #2.6.32+ Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: sd: Quiesce warning if device does not report optimal I/O sizeMartin K. Petersen
Commit a83da8a4509d ("scsi: sd: Optimal I/O size should be a multiple of physical block size") split one conditional into several separate statements in an effort to provide more accurate warning messages when a device reports a nonsensical value. However, this reorganization accidentally dropped the precondition of the reported value being larger than zero. This lead to a warning getting emitted on devices that do not report an optimal I/O size at all. Remain silent if a device does not report an optimal I/O size. Fixes: a83da8a4509d ("scsi: sd: Optimal I/O size should be a multiple of physical block size") Cc: Randy Dunlap <rdunlap@infradead.org> Cc: <stable@vger.kernel.org> Reported-by: Hussam Al-Tayeb <ht990332@gmx.com> Tested-by: Hussam Al-Tayeb <ht990332@gmx.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: sd: Fix a race between closing an sd device and sd I/OBart Van Assche
The scsi_end_request() function calls scsi_cmd_to_driver() indirectly and hence needs the disk->private_data pointer. Avoid that that pointer is cleared before all affected I/O requests have finished. This patch avoids that the following crash occurs: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: scsi_mq_uninit_cmd+0x1c/0x30 scsi_end_request+0x7c/0x1b8 scsi_io_completion+0x464/0x668 scsi_finish_command+0xbc/0x160 scsi_eh_flush_done_q+0x10c/0x170 sas_scsi_recover_host+0x84c/0xa98 [libsas] scsi_error_handler+0x140/0x5b0 kthread+0x100/0x12c ret_from_fork+0x10/0x18 Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Jason Yan <yanaijie@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reported-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: core: Run queue when state is set to running after being blockedzhengbin
Use dd to test a SCSI device: 1. echo "blocked" >/sys/block/sda/device/state 2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10 3. echo "running" >/sys/block/sda/device/state dd should finish this work after step 3, but it hangs. After step2, the call chain is this: blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will transition it to BLK_STS_DEV_RESOURCE which means that driver can guarantee that IO dispatch will be triggered in future when the resource is available. Need to follow the rule if we set the device state to running. [mkp: tweaked commit description and code comment as suggested by Bart] Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: sd: Inline sd_probe_part2()Bart Van Assche
Make sd_probe() easier to read by inlining sd_probe_part2(). This patch does not change any functionality. Cc: Lee Duncan <lduncan@suse.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: sd: Rely on the driver core for asynchronous probingBart Van Assche
As explained during the 2018 LSF/MM session about increasing SCSI disk probing concurrency, the problems with the current probing approach are as follows: - The driver core is unaware of asynchronous SCSI LUN probing. wait_for_device_probe() waits for all asynchronous probes except asynchronous SCSI disk probes. - There is unnecessary serialization between sd_probe() and sd_remove(). This can lead to a deadlock. Hence this patch that modifies the sd driver such that it uses the driver core framework for asynchronous probing. The async domains and get_device()/put_device() pairs that became superfluous due to this change are removed. This patch does not affect the time needed for loading the scsi_debug kernel module with parameters delay=0 and max_luns=256. This patch depends on commit ef0ff68351be ("driver core: Probe devices asynchronously instead of the driver") that went upstream in kernel version v5.1-rc1. Cc: Lee Duncan <lduncan@suse.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: mpt3sas: fix indentation issueColin Ian King
There are a couple of statements that are incorrectly indented, fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: libcxgbi: remove uninitialized variable lenColin Ian King
The variable len is not being inintialized and the uninitialized value is being returned. However, this return path is never reached because the default case in the switch statement returns -ENOSYS. Clean up the code by replacing the return -ENOSYS with a break for the default case and returning -ENOSYS at the end of the function. This allows len to be removed. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-27scsi: target: alua: fix the tg_pt_gps_counttangwenji
Reducing the count should be alua_tg_pt_gps_count instead of alua_tg_pt_gps_counter when free alua group. Signed-off-by: tangwenji <tang.wenji@zte.com.cn> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: qla4xxx: fix a potential NULL pointer dereferenceKangjie Lu
In case iscsi_lookup_endpoint fails, the fix returns -EINVAL to avoid NULL pointer dereference. Signed-off-by: Kangjie Lu <kjlu@umn.edu> Acked-by: Manish Rangankar <mrangankar@marvell.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: gdth: Only call dma_free_coherent when buf is not NULL in ioc_generalNathan Chancellor
When building with -Wsometimes-uninitialized, Clang warns: drivers/scsi/gdth.c:3662:6: warning: variable 'paddr' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] Don't attempt to call dma_free_coherent when buf is NULL (meaning that we never called dma_alloc_coherent and initialized paddr), which avoids this warning. Link: https://github.com/ClangBuiltLinux/linux/issues/402 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: pm8001: remove set but not used variables 'param, sas_ha'YueHaibing
Fixes gcc '-Wunused-but-set-variable' warning: drivers/scsi/pm8001/pm8001_hwi.c: In function 'mpi_smp_completion': drivers/scsi/pm8001/pm8001_hwi.c:2901:6: warning: variable 'param' set but not used [-Wunused-but-set-variable] drivers/scsi/pm8001/pm8001_hwi.c: In function 'pm8001_bytes_dmaed': drivers/scsi/pm8001/pm8001_hwi.c:3247:24: warning: variable 'sas_ha' set but not used [-Wunused-but-set-variable] They're never used since introduction, so can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: aacraid: Insure we don't access PCIe space during AER/EEHDave Carroll
There are a few windows during AER/EEH when we can access PCIe I/O mapped registers. This will harden the access to insure we do not allow PCIe access during errors Signed-off-by: Dave Carroll <david.carroll@microsemi.com> Reviewed-by: Sagar Biradar <sagar.biradar@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: qla4xxx: avoid freeing unallocated dma memoryArnd Bergmann
Clang -Wuninitialized notices that on is_qla40XX we never allocate any DMA memory in get_fw_boot_info() but attempt to free it anyway: drivers/scsi/qla4xxx/ql4_os.c:5915:7: error: variable 'buf_dma' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (!(val & 0x07)) { ^~~~~~~~~~~~~ drivers/scsi/qla4xxx/ql4_os.c:5985:47: note: uninitialized use occurs here dma_free_coherent(&ha->pdev->dev, size, buf, buf_dma); ^~~~~~~ drivers/scsi/qla4xxx/ql4_os.c:5915:3: note: remove the 'if' if its condition is always true if (!(val & 0x07)) { ^~~~~~~~~~~~~~~~~~~ drivers/scsi/qla4xxx/ql4_os.c:5885:20: note: initialize the variable 'buf_dma' to silence this warning dma_addr_t buf_dma; ^ = 0 Skip the call to dma_free_coherent() here. Fixes: 2a991c215978 ("[SCSI] qla4xxx: Boot from SAN support for open-iscsi") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: lpfc: avoid uninitialized variable warningArnd Bergmann
clang -Wuninitialized incorrectly sees a variable being used without initialization: drivers/scsi/lpfc/lpfc_nvme.c:2102:37: error: variable 'localport' is uninitialized when used here [-Werror,-Wuninitialized] lport = (struct lpfc_nvme_lport *)localport->private; ^~~~~~~~~ drivers/scsi/lpfc/lpfc_nvme.c:2059:38: note: initialize the variable 'localport' to silence this warning struct nvme_fc_local_port *localport; ^ = NULL 1 error generated. This is clearly in dead code, as the condition leading up to it is always false when CONFIG_NVME_FC is disabled, and the variable is always initialized when nvme_fc_register_localport() got called successfully. Change the preprocessor conditional to the equivalent C construct, which makes the code more readable and gets rid of the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: lpfc: change snprintf to scnprintf for possible overflowSilvio Cesare
Change snprintf to scnprintf. There are generally two cases where using snprintf causes problems. 1) Uses of size += snprintf(buf, SIZE - size, fmt, ...) In this case, if snprintf would have written more characters than what the buffer size (SIZE) is, then size will end up larger than SIZE. In later uses of snprintf, SIZE - size will result in a negative number, leading to problems. Note that size might already be too large by using size = snprintf before the code reaches a case of size += snprintf. 2) If size is ultimately used as a length parameter for a copy back to user space, then it will potentially allow for a buffer overflow and information disclosure when size is greater than SIZE. When the size is used to index the buffer directly, we can have memory corruption. This also means when size = snprintf... is used, it may also cause problems since size may become large. Copying to userspace is mitigated by the HARDENED_USERCOPY kernel configuration. The solution to these issues is to use scnprintf which returns the number of characters actually written to the buffer, so the size variable will never exceed SIZE. Signed-off-by: Silvio Cesare <silvio.cesare@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: ufs-mediatek: Add missing MODULE_* informationAnders Roxell
When building the ufs-mediatek module the following warning shows up: WARNING: modpost: missing MODULE_LICENSE() in drivers/scsi/ufs/ufs-mediatek.o Rework to add MODULE_LICENSE,MODULE_AUTHOR and MODULE_DESCRIPTION. Fixes: ddd90623ce26 ("scsi: ufs-mediatek: Add UFS support for Mediatek SoC chips") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: ufs-mediatek: Fix platform_no_drv_owner.cocci warningsYueHaibing
Remove .owner field if calls are used which set it automatically Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-03-25scsi: mpt3sas: Fix kernel panic during expander resetSreekanth Reddy
During expander reset handling, the driver invokes kernel function scsi_host_find_tag() to obtain outstanding requests associated with the scsi host managed by the driver. Driver loops from tag value zero to hba queue depth to obtain the outstanding scmds. But when blk-mq is enabled, the block layer may return stale entry for one or more requests. This may lead to kernel panic if the returned value is inaccessible or the memory pointed by the returned value is reused. Reference of upstream discussion: https://patchwork.kernel.org/patch/10734933/ Instead of calling scsi_host_find_tag() API for each and every smid (smid is tag +1) from one to shost->can_queue, now driver will call this API (to obtain the outstanding scmd) only for those smid's which are outstanding at the driver level. Driver will determine whether this smid is outstanding at driver level by looking into it's corresponding MPI request frame, if its MPI request frame is empty, then it means that this smid is free and does not need to call scsi_host_find_tag() for it. By doing this, driver will invoke scsi_host_find_tag() for only those tags which are outstanding at the driver level. Driver will check whether particular MPI request frame is empty or not by looking into the "DevHandle" field. If this field is zero then it means that this MPI request is empty. For active MPI request DevHandle must be non-zero. Also driver will memset the MPI request frame once the corresponding scmd is processed (i.e. just before calling scmd->done function). Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>