summaryrefslogtreecommitdiff
path: root/drivers/scsi
AgeCommit message (Collapse)Author
2025-05-02scsi: pm80xx: Set phy_attached to zero when device is goneIgor Pylypiv
[ Upstream commit f7b705c238d1483f0a766e2b20010f176e5c0fb7 ] When a fatal error occurs, a phy down event may not be received to set phy->phy_attached to zero. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Salomon Dushimirimana <salomondush@google.com> Link: https://lore.kernel.org/r/20250319230305.3172920-1-salomondush@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02scsi: hisi_sas: Fix I/O errors caused by hardware port ID changesXingui Yang
[ Upstream commit daff37f00c7506ca322ccfce95d342022f06ec58 ] The hw port ID of phy may change when inserting disks in batches, causing the port ID in hisi_sas_port and itct to be inconsistent with the hardware, resulting in I/O errors. The solution is to set the device state to gone to intercept I/O sent to the device, and then execute linkreset to discard and find the disk to re-update its information. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Link: https://lore.kernel.org/r/20250312095135.3048379-3-yangxingui@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02scsi: Improve CDL controlDamien Le Moal
commit 14a3cc755825ef7b34c986aa2786ea815023e9c5 upstream. With ATA devices supporting the CDL feature, using CDL requires that the feature be enabled with a SET FEATURES command. This command is issued as the translated command for the MODE SELECT command issued by scsi_cdl_enable() when the user enables CDL through the device cdl_enable sysfs attribute. However, the implementation of scsi_cdl_enable() always issues a MODE SELECT command for ATA devices when the enable argument is true, even if CDL is already enabled on the device. While this does not cause any issue with using CDL descriptors with read/write commands (the CDL feature will be enabled on the drive), issuing the MODE SELECT command even when the device CDL feature is already enabled will cause a reset of the ATA device CDL statistics log page (as defined in ACS, any CDL enable action must reset the device statistics). Avoid this needless actions (and the implied statistics log page reset) by modifying scsi_cdl_enable() to issue the MODE SELECT command to enable CDL if and only if CDL is not reported as already enabled on the device. And while at it, simplify the initialization of the is_ata boolean variable and move the declaration of the scsi mode data and sense header variables to within the scope of ATA device handling. Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02scsi: mpi3mr: Fix pending I/O counterRanjan Kumar
commit cdd445258db9919e9dde497a6d5c3477ea7faf4d upstream. Commit 199510e33dea ("scsi: mpi3mr: Update consumer index of reply queues after every 100 replies") introduced a regression with the per-reply queue pending I/O counter which was erroneously decremented, leading to the counter going negative. Drop the incorrect atomic decrement for the pending I/O counter. Fixes: 199510e33dea ("scsi: mpi3mr: Update consumer index of reply queues after every 100 replies") Cc: stable@vger.kernel.org Co-developed-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Sathya Prakash <sathya.prakash@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20250411111419.135485-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02scsi: core: Clear flags for scsi_cmnd that did not completeAnastasia Kovaleva
[ Upstream commit 54bebe46871d4e56e05fcf55c1a37e7efa24e0a8 ] Commands that have not been completed with scsi_done() do not clear the SCMD_INITIALIZED flag and therefore will not be properly reinitialized. Thus, the next time the scsi_cmnd structure is used, the command may fail in scsi_cmd_runtime_exceeded() due to the old jiffies_at_alloc value: kernel: sd 16:0:1:84: [sdts] tag#405 timing out command, waited 720s kernel: sd 16:0:1:84: [sdts] tag#405 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=66636s Clear flags for commands that have not been completed by SCSI. Fixes: 4abafdc4360d ("block: remove the initialize_rq_fn blk_mq_ops method") Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com> Link: https://lore.kernel.org/r/20250324084933.15932-2-a.kovaleva@yadro.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02block: remove the write_hint field from struct requestChristoph Hellwig
[ Upstream commit 61952bb73486fff0f5550bccdf4062d9dd0fb163 ] The write_hint is only used for read/write requests, which must have a bio attached to them. Just use the bio field instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241112170050.1612998-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: fc0e982b8a3a ("block: make sure ->nr_integrity_segments is cloned in blk_rq_prep_clone") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25scsi: megaraid_sas: Block zero-length ATA VPD inquiryChandrakanth Patil
commit aad9945623ab4029ae7789609fb6166c97976c62 upstream. A firmware bug was observed where ATA VPD inquiry commands with a zero-length data payload were not handled and failed with a non-standard status code of 0xf0. Avoid sending ATA VPD inquiry commands without data payload by setting the device no_vpd_size flag to 1. In addition, if the firmware returns a status code of 0xf0, set scsi_cmnd->result to CHECK_CONDITION to facilitate proper error handling. Suggested-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250402193735.5098-1-chandrakanth.patil@broadcom.com Tested-by: Ryan Lahfa <ryan@lahfa.xyz> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25scsi: smartpqi: Use is_kdump_kernel() to check for kdumpMartin Wilck
[ Upstream commit a2d5a0072235a69749ceb04c1a26dc75df66a31a ] The smartpqi driver checks the reset_devices variable to determine whether special adjustments need to be made for kdump. This has the effect that after a regular kexec reboot, some driver parameters such as max_transfer_size are much lower than usual. More importantly, kexec reboot tests have revealed memory corruption caused by the driver log being written to system memory after a kexec. Fix this by testing is_kdump_kernel() rather than reset_devices where appropriate. Fixes: 058311b72f54 ("scsi: smartpqi: Add fw log to kdump") Signed-off-by: Martin Wilck <mwilck@suse.com> Link: https://lore.kernel.org/r/20250321223319.109250-1-mwilck@suse.com Cc: Randy Wright <rwright@hpe.com> Acked-by: Don Brace <don.brace@microchip.com> Tested-by: Don Brace <don.brace@microchip.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25scsi: replace blk_mq_pci_map_queues with blk_mq_map_hw_queuesDaniel Wagner
[ Upstream commit bd326a5ad6397ccfc67af862606be107c15a43e6 ] Replace all users of blk_mq_pci_map_queues with the more generic blk_mq_map_hw_queues. This in preparation to retire blk_mq_pci_map_queues. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-5-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Stable-dep-of: a2d5a0072235 ("scsi: smartpqi: Use is_kdump_kernel() to check for kdump") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25scsi: iscsi: Fix missing scsi_host_put() in error pathMiaoqian Lin
[ Upstream commit 72eea84a1092b50a10eeecfeba4b28ac9f1312ab ] Add goto to ensure scsi_host_put() is called in all error paths of iscsi_set_host_param() function. This fixes a potential memory leak when strlen() check fails. Fixes: ce51c8170084 ("scsi: iscsi: Add strlen() check in iscsi_if_set{_host}_param()") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20250318094344.91776-1-linmq006@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25scsi: hisi_sas: Enable force phy when SATA disk directly connectedXingui Yang
[ Upstream commit 8aa580cd92843b60d4d6331f3b0a9e8409bb70eb ] when a SATA disk is directly connected the SAS controller determines the disk to which I/Os are delivered based on the port ID in the DQ entry. When many phys are disconnected and reconnect, the port ID of phys were changed and used by other link, resulting in I/O being sent to incorrect disk. Data inconsistency on the SATA disk may occur during I/O retries using the old port ID. So enable force phy, then force the command to be executed in a certain phy, and if the actual phy ID of the port does not match the phy configured in the command, the chip will stop delivering the I/O to disk. Fixes: ce60689e12dd ("scsi: hisi_sas: add v3 code to send ATA frame") Signed-off-by: Xingui Yang <yangxingui@huawei.com> Link: https://lore.kernel.org/r/20250312095135.3048379-2-yangxingui@huawei.com Reviewed-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20scsi: st: Fix array overflow in st_setup()Kai Mäkisara
[ Upstream commit a018d1cf990d0c339fe0e29b762ea5dc10567d67 ] Change the array size to follow parms size instead of a fixed value. Reported-by: Chenyuan Yang <chenyuan0y@gmail.com> Closes: https://lore.kernel.org/linux-scsi/CALGdzuoubbra4xKOJcsyThdk5Y1BrAmZs==wbqjbkAgmKS39Aw@mail.gmail.com/ Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Link: https://lore.kernel.org/r/20250311112516.5548-2-Kai.Makisara@kolumbus.fi Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20scsi: mpi3mr: Synchronous access b/w reset and tm thread for reply queueRanjan Kumar
[ Upstream commit f195fc060c738d303a21fae146dbf85e1595fb4c ] When the task management thread processes reply queues while the reset thread resets them, the task management thread accesses an invalid queue ID (0xFFFF), set by the reset thread, which points to unallocated memory, causing a crash. Add flag 'io_admin_reset_sync' to synchronize access between the reset, I/O, and admin threads. Before a reset, the reset handler sets this flag to block I/O and admin processing threads. If any thread bypasses the initial check, the reset thread waits up to 10 seconds for processing to finish. If the wait exceeds 10 seconds, the controller is marked as unrecoverable. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20250129100850.25430-4-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20scsi: mpi3mr: Avoid reply queue full conditionRanjan Kumar
[ Upstream commit f08b24d82749117ce779cc66689e8594341130d3 ] To avoid reply queue full condition, update the driver to check IOCFacts capabilities for qfull. Update the operational reply queue's Consumer Index after processing 100 replies. If pending I/Os on a reply queue exceeds a threshold (reply_queue_depth - 200), then return I/O back to OS to retry. Also increase default admin reply queue size to 2K. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20250129100850.25430-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22scsi: qla1280: Fix kernel oops when debug level > 2Magnus Lindholm
[ Upstream commit 5233e3235dec3065ccc632729675575dbe3c6b8a ] A null dereference or oops exception will eventually occur when qla1280.c driver is compiled with DEBUG_QLA1280 enabled and ql_debug_level > 2. I think its clear from the code that the intention here is sg_dma_len(s) not length of sg_next(s) when printing the debug info. Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Link: https://lore.kernel.org/r/20250125095033.26188-1-linmag7@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22scsi: core: Use GFP_NOIO to avoid circular locking dependencyRik van Riel
[ Upstream commit 5363ee9d110e139584c2d92a0b640bc210588506 ] Filesystems can write to disk from page reclaim with __GFP_FS set. Marc found a case where scsi_realloc_sdev_budget_map() ends up in page reclaim with GFP_KERNEL, where it could try to take filesystem locks again, leading to a deadlock. WARNING: possible circular locking dependency detected 6.13.0 #1 Not tainted ------------------------------------------------------ kswapd0/70 is trying to acquire lock: ffff8881025d5d78 (&q->q_usage_counter(io)){++++}-{0:0}, at: blk_mq_submit_bio+0x461/0x6e0 but task is already holding lock: ffffffff81ef5f40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x9f/0x760 The full lockdep splat can be found in Marc's report: https://lkml.org/lkml/2025/1/24/1101 Avoid the potential deadlock by doing the allocation with GFP_NOIO, which prevents both filesystem and block layer recursion. Reported-by: Marc Aurèle La France <tsi@tuyoix.net> Signed-off-by: Rik van Riel <riel@surriel.com> Link: https://lore.kernel.org/r/20250129104525.0ae8421e@fangorn Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07scsi: core: Clear driver private data when retrying requestYe Bin
[ Upstream commit dce5c4afd035e8090a26e5d776b1682c0e649683 ] After commit 1bad6c4a57ef ("scsi: zero per-cmd private driver data for each MQ I/O"), the xen-scsifront/virtio_scsi/snic drivers all removed code that explicitly zeroed driver-private command data. In combination with commit 464a00c9e0ad ("scsi: core: Kill DRIVER_SENSE"), after virtio_scsi performs a capacity expansion, the first request will return a unit attention to indicate that the capacity has changed. And then the original command is retried. As driver-private command data was not cleared, the request would return UA again and eventually time out and fail. Zero driver-private command data when a request is retried. Fixes: f7de50da1479 ("scsi: xen-scsifront: Remove code that zeroes driver-private command data") Fixes: c2bb87318baa ("scsi: virtio_scsi: Remove code that zeroes driver-private command data") Fixes: c3006a926468 ("scsi: snic: Remove code that zeroes driver-private command data") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250217021628.2929248-1-yebin@huaweicloud.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-17scsi: core: Do not retry I/Os during depopulationIgor Pylypiv
commit 9ff7c383b8ac0c482a1da7989f703406d78445c6 upstream. Fail I/Os instead of retry to prevent user space processes from being blocked on the I/O completion for several minutes. Retrying I/Os during "depopulation in progress" or "depopulation restore in progress" results in a continuous retry loop until the depopulation completes or until the I/O retry loop is aborted due to a timeout by the scsi_cmd_runtime_exceeced(). Depopulation is slow and can take 24+ hours to complete on 20+ TB HDDs. Most I/Os in the depopulation retry loop end up taking several minutes before returning the failure to user space. Cc: stable@vger.kernel.org # 4.18.x: 2bbeb8d scsi: core: Handle depopulation and restoration in progress Cc: stable@vger.kernel.org # 4.18.x Fixes: e37c7d9a0341 ("scsi: core: sanitize++ in progress") Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Link: https://lore.kernel.org/r/20250131184408.859579-1-ipylypiv@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17scsi: storvsc: Set correct data length for sending SCSI command without payloadLong Li
commit 87c4b5e8a6b65189abd9ea5010ab308941f964a4 upstream. In StorVSC, payload->range.len is used to indicate if this SCSI command carries payload. This data is allocated as part of the private driver data by the upper layer and may get passed to lower driver uninitialized. For example, the SCSI error handling mid layer may send TEST_UNIT_READY or REQUEST_SENSE while reusing the buffer from a failed command. The private data section may have stale data from the previous command. If the SCSI command doesn't carry payload, the driver may use this value as is for communicating with host, resulting in possible corruption. Fix this by always initializing this value. Fixes: be0cf6ca301c ("scsi: storvsc: Set the tablesize based on the information given by the host") Cc: stable@kernel.org Tested-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Long Li <longli@microsoft.com> Link: https://lore.kernel.org/r/1737601642-7759-1-git-send-email-longli@linuxonhyperv.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17scsi: qla2xxx: Move FCE Trace buffer allocation to user controlQuinn Tran
commit 841df27d619ee1f5ca6473e15227b39d6136562d upstream. Currently FCE Tracing is enabled to log additional ELS events. Instead, user will enable or disable this feature through debugfs. Modify existing DFS knob to allow user to enable or disable this feature. echo [1 | 0] > /sys/kernel/debug/qla2xxx/qla2xxx_??/fce cat /sys/kernel/debug/qla2xxx/qla2xxx_??/fce Cc: stable@vger.kernel.org Fixes: df613b96077c ("[SCSI] qla2xxx: Add Fibre Channel Event (FCE) tracing support.") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20241115130313.46826-4-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17scsi: st: Don't set pos_unknown just after device recognitionKai Mäkisara
commit 98b37881b7492ae9048ad48260cc8a6ee9eb39fd upstream. Commit 9604eea5bd3a ("scsi: st: Add third party poweron reset handling") in v6.6 added new code to handle the Power On/Reset Unit Attention (POR UA) sense data. This was in addition to the existing method. When this Unit Attention is received, the driver blocks attempts to read, write and some other operations because the reset may have rewinded the tape. Because of the added code, also the initial POR UA resulted in blocking operations, including those that are used to set the driver options after the device is recognized. Also, reading and writing are refused, whereas they succeeded before this commit. Add code to not set pos_unknown to block operations if the POR UA is received from the first test_ready() call after the st device has been created. This restores the behavior before v6.6. Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Link: https://lore.kernel.org/r/20241216113755.30415-1-Kai.Makisara@kolumbus.fi Fixes: 9604eea5bd3a ("scsi: st: Add third party poweron reset handling") CC: stable@vger.kernel.org Closes: https://lore.kernel.org/linux-scsi/2201CF73-4795-4D3B-9A79-6EE5215CF58D@kolumbus.fi/ Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-08scsi: mpi3mr: Fix possible crash when setting up bsg failsGuixin Liu
[ Upstream commit 295006f6e8c17212d3098811166e29627d19e05c ] If bsg_setup_queue() fails, the bsg_queue is assigned a non-NULL value. Consequently, in mpi3mr_bsg_exit(), the condition "if(!mrioc->bsg_queue)" will not be satisfied, preventing execution from entering bsg_remove_queue(), which could lead to the following crash: BUG: kernel NULL pointer dereference, address: 000000000000041c Call Trace: <TASK> mpi3mr_bsg_exit+0x1f/0x50 [mpi3mr] mpi3mr_remove+0x6f/0x340 [mpi3mr] pci_device_remove+0x3f/0xb0 device_release_driver_internal+0x19d/0x220 unbind_store+0xa4/0xb0 kernfs_fop_write_iter+0x11f/0x200 vfs_write+0x1fc/0x3e0 ksys_write+0x67/0xe0 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x78/0xe2 Fixes: 4268fa751365 ("scsi: mpi3mr: Add bsg device support") Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Link: https://lore.kernel.org/r/20250107022032.24006-1-kanie@linux.alibaba.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08scsi: mpt3sas: Set ioc->manu_pg11.EEDPTagMode directly to 1Paul Menzel
[ Upstream commit ad7c3c0cb8f61d6d5a48b83e62ca4a9fd2f26153 ] Currently, the code does: if (x == 0) { x &= ~0x3; x |= 0x1; } Zeroing bits 0 and 1 of a variable that is 0 is not necessary. So directly set the variable to 1. Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Fixes: f92363d12359 ("[SCSI] mpt3sas: add new driver supporting 12GB SAS") Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/r/20241212221817.78940-2-pmenzel@molgen.mpg.de Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-01scsi: storvsc: Ratelimit warning logs to prevent VM denial of serviceEaswar Hariharan
commit d2138eab8cde61e0e6f62d0713e45202e8457d6d upstream. If there's a persistent error in the hypervisor, the SCSI warning for failed I/O can flood the kernel log and max out CPU utilization, preventing troubleshooting from the VM side. Ratelimit the warning so it doesn't DoS the VM. Closes: https://github.com/microsoft/WSL/issues/9173 Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Link: https://lore.kernel.org/r/20250107-eahariha-ratelimit-storvsc-v1-1-7fc193d1f2b0@linux.microsoft.com Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-01scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS requestXiang Zhang
[ Upstream commit 63ca02221cc5aa0731fe2b0cc28158aaa4b84982 ] The ISCSI_UEVENT_GET_HOST_STATS request is already handled in iscsi_get_host_stats(). This fix ensures that redundant responses are skipped in iscsi_if_rx(). - On success: send reply and stats from iscsi_get_host_stats() within if_recv_msg(). - On error: fall through. Signed-off-by: Xiang Zhang <hawkxiang.cpp@gmail.com> Link: https://lore.kernel.org/r/20250107022432.65390-1-hawkxiang.cpp@gmail.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN ↵Cathy Avery
as an error [ Upstream commit b1aee7f034615b6824d2c70ddb37ef9fc23493b7 ] This partially reverts commit 812fe6420a6e ("scsi: storvsc: Handle additional SRB status values"). HyperV does not support MAINTENANCE_IN resulting in FC passthrough returning the SRB_STATUS_DATA_OVERRUN value. Now that SRB_STATUS_DATA_OVERRUN is treated as an error, multipath ALUA paths go into a faulty state as multipath ALUA submits RTPG commands via MAINTENANCE_IN. [ 3.215560] hv_storvsc 1d69d403-9692-4460-89f9-a8cbcc0f94f3: tag#230 cmd 0xa3 status: scsi 0x0 srb 0x12 hv 0xc0000001 [ 3.215572] scsi 1:0:0:32: alua: rtpg failed, result 458752 Make MAINTENANCE_IN return success to avoid the error path as is currently done with INQUIRY and MODE_SENSE. Suggested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Cathy Avery <cavery@redhat.com> Link: https://lore.kernel.org/r/20241127181324.3318443-1-cavery@redhat.com Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: mpi3mr: Handling of fault code for insufficient powerRanjan Kumar
[ Upstream commit fb6eb98f3965e2ee92cbcb466051d2f2acf552d1 ] Before retrying initialization, check and abort if the fault code indicates insufficient power. Also mark the controller as unrecoverable instead of issuing reset in the watch dog timer if the fault code indicates insufficient power. Signed-off-by: Prayas Patel <prayas.patel@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110194405.10108-5-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: mpi3mr: Start controller indexing from 0Ranjan Kumar
[ Upstream commit 0d32014f1e3e7a7adf1583c45387f26b9bb3a49d ] Instead of displaying the controller index starting from '1' make the driver display the controller index starting from '0'. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110194405.10108-4-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: mpi3mr: Fix corrupt config pages PHY state is switched in sysfsRanjan Kumar
[ Upstream commit 711201a8b8334a397440ac0b859df0054e174bc9 ] The driver, through the SAS transport, exposes a sysfs interface to enable/disable PHYs in a controller/expander setup. When multiple PHYs are disabled and enabled in rapid succession, the persistent and current config pages related to SAS IO unit/SAS Expander pages could get corrupted. Use separate memory for each config request. Signed-off-by: Prayas Patel <prayas.patel@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110194405.10108-3-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: mpi3mr: Synchronize access to ioctl data bufferRanjan Kumar
[ Upstream commit 367ac16e5ff2dcd6b7f00a8f94e6ba98875cb397 ] The driver serializes ioctls through a mutex lock but access to the ioctl data buffer is not guarded by the mutex. This results in multiple user threads being able to write to the driver's ioctl buffer simultaneously. Protect the ioctl buffer with the ioctl mutex. Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110194405.10108-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load ↵Ranjan Kumar
time [ Upstream commit 3f5eb062e8aa335643181c480e6c590c6cedfd22 ] Issue a Diag-Reset when the "Doorbell-In-Use" bit is set during the driver load/initialization. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110173341.11595-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: megaraid_sas: Fix for a potential deadlockTomas Henzl
[ Upstream commit 50740f4dc78b41dec7c8e39772619d5ba841ddd7 ] This fixes a 'possible circular locking dependency detected' warning CPU0 CPU1 ---- ---- lock(&instance->reset_mutex); lock(&shost->scan_mutex); lock(&instance->reset_mutex); lock(&shost->scan_mutex); Fix this by temporarily releasing the reset_mutex. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://lore.kernel.org/r/20240923174833.45345-1-thenzl@redhat.com Acked-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02scsi: qla1280: Fix hw revision numbering for ISP1020/1040Magnus Lindholm
[ Upstream commit c064de86d2a3909222d5996c5047f64c7a8f791b ] Fix the hardware revision numbering for Qlogic ISP1020/1040 boards. HWMASK suggests that the revision number only needs four bits, this is consistent with how NetBSD does things in their ISP driver. Verified on a IPS1040B which is seen as rev 5 not as BIT_4. Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Link: https://lore.kernel.org/r/20241113225636.2276-1-linmag7@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: st: Add MTIOCGET and MTLOAD to ioctls allowed after device resetKai Mäkisara
[ Upstream commit 0b120edb37dc9dd8ca82893d386922eb6b16f860 ] Most drives rewind the tape when the device is reset. Reading and writing are not allowed until something is done to make the tape position match the user's expectation (e.g., rewind the tape). Add MTIOCGET and MTLOAD to operations allowed after reset. MTIOCGET is modified to not touch the tape if pos_unknown is non-zero. The tape location is known after MTLOAD. Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219419#c14 Link: https://lore.kernel.org/r/20241106095723.63254-3-Kai.Makisara@kolumbus.fi Reviewed-by: John Meneghini <jmeneghi@redhat.com> Tested-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: st: Don't modify unknown block number in MTIOCGETKai Mäkisara
[ Upstream commit 5bb2d6179d1a8039236237e1e94cfbda3be1ed9e ] Struct mtget field mt_blkno -1 means it is unknown. Don't add anything to it. Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219419#c14 Link: https://lore.kernel.org/r/20241106095723.63254-2-Kai.Makisara@kolumbus.fi Reviewed-by: John Meneghini <jmeneghi@redhat.com> Tested-by: John Meneghini <jmeneghi@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: lpfc: Prevent NDLP reference count underflow in dev_loss_tmo callbackJustin Tee
[ Upstream commit 4281f44ea8bfedd25938a0031bebba1473ece9ad ] Current dev_loss_tmo handling checks whether there has been a previous call to unregister with SCSI transport. If so, the NDLP kref count is decremented a second time in dev_loss_tmo as the final kref release. However, this can sometimes result in a reference count underflow if there is also a race to unregister with NVMe transport as well. Add a check for NVMe transport registration before decrementing the final kref. If NVMe transport is still registered, then the NVMe transport unregistration is designated as the final kref decrement. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20241031223219.152342-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: lpfc: Check SLI_ACTIVE flag in FDMI cmpl before submitting follow up FDMIJustin Tee
[ Upstream commit 98f8d3588097e321be70f83b844fa67d4828fe5c ] The lpfc_cmpl_ct_disc_fdmi() routine has incorrect logic that treats an FDMI completion with error LOCAL_REJECT/SLI_ABORTED as a success status. Under the erroneous assumption of successful completion, the routine proceeds to issue follow up FDMI commands, which may never complete if the HBA is in an errata state as indicated by the errored completion status. Fix by freeing FDMI cmd resources and early return when the LPFC_SLI_ACTIVE flag is not set and a LOCAL_REJECT/SLI_ABORTED or SLI_DOWN status is received. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20241031223219.152342-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: lpfc: Call lpfc_sli4_queue_unset() in restart and rmmod pathsJustin Tee
[ Upstream commit d35f7672715d1ff3e3ad9bb4ae6ac6cb484200fe ] During initialization, the driver allocates wq->pring in lpfc_wq_create and lpfc_sli4_queue_unset() is the only place where kfree(wq->pring) is called. There is a possible memory leak in lpfc_sli_brdrestart_s4() (restart) and lpfc_pci_remove_one_s4() (rmmod) paths because there are no calls to lpfc_sli4_queue_unset() to kfree() the wq->pring. Fix by inserting a call to lpfc_sli4_queue_unset() in lpfc_sli_brdrestart_s4() and lpfc_sli4_hba_unset() routines. Also, add a check for the SLI_ACTIVE flag before issuing the Q_DESTROY mailbox command. If not set, then the mailbox command will obviously fail. In such cases, skip issuing the mailbox command and only execute the driver resource clean up portions of the lpfc_*q_destroy routines. Signed-off-by: Justin Tee <justin.tee@broadcom.com> Link: https://lore.kernel.org/r/20241031223219.152342-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: hisi_sas: Create all dump files during debugfs initializationYihang Li
[ Upstream commit 9f564f15f88490b484e02442dc4c4b11640ea172 ] For the current debugfs of hisi_sas, after user triggers dump, the driver allocate memory space to save the register information and create debugfs files to display the saved information. In this process, the debugfs files created after each dump. Therefore, when the dump is triggered while the driver is unbind, the following hang occurs: [67840.853907] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0 [67840.862947] Mem abort info: [67840.865855] ESR = 0x0000000096000004 [67840.869713] EC = 0x25: DABT (current EL), IL = 32 bits [67840.875125] SET = 0, FnV = 0 [67840.878291] EA = 0, S1PTW = 0 [67840.881545] FSC = 0x04: level 0 translation fault [67840.886528] Data abort info: [67840.889524] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [67840.895117] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [67840.900284] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [67840.905709] user pgtable: 4k pages, 48-bit VAs, pgdp=0000002803a1f000 [67840.912263] [00000000000000a0] pgd=0000000000000000, p4d=0000000000000000 [67840.919177] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [67840.996435] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [67841.003628] pc : down_write+0x30/0x98 [67841.007546] lr : start_creating.part.0+0x60/0x198 [67841.012495] sp : ffff8000b979ba20 [67841.016046] x29: ffff8000b979ba20 x28: 0000000000000010 x27: 0000000000024b40 [67841.023412] x26: 0000000000000012 x25: ffff20202b355ae8 x24: ffff20202b35a8c8 [67841.030779] x23: ffffa36877928208 x22: ffffa368b4972240 x21: ffff8000b979bb18 [67841.038147] x20: ffff00281dc1e3c0 x19: fffffffffffffffe x18: 0000000000000020 [67841.045515] x17: 0000000000000000 x16: ffffa368b128a530 x15: ffffffffffffffff [67841.052888] x14: ffff8000b979bc18 x13: ffffffffffffffff x12: ffff8000b979bb18 [67841.060263] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa368b1289b18 [67841.067640] x8 : 0000000000000012 x7 : 0000000000000000 x6 : 00000000000003a9 [67841.075014] x5 : 0000000000000000 x4 : ffff002818c5cb00 x3 : 0000000000000001 [67841.082388] x2 : 0000000000000000 x1 : ffff002818c5cb00 x0 : 00000000000000a0 [67841.089759] Call trace: [67841.092456] down_write+0x30/0x98 [67841.096017] start_creating.part.0+0x60/0x198 [67841.100613] debugfs_create_dir+0x48/0x1f8 [67841.104950] debugfs_create_files_v3_hw+0x88/0x348 [hisi_sas_v3_hw] [67841.111447] debugfs_snapshot_regs_v3_hw+0x708/0x798 [hisi_sas_v3_hw] [67841.118111] debugfs_trigger_dump_v3_hw_write+0x9c/0x120 [hisi_sas_v3_hw] [67841.125115] full_proxy_write+0x68/0xc8 [67841.129175] vfs_write+0xd8/0x3f0 [67841.132708] ksys_write+0x70/0x108 [67841.136317] __arm64_sys_write+0x24/0x38 [67841.140440] invoke_syscall+0x50/0x128 [67841.144385] el0_svc_common.constprop.0+0xc8/0xf0 [67841.149273] do_el0_svc+0x24/0x38 [67841.152773] el0_svc+0x38/0xd8 [67841.156009] el0t_64_sync_handler+0xc0/0xc8 [67841.160361] el0t_64_sync+0x1a4/0x1a8 [67841.164189] Code: b9000882 d2800002 d2800023 f9800011 (c85ffc05) [67841.170443] ---[ end trace 0000000000000000 ]--- To fix this issue, create all directories and files during debugfs initialization. In this way, the driver only needs to allocate memory space to save information each time the user triggers dumping. Signed-off-by: Yihang Li <liyihang9@huawei.com> Link: https://lore.kernel.org/r/20241008021822.2617339-13-liyihang9@huawei.com Reviewed-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: hisi_sas: Add cond_resched() for no forced preemption modelYihang Li
[ Upstream commit 2233c4a0b948211743659b24c13d6bd059fa75fc ] For no forced preemption model kernel, in the scenario where the expander is connected to 12 high performance SAS SSDs, the following call trace may occur: [ 214.409199][ C240] watchdog: BUG: soft lockup - CPU#240 stuck for 22s! [irq/149-hisi_sa:3211] [ 214.568533][ C240] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--) [ 214.575224][ C240] pc : fput_many+0x8c/0xdc [ 214.579480][ C240] lr : fput+0x1c/0xf0 [ 214.583302][ C240] sp : ffff80002de2b900 [ 214.587298][ C240] x29: ffff80002de2b900 x28: ffff1082aa412000 [ 214.593291][ C240] x27: ffff3062a0348c08 x26: ffff80003a9f6000 [ 214.599284][ C240] x25: ffff1062bbac5c40 x24: 0000000000001000 [ 214.605277][ C240] x23: 000000000000000a x22: 0000000000000001 [ 214.611270][ C240] x21: 0000000000001000 x20: 0000000000000000 [ 214.617262][ C240] x19: ffff3062a41ae580 x18: 0000000000010000 [ 214.623255][ C240] x17: 0000000000000001 x16: ffffdb3a6efe5fc0 [ 214.629248][ C240] x15: ffffffffffffffff x14: 0000000003ffffff [ 214.635241][ C240] x13: 000000000000ffff x12: 000000000000029c [ 214.641234][ C240] x11: 0000000000000006 x10: ffff80003a9f7fd0 [ 214.647226][ C240] x9 : ffffdb3a6f0482fc x8 : 0000000000000001 [ 214.653219][ C240] x7 : 0000000000000002 x6 : 0000000000000080 [ 214.659212][ C240] x5 : ffff55480ee9b000 x4 : fffffde7f94c6554 [ 214.665205][ C240] x3 : 0000000000000002 x2 : 0000000000000020 [ 214.671198][ C240] x1 : 0000000000000021 x0 : ffff3062a41ae5b8 [ 214.677191][ C240] Call trace: [ 214.680320][ C240] fput_many+0x8c/0xdc [ 214.684230][ C240] fput+0x1c/0xf0 [ 214.687707][ C240] aio_complete_rw+0xd8/0x1fc [ 214.692225][ C240] blkdev_bio_end_io+0x98/0x140 [ 214.696917][ C240] bio_endio+0x160/0x1bc [ 214.701001][ C240] blk_update_request+0x1c8/0x3bc [ 214.705867][ C240] scsi_end_request+0x3c/0x1f0 [ 214.710471][ C240] scsi_io_completion+0x7c/0x1a0 [ 214.715249][ C240] scsi_finish_command+0x104/0x140 [ 214.720200][ C240] scsi_softirq_done+0x90/0x180 [ 214.724892][ C240] blk_mq_complete_request+0x5c/0x70 [ 214.730016][ C240] scsi_mq_done+0x48/0xac [ 214.734194][ C240] sas_scsi_task_done+0xbc/0x16c [libsas] [ 214.739758][ C240] slot_complete_v3_hw+0x260/0x760 [hisi_sas_v3_hw] [ 214.746185][ C240] cq_thread_v3_hw+0xbc/0x190 [hisi_sas_v3_hw] [ 214.752179][ C240] irq_thread_fn+0x34/0xa4 [ 214.756435][ C240] irq_thread+0xc4/0x130 [ 214.760520][ C240] kthread+0x108/0x13c [ 214.764430][ C240] ret_from_fork+0x10/0x18 This is because in the hisi_sas driver, both the hardware interrupt handler and the interrupt thread are executed on the same CPU. In the performance test scenario, function irq_wait_for_interrupt() will always return 0 if lots of interrupts occurs and the CPU will be continuously consumed. As a result, the CPU cannot run the watchdog thread. When the watchdog time exceeds the specified time, call trace occurs. To fix it, add cond_resched() to execute the watchdog thread. Signed-off-by: Yihang Li <liyihang9@huawei.com> Link: https://lore.kernel.org/r/20241008021822.2617339-8-liyihang9@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: qla2xxx: Remove check req_sg_cnt should be equal to rsp_sg_cntSaurav Kashyap
commit 833c70e212fc40d3e98da941796f4c7bcaecdf58 upstream. Firmware supports multiple sg_cnt for request and response for CT commands, so remove the redundant check. A check is there where sg_cnt for request and response should be same. This is not required as driver and FW have code to handle multiple and different sg_cnt on request and response. Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20241115130313.46826-5-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14scsi: qla2xxx: Fix use after free on unloadQuinn Tran
commit 07c903db0a2ff84b68efa1a74a4de353ea591eb0 upstream. System crash is observed with stack trace warning of use after free. There are 2 signals to tell dpc_thread to terminate (UNLOADING flag and kthread_stop). On setting the UNLOADING flag when dpc_thread happens to run at the time and sees the flag, this causes dpc_thread to exit and clean up itself. When kthread_stop is called for final cleanup, this causes use after free. Remove UNLOADING signal to terminate dpc_thread. Use the kthread_stop as the main signal to exit dpc_thread. [596663.812935] kernel BUG at mm/slub.c:294! [596663.812950] invalid opcode: 0000 [#1] SMP PTI [596663.812957] CPU: 13 PID: 1475935 Comm: rmmod Kdump: loaded Tainted: G IOE --------- - - 4.18.0-240.el8.x86_64 #1 [596663.812960] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/20/2012 [596663.812974] RIP: 0010:__slab_free+0x17d/0x360 ... [596663.813008] Call Trace: [596663.813022] ? __dentry_kill+0x121/0x170 [596663.813030] ? _cond_resched+0x15/0x30 [596663.813034] ? _cond_resched+0x15/0x30 [596663.813039] ? wait_for_completion+0x35/0x190 [596663.813048] ? try_to_wake_up+0x63/0x540 [596663.813055] free_task+0x5a/0x60 [596663.813061] kthread_stop+0xf3/0x100 [596663.813103] qla2x00_remove_one+0x284/0x440 [qla2xxx] Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20241115130313.46826-3-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14scsi: qla2xxx: Supported speed displayed incorrectly for VPortsAnil Gurumurthy
commit e4e268f898c8a08f0a1188677e15eadbc06e98f6 upstream. The fc_function_template for vports was missing the .show_host_supported_speeds. The base port had the same. Add .show_host_supported_speeds to the vport template as well. Cc: stable@vger.kernel.org Fixes: 2c3dfe3f6ad8 ("[SCSI] qla2xxx: add support for NPIV") Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20241115130313.46826-7-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14scsi: qla2xxx: Fix NVMe and NPIV connect issueQuinn Tran
commit 4812b7796c144f63a1094f79a5eb8fbdad8d7ebc upstream. NVMe controller fails to send connect command due to failure to locate hw context buffer for NVMe queue 0 (blk_mq_hw_ctx, hctx_idx=0). The cause of the issue is NPIV host did not initialize the vha->irq_offset field. This field is given to blk-mq (blk_mq_pci_map_queues) to help locate the beginning of IO Queues which in turn help locate NVMe queue 0. Initialize this field to allow NVMe to work properly with NPIV host. kernel: nvme nvme5: Connect command failed, errno: -18 kernel: nvme nvme5: qid 0: secure concatenation is not supported kernel: nvme nvme5: NVME-FC{5}: create_assoc failed, assoc_id 2e9100 ret 401 kernel: nvme nvme5: NVME-FC{5}: reset: Reconnect attempt failed (401) kernel: nvme nvme5: NVME-FC{5}: Reconnect attempt in 2 seconds Cc: stable@vger.kernel.org Fixes: f0783d43dde4 ("scsi: qla2xxx: Use correct number of vectors for online CPUs") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20241115130313.46826-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14scsi: qla2xxx: Fix abort in bsg timeoutQuinn Tran
commit c423263082ee8ccfad59ab33e3d5da5dc004c21e upstream. Current abort of bsg on timeout prematurely clears the outstanding_cmds[]. Abort does not allow FW to return the IOCB/SRB. In addition, bsg_job_done() is not called to return the BSG (i.e. leak). Abort the outstanding bsg/SRB and wait for the completion. The completion IOCB will wake up the bsg_timeout thread. If abort is not successful, then driver will forcibly call bsg_job_done() and free the srb. Err Inject: - qaucli -z - assign CT Passthru IOCB's NportHandle with another initiator nport handle to trigger timeout. Remote port will drop CT request. - bsg_job_done is properly called as part of cleanup kernel: qla2xxx [0000:21:00.1]-7012:7: qla2x00_process_ct : 286 : Error Inject. kernel: qla2xxx [0000:21:00.1]-7016:7: bsg rqst type: FC_BSG_HST_CT else type: 101 - loop-id=1 portid=fffffa. kernel: qla2xxx [0000:21:00.1]-70bb:7: qla24xx_bsg_timeout CMD timeout. bsg ptr ffff9971a42f0838 msgcode 80000004 vendor cmd fa010000 kernel: qla2xxx [0000:21:00.1]-507c:7: Abort command issued - hdl=4b, type=5 kernel: qla2xxx [0000:21:00.1]-5040:7: ELS-CT pass-through-ct pass-through error hdl=4b comp_status-status=0x5 error subcode 1=0x0 error subcode 2=0xaf882e80. kernel: qla2xxx [0000:21:00.1]-7009:7: qla2x00_bsg_job_done: sp hdl 4b, result=70000 bsg ptr ffff9971a42f0838 kernel: qla2xxx [0000:21:00.1]-802c:7: Aborting bsg ffff9971a42f0838 sp=ffff99760b87ba80 handle=4b rval=0 kernel: qla2xxx [0000:21:00.1]-708a:7: bsg abort success. bsg ffff9971a42f0838 sp=ffff99760b87ba80 handle=0x4b kernel: qla2xxx [0000:21:00.1]-7012:7: qla2x00_process_ct : 286 : Error Inject. kernel: qla2xxx [0000:21:00.1]-7016:7: bsg rqst type: FC_BSG_HST_CT else type: 101 - loop-id=1 portid=fffffa. kernel: qla2xxx [0000:21:00.1]-70bb:7: qla24xx_bsg_timeout CMD timeout. bsg ptr ffff9971a42f43b8 msgcode 80000004 vendor cmd fa010000 kernel: qla2xxx [0000:21:00.1]-7012:7: qla_bsg_found : 2206 : Error Inject 2. kernel: qla2xxx [0000:21:00.1]-802c:7: Aborting bsg ffff9971a42f43b8 sp=ffff99762c304440 handle=5e rval=5 kernel: qla2xxx [0000:21:00.1]-704f:7: bsg abort fail. bsg=ffff9971a42f43b8 sp=ffff99762c304440 rval=5. kernel: qla2xxx [0000:21:00.1]-7051:7: qla_bsg_found bsg_job_done : bsg ffff9971a42f43b8 result 0xfffffffa sp ffff99762c304440. Cc: stable@vger.kernel.org Fixes: c449b4198701 ("scsi: qla2xxx: Use QP lock to search for bsg") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20241115130313.46826-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14scsi: scsi_debug: Fix hrtimer support for ndelayJohn Garry
[ Upstream commit 6918141d815acef056a0d10e966a027d869a922d ] Since commit 771f712ba5b0 ("scsi: scsi_debug: Fix cmd duration calculation"), ns_from_boot value is only evaluated in schedule_resp() for polled requests. However, ns_from_boot is also required for hrtimer support for when ndelay is less than INCLUSIVE_TIMING_MAX_NS, so fix up the logic to decide when to evaluate ns_from_boot. Fixes: 771f712ba5b0 ("scsi: scsi_debug: Fix cmd duration calculation") Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241202130045.2335194-1-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14scsi: sg: Fix slab-use-after-free read in sg_release()Suraj Sonawane
[ Upstream commit f10593ad9bc36921f623361c9e3dd96bd52d85ee ] Fix a use-after-free bug in sg_release(), detected by syzbot with KASAN: BUG: KASAN: slab-use-after-free in lock_release+0x151/0xa30 kernel/locking/lockdep.c:5838 __mutex_unlock_slowpath+0xe2/0x750 kernel/locking/mutex.c:912 sg_release+0x1f4/0x2e0 drivers/scsi/sg.c:407 In sg_release(), the function kref_put(&sfp->f_ref, sg_remove_sfp) is called before releasing the open_rel_lock mutex. The kref_put() call may decrement the reference count of sfp to zero, triggering its cleanup through sg_remove_sfp(). This cleanup includes scheduling deferred work via sg_remove_sfp_usercontext(), which ultimately frees sfp. After kref_put(), sg_release() continues to unlock open_rel_lock and may reference sfp or sdp. If sfp has already been freed, this results in a slab-use-after-free error. Move the kref_put(&sfp->f_ref, sg_remove_sfp) call after unlocking the open_rel_lock mutex. This ensures: - No references to sfp or sdp occur after the reference count is decremented. - Cleanup functions such as sg_remove_sfp() and sg_remove_sfp_usercontext() can safely execute without impacting the mutex handling in sg_release(). The fix has been tested and validated by syzbot. This patch closes the bug reported at the following syzkaller link and ensures proper sequencing of resource cleanup and mutex operations, eliminating the risk of use-after-free errors in sg_release(). Reported-by: syzbot+7efb5850a17ba6ce098b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7efb5850a17ba6ce098b Tested-by: syzbot+7efb5850a17ba6ce098b@syzkaller.appspotmail.com Fixes: cc833acbee9d ("sg: O_EXCL and other lock handling") Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com> Link: https://lore.kernel.org/r/20241120125944.88095-1-surajsonawane0215@gmail.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05scsi: sg: Enable runtime power managementBart Van Assche
[ Upstream commit 4045de893f691f75193c606aec440c365cf7a7be ] In 2010, runtime power management support was implemented in the SCSI core. The description of patch "[SCSI] implement runtime Power Management" mentions that the sg driver is skipped but not why. This patch enables runtime power management even if an instance of the sg driver is held open. Enabling runtime PM for the sg driver is safe because all interactions of the sg driver with the SCSI device pass through the block layer (blk_execute_rq_nowait()) and the block layer already supports runtime PM. Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Douglas Gilbert <dgilbert@interlog.com> Fixes: bc4f24014de5 ("[SCSI] implement runtime Power Management") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20241030220310.1373569-1-bvanassche@acm.org Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05scsi: qedi: Fix a possible memory leak in qedi_alloc_and_init_sb()Zhen Lei
[ Upstream commit 95bbdca4999bc59a72ebab01663d421d6ce5775d ] Hook "qedi_ops->common->sb_init = qed_sb_init" does not release the DMA memory sb_virt when it fails. Add dma_free_coherent() to free it. This is the same way as qedr_alloc_mem_sb() and qede_alloc_mem_sb(). Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20241026125711.484-3-thunder.leizhen@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05scsi: qedf: Fix a possible memory leak in qedf_alloc_and_init_sb()Zhen Lei
[ Upstream commit c62c30429db3eb4ced35c7fcf6f04a61ce3a01bb ] Hook "qed_ops->common->sb_init = qed_sb_init" does not release the DMA memory sb_virt when it fails. Add dma_free_coherent() to free it. This is the same way as qedr_alloc_mem_sb() and qede_alloc_mem_sb(). Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.") Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20241026125711.484-2-thunder.leizhen@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>