linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2023-04-11	block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT	Mike Christie
	BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's locking feature which works similar to NVMe/SCSI reservations where a host can get a lock on a device and when the lock is taken it will get failures. This patch renames BLK_STS_NEXUS so it better reflects this type of use. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230407200551.12660-3-michael.christie@oracle.com Acked-by: Stefan Haberland <sth@linux.ibm.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: cxlflash: s/semahpore/semaphore/	Geert Uytterhoeven
	Fix misspellings of "semaphore". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/d7d04004b818d7ab5d62002f286b0a1b0b493193.1681208251.git.geert+renesas@glider.be Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: lpfc: Silence an incorrect device output	Jun Chen
	In lpfc_sli4_pci_mem_unset(), case LPFC_SLI_INTF_IF_TYPE_1 does not have a break statement, resulting in an incorrect device output. Fix this by adding a break statement before the default option. Signed-off-by: Jun Chen <jun_c@hust.edu.cn> Link: https://lore.kernel.org/r/20230410023724.3209455-1-jun_c@hust.edu.cn Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame ↵	Ranjan Kumar
	allocation Driver uses spin lock without irqsave when it needs to acquire a chain frame. This is done to protect chain frame allocation from multiple submission threads. If there is any I/O queued from an interrupt context, and if that requires a chain frame, and if the chain lock is held by the CPU which got interrupted, then there will be a possible deadlock. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20230406101819.10109-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: scsi_debug: Fix missing error code in scsi_debug_init()	Harshit Mogalapalli
	Smatch reports: drivers/scsi/scsi_debug.c:6996 scsi_debug_init() warn: missing error code 'ret' Although it is unlikely that KMEM_CACHE might fail, but if it does then ret might be zero. So to fix this explicitly mark ret as "-ENOMEM" and then goto driver_unreg. Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Link: https://lore.kernel.org/r/20230406074607.3637097-1-harshit.m.mogalapalli@oracle.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: hisi_sas: Work around build failure in suspend function	Arnd Bergmann
	The suspend/resume functions in this driver seem to have multiple problems, the latest one just got introduced by a bugfix: drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw': drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count' 5142 \| if (atomic_read(&device->power.usage_count)) { drivers/scsi/hisi_sas/hisi_sas_v3_hw.c: In function '_suspend_v3_hw': drivers/scsi/hisi_sas/hisi_sas_v3_hw.c:5142:39: error: 'struct dev_pm_info' has no member named 'usage_count' 5142 \| if (atomic_read(&device->power.usage_count)) { As far as I can tell, the 'usage_count' is not meant to be accessed by device drivers at all, though I don't know what the driver is supposed to do instead. Another problem is the use of the deprecated UNIVERSAL_DEV_PM_OPS(), and marking functions as __maybe_unused to avoid warnings about unused functions. This should probably be changed to using DEFINE_RUNTIME_DEV_PM_OPS(). Both changes require actually understanding what the driver needs to do, and being able to test this, so instead here is the simplest patch to make it pass the randconfig builds instead. Fixes: e368d38cb952 ("scsi: hisi_sas: Exit suspend state when usage count is greater than 0") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230405083611.3376739-1-arnd@kernel.org Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: ses: Handle enclosure with just a primary component gracefully	Jiri Kosina
	This reverts commit 3fe97ff3d949 ("scsi: ses: Don't attach if enclosure has no components") and introduces proper handling of case where there are no detected secondary components, but primary component (enumerated in num_enclosures) does exist. That fix was originally proposed by Ding Hui <dinghui@sangfor.com.cn>. Completely ignoring devices that have one primary enclosure and no secondary one results in ses_intf_add() bailing completely scsi 2:0:0:254: enclosure has no enumerated components scsi 2:0:0:254: Failed to bind enclosure -12ven in valid configurations such even on valid configurations with 1 primary and 0 secondary enclosures as below: # sg_ses /dev/sg0 3PARdata SES 3321 Supported diagnostic pages: Supported Diagnostic Pages [sdp] [0x0] Configuration (SES) [cf] [0x1] Short Enclosure Status (SES) [ses] [0x8] # sg_ses -p cf /dev/sg0 3PARdata SES 3321 Configuration diagnostic page: number of secondary subenclosures: 0 generation code: 0x0 enclosure descriptor list Subenclosure identifier: 0 [primary] relative ES process id: 0, number of ES processes: 1 number of type descriptor headers: 1 enclosure logical identifier (hex): 20000002ac02068d enclosure vendor: 3PARdata product: VV rev: 3321 type descriptor header and text list Element type: Unspecified, subenclosure id: 0 number of possible elements: 1 The changelog for the original fix follows ===== We can get a crash when disconnecting the iSCSI session, the call trace like this: [ffff00002a00fb70] kfree at ffff00000830e224 [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4 [ffff00002a00fbd0] device_del at ffff0000086b6a98 [ffff00002a00fc50] device_unregister at ffff0000086b6d58 [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c [ffff00002a00fca0] scsi_remove_device at ffff000008706134 [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4 [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0 [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4 [ffff00002a00fdb0] process_one_work at ffff00000810f35c [ffff00002a00fe00] worker_thread at ffff00000810f648 [ffff00002a00fe70] kthread at ffff000008116e98 In ses_intf_add, components count could be 0, and kcalloc 0 size scomp, but not saved in edev->component[i].scratch In this situation, edev->component[0].scratch is an invalid pointer, when kfree it in ses_intf_remove_enclosure, a crash like above would happen The call trace also could be other random cases when kfree cannot catch the invalid pointer We should not use edev->component[] array when the components count is 0 We also need check index when use edev->component[] array in ses_enclosure_data_process ===== Reported-by: Michal Kolar <mich.k@seznam.cz> Originally-by: Ding Hui <dinghui@sangfor.com.cn> Cc: stable@vger.kernel.org Fixes: 3fe97ff3d949 ("scsi: ses: Don't attach if enclosure has no components") Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2304042122270.29760@cbobk.fhfr.pm Tested-by: Michal Kolar <mich.k@seznam.cz> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup()	Shuchang Li
	When if_type equals zero and pci_resource_start(pdev, PCI_64BIT_BAR4) returns false, drbl_regs_memmap_p is not remapped. This passes a NULL pointer to iounmap(), which can trigger a WARN() on certain arches. When if_type equals six and pci_resource_start(pdev, PCI_64BIT_BAR4) returns true, drbl_regs_memmap_p may has been remapped and ctrl_regs_memmap_p is not remapped. This is a resource leak and passes a NULL pointer to iounmap(). To fix these issues, we need to add null checks before iounmap(), and change some goto labels. Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4") Signed-off-by: Shuchang Li <lishuchang@hust.edu.cn> Link: https://lore.kernel.org/r/20230404072133.1022-1-lishuchang@hust.edu.cn Reviewed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: mpt3sas: Fix an issue when driver is being removed	Tomas Henzl
	Warnings may be logged during driver removal: mpt3sas 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT .., Fix this by deallocating DMA memory later. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://lore.kernel.org/r/20230403184736.6399-1-thenzl@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-11	scsi: mpt3sas: Remove HBA BIOS version in the kernel log	Ranjan Kumar
	This is done to avoid ambiguity between BIOS and UEFI versions. Management tools can be used for getting accurate firmware version information. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20230322092713.6961-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-08	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four small fixes, all in drivers. They're all one or two lines except for the ufs one, but that's a simple revert of a previous feature" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param() scsi: qla2xxx: Fix memory leak in qla2x00_probe_one() scsi: mpi3mr: Handle soft reset in progress fault code (0xF002) scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"
2023-04-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Conflicts: drivers/net/ethernet/google/gve/gve.h 3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts") 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format") https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/ https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/ Adjacent changes: net/can/isotp.c 051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()") 96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-03	driver core: class: remove struct class_interface * from callbacks	Greg Kroah-Hartman
	The add_dev and remove_dev callbacks in struct class_interface currently pass in a pointer back to the class_interface structure that is calling them, but none of the callback implementations actually use this pointer as it is pointless (the structure is known, the driver passed it in in the first place if it is really needed again.) So clean this up and just remove the pointer from the callbacks and fix up all callback functions. Cc: Jean Delvare <jdelvare@suse.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com> Cc: Jon Mason <jdmason@kudzu.us> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Allen Hubbe <allenbh@gmail.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Alexandre Bounine <alex.bou9@gmail.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: John Stultz <jstultz@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Wang Weiyang <wangweiyang2@huawei.com> Cc: Yang Yingliang <yangyingliang@huawei.com> Cc: Jakob Koschel <jakobkoschel@gmail.com> Cc: Cai Xinchen <caixinchen1@huawei.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Logan Gunthorpe <logang@deltatee.com> Link: https://lore.kernel.org/r/2023040250-pushover-platter-509c@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-03	Merge 6.3-rc5 into driver-core-next	Greg Kroah-Hartman
	We need the fixes in here for testing, as well as the driver core changes for documentation updates to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-02	Merge patch series "Fix shost command overloading issues"	Martin K. Petersen
	John Garry <john.g.garry@oracle.com> says: It's easy to get scsi_debug to error on throughput testing when we have multiple shosts: $ lsscsi [7:0:0:0] disk Linux scsi_debug 0191 [0:0:0:0] disk Linux scsi_debug 0191 $ fio --filename=/dev/sda --filename=/dev/sdb --direct=1 --rw=read --bs=4k --iodepth=256 --runtime=60 --numjobs=40 --time_based --name=jpg --eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error jpg: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256 ... fio-3.28 Starting 40 processes [ 27.521809] hrtimer: interrupt took 33067 ns [ 27.904660] sd 7:0:0:0: [sdb] tag#171 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s [ 27.904660] sd 0:0:0:0: [sda] tag#58 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s fio: io_u error [ 27.904667] sd 0:0:0:0: [sda] tag#58 CDB: Read(10) 28 00 00 00 27 00 00 01 18 00 on file /dev/sda[ 27.904670] sd 0:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s The issue is related to how the driver manages submit queues and tags. A single array of submit queues - sdebug_q_arr - with its own set of tags is shared among all shosts. As such, for occasions when we have more than one host it is possible to overload the submit queues and run out of tags. Another separate issue that we may reduce the shost submit queue depth, sdebug_max_queue, dynamically causing the shost to be overloaded. How many IOs which the shost may be sent is fixed at can_queue at init time, which is the same initial value for sdebug_max_queue. So reducing sdebug_max_queue means that the shost may be sent more IOs than it is configured to handle, causing overloading. This series removes the scsi_debug submit queue concept and uses pre-existing APIs to manage and examine tags, like scsi_block_requests() and blk_mq_tagset_busy_iter(). Using standard APIs makes the driver more maintainable and extensible in future. A restriction is also added to allow sdebug_max_queue only be modified when no shosts are present, i.e. we need to remove shosts, modify sdebug_max_queue, and then re-add the shosts. Link: https://lore.kernel.org/r/20230327074310.1862889-1-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Drop sdebug_queue	John Garry
	It's easy to get scsi_debug to error on throughput testing when we have multiple shosts: $ lsscsi [7:0:0:0] disk Linux scsi_debug 0191 [0:0:0:0] disk Linux scsi_debug 0191 $ fio --filename=/dev/sda --filename=/dev/sdb --direct=1 --rw=read --bs=4k --iodepth=256 --runtime=60 --numjobs=40 --time_based --name=jpg --eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error jpg: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256 ... fio-3.28 Starting 40 processes [ 27.521809] hrtimer: interrupt took 33067 ns [ 27.904660] sd 7:0:0:0: [sdb] tag#171 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s [ 27.904660] sd 0:0:0:0: [sda] tag#58 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s fio: io_u error [ 27.904667] sd 0:0:0:0: [sda] tag#58 CDB: Read(10) 28 00 00 00 27 00 00 01 18 00 on file /dev/sda[ 27.904670] sd 0:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s The issue is related to how the driver manages submit queues and tags. A single array of submit queues - sdebug_q_arr - with its own set of tags is shared among all shosts. As such, for occasions when we have more than one shost it is possible to overload the submit queues and run out of tags. The struct sdebug_queue is to manage tags and hold the associated queued command entry pointer (for that tag). Since the tagset iters are now used for functions like sdebug_blk_mq_poll(), there is no need to manage these queues. Indeed, blk-mq already provides what we need for managing tags and queues. Drop sdebug_queue and all its usage in the driver. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-12-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts	John Garry
	The shost->can_queue value is initially used to set per-HW queue context tag depth in the block layer. This ensures that the shost is not sent too many commands which it can deal with. However lowering sdebug_max_queue separately means that we can easily overload the shost, as in the following example: $ cat /sys/bus/pseudo/drivers/scsi_debug/max_queue 192 $ cat /sys/class/scsi_host/host0/can_queue 192 $ echo 100 > /sys/bus/pseudo/drivers/scsi_debug/max_queue $ cat /sys/class/scsi_host/host0/can_queue 192 $ fio --filename=/dev/sda --direct=1 --rw=read --bs=4k --iodepth=256 --runtime=1200 --numjobs=10 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error iops-test-job: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256 ... fio-3.28 Starting 10 processes [ 111.269885] scsi_io_completion_action: 400 callbacks suppressed [ 111.269885] blk_print_req_error: 400 callbacks suppressed [ 111.269889] I/O error, dev sda, sector 440 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2 [ 111.269892] sd 0:0:0:0: [sda] tag#132 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s [ 111.269897] sd 0:0:0:0: [sda] tag#132 CDB: Read(10) 28 00 00 00 01 68 00 00 08 00 [ 111.277058] I/O error, dev sda, sector 360 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2 [...] Ensure that this cannot happen by allowing sdebug_max_queue be modified only when we have no shosts. As such, any shost->can_queue value will match sdebug_max_queue, and sdebug_max_queue cannot be modified separately. Since retired_max_queue is no longer set, remove support. Continue to apply the restriction that sdebug_host_max_queue cannot be modified when sdebug_host_max_queue is set. Adding support for that would mean extra code, and no one has complained about this restriction previously. A command like the following may be used to remove a shost: echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-11-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()	John Garry
	The functions to update ndelay and delay value first check whether we have any in-flight IO for any host. It does this by checking if any tag is used in the global submit queues. We can achieve the same by setting the host as blocked and then ensuring that we have no in-flight commands with scsi_host_busy(). Note that scsi_host_busy() checks SCMD_STATE_INFLIGHT flag, which is only set per command after we ensure that the host is not blocked, i.e. we see more commands active after the check for scsi_host_busy() returns 0. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-10-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()	John Garry
	Instead of iterating all deferred commands in the submission queue structures, use blk_mq_tagset_busy_iter(), which is a standard API for this. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-9-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()	John Garry
	Instead of iterating all deferred commands in the submission queue structures, use blk_mq_tagset_busy_iter(), which is a standard API for this. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-8-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd	John Garry
	Eventually we will drop the sdebug_queue struct as it is not really required, so start with making the sdebug_queued_cmd dynamically allocated for the lifetime of the scsi_cmnd in the driver. As an interim measure, make sdebug_queued_cmd.sd_dp a pointer to struct sdebug_defer. Also keep a value of the index allocated in sdebug_queued_cmd.qc_arr in struct sdebug_queued_cmd. To deal with an races in accessing the scsi cmnd allocated struct sdebug_queued_cmd, add a spinlock for the scsi command in its priv area. Races may be between scheduling a command for completion, aborting a command, and the command actually completing and freeing the struct sdebug_queued_cmd. [mkp: typo fix] Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-7-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Use scsi_block_requests() to block queues	John Garry
	The feature to block queues is quite dubious, since it races with in-flight IO. Indeed, it seems unnecessary for block queues for any times we do so. Anyway, to keep the same behaviour, use standard SCSI API to stop IO being sent - scsi_{un}block_requests(). Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-6-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Protect block_unblock_all_queues() with mutex	John Garry
	There is no reason that calls to block_unblock_all_queues() from different context can't race with one another, so protect with the sdebug_host_list_mutex. There's no need for a more fine-grained per shost locking here (and we don't have a per-host lock anyway). Also simplify some touched code in sdebug_change_qdepth(). Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-5-john.g.garry@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Change shost list lock to a mutex	John Garry
	The shost list lock, sdebug_host_list_lock, is a spinlock. We would only lock in non-atomic context in this driver, so use a mutex instead, which is friendlier if we need to schedule when iterating. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-4-john.g.garry@oracle.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Don't iter all shosts in clear_luns_changed_on_target()	John Garry
	In clear_luns_changed_on_target(), we iter all devices for all shosts to conditionally clear the SDEBUG_UA_LUNS_CHANGED flag in the per-device uas_bm. One condition to see whether we clear the flag is to test whether the host for the device under consideration is the same as the matching device's (devip) host. This check will only ever pass for devices for the same shost, so only iter the devices for the matching device shost. We can now drop the spinlock'ing of the sdebug_host_list_lock in the same function. This will allow us to use a mutex instead of the spinlock for the global shost lock, as clear_luns_changed_on_target() could be called in non-blocking context, in scsi_debug_queuecommand() -> make_ua() -> clear_luns_changed_on_target() (which is why required a spinlock). Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-3-john.g.garry@oracle.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_debug: Fix check for sdev queue full	John Garry
	There is a report that the blktests scsi/004 test for "TASK SET FULL" (TSF) now fails. The condition upon we should issue this TSF is when the sdev queue is full. The check for a full queue has an off-by-1 error. Previously we would increment the number of requests in the queue after testing if the queue would be full, i.e. test if one less than full. Since we now use scsi_device_busy() to count the number of requests in the queue, this would already account for the current request, so fix the test for queue full accordingly. Fixes: 151f0ec9ddb5 ("scsi: scsi_debug: Drop sdebug_dev_info.num_in_q") Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202303201334.18b30edc-oliver.sang@intel.com Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230327074310.1862889-2-john.g.garry@oracle.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	Merge patch series "scsi: hisi_sas: Some misc changes"	Martin K. Petersen
	chenxiang <chenxiang66@hisilicon.com> says: This series contain some fixes including: - Grab sas_dev lock when traversing sas_dev list to avoid NULL pointer - Handle NCQ error when IPTT is valid - Ensure all enabled PHYs up during controller reset - Exit suspend state when usage count of runtime PM is greater than 0 https://lore.kernel.org/r/1679283265-115066-1-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: hisi_sas: Exit suspend state when usage count is greater than 0	Yihang Li
	When the current status of the host controller is suspended, enabling a local PHY just after disabling all local PHYs in expander environment, a hang as follows occurs: [ 486.854655] INFO: task kworker/u256:1:899 blocked for more than 120 seconds. [ 486.862207] Not tainted 6.1.0-rc4+ #1 [ 486.870545] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 486.878893] task:kworker/u256:1 state:D stack:0 pid:899 ppid:2 flags:0x00000008 [ 486.887745] Workqueue: 0000:74:02.0_disco_q sas_discover_domain [libsas] [ 486.894704] Call trace: [ 486.897400] __switch_to+0xf0/0x170 [ 486.901146] __schedule+0x3e4/0x1160 [ 486.904970] schedule+0x64/0x104 [ 486.908442] rpm_resume+0x158/0x6a0 [ 486.912163] __pm_runtime_resume+0x5c/0x84 [ 486.916489] smp_execute_task_sg+0x1f8/0x264 [libsas] [ 486.921773] sas_discover_expander.part.0+0xbc/0x720 [libsas] [ 486.927750] sas_discover_root_expander+0x90/0x154 [libsas] [ 486.933552] sas_discover_domain+0x444/0x6d0 [libsas] [ 486.938826] process_one_work+0x1e0/0x450 [ 486.943057] worker_thread+0x150/0x44c [ 486.947015] kthread+0x114/0x120 [ 486.950447] ret_from_fork+0x10/0x20 [ 486.954292] INFO: task kworker/u256:2:1780 blocked for more than 120 seconds. [ 486.961637] Not tainted 6.1.0-rc4+ #1 [ 486.966087] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 486.974356] task:kworker/u256:2 state:D stack:0 pid:1780 ppid:2 flags:0x00000208 [ 486.983141] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas] [ 486.990252] Call trace: [ 486.992930] __switch_to+0xf0/0x170 [ 486.996645] __schedule+0x3e4/0x1160 [ 487.000439] schedule+0x64/0x104 [ 487.003886] schedule_timeout+0x17c/0x1c0 [ 487.008102] wait_for_completion+0x7c/0x160 [ 487.012488] __flush_workqueue+0x104/0x3e0 [ 487.016782] sas_porte_bytes_dmaed+0x414/0x454 [libsas] [ 487.022203] sas_port_event_worker+0x38/0x60 [libsas] [ 487.027449] process_one_work+0x1e0/0x450 [ 487.031645] worker_thread+0x150/0x44c [ 487.035594] kthread+0x114/0x120 [ 487.039017] ret_from_fork+0x10/0x20 [ 487.042828] INFO: task bash:11488 blocked for more than 121 seconds. [ 487.049366] Not tainted 6.1.0-rc4+ #1 [ 487.053746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 487.061953] task:bash state:D stack:0 pid:11488 ppid:10977 flags:0x00000204 [ 487.070698] Call trace: [ 487.073355] __switch_to+0xf0/0x170 [ 487.077050] __schedule+0x3e4/0x1160 [ 487.080833] schedule+0x64/0x104 [ 487.084270] schedule_timeout+0x17c/0x1c0 [ 487.088474] wait_for_completion+0x7c/0x160 [ 487.092851] __flush_workqueue+0x104/0x3e0 [ 487.097137] drain_workqueue+0xb8/0x160 [ 487.101159] __sas_drain_work+0x50/0x90 [libsas] [ 487.105963] sas_suspend_ha+0x64/0xd4 [libsas] [ 487.110590] suspend_v3_hw+0x198/0x1e8 [hisi_sas_v3_hw] [ 487.115989] pci_pm_runtime_suspend+0x5c/0x1d0 [ 487.120606] __rpm_callback+0x50/0x150 [ 487.124535] rpm_callback+0x74/0x80 [ 487.128204] rpm_suspend+0x110/0x640 [ 487.131955] rpm_idle+0x1f4/0x2d0 [ 487.135447] __pm_runtime_idle+0x58/0x94 [ 487.139538] queue_phy_enable+0xcc/0xf0 [libsas] [ 487.144330] store_sas_phy_enable+0x74/0x100 [ 487.148770] dev_attr_store+0x20/0x34 [ 487.152606] sysfs_kf_write+0x4c/0x5c [ 487.156437] kernfs_fop_write_iter+0x120/0x1b0 [ 487.161049] vfs_write+0x2d0/0x36c [ 487.164625] ksys_write+0x70/0x100 [ 487.168194] __arm64_sys_write+0x24/0x30 [ 487.172280] invoke_syscall+0x50/0x120 [ 487.176186] el0_svc_common.constprop.0+0x168/0x190 [ 487.181214] do_el0_svc+0x34/0xc0 [ 487.184680] el0_svc+0x2c/0xb4 [ 487.187879] el0t_64_sync_handler+0xb8/0xbc [ 487.192205] el0t_64_sync+0x19c/0x1a0 We find that when all local PHYs are disabled, all the devices will be removed, the ->runtime_suspend() callback suspend_v3_hw() directly execute since the controller usage count drop to 0. On the other side, the first local PHY is enabled through the sysfs interface, and ensures that function phy_up_v3_hw() is completed due to suspend_v3_hw()-> interrupt_disable_v3_hw(). In the expander scenario, sas_discover_root_expander() is executed in event work DISCE_DISCOVER_DOMAIN, which will increases the controller usage count and carry out a resume and sends SMPIO, it cannot be completed because the runtime PM status of the controller is RPM_SUSPENDING. At the same time, the ->runtime_suspend() callback suspend_v3_hw() also cannot complete the process because of drain libsas event queue in sas_suspend_ha(), so hung occurs. (thread 1) \| (thread 2) ... \| rpm_idle() \| ... \| __update_runtime_status(RPM_SUSPENDING)\| ... \| ... suspend_v3_hw() \| smp_execute_task_sg() ... \| ... interrupt_disable_v3_hw() \| pm_runtime_get_sync() \| ... ... \| rpm_resume() //RPM_SUSPENDING \| __sas_drain_work() \| To fix this, check if the current runtime PM status of the controller allows to be suspended continue after interrupt_disable_v3_hw(), return immediately if not. Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hislicon.com> Link: https://lore.kernel.org/r/1679283265-115066-5-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: hisi_sas: Ensure all enabled PHYs up during controller reset	Yihang Li
	For the controller reset operation, hisi_sas_phy_enable() is executed for each enabled local PHY, and refresh the port id of each device based on the latest hisi_sas_phy->port_id after 1 second sleep, hisi_sas_phy->port_id is configured in the interrupt processing function phy_up_v3_hw(). However, in directly attached scenario, for some SATA disks the amount of time for phyup more than 1s sometimes. In this case, incorrect port id may be configured in hisi_sas_refresh_port_id(). As a result, all the internal IOs fail and disk lost, such as follows: [10717.666565] hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=10(sata) [10718.826813] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=63 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc8003f 0x0 0x0 Error info: 0x0 0x0 0x0 0x0 [10718.843428] sas: TMF task open reject failed 5000000000000501 [10718.849242] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=64 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80040 0x0 0x0 Error info: 0x0 0x0 0x0 0x0 [10718.865856] sas: TMF task open reject failed 5000000000000501 [10718.871670] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=65 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80041 0x0 0x0 Error info: 0x0 0x0 0x0 0x0 [10718.888284] sas: TMF task open reject failed 5000000000000501 [10718.894093] sas: executing TMF for 5000000000000501 failed after 3 attempts! [10718.901114] hisi_sas_v3_hw 0000:74:02.0: ata disk 5000000000000501 reset failed [10718.908410] hisi_sas_v3_hw 0000:74:02.0: controller reset complete ..... [10773.298633] ata216.00: revalidation failed (errno=-19) [10773.303753] ata216.00: disable device So the time of waitting for PHYs up is 1s which may be not enough. To solve the issue, running hisi_sas_phy_enable() in parallel through async operations and use wait_for_completion_timeout() to wait for PHYs come up instead of directly sleep for 1 second. Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-4-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: hisi_sas: Handle NCQ error when IPTT is valid	Xingui Yang
	If an NCQ error occurs when the IPTT is valid and slot->abort flag is set in completion path, sas_task_abort() will be called to abort only one NCQ command now, and the host would be set to SHOST_RECOVERY state. But this may not kick-off EH Immediately until other outstanding QCs timeouts. As a result, the host may remain in the SHOST_RECOVERY state for up to 30 seconds, such as follows: [7972317.645234] hisi_sas_v3_hw 0000:74:04.0: erroneous completion iptt=3264 task=00000000466116b8 dev id=2 sas_addr=0x5000000000000502 CQ hdr: 0x1883 0x20cc0 0x40000 0x20420000 Error info: 0x0 0x0 0x200000 0x0 [7972341.508264] sas: Enter sas_scsi_recover_host busy: 32 failed: 32 [7972341.984731] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 32 tries: 1 All NCQ commands that are in the queue should be aborted when an NCQ error occurs in this scenario. Fixes: 05d91b557af9 ("scsi: hisi_sas: Directly trigger SCSI error handling for completion errors") Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: hisi_sas: Grab sas_dev lock when traversing the members of sas_dev.list	Xingui Yang
	When freeing slots in function slot_complete_v3_hw(), it is possible that sas_dev.list is being traversed elsewhere, and it may trigger a NULL pointer exception, such as follows: ==>cq thread ==>scsi_eh_6 ==>scsi_error_handler() ==>sas_eh_handle_sas_errors() ==>sas_scsi_find_task() ==>lldd_abort_task() ==>slot_complete_v3_hw() ==>hisi_sas_abort_task() ==>hisi_sas_slot_task_free() ==>dereg_device_v3_hw() ==>list_del_init() ==>list_for_each_entry_safe() [ 7165.434918] sas: Enter sas_scsi_recover_host busy: 32 failed: 32 [ 7165.434926] sas: trying to find task 0x00000000769b5ba5 [ 7165.434927] sas: sas_scsi_find_task: aborting task 0x00000000769b5ba5 [ 7165.434940] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000769b5ba5) aborted [ 7165.434964] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000c9f7aa07) ignored [ 7165.434965] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000e2a1cf01) ignored [ 7165.434968] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 7165.434972] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000022d52d93) ignored [ 7165.434975] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000066a7516c) ignored [ 7165.434976] Mem abort info: [ 7165.434982] ESR = 0x96000004 [ 7165.434991] Exception class = DABT (current EL), IL = 32 bits [ 7165.434992] SET = 0, FnV = 0 [ 7165.434993] EA = 0, S1PTW = 0 [ 7165.434994] Data abort info: [ 7165.434994] ISV = 0, ISS = 0x00000004 [ 7165.434995] CM = 0, WnR = 0 [ 7165.434997] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f29543f2 [ 7165.434998] [0000000000000000] pgd=0000000000000000 [ 7165.435003] Internal error: Oops: 96000004 [#1] SMP [ 7165.439863] Process scsi_eh_6 (pid: 4109, stack limit = 0x00000000c43818d5) [ 7165.468862] pstate: 00c00009 (nzcv daif +PAN +UAO) [ 7165.473637] pc : dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.479443] lr : dereg_device_v3_hw+0x2c/0xa8 [hisi_sas_v3_hw] [ 7165.485247] sp : ffff00001d623bc0 [ 7165.488546] x29: ffff00001d623bc0 x28: ffffa027d03b9508 [ 7165.493835] x27: ffff80278ed50af0 x26: ffffa027dd31e0a8 [ 7165.499123] x25: ffffa027d9b27f88 x24: ffffa027d9b209f8 [ 7165.504411] x23: ffffa027c45b0d60 x22: ffff80278ec07c00 [ 7165.509700] x21: 0000000000000008 x20: ffffa027d9b209f8 [ 7165.514988] x19: ffffa027d9b27f88 x18: ffffffffffffffff [ 7165.520276] x17: 0000000000000000 x16: 0000000000000000 [ 7165.525564] x15: ffff0000091d9708 x14: ffff0000093b7dc8 [ 7165.530852] x13: ffff0000093b7a23 x12: 6e7265746e692067 [ 7165.536140] x11: 0000000000000000 x10: 0000000000000bb0 [ 7165.541429] x9 : ffff00001d6238f0 x8 : ffffa027d877af00 [ 7165.546718] x7 : ffffa027d6329600 x6 : ffff7e809f58ca00 [ 7165.552006] x5 : 0000000000001f8a x4 : 000000000000088e [ 7165.557295] x3 : ffffa027d9b27fa8 x2 : 0000000000000000 [ 7165.562583] x1 : 0000000000000000 x0 : 000000003000188e [ 7165.567872] Call trace: [ 7165.570309] dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.575775] hisi_sas_abort_task+0x248/0x358 [hisi_sas_main] [ 7165.581415] sas_eh_handle_sas_errors+0x258/0x8e0 [libsas] [ 7165.586876] sas_scsi_recover_host+0x134/0x458 [libsas] [ 7165.592082] scsi_error_handler+0xb4/0x488 [ 7165.596163] kthread+0x134/0x138 [ 7165.599380] ret_from_fork+0x10/0x18 [ 7165.602940] Code: d5033e9f b9000040 aa0103e2 eb03003f (f9400021) [ 7165.609004] kernel fault(0x1) notification starting on CPU 75 [ 7165.700728] ---[ end trace fc042cbbea224efc ]--- [ 7165.705326] Kernel panic - not syncing: Fatal exception To fix the issue, grab sas_dev lock when traversing the members of sas_dev.list in dereg_device_v3_hw() and hisi_sas_release_tasks() to avoid concurrency of adding and deleting member. When function hisi_sas_release_tasks() calls hisi_sas_do_release_task() to free slot, the lock cannot be grabbed again in hisi_sas_slot_task_free(), then a bool parameter need_lock is added. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: qla4xxx: Remove unused 'count' variable	Tom Rix
	clang with W=1 reports: drivers/scsi/qla4xxx/ql4_isr.c:475:11: error: variable 'count' set but not used [-Werror,-Wunused-but-set-variable] uint32_t count = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230331175757.1860780-1-trix@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: snic: Remove unused 'xfer_len' variable	Tom Rix
	clang with W=1 reports: drivers/scsi/snic/snic_scsi.c:490:6: error: variable 'xfer_len' set but not used [-Werror,-Wunused-but-set-variable] u64 xfer_len = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230328001647.1778448-1-trix@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: qedf: Remove unused 'num_handled' variable	Tom Rix
	clang with W=1 reports: drivers/scsi/qedf/qedf_main.c:2227:6: error: variable 'num_handled' set but not used [-Werror,-Wunused-but-set-variable] int num_handled = 0; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230330203444.1842425-1-trix@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()	Zhong Jinghua
	The validity of sock should be checked before assignment to avoid incorrect values. Commit 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") introduced this change which may lead to inconsistent values of tcp_sw_conn->sendpage and conn->datadgst_en. Fix the issue by moving the position of the assignment. Fixes: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com> Link: https://lore.kernel.org/r/20230329071739.2175268-1-zhongjinghua@huaweicloud.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()	Li Zetao
	There is a memory leak reported by kmemleak: unreferenced object 0xffffc900003f0000 (size 12288): comm "modprobe", pid 19117, jiffies 4299751452 (age 42490.264s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000629261a8>] __vmalloc_node_range+0xe56/0x1110 [<0000000001906886>] __vmalloc_node+0xbd/0x150 [<000000005bb4dc34>] vmalloc+0x25/0x30 [<00000000a2dc1194>] qla2x00_create_host+0x7a0/0xe30 [qla2xxx] [<0000000062b14b47>] qla2x00_probe_one+0x2eb8/0xd160 [qla2xxx] [<00000000641ccc04>] local_pci_probe+0xeb/0x1a0 The root cause is traced to an error-handling path in qla2x00_probe_one() when the adapter "base_vha" initialize failed. The fab_scan_rp "scan.l" is used to record the port information and it is allocated in qla2x00_create_host(). However, it is not released in the error handling path "probe_failed". Fix this by freeing the memory of "scan.l" when an error occurs in the adapter initialization process. Fixes: a4239945b8ad ("scsi: qla2xxx: Add switch command to simplify fabric discovery") Signed-off-by: Li Zetao <lizetao1@huawei.com> Link: https://lore.kernel.org/r/20230325110004.363898-1-lizetao1@huawei.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: scsi_transport_fc: Remove unused 'desc_cnt' variable	Tom Rix
	clang with W=1 reports: drivers/scsi/scsi_transport_fc.c:908:6: error: variable 'desc_cnt' set but not used [-Werror,-Wunused-but-set-variable] u32 desc_cnt = 0, bytes_remain; ^ This variable is not used so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230326003222.1343671-1-trix@redhat.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: sr: Simplify the sr_open() function	Enze Li
	Simplify the sr_open() by removing the goto label as the function only returns one error code. Signed-off-by: Enze Li <lienze@kylinos.cn> Link: https://lore.kernel.org/r/20230327030237.3407253-1-lienze@kylinos.cn Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)	Ranjan Kumar
	The driver is exiting from the fault watchdog thread if it sees the 0xF002 (Soft reset in progress) fault code. If the driver initiates the soft reset, then the driver restarts the watchdog at the end of the soft reset completion. However, if the soft reset is initiated by the firmware asynchronously, then the driver will never restart the watchdog and never re-initialize the controller after the asynchronous soft reset completion. Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20230331122317.11391-1-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-04-02	scsi: libsas: Abort all in-flight requests when device is gone	Jason Yan
	When a disk is removed with in-flight I/O, the application needs to wait for 30 seconds (depending on the timeout configuration) to hear back from the kernel. Xingui tried to fix this issue by aborting the ATA link for SATA devices[1], however this approach left the SAS devices unresolved. Try to fix this issue by aborting all in-flight requests when the device is gone. This is implemented by iterating over the tagset. [1] https://lore.kernel.org/lkml/234e04db-7539-07e4-a6b8-c6b05f78193d@opensource.wdc.com/T/ Cc: Xingui Yang <yangxingui@huawei.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Link: https://lore.kernel.org/r/20230330110930.175539-1-yanaijie@huawei.com Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-31	Merge branch '6.3/scsi-fixes' into 6.4/scsi-staging	Martin K. Petersen
	Pull in the fixes branch to resolve an mpi3mr conflict reported by sfr. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-30	Merge tag 'scsi-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four small fixes, three in drivers. The core fix is yet another attempt to insulate us from UFS devices' weird behaviour for VPD pages" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: Don't print sense pool info twice scsi: core: Improve scsi_vpd_inquiry() checks scsi: megaraid_sas: Fix crash after a double completion scsi: megaraid_sas: Fix fw_crash_buffer_show()
2023-03-27	net: introduce a config option to tweak MAX_SKB_FRAGS	Eric Dumazet
	Currently, MAX_SKB_FRAGS value is 17. For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag. But with zero copy, we use order-0 pages. For BIG TCP to show its full potential, we add a config option to be able to fit up to 45 segments per skb. This is also needed for BIG TCP rx zerocopy, as zerocopy currently does not support skbs with frag list. We have used MAX_SKB_FRAGS=45 value for years at Google before we deployed 4K MTU, with no adverse effect, other than a recent issue in mlx4, fixed in commit 26782aad00cc ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS") Back then, goal was to be able to receive full size (64KB) GRO packets without the frag_list overhead. Note that /proc/sys/net/core/max_skb_frags can also be used to limit the number of fragments TCP can use in tx packets. By default we keep the old/legacy value of 17 until we get more coverage for the updated values. Sizes of struct skb_shared_info on 64bit arches MAX_SKB_FRAGS \| sizeof(struct skb_shared_info): ============================================== 17 320 21 320+64 = 384 25 320+128 = 448 29 320+192 = 512 33 320+256 = 576 37 320+320 = 640 41 320+384 = 704 45 320+448 = 768 This inflation might cause problems for drivers assuming they could pack both the incoming packet (for MTU=1500) and skb_shared_info in half a page, using build_skb(). v3: fix build error when CONFIG_NET=n v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long" Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20230323162842.1935061-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-24	scsi: mpt3sas: Don't print sense pool info twice	Jerry Snitselaar
	_base_allocate_sense_dma_pool() already prints out the sense pool information, so don't print it a second time after calling it in _base_allocate_memory_pools(). In addition the version in _base_allocate_memory_pools() was using the wrong size value, sz, which was last assigned when doing some nvme calculations instead of sense_sz to determine the pool size in kilobytes. Cc: Sathya Prakash <sathya.prakash@broadcom.com> Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com> Cc: MPT-FusionLinux.pdl@broadcom.com Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Fixes: 970ac2bb70e7 ("scsi: mpt3sas: Force sense buffer allocations to be within same 4 GB region") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20230324193204.567932-1-jsnitsel@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24	scsi: core: Improve scsi_vpd_inquiry() checks	Damien Le Moal
	Some USB-SATA adapters have broken behavior when an unsupported VPD page is probed: Depending on the VPD page number, a 4-byte header with a valid VPD page number but with a 0 length is returned. Currently, scsi_vpd_inquiry() only checks that the page number is valid to determine if the page is valid, which results in receiving only the 4-byte header for the non-existent page. This error manifests itself very often with page 0xb9 for the Concurrent Positioning Ranges detection done by sd_read_cpr(), resulting in the following error message: sd 0:0:0:0: [sda] Invalid Concurrent Positioning Ranges VPD page Prevent such misleading error message by adding a check in scsi_vpd_inquiry() to verify that the page length is not 0. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20230322022211.116327-1-damien.lemoal@opensource.wdc.com Reviewed-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24	scsi: megaraid_sas: Fix crash after a double completion	Tomas Henzl
	When a physical disk is attached directly "without JBOD MAP support" (see megasas_get_tm_devhandle()) then there is no real error handling in the driver. Return FAILED instead of SUCCESS. Fixes: 18365b138508 ("megaraid_sas: Task management support") Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://lore.kernel.org/r/20230324150134.14696-1-thenzl@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24	scsi: megaraid_sas: Fix fw_crash_buffer_show()	Tomas Henzl
	If crash_dump_buf is not allocated then crash dump can't be available. Replace logical 'and' with 'or'. Signed-off-by: Tomas Henzl <thenzl@redhat.com> Link: https://lore.kernel.org/r/20230324135249.9733-1-thenzl@redhat.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24	scsi: megaraid: Fix mega_cmd_done() CMDID_INT_CMDS	Danila Chernetsov
	When cmdid == CMDID_INT_CMDS, the 'cmds' pointer is NULL but is dereferenced below. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 0f2bb84d2a68 ("[SCSI] megaraid: simplify internal command handling") Signed-off-by: Danila Chernetsov <listdansp@mail.ru> Link: https://lore.kernel.org/r/20230317175109.18585-1-listdansp@mail.ru Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24	Merge patch series "Constify most SCSI host templates"	Martin K. Petersen
	Bart Van Assche <bvanassche@acm.org> says: It helps humans and the compiler if it is made explicit that SCSI host templates are not modified. Hence this patch series that constifies most SCSI host templates. Please consider this patch series for the next merge window. Link: https://lore.kernel.org/r/20230322195515.1267197-1-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-24	scsi: xen-scsifront: Declare SCSI host template const	Bart Van Assche
	Make it explicit that the SCSI host template is not modified. Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230322195515.1267197-77-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>