summaryrefslogtreecommitdiff
path: root/drivers/nvme/host/pci.c
AgeCommit message (Collapse)Author
2022-07-26nvme-pci: convert to using dma_map_sgtable()Logan Gunthorpe
The dma_map operations now support P2PDMA pages directly. So remove the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls to dma_map_sgtable(). dma_map_sgtable() returns more complete error codes than dma_map_sg() and allows differentiating EREMOTEIO errors in case an unsupported P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET so the request isn't retried. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-26nvme-pci: check DMA ops when indicating support for PCI P2PDMALogan Gunthorpe
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops flags can be checked for PCI P2PDMA support. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-25nvme-pci: Crucial P2 has bogus namespace idsTobias Gruetzmacher
This adds a quirk for the Crucial P2. Signed-off-by: Tobias Gruetzmacher <tobias-git@23.gs> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-14nvme-pci: fix freeze accounting for error handlingKeith Busch
A reset on a live device experiencing a link error still needs to have the queue freeze state started for the subsequent reinitialization. Skip only the register read if the device is not present instead of bypassing the freeze checks. Fixes: b98235d3a471e ("nvme-pci: harden drive presence detect in nvme_dev_disable()") Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06nvme-pci: phison e16 has bogus namespace idsKeith Busch
Add the quirk. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049 Reported-by: Chris Egolf <cegolf@ugholf.net> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-06blk-mq: Drop blk_mq_ops.timeout 'reserved' argJohn Garry
With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-30nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1Lamarque Vieira Souza
ADATA IM2P33F8ABR1 reports bogus eui64 values that appear to be the same across all drives. Quirk them out so they are not marked as "non globally unique" duplicates. Co-developed-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Lamarque V. Souza <lamarque.souza@petrosoftdesign.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-29nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)Pablo Greco
ADATA XPG SPECTRIX S40G drives report bogus eui64 values that appear to be the same across drives in one system. Quirk them out so they are not marked as "non globally unique" duplicates. Before: [ 2.258919] nvme nvme1: pci function 0000:06:00.0 [ 2.264898] nvme nvme2: pci function 0000:05:00.0 [ 2.323235] nvme nvme1: failed to set APST feature (2) [ 2.326153] nvme nvme2: failed to set APST feature (2) [ 2.333935] nvme nvme1: allocated 64 MiB host memory buffer. [ 2.336492] nvme nvme2: allocated 64 MiB host memory buffer. [ 2.339611] nvme nvme1: 7/0/0 default/read/poll queues [ 2.341805] nvme nvme2: 7/0/0 default/read/poll queues [ 2.346114] nvme1n1: p1 [ 2.347197] nvme nvme2: globally duplicate IDs for nsid 1 After: [ 2.427715] nvme nvme1: pci function 0000:06:00.0 [ 2.427771] nvme nvme2: pci function 0000:05:00.0 [ 2.488154] nvme nvme2: failed to set APST feature (2) [ 2.489895] nvme nvme1: failed to set APST feature (2) [ 2.498773] nvme nvme2: allocated 64 MiB host memory buffer. [ 2.500587] nvme nvme1: allocated 64 MiB host memory buffer. [ 2.504113] nvme nvme2: 7/0/0 default/read/poll queues [ 2.507026] nvme nvme1: 7/0/0 default/read/poll queues [ 2.509467] nvme nvme2: Ignoring bogus Namespace Identifiers [ 2.512804] nvme nvme1: Ignoring bogus Namespace Identifiers [ 2.513698] nvme1n1: p1 Signed-off-by: Pablo Greco <pgreco@centosproject.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-28block: simplify disk shutdownChristoph Hellwig
Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-23nvme: move the Samsung X5 quirk entry to the core quirksChristoph Hellwig
This device shares the PCI ID with the Samsung 970 Evo Plus that does not need or want the quirks. Move the the quirk entry to the core table based on the model number instead. Fixes: bc360b0b1611 ("nvme-pci: add quirks for Samsung X5 SSDs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
2022-06-23nvme: add a bogus subsystem NQN quirk for Micron MTFDKBA2T0TFHLeo Savernik
The Micron MTFDKBA2T0TFH device reports the same subsysem NQN for all devices. Add a quick to ignore it. Signed-off-by: Leo Savernik <l.savernik@aon.at> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme-pci: disable write zeros support on UMIC and Samsung SSDsrasheed.hsueh
Like commit 5611ec2b9814 ("nvme-pci: prevent SK hynix PC400 from using Write Zeroes command"), UMIS and Samsung has the same issue: [ 6305.633887] blk_update_request: operation not supported error, dev nvme0n1, sector 340812032 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0 So also disable Write Zeroes command on UMIS and Samsung. Signed-off-by: rasheed.hsueh <rasheed.hsueh@lcfc.corp-partner.google.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDsNing Wang
When ZHITAI TiPro7000 SSDs entered deepest power state(ps4) it has the same APST sleep problem as Kingston A2000. by chance the system crashes and displays the same dmesg info: https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65 As the Archlinux wiki suggest (enlat + exlat) < 25000 is fine and my testing shows no system crashes ever since. Therefore disabling the deepest power state will fix the APST sleep issue. https://wiki.archlinux.org/title/Solid_state_drive/NVMe This is the APST data from 'nvme id-ctrl /dev/nvme1' NVME Identify Controller: vid : 0x1e49 ssvid : 0x1e49 sn : [...] mn : ZHITAI TiPro7000 1TB fr : ZTA32F3Y [...] ps 0 : mp:3.50W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 1 : mp:3.30W operational enlat:50 exlat:100 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:- ps 2 : mp:2.80W operational enlat:50 exlat:200 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:- ps 3 : mp:0.1500W non-operational enlat:500 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0200W non-operational enlat:2000 exlat:60000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:- Signed-off-by: Ning Wang <ningwang35@outlook.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme-pci: sk hynix p31 has bogus namespace idsKeith Busch
Add the quirk. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049 Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme-pci: smi has bogus namespace idsKeith Busch
Add the quirk. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216096 Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme-pci: phison e12 has bogus namespace idsKeith Busch
Add the quirk. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049 Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S50Stefan Reiter
ADATA XPG GAMMIX S50 drives report bogus eui64 values that appear to be the same across drives in one system. Quirk them out so they are not marked as "non globally unique" duplicates. Signed-off-by: Stefan Reiter <stefan@pimaker.at> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme-pci: add trouble shooting steps for timeoutsKeith Busch
Many users have encountered IO timeouts with a CSTS value of 0xffffffff, which indicates a failure to read the register. While there are various potential causes for this observation, faulty NVMe APST has been the culprit quite frequently. Add the recommended troubleshooting steps in the error output when this condition occurs. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-13nvme: add bug report info for global duplicate idKeith Busch
The recent global id check is finding poorly implemented devices in the wild. Include relavant device information in the output to help quicken an appropriate quirk patch. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-03Merge tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull more block driver updates from Jens Axboe: "A collection of stragglers that were late on sending in their changes and just followup fixes. - NVMe fixes pull request via Christoph: - set controller enable bit in a separate write (Niklas Cassel) - disable namespace identifiers for the MAXIO MAP1001 (Christoph) - fix a comment typo (Julia Lawall)" - MD fixes pull request via Song: - Remove uses of bdevname (Christoph Hellwig) - Bug fixes (Guoqing Jiang, and Xiao Ni) - bcache fixes series (Coly) - null_blk zoned write fix (Damien) - nbd fixes (Yu, Zhang) - Fix for loop partition scanning (Christoph)" * tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block: (23 commits) block: null_blk: Fix null_zone_write() nvmet: fix typo in comment nvme: set controller enable bit in a separate write nvme-pci: disable namespace identifiers for the MAXIO MAP1001 bcache: avoid unnecessary soft lockup in kworker update_writeback_rate() nbd: use pr_err to output error message nbd: fix possible overflow on 'first_minor' in nbd_dev_add() nbd: fix io hung while disconnecting device nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed nbd: fix race between nbd_alloc_config() and module removal nbd: call genl_unregister_family() first in nbd_cleanup() md: bcache: check the return value of kzalloc() in detached_dev_do_request() bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() block, loop: support partitions without scanning bcache: avoid journal no-space deadlock by reserving 1 journal bucket bcache: remove incremental dirty sector counting for bch_sectors_dirty_init() bcache: improve multithreaded bch_sectors_dirty_init() bcache: improve multithreaded bch_btree_check() md: fix double free of io_acct_set bioset md: Don't set mddev private to NULL in raid0 pers->free ...
2022-05-31nvme-pci: disable namespace identifiers for the MAXIO MAP1001Christoph Hellwig
The MAXIO MAP1001 controllers reports completely bogus Namespace identifiers that even change after suspend cycles. Disable using the Identifiers entirely. Reported-by: Arman Hajishafieha <arman.hajishafieha@hotmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Tested-by: Arman Hajishafieha <arman.hajishafieha@hotmail.com>
2022-05-28blk-mq: remove the done argument to blk_execute_rq_nowaitChristoph Hellwig
Let the caller set it together with the end_io_data instead of passing a pointless argument. Note the the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-16nvme-pci: harden drive presence detect in nvme_dev_disable()Stefan Roese
On our ZynqMP system we observe, that a NVMe drive that resets itself while doing a firmware update causes a Kernel crash like this: [ 67.720772] pcieport 0000:02:02.0: pciehp: Slot(2): Link Down [ 67.720783] pcieport 0000:02:02.0: pciehp: Slot(2): Card not present [ 67.720795] nvme 0000:04:00.0: PME# disabled [ 67.720849] Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP [ 67.720853] nwl-pcie fd0e0000.pcie: Slave error Analysis: When nvme_dev_disable() is called because of this PCIe hotplug event, pci_is_enabled() is still true. And accessing the NVMe drive which is currently not available as it's in reboot process causes this "synchronous external abort" on this ARM64 platform. This patch adds the pci_device_is_present() check as well, which returns false in this "Card not present" hot-plug case. With this change, the NVMe driver does not try to access the NVMe registers any more and the FW update finishes without any problems. Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-05-16nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tagsSmith, Kyle Miller (Nimble Kernel)
In nvme_alloc_admin_tags, the admin_q can be set to an error (typically -ENOMEM) if the blk_mq_init_queue call fails to set up the queue, which is checked immediately after the call. However, when we return the error message up the stack, to nvme_reset_work the error takes us to nvme_remove_dead_ctrl() nvme_dev_disable() nvme_suspend_queue(&dev->queues[0]). Here, we only check that the admin_q is non-NULL, rather than not an error or NULL, and begin quiescing a queue that never existed, leading to bad / NULL pointer dereference. Signed-off-by: Kyle Smith <kyles@hpe.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-05-16nvme: mark internal passthru request RQF_QUIETChaitanya Kulkarni
Most of the internal passthru commands use __nvme_submit_sync_cmd() interface. There are few places we open code the request submission :- 1. nvme_keep_alive_work(struct work_struct *work) 2. nvme_timeout(struct request *req, bool reserved) 3. nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode) Mark the internal passthru request quiet so that we can skip the verbose error message from nvme_log_error() in nvme_end_req() completion path, this will be consistent with what we have in __nvme_submit_sync_cmd(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Alan Adamson <alan.adamson@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-04-15nvme-pci: disable namespace identifiers for Qemu controllersChristoph Hellwig
Qemu unconditionally reports a UUID, which depending on the qemu version is either all-null (which is incorrect but harmless) or contains a single bit set for all controllers. In addition it can also optionally report a eui64 which needs to be manually set. Disable namespace identifiers for Qemu controlles entirely even if in some cases they could be set correctly through manual intervention. Reported-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-04-15nvme-pci: disable namespace identifiers for the MAXIO MAP1002/1202Christoph Hellwig
The MAXIO MAP1002/1202 controllers reports completely bogus Namespace identifiers that even change after suspend cycles. Disable using the Identifiers entirely. Reported-by: 金韬 <me@kingtous.cn> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Tested-by: 金韬 <me@kingtous.cn>
2022-04-01Merge tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver fixes from Jens Axboe: "Followup block driver updates and fixes for the 5.18-rc1 merge window. In detail: - NVMe pull request - Fix multipath hang when disk goes live over reconnect (Anton Eidelman) - fix RCU hole that allowed for endless looping in multipath round robin (Chris Leech) - remove redundant assignment after left shift (Colin Ian King) - add quirks for Samsung X5 SSDs (Monish Kumar R) - fix the read-only state for zoned namespaces with unsupposed features (Pankaj Raghav) - use a private workqueue instead of the system workqueue in nvmet (Sagi Grimberg) - allow duplicate NSIDs for private namespaces (Sungup Moon) - expose use_threaded_interrupts read-only in sysfs (Xin Hao)" - nbd minor allocation fix (Zhang) - drbd fixes and maintainer addition (Lars, Jakob, Christoph) - n64cart build fix (Jackie) - loop compat ioctl fix (Carlos) - misc fixes (Colin, Dongli)" * tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block: drbd: remove check of list iterator against head past the loop body drbd: remove usage of list iterator variable after loop nbd: fix possible overflow on 'first_minor' in nbd_dev_add() MAINTAINERS: add drbd co-maintainer drbd: fix potential silent data corruption loop: fix ioctl calls using compat_loop_info nvme-multipath: fix hang when disk goes live over reconnect nvme: fix RCU hole that allowed for endless looping in multipath round robin nvme: allow duplicate NSIDs for private namespaces nvmet: remove redundant assignment after left shift nvmet: use a private workqueue instead of the system workqueue nvme-pci: add quirks for Samsung X5 SSDs nvme-pci: expose use_threaded_interrupts read-only in sysfs nvme: fix the read-only state for zoned namespaces with unsupposed features n64cart: convert bi_disk to bi_bdev->bd_disk fix build xen/blkfront: fix comment for need_copy xen-blkback: remove redundant assignment to variable i
2022-03-23nvme-pci: add quirks for Samsung X5 SSDsMonish Kumar R
Add quirks to not fail the initialization and to have quick resume latency after cold/warm reboot. Signed-off-by: Monish Kumar R <monish.kumar.r@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-23nvme-pci: expose use_threaded_interrupts read-only in sysfsXin Hao
Allow reading /sys/module/nvme/parameters/use_threaded_interrupts to see if the use_threaded_interrupts module parameter is in use. Signed-off-by: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-22Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ...
2022-03-16nvme: remove nvme_alloc_request and nvme_alloc_request_qidChristoph Hellwig
Just open code the allocation + initialization in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-03-03mm: don't include <linux/memremap.h> in <linux/mm.h>Christoph Hellwig
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-01-27nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDsWu Zheng
The Intel P4500/P4600 SSDs do not report a subsystem NQN despite claiming compliance to a standards version where reporting one is required. Add the IGNORE_DEV_SUBNQN quirk to not fail the initialization of a second such SSDs in a system. Signed-off-by: Zheng Wu <wu.zheng@intel.com> Signed-off-by: Ye Jinhe <jinhe.ye@intel.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-01-05nvme-pci: fix queue_rqs list splittingKeith Busch
If command prep fails, current handling will orphan subsequent requests in the list. Consider a simple example: rqlist = [ 1 -> 2 ] When prep for request '1' fails, it will be appended to the 'requeue_list', leaving request '2' disconnected from the original rqlist and no longer tracked. Meanwhile, rqlist is still pointing to the failed request '1' and will attempt to submit the unprepped command. Fix this by updating the rqlist accordingly using the request list helper functions. Fixes: d62cbcf62f2f ("nvme: add support for mq_ops->queue_rqs()") Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220105170518.3181469-5-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-16nvme: add support for mq_ops->queue_rqs()Jens Axboe
This enables the block layer to send us a full plug list of requests that need submitting. The block layer guarantees that they all belong to the same queue, but we do have to check the hardware queue mapping for each request. If errors are encountered, leave them in the passed in list. Then the block layer will handle them individually. This is good for about a 4% improvement in peak performance, taking us from 9.6M to 10M IOPS/core. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-16nvme: separate command prep and issueJens Axboe
Add a nvme_prep_rq() helper to setup a command, and nvme_queue_rq() is adapted to use this helper. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-16nvme: split command copy into a helperJens Axboe
We'll need it for batched submit as well. Since we now have a copy helper, get rid of the nvme_submit_cmd() wrapper. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29block: remove the gendisk argument to blk_execute_rqChristoph Hellwig
Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait given that it is unused now. Also convert the boolean at_head parameter to actually use the bool type while touching the prototype. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...
2021-10-20nvme-pci: clear shadow doorbell memory on resetsKeith Busch
The host memory doorbell and event buffers need to be initialized on each reset so the driver doesn't observe stale values from the previous instantiation. Signed-off-by: Keith Busch <kbusch@kernel.org> Tested-by: John Levon <john.levon@nutanix.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-19nvme: apply nvme API to quiesce/unquiesce admin queueMing Lei
Apply the added two APIs to quiesce/unquiesce admin queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18nvme: wire up completion batching for the IRQ pathJens Axboe
Trivial to do now, just need our own io_comp_batch on the stack and pass that in to the usual command completion handling. I pondered making this dependent on how many entries we had to process, but even for a single entry there's no discernable difference in performance or latency. Running a sync workload over io_uring: t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1 yields the below performance before the patch: IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1) and the following after: IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1) IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1) which definitely isn't slower, about the same if you factor in a bit of variance. For peak performance workloads, benchmarking shows a 2% improvement. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18nvme: add support for batched completion of polled IOJens Axboe
Take advantage of struct io_comp_batch, if passed in to the nvme poll handler. If it's set, rather than complete each request individually inline, store them in the io_comp_batch list. We only do so for requests that will complete successfully, anything else will be completed inline as before. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: add a struct io_comp_batch argument to fops->iopoll()Jens Axboe
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: move integrity handling out of <linux/blkdev.h>Christoph Hellwig
Split the integrity/metadata handling definitions out into a new header. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-14Merge tag 'nvme-5.15-2021-10-14' of git://git.infradead.org/nvme into block-5.15Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 5.15: - fix the abort command id (Keith Busch) - nvme: fix per-namespace chardev deletion (Adam Manzanares)" * tag 'nvme-5.15-2021-10-14' of git://git.infradead.org/nvme: nvme: fix per-namespace chardev deletion nvme-pci: Fix abort command id
2021-10-07nvme-pci: Fix abort command idKeith Busch
The request tag is no longer the only component of the command id. Fixes: e7006de6c2380 ("nvme: code command_id with a genctr for use-after-free validation") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2021-09-27nvme: add command id quirk for apple controllersKeith Busch
Some apple controllers use the command id as an index to implementation specific data structures and will fail if the value is out of bounds. The nvme driver's recently introduced command sequence number breaks this controller. Provide a quirk so these spec incompliant controllers can function as before. The driver will not have the ability to detect bad completions when this quirk is used, but we weren't previously checking this anyway. The quirk bit was selected so that it can readily apply to stable. Link: https://bugzilla.kernel.org/show_bug.cgi?id=214509 Cc: Sven Peter <sven@svenpeter.dev> Reported-by: Orlando Chamberlain <redecorating@protonmail.com> Reported-by: Aditya Garg <gargaditya08@live.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20210927154306.387437-1-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-16nvme: allow user toggling hmb usageKeith Busch
The NVMe host memory buffer may consume a non-negligable amount of memory. Controllers are required to function without the host memory buffer enabled, but with possibly degraded performance. Export a sysfs property to toggle this feature on a per-device granularity so users may choose to reclaim memory at the expense of storage performance. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>