Age | Commit message (Collapse) | Author |
|
The dma_map operations now support P2PDMA pages directly. So remove
the calls to pci_p2pdma_[un]map_sg_attrs() and replace them with calls
to dma_map_sgtable().
dma_map_sgtable() returns more complete error codes than dma_map_sg()
and allows differentiating EREMOTEIO errors in case an unsupported
P2PDMA transfer is requested. When this happens, return BLK_STS_TARGET
so the request isn't retried.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This adds a quirk for the Crucial P2.
Signed-off-by: Tobias Gruetzmacher <tobias-git@23.gs>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
A reset on a live device experiencing a link error still needs to have
the queue freeze state started for the subsequent reinitialization. Skip
only the register read if the device is not present instead of bypassing
the freeze checks.
Fixes: b98235d3a471e ("nvme-pci: harden drive presence detect in nvme_dev_disable()")
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add the quirk.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049
Reported-by: Chris Egolf <cegolf@ugholf.net>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
With new API blk_mq_is_reserved_rq() we can tell if a request is from
the reserved pool, so stop passing 'reserved' arg. There is actually
only a single user of that arg for all the callback implementations, which
can use blk_mq_is_reserved_rq() instead.
This will also allow us to stop passing the same 'reserved' around the
blk-mq iter functions next.
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ADATA IM2P33F8ABR1 reports bogus eui64 values that appear to be the same
across all drives. Quirk them out so they are not marked as "non globally
unique" duplicates.
Co-developed-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com>
Signed-off-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com>
Signed-off-by: Lamarque V. Souza <lamarque.souza@petrosoftdesign.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
ADATA XPG SPECTRIX S40G drives report bogus eui64 values that appear to
be the same across drives in one system. Quirk them out so they are
not marked as "non globally unique" duplicates.
Before:
[ 2.258919] nvme nvme1: pci function 0000:06:00.0
[ 2.264898] nvme nvme2: pci function 0000:05:00.0
[ 2.323235] nvme nvme1: failed to set APST feature (2)
[ 2.326153] nvme nvme2: failed to set APST feature (2)
[ 2.333935] nvme nvme1: allocated 64 MiB host memory buffer.
[ 2.336492] nvme nvme2: allocated 64 MiB host memory buffer.
[ 2.339611] nvme nvme1: 7/0/0 default/read/poll queues
[ 2.341805] nvme nvme2: 7/0/0 default/read/poll queues
[ 2.346114] nvme1n1: p1
[ 2.347197] nvme nvme2: globally duplicate IDs for nsid 1
After:
[ 2.427715] nvme nvme1: pci function 0000:06:00.0
[ 2.427771] nvme nvme2: pci function 0000:05:00.0
[ 2.488154] nvme nvme2: failed to set APST feature (2)
[ 2.489895] nvme nvme1: failed to set APST feature (2)
[ 2.498773] nvme nvme2: allocated 64 MiB host memory buffer.
[ 2.500587] nvme nvme1: allocated 64 MiB host memory buffer.
[ 2.504113] nvme nvme2: 7/0/0 default/read/poll queues
[ 2.507026] nvme nvme1: 7/0/0 default/read/poll queues
[ 2.509467] nvme nvme2: Ignoring bogus Namespace Identifiers
[ 2.512804] nvme nvme1: Ignoring bogus Namespace Identifiers
[ 2.513698] nvme1n1: p1
Signed-off-by: Pablo Greco <pgreco@centosproject.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for
all disks that do not have separately allocated queues, and thus remove
the need to call blk_cleanup_queue for them.
Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that
this function is intended only for separately allocated blk-mq queues.
This saves an extra queue freeze for devices without a separately
allocated queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This device shares the PCI ID with the Samsung 970 Evo Plus that
does not need or want the quirks. Move the the quirk entry to the
core table based on the model number instead.
Fixes: bc360b0b1611 ("nvme-pci: add quirks for Samsung X5 SSDs")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
|
|
The Micron MTFDKBA2T0TFH device reports the same subsysem NQN for
all devices. Add a quick to ignore it.
Signed-off-by: Leo Savernik <l.savernik@aon.at>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Like commit 5611ec2b9814 ("nvme-pci: prevent SK hynix PC400 from using
Write Zeroes command"), UMIS and Samsung has the same issue:
[ 6305.633887] blk_update_request: operation not supported error,
dev nvme0n1, sector 340812032 op 0x9:(WRITE_ZEROES) flags 0x0
phys_seg 0 prio class 0
So also disable Write Zeroes command on UMIS and Samsung.
Signed-off-by: rasheed.hsueh <rasheed.hsueh@lcfc.corp-partner.google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When ZHITAI TiPro7000 SSDs entered deepest power state(ps4)
it has the same APST sleep problem as Kingston A2000.
by chance the system crashes and displays the same dmesg info:
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65
As the Archlinux wiki suggest (enlat + exlat) < 25000 is fine
and my testing shows no system crashes ever since.
Therefore disabling the deepest power state will fix the APST sleep issue.
https://wiki.archlinux.org/title/Solid_state_drive/NVMe
This is the APST data from 'nvme id-ctrl /dev/nvme1'
NVME Identify Controller:
vid : 0x1e49
ssvid : 0x1e49
sn : [...]
mn : ZHITAI TiPro7000 1TB
fr : ZTA32F3Y
[...]
ps 0 : mp:3.50W operational enlat:5 exlat:5 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:3.30W operational enlat:50 exlat:100 rrt:1 rrl:1
rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:2.80W operational enlat:50 exlat:200 rrt:2 rrl:2
rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.1500W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0200W non-operational enlat:2000 exlat:60000 rrt:4 rrl:4
rwt:4 rwl:4 idle_power:- active_power:-
Signed-off-by: Ning Wang <ningwang35@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add the quirk.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add the quirk.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216096
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add the quirk.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
ADATA XPG GAMMIX S50 drives report bogus eui64 values that appear to
be the same across drives in one system. Quirk them out so they are
not marked as "non globally unique" duplicates.
Signed-off-by: Stefan Reiter <stefan@pimaker.at>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Many users have encountered IO timeouts with a CSTS value of 0xffffffff,
which indicates a failure to read the register. While there are various
potential causes for this observation, faulty NVMe APST has been the
culprit quite frequently. Add the recommended troubleshooting steps in
the error output when this condition occurs.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The recent global id check is finding poorly implemented devices in the
wild. Include relavant device information in the output to help quicken
an appropriate quirk patch.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Pull more block driver updates from Jens Axboe:
"A collection of stragglers that were late on sending in their changes
and just followup fixes.
- NVMe fixes pull request via Christoph:
- set controller enable bit in a separate write (Niklas Cassel)
- disable namespace identifiers for the MAXIO MAP1001 (Christoph)
- fix a comment typo (Julia Lawall)"
- MD fixes pull request via Song:
- Remove uses of bdevname (Christoph Hellwig)
- Bug fixes (Guoqing Jiang, and Xiao Ni)
- bcache fixes series (Coly)
- null_blk zoned write fix (Damien)
- nbd fixes (Yu, Zhang)
- Fix for loop partition scanning (Christoph)"
* tag 'for-5.19/drivers-2022-06-02' of git://git.kernel.dk/linux-block: (23 commits)
block: null_blk: Fix null_zone_write()
nvmet: fix typo in comment
nvme: set controller enable bit in a separate write
nvme-pci: disable namespace identifiers for the MAXIO MAP1001
bcache: avoid unnecessary soft lockup in kworker update_writeback_rate()
nbd: use pr_err to output error message
nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
nbd: fix io hung while disconnecting device
nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
nbd: fix race between nbd_alloc_config() and module removal
nbd: call genl_unregister_family() first in nbd_cleanup()
md: bcache: check the return value of kzalloc() in detached_dev_do_request()
bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()
block, loop: support partitions without scanning
bcache: avoid journal no-space deadlock by reserving 1 journal bucket
bcache: remove incremental dirty sector counting for bch_sectors_dirty_init()
bcache: improve multithreaded bch_sectors_dirty_init()
bcache: improve multithreaded bch_btree_check()
md: fix double free of io_acct_set bioset
md: Don't set mddev private to NULL in raid0 pers->free
...
|
|
The MAXIO MAP1001 controllers reports completely bogus Namespace
identifiers that even change after suspend cycles. Disable using
the Identifiers entirely.
Reported-by: Arman Hajishafieha <arman.hajishafieha@hotmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Arman Hajishafieha <arman.hajishafieha@hotmail.com>
|
|
Let the caller set it together with the end_io_data instead of passing
a pointless argument. Note the the target code did in fact already
set it and then just overrode it again by calling blk_execute_rq_nowait.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
On our ZynqMP system we observe, that a NVMe drive that resets itself
while doing a firmware update causes a Kernel crash like this:
[ 67.720772] pcieport 0000:02:02.0: pciehp: Slot(2): Link Down
[ 67.720783] pcieport 0000:02:02.0: pciehp: Slot(2): Card not present
[ 67.720795] nvme 0000:04:00.0: PME# disabled
[ 67.720849] Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
[ 67.720853] nwl-pcie fd0e0000.pcie: Slave error
Analysis: When nvme_dev_disable() is called because of this PCIe hotplug
event, pci_is_enabled() is still true. And accessing the NVMe drive
which is currently not available as it's in reboot process causes this
"synchronous external abort" on this ARM64 platform.
This patch adds the pci_device_is_present() check as well, which returns
false in this "Card not present" hot-plug case. With this change, the
NVMe driver does not try to access the NVMe registers any more and the
FW update finishes without any problems.
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
In nvme_alloc_admin_tags, the admin_q can be set to an error (typically
-ENOMEM) if the blk_mq_init_queue call fails to set up the queue, which
is checked immediately after the call. However, when we return the error
message up the stack, to nvme_reset_work the error takes us to
nvme_remove_dead_ctrl()
nvme_dev_disable()
nvme_suspend_queue(&dev->queues[0]).
Here, we only check that the admin_q is non-NULL, rather than not
an error or NULL, and begin quiescing a queue that never existed, leading
to bad / NULL pointer dereference.
Signed-off-by: Kyle Smith <kyles@hpe.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Most of the internal passthru commands use __nvme_submit_sync_cmd()
interface. There are few places we open code the request submission :-
1. nvme_keep_alive_work(struct work_struct *work)
2. nvme_timeout(struct request *req, bool reserved)
3. nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
Mark the internal passthru request quiet so that we can skip the verbose
error message from nvme_log_error() in nvme_end_req() completion path,
this will be consistent with what we have in __nvme_submit_sync_cmd().
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Alan Adamson <alan.adamson@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Qemu unconditionally reports a UUID, which depending on the qemu version
is either all-null (which is incorrect but harmless) or contains a single
bit set for all controllers. In addition it can also optionally report
a eui64 which needs to be manually set. Disable namespace identifiers
for Qemu controlles entirely even if in some cases they could be set
correctly through manual intervention.
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
|
The MAXIO MAP1002/1202 controllers reports completely bogus Namespace
identifiers that even change after suspend cycles. Disable using
the Identifiers entirely.
Reported-by: 金韬 <me@kingtous.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: 金韬 <me@kingtous.cn>
|
|
Pull block driver fixes from Jens Axboe:
"Followup block driver updates and fixes for the 5.18-rc1 merge window.
In detail:
- NVMe pull request
- Fix multipath hang when disk goes live over reconnect (Anton
Eidelman)
- fix RCU hole that allowed for endless looping in multipath
round robin (Chris Leech)
- remove redundant assignment after left shift (Colin Ian King)
- add quirks for Samsung X5 SSDs (Monish Kumar R)
- fix the read-only state for zoned namespaces with unsupposed
features (Pankaj Raghav)
- use a private workqueue instead of the system workqueue in
nvmet (Sagi Grimberg)
- allow duplicate NSIDs for private namespaces (Sungup Moon)
- expose use_threaded_interrupts read-only in sysfs (Xin Hao)"
- nbd minor allocation fix (Zhang)
- drbd fixes and maintainer addition (Lars, Jakob, Christoph)
- n64cart build fix (Jackie)
- loop compat ioctl fix (Carlos)
- misc fixes (Colin, Dongli)"
* tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block:
drbd: remove check of list iterator against head past the loop body
drbd: remove usage of list iterator variable after loop
nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
MAINTAINERS: add drbd co-maintainer
drbd: fix potential silent data corruption
loop: fix ioctl calls using compat_loop_info
nvme-multipath: fix hang when disk goes live over reconnect
nvme: fix RCU hole that allowed for endless looping in multipath round robin
nvme: allow duplicate NSIDs for private namespaces
nvmet: remove redundant assignment after left shift
nvmet: use a private workqueue instead of the system workqueue
nvme-pci: add quirks for Samsung X5 SSDs
nvme-pci: expose use_threaded_interrupts read-only in sysfs
nvme: fix the read-only state for zoned namespaces with unsupposed features
n64cart: convert bi_disk to bi_bdev->bd_disk fix build
xen/blkfront: fix comment for need_copy
xen-blkback: remove redundant assignment to variable i
|
|
Add quirks to not fail the initialization and to have quick resume
latency after cold/warm reboot.
Signed-off-by: Monish Kumar R <monish.kumar.r@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Allow reading /sys/module/nvme/parameters/use_threaded_interrupts to see
if the use_threaded_interrupts module parameter is in use.
Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
|
|
Just open code the allocation + initialization in the callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
|
|
Move the check for the actual pgmap types that need the free at refcount
one behavior into the out of line helper, and thus avoid the need to
pull memremap.h into mm.h.
Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
|
The Intel P4500/P4600 SSDs do not report a subsystem NQN despite claiming
compliance to a standards version where reporting one is required.
Add the IGNORE_DEV_SUBNQN quirk to not fail the initialization of a
second such SSDs in a system.
Signed-off-by: Zheng Wu <wu.zheng@intel.com>
Signed-off-by: Ye Jinhe <jinhe.ye@intel.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
If command prep fails, current handling will orphan subsequent requests
in the list. Consider a simple example:
rqlist = [ 1 -> 2 ]
When prep for request '1' fails, it will be appended to the
'requeue_list', leaving request '2' disconnected from the original
rqlist and no longer tracked. Meanwhile, rqlist is still pointing to the
failed request '1' and will attempt to submit the unprepped command.
Fix this by updating the rqlist accordingly using the request list
helper functions.
Fixes: d62cbcf62f2f ("nvme: add support for mq_ops->queue_rqs()")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220105170518.3181469-5-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This enables the block layer to send us a full plug list of requests
that need submitting. The block layer guarantees that they all belong
to the same queue, but we do have to check the hardware queue mapping
for each request.
If errors are encountered, leave them in the passed in list. Then the
block layer will handle them individually.
This is good for about a 4% improvement in peak performance, taking us
from 9.6M to 10M IOPS/core.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a nvme_prep_rq() helper to setup a command, and nvme_queue_rq() is
adapted to use this helper.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We'll need it for batched submit as well. Since we now have a copy
helper, get rid of the nvme_submit_cmd() wrapper.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait
given that it is unused now. Also convert the boolean at_head parameter
to actually use the bool type while touching the prototype.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull block driver updates from Jens Axboe:
- paride driver cleanups (Christoph)
- Remove cryptoloop support (Christoph)
- null_blk poll support (me)
- Now that add_disk() supports proper error handling, add it to various
drivers (Luis)
- Make ataflop actually work again (Michael)
- s390 dasd fixes (Stefan, Heiko)
- nbd fixes (Yu, Ye)
- Remove redundant wq flush in mtip32xx (Christophe)
- NVMe updates
- fix a multipath partition scanning deadlock (Hannes Reinecke)
- generate uevent once a multipath namespace is operational again
(Hannes Reinecke)
- support unique discovery controller NQNs (Hannes Reinecke)
- fix use-after-free when a port is removed (Israel Rukshin)
- clear shadow doorbell memory on resets (Keith Busch)
- use struct_size (Len Baker)
- add error handling support for add_disk (Luis Chamberlain)
- limit the maximal queue size for RDMA controllers (Max Gurtovoy)
- use a few more symbolic names (Max Gurtovoy)
- fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy)
- add support for ->map_queues on FC (Saurav Kashyap)
- support the current discovery subsystem entry (Hannes Reinecke)
- use flex_array_size and struct_size (Len Baker)
- bcache fixes (Christoph, Coly, Chao, Lin, Qing)
- MD updates (Christoph, Guoqing, Xiao)
- Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye)
* tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits)
null_blk: Fix handling of submit_queues and poll_queues attributes
block: ataflop: Fix warning comparing pointer to 0
bcache: replace snprintf in show functions with sysfs_emit
bcache: move uapi header bcache.h to bcache code directory
nvmet: use flex_array_size and struct_size
nvmet: register discovery subsystem as 'current'
nvmet: switch check for subsystem type
nvme: add new discovery log page entry definitions
block: ataflop: more blk-mq refactoring fixes
block: remove support for cryptoloop and the xor transfer
mtd: add add_disk() error handling
rnbd: add error handling support for add_disk()
um/drivers/ubd_kern: add error handling support for add_disk()
m68k/emu/nfblock: add error handling support for add_disk()
xen-blkfront: add error handling support for add_disk()
bcache: add error handling support for add_disk()
dm: add add_disk() error handling
block: aoe: fixup coccinelle warnings
nvmet: use struct_size over open coded arithmetic
nvme: drop scan_lock and always kick requeue list when removing namespaces
...
|
|
The host memory doorbell and event buffers need to be initialized on
each reset so the driver doesn't observe stale values from the previous
instantiation.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Tested-by: John Levon <john.levon@nutanix.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Apply the added two APIs to quiesce/unquiesce admin queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014081710.1871747-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Trivial to do now, just need our own io_comp_batch on the stack and pass
that in to the usual command completion handling.
I pondered making this dependent on how many entries we had to process,
but even for a single entry there's no discernable difference in
performance or latency. Running a sync workload over io_uring:
t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
yields the below performance before the patch:
IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
and the following after:
IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
which definitely isn't slower, about the same if you factor in a bit of
variance. For peak performance workloads, benchmarking shows a 2%
improvement.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Take advantage of struct io_comp_batch, if passed in to the nvme poll
handler. If it's set, rather than complete each request individually
inline, store them in the io_comp_batch list. We only do so for requests
that will complete successfully, anything else will be completed inline as
before.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
struct io_comp_batch contains a list head and a completion handler, which
will allow completions to more effciently completed batches of IO.
For now, no functional changes in this patch, we just define the
io_comp_batch structure and add the argument to the file_operations iopoll
handler.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Split the integrity/metadata handling definitions out into a new header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.15:
- fix the abort command id (Keith Busch)
- nvme: fix per-namespace chardev deletion (Adam Manzanares)"
* tag 'nvme-5.15-2021-10-14' of git://git.infradead.org/nvme:
nvme: fix per-namespace chardev deletion
nvme-pci: Fix abort command id
|
|
The request tag is no longer the only component of the command id.
Fixes: e7006de6c2380 ("nvme: code command_id with a genctr for use-after-free validation")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Some apple controllers use the command id as an index to implementation
specific data structures and will fail if the value is out of bounds.
The nvme driver's recently introduced command sequence number breaks
this controller.
Provide a quirk so these spec incompliant controllers can function as
before. The driver will not have the ability to detect bad completions
when this quirk is used, but we weren't previously checking this anyway.
The quirk bit was selected so that it can readily apply to stable.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214509
Cc: Sven Peter <sven@svenpeter.dev>
Reported-by: Orlando Chamberlain <redecorating@protonmail.com>
Reported-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20210927154306.387437-1-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The NVMe host memory buffer may consume a non-negligable amount of
memory. Controllers are required to function without the host memory
buffer enabled, but with possibly degraded performance. Export a sysfs
property to toggle this feature on a per-device granularity so users may
choose to reclaim memory at the expense of storage performance.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|