path: root/drivers/md
Age | Commit message | Author
2018-05-03 | bcache: store disk name in struct cache and struct cached_dev | Coly Li
Current code uses bdevname() or bio_devname() to reference the gendisk disk name when bcache needs to display disk names in kernel messages. This was safe before the bcache device failure handling patch set was merged, because when devices failed, a deadlock prevented bcache from printing error messages with the gendisk disk name. But after the failure handling patch set was merged, the deadlock is fixed, so it is possible that the gendisk structure bdev->bd_disk is released when bdevname() is called to reference bdev->bd_disk->disk_name[]. This is why I received a bug report of a NULL pointer dereference panic. This patch stores the gendisk disk name in a buffer inside struct cache and struct cached_dev, so printing the name of an offline device no longer references bdev->bd_disk. This patch also avoids extra calls to bdevname() and bio_devname(). Changelog: v3, add Reviewed-by from Hannes. v2, call bdevname() earlier in register_bdev() v1, first version with suggestion from Junhui Tang. Fixes: c7b7bd07404c5 ("bcache: add io_disable to struct cached_dev") Fixes: 5138ac6748e38 ("bcache: fix misleading error message in bch_count_io_errors()") Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
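A minimal userspace sketch of the approach, assuming hypothetical types (`struct demo_gendisk`, `struct demo_cached_dev`) and a 32-byte buffer standing in for the kernel's `BDEVNAME_SIZE`: the name is copied once while the gendisk is known to be alive, and later messages read only the private copy. The field name `backing_dev_name` is chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's BDEVNAME_SIZE (assumed value for this sketch). */
#define DEMO_NAME_SIZE 32

/* Hypothetical stand-in for a gendisk that may be released while failing. */
struct demo_gendisk {
	char disk_name[DEMO_NAME_SIZE];
};

/* Hypothetical device struct that keeps its own copy of the disk name. */
struct demo_cached_dev {
	struct demo_gendisk *disk;              /* may become invalid later */
	char backing_dev_name[DEMO_NAME_SIZE];  /* private copy, always safe */
};

/* Copy the name at registration time, while the gendisk is alive. */
static void demo_register(struct demo_cached_dev *dc, struct demo_gendisk *disk)
{
	assert(dc != NULL && disk != NULL);
	dc->disk = disk;
	snprintf(dc->backing_dev_name, sizeof(dc->backing_dev_name), "%s",
		 disk->disk_name);
}

/* Build an error message without dereferencing dc->disk. */
static int demo_offline_msg(const struct demo_cached_dev *dc, char *buf, size_t len)
{
	return snprintf(buf, len, "%s: device offline", dc->backing_dev_name);
}
```

After `demo_register()`, clearing `dc.disk` (standing in for a released gendisk) does not affect `demo_offline_msg()`, which still produces `sdb: device offline` for a disk named `sdb`.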
2018-05-01 | md: fix two problems with setting the "re-add" device state. | NeilBrown
If "re-add" is written to the "state" file for a device which is faulty, this has an effect similar to removing and re-adding the device. It should take up the same slot in the array that it previously had, and an accelerated (e.g. bitmap-based) rebuild should happen. The slot that "it previously had" is determined by rdev->saved_raid_disk. However, this is not set when a device fails (only when a device is added), and it is cleared when resync completes. This means that "re-add" will normally work once, but may not work a second time. This patch includes two fixes. 1/ When a device fails, record the ->raid_disk value in ->saved_raid_disk before clearing ->raid_disk. 2/ When "re-add" is written to a device for which ->saved_raid_disk is not set, fail. I think this is suitable for stable, as the bug can force a re-added device to do a full resync, which takes a lot longer and so puts data at more risk. Cc: <stable@vger.kernel.org> (v4.1) Fixes: 97f6cd39da22 ("md-cluster: re-add capabilities") Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
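The two fixes can be modeled with a trimmed-down, hypothetical `struct demo_rdev`. This sketch assumes -1 means "no slot" and uses `-EINVAL` as the failure code for an impossible re-add; both are illustrative choices, not taken from md.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down stand-in for struct md_rdev. */
struct demo_rdev {
	int raid_disk;        /* current slot, -1 when not in the array */
	int saved_raid_disk;  /* slot to reuse on re-add, -1 when unknown */
	int faulty;
};

/* Fix 1: remember the slot before clearing it when the device fails. */
static void demo_fail(struct demo_rdev *rdev)
{
	rdev->faulty = 1;
	rdev->saved_raid_disk = rdev->raid_disk;
	rdev->raid_disk = -1;
}

/* Fix 2: refuse "re-add" when there is no saved slot to return to. */
static int demo_re_add(struct demo_rdev *rdev)
{
	if (!rdev->faulty || rdev->saved_raid_disk < 0)
		return -EINVAL;
	rdev->faulty = 0;
	rdev->raid_disk = rdev->saved_raid_disk;
	return 0;
}

/* Resync completion clears the saved slot, as the message describes. */
static void demo_resync_done(struct demo_rdev *rdev)
{
	rdev->saved_raid_disk = -1;
}
```

Failing a device in slot 2, re-adding it, completing resync, and failing it again still lets the second re-add return to slot 2, which is the case the message says could break.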
2018-05-01 | raid10: check bio in r10buf_pool_free to void NULL pointer dereference | Guoqing Jiang
For the recovery case, r10buf_pool_alloc only allocates 2 bios, so we can't access more than 2 bios in r10buf_pool_free. Otherwise, we can see a NULL pointer dereference as follows:
[ 98.347009] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 98.355783] IP: r10buf_pool_free+0x38/0xe0 [raid10]
[...]
[ 98.543734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 98.550161] CR2: 0000000000000050 CR3: 000000089500a001 CR4: 00000000001606f0
[ 98.558145] Call Trace:
[ 98.560881] <IRQ>
[ 98.563136] put_buf+0x19/0x20 [raid10]
[ 98.567426] end_sync_request+0x6b/0x70 [raid10]
[ 98.572591] end_sync_write+0x9b/0x160 [raid10]
[ 98.577662] blk_update_request+0x78/0x2c0
[ 98.582254] scsi_end_request+0x2c/0x1e0 [scsi_mod]
[ 98.587719] scsi_io_completion+0x22f/0x610 [scsi_mod]
[ 98.593472] blk_done_softirq+0x8e/0xc0
[ 98.597767] __do_softirq+0xde/0x2b3
[ 98.601770] irq_exit+0xae/0xb0
[ 98.605285] do_IRQ+0x81/0xd0
[ 98.608606] common_interrupt+0x7d/0x7d
[ 98.612898] </IRQ>
So we need to check whether the bio is valid before it is used in r10buf_pool_free. Another workable approach is to free only 2 bios in the recovery case, just like r10buf_pool_alloc. Fixes: f0250618361d ("md: raid10: don't use bio's vec table to manage resync pages") Reported-by: Alexis Castilla <pencerval@gmail.com> Tested-by: Alexis Castilla <pencerval@gmail.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
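A userspace model of the NULL check, assuming a hypothetical four-slot buffer (`DEMO_COPIES`) in which recovery fills only the first two slots, as the message describes; `malloc`/`free` stand in for bio allocation and release.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_COPIES 4 /* assumed number of slots for this sketch */

/* Hypothetical stand-in for the bio slots of an r10 buffer. */
struct demo_r10buf {
	void *bios[DEMO_COPIES];
};

/* Recovery fills only two slots; resync fills all of them.
 * On failure, the caller releases partial allocations with demo_pool_free(). */
static int demo_pool_alloc(struct demo_r10buf *buf, int recovery)
{
	int nalloc = recovery ? 2 : DEMO_COPIES;

	for (int i = 0; i < DEMO_COPIES; i++)
		buf->bios[i] = NULL;
	for (int i = 0; i < nalloc; i++) {
		buf->bios[i] = malloc(64);
		if (!buf->bios[i])
			return -1;
	}
	return nalloc;
}

/* Free path: skip empty slots instead of dereferencing them. */
static int demo_pool_free(struct demo_r10buf *buf)
{
	int freed = 0;

	for (int i = 0; i < DEMO_COPIES; i++) {
		if (!buf->bios[i])
			continue;
		free(buf->bios[i]);
		buf->bios[i] = NULL;
		freed++;
	}
	return freed;
}
```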
2018-05-01 | md: fix an error code format and remove unsed bio_sector | Yufen Yu
Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-04-30 | dm: fix some sparse warnings and whitespace in dax methods | Mike Snitzer
Eliminate these sparse warnings: drivers/md/dm.c:1062:9: warning: context imbalance in 'dm_dax_direct_access' - unexpected unlock drivers/md/dm.c:1086:9: warning: context imbalance in 'dm_dax_copy_from_iter' - unexpected unlock Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-30 | dm cache background tracker: fix sparse warning | Mike Snitzer
Fix drivers/md/dm-cache-background-tracker.c:169:16: warning: symbol 'alloc_work' was not declared. Should it be static? Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-30 | dm bufio: fix buffer alignment | Mikulas Patocka
Commit 6b5e718cc138 ("dm bufio: relax alignment constraint on slab cache") relaxed alignment on the dm-bufio cache; however, this may break dm-crypt or dm-integrity. dm-crypt and dm-integrity require that the size of bio vector entries (bv_len) is aligned on their sector size. bv_offset doesn't have to be aligned, but bv_len must be. XFS sends unaligned bios, but they do not cross a page boundary, so the requirement for aligned bv_len is met. Commit 6b5e718cc138 made dm-bufio send unaligned bios that cross a page boundary, which could break dm-crypt and dm-integrity. This patch reinstates the alignment. Note that misaligned entries only happen when slab/slub debugging is used. Without debugging, the entries are always aligned. Fixes: 6b5e718cc138 ("dm bufio: relax alignment constraint on slab cache") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
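The alignment rule can be illustrated with a hypothetical `struct demo_bvec` and checker. For example, a 4096-byte buffer that starts 512 bytes into a page is split into entries of 3584 and 512 bytes; with a 4096-byte sector size neither length is aligned, so this sketch rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec. */
struct demo_bvec {
	unsigned int bv_offset;
	unsigned int bv_len;
};

/* True when every segment length is a multiple of the sector size.
 * bv_offset is deliberately not checked: only bv_len must be aligned. */
static bool demo_bvecs_ok_for_crypt(const struct demo_bvec *v, size_t n,
				    unsigned int sector_size)
{
	assert(sector_size > 0);
	for (size_t i = 0; i < n; i++)
		if (v[i].bv_len % sector_size)
			return false;
	return true;
}
```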
2018-04-30 | dm integrity: use kvfree for kvmalloc'd memory | Mikulas Patocka
Use kvfree instead of kfree because the array is allocated with kvmalloc. Fixes: 7eada909bfd7a ("dm: add integrity target") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-24 | dm/verity_fec: Use GFP aware reed solomon init | Thomas Gleixner
Allocations from the rs_pool can invoke init_rs() from the mempool allocation callback. This is problematic in fec_alloc_bufs(), which invokes mempool_alloc() with GFP_NOIO to prevent a swap deadlock, because init_rs() uses GFP_KERNEL allocations. Switch it to init_rs_gfp() and invoke it with the gfp_t flags which are handed in from the allocator. Note: This is not a problem today because the rs control struct is shared between the instances and it's created when the mempool is initialized. But the upcoming changes which switch to a rs_control struct per instance to embed decoder buffers will trigger the swap vs. GFP_KERNEL issue. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Neil Brown <neilb@suse.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-04-20 | Merge tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md | Linus Torvalds
Pull MD fixes from Shaohua Li: "Three small fixes for MD: - md-cluster fix for faulty device from Guoqing - writehint fix for writebehind IO for raid1 from Mariusz - a live lock fix for interrupted recovery from Yufen" * tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid1: copy write hint from master bio to behind bio md/raid1: exit sync request if MD_RECOVERY_INTR is set md-cluster: don't update recovery_offset for faulty device
2018-04-10 | Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds
Pull libnvdimm updates from Dan Williams: "This cycle was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") has been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree; for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self-contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-truncate work will need to wait for 4.18. Summary: - A rework of the filesystem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree.
- Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-09 | Merge branch 'for-4.17/dax' into libnvdimm-for-next | Dan Williams
2018-04-09 | raid1: copy write hint from master bio to behind bio | Mariusz Dabrowski
Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-04-09 | md/raid1: exit sync request if MD_RECOVERY_INTR is set | Yufen Yu
We met a sync thread stuck as follows:
raid1_sync_request+0x2c9/0xb50
md_do_sync+0x983/0xfa0
md_thread+0x11c/0x160
kthread+0x111/0x130
ret_from_fork+0x35/0x40
0xffffffffffffffff
At the same time, there is a stuck mdadm thread (mdadm --manage /dev/md2 --add /dev/sda). It is trying to stop the sync thread:
kthread_stop+0x42/0xf0
md_unregister_thread+0x3a/0x70
md_reap_sync_thread+0x15/0x160
action_store+0x142/0x2a0
md_attr_store+0x6c/0xb0
kernfs_fop_write+0x102/0x180
__vfs_write+0x33/0x170
vfs_write+0xad/0x1a0
SyS_write+0x52/0xc0
do_syscall_64+0x6e/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Debug tools show that the sync thread is waiting in raise_barrier() until raid1d() ends all normal I/O bios queued on bio_end_io_list (introduced in commit 55ce74d4bfe1). But raid1d() cannot end these bios while the MD_CHANGE_PENDING bit is set. It needs to take the mddev->reconfig_mutex lock and then clear the bit in md_check_recovery(). However, the lock is held by mdadm in action_store(). Thus, there is a loop: mdadm waits for the sync thread to stop, the sync thread waits for raid1d() to end bios, and raid1d() waits for mdadm to release mddev->reconfig_mutex before it can end bios. Fix this by checking MD_RECOVERY_INTR while waiting in raise_barrier(), so that the sync thread can exit while mdadm is stopping it. Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.") Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>
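A single-threaded model of the wait loop, assuming hypothetical names: `nr_pending` stands for I/O that raid1d() has not ended yet, `raid1d_blocked` models raid1d() being unable to make progress, and `recovery_intr` models MD_RECOVERY_INTR. A real implementation sleeps on a wait queue rather than counting iterations; `-EAGAIN` here only marks "still stuck after max_iters".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of the state raise_barrier() waits on. */
struct demo_conf {
	int nr_pending;        /* normal I/O not yet ended by raid1d() */
	bool recovery_intr;    /* models MD_RECOVERY_INTR */
	bool raid1d_blocked;   /* raid1d() cannot end bios (lock held elsewhere) */
};

/* One polling step per iteration stands in for sleeping and waking up. */
static int demo_raise_barrier(struct demo_conf *conf, int max_iters)
{
	for (int i = 0; i < max_iters; i++) {
		if (conf->nr_pending == 0)
			return 0;
		if (conf->recovery_intr)
			return -EINTR;      /* the fix: give up instead of waiting forever */
		if (!conf->raid1d_blocked)
			conf->nr_pending--; /* raid1d() ends one pending bio */
	}
	return -EAGAIN;                     /* models "still stuck" */
}
```

With raid1d() blocked, the loop only terminates once the interrupt flag is set, which mirrors how the fix breaks the three-way wait.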
2018-04-09 | md-cluster: don't update recovery_offset for faulty device | Guoqing Jiang
A device could become faulty while the clustered array is handling a METADATA_UPDATED message, so we don't need to call read_rdev for this device. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2018-04-06 | Merge tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm | Linus Torvalds
Pull device mapper updates from Mike Snitzer: - DM core passthrough ioctl fix to retain reference to DM table, and that table's block devices, while issuing the ioctl to one of those block devices. - DM core passthrough ioctl fix to _not_ override the fmode_t used to issue the ioctl. Overriding by using the fmode_t that the block device was originally open with during DM table load is a liability. - Add DM core support for secure erase forwarding and update the DM linear and DM striped targets to support them. - A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write same, write zeroes) for targets that make use of the non-splitting IO variant (as is done for multipath or thinp when layered directly on NVMe). - Allow DM targets to return a payload in response to a DM message that they are sent. This is useful for DM targets that would like to provide statistics data in response to DM messages. - Update DM bufio to support non-power-of-2 block sizes. Numerous other related changes prepare the DM bufio code for this support. - Fix DM crypt to use a bounded amount of memory across the entire system. This is to avoid OOM that can otherwise occur in response to certain pathological IO workloads (e.g. discarding a large DM crypt device). - Add a 'check_at_most_once' feature to the DM verity target to allow verity to be used on mobile devices that have very limited resources. - Fix the DM integrity target to fail early if a keyed algorithm (e.g. HMAC) is to be used but the key isn't set. - Add non-power-of-2 support to the DM unstripe target. - Eliminate the use of a Variable Length Array in the DM stripe target. - Update the DM log-writes target to record metadata (REQ_META flag). - DM raid fixes for its nosync status and some variable range issues.
* tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits) dm: remove fmode_t argument from .prepare_ioctl hook dm: hold DM table for duration of ioctl rather than use blkdev_get dm raid: fix parse_raid_params() variable range issue dm verity: make verity_for_io_block static dm verity: add 'check_at_most_once' option to only validate hashes once dm bufio: don't embed a bio in the dm_buffer structure dm bufio: support non-power-of-two block sizes dm bufio: use slab cache for dm_buffer structure allocations dm bufio: reorder fields in dm_buffer structure dm bufio: relax alignment constraint on slab cache dm bufio: remove code that merges slab caches dm bufio: get rid of slab cache name allocations dm bufio: move dm-bufio.h to include/linux/ dm bufio: delete outdated comment dm: add support for secure erase forwarding dm: backfill abnormal IO support to non-splitting IO submission dm raid: fix nosync status dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios() dm stripe: get rid of a Variable Length Array (VLA) dm log writes: record metadata flag for better flags record ...
2018-04-05 | Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
2018-04-04 | dm: remove fmode_t argument from .prepare_ioctl hook | Mike Snitzer
Use the fmode_t that is passed to dm_blk_ioctl() rather than inconsistently (it varies across targets) dropping it on the floor by overriding it with the fmode_t stored in 'struct dm_dev'. None of the persistent reservation functions were using the fmode_t they got back from .prepare_ioctl, so remove those fmode_t variables. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 | dm: hold DM table for duration of ioctl rather than use blkdev_get | Mike Snitzer
Commit 519049afead ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") inadvertently introduced a regression relative to users of device cgroups that issue ioctls (e.g. libvirt). Using blkdev_get() in DM's passthrough ioctl support implicitly introduced a cgroup permissions check that would fail unless care were taken to add all devices in the IO stack to the device cgroup. E.g. rather than just adding the top-level DM multipath device to the cgroup, all the underlying devices would need to be allowed. Fix this, to no longer require allowing all underlying devices, by simply holding the live DM table (which includes the table's original blkdev_get() reference on the block device that the ioctl will be issued to) for the duration of the ioctl. Also, bump the DM ioctl version so a user can know that their device cgroup allow workaround is no longer needed. Reported-by: Michal Privoznik <mprivozn@redhat.com> Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 519049afead ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") Cc: stable@vger.kernel.org # 4.16 Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-04 | dm raid: fix parse_raid_params() variable range issue | Heinz Mauelshagen
parse_raid_params() compares variable "int value" with INT_MAX. E.g. related Coverity report excerpt: CID 1364818 (#2 of 3): Operands don't affect result (CONSTANT_EXPRESSION_RESULT) [select issue] 1433 if (value > INT_MAX) { Fix by changing checks to avoid INT_MAX. Whilst on it, avoid unnecessary checks against constants and add check for sane recovery speed min/max. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
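Because an `int` can never exceed `INT_MAX`, the comparison in the excerpt has no effect. One way to make a range check meaningful, sketched here with a hypothetical helper rather than the actual dm-raid code, is to parse into a wider type and compare against explicit bounds.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal argument and enforce [min, max].
 * Parsing into long long first means out-of-range input is still observable,
 * unlike `int value; if (value > INT_MAX)`, which can never be true. */
static int demo_parse_int_arg(const char *s, int min, int max, int *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 10);
	if (errno || end == s || *end != '\0')
		return -EINVAL;          /* overflow, empty, or trailing junk */
	if (v < min || v > max)
		return -EINVAL;          /* outside the caller's sane range */
	*out = (int)v;
	return 0;
}
```

A caller would pass per-parameter bounds, for example a minimum and maximum recovery speed, instead of relying on the type's own limits.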
2018-04-04 | dm verity: make verity_for_io_block static | weiyongjun (A)
Fixes the following sparse warning: drivers/md/dm-verity-target.c:375:6: warning: symbol 'verity_for_io_block' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm verity: add 'check_at_most_once' option to only validate hashes once | Patrik Torstensson
This allows platforms that are CPU/memory constrained to verify data blocks only the first time they are read from the data device, rather than every time. As such, it provides a reduced level of security because only offline tampering of the data device's content will be detected, not online tampering. Hash blocks are still verified each time they are read from the hash device, since verification of hash blocks is less performance critical than data blocks, and a hash block will not be verified any more after all the data blocks it covers have been verified anyway. This option introduces a bitset that is used to check if a block has been validated before or not. A block can be validated more than once as there is no thread protection for the bitset. These changes were developed and tested on entry-level Android Go devices. Signed-off-by: Patrik Torstensson <totte@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
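A sketch of a per-block "already verified" bitset, using hypothetical names; the hash verification itself is elided. The test and the set are separate, unlocked steps, matching the note above that a block can be validated more than once.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the per-device "already verified" bitset. */
struct demo_verified {
	unsigned long *bits;
	size_t nr_blocks;
};

static int demo_verified_init(struct demo_verified *v, size_t nr_blocks)
{
	size_t nlongs = (nr_blocks + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;

	v->bits = calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
	v->nr_blocks = nr_blocks;
	return v->bits ? 0 : -1;
}

static bool demo_test_bit(const struct demo_verified *v, size_t blk)
{
	assert(blk < v->nr_blocks);
	return (v->bits[blk / DEMO_BITS_PER_LONG] >> (blk % DEMO_BITS_PER_LONG)) & 1UL;
}

static void demo_set_bit(struct demo_verified *v, size_t blk)
{
	assert(blk < v->nr_blocks);
	v->bits[blk / DEMO_BITS_PER_LONG] |= 1UL << (blk % DEMO_BITS_PER_LONG);
}

/* Returns the number of hash checks done for this read (0 or 1).
 * Two racing readers may both verify the same block: extra work, not a bug. */
static int demo_read_block(struct demo_verified *v, size_t blk)
{
	if (demo_test_bit(v, blk))
		return 0;               /* verified on an earlier read */
	/* ... verify the data block against its hash here ... */
	demo_set_bit(v, blk);
	return 1;
}
```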
2018-04-03 | dm bufio: don't embed a bio in the dm_buffer structure | Mikulas Patocka
The bio structure consumes a substantial part of dm_buffer. The bio structure is only needed when doing I/O on the buffer, thus we don't have to embed it in the buffer. Allocate the bio structure only when doing I/O. We don't need to create a bio_set because, in case of allocation failure, dm-bufio falls back to using dm-io (which keeps its own bio_set). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm bufio: support non-power-of-two block sizes | Mikulas Patocka
Support block sizes that are not a power-of-two (but they must be a multiple of 512b). As always, a slab cache is used for allocations. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
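Why non-power-of-two sizes need extra work can be shown with a hypothetical sector-to-block helper: a power-of-two size allows a shift, while any other multiple of 512 bytes needs a division.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9  /* 512-byte sectors */

/* Integer log2 for a nonzero power of two. */
static unsigned int demo_ilog2(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Hypothetical helper: map a 512-byte sector number to a block number. */
static uint64_t demo_sector_to_block(uint64_t sector, unsigned int block_size)
{
	unsigned int sectors_per_block;

	assert(block_size != 0 && block_size % 512 == 0);
	sectors_per_block = block_size >> DEMO_SECTOR_SHIFT;
	if ((sectors_per_block & (sectors_per_block - 1)) == 0)
		return sector >> demo_ilog2(sectors_per_block);  /* power of two */
	return sector / sectors_per_block;                      /* general case */
}
```

With 3072-byte blocks (6 sectors), sector 30 maps to block 5; with 4096-byte blocks (8 sectors), sector 17 maps to block 2.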
2018-04-03 | dm bufio: use slab cache for dm_buffer structure allocations | Mikulas Patocka
kmalloc pads allocations to the next power of two; using a slab cache avoids this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
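The padding cost can be estimated with a simplified model that rounds every request up to a power of two. Real kmalloc also has a few non-power-of-two size classes (96 and 192 bytes), which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: round a request up to the next power of two. */
static size_t demo_round_pow2(size_t n)
{
	size_t p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes wasted per object when allocating n bytes this way. */
static size_t demo_waste(size_t n)
{
	return demo_round_pow2(n) - n;
}
```

A 200-byte object would occupy 256 bytes, wasting 56 bytes per object; a dedicated slab cache sized to the object avoids most of that.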
2018-04-03 | dm bufio: reorder fields in dm_buffer structure | Mikulas Patocka
Reorder fields in the dm_buffer structure to improve packing and reduce structure size. The compiler allocates a 32-bit integer for the 'enum data_mode' field, so change it to unsigned char. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
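The effect of field order on padding can be seen with two hypothetical layouts (not the real `dm_buffer`). On x86-64, `struct demo_loose` is 40 bytes because each small field placed before a pointer is padded out to 8 bytes, while `struct demo_packed` groups the pointers first and stores the mode in one byte, for 24 bytes.

```c
#include <assert.h>
#include <stddef.h>

enum demo_mode { DEMO_SLAB, DEMO_GET_FREE_PAGES, DEMO_VMALLOC };

/* Small fields interleaved with pointers: padding after each small field. */
struct demo_loose {
	char list_mode;
	void *data;
	enum demo_mode mode;   /* stored as a 4-byte int on common ABIs */
	void *c;
	char hold;
};

/* Pointers first, then the small fields packed together at the end. */
struct demo_packed {
	void *data;
	void *c;
	unsigned char mode;    /* enum value kept in one byte */
	char list_mode;
	char hold;
};
```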
2018-04-03 | dm bufio: relax alignment constraint on slab cache | Mikulas Patocka
The I/O buffer doesn't have to be aligned on block size granularity, so relax the alignment to ARCH_KMALLOC_MINALIGN (required to allow DMA from slab cache memory on some architectures). Also, set SLAB_RECLAIM_ACCOUNT so that the memory allocated from the cache is accounted as reclaimable and doesn't inflate the 'used' entry in the output of the free command. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm bufio: remove code that merges slab caches | Mikulas Patocka
All slab allocators can merge duplicate caches. So dm-bufio doesn't need extra slab merging logic. Instead it can just allocate one slab cache per client and let the allocator merge them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm bufio: get rid of slab cache name allocations | Mikulas Patocka
dm-bufio keeps the dm_bufio_cache_names array that holds names of the slab caches. Since the commit db265eca7700 ("mm/sl[aou]b: Move duping of slab name to slab_common.c"), the kernel automatically duplicates the slab cache name when creating the slab cache, so we no longer have to keep the name allocated. Remove the code that allocates the slab names and keeps them around. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm bufio: move dm-bufio.h to include/linux/ | Mikulas Patocka
Move dm-bufio.h to include/linux/ so that external GPL'd DM target modules can use it. It is better to allow the use of dm-bufio than force external modules to implement the equivalent buffered IO mechanism in some new way. The hope is this will encourage the use of dm-bufio, which will then make it easier for a GPL'd external DM target module to be included upstream. A couple of dm-bufio EXPORT_SYMBOL exports have also been updated to use EXPORT_SYMBOL_GPL. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm bufio: delete outdated comment | Mikulas Patocka
This comment was true when dm-bufio was written but, since 4.3, bios can now have arbitrary size and the driver splits them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm: add support for secure erase forwarding | Denis Semakin
Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's data devices support secure erase. Also, add support for secure erase to both the linear and striped targets. Signed-off-by: Denis Semakin <d.semakin@omprussia.ru> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm: backfill abnormal IO support to non-splitting IO submission | Mike Snitzer
Otherwise, these abnormal IOs would be sent to the DM target regardless of whether the target advertised support for them. Factor out __process_abnormal_io() from __split_and_process_non_flush() so that discards, write same, etc may be conditionally processed. Fixes: 978e51ba3 ("dm: optimize bio-based NVMe IO submission") Cc: stable@vger.kernel.org # 4.16 Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm raid: fix nosync status | Heinz Mauelshagen
Fix a race for "nosync" activations providing "aa.." device health characters and "0/N" sync ratio rather than "AA..." and "N/N". Occurs when status for the raid set is retrieved during resume before the MD sync thread starts and clears the MD_RECOVERY_NEEDED flag. Cc: stable@vger.kernel.org # 4.16+ Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios() | Wang Sheng-Hui
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm stripe: get rid of a Variable Length Array (VLA) | Tycho Andersen
Ideally, we'd like to get rid of all VLAs in the kernel and add -Wvla to the build args: https://lkml.org/lkml/2018/3/7/621 This one is a simple case, since we don't actually need the VLA at all: we can just iterate over the stripes twice, once to emit their names, and the second time to emit status (i.e. trade memory for time). Since the number of stripes is probably low, this is hopefully not that expensive. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
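The two-pass approach can be sketched as follows, with a hypothetical `struct demo_stripe` and a simplified status format (count, device names, then one `A`/`D` character per stripe). Nothing is sized by the stripe count at run time; the second loop simply walks the stripes again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stripe description for this sketch. */
struct demo_stripe {
	const char *name;
	bool failed;
};

/* Pass 1 writes the names, pass 2 writes 'A' (alive) or 'D' (dead) per stripe.
 * Returns the number of characters written, or -1 if buf is too small. */
static int demo_stripe_status(const struct demo_stripe *s, int n,
			      char *buf, size_t len)
{
	size_t used;
	int r;

	r = snprintf(buf, len, "%d ", n);
	if (r < 0 || (size_t)r >= len)
		return -1;
	used = (size_t)r;
	for (int i = 0; i < n; i++) {           /* pass 1: names */
		r = snprintf(buf + used, len - used, "%s ", s[i].name);
		if (r < 0 || (size_t)r >= len - used)
			return -1;
		used += (size_t)r;
	}
	for (int i = 0; i < n; i++) {           /* pass 2: health chars */
		if (used + 1 >= len)
			return -1;
		buf[used++] = s[i].failed ? 'D' : 'A';
	}
	buf[used] = '\0';
	return (int)used;
}
```

For three stripes where the second has failed, this produces `3 8:16 8:32 8:48 ADA`.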
2018-04-03 | dm log writes: record metadata flag for better flags record | Qu Wenruo
This lets developers distinguish data and metadata bios more easily. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm integrity: fail early if required HMAC key is not available | Milan Broz
Since crypto API commit 9fa68f62004 ("crypto: hash - prevent using keyed hashes without setting key"), dm-integrity cannot use keyed algorithms without the key being set. dm-integrity recognizes this too late (during use of HMAC), so it allows creation and formatting of the superblock, but the device is in fact unusable. Fix it by detecting the key requirement in the integrity table constructor. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm: remove unused macro DM_MOD_NAME_SIZE | Wang Sheng-Hui
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm unstripe: remove unnecessary header includes | Heinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm unstripe: remove superfluous module init error path message | Heinz Mauelshagen
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Reviewed-by: Scott Bauer <Scott.Bauer@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm unstripe: add "dm-unstriped" module alias | Heinz Mauelshagen
This target's kernel module being named dm-unstripe.ko doesn't allow lvm2's DM module autoload capability to load the dm-unstripe.ko because lvm2 looks for dm-unstriped.ko due to the target name being "unstriped". Add the "dm-unstriped" module alias to resolve this oversight. NOTE: this isn't needed for the "striped" target, despite its source file being named dm-stripe.c, because it is part of dm-mod.ko. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm unstripe: support non-power-of-2 chunk size | Heinz Mauelshagen
Address "FIXME: must support non power of 2 chunk_size, dm-stripe.c does". Bump target version to indicate change. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Tested-by: Scott Bauer <Scott.Bauer@intel.com> Reviewed-by: Scott Bauer <Scott.Bauer@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 | dm crypt: limit the number of allocated pages | Mikulas Patocka
dm-crypt consumes an excessive amount of memory when the user attempts to zero a dm-crypt device with "blkdiscard -z". The command "blkdiscard -z" calls the BLKZEROOUT ioctl, which goes to the function __blkdev_issue_zeroout; __blkdev_issue_zeroout sends a large number of write bios that contain the zero page as their payload. For each incoming page, dm-crypt allocates another page that holds the encrypted data, so when processing "blkdiscard -z", dm-crypt tries to allocate an amount of memory equal to the size of the device. This can trigger the OOM killer or cause a system crash. Fix this by limiting the amount of memory that dm-crypt allocates to 2% of total system memory. This limit is system-wide and is divided by the number of active dm-crypt devices, with each device receiving an equal share. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
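The budget arithmetic, as a hypothetical helper that ignores any per-device minimum the real code may apply:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: 2% of total RAM pages, split equally across
 * the active dm-crypt devices (integer division rounds down). */
static uint64_t demo_crypt_pages_per_device(uint64_t total_ram_pages,
					    unsigned int active_devices)
{
	uint64_t budget = total_ram_pages * 2 / 100;

	assert(active_devices > 0);
	return budget / active_devices;
}
```

With 16 GiB of RAM in 4 KiB pages (4194304 pages), the total budget is 83886 pages, so four active devices get 20971 pages each.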
2018-04-03 | dm: allow targets to return output from messages they are sent | Mike Snitzer
This could be useful for a target to return stats or other information. If a target DMEMIT()s anything to @result from its .message method, then it must return 1 to the caller. Signed-off-By: Mike Snitzer <snitzer@redhat.com>
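A hypothetical handler following the stated contract (return 1 when something was written to the result buffer); the message names and the `hits=` format are invented for this sketch, and `snprintf` stands in for DMEMIT().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical message handler: 1 = output written to result,
 * 0 = handled without output, negative errno = unknown message. */
static int demo_target_message(unsigned long long hits, const char *msg,
			       char *result, size_t maxlen)
{
	if (strcmp(msg, "stats") == 0) {
		snprintf(result, maxlen, "hits=%llu", hits);
		return 1;       /* output present: tell the caller */
	}
	if (strcmp(msg, "reset") == 0)
		return 0;       /* handled, nothing emitted */
	return -EINVAL;
}
```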
2018-04-03 | dax, dm: allow device-mapper to operate without dax support | Dan Williams
Change device-mapper's DAX dependency to require the presence of at least one DAX_DRIVER. This allows device-mapper to be built without bringing the DAX core along which is especially wasteful when there are no DAX drivers, like BLK_DEV_PMEM, configured. Cc: Alasdair Kergon <agk@redhat.com> Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-29 | dm: fix dropped return code from dm_get_bdev_for_ioctl | Mike Snitzer
dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it wasn't). Unfortunately commit 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") reused the variable 'r' to store the return from blkdev_get() that follows prepare_ioctl(), thereby dropping prepare_ioctl()'s result on the floor. This can let an ioctl or persistent reservation issued to a partition go unnoticed, which means the extra permission check for CAP_SYS_RAWIO is skipped. Fix this by using a different variable to store blkdev_get()'s return. Fixes: 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl") Reported-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
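The shape of the bug and the fix, reduced to hypothetical stand-ins for prepare_ioctl() and blkdev_get():

```c
#include <assert.h>

/* Hypothetical stand-ins: prepare returns 1 when the ioctl targets a partition. */
static int demo_prepare_ioctl(int is_partition) { return is_partition ? 1 : 0; }
static int demo_blkdev_get(void) { return 0; /* success */ }

/* Buggy shape: `r` is reused, so the partition flag is overwritten. */
static int demo_get_bdev_buggy(int is_partition)
{
	int r = demo_prepare_ioctl(is_partition);

	if (r < 0)
		return r;
	r = demo_blkdev_get();
	if (r < 0)
		return r;
	return r;        /* always 0 on success: partition flag lost */
}

/* Fixed shape: keep blkdev_get()'s result in its own variable. */
static int demo_get_bdev_fixed(int is_partition)
{
	int r = demo_prepare_ioctl(is_partition);
	int err;

	if (r < 0)
		return r;
	err = demo_blkdev_get();
	if (err < 0)
		return err;
	return r;        /* 1 for a partition, 0 otherwise */
}
```

`demo_get_bdev_buggy(1)` returns 0 and so loses the partition indication, while `demo_get_bdev_fixed(1)` returns 1.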
2018-03-29 | dm mpath: fix support for loading scsi_dh modules during table load | Mike Snitzer
The ability to have multipath dynamically attach a scsi_dh, that the user specified in the multipath table, was broken by commit e8f74a0f00 ("dm mpath: eliminate need to use scsi_device_from_queue"). Restore the ability to load, and attach, a particular scsi_dh module if one is specified (as noticed by checking m->hw_handler_name). Fixes: e8f74a0f00 ("dm mpath: eliminate need to use scsi_device_from_queue") Reported-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-03-18 | bcache: Fix a compiler warning in bcache_device_init() | Bart Van Assche
Avoid the following compiler warning when building with W=1: drivers/md/bcache/super.c:776:20: warning: comparison is always false due to limited range of data type [-Wtype-limits] d->nr_stripes > SIZE_MAX / sizeof(atomic_t)) { ^ Reviewed-by: Coly Li <colyli@suse.de> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-18 | bcache: Reduce the number of sparse complaints about lock imbalances | Bart Van Assche
Add more annotations for sparse to inform it about which functions do not have the same number of spin_lock() and spin_unlock() calls. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>