summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-10-14net: aquantia: when cleaning hw cache it should be toggledIgor Russkikh
>From HW specification to correctly reset HW caches (this is a required workaround when stopping the device), register bit should actually be toggled. It was previosly always just set. Due to the way driver stops HW this never actually caused any issues, but it still may, so cleaning this up. Fixes: 7a1bb49461b1 ("net: aquantia: fix potential IOMMU fault after driver unbind") Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: aquantia: temperature retrieval fixIgor Russkikh
Chip temperature is a two byte word, colocated internally with cable length data. We do all readouts from HW memory by dwords, thus we should clear extra high bytes, otherwise temperature output gets weird as soon as we attach a cable to the NIC. Fixes: 8f8940118654 ("net: aquantia: add infrastructure to readout chip temperature") Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more fixes from Andrew Morton: "The usual shower of hotfixes and some followups to the recently merged page_owner enhancements" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once mm/slab.c: fix kernel-doc warning for __ksize() xarray.h: fix kernel-doc warning bitmap.h: fix kernel-doc warning and typo fs/fs-writeback.c: fix kernel-doc warning fs/libfs.c: fix kernel-doc warning fs/direct-io.c: fix kernel-doc warning mm, compaction: fix wrong pfn handling in __reset_isolation_pfn() mm, hugetlb: allow hugepage allocations to reclaim as needed lib/test_meminit: add a kmem_cache_alloc_bulk() test mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations lib/generic-radix-tree.c: add kmemleak annotations mm/slub: fix a deadlock in show_slab_objects() mm, page_owner: rename flag indicating that page is allocated mm, page_owner: decouple freeing stack trace from debug_pagealloc mm, page_owner: fix off-by-one error in __set_page_owner_handle()
2019-10-14Merge branch 'PTP-driver-refactoring-for-SJA1105-DSA'David S. Miller
Vladimir Oltean says: ==================== PTP driver refactoring for SJA1105 DSA This series creates a better separation between the driver core and the PTP portion. Therefore, users who are not interested in PTP can get a simpler and smaller driver by compiling it out. This is in preparation for further patches: SPI transfer timestamping, synchronizing the hardware clock (as opposed to keeping it free-running), PPS input/output, etc. ==================== Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Change the PTP command access patternVladimir Oltean
The PTP command register contains enable bits for: - Putting the 64-bit PTPCLKVAL register in add/subtract or write mode - Taking timestamps off of the corrected vs free-running clock - Starting/stopping the TTEthernet scheduling - Starting/stopping PPS output - Resetting the switch When a command needs to be issued (e.g. "change the PTPCLKVAL from write mode to add/subtract mode"), one cannot simply write to the command register setting the PTPCLKADD bit to 1, because that would zeroize the other settings. One also cannot do a read-modify-write (that would be too easy for this hardware) because not all bits of the command register are readable over SPI. So this leaves us with the only option of keeping the value of the PTP command register in the driver, and operating on that. Actually there are 2 types of PTP operations now: - Operations that modify the cached PTP command. These operate on ptp_data->cmd as a pointer. - Operations that apply all previously cached PTP settings, but don't otherwise cache what they did themselves. The sja1105_ptp_reset function is such an example. It copies the ptp_data->cmd on stack before modifying and writing it to SPI. This practically means that struct sja1105_ptp_cmd is no longer an implementation detail, since it needs to be stored in full into struct sja1105_ptp_data, and hence in struct sja1105_private. So the (*ptp_cmd) function prototype can change and take struct sja1105_ptp_cmd as second argument now. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Move PTP data to its own private structureVladimir Oltean
This is a non-functional change with 2 goals (both for the case when CONFIG_NET_DSA_SJA1105_PTP is not enabled): - Reduce the size of the sja1105_private structure. - Make the PTP code more self-contained. Leaving priv->ptp_data.lock to be initialized in sja1105_main.c is not a leftover: it will be used in a future patch "net: dsa: sja1105: Restore PTP time after switch reset". Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Make all public PTP functions take dsa_switch as argumentVladimir Oltean
The new rule (as already started for sja1105_tas.h) is for functions of optional driver components (ones which may be disabled via Kconfig - PTP and TAS) to take struct dsa_switch *ds instead of struct sja1105_private *priv as first argument. This is so that forward-declarations of struct sja1105_private can be avoided. So make sja1105_ptp.h the second user of this rule. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14net: dsa: sja1105: Get rid of global declaration of struct ptp_clock_infoVladimir Oltean
We need priv->ptp_caps to hold a structure and not just a pointer, because we use container_of in the various PTP callbacks. Therefore, the sja1105_ptp_caps structure declared in the global memory of the driver serves no further purpose after copying it into priv->ptp_caps. So just populate priv->ptp_caps with the needed operations and remove sja1105_ptp_caps. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-15gpio: lynxpoint: set default handler to be handle_bad_irq()Andy Shevchenko
We switch the default handler to be handle_bad_irq() instead of handle_simple_irq() (which was not correct anyway). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15gpio: merrifield: Move hardware initialization to callbackAndy Shevchenko
The driver wants to initialize related registers before IRQ chip will be added. That's why move it to a corresponding callback. It also fixes the NULL pointer dereference. Fixes: 8f86a5b4ad67 ("gpio: merrifield: Pass irqchip when adding gpiochip") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15gpio: lynxpoint: Move hardware initialization to callbackAndy Shevchenko
The driver wants to initialize related registers before IRQ chip will be added. That's why move it to a corresponding callback. It also fixes the NULL pointer dereference. Fixes: 7b1e889436a1 ("gpio: lynxpoint: Pass irqchip when adding gpiochip") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15gpio: intel-mid: Move hardware initialization to callbackAndy Shevchenko
The driver wants to initialize related registers before IRQ chip will be added. That's why move it to a corresponding callback. It also fixes the NULL pointer dereference. Fixes: 8069e69a9792 ("gpio: intel-mid: Pass irqchip when adding gpiochip") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15gpiolib: Initialize the hardware with a callbackAndy Shevchenko
After changing the drivers to use GPIO core to add an IRQ chip it appears that some of them requires a hardware initialization before adding the IRQ chip. Add an optional callback ->init_hw() to allow that drivers to initialize hardware if needed. This change is a part of the fix NULL pointer dereference brought to the several drivers recently. Cc: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15gpio: merrifield: Restore use of irq_baseAndy Shevchenko
During conversion to internal IRQ chip initialization the commit 8f86a5b4ad67 ("gpio: merrifield: Pass irqchip when adding gpiochip") lost the irq_base assignment. drivers/gpio/gpio-merrifield.c: In function ‘mrfld_gpio_probe’: drivers/gpio/gpio-merrifield.c:405:17: warning: variable ‘irq_base’ set but not used [-Wunused-but-set-variable] Assign the girq->first to it. Fixes: 8f86a5b4ad67 ("gpio: merrifield: Pass irqchip when adding gpiochip") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-14xtensa: drop EXPORT_SYMBOL for outs*/ins*Max Filippov
Custom outs*/ins* implementations are long gone from the xtensa port, remove matching EXPORT_SYMBOLs. This fixes the following build warnings issued by modpost since commit 15bfc2348d54 ("modpost: check for static EXPORT_SYMBOL* functions"): WARNING: "insb" [vmlinux] is a static EXPORT_SYMBOL WARNING: "insw" [vmlinux] is a static EXPORT_SYMBOL WARNING: "insl" [vmlinux] is a static EXPORT_SYMBOL WARNING: "outsb" [vmlinux] is a static EXPORT_SYMBOL WARNING: "outsw" [vmlinux] is a static EXPORT_SYMBOL WARNING: "outsl" [vmlinux] is a static EXPORT_SYMBOL Cc: stable@vger.kernel.org Fixes: d38efc1f150f ("xtensa: adopt generic io routines") Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2019-10-14mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped ↵Jane Chu
more than once Mmap /dev/dax more than once, then read the poison location using address from one of the mappings. The other mappings due to not having the page mapped in will cause SIGKILLs delivered to the process. SIGKILL succeeds over SIGBUS, so user process loses the opportunity to handle the UE. Although one may add MAP_POPULATE to mmap(2) to work around the issue, MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so isn't always an option. Details - ndctl inject-error --block=10 --count=1 namespace6.0 ./read_poison -x dax6.0 -o 5120 -m 2 mmaped address 0x7f5bb6600000 mmaped address 0x7f3cf3600000 doing local read at address 0x7f3cf3601400 Killed Console messages in instrumented kernel - mce: Uncorrected hardware memory error in user-access at edbe201400 Memory failure: tk->addr = 7f5bb6601000 Memory failure: address edbe201: call dev_pagemap_mapping_shift dev_pagemap_mapping_shift: page edbe201: no PUD Memory failure: tk->size_shift == 0 Memory failure: Unable to find user space address edbe201 in read_poison Memory failure: tk->addr = 7f3cf3601000 Memory failure: address edbe201: call dev_pagemap_mapping_shift Memory failure: tk->size_shift = 21 Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page => to deliver SIGKILL Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption => to deliver SIGBUS Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.com Signed-off-by: Jane Chu <jane.chu@oracle.com> Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm/slab.c: fix kernel-doc warning for __ksize()Randy Dunlap
Fix kernel-doc warning in mm/slab.c: mm/slab.c:4215: warning: Function parameter or member 'objp' not described in '__ksize' Also add Return: documentation section for this function. Link: http://lkml.kernel.org/r/68c9fd7d-f09e-d376-e292-c7b2bdf1774d@infradead.org Fixes: 10d1f8cb3965 ("mm/slab: refactor common ksize KASAN logic into slab_common.c") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14xarray.h: fix kernel-doc warningRandy Dunlap
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>: include/linux/xarray.h:232: WARNING: Unexpected indentation. Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14bitmap.h: fix kernel-doc warning and typoRandy Dunlap
Fix kernel-doc warning in <linux/bitmap.h>: include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal' Also fix small typo (bitnaps). Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org Fixes: b9fa6442f704 ("cpumask: Implement cpumask_or_equal()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14fs/fs-writeback.c: fix kernel-doc warningRandy Dunlap
Fix kernel-doc warning in fs/fs-writeback.c: fs/fs-writeback.c:913: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id' Link: http://lkml.kernel.org/r/756645ac-0ce8-d47e-d30a-04d9e4923a4f@infradead.org Fixes: d62241c7a406 ("writeback, memcg: Implement cgroup_writeback_by_id()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14fs/libfs.c: fix kernel-doc warningRandy Dunlap
Fix kernel-doc warning in fs/libfs.c: fs/libfs.c:496: warning: Excess function parameter 'available' description in 'simple_write_end' Link: http://lkml.kernel.org/r/5fc9d70b-e377-0ec9-066a-970d49579041@infradead.org Fixes: ad2a722f196d ("libfs: Open code simple_commit_write into only user") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Boaz Harrosh <boazh@netapp.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14fs/direct-io.c: fix kernel-doc warningRandy Dunlap
Fix kernel-doc warning in fs/direct-io.c: fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete' Also, don't mark this function as having kernel-doc notation since it is not exported. Link: http://lkml.kernel.org/r/97908511-4328-4a56-17fe-f43a1d7aa470@infradead.org Fixes: 6d544bb4d901 ("dio: centralize completion in dio_complete()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Zach Brown <zab@zabbo.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()Vlastimil Babka
Florian and Dave reported [1] a NULL pointer dereference in __reset_isolation_pfn(). While the exact cause is unclear, staring at the code revealed two bugs, which might be related. One bug is that if zone starts in the middle of pageblock, block_page might correspond to different pfn than block_pfn, and then the pfn_valid_within() checks will check different pfn's than those accessed via struct page. This might result in acessing an unitialized page in CONFIG_HOLES_IN_ZONE configs. The other bug is that end_page refers to the first page of next pageblock and not last page of current pageblock. The online and valid check is then wrong and with sections, the while (page < end_page) loop might wander off actual struct page arrays. [1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/ Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz Fixes: 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Florian Weimer <fw@deneb.enyo.de> Reported-by: Dave Chinner <david@fromorbit.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm, hugetlb: allow hugepage allocations to reclaim as neededDavid Rientjes
Commit b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") has chnaged the allocator to bail out from the allocator early to prevent from a potentially excessive memory reclaim. __GFP_RETRY_MAYFAIL is designed to retry the allocation, reclaim and compaction loop as long as there is a reasonable chance to make forward progress. Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at the INIT_COMPACT_PRIORITY compaction attempt gives this feedback. The most obvious affected subsystem is hugetlbfs which allocates huge pages based on an admin request (or via admin configured overcommit). I have done a simple test which tries to allocate half of the memory for hugetlb pages while the memory is full of a clean page cache. This is not an unusual situation because we try to cache as much of the memory as possible and sysctl/sysfs interface to allocate huge pages is there for flexibility to allocate hugetlb pages at any time. System has 1GB of RAM and we are requesting 515MB worth of hugetlb pages after the memory is prefilled by a clean page cache: root@test1:~# cat hugetlb_test.sh set -x echo 0 > /proc/sys/vm/nr_hugepages echo 3 > /proc/sys/vm/drop_caches echo 1 > /proc/sys/vm/compact_memory dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10)) TS=$(date +%s) echo 256 > /proc/sys/vm/nr_hugepages cat /proc/sys/vm/nr_hugepages The results for 2 consecutive runs on clean 5.3 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s + date +%s + TS=1569905284 + echo 256 + cat /proc/sys/vm/nr_hugepages 256 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s + date +%s + TS=1569905311 + echo 256 + cat /proc/sys/vm/nr_hugepages 256 Now with b39d0ee2632d applied root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s + date +%s + TS=1569905516 + echo 256 + cat /proc/sys/vm/nr_hugepages 11 root@test1:~# sh hugetlb_test.sh + echo 0 + echo 3 + echo 1 + dd if=/mnt/data/file-1G of=/dev/null bs=4096 262144+0 records in 262144+0 records out 1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s + date +%s + TS=1569905541 + echo 256 + cat /proc/sys/vm/nr_hugepages 12 The success rate went down by factor of 20! Although hugetlb allocation requests might fail and it is reasonable to expect them to under extremely fragmented memory or when the memory is under a heavy pressure but the above situation is not that case. Fix the regression by reverting back to the previous behavior for __GFP_RETRY_MAYFAIL requests and disable the beail out heuristic for those requests. Mike said: : hugetlbfs allocations are commonly done via sysctl/sysfs shortly after : boot where this may not be as much of an issue. However, I am aware of at : least three use cases where allocations are made after the system has been : up and running for quite some time: : : - DB reconfiguration. If sysctl/sysfs fails to get required number of : huge pages, system is rebooted to perform allocation after boot. : : - VM provisioning. If unable get required number of huge pages, fall : back to base pages. : : - An application that does not preallocate pool, but rather allocates : pages at fault time for optimal NUMA locality. : : In all cases, I would expect b39d0ee2632d to cause regressions and : noticable behavior changes. : : My quick/limited testing in : https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com : was insufficient. It was also mentioned that if something like : b39d0ee2632d went forward, I would like exemptions for __GFP_RETRY_MAYFAIL : requests as in this patch. [mhocko@suse.com: reworded changelog] Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org Fixes: b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed") Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14lib/test_meminit: add a kmem_cache_alloc_bulk() testAlexander Potapenko
Make sure allocations from kmem_cache_alloc_bulk() and kmem_cache_free_bulk() are properly initialized. Link: http://lkml.kernel.org/r/20191007091605.30530-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Thibaut Sautereau <thibaut@sautereau.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocationsAlexander Potapenko
slab_alloc_node() already zeroed out the freelist pointer if init_on_free was on. Thibaut Sautereau noticed that the same needs to be done for kmem_cache_alloc_bulk(), which performs the allocations separately. kmem_cache_alloc_bulk() is currently used in two places in the kernel, so this change is unlikely to have a major performance impact. SLAB doesn't require a similar change, as auto-initialization makes the allocator store the freelist pointers off-slab. Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com Fixes: 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Thibaut Sautereau <thibaut@sautereau.fr> Reported-by: Kees Cook <keescook@chromium.org> Cc: Christoph Lameter <cl@linux.com> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14lib/generic-radix-tree.c: add kmemleak annotationsEric Biggers
Kmemleak is falsely reporting a leak of the slab allocation in sctp_stream_init_ext(): BUG: memory leak unreferenced object 0xffff8881114f5d80 (size 96): comm "syz-executor934", pid 7160, jiffies 4294993058 (age 31.950s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000ce7a1326>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<00000000ce7a1326>] slab_post_alloc_hook mm/slab.h:439 [inline] [<00000000ce7a1326>] slab_alloc mm/slab.c:3326 [inline] [<00000000ce7a1326>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553 [<000000007abb7ac9>] kmalloc include/linux/slab.h:547 [inline] [<000000007abb7ac9>] kzalloc include/linux/slab.h:742 [inline] [<000000007abb7ac9>] sctp_stream_init_ext+0x2b/0xa0 net/sctp/stream.c:157 [<0000000048ecb9c1>] sctp_sendmsg_to_asoc+0x946/0xa00 net/sctp/socket.c:1882 [<000000004483ca2b>] sctp_sendmsg+0x2a8/0x990 net/sctp/socket.c:2102 [...] But it's freed later. Kmemleak misses the allocation because its pointer is stored in the generic radix tree sctp_stream::out, and the generic radix tree uses raw pages which aren't tracked by kmemleak. Fix this by adding the kmemleak hooks to the generic radix tree code. Link: http://lkml.kernel.org/r/20191004065039.727564-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Reported-by: <syzbot+7f3b6b106be8dcdcdeec@syzkaller.appspotmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm/slub: fix a deadlock in show_slab_objects()Qian Cai
A long time ago we fixed a similar deadlock in show_slab_objects() [1]. However, it is apparently due to the commits like 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") and 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by just reading files in /sys/kernel/slab which will generate a lockdep splat below. Since the "mem_hotplug_lock" here is only to obtain a stable online node mask while racing with NUMA node hotplug, in the worst case, the results may me miscalculated while doing NUMA node hotplug, but they shall be corrected by later reads of the same files. WARNING: possible circular locking dependency detected ------------------------------------------------------ cat/5224 is trying to acquire lock: ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at: show_slab_objects+0x94/0x3a8 but task is already holding lock: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (kn->count#45){++++}: lock_acquire+0x31c/0x360 __kernfs_remove+0x290/0x490 kernfs_remove+0x30/0x44 sysfs_remove_dir+0x70/0x88 kobject_del+0x50/0xb0 sysfs_slab_unlink+0x2c/0x38 shutdown_cache+0xa0/0xf0 kmemcg_cache_shutdown_fn+0x1c/0x34 kmemcg_workfn+0x44/0x64 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #1 (slab_mutex){+.+.}: lock_acquire+0x31c/0x360 __mutex_lock_common+0x16c/0xf78 mutex_lock_nested+0x40/0x50 memcg_create_kmem_cache+0x38/0x16c memcg_kmem_cache_create_func+0x3c/0x70 process_one_work+0x4f4/0x950 worker_thread+0x390/0x4bc kthread+0x1cc/0x1e8 ret_from_fork+0x10/0x18 -> #0 (mem_hotplug_lock.rw_sem){++++}: validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc other info that might help us debug this: Chain exists of: mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(kn->count#45); lock(slab_mutex); lock(kn->count#45); lock(mem_hotplug_lock.rw_sem); *** DEADLOCK *** 3 locks held by cat/5224: #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8 #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0 #2: b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0 stack backtrace: Call trace: dump_backtrace+0x0/0x248 show_stack+0x20/0x2c dump_stack+0xd0/0x140 print_circular_bug+0x368/0x380 check_noncircular+0x248/0x250 validate_chain+0xd10/0x2bcc __lock_acquire+0x7f4/0xb8c lock_acquire+0x31c/0x360 get_online_mems+0x54/0x150 show_slab_objects+0x94/0x3a8 total_objects_show+0x28/0x34 slab_attr_show+0x38/0x54 sysfs_kf_seq_show+0x198/0x2d4 kernfs_seq_show+0xa4/0xcc seq_read+0x30c/0x8a8 kernfs_fop_read+0xa8/0x314 __vfs_read+0x88/0x20c vfs_read+0xd8/0x10c ksys_read+0xb0/0x120 __arm64_sys_read+0x54/0x88 el0_svc_handler+0x170/0x240 el0_svc+0x8/0xc I think it is important to mention that this doesn't expose the show_slab_objects to use-after-free. There is only a single path that might really race here and that is the slab hotplug notifier callback __kmem_cache_shrink (via slab_mem_going_offline_callback) but that path doesn't really destroy kmem_cache_node data structures. [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html [akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock] Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw Fixes: 01fb58bcba63 ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path") Fixes: 03afc0e25f7f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm, page_owner: rename flag indicating that page is allocatedVlastimil Babka
Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that page is tracked as being allocated. Kirril suggested naming it PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat loaded term for a page". Link: http://lkml.kernel.org/r/20190930122916.14969-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm, page_owner: decouple freeing stack trace from debug_pageallocVlastimil Babka
Commit 8974558f49a6 ("mm, page_owner, debug_pagealloc: save and dump freeing stack trace") enhanced page_owner to also store freeing stack trace, when debug_pagealloc is also enabled. KASAN would also like to do this [1] to improve error reports to debug e.g. UAF issues. Kirill has suggested that the freeing stack trace saving should be also possible to be enabled separately from KASAN or debug_pagealloc, i.e. with an extra boot option. Qian argued that we have enough options already, and avoiding the extra overhead is not worth the complications in the case of a debugging option. Kirill noted that the extra stack handle in struct page_owner requires 0.1% of memory. This patch therefore enables free stack saving whenever page_owner is enabled, regardless of whether debug_pagealloc or KASAN is also enabled. KASAN kernels booted with page_owner=on will thus benefit from the improved error reports. [1] https://bugzilla.kernel.org/show_bug.cgi?id=203967 [vbabka@suse.cz: v3] Link: http://lkml.kernel.org/r/20191007091808.7096-3-vbabka@suse.cz Link: http://lkml.kernel.org/r/20190930122916.14969-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Qian Cai <cai@lca.pw> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Qian Cai <cai@lca.pw> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14mm, page_owner: fix off-by-one error in __set_page_owner_handle()Vlastimil Babka
Patch series "followups to debug_pagealloc improvements through page_owner", v3. These are followups to [1] which made it to Linus meanwhile. Patches 1 and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It would be nice if all of this made it to 5.4 with [1] already there (or at least Patch 1). This patch (of 3): As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") has introduced an off-by-one error in __set_page_owner_handle() when looking up page_ext for subpages. As a result, the head page page_owner info is set twice, while for the last tail page, it's not set at all. Fix this and also make the code more efficient by advancing the page_ext pointer we already have, instead of calling lookup_page_ext() for each subpage. Since the full size of struct page_ext is not known at compile time, we can't use a simple page_ext++ statement, so introduce a page_ext_next() inline function for that. Link: http://lkml.kernel.org/r/20190930122916.14969-2-vbabka@suse.cz Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Reported-by: Miles Chen <miles.chen@mediatek.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Walter Wu <walter-zh.wu@mediatek.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14xtensa: fix type conversion in __get_user_[no]checkMax Filippov
__get_user_[no]check uses temporary buffer of type long to store result of __get_user_size and do sign extension on it when necessary. This doesn't work correctly for 64-bit data. Fix it by moving temporary buffer/sign extension logic to __get_user_asm. Don't do assignment of __get_user_bad result to (x) as it may not always be integer-compatible now and issue warning even when it's going to be optimized. Instead do (x) = 0; and call __get_user_bad separately. Zero initialize __x in __get_user_asm and use '+' constraint for its assembly argument, so that its value is preserved in error cases. This may add at most 1 cycle to the fast path, but saves an instruction and two padding bytes in the fixup section for each use of this macro and works for both misaligned store and store exception. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2019-10-14xtensa: clean up assembly arguments in uaccess macrosMax Filippov
Numeric assembly arguments are hard to understand and assembly code that uses them is hard to modify. Use named arguments in __check_align_*, __get_user_asm and __put_user_asm. Modify macro parameter names so that they don't affect argument names. Use '+' constraint for the [err] argument instead of having it as both input and output. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2019-10-14block: Fix elv_support_iosched()Damien Le Moal
A BIO based request queue does not have a tag_set, which prevent testing for the flag BLK_MQ_F_NO_SCHED indicating that the queue does not require an elevator. This leads to an incorrect initialization of a default elevator in some cases such as BIO based null_blk (queue_mode == BIO) with zoned mode enabled as the default elevator in this case is mq-deadline instead of "none". Fix this by testing for a NULL queue mq_ops field which indicates that the queue is BIO based and should not have an elevator. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-14parisc: Remove 32-bit DMA enforcement from sba_iommuSven Schnelle
This breaks booting from sata_sil24 with the recent DMA change. According to James Bottomley this was in to improve performance by kicking the device into 32 bit descriptors, which are usually more efficient, especially with older dual descriptor format cards like we have on parisc systems. Remove it for now to make DMA working again. Fixes: dcc02c19cc06 ("sata_sil24: use dma_set_mask_and_coherent") Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de>
2019-10-14parisc: Fix vmap memory leak in ioremap()/iounmap()Helge Deller
Sven noticed that calling ioremap() and iounmap() multiple times leads to a vmap memory leak: vmap allocation for size 4198400 failed: use vmalloc=<size> to increase size It seems we missed calling vunmap() in iounmap(). Signed-off-by: Helge Deller <deller@gmx.de> Noticed-by: Sven Schnelle <svens@stackframe.org> Cc: <stable@vger.kernel.org> # v3.16+
2019-10-14parisc: prefer __section from compiler_attributes.hNick Desaulniers
Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Helge Deller <deller@gmx.de>
2019-10-14parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ defineHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
2019-10-14firmware: dmi: Fix unlikely out-of-bounds read in save_mem_devicesJean Delvare
Before reading the Extended Size field, we should ensure it fits in the DMI record. There is already a record length check but it does not cover that field. It would take a seriously corrupted DMI table to hit that bug, so no need to worry, but we should still fix it. Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: 6deae96b42eb ("firmware, DMI: Add function to look up a handle and return DIMM size") Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@suse.de>
2019-10-14riscv: tlbflush: remove confusing comment on local_flush_tlb_all()Paul Walmsley
Remove a confusing comment on our local_flush_tlb_all() implementation. Per an internal discussion with Andrew, while it's true that the fence.i is not necessary, it's not the case that an sfence.vma implies a fence.i. We also drop the section about "flush[ing] the entire local TLB" to better align with the language in section 4.2.1 "Supervisor Memory-Management Fence Instruction" of the RISC-V Privileged Specification v20190608. Fixes: c901e45a999a1 ("RISC-V: `sfence.vma` orderes the instruction cache") Reported-by: Alan Kao <alankao@andestech.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Andrew Waterman <andrew@sifive.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-14riscv: dts: HiFive Unleashed: add default chosen/stdout-pathPaul Walmsley
Add a default "stdout-path" to the kernel DTS file, as is present in many of the board DTS files elsewhere in the kernel tree. With this line present, earlyconsole can be enabled by simply passing "earlycon" on the kernel command line. No specific device details are necessary, since the kernel will use the stdout-path as the default. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Reviewed-by: Atish Patra <atish.patra@wdc.com>
2019-10-14riscv: remove the switch statement in do_trap_break()Vincent Chen
To make the code more straightforward, replace the switch statement with an if statement. Suggested-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> [paul.walmsley@sifive.com: cleaned up patch description; updated to apply] Link: https://lore.kernel.org/linux-riscv/20190927224711.GI4700@infradead.org/ Link: https://lore.kernel.org/linux-riscv/CABvJ_xiHJSB7P5QekuLRP=LBPzXXghAfuUpPUYb=a_HbnOQ6BA@mail.gmail.com/ Link: https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/VDCU2WOB6KQISREO4V5DTXEI2M7VOV55/ Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-10-14 The following pull-request contains BPF updates for your *net-next* tree. 12 days of development and 85 files changed, 1889 insertions(+), 1020 deletions(-) The main changes are: 1) auto-generation of bpf_helper_defs.h, from Andrii. 2) split of bpf_helpers.h into bpf_{helpers, helper_defs, endian, tracing}.h and move into libbpf, from Andrii. 3) Track contents of read-only maps as scalars in the verifier, from Andrii. 4) small x86 JIT optimization, from Daniel. 5) cross compilation support, from Ivan. 6) bpf flow_dissector enhancements, from Jakub and Stanislav. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14drm/panfrost: Add missing GPU feature registersSteven Price
Three feature registers were declared but never actually read from the GPU. Add THREAD_MAX_THREADS, THREAD_MAX_WORKGROUP_SIZE and THREAD_MAX_BARRIER_SIZE so that the complete set are available. Fixes: 4bced8bea094 ("drm/panfrost: Export all GPU feature registers") Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20191014151515.13839-1-steven.price@arm.com
2019-10-14xtensa: fix {get,put}_user() for 64bit valuesAl Viro
First of all, on short copies __copy_{to,from}_user() return the amount of bytes left uncopied, *not* -EFAULT. get_user() and put_user() are expected to return -EFAULT on failure. Another problem is get_user(v32, (__u64 __user *)p); that should fetch 64bit value and the assign it to v32, truncating it in process. Current code, OTOH, reads 8 bytes of data and stores them at the address of v32, stomping on the 4 bytes that follow v32 itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2019-10-14kmemleak: Do not corrupt the object_list during clean-upCatalin Marinas
In case of an error (e.g. memory pool too small), kmemleak disables itself and cleans up the already allocated metadata objects. However, if this happens early before the RCU callback mechanism is available, put_object() skips call_rcu() and frees the object directly. This is not safe with the RCU list traversal in __kmemleak_do_cleanup(). Change the list traversal in __kmemleak_do_cleanup() to list_for_each_entry_safe() and remove the rcu_read_{lock,unlock} since the kmemleak is already disabled at this point. In addition, avoid an unnecessary metadata object rb-tree look-up since it already has the struct kmemleak_object pointer. Fixes: c5665868183f ("mm: kmemleak: use the memory pool for early allocations") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reported-by: Marc Dionne <marc.c.dionne@gmail.com> Reported-by: Ted Ts'o <tytso@mit.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14nvme-tcp: Initialize sk->sk_ll_usec only with NET_RX_BUSY_POLLSebastian Andrzej Siewior
The access to sk->sk_ll_usec should be hidden behind CONFIG_NET_RX_BUSY_POLL like the definition of sk_ll_usec. Put access to ->sk_ll_usec behind CONFIG_NET_RX_BUSY_POLL. Fixes: 1a9460cef5711 ("nvme-tcp: support simple polling") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme: Wait for reset state when requiredKeith Busch
Prevent simultaneous controller disabling/enabling tasks from interfering with each other through a function to wait until the task successfully transitioned the controller to the RESETTING state. This ensures disabling the controller will not be interrupted by another reset path, otherwise a concurrent reset may leave the controller in the wrong state. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme: Prevent resets during paused controller stateKeith Busch
A paused controller is doing critical internal activation work in the background. Prevent subsequent controller resets from occurring during this period by setting the controller state to RESETTING first. A helper function, nvme_try_sched_reset_work(), is introduced for these paths so they may continue with scheduling the reset_work after they've completed their uninterruptible critical section. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-14nvme: Restart request timers in resetting stateKeith Busch
A controller in the resetting state has not yet completed its recovery actions. The pci and fc transports were already handling this, so update the remaining transports to not attempt additional recovery in this state. Instead, just restart the request timer. Tested-by: Edmund Nadolski <edmund.nadolski@intel.com> Reviewed-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>