summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-06-29perf/x86/intel/uncore: Fix wrong box pointer checkKan Liang
Should not init a NULL box. It will cause system crash. The issue looks like caused by a typo. This was not noticed because there is no NULL box. Also, for most boxes, they are enabled by default. The init code is not critical. Fixes: fff4b87e594a ("perf/x86/intel/uncore: Make package handling more robust") Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com
2017-06-29Merge branch 'arcnet-fixes'David S. Miller
Michael Grzeschik says: ==================== arcnet: Collection of latest fixes Here we sum up the recent fixes I collected on the way to use and stabilise the framework. Part of it is an possible deadlock that we prevent as well to fix the calculation of the dev_id that can be setup by an rotary encoder. Beside that we added an trivial spelling patch and fix some wrong and missing assignments that improves the code footprint. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29arcnet: com20020-pci: add missing pdev setup in netdev structureMichael Grzeschik
We add the pdev data to the pci devices netdev structure. This way the interface get consistent device names in the userspace (udev). Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29arcnet: com20020-pci: fix dev_id calculationMichael Grzeschik
The dev_id was miscalculated. Only the two bits 4-5 are relevant for the MA1 card. PCIARC1 and PCIFB2 use the four bits 4-7 for id selection. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29arcnet: com20020: remove needless base_addr assignmentMichael Grzeschik
The assignment is superfluous. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29Trivial fix to spelling mistake in arc_printk messageColin Ian King
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29arcnet: change irq handler to lock irqsaveMichael Grzeschik
This patch prevents the arcnet driver from the following deadlock. [ 41.273910] ====================================================== [ 41.280397] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] [ 41.287433] 4.4.0-00034-gc0ae784 #536 Not tainted [ 41.292366] ------------------------------------------------------ [ 41.298863] arcecho/233 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire: [ 41.305628] (&(&lp->lock)->rlock){+.+...}, at: [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet] [ 41.315199] [ 41.315199] and this task is already holding: [ 41.321324] (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8 [ 41.329593] which would create a new lock dependency: [ 41.334893] (_xmit_ARCNET#2){+.-...} -> (&(&lp->lock)->rlock){+.+...} [ 41.341801] [ 41.341801] but this new dependency connects a SOFTIRQ-irq-safe lock: [ 41.350108] (_xmit_ARCNET#2){+.-...} ... which became SOFTIRQ-irq-safe at: [ 41.357539] [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.362677] [<c063ab8c>] dev_watchdog+0x5c/0x264 [ 41.367723] [<c0094edc>] call_timer_fn+0x6c/0xf4 [ 41.372759] [<c00950b8>] run_timer_softirq+0x154/0x210 [ 41.378340] [<c0036b30>] __do_softirq+0x144/0x298 [ 41.383469] [<c0036fb4>] irq_exit+0xcc/0x130 [ 41.388138] [<c0085c50>] __handle_domain_irq+0x60/0xb4 [ 41.393728] [<c0014578>] __irq_svc+0x58/0x78 [ 41.398402] [<c0010274>] arch_cpu_idle+0x24/0x3c [ 41.403443] [<c007127c>] cpu_startup_entry+0x1f8/0x25c [ 41.409029] [<c09adc90>] start_kernel+0x3c0/0x3cc [ 41.414170] [ 41.414170] to a SOFTIRQ-irq-unsafe lock: [ 41.419931] (&(&lp->lock)->rlock){+.+...} ... which became SOFTIRQ-irq-unsafe at: [ 41.427996] ... [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.433409] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet] [ 41.439646] [<c0089120>] handle_nested_irq+0x8c/0xec [ 41.445063] [<c03c1170>] regmap_irq_thread+0x190/0x314 [ 41.450661] [<c0087244>] irq_thread_fn+0x1c/0x34 [ 41.455700] [<c0087548>] irq_thread+0x13c/0x1dc [ 41.460649] [<c0050f10>] kthread+0xe4/0xf8 [ 41.465158] [<c000f810>] ret_from_fork+0x14/0x24 [ 41.470207] [ 41.470207] other info that might help us debug this: [ 41.470207] [ 41.478627] Possible interrupt unsafe locking scenario: [ 41.478627] [ 41.485763] CPU0 CPU1 [ 41.490521] ---- ---- [ 41.495279] lock(&(&lp->lock)->rlock); [ 41.499414] local_irq_disable(); [ 41.505636] lock(_xmit_ARCNET#2); [ 41.511967] lock(&(&lp->lock)->rlock); [ 41.518741] <Interrupt> [ 41.521490] lock(_xmit_ARCNET#2); [ 41.525356] [ 41.525356] *** DEADLOCK *** [ 41.525356] [ 41.531587] 1 lock held by arcecho/233: [ 41.535617] #0: (_xmit_ARCNET#2){+.-...}, at: [<c06b934c>] packet_direct_xmit+0xfc/0x1c8 [ 41.544355] the dependencies between SOFTIRQ-irq-safe lock and the holding lock: [ 41.552362] -> (_xmit_ARCNET#2){+.-...} ops: 27 { [ 41.557357] HARDIRQ-ON-W at: [ 41.560664] [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.567445] [<c063ba28>] dev_deactivate_many+0x114/0x304 [ 41.574866] [<c063bc3c>] dev_deactivate+0x24/0x38 [ 41.581646] [<c0630374>] linkwatch_do_dev+0x40/0x74 [ 41.588613] [<c06305d8>] __linkwatch_run_queue+0xec/0x140 [ 41.596120] [<c0630658>] linkwatch_event+0x2c/0x34 [ 41.602991] [<c004af30>] process_one_work+0x188/0x40c [ 41.610131] [<c004b200>] worker_thread+0x4c/0x480 [ 41.616912] [<c0050f10>] kthread+0xe4/0xf8 [ 41.623048] [<c000f810>] ret_from_fork+0x14/0x24 [ 41.629735] IN-SOFTIRQ-W at: [ 41.633039] [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.639820] [<c063ab8c>] dev_watchdog+0x5c/0x264 [ 41.646508] [<c0094edc>] call_timer_fn+0x6c/0xf4 [ 41.653190] [<c00950b8>] run_timer_softirq+0x154/0x210 [ 41.660425] [<c0036b30>] __do_softirq+0x144/0x298 [ 41.667201] [<c0036fb4>] irq_exit+0xcc/0x130 [ 41.673518] [<c0085c50>] __handle_domain_irq+0x60/0xb4 [ 41.680754] [<c0014578>] __irq_svc+0x58/0x78 [ 41.687077] [<c0010274>] arch_cpu_idle+0x24/0x3c [ 41.693769] [<c007127c>] cpu_startup_entry+0x1f8/0x25c [ 41.701006] [<c09adc90>] start_kernel+0x3c0/0x3cc [ 41.707791] INITIAL USE at: [ 41.711003] [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.717696] [<c063ba28>] dev_deactivate_many+0x114/0x304 [ 41.725026] [<c063bc3c>] dev_deactivate+0x24/0x38 [ 41.731718] [<c0630374>] linkwatch_do_dev+0x40/0x74 [ 41.738593] [<c06305d8>] __linkwatch_run_queue+0xec/0x140 [ 41.746011] [<c0630658>] linkwatch_event+0x2c/0x34 [ 41.752789] [<c004af30>] process_one_work+0x188/0x40c [ 41.759847] [<c004b200>] worker_thread+0x4c/0x480 [ 41.766541] [<c0050f10>] kthread+0xe4/0xf8 [ 41.772596] [<c000f810>] ret_from_fork+0x14/0x24 [ 41.779198] } [ 41.780945] ... key at: [<c124d620>] netdev_xmit_lock_key+0x38/0x1c8 [ 41.788192] ... acquired at: [ 41.791309] [<c007bed8>] lock_acquire+0x70/0x90 [ 41.796361] [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54 [ 41.802324] [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet] [ 41.808844] [<c06b9380>] packet_direct_xmit+0x130/0x1c8 [ 41.814622] [<c06bc7e4>] packet_sendmsg+0x3b8/0x680 [ 41.820034] [<c05fe8b0>] sock_sendmsg+0x14/0x24 [ 41.825091] [<c05ffd68>] SyS_sendto+0xb8/0xe0 [ 41.829956] [<c05ffda8>] SyS_send+0x18/0x20 [ 41.834638] [<c000f780>] ret_fast_syscall+0x0/0x1c [ 41.839954] [ 41.841514] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: [ 41.850302] -> (&(&lp->lock)->rlock){+.+...} ops: 5 { [ 41.855644] HARDIRQ-ON-W at: [ 41.858945] [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.865726] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet] [ 41.873607] [<c0089120>] handle_nested_irq+0x8c/0xec [ 41.880666] [<c03c1170>] regmap_irq_thread+0x190/0x314 [ 41.887901] [<c0087244>] irq_thread_fn+0x1c/0x34 [ 41.894593] [<c0087548>] irq_thread+0x13c/0x1dc [ 41.901195] [<c0050f10>] kthread+0xe4/0xf8 [ 41.907338] [<c000f810>] ret_from_fork+0x14/0x24 [ 41.914025] SOFTIRQ-ON-W at: [ 41.917328] [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.924106] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet] [ 41.931981] [<c0089120>] handle_nested_irq+0x8c/0xec [ 41.939028] [<c03c1170>] regmap_irq_thread+0x190/0x314 [ 41.946264] [<c0087244>] irq_thread_fn+0x1c/0x34 [ 41.952954] [<c0087548>] irq_thread+0x13c/0x1dc [ 41.959548] [<c0050f10>] kthread+0xe4/0xf8 [ 41.965689] [<c000f810>] ret_from_fork+0x14/0x24 [ 41.972379] INITIAL USE at: [ 41.975595] [<c06f8fc8>] _raw_spin_lock+0x30/0x40 [ 41.982283] [<bf083d54>] arcnet_interrupt+0x2c/0x800 [arcnet] [ 41.990063] [<c0089120>] handle_nested_irq+0x8c/0xec [ 41.997027] [<c03c1170>] regmap_irq_thread+0x190/0x314 [ 42.004172] [<c0087244>] irq_thread_fn+0x1c/0x34 [ 42.010766] [<c0087548>] irq_thread+0x13c/0x1dc [ 42.017267] [<c0050f10>] kthread+0xe4/0xf8 [ 42.023314] [<c000f810>] ret_from_fork+0x14/0x24 [ 42.029903] } [ 42.031648] ... key at: [<bf0854cc>] __key.42091+0x0/0xfffff0f8 [arcnet] [ 42.039255] ... acquired at: [ 42.042372] [<c007bed8>] lock_acquire+0x70/0x90 [ 42.047413] [<c06f9140>] _raw_spin_lock_irqsave+0x40/0x54 [ 42.053364] [<bf083bc8>] arcnet_send_packet+0x60/0x1c0 [arcnet] [ 42.059872] [<c06b9380>] packet_direct_xmit+0x130/0x1c8 [ 42.065634] [<c06bc7e4>] packet_sendmsg+0x3b8/0x680 [ 42.071030] [<c05fe8b0>] sock_sendmsg+0x14/0x24 [ 42.076069] [<c05ffd68>] SyS_sendto+0xb8/0xe0 [ 42.080926] [<c05ffda8>] SyS_send+0x18/0x20 [ 42.085601] [<c000f780>] ret_fast_syscall+0x0/0x1c [ 42.090918] [ 42.092481] [ 42.092481] stack backtrace: [ 42.097065] CPU: 0 PID: 233 Comm: arcecho Not tainted 4.4.0-00034-gc0ae784 #536 [ 42.104751] Hardware name: Generic AM33XX (Flattened Device Tree) [ 42.111183] [<c0017ec8>] (unwind_backtrace) from [<c00139d0>] (show_stack+0x10/0x14) [ 42.119337] [<c00139d0>] (show_stack) from [<c02a82c4>] (dump_stack+0x8c/0x9c) [ 42.126937] [<c02a82c4>] (dump_stack) from [<c0078260>] (check_usage+0x4bc/0x63c) [ 42.134815] [<c0078260>] (check_usage) from [<c0078438>] (check_irq_usage+0x58/0xb0) [ 42.142964] [<c0078438>] (check_irq_usage) from [<c007aaa0>] (__lock_acquire+0x1524/0x20b0) [ 42.151740] [<c007aaa0>] (__lock_acquire) from [<c007bed8>] (lock_acquire+0x70/0x90) [ 42.159886] [<c007bed8>] (lock_acquire) from [<c06f9140>] (_raw_spin_lock_irqsave+0x40/0x54) [ 42.168768] [<c06f9140>] (_raw_spin_lock_irqsave) from [<bf083bc8>] (arcnet_send_packet+0x60/0x1c0 [arcnet]) [ 42.179115] [<bf083bc8>] (arcnet_send_packet [arcnet]) from [<c06b9380>] (packet_direct_xmit+0x130/0x1c8) [ 42.189182] [<c06b9380>] (packet_direct_xmit) from [<c06bc7e4>] (packet_sendmsg+0x3b8/0x680) [ 42.198059] [<c06bc7e4>] (packet_sendmsg) from [<c05fe8b0>] (sock_sendmsg+0x14/0x24) [ 42.206199] [<c05fe8b0>] (sock_sendmsg) from [<c05ffd68>] (SyS_sendto+0xb8/0xe0) [ 42.213978] [<c05ffd68>] (SyS_sendto) from [<c05ffda8>] (SyS_send+0x18/0x20) [ 42.221388] [<c05ffda8>] (SyS_send) from [<c000f780>] (ret_fast_syscall+0x0/0x1c) Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> --- v1 -> v2: removed unneeded zero assignment of flags Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29rocker: move dereference before freeDan Carpenter
My static checker complains that ofdpa_neigh_del() can sometimes free "found". It just makes sense to use it first before deleting it. Fixes: ecf244f753e0 ("rocker: fix maybe-uninitialized warning") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29mlxsw: spectrum_router: Fix NULL pointer dereferenceIdo Schimmel
In case a VLAN device is enslaved to a bridge we shouldn't create a router interface (RIF) for it when it's configured with an IP address. This is already handled by the driver for other types of netdevs, such as physical ports and LAG devices. If this IP address is then removed and the interface is subsequently unlinked from the bridge, a NULL pointer dereference can happen, as the original 802.1d FID was replaced with an rFID which was then deleted. To reproduce: $ ip link set dev enp3s0np9 up $ ip link add name enp3s0np9.111 link enp3s0np9 type vlan id 111 $ ip link set dev enp3s0np9.111 up $ ip link add name br0 type bridge $ ip link set dev br0 up $ ip link set enp3s0np9.111 master br0 $ ip address add dev enp3s0np9.111 192.168.0.1/24 $ ip address del dev enp3s0np9.111 192.168.0.1/24 $ ip link set dev enp3s0np9.111 nomaster Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Petr Machata <petrm@mellanox.com> Tested-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29net: sched: Fix one possible panic when no destroy callbackGao Feng
When qdisc fail to init, qdisc_create would invoke the destroy callback to cleanup. But there is no check if the callback exists really. So it would cause the panic if there is no real destroy callback like the qdisc codel, fq, and so on. Take codel as an example following: When a malicious user constructs one invalid netlink msg, it would cause codel_init->codel_change->nla_parse_nested failed. Then kernel would invoke the destroy callback directly but qdisc codel doesn't define one. It causes one panic as a result. Now add one the check for destroy to avoid the possible panic. Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29virtio-net: serialize tx routine during resetJason Wang
We don't hold any tx lock when trying to disable TX during reset, this would lead a use after free since ndo_start_xmit() tries to access the virtqueue which has already been freed. Fix this by using netif_tx_disable() before freeing the vqs, this could make sure no tx after vq freeing. Reported-by: Jean-Philippe Menil <jpmenil@gmail.com> Tested-by: Jean-Philippe Menil <jpmenil@gmail.com> Fixes commit f600b6905015 ("virtio_net: Add XDP support") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Robert McCabe <robert.mccabe@rockwellcollins.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29nvme: Makefile: remove dead build ruleValentin Rothberg
Remove dead build rule for drivers/nvme/host/scsi.c which has been removed by commit ("nvme: Remove SCSI translations"). Signed-off-by: Valentin Rothberg <vrothberg@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-29blk-mq: map all HWQ also in hyperthreaded systemMax Gurtovoy
This patch performs sequential mapping between CPUs and queues. In case the system has more CPUs than HWQs then there are still CPUs to map to HWQs. In hyperthreaded system, map the unmapped CPUs and their siblings to the same HWQ. This actually fixes a bug that found unmapped HWQs in a system with 2 sockets, 18 cores per socket, 2 threads per core (total 72 CPUs) running NVMEoF (opens upto maximum of 64 HWQs). Performance results running fio (72 jobs, 128 iodepth) using null_blk (w/w.o patch): bs IOPS(read submit_queues=72) IOPS(write submit_queues=72) IOPS(read submit_queues=24) IOPS(write submit_queues=24) ----- ---------------------------- ------------------------------ ---------------------------- ----------------------------- 512 4890.4K/4723.5K 4524.7K/4324.2K 4280.2K/4264.3K 3902.4K/3909.5K 1k 4910.1K/4715.2K 4535.8K/4309.6K 4296.7K/4269.1K 3906.8K/3914.9K 2k 4906.3K/4739.7K 4526.7K/4330.6K 4301.1K/4262.4K 3890.8K/3900.1K 4k 4918.6K/4730.7K 4556.1K/4343.6K 4297.6K/4264.5K 3886.9K/3893.9K 8k 4906.4K/4748.9K 4550.9K/4346.7K 4283.2K/4268.8K 3863.4K/3858.2K 16k 4903.8K/4782.6K 4501.5K/4233.9K 4292.3K/4282.3K 3773.1K/3773.5K 32k 4885.8K/4782.4K 4365.9K/4184.2K 4307.5K/4289.4K 3780.3K/3687.3K 64k 4822.5K/4762.7K 2752.8K/2675.1K 4308.8K/4312.3K 2651.5K/2655.7K 128k 2388.5K/2313.8K 1391.9K/1375.7K 2142.8K/2152.2K 1395.5K/1374.2K Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-29ftrace: Fix regression with module command in stack_trace_filterSteven Rostedt (VMware)
When doing the following command: # echo ":mod:kvm_intel" > /sys/kernel/tracing/stack_trace_filter it triggered a crash. This happened with the clean up of probes. It required all callers to the regex function (doing ftrace filtering) to have ops->private be a pointer to a trace_array. But for the stack tracer, that is not the case. Allow for the ops->private to be NULL, and change the function command callbacks to handle the trace_array pointer being NULL as well. Fixes: d2afd57a4b96 ("tracing/ftrace: Allow instances to have their own function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2017-06-29Revert "pinctrl: rockchip: avoid hardirq-unsafe functions in irq_chip"Brian Norris
This reverts commit 88bb94216f59e10802aaf78c858a4146085faf18. It introduced a new CONFIG_DEBUG_ATOMIC_SLEEP warning in v4.12-rc1: [ 7226.716713] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 [ 7226.716716] in_atomic(): 0, irqs_disabled(): 0, pid: 1708, name: bash [ 7226.716722] CPU: 1 PID: 1708 Comm: bash Not tainted 4.12.0-rc6+ #1213 [ 7226.716724] Hardware name: Google Kevin (DT) [ 7226.716726] Call trace: [ 7226.716738] [<ffffff8008089928>] dump_backtrace+0x0/0x24c [ 7226.716743] [<ffffff8008089b94>] show_stack+0x20/0x28 [ 7226.716749] [<ffffff8008371370>] dump_stack+0x90/0xb0 [ 7226.716755] [<ffffff80080cd2a0>] ___might_sleep+0x10c/0x124 [ 7226.716760] [<ffffff80080cd330>] __might_sleep+0x78/0x88 [ 7226.716765] [<ffffff800879e210>] mutex_lock+0x2c/0x64 [ 7226.716771] [<ffffff80083ad678>] rockchip_irq_bus_lock+0x30/0x3c [ 7226.716777] [<ffffff80080f6d40>] __irq_get_desc_lock+0x78/0x98 [ 7226.716782] [<ffffff80080f7e6c>] irq_set_irq_wake+0x44/0x12c [ 7226.716787] [<ffffff8008486e18>] dev_pm_arm_wake_irq+0x4c/0x58 [ 7226.716792] [<ffffff800848b80c>] device_wakeup_arm_wake_irqs+0x3c/0x58 [ 7226.716796] [<ffffff80084896fc>] dpm_suspend_noirq+0xf8/0x3a0 [ 7226.716800] [<ffffff80080f1384>] suspend_devices_and_enter+0x1a4/0x9a8 [ 7226.716803] [<ffffff80080f21ec>] pm_suspend+0x664/0x6a4 [ 7226.716807] [<ffffff80080f04d8>] state_store+0xd4/0xf8 ... It was reported on -rc1, and it's still not fixed in -rc6, so it should just be reverted. Cc: John Keeping <john@metanate.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-29gpio: acpi: Skip _AEI entries without a handler rather then aborting the scanHans de Goede
acpi_walk_resources will stop as soon as the callback passed in returns an error status. On a x86 tablet I have the first GpioInt in the _AEI resource list has no handler defined in the DSDT, causing acpi_walk_resources to abort scanning the rest of the resource list, which does define valid ACPI GPIO events. This commit changes the return for not finding a handler from AE_BAD_PARAMETER to AE_OK so that the rest of the resource list will get scanned normally in case of missing event handlers. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-29gpiolib: fix filtering out unwanted eventsBartosz Golaszewski
GPIOEVENT_REQUEST_BOTH_EDGES is not a single flag, but a binary OR of GPIOEVENT_REQUEST_RISING_EDGE and GPIOEVENT_REQUEST_FALLING_EDGE. The expression 'le->eflags & GPIOEVENT_REQUEST_BOTH_EDGES' we'll get evaluated to true even if only one event type was requested. Fix it by checking both RISING & FALLING flags explicitly. Cc: stable@vger.kernel.org Fixes: 61f922db7221 ("gpio: userspace ABI for reading GPIO line events") Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-06-29EDAC, pnd2: Fix Apollo Lake DIMM detectionTony Luck
Non-existent or empty DIMM slots result in error return from RD_REGP(). But we shouldn't give up on failure. So long as we find at least one DIMM we can continue. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170628234407.21521-1-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29EDAC, i5000, i5400: Fix definition of NRECMEMB registerJérémy Lefaure
In the i5000 and i5400 drivers, the NRECMEMB register is defined as a 16-bit value, which results in wrong shifts in the code, as reported by sparse. In the datasheets ([1], section 3.9.22.20 and [2], section 3.9.22.21), this register is a 32-bit register. A u32 value for the register fixes the wrong shifts warnings and matches the datasheet. Also fix the mask to access to the CAS bits [27:16] in the i5000 driver. [1]: https://www.intel.com/content/dam/doc/datasheet/5000p-5000v-5000z-chipset-memory-controller-hub-datasheet.pdf [2]: https://www.intel.se/content/dam/doc/datasheet/5400-chipset-memory-controller-hub-datasheet.pdf Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170629005729.8478-1-jeremy.lefaure@lse.epita.fr Signed-off-by: Borislav Petkov <bp@suse.de>
2017-06-29sched/numa: Hide numa_wake_affine() from UP buildThomas Gleixner
Stephen reported the following build warning in UP: kernel/sched/fair.c:2657:9: warning: 'struct sched_domain' declared inside parameter list ^ /home/sfr/next/next/kernel/sched/fair.c:2657:9: warning: its scope is only this definition or declaration, which is probably not what you want Hide the numa_wake_affine() inline stub on UP builds to get rid of it. Fixes: 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org>
2017-06-28arch: remove unused macro/function thread_saved_pc()Tobias Klauser
The only user of thread_saved_pc() in non-arch-specific code was removed in commit 8243d5597793 ("sched/core: Remove pointless printout in sched_show_task()"). Remove the implementations as well. Some architectures use thread_saved_pc() in their arch-specific code. Leave their thread_saved_pc() intact. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-29timers: Make the cpu base lock rawSebastian Andrzej Siewior
The timers cpu base lock could not be converted to a raw spinlock becaue the lock held time was non-deterministic due to cascading and long lasting timer wheel traversals. The rework of the timer wheel to the new non-cascading model removed also the wheel traversals and the lock held times are deterministic now. This allows to make the lock raw and thereby unbreaks NOHz* on preempt-RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20170627161538.30257-1-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28block: provide bio_uninit() free freeing integrity/task associationsJens Axboe
Wen reports significant memory leaks with DIF and O_DIRECT: "With nvme devive + T10 enabled, On a system it has 256GB and started logging /proc/meminfo & /proc/slabinfo for every minute and in an hour it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute leaking. /proc/meminfo | grep SUnreclaim... SUnreclaim: 6752128 kB SUnreclaim: 6874880 kB SUnreclaim: 7238080 kB .... SUnreclaim: 22307264 kB SUnreclaim: 22485888 kB SUnreclaim: 22720256 kB When testcases with T10 enabled call into __blkdev_direct_IO_simple, code doesn't free memory allocated by bio_integrity_alloc. The patch fixes the issue. HTX has been run with +60 hours without failure." Since __blkdev_direct_IO_simple() allocates the bio on the stack, it doesn't go through the regular bio free. This means that any ancillary data allocated with the bio through the stack is not freed. Hence, we can leak the integrity data associated with the bio, if the device is using DIF/DIX. Fix this by providing a bio_uninit() and export it, so that we can use it to free this data. Note that this is a minimal fix for this issue. Any current user of bio's that are allocated outside of bio_alloc_bioset() suffers from this issue, most notably some drivers. We will fix those in a more comprehensive patch for 4.13. This also means that the commit marked as being fixed by this isn't the real culprit, it's just the most obvious one out there. Fixes: 542ff7bf18c6 ("block: new direct I/O implementation") Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: Allocate queues for all possible CPUsChristoph Hellwig
Unlike most drіvers that simply pass the maximum possible vectors to pci_alloc_irq_vectors NVMe needs to configure the device before allocting the vectors, so it needs a manual update for the new scheme of using all present CPUs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-block@vger.kernel.org Cc: linux-nvme@lists.infradead.org Link: http://lkml.kernel.org/r/20170626102058.10200-4-hch@lst.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28blk-mq: Create hctx for each present CPUChristoph Hellwig
Currently we only create hctx for online CPUs, which can lead to a lot of churn due to frequent soft offline / online operations. Instead allocate one for each present CPU to avoid this and dramatically simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-block@vger.kernel.org Cc: linux-nvme@lists.infradead.org Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28blk-mq: Include all present CPUs in the default queue mappingChristoph Hellwig
This way we get a nice distribution independent of the current cpu online / offline state. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-block@vger.kernel.org Cc: linux-nvme@lists.infradead.org Link: http://lkml.kernel.org/r/20170626102058.10200-2-hch@lst.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28x86/PCI: Select CONFIG_PCI_LOCKLESS_CONFIGThomas Gleixner
All x86 PCI configuration space accessors have either their own serialization or can operate completely lockless (ECAM). Disable the global lock in the generic PCI configuration space accessors. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <helgaas@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20170316215057.295079391@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28PCI: Provide Kconfig option for lockless config space accessorsThomas Gleixner
The generic PCI configuration space accessors are globally serialized via pci_lock. On larger systems this causes massive lock contention when the configuration space has to be accessed frequently. One such access pattern is the Intel Uncore performance counter unit. Provide a kernel config option which can be selected by an architecture when the low level PCI configuration space accessors in the architecture use their own serialization or can operate completely lockless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <helgaas@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20170316215057.205961140@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28x86/PCI/ce4100: Properly lock accessor functionsThomas Gleixner
x86 wants to get rid of the global pci_lock protecting the config space accessors so ECAM mode can operate completely lockless, but the CE4100 PCI code relies on that to protect the simulation registers. Restructure the code so it uses the x86 specific pci_config_lock to serialize the inner workings of the CE4100 PCI magic. That allows to remove the global locking via pci_lock later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <helgaas@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20170316215057.126873574@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28x86/PCI: Abort if legacy init failsThomas Gleixner
If the legacy PCI init fails, then there are no PCI config space accesors available, but the code continues and tries to scan the busses, which fails due to the lack of config space accessors. Return right away, if the last init fallback fails. Switch the few printks to pr_info while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <helgaas@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20170316215057.047576516@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28x86/PCI: Remove duplicate definesThomas Gleixner
For some historic reason these defines are duplicated and also available in arch/x86/include/asm/pci_x86.h, Remove them. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <helgaas@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-pci@vger.kernel.org Link: http://lkml.kernel.org/r/20170316215056.967808646@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-06-28Merge tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Bugfixes include: - stable fix for exclusive create if the server supports the umask attribute - trunking detection should handle ERESTARTSYS/EINTR - stable fix for a race in the LAYOUTGET function - stable fix to revert "nfs_rename() handle -ERESTARTSYS dentry left behind" - nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete()" * tag 'nfs-for-4.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.1: nfs4_callback_free_slot() cannot call nfs4_slot_tbl_drain_complete() Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind" NFSv4.1: Fix a race in nfs4_proc_layoutget NFS: Trunking detection should handle ERESTARTSYS/EINTR NFSv4.2: Don't send mode again in post-EXCLUSIVE4_1 SETATTR with umask
2017-06-28Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "This is the final set of fixes for -rc8, just a few i915 and one vmwgfx ones. I'm off on holidays for a week, so if anything shows up for fixes I've asked Daniel or Sean Paul to herd it in the right direction" [ The additional etnaviv fixes were already herded towards me as seen in my previous pull - Linus ] * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations drm/i915: Hold struct_mutex for per-file stats in debugfs/i915_gem_object drm/i915: Retire the VMA's fence tracker before unbinding
2017-06-28Merge branch 'etnaviv/fixes' of git://git.pengutronix.de/git/lst/linuxLinus Torvalds
Pull drm/etnaviv fixes from Lucas Stach: "I realized I just missed the cut-off point for the final drm fixes pull, but I have 2 more etnaviv fixes that need to go into 4.12, as they fix fallout from the explicit sync work introduced in the last merge window" [ Pulling directly because Dave is on vacation. Noted by Daniel Vetter, and acked by Dave Airlie - Linus ] * 'etnaviv/fixes' of git://git.pengutronix.de/git/lst/linux: drm/etnaviv: Fix implicit/explicit sync sense inversion drm/etnaviv: fix submit flags getting overwritten by BO content
2017-06-28locking/refcount: Create unchecked atomic_t implementationKees Cook
Many subsystems will not use refcount_t unless there is a way to build the kernel so that there is no regression in speed compared to atomic_t. This adds CONFIG_REFCOUNT_FULL to enable the full refcount_t implementation which has the validation but is slightly slower. When not enabled, refcount_t uses the basic unchecked atomic_t routines, which results in no code changes compared to just using atomic_t directly. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: David Windsor <dwindsor@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Elena Reshetova <elena.reshetova@intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Hans Liljestrand <ishkamiel@gmail.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Jann Horn <jannh@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arozansk@redhat.com Cc: axboe@kernel.dk Cc: linux-arch <linux-arch@vger.kernel.org> Link: http://lkml.kernel.org/r/20170621200026.GA115679@beast Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-28nvmet-rdma: register ib_client to not deadlock in device removalSagi Grimberg
We can deadlock in case we got to a device removal event on a queue which is already in the process of destroying the cm_id is this is blocking until all events on this cm_id will drain. On the other hand we cannot guarantee that rdma_destroy_id was invoked as we only have indication that the queue disconnect flow has been queued (the queue state is updated before the realease work has been queued). So, we leave all the queue removal to a separate ib_client to avoid this deadlock as ib_client device removal is in a different context than the cm_id itself. Reported-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme_fc: fix error recovery on link down.James Smart
Currently, the fc transport invokes nvme_fc_error_recovery() on every io in which the transport detects an error. Which means: a) it's really noisy on large io loads that all get hit by a link down. b) we repeatively call nvme_stop_queues() even though queues are stopped upon the first error or as first steps of reset_work. Correct by: Errors are only meaningful if the controller is in the LIVE state. Thus, enact the reset_work only if LIVE. If called repeatively, state will have already transitioned. There's no need to stop the queues here. Let the first steps of reset_work do the queue stopping. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvmet_fc: fix crashes on bad opcodesJames Smart
if a nvme command is issued with an opcode that is not supported by the target (example: opcode 21 - detach namespace), the target crashes due to a null pointer. nvmet_req_init() detects the bad opcode and immediately calls the nvme command done routine with an error status, allowing the transport to send the response. However, the FC transport was aborting the command on error, so the abort freed the lldd point, but the rsp transmit path referenced it psot the free. Fix by removing the abort call on nvmet_req_init() failure. The completion response will be sent with an error status code. As the completion path will terminate the io, ensure the data_sg lists show an unused state so that teardown paths are successful. Signed-off-by: Paul Ely <Paul.Ely@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme_fc: Fix crash when nvme controller connection fails.James Smart
If a controller connection is attempted (say to a subsystem that does not exist), the first attempt errors out. If another connect is attempted, it crashes. Issue is the prior controller has yet execute it's final put, thus its still on lists. However, opts points on it have been cleared, thus causing the crash if they are referenced. Fix is to add the missing put after the nvme_uninit_ctrl() call on the attachment failure. Signed-off-by: Paul Ely <Paul.Ely@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme_fc: replace ioabort msleep loop with completionJames Smart
Per the recommendation by Sagi on: http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html Wait for io aborts to complete wait converted from msleep look to using a struct completion. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme_fc: fix double calls to nvme_cleanup_cmd()James Smart
Current fc transport code, on io termination, is calling nvme_cleanup_cmd() followed by the transport dma unmap routine which also calls nvme_cleanup_cmd(). Which means two kfrees occur on the same address, raising havoc. This resulted in odd data errors, effectively corruption.. Fix by removing the extraneous double calls. Call now occurs only in teardown paths and as part of dma unmap routine. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme-fabrics: verify that a controller returns the correct NQNChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: simplify nvme_dev_attrs_are_visibleChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: read the subsystem NQN from Identify ControllerChristoph Hellwig
NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the Identify controller data structures. Use this NQN for the subsysnqn sysfs attribute by storing it in the nvme_ctrl structure after verifying it. For older controllers we generate a "fake" NQN per non-normative text in the NVMe 1.3 spec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: remove a misleading comment on struct nvme_nsChristoph Hellwig
While a NVMe Namespace is somewhat similar to a SCSI Logical Unit (and not a Logical Unit Number anyway) there are subtile differences. Remove the misleading comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grmberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: explicitly disable APST on quirked devicesKai-Heng Feng
A user reports APST is enabled, even when the NVMe is quirked or with option "default_ps_max_latency_us=0". The current logic will not set APST if the device is quirked. But the NVMe in question will enable APST automatically. Separate the logic "apst is supported" and "to enable apst", so we can use the latter one to explicitly disable APST at initialiaztion. BugLink: https://bugs.launchpad.net/bugs/1699004 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: use a single NVME_AQ_DEPTH and relax it to 32Sagi Grimberg
No need to differentiate fabrics from pci/loop, also lower it to 32 as we don't really need 256 inflight admin commands. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: add hostid token to fabric optionsJohannes Thumshirn
Currently we have no way to define a stable host-id but always use the one which is randomly generated when we add the host or use the default host. Provide a "hostid=%s" for user-space to pass in a persistent host-id which overrides the randomly generated one. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme: Remove SCSI translationsKeith Busch
The SCSI-to-NVMe translations were added to assist storage applications utilizing SG_IO transitioning to NVMe. It was always recommended, however, to use native NVMe for device management as too much is lost in translation and the maintenance burden in keeping this kludgey layer around has been neglected such that much of the translations are completely broken. This patch removes SG_IO handling from NVMe to avoid any confusion regarding maintenance support for this interface. The config option for NVMe SCSI emulation has been disabled by default since 4.5. The driver has supported native nvme user commands since the beginning, and native tooling is publicly available for use or as reference for anyone writing their own tools, so there's no excuse for hanging onto a broken crutch. Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-28nvme-pci: open-code polling logic in nvme_pollSagi Grimberg
Given that the code is simple enough it seems better then passing a tag by reference for each call site, also we can now get rid of __nvme_process_cq. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>