summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-11-25Btrfs: fix the number of transaction units needed to remove a block groupFilipe Manana
We were using only 1 transaction unit when attempting to delete an unused block group but in reality we need 3 + N units, where N corresponds to the number of stripes. We were accounting only for the addition of the orphan item (for the block group's free space cache inode) but we were not accounting that we need to delete one block group item from the extent tree, one free space item from the tree of tree roots and N device extent items from the device tree. While one unit is not enough, it worked most of the time because for each single unit we are too pessimistic and assume an entire tree path, with the highest possible heigth (8), needs to be COWed with eventual node splits at every possible level in the tree, so there was usually enough reserved space for removing all the items and adding the orphan item. However after adding the orphan item, writepages() can by called by the VM subsystem against the btree inode when we are under memory pressure, which causes writeback to start for the nodes we COWed before, this forces the operation to remove the free space item to COW again some (or all of) the same nodes (in the tree of tree roots). Even without writepages() being called, we could fail with ENOSPC because these items are located in multiple trees and one of them might have a higher heigth and require node/leaf splits at many levels, exhausting all the reserved space before removing all the items and adding the orphan. In the kernel 4.0 release, commit 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group"), we attempted to fix a BUG_ON due to ENOSPC when trying to add the orphan item by making the cleaner kthread reserve one transaction unit before attempting to remove the block group, but this was not enough. We had a couple user reports still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on a 4.2-rc6 kernel for example: http://www.spinics.net/lists/linux-btrfs/msg46070.html So fix this by reserving all the necessary units of metadata. Reported-by: Stefan Priebe <s.priebe@profihost.ag> Fixes: 3d84be799194 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: use global reserve when deleting unused block group after ENOSPCFilipe Manana
It's possible to reach a state where the cleaner kthread isn't able to start a transaction to delete an unused block group due to lack of enough free metadata space and due to lack of unallocated device space to allocate a new metadata block group as well. If this happens try to use space from the global block group reserve just like we do for unlink operations, so that we don't reach a permanent state where starting a transaction for filesystem operations (file creation, renames, etc) keeps failing with -ENOSPC. Such an unfortunate state was observed on a machine where over a dozen unused data block groups existed and the cleaner kthread was failing to delete them due to ENOSPC error when attempting to start a transaction, and even running balance with a -dusage=0 filter failed with ENOSPC as well. Also unmounting and mounting again the filesystem didn't help. Allowing the cleaner kthread to use the global block reserve to delete the unused data block groups fixed the problem. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25Btrfs: tests: checking for NULL instead of IS_ERR()Dan Carpenter
btrfs_alloc_dummy_root() return an error pointer on failure, it never returns NULL. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25btrfs: fix signed overflows in btrfs_sync_fileDavid Sterba
The calculation of range length in btrfs_sync_file leads to signed overflow. This was caught by PaX gcc SIZE_OVERFLOW plugin. https://forums.grsecurity.net/viewtopic.php?f=1&t=4284 The fsync call passes 0 and LLONG_MAX, the range length does not fit to loff_t and overflows, but the value is converted to u64 so it silently works as expected. The minimal fix is a typecast to u64, switching functions to take (start, end) instead of (start, len) would be more intrusive. Coccinelle script found that there's one more opencoded calculation of the length. <smpl> @@ loff_t start, end; @@ * end - start </smpl> CC: stable@vger.kernel.org Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-11-25arm64: early_alloc: Fix check for allocation failureSuzuki K. Poulose
In early_alloc we check if the memblock_alloc failed by checking the virtual address of the result, which will never fail. This patch fixes it to check the actual result for failure. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-25rtc: ds1307: fix kernel splat due to wakeup irq handlingFelipe Balbi
Since commit 3fffd1283927 ("i2c: allow specifying separate wakeup interrupt in device tree") we have automatic wakeup irq support for i2c devices. That commit missed the fact that rtc-1307 had its own wakeup irq handling and ended up introducing a kernel splat for at least Beagle x15 boards. Fix that by reverting original commit _and_ passing correct interrupt names on DTS so i2c-core can choose correct IRQ as wakeup. Now that we have automatic wakeirq support, we can revert the original commit which did it manually. Fixes the following warning: [ 10.346582] WARNING: CPU: 1 PID: 263 at linux/drivers/base/power/wakeirq.c:43 dev_pm_attach_wake_irq+0xbc/0xd4() [ 10.359244] rtc-ds1307 2-006f: wake irq already initialized Cc: Tony Lindgren <tony@atomide.com> Cc: Nishanth Menon <nm@ti.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
2015-11-25drm/nouveau/volt/pwm/gk104: fix an off-by-one resulting in the voltage not ↵Martin Peres
being set Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Martin Peres <martin.peres@free.fr>
2015-11-25drm/nouveau/nvif: allow userspace access to its own client objectBen Skeggs
Regression from "abi16: implement limited interoperability with usif/nvif". Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf100-: fix oops when calling zbc methodsBen Skeggs
Somehow missed these two when removing dodgy void casts during the rework. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf117-: assume no PPC if NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK ↵Ben Skeggs
is zero fdo#92761 Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf117-: read NV_PGRAPH_GPC_GPM_PD_PES_TPC_ID_MASK from ↵Ben Skeggs
correct GPC Each GPCCS unit was reading the mask from GPC0, which causes problems on boards where some GPCs are missing PPCs. Part of the fix for fdo#92761. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/gr/gf100-: split out per-gpc address calculation macroBen Skeggs
There's a few places where we need to access a GPC register from ucode, but outside of the falcon's io address space. To do this we need to calculate the offset based on which GPC we're executing on. This used to be done manually, but we've since found a "base" offset that can be added by the hardware. To use this, an extra bit needs to be set in the register address, which is what this macro achieves. There should be no functional change from this commit. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/bios: return actual size of the buffer retrieved via _ROMBen Skeggs
Fixes detection of a failed attempt at fetching the entire ROM image in one-shot (a violation of the spec, that works a lot of the time). Tested on a HP Zbook 15 G2. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/instmem: protect instobj list with a spinlockBen Skeggs
No locking is required for the traversal of this list, as it only happens during suspend/resume where nothing else can be executing. Fixes some of the issues noticed during parallel piglit runs. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/pci: enable c800 magic for some unknown Samsung laptopBen Skeggs
fdo#70354 - comment #88. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25drm/nouveau/pci: enable c800 magic for Clevo P157SMKarol Herbst
this is needed for my gpu Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-11-25KEYS: Fix handling of stored error in a negatively instantiated user keyDavid Howells
If a user key gets negatively instantiated, an error code is cached in the payload area. A negatively instantiated key may be then be positively instantiated by updating it with valid data. However, the ->update key type method must be aware that the error code may be there. The following may be used to trigger the bug in the user key type: keyctl request2 user user "" @u keyctl add user user "a" @u which manifests itself as: BUG: unable to handle kernel paging request at 00000000ffffff8a IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046 PGD 7cc30067 PUD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000 RIP: 0010:[<ffffffff810a376f>] [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046 RSP: 0018:ffff88003dd8bdb0 EFLAGS: 00010246 RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001 RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82 RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000 R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82 R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700 FS: 0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0 Stack: ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82 ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5 ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620 Call Trace: [<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136 [<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129 [< inline >] __key_update security/keys/key.c:730 [<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908 [< inline >] SYSC_add_key security/keys/keyctl.c:125 [<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60 [<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185 Note the error code (-ENOKEY) in EDX. A similar bug can be tripped by: keyctl request2 trusted user "" @u keyctl add trusted user "a" @u This should also affect encrypted keys - but that has to be correctly parameterised or it will fail with EINVAL before getting to the bit that will crashes. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2015-11-24block: fix blk_abort_request for blk-mq driversChristoph Hellwig
We only added the request to the request list for the !blk-mq case, so we should only delete it in that case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24nvme: add missing unmaps in nvme_queue_rqChristoph Hellwig
When we fail various metadata related operations in nvme_queue_rq we need to unmap the data SGL. Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24NVMe: default to 4k device page sizeNishanth Aravamudan
We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PRP entries will match the device's page size, and that the DMA aligment matches the kernel's page aligment. On Power, the the IOMMU page size, as mentioned above, can be 4K, while the device can have a page size of 8K, while the kernel has a page size of 64K. This eventually trips the BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple of 4K but not 8K (e.g., 0xF000). In this particular case of page sizes, we clearly want to use the IOMMU's page size in the driver. And generally, the NVMe driver in this function should be using the IOMMU's page size for the default device page size, rather than the kernel's page size. There is not currently an API to obtain the IOMMU's page size across all architectures and in the interest of a stop-gap fix to this functional issue, default the NVMe device page size to 4K, with the intent of adding such an API and implementation across all architectures in the next merge window. With the functionally equivalent v3 of this patch, our hardware test exerciser survives when using 32-bit DMA; without the patch, the kernel will BUG within a few minutes. Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24PCI: hisi: Fix deferred probingArnd Bergmann
The hisi_pcie_probe() function is incorrectly marked as __init, as Kconfig tells us: WARNING: drivers/pci/host/built-in.o(.data+0x7780): Section mismatch in reference from the variable hisi_pcie_driver to the function .init.text:hisi_pcie_probe() If the probe for this device gets deferred past the point where __init functions are removed, or the device is unbound and then reattached to the driver, we branch into uninitialized memory, which is bad. Remove the __init annotation from hisi_pcie_probe() and hisi_add_pcie_port(). Fixes: 500a1d9a43e0 ("PCI: hisi: Add HiSilicon SoC Hip05 PCIe driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org> Acked-by: Zhou Wang <wangzhou1@hisilicon.com>
2015-11-24Merge tag 'dm-4.4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "Two fixes for 4.4-rc1's DM ioctl changes that introduced the potential for infinite recursion on ioctl (with DM multipath). And four stable fixes: - A DM thin-provisioning fix to restore 'error_if_no_space' setting when a thin-pool is made writable again (after having been out of space). - A DM thin-provisioning fix to properly advertise discard support for thin volumes that are stacked on a thin-pool whose underlying data device doesn't support discards. - A DM ioctl fix to allow ctrl-c to break out of an ioctl retry loop when DM multipath is configured to 'queue_if_no_path'. - A DM crypt fix for a possible hang on dm-crypt device removal" * tag 'dm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm thin: fix regression in advertised discard limits dm crypt: fix a possible hang due to race condition on exit dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path dm: do not reuse dm_blk_ioctl block_device input as local variable dm: fix ioctl retry termination with signal dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition
2015-11-24PCI: designware: Remove incorrect io_base assignmentStanimir Varbanov
"pp->io" is an I/O resource, e.g., "[io 0x0000-0xffff]"; "pp->io_base" is the CPU physical address of a region where the host bridge converts CPU memory accesses into PCI I/O transactions. Corrupting pp->io_base by assigning pp->io->start to it breaks access to the PCI I/O space, as reported by Kishon. Remove the invalid assignment. [bhelgaas: changelog] Fixes: 0021d22b73d6 ("PCI: designware: Use of_pci_get_host_bridge_resources() to parse DT") Reported-and-tested-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Stanimir Varbanov <stanimir.varbanov@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2015-11-24pidns: fix NULL dereference in __task_pid_nr_ns()Eric Dumazet
I got a crash during a "perf top" session that was caused by a race in __task_pid_nr_ns() : pid_nr_ns() was inlined, but apparently compiler chose to read task->pids[type].pid twice, and the pid->level dereference crashed because we got a NULL pointer at the second read : if (pid && ns->level <= pid->level) { // CRASH Just use RCU API properly to solve this race, and not worry about "perf top" crashing hosts :( get_task_pid() can benefit from same fix. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-24ALSA: hda - Fix noise on Gigabyte Z170X moboTakashi Iwai
Gigabyte Z710X mobo with ALC1150 codec gets significant noises from the analog loopback routes even if their inputs are all muted. Simply kill the aamix for fixing it. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=108301 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-11-24selinux: fix bug in conditional rules handlingStephen Smalley
commit fa1aa143ac4a ("selinux: extended permissions for ioctls") introduced a bug into the handling of conditional rules, skipping the processing entirely when the caller does not provide an extended permissions (xperms) structure. Access checks from userspace using /sys/fs/selinux/access do not include such a structure since that interface does not presently expose extended permission information. As a result, conditional rules were being ignored entirely on userspace access requests, producing denials when access was allowed by conditional rules in the policy. Fix the bug by only skipping computation of extended permissions in this situation, not the entire conditional rules processing. Reported-by: Laurent Bigonville <bigon@debian.org> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: fixed long lines in patch description] Cc: stable@vger.kernel.org # 4.3 Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-11-24Merge tag 'kvm-arm-for-v4.4-rc3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM Fixes for v4.4-rc3. Includes some timer fixes, properly unmapping PTEs, an errata fix, and two tweaks to the EL2 panic code.
2015-11-24PCI: Prevent out of bounds access in numa_node overrideMathias Krause
Commit 1266963170f5 ("PCI: Prevent out of bounds access in numa_node override") missed that the user-provided node could also be negative. Handle this case as well to avoid out-of-bounds accesses to the node_states[] array. However, allow the special value -1, i.e. NUMA_NO_NODE, to be able to set the 'no specific node' configuration. Fixes: 1266963170f5 ("PCI: Prevent out of bounds access in numa_node override") Fixes: 63692df103e9 ("PCI: Allow numa_node override via sysfs") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Sasha Levin <sasha.levin@oracle.com> CC: Prarit Bhargava <prarit@redhat.com> CC: stable@vger.kernel.org # v3.19+
2015-11-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "A round of fixes/updates for the current series. This looks a little bigger than it is, but that's mainly because we pushed the lightnvm enabled null_blk change out of the merge window so it could be updated a bit. The rest of the volume is also mostly lightnvm. In particular: - Lightnvm. Various fixes, additions, updates from Matias and Javier, as well as from Wenwei Tao. - NVMe: - Fix for potential arithmetic overflow from Keith. - Also from Keith, ensure that we reap pending completions from a completion queue before deleting it. Fixes kernel crashes when resetting a device with IO pending. - Various little lightnvm related tweaks from Matias. - Fixup flushes to go through the IO scheduler, for the cases where a flush is not required. Fixes a case in CFQ where we would be idling and not see this request, hence not break the idling. From Jan Kara. - Use list_{first,prev,next} in elevator.c for cleaner code. From Gelian Tang. - Fix for a warning trigger on btrfs and raid on single queue blk-mq devices, where we would flush plug callbacks with preemption disabled. From me. - A mac partition validation fix from Kees Cook. - Two merge fixes from Ming, marked stable. A third part is adding a new warning so we'll notice this quicker in the future, if we screw up the accounting. - Cleanup of thread name/creation in mtip32xx from Rasmus Villemoes" * 'for-linus' of git://git.kernel.dk/linux-block: (32 commits) blk-merge: warn if figured out segment number is bigger than nr_phys_segments blk-merge: fix blk_bio_segment_split block: fix segment split blk-mq: fix calling unplug callbacks with preempt disabled mac: validate mac_partition is within sector mtip32xx: use formatting capability of kthread_create_on_node NVMe: reap completion entries when deleting queue lightnvm: add free and bad lun info to show luns lightnvm: keep track of block counts nvme: lightnvm: use admin queues for admin cmds lightnvm: missing free on init error lightnvm: wrong return value and redundant free null_blk: do not del gendisk with lightnvm null_blk: use device addressing mode null_blk: use ppa_cache pool NVMe: Fix possible arithmetic overflow for max segments blk-flush: Queue through IO scheduler when flush not required null_blk: register as a LightNVM device elevator: use list_{first,prev,next}_entry lightnvm: cleanup queue before target removal ...
2015-11-24soc: Mediatek: Enable SCPSYS power domain driver by defaultEddie Huang
If enable Mediatek 8173 SoC, it should also enable power domain driver. Otherwise access clk subsystem register will fail. Signed-off-by: Eddie Huang <eddie.huang@mediatek.com> Acked-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Kevin Hilman <khilman@linaro.org>
2015-11-24drm/radeon: make rv770_set_sw_state failures non-fatalAlex Deucher
On some cards it takes a relatively long time for the change to take place. Make a timeout non-fatal. bug: https://bugs.freedesktop.org/show_bug.cgi?id=76130 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2015-11-24arm64: kvm: report original PAR_EL1 upon panicMark Rutland
If we call __kvm_hyp_panic while a guest context is active, we call __restore_sysregs before acquiring the system register values for the panic, in the process throwing away the PAR_EL1 value at the point of the panic. This patch modifies __kvm_hyp_panic to stash the PAR_EL1 value prior to restoring host register values, enabling us to report the original values at the point of the panic. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: kvm: avoid %p in __kvm_hyp_panicMark Rutland
Currently __kvm_hyp_panic uses %p for values which are not pointers, such as the ESR value. This can confusingly lead to "(null)" being printed for the value. Use %x instead, and only use %p for host pointers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: vgic: Trust the LR state for HW IRQsChristoffer Dall
We were probing the physial distributor state for the active state of a HW virtual IRQ, because we had seen evidence that the LR state was not cleared when the guest deactivated a virtual interrupted. However, this issue turned out to be a software bug in the GIC, which was solved by: 84aab5e68c2a5e1e18d81ae8308c3ce25d501b29 (KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.active, 2015-11-24) Therefore, get rid of the complexities and just look at the LR. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: arch_timer: Preserve physical dist. active state on LR.activeChristoffer Dall
We were incorrectly removing the active state from the physical distributor on the timer interrupt when the timer output level was deasserted. We shouldn't be doing this without considering the virtual interrupt's active state, because the architecture requires that when an LR has the HW bit set and the pending or active bits set, then the physical interrupt must also have the corresponding bits set. This addresses an issue where we have been observing an inconsistency between the LR state and the physical distributor state where the LR state was active and the physical distributor was not active, which shouldn't happen. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24KVM: arm/arm64: Fix preemptible timer active state crazynessChristoffer Dall
We were setting the physical active state on the GIC distributor in a preemptible section, which could cause us to set the active state on different physical CPU from the one we were actually going to run on, hacoc ensues. Since we are no longer descheduling/scheduling soft timers in the flush/sync timer functions, simply moving the timer flush into a non-preemptible section. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: KVM: Add workaround for Cortex-A57 erratum 834220Marc Zyngier
Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults when a Stage 1 permission fault or device alignment fault should have been reported. This patch implements the workaround (which is to validate that the Stage-1 translation actually succeeds) by using code patching. Cc: stable@vger.kernel.org Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24arm64: KVM: Fix AArch32 to AArch64 register mappingMarc Zyngier
When running a 32bit guest under a 64bit hypervisor, the ARMv8 architecture defines a mapping of the 32bit registers in the 64bit space. This includes banked registers that are being demultiplexed over the 64bit ones. On exceptions caused by an operation involving a 32bit register, the HW exposes the register number in the ESR_EL2 register. It was so far understood that SW had to distinguish between AArch32 and AArch64 accesses (based on the current AArch32 mode and register number). It turns out that I misinterpreted the ARM ARM, and the clue is in D1.20.1: "For some exceptions, the exception syndrome given in the ESR_ELx identifies one or more register numbers from the issued instruction that generated the exception. Where the exception is taken from an Exception level using AArch32 these register numbers give the AArch64 view of the register." Which means that the HW is already giving us the translated version, and that we shouldn't try to interpret it at all (for example, doing an MMIO operation from the IRQ mode using the LR register leads to very unexpected behaviours). The fix is thus not to perform a call to vcpu_reg32() at all from vcpu_reg(), and use whatever register number is supplied directly. The only case we need to find out about the mapping is when we actively generate a register access, which only occurs when injecting a fault in a guest. Cc: stable@vger.kernel.org Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24ARM/arm64: KVM: test properly for a PTE's uncachednessArd Biesheuvel
The open coded tests for checking whether a PTE maps a page as uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern, which is not guaranteed to work since the type of a mapping is not a set of mutually exclusive bits For HYP mappings, the type is an index into the MAIR table (i.e, the index itself does not contain any information whatsoever about the type of the mapping), and for stage-2 mappings it is a bit field where normal memory and device types are defined as follows: #define MT_S2_NORMAL 0xf #define MT_S2_DEVICE_nGnRE 0x1 I.e., masking *and* comparing with the latter matches on the former, and we have been getting lucky merely because the S2 device mappings also have the PTE_UXN bit set, or we would misidentify memory mappings as device mappings. Since the unmap_range() code path (which contains one instance of the flawed test) is used both for HYP mappings and stage-2 mappings, and considering the difference between the two, it is non-trivial to fix this by rewriting the tests in place, as it would involve passing down the type of mapping through all the functions. However, since HYP mappings and stage-2 mappings both deal with host physical addresses, we can simply check whether the mapping is backed by memory that is managed by the host kernel, and only perform the D-cache maintenance if this is the case. Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Pavel Fedin <p.fedin@samsung.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-11-24ring-buffer: Put back the length if crossed page with add_timestampSteven Rostedt (Red Hat)
Commit fcc742eaad7c "ring-buffer: Add event descriptor to simplify passing data" added a descriptor that holds various data instead of passing around several variables through parameters. The problem was that one of the parameters was modified in a function and the code was designed not to have an effect on that modified parameter. Now that the parameter is a descriptor and any modifications to it are non-volatile, the size of the data could be unnecessarily expanded. Remove the extra space added if a timestamp was added and the event went across the page. Cc: stable@vger.kernel.org # 4.3+ Fixes: fcc742eaad7c "ring-buffer: Add event descriptor to simplify passing data" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24ring-buffer: Update read stamp with first real commit on pageSteven Rostedt (Red Hat)
Do not update the read stamp after swapping out the reader page from the write buffer. If the reader page is swapped out of the buffer before an event is written to it, then the read_stamp may get an out of date timestamp, as the page timestamp is updated on the first commit to that page. rb_get_reader_page() only returns a page if it has an event on it, otherwise it will return NULL. At that point, check if the page being returned has events and has not been read yet. Then at that point update the read_stamp to match the time stamp of the reader page. Cc: stable@vger.kernel.org # 2.6.30+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-11-24ARM: dts: vfxxx: Fix dspi[01] spi-num-chipselects.Cory Tusar
Per the Vybrid Reference Manual (section 3.8.6.1), dspi0 has 6 chip select signals associated with it, while dspi1 has only 4. Signed-off-by: Cory Tusar <cory.tusar@pid1solutions.com> Acked-by: Stefan Agner <stefan@agner.ch> Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
2015-11-24ALSA: hda - Fix headphone noise after Dell XPS 13 resume back from S3Hui Wang
We have a machine Dell XPS 13 with the codec alc256, after resume back from S3, the headphone has noise when play sound. Through comparing with the coeff vaule before and after S3, we found restoring a coeff register will help remove noise. BugLink: https://bugs.launchpad.net/bugs/1519168 Cc: Kailang Yang <kailang@realtek.com> Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-11-23blk-merge: warn if figured out segment number is bigger than nr_phys_segmentsMing Lei
We had seen lots of reports of this kind issue, so add one warnning in blk-merge, then it can be triggered easily and avoid to depend on warning/bug from drivers. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-23blk-merge: fix blk_bio_segment_splitMing Lei
Commit bdced438acd83a(block: setup bi_phys_segments after splitting) introduces function of computing bio->bi_phys_segments during bio splitting. Unfortunately both bio->bi_seg_front_size and bio->bi_seg_back_size arn't computed, so too many physical segments may be obtained for one request since both the two are used to check if one segment across two bios can be possible. This patch fixes the issue by computing the two variables in blk_bio_segment_split(). Fixes: bdced438acd83a(block: setup bi_phys_segments after splitting) Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-23block: fix segment splitMing Lei
Inside blk_bio_segment_split(), previous bvec pointer(bvprvp) always points to the iterator local variable, which is obviously wrong, so fix it by pointing to the local variable of 'bvprv'. Fixes: 5014c311baa2b(block: fix bogus compiler warnings in blk-merge.c) Cc: stable@kernel.org #4.3 Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mark Salter <msalter@redhat.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-23nfs4: limit callback decoding to received bytesBenjamin Coddington
A truncated cb_compound request will cause the client to decode null or data from a previous callback for nfs4.1 backchannel case, or uninitialized data for the nfs4.0 case. This is because the path through svc_process_common() advances the request's iov_base and decrements iov_len without adjusting the overall xdr_buf's len field. That causes xdr_init_decode() to set up the xdr_stream with an incorrect length in nfs4_callback_compound(). Fixing this for the nfs4.1 backchannel case first requires setting the correct iov_len and page_len based on the length of received data in the same manner as the nfs4.0 case. Then the request's xdr_buf length can be adjusted for both cases based upon the remaining iov_len and page_len. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs4: start callback_ident at idr 1Benjamin Coddington
If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing it when the nfs_client is freed. A decoding or server bug can then find and try to put that first nfs_client which would lead to a crash. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Fixes: d6870312659d ("nfs4client: convert to idr_alloc()") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23nfs: use sliding delay when LAYOUTGET gets NFS4ERR_DELAYJeff Layton
When LAYOUTGET gets NFS4ERR_DELAY, we currently will wait 15s before retrying the call. That is a _very_ long time, so add a timeout value to struct nfs4_layoutget and pass nfs4_async_handle_error a pointer to it. This allows the RPC engine to use a sliding delay window, instead of a 15s delay. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23NFS4: Cleanup FATTR4_WORD0_FS_LOCATIONS after decoding successKinglong Mee
Commit 1ca843a2d2 "nfs: Fix GETATTR bitmap verification" has check the bitmap after decoding success, but decode_attr_fs_locations forgets cleanup the FATTR4_WORD0_FS_LOCATIONS bits. decode_getfattr_attrs always return -EIO when meeting FS_LOCATIONS now. ls: cannot access /mnt/referal: Input/output error ls: cannot access /mnt/replicas: Input/output error total 32 drwxr-xr-x. 7 root root 8192 Nov 16 20:36 pnfs ??????????? ? ? ? ? ? referal ??????????? ? ? ? ? ? replicas v2: clear the bit earlier Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>