Age | Commit message (Collapse) | Author |
|
The struct fuse_conn argument is not used and can be removed.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In case of fuse the MM subsystem doesn't guarantee that page writeback
completes by the time ->sync_fs() is called. This is because fuse
completes page writeback immediately to prevent DoS of memory reclaim by
the userspace file server.
This means that fuse itself must ensure that writes are synced before
sending the SYNCFS request to the server.
Introduce sync buckets, that hold a counter for the number of outstanding
write requests. On syncfs replace the current bucket with a new one and
wait until the old bucket's counter goes down to zero.
It is possible to have multiple syncfs calls in parallel, in which case
there could be more than one waited-on buckets. Descendant buckets must
not complete until the parent completes. Add a count to the child (new)
bucket until the (parent) old bucket completes.
Use RCU protection to dereference the current bucket and to wake up an
emptied bucket. Use fc->lock to protect against parallel assignments to
the current bucket.
This leaves just the counter to be a possible scalability issue. The
fc->num_waiting counter has a similar issue, so both should be addressed at
the same time.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Fixes: 2d82ab251ef0 ("virtiofs: propagate sync() to file server")
Cc: <stable@vger.kernel.org> # v5.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
VDUSE (vDPA Device in Userspace) is a framework to support
implementing software-emulated vDPA devices in userspace. This
document is intended to clarify the VDUSE design and usage.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-14-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This VDUSE driver enables implementing software-emulated vDPA
devices in userspace. The vDPA device is created by
ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
interface (/dev/vduse/$NAME) is exported to userspace for device
emulation.
In order to make the device emulation more secure, the device's
control path is handled in kernel. A message mechnism is introduced
to forward some dataplane related control messages to userspace.
And in the data path, the DMA buffer will be mapped into userspace
address space through different ways depending on the vDPA bus to
which the vDPA device is attached. In virtio-vdpa case, the MMU-based
software IOTLB is used to achieve that. And in vhost-vdpa case, the
DMA buffer is reside in a userspace memory region which can be shared
to the VDUSE userspace processs via transferring the shmfd.
For more details on VDUSE design and usage, please see the follow-on
Documentation commit.
NB(mst): when merging this with
b542e383d8c0 ("eventfd: Make signal recursion protection a task bit")
replace eventfd_signal_count with eventfd_signal_allowed,
and drop the previous
("eventfd: Export eventfd_wake_count to modules").
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-13-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This implements an MMU-based software IOTLB to support mapping
kernel dma buffer into userspace dynamically. The basic idea
behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The
software IOTLB will set up MMU mapping instead of IOMMU mapping
for the DMA transfer so that the userspace process is able to
use its virtual address to access the dma buffer in kernel.
To avoid security issue, a bounce-buffering mechanism is
introduced to prevent userspace accessing the original buffer
directly which may contain other kernel data. During the mapping,
unmapping, the software IOTLB will copy the data from the original
buffer to the bounce buffer and back, depending on the direction
of the transfer. And the bounce-buffer addresses will be mapped
into the user address space instead of the original one.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-12-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This patch introduces an attribute for vDPA device to indicate
whether virtual address can be used. If vDPA device driver set
it, vhost-vdpa bus driver will not pin user page and transfer
userspace virtual address instead of physical address during
DMA mapping. And corresponding vma->vm_file and offset will be
also passed as an opaque pointer.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-11-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The upcoming patch is going to support VA mapping/unmapping.
So let's factor out the logic of PA mapping/unmapping firstly
to make the code more readable.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-10-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Add an opaque pointer for DMA mapping.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-9-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Add an opaque pointer for vhost IOTLB. And introduce
vhost_iotlb_add_range_ctx() to accept it.
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-8-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The vdpa_reset() may fail now. This adds check to its return
value and fail the vhost_vdpa_open().
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-7-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This adds a new callback to support device specific reset
behavior. The vdpa bus driver will call the reset function
instead of setting status to zero during resetting.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210831103634.33-6-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Fix some code indent issues and following checkpatch warning:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
371: FILE: include/linux/vdpa.h:371:
+static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-5-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Export receive_fd() so that some modules can use
it to pass file descriptor between processes without
missing any security stuffs.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210831103634.33-4-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Export eventfd_wake_count so that some modules can use
the eventfd_signal_count() to check whether the
eventfd_signal() call should be deferred to a safe context.
NB(mst): this patch is not needed in Linus tree since there
eventfd_signal_count() has been superseded by an already exported
eventfd_signal_allowed().
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210831103634.33-3-xieyongji@bytedance.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Export alloc_iova_fast() and free_iova_fast() so that
some modules can make use of the per-CPU cache to get
rid of rbtree spinlock in alloc_iova() and free_iova()
during IOVA allocation.
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210831103634.33-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Usually we use "likely/unlikely" to optimize the fast path. Remove
redundant "likely/unlikely" statements in the control path to simplify
the code and make it easier to read.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210905085717.7427-1-mgurtovoy@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
Use the helper virtio_find_vqs().
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210723054259.2779-1-xianting.tian@linux.alibaba.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
adjusted
When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature
especially there's a big tsc warp (like a new vCPU is hot-added into VM
which has been up for a long time), tsc_offset is added by a large value
then go back to guest. This causes system time jump as tsc_timestamp is
not adjusted in the meantime and pvclock monotonic character.
To fix this, just notify kvm to update vCPU's guest time before back to
guest.
Cc: stable@vger.kernel.org
Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
It is reasonable for these functions to be used only in some configurations,
for example only if the host is 64-bits (and therefore supports 64-bit
guests). It is also reasonable to keep the role_regs and role accessors
in sync even though some of the accessors may be used only for one of the
two sets (as is the case currently for CR4.LA57)..
Because clang reports warnings for unused inlines declared in a .c file,
mark both sets of accessors as __maybe_unused.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This fix a build warning:
arch/mips/kvm/vz.c: In function '_kvm_vz_restore_htimer':
>> arch/mips/kvm/vz.c:392:10: warning: variable 'freeze_time' set but not used [-Wunused-but-set-variable]
392 | ktime_t freeze_time;
| ^~~~~~~~~~~
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Message-Id: <20210406024911.2008046-1-chenhuacai@loongson.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 5.15
- Page ownership tracking between host EL1 and EL2
- Rely on userspace page tables to create large stage-2 mappings
- Fix incompatibility between pKVM and kmemleak
- Fix the PMU reset state, and improve the performance of the virtual PMU
- Move over to the generic KVM entry code
- Address PSCI reset issues w.r.t. save/restore
- Preliminary rework for the upcoming pKVM fixed feature
- A bunch of MM cleanups
- a vGIC fix for timer spurious interrupts
- Various cleanups
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix and feature for 5.15
- enable interpretion of specification exceptions
- fix a vcpu_idx vs vcpu_id mixup
|
|
Commit f4e61f0c9add3 ("x86/kvm: Fix broken irq restoration in kvm_wait")
replaced "local_irq_restore() when IRQ enabled" with "local_irq_enable()
when IRQ enabled" to suppress a warnning.
Although there is no similar debugging warnning for doing local_irq_enable()
when IRQ enabled as doing local_irq_restore() in the same IRQ situation. But
doing local_irq_enable() when IRQ enabled is no less broken as doing
local_irq_restore() and we'd better avoid it.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210814035129.154242-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a new stat that counts the number of times a remote TLB flush is
requested, regardless of whether it kicks vCPUs out of guest mode. This
allows us to look at how often flushes are initiated.
Unlike remote_tlb_flush, this one applies to ARM's instruction-set-based
TLB flush implementation, so apply it there too.
Original-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Message-Id: <20210817002639.3856694-1-jingzhangos@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Don't export KVM's MMU notifier count helpers, under no circumstance
should any downstream module, including x86's vendor code, have a
legitimate reason to piggyback KVM's MMU notifier logic. E.g in the x86
case, only KVM's MMU should be elevating the notifier count, and that
code is always built into the core kvm.ko module.
Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210902175951.1387989-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move "lpage_disallowed_link" out of the first 64 bytes, i.e. out of the
first cache line, of kvm_mmu_page so that "spt" and to a lesser extent
"gfns" land in the first cache line. "lpage_disallowed_link" is accessed
relatively infrequently compared to "spt", which is accessed any time KVM
is walking and/or manipulating the shadow page tables.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901221023.1303578-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Move "tdp_mmu_page" into the 1-byte void left by the recently removed
"mmio_cached" so that it resides in the first 64 bytes of kvm_mmu_page,
i.e. in the same cache line as the most commonly accessed fields.
Don't bother wrapping tdp_mmu_page in CONFIG_X86_64, including the field in
32-bit builds doesn't affect the size of kvm_mmu_page, and a future patch
can always wrap the field in the unlikely event KVM gains a 1-byte flag
that is 32-bit specific.
Note, the size of kvm_mmu_page is also unchanged on CONFIG_X86_64=y due
to it previously sharing an 8-byte chunk with write_flooding_count.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210901221023.1303578-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Revert a misguided illegal GPA check when "translating" a non-nested GPA.
The check is woefully incomplete as it does not fill in @exception as
expected by all callers, which leads to KVM attempting to inject a bogus
exception, potentially exposing kernel stack information in the process.
WARNING: CPU: 0 PID: 8469 at arch/x86/kvm/x86.c:525 exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
CPU: 1 PID: 8469 Comm: syz-executor531 Not tainted 5.14.0-rc7-syzkaller #0
RIP: 0010:exception_type+0x98/0xb0 arch/x86/kvm/x86.c:525
Call Trace:
x86_emulate_instruction+0xef6/0x1460 arch/x86/kvm/x86.c:7853
kvm_mmu_page_fault+0x2f0/0x1810 arch/x86/kvm/mmu/mmu.c:5199
handle_ept_misconfig+0xdf/0x3e0 arch/x86/kvm/vmx/vmx.c:5336
__vmx_handle_exit arch/x86/kvm/vmx/vmx.c:6021 [inline]
vmx_handle_exit+0x336/0x1800 arch/x86/kvm/vmx/vmx.c:6038
vcpu_enter_guest+0x2a1c/0x4430 arch/x86/kvm/x86.c:9712
vcpu_run arch/x86/kvm/x86.c:9779 [inline]
kvm_arch_vcpu_ioctl_run+0x47d/0x1b20 arch/x86/kvm/x86.c:10010
kvm_vcpu_ioctl+0x49e/0xe50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3652
The bug has escaped notice because practically speaking the GPA check is
useless. The GPA check in question only comes into play when KVM is
walking guest page tables (or "translating" CR3), and KVM already handles
illegal GPA checks by setting reserved bits in rsvd_bits_mask for each
PxE, or in the case of CR3 for loading PTDPTRs, manually checks for an
illegal CR3. This particular failure doesn't hit the existing reserved
bits checks because syzbot sets guest.MAXPHYADDR=1, and IA32 architecture
simply doesn't allow for such an absurd MAXPHYADDR, e.g. 32-bit paging
doesn't define any reserved PA bits checks, which KVM emulates by only
incorporating the reserved PA bits into the "high" bits, i.e. bits 63:32.
Simply remove the bogus check. There is zero meaningful value and no
architectural justification for supporting guest.MAXPHYADDR < 32, and
properly filling the exception would introduce non-trivial complexity.
This reverts commit ec7771ab471ba6a945350353617e2e3385d0e013.
Fixes: ec7771ab471b ("KVM: x86: mmu: Add guest physical address check in translate_gpa()")
Cc: stable@vger.kernel.org
Reported-by: syzbot+200c08e88ae818f849ce@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210831164224.1119728-2-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
After reverting and restoring the fast tlb invalidation patch series,
the mmio_cached is not removed. Hence a unused field is left in
kvm_mmu_page.
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jia He <justin.he@arm.com>
Message-Id: <20210830145336.27183-1-justin.he@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Support for 710 VCPUs was tested by Red Hat since RHEL-8.4,
so increase KVM_SOFT_MAX_VCPUS to 710.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-4-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Increase KVM_MAX_VCPUS to 1024, so we can test larger VMs.
I'm not changing KVM_SOFT_MAX_VCPUS yet because I'm afraid it
might involve complicated questions around the meaning of
"supported" and "recommended" in the upstream tree.
KVM_SOFT_MAX_VCPUS will be changed in a separate patch.
For reference, visible effects of this change are:
- KVM_CAP_MAX_VCPUS will now return 1024 (of course)
- Default value for CPUID[HYPERV_CPUID_IMPLEMENT_LIMITS (00x40000005)].EAX
will now be 1024
- KVM_MAX_VCPU_ID will change from 1151 to 4096
- Size of struct kvm will increase from 19328 to 22272 bytes
(in x86_64)
- Size of struct kvm_ioapic will increase from 1780 to 5084 bytes
(in x86_64)
- Bitmap stack variables that will grow:
- At kvm_hv_flush_tlb() kvm_hv_send_ipi(),
vp_bitmap[] and vcpu_bitmap[] will now be 128 bytes long
- vcpu_bitmap at bioapic_write_indirect() will be 128 bytes long
once patch "KVM: x86: Fix stack-out-of-bounds memory access
from ioapic_write_indirect()" is applied
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-3-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Instead of requiring KVM_MAX_VCPU_ID to be manually increased
every time we increase KVM_MAX_VCPUS, set it to 4*KVM_MAX_VCPUS.
This should be enough for CPU topologies where Cores-per-Package
and Packages-per-Socket are not powers of 2.
In practice, this increases KVM_MAX_VCPU_ID from 1023 to 1152.
The only side effect of this change is making some fields in
struct kvm_ioapic larger, increasing the struct size from 1628 to
1780 bytes (in x86_64).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210903211600.2002377-2-ehabkost@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The kernel test robot reports printk format warnings in uefi.c, so
correct them.
../drivers/net/wireless/intel/iwlwifi/fw/uefi.c: In function 'iwl_uefi_get_pnvm':
../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:52:30: warning: format '%zd' expects argument of type 'signed size_t', but argument 7 has type 'long unsigned int' [-Wformat=]
52 | "PNVM UEFI variable not found %d (len %zd)\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
53 | err, package_size);
| ~~~~~~~~~~~~
| |
| long unsigned int
../drivers/net/wireless/intel/iwlwifi/fw/uefi.c:59:29: warning: format '%zd' expects argument of type 'signed size_t', but argument 6 has type 'long unsigned int' [-Wformat=]
59 | IWL_DEBUG_FW(trans, "Read PNVM from UEFI with size %zd\n", package_size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~
| |
| long unsigned int
Fixes: 84c3c9952afb ("iwlwifi: move UEFI code to a separate file")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: linux-wireless@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210821020901.25901-1-rdunlap@infradead.org
|
|
If we are emulating an invalid guest state, we don't have a correct
exit reason, and thus we shouldn't do anything in this function.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Include pml5_root in the set of special roots if and only if the host,
and thus NPT, is using 5-level paging. mmu_alloc_special_roots() expects
special roots to be allocated as a bundle, i.e. they're either all valid
or all NULL. But for pml5_root, that expectation only holds true if the
host uses 5-level paging, which causes KVM to WARN about pml5_root being
NULL when the other special roots are valid.
The silver lining of 4-level vs. 5-level NPT being tied to the host
kernel's paging level is that KVM's shadow root level is constant; unlike
VMX's EPT, KVM can't choose 4-level NPT based on guest.MAXPHYADDR. That
means KVM can still expect pml5_root to be bundled with the other special
roots, it just needs to be conditioned on the shadow root level.
Fixes: cb0f722aff6e ("KVM: x86/mmu: Support shadowing NPT when 5-level paging is enabled in host")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210824005824.205536-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Introduce cpufreq HW driver which can support
CPU frequency adjust in MT6779 platform.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: Massaged the patch and cleaned some stuff. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Add of_perf_domain_get_sharing_cpumask function to group cpu
to specific performance domain.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: create separate routine parse_perf_domain() and always set the
cpumask. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Add devicetree bindings for MediaTek HW driver.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
CRU is a block used in e.g. Northstar devices. It can be seen in the
bcm5301x.dtsi and this binding documents its proper usage.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
|
|
Update the MAINTAINERS file email address.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The function nmve_mpath_clear_current_path returns true if the current
path has changed. In this case we have to wait for all concurrent
submissions to finish. But if we didn't change the current path, there
is no point in waiting for another RCU period to finish.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Currently the connection between host and NVMe-oF target gets
disconnected by keep-alive timeout when a user connects to a target
with a relatively large kato value and then sets the smaller kato
with a set features command (e.g. connects with 60 seconds kato value
and then sets 10 seconds kato value).
The cause is that keep alive command interval on the host, which is
defined as unsigned int kato in nvme_ctrl structure, does not follow
the kato value changes.
This patch updates the keep alive interval in the following steps when
the kato is modified by a set features command: stops the keep alive
work queue, then sets the kato as new timer value and re-start the queue.
Signed-off-by: Tatsuya Sasaki <tatsuya6.sasaki@kioxia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The spec says
7.4.6.1 Digest Error handling
When a host detects a data digest error in a C2HData PDU, that host
shall continue processing C2HData PDUs associated with the command and
when the command processing has completed, if a successful status was
returned by the controller, the host shall fail the command with a
non-fatal transport error.
Currently the transport is reseted when a data digest error is
detected. Instead, when a digest error is detected, mark the final
status as NVME_SC_DATA_XFER_ERROR and let the upper layer handle
the error.
In order to keep track of the final result maintain a status field in
nvme_tcp_request object and use it to overwrite the completion queue
status (which might be successful even though a digest error has been
detected) when completing the request.
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The serial number is copied into the buffer via memcpy_and_pad()
with the length NVMET_SN_MAX_SIZE. So when printing out we also
need to take just that length as anything beyond that will be
uninitialized.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The target core code never needs the host-side nvme_ctrl structure.
Open code two uses of nvmet_is_passthru_req in passthru.c, and then
switch the helpers used by the core to return bool. Also rename the
fuctions to better match their usage:
nvmet_passthru_ctrl -> nvmet_is_passthru_subsys
nvmet_req_passthru_ctrl -> nvmet_is_passthru_req
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
|
For a passthru controller make cap initialization dependent on the cap of
the passthru controller, given that multiple Command Set support needs
to be supported by the underlying controller. For that move the
initialization of CAP later so that it can use the fully initialized
nvmet_ctrl structure.
Fixes: ab5d0b38c047 (nvmet: add Command Set Identifier support)
Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: refactored the code a bit to keep it more contained in passthru.c]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Preparatory patch in order to reuse nvme_multi_css in the nvme target
code.
Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When triggering a rescan due to a namespace resize we will be
receiving AENs on every controller, triggering a rescan of all
attached namespaces. If multipath is active only the current path and
the ns_head disk will be updated, the other paths will still refer to
the old size until AENs for the remaining controllers are received.
If I/O comes in before that it might be routed to one of the old
paths, triggering an I/O failure with 'access beyond end of device'.
With this patch the old paths are skipped from multipath path
selection until the controller serving these paths has been rescanned.
Signed-off-by: Hannes Reinecke <hare@suse.de>
[dwagner: - introduce NVME_NS_READY flag instead of NVME_NS_INVALIDATE
- use 'revalidate' instead of 'invalidate' which
follows the zoned device code path.
- clear NVME_NS_READY before clearing current_path]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The nvme multipathing code just dispatches bios to one of the blk-mq
based paths and never blocks on its own, so set QUEUE_FLAG_NOWAIT
to support REQ_NOWAIT bios.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|