summaryrefslogtreecommitdiff
path: root/drivers/vhost/vdpa.c
AgeCommit message (Collapse)Author
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-09mm/gup: remove vmas parameter from pin_user_pages()Lorenzo Stoakes
We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian König <christian.koenig@amd.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09vhost_vdpa: support PACKED when setting-getting vring_baseShannon Nelson
Use the right structs for PACKED or split vqs when setting and getting the vring base. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230424225031.18947-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-06-08vhost_vdpa: tell vqs about the negotiatedShannon Nelson
As is done in the net, iscsi, and vsock vhost support, let the vdpa vqs know about the features that have been negotiated. This allows vhost to more safely make decisions based on the features, such as when using PACKED vs split queues. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230424225031.18947-2-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "virtio,vhost,vdpa: features, fixes, and cleanups: - reduction in interrupt rate in virtio - perf improvement for VDUSE - scalability for vhost-scsi - non power of 2 ring support for packed rings - better management for mlx5 vdpa - suspend for snet - VIRTIO_F_NOTIFICATION_DATA - shared backend with vdpa-sim-blk - user VA support in vdpa-sim - better struct packing for virtio and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits) vhost_vdpa: fix unmap process in no-batch mode MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS tools/virtio: fix build caused by virtio_ring changes virtio_ring: add a struct device forward declaration vdpa_sim_blk: support shared backend vdpa_sim: move buffer allocation in the devices vdpa/snet: use likely/unlikely macros in hot functions vdpa/snet: implement kick_vq_with_data callback virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support virtio: add VIRTIO_F_NOTIFICATION_DATA feature support vdpa/snet: support the suspend vDPA callback vdpa/snet: support getting and setting VQ state MAINTAINERS: add vringh.h to Virtio Core and Net Drivers vringh: address kdoc warnings vdpa: address kdoc warnings virtio_ring: don't update event idx on get_buf vdpa_sim: add support for user VA vdpa_sim: replace the spinlock with a mutex to protect the state vdpa_sim: use kthread worker vdpa_sim: make devices agnostic for work management ...
2023-04-21vhost_vdpa: fix unmap process in no-batch modeCindy Lu
While using the vdpa device with vIOMMU enabled in the guest VM, when the vdpa device bind to vfio-pci and run testpmd then system will fail to unmap. The test process is Load guest VM --> attach to virtio driver--> bind to vfio-pci driver So the mapping process is 1)batched mode map to normal MR 2)batched mode unmapped the normal MR 3)unmapped all the memory 4)mapped to iommu MR This error happened in step 3). The iotlb was freed in step 2) and the function vhost_vdpa_process_iotlb_msg will return fail Which causes failure. To fix this, we will not remove the AS while the iotlb->nmaps is 0. This will free in the vhost_vdpa_clean Cc: stable@vger.kernel.org Fixes: aaca8373c4b1 ("vhost-vdpa: support ASID based IOTLB API") Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20230420151734.860168-1-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vhost-vdpa: use bind_mm/unbind_mm device callbacksStefano Garzarella
When the user call VHOST_SET_OWNER ioctl and the vDPA device has `use_va` set to true, let's call the bind_mm callback. In this way we can bind the device to the user address space and directly use the user VA. The unbind_mm callback is called during the release after stopping the device. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20230404131326.44403-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vdpa: Add eventfd for the vdpa callbackXie Yongji
Add eventfd for the vdpa callback so that user can signal it directly instead of triggering the callback. It will be used for vhost-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-9-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-03Merge 6.3-rc5 into driver-core-nextGreg Kroah-Hartman
We need the fixes in here for testing, as well as the driver core changes for documentation updates to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-23vhost-vdpa: vhost_vdpa_alloc_domain() should be using a const struct bus_type *Greg Kroah-Hartman
The function, vhost_vdpa_alloc_domain(), has a pointer to a struct bus_type, but it should be constant as the function it passes it to expects it to be const, and the vhost code does not modify it in any way. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Cc: netdev@vger.kernel.org Link: https://lore.kernel.org/r/20230313182918.1312597-31-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-13vhost-vdpa: free iommu domain after last use during cleanupGautam Dawar
Currently vhost_vdpa_cleanup() unmaps the DMA mappings by calling `iommu_unmap(v->domain, map->start, map->size);` from vhost_vdpa_general_unmap() when the parent vDPA driver doesn't provide DMA config operations. However, the IOMMU domain referred to by `v->domain` is freed in vhost_vdpa_free_domain() before vhost_vdpa_cleanup() in vhost_vdpa_release() which results in NULL pointer de-reference. Accordingly, moving the call to vhost_vdpa_free_domain() in vhost_vdpa_cleanup() would makes sense. This will also help detaching the dma device in error handling of vhost_vdpa_alloc_domain(). This issue was observed on terminating QEMU with SIGQUIT. Fixes: 037d4305569a ("vhost-vdpa: call vhost_vdpa_cleanup during the release") Signed-off-by: Gautam Dawar <gautam.dawar@amd.com> Message-Id: <20230301163203.29883-1-gautam.dawar@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-25Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...
2023-02-24Merge tag 'iommu-updates-v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Consolidate iommu_map/unmap functions. There have been blocking and atomic variants so far, but that was problematic as this approach does not scale with required new variants which just differ in the GFP flags used. So Jason consolidated this back into single functions that take a GFP parameter. - Retire the detach_dev() call-back in iommu_ops - Arm SMMU updates from Will: - Device-tree binding updates: - Cater for three power domains on SM6375 - Document existing compatible strings for Qualcomm SoCs - Tighten up clocks description for platform-specific compatible strings - Enable Qualcomm workarounds for some additional platforms that need them - Intel VT-d updates from Lu Baolu: - Add Intel IOMMU performance monitoring support - Set No Execute Enable bit in PASID table entry - Two performance optimizations - Fix PASID directory pointer coherency - Fix missed rollbacks in error path - Cleanups - Apple t8110 DART support - Exynos IOMMU: - Implement better fault handling - Error handling fixes - Renesas IPMMU: - Add device tree bindings for r8a779g0 - AMD IOMMU: - Various fixes for handling on SNP-enabled systems and handling of faults with unknown request-ids - Cleanups and other small fixes - Various other smaller fixes and cleanups * tag 'iommu-updates-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (71 commits) iommu/amd: Skip attach device domain is same as new domain iommu: Attach device group to old domain in error path iommu/vt-d: Allow to use flush-queue when first level is default iommu/vt-d: Fix PASID directory pointer coherency iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode iommu/vt-d: Fix error handling in sva enable/disable paths iommu/amd: Improve page fault error reporting iommu/amd: Do not identity map v2 capable device when snp is enabled iommu: Fix error unwind in iommu_group_alloc() iommu/of: mark an unused function as __maybe_unused iommu: dart: DART_T8110_ERROR range should be 0 to 5 iommu/vt-d: Enable IOMMU perfmon support iommu/vt-d: Add IOMMU perfmon overflow handler support iommu/vt-d: Support cpumask for IOMMU perfmon iommu/vt-d: Add IOMMU perfmon support iommu/vt-d: Support Enhanced Command Interface iommu/vt-d: Retrieve IOMMU perfmon capability information iommu/vt-d: Support size of the register set in DRHD iommu/vt-d: Set No Execute Enable bit in PASID table entry iommu/vt-d: Remove sva from intel_svm_dev ...
2023-02-20vhost-vdpa: print warning when vhost_vdpa_alloc_domain failsAlvaro Karsz
Add a print explaining why vhost_vdpa_alloc_domain failed if the device is not IOMMU cache coherent capable. Without this print, we have no hint why the operation failed. For example: $ virsh start <domain> error: Failed to start domain <domain> error: Unable to open '/dev/vhost-vdpa-<idx>' for vdpa device: Unknown error 524 Suggested-by: Eugenio Perez Martin <eperezma@redhat.com> Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230202084212.1328530-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez Martin <eperezma@redhat.com>
2023-02-20vhost-vdpa: uAPI to resume the deviceSebastien Boeuf
This new ioctl adds support for resuming the device from userspace. This is required when trying to restore the device in a functioning state after it's been suspended. It is already possible to reset a suspended device, but that means the device must be reconfigured and all the IOMMU/IOTLB mappings must be recreated. This new operation allows the device to be resumed without going through a full reset. This is particularly useful when trying to perform offline migration of a virtual machine (also known as snapshot/restore) as it allows the VMM to resume the virtual machine back to a running state after the snapshot is performed. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com> Message-Id: <73b75fb87d25cff59768b4955a81fe7ffe5b4770.1672742878.git.sebastien.boeuf@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-20vhost-vdpa: Introduce RESUME backend feature bitSebastien Boeuf
Userspace knows if the device can be resumed or not by checking this feature bit. It's only exposed if the vdpa driver backend implements the resume() operation callback. Userspace trying to negotiate this feature when it hasn't been exposed will result in an error. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com> Message-Id: <b18db236ba3d990cdb41278eb4703be9201d9514.1672742878.git.sebastien.boeuf@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-25iommu: Add a gfp parameter to iommu_map()Jason Gunthorpe
The internal mechanisms support this, but instead of exposting the gfp to the caller it wrappers it into iommu_map() and iommu_map_atomic() Fix this instead of adding more variants for GFP_KERNEL_ACCOUNT. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/1-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-28vhost_vdpa: fix the crash in unmap a large memoryCindy Lu
While testing in vIOMMU, sometimes Guest will unmap very large memory, which will cause the crash. To fix this, add a new function vhost_vdpa_general_unmap(). This function will only unmap the memory that saved in iotlb. Call Trace: [ 647.820144] ------------[ cut here ]------------ [ 647.820848] kernel BUG at drivers/iommu/intel/iommu.c:1174! [ 647.821486] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 647.822082] CPU: 10 PID: 1181 Comm: qemu-system-x86 Not tainted 6.0.0-rc1home_lulu_2452_lulu7_vhost+ #62 [ 647.823139] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-29-g6a62e0cb0dfe-prebuilt.qem4 [ 647.824365] RIP: 0010:domain_unmap+0x48/0x110 [ 647.825424] Code: 48 89 fb 8d 4c f6 1e 39 c1 0f 4f c8 83 e9 0c 83 f9 3f 7f 18 48 89 e8 48 d3 e8 48 85 c0 75 59 [ 647.828064] RSP: 0018:ffffae5340c0bbf0 EFLAGS: 00010202 [ 647.828973] RAX: 0000000000000001 RBX: ffff921793d10540 RCX: 000000000000001b [ 647.830083] RDX: 00000000080000ff RSI: 0000000000000001 RDI: ffff921793d10540 [ 647.831214] RBP: 0000000007fc0100 R08: ffffae5340c0bcd0 R09: 0000000000000003 [ 647.832388] R10: 0000007fc0100000 R11: 0000000000100000 R12: 00000000080000ff [ 647.833668] R13: ffffae5340c0bcd0 R14: ffff921793d10590 R15: 0000008000100000 [ 647.834782] FS: 00007f772ec90640(0000) GS:ffff921ce7a80000(0000) knlGS:0000000000000000 [ 647.836004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 647.836990] CR2: 00007f02c27a3a20 CR3: 0000000101b0c006 CR4: 0000000000372ee0 [ 647.838107] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 647.839283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 647.840666] Call Trace: [ 647.841437] <TASK> [ 647.842107] intel_iommu_unmap_pages+0x93/0x140 [ 647.843112] __iommu_unmap+0x91/0x1b0 [ 647.844003] iommu_unmap+0x6a/0x95 [ 647.844885] vhost_vdpa_unmap+0x1de/0x1f0 [vhost_vdpa] [ 647.845985] vhost_vdpa_process_iotlb_msg+0xf0/0x90b [vhost_vdpa] [ 647.847235] ? _raw_spin_unlock+0x15/0x30 [ 647.848181] ? _copy_from_iter+0x8c/0x580 [ 647.849137] vhost_chr_write_iter+0xb3/0x430 [vhost] [ 647.850126] vfs_write+0x1e4/0x3a0 [ 647.850897] ksys_write+0x53/0xd0 [ 647.851688] do_syscall_64+0x3a/0x90 [ 647.852508] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 647.853457] RIP: 0033:0x7f7734ef9f4f [ 647.854408] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 76 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c8 [ 647.857217] RSP: 002b:00007f772ec8f040 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 [ 647.858486] RAX: ffffffffffffffda RBX: 00000000fef00000 RCX: 00007f7734ef9f4f [ 647.859713] RDX: 0000000000000048 RSI: 00007f772ec8f090 RDI: 0000000000000010 [ 647.860942] RBP: 00007f772ec8f1a0 R08: 0000000000000000 R09: 0000000000000000 [ 647.862206] R10: 0000000000000001 R11: 0000000000000293 R12: 0000000000000010 [ 647.863446] R13: 0000000000000002 R14: 0000000000000000 R15: ffffffff01100000 [ 647.864692] </TASK> [ 647.865458] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs v] [ 647.874688] ---[ end trace 0000000000000000 ]--- Cc: stable@vger.kernel.org Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20221219073331.556140-1-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-12-28vhost-vdpa: fix an iotlb memory leakStefano Garzarella
Before commit 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") we called vhost_vdpa_iotlb_unmap(v, iotlb, 0ULL, 0ULL - 1) during release to free all the resources allocated when processing user IOTLB messages through vhost_vdpa_process_iotlb_update(). That commit changed the handling of IOTLB a bit, and we accidentally removed some code called during the release. We partially fixed this with commit 037d4305569a ("vhost-vdpa: call vhost_vdpa_cleanup during the release") but a potential memory leak is still there as showed by kmemleak if the application does not send VHOST_IOTLB_INVALIDATE or crashes: unreferenced object 0xffff888007fbaa30 (size 16): comm "blkio-bench", pid 914, jiffies 4294993521 (age 885.500s) hex dump (first 16 bytes): 40 73 41 07 80 88 ff ff 00 00 00 00 00 00 00 00 @sA............. backtrace: [<0000000087736d2a>] kmem_cache_alloc_trace+0x142/0x1c0 [<0000000060740f50>] vhost_vdpa_process_iotlb_msg+0x68c/0x901 [vhost_vdpa] [<0000000083e8e205>] vhost_chr_write_iter+0xc0/0x4a0 [vhost] [<000000008f2f414a>] vhost_vdpa_chr_write_iter+0x18/0x20 [vhost_vdpa] [<00000000de1cd4a0>] vfs_write+0x216/0x4b0 [<00000000a2850200>] ksys_write+0x71/0xf0 [<00000000de8e720b>] __x64_sys_write+0x19/0x20 [<0000000018b12cbb>] do_syscall_64+0x3f/0x90 [<00000000986ec465>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Let's fix this calling vhost_vdpa_iotlb_unmap() on the whole range in vhost_vdpa_remove_as(). We move that call before vhost_dev_cleanup() since we need a valid v->vdev.mm in vhost_vdpa_pa_unmap(). vhost_iotlb_reset() call can be removed, since vhost_vdpa_iotlb_unmap() on the whole range removes all the entries. The kmemleak log reported was observed with a vDPA device that has `use_va` set to true (e.g. VDUSE). This patch has been tested with both types of devices. Fixes: 037d4305569a ("vhost-vdpa: call vhost_vdpa_cleanup during the release") Fixes: 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109154213.146789-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vhost-vdpa: uAPI to suspend the deviceEugenio Pérez
The ioctl adds support for suspending the device from userspace. This is a must before getting virtqueue indexes (base) for live migration, since the device could modify them after userland gets them. There are individual ways to perform that action for some devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no way to perform it for any vhost device (and, in particular, vhost-vdpa). After a successful return of the ioctl call the device must not process more virtqueue descriptors. The device can answer to read or writes of config fields as if it were not suspended. In particular, writing to "queue_enable" with a value of 1 will not make the device start processing buffers of the virtqueue. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-4-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vhost-vdpa: introduce SUSPEND backend feature bitEugenio Pérez
Userland knows if it can suspend the device or not by checking this feature bit. It's only offered if the vdpa driver backend implements the suspend() operation callback, and to offer it or userland to ack it if the backend does not offer that callback is an error. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-3-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vhost-vdpa: Call ida_simple_remove() when failedBo Liu
In function vhost_vdpa_probe(), when code execution fails, we should call ida_simple_remove() to free ida. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220805091254.20026-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa: Use device_iommu_capable()Robin Murphy
Use the new interface to check the capability for our device specifically. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Message-Id: <548e316fa282ce513fabb991a4c4d92258062eb5.1654688822.git.robin.murphy@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-27vhost-vdpa: call vhost_vdpa_cleanup during the releaseStefano Garzarella
Before commit 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") we call vhost_vdpa_iotlb_free() during the release to clean all regions mapped in the iotlb. That commit removed vhost_vdpa_iotlb_free() and added vhost_vdpa_cleanup() to do some cleanup, including deleting all mappings, but we forgot to call it in vhost_vdpa_release(). This causes that if an application does not remove all mappings explicitly (or it crashes), the mappings remain in the iotlb and subsequent applications may fail if they map the same addresses. Calling vhost_vdpa_cleanup() also fixes a memory leak since we are not freeing `v->vdev.vqs` during the release from the same commit. Since vhost_vdpa_cleanup() calls vhost_dev_cleanup() we can remove its call from vhost_vdpa_release(). Fixes: 3d5698793897 ("vhost-vdpa: introduce asid based IOTLB") Cc: gautam.dawar@xilinx.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220622151407.51232-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-09vdpa: make get_vq_group and set_group_asid optionalJason Wang
This patch makes get_vq_group and set_group_asid optional. This is needed to unbreak the vDPA parent that doesn't support multiple address spaces. Cc: Gautam Dawar <gautam.dawar@xilinx.com> Fixes: aaca8373c4b1 ("vhost-vdpa: support ASID based IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220609041901.2029-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: return -EFAULT on copy_to_user() failureDan Carpenter
The copy_to_user() function returns the number of bytes remaining to be copied. However, we need to return a negative error code, -EFAULT, to the user. Fixes: 87f4c217413a ("vhost-vdpa: introduce uAPI to get the number of virtqueue groups") Fixes: e96ef636f154 ("vhost-vdpa: introduce uAPI to get the number of address spaces") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Message-Id: <YotG1vXKXXSayr63@kili> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-05-31vhost-vdpa: support ASID based IOTLB APIGautam Dawar
This patch extends the vhost-vdpa to support ASID based IOTLB API. The vhost-vdpa device will allocated multiple IOTLBs for vDPA device that supports multiple address spaces. The IOTLBs and vDPA device memory mappings is determined and maintained through ASID. Note that we still don't support vDPA device with more than one address spaces that depends on platform IOMMU. This work will be done by moving the IOMMU logic from vhost-vDPA to vDPA device driver. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-16-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Includes fixup: vhost-vdpa: Fix some error handling path in vhost_vdpa_process_iotlb_msg() In the error paths introduced by the original patch, a mutex may be left locked. Add the correct goto instead of a direct return. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Message-Id: <89ef0ae4c26ac3cfa440c71e97e392dcb328ac1b.1653227924.git.christophe.jaillet@wanadoo.fr> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: introduce uAPI to set group ASIDGautam Dawar
Follows the vDPA support for associating ASID to a specific virtqueue group. This patch adds a uAPI to support setting them from userspace. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-15-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: uAPI to get virtqueue group idGautam Dawar
Follows the support for virtqueue group in vDPA. This patches introduces uAPI to get the virtqueue group ID for a specific virtqueue in vhost-vdpa. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-14-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: introduce uAPI to get the number of address spacesGautam Dawar
This patch introduces the uAPI for getting the number of address spaces supported by this vDPA device. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-13-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: introduce uAPI to get the number of virtqueue groupsGautam Dawar
Follows the vDPA support for multiple address spaces, this patch introduce uAPI for the userspace to know the number of virtqueue groups supported by the vDPA device. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-12-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: introduce asid based IOTLBGautam Dawar
This patch converts the vhost-vDPA device to support multiple IOTLBs tagged via ASID via hlist. This will be used for supporting multiple address spaces in the following patches. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-11-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost: support ASID in IOTLB APIGautam Dawar
This patches allows userspace to send ASID based IOTLB message to vhost. This idea is to use the reserved u32 field in the existing V2 IOTLB message. Vhost device should advertise this capability via VHOST_BACKEND_F_IOTLB_ASID backend feature. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-10-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vdpa: multiple address spaces supportGautam Dawar
This patches introduces the multiple address spaces support for vDPA device. This idea is to identify a specific address space via an dedicated identifier - ASID. During vDPA device allocation, vDPA device driver needs to report the number of address spaces supported by the device then the DMA mapping ops of the vDPA device needs to be extended to support ASID. This helps to isolate the environments for the virtqueue that will not be assigned directly. E.g in the case of virtio-net, the control virtqueue will not be assigned directly to guest. As a start, simply claim 1 virtqueue groups and 1 address spaces for all vDPA devices. And vhost-vDPA will simply reject the device with more than 1 virtqueue groups or address spaces. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-7-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: switch to use vhost-vdpa specific IOTLBGautam Dawar
To ease the implementation of per group ASID support for vDPA device. This patch switches to use a vhost-vdpa specific IOTLB to avoid the unnecessary refactoring of the vhost core. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-5-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vhost-vdpa: passing iotlb to IOMMU mapping helpersGautam Dawar
To prepare for the ASID support for vhost-vdpa, try to pass IOTLB object to dma helpers. No functional changes, it's just a preparation for support multiple IOTLBs. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-4-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28vdpa: support exposing the count of vqs to userspaceLongpeng
- GET_VQS_COUNT: the count of virtqueues that exposed Signed-off-by: Longpeng <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220315032553.455-4-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Longpeng &lt;<a href="mailto:longpeng2@huawei.com" target="_blank">longpeng2@huawei.com</a>&gt;<br> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-03-28vdpa: change the type of nvqs to u32Longpeng
Change vdpa_device.nvqs and vhost_vdpa.nvqs to use u32 Signed-off-by: Longpeng <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220315032553.455-3-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Longpeng &lt;<a href="mailto:longpeng2@huawei.com" target="_blank">longpeng2@huawei.com</a>&gt;<br></blockquote><div><br></div><div>Acked-by: Jason Wang &lt;<a href="mailto:jasowang@redhat.com">jasowang@redhat.com</a>&gt;</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-03-28vdpa: support exposing the config size to userspaceLongpeng
- GET_CONFIG_SIZE: return the size of the virtio config space. The size contains the fields which are conditional on feature bits. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Longpeng <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220315032553.455-2-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-03-28vhost_vdpa: don't setup irq offloading when irq_num < 0Zhu Lingshan
When irq number is negative(e.g., -EINVAL), the virtqueue may be disabled or the virtqueues are sharing a device irq. In such case, we should not setup irq offloading for a virtqueue. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04vdpa: factor out vdpa_set_features_unlocked for vdpa internal useSi-Wei Liu
No functional change introduced. vdpa bus driver such as virtio_vdpa or vhost_vdpa is not supposed to take care of the locking for core by its own. The locked API vdpa_set_features should suffice the bus driver's need. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/1642206481-30721-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14vdpa: Protect vdpa reset with cf_mutexEli Cohen
Call reset using the wrapper function vdpa_reset() to make sure the operation is serialized with cf_mutex. This comes to protect from the following possible scenario: vhost_vdpa_set_status() could call the reset op. Since the call is not protected by cf_mutex, a netlink thread calling vdpa_dev_config_fill could get passed the VIRTIO_CONFIG_S_FEATURES_OK check in vdpa_dev_config_fill() and end up reporting wrong features. Fixes: 5f6e85953d8f ("vdpa: Read device configuration only if FEATURES_OK") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220111183400.38418-3-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14vdpa: Avoid taking cf_mutex lock on get statusEli Cohen
Avoid the wrapper holding cf_mutex since it is not protecting anything. To avoid confusion and unnecessary overhead incurred by it, remove. Fixes: f489f27bc0ab ("vdpa: Sync calls set/get config/status with cf_mutex") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220111183400.38418-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14vdpa: Allow to configure max data virtqueuesEli Cohen
Add netlink support to configure the max virtqueue pairs for a device. At least one pair is required. The maximum is dictated by the device. Example: $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 4 Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-6-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Sync calls set/get config/status with cf_mutexEli Cohen
Add wrappers to get/set status and protect these operations with cf_mutex to serialize these operations with respect to get/set config operations. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-4-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Provide interface to read driver featuresEli Cohen
Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: clean up get_config_size ret value handlingLaura Abbott
The return type of get_config_size is size_t so it makes sense to change the type of the variable holding its result. That said, this already got taken care of (differently, and arguably not as well) by commit 3ed21c1451a1 ("vdpa: check that offsets are within bounds"). The added 'c->off > size' test in that commit will be done as an unsigned comparison on 32-bit (safe due to not being signed). On a 64-bit platform, it will be done as a signed comparison, but in that case the comparison will be done in 64-bit, and 'c->off' being an u32 it will be valid thanks to the extended range (ie both values will be positive in 64 bits). So this was a real bug, but it was already addressed and marked for stable. Signed-off-by: Laura Abbott <labbott@kernel.org> Reported-by: Luo Likang <luolikang@nsfocus.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Avoid duplicate call to vp_vdpa get_statusEugenio Pérez
It has no sense to call get_status twice, since we already have a variable for that. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Link: https://lore.kernel.org/r/20211104195833.2089796-1-eperezma@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-12-08vdpa: check that offsets are within boundsDan Carpenter
In this function "c->off" is a u32 and "size" is a long. On 64bit systems if "c->off" is greater than "size" then "size - c->off" is a negative and we always return -E2BIG. But on 32bit systems the subtraction is type promoted to a high positive u32 value and basically any "c->len" is accepted. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Reported-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211208103337.GA4047@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org