summaryrefslogtreecommitdiff
path: root/drivers/vdpa/vdpa_user/vduse_dev.c
AgeCommit message (Collapse)Author
2025-02-25vduse: add virtio_fs to allowed dev idEugenio Pérez
A VDUSE device that implements virtiofs device works fine just by adding the device id to the whitelist. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20250121103346.1030165-1-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-03-19vduse: enable Virtio-net device typeMaxime Coquelin
This patch adds Virtio-net device type to the supported devices types. Initialization fails if the device does not support VIRTIO_F_VERSION_1 feature, in order to guarantee the configuration space is read-only. It also fails with -EPERM if the CAP_NET_ADMIN is missing. Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240109111025.1320976-4-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2024-03-19vduse: Temporarily fail if control queue feature requestedMaxime Coquelin
Virtio-net driver control queue implementation is not safe when used with VDUSE. If the VDUSE application does not reply to control queue messages, it currently ends up hanging the kernel thread sending this command. Some work is on-going to make the control queue implementation robust with VDUSE. Until it is completed, let's fail features check if control-queue feature is requested. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240109111025.1320976-3-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com>
2024-03-19vduse: validate block features only with block devicesMaxime Coquelin
This patch is preliminary work to enable network device type support to VDUSE. As VIRTIO_BLK_F_CONFIG_WCE shares the same value as VIRTIO_NET_F_HOST_TSO4, we need to restrict its check to Virtio-blk device type. Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240109111025.1320976-2-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vduse: implement vdpa_config_ops.get_vq_size for vduseZhu Lingshan
This commit implements get_vq_size for vdpa_config_ops. This new interface is used to report per vq size. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-8-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vduse: implement DMA sync callbacksMaxime Coquelin
Since commit 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers"), VDUSE device require support for DMA's .sync_single_for_cpu() operation as the memory is non-coherent between the device and CPU because of the use of a bounce buffer. This patch implements both .sync_single_for_cpu() and .sync_single_for_device() callbacks, and also skip bounce buffer copies during DMA map and unmap operations if the DMA_ATTR_SKIP_CPU_SYNC attribute is set to avoid extra copies of the same buffer. Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20240219170606.587290-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-21Merge branch 'vfs.file'Christian Brauner
Bring in the changes to the file infrastructure for this cycle. Mostly cleanups and some performance tweaks. * file: remove __receive_fd() * file: stop exposing receive_fd_user() * fs: replace f_rcuhead with f_task_work * file: remove pointless wrapper * file: s/close_fd_get_file()/file_close_fd()/g * Improve __fget_files_rcu() code generation (and thus __fget_light()) * file: massage cleanup of files that failed to open Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-12file: remove __receive_fd()Christian Brauner
Honestly, there's little value in having a helper with and without that int __user *ufd argument. It's just messy and doesn't really give us anything. Just expose receive_fd() with that argument and get rid of that helper. Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-5-e73ca6f4ea83@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-28eventfd: simplify eventfd_signal()Christian Brauner
Ever since the eventfd type was introduced back in 2007 in commit e1ad7468c77d ("signal/timer/event: eventfd core") the eventfd_signal() function only ever passed 1 as a value for @n. There's no point in keeping that additional argument. Link: https://lore.kernel.org/r/20231122-vfs-eventfd-signal-v2-2-bd549b14ce0c@kernel.org Acked-by: Xu Yilun <yilun.xu@intel.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> # ocxl Acked-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-11-01vduse: make vduse_class constantGreg Kroah-Hartman
Now that the driver core allows for struct class to be in read-only memory, we should make all 'class' structures declared at build time placing them into read-only memory, instead of having to be dynamically allocated at runtime. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Message-Id: <2023100643-tricolor-citizen-6c2d@gregkh> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10vduse: Use proper spinlock for IRQ injectionMaxime Coquelin
The IRQ injection work used spin_lock_irq() to protect the scheduling of the softirq, but spin_lock_bh() should be used. With spin_lock_irq(), we noticed delay of more than 6 seconds between the time a NAPI polling work is scheduled and the time it is executed. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: xieyongji@bytedance.com Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-07-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - resume support in vdpa/solidrun - structure size optimizations in virtio_pci - new pds_vdpa driver - immediate initialization mechanism for vdpa/ifcvf - interrupt bypass for vdpa/mlx5 - multiple worker support for vhost - viirtio net in Intel F2000X-PL support for vdpa/ifcvf - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) vhost: Make parameter name match of vhost_get_vq_desc() vduse: fix NULL pointer dereference vhost: Allow worker switching while work is queueing vhost_scsi: add support for worker ioctls vhost: allow userspace to create workers vhost: replace single worker pointer with xarray vhost: add helper to parse userspace vring state/file vhost: remove vhost_work_queue vhost_scsi: flush IO vqs then send TMF rsp vhost_scsi: convert to vhost_vq_work_queue vhost_scsi: make SCSI cmd completion per vq vhost_sock: convert to vhost_vq_work_queue vhost: convert poll work to be vq based vhost: take worker or vq for flushing vhost: take worker or vq instead of dev for queueing vhost, vhost_net: add helper to check if vq has work vhost: add vhost_worker pointer to vhost_virtqueue vhost: dynamically allocate vhost_worker vhost: create worker at end of vhost_dev_set_owner virtio_bt: call scheduler when we free unused buffs ...
2023-07-03vduse: fix NULL pointer dereferenceMaxime Coquelin
vduse_vdpa_set_vq_affinity callback can be called with NULL value as cpu_mask when deleting the vduse device. This patch resets virtqueue's IRQ affinity mask value to set all CPUs instead of dereferencing NULL cpu_mask. [ 4760.952149] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 4760.959110] #PF: supervisor read access in kernel mode [ 4760.964247] #PF: error_code(0x0000) - not-present page [ 4760.969385] PGD 0 P4D 0 [ 4760.971927] Oops: 0000 [#1] PREEMPT SMP PTI [ 4760.976112] CPU: 13 PID: 2346 Comm: vdpa Not tainted 6.4.0-rc6+ #4 [ 4760.982291] Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.8.1 06/26/2020 [ 4760.989769] RIP: 0010:memcpy_orig+0xc5/0x130 [ 4760.994049] Code: 16 f8 4c 89 07 4c 89 4f 08 4c 89 54 17 f0 4c 89 5c 17 f8 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 fa 08 72 1b <4c> 8b 06 4c 8b 4c 16 f8 4c 89 07 4c 89 4c 17 f8 c3 cc cc cc cc 66 [ 4761.012793] RSP: 0018:ffffb1d565abb830 EFLAGS: 00010246 [ 4761.018020] RAX: ffff9f4bf6b27898 RBX: ffff9f4be23969c0 RCX: ffff9f4bcadf6400 [ 4761.025152] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffff9f4bf6b27898 [ 4761.032286] RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000000 [ 4761.039416] R10: 0000000000000000 R11: 0000000000000600 R12: 0000000000000000 [ 4761.046549] R13: 0000000000000000 R14: 0000000000000080 R15: ffffb1d565abbb10 [ 4761.053680] FS: 00007f64c2ec2740(0000) GS:ffff9f635f980000(0000) knlGS:0000000000000000 [ 4761.061765] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4761.067513] CR2: 0000000000000000 CR3: 0000001875270006 CR4: 00000000007706e0 [ 4761.074645] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4761.081775] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4761.088909] PKRU: 55555554 [ 4761.091620] Call Trace: [ 4761.094074] <TASK> [ 4761.096180] ? __die+0x1f/0x70 [ 4761.099238] ? page_fault_oops+0x171/0x4f0 [ 4761.103340] ? exc_page_fault+0x7b/0x180 [ 4761.107265] ? asm_exc_page_fault+0x22/0x30 [ 4761.111460] ? memcpy_orig+0xc5/0x130 [ 4761.115126] vduse_vdpa_set_vq_affinity+0x3e/0x50 [vduse] [ 4761.120533] virtnet_clean_affinity.part.0+0x3d/0x90 [virtio_net] [ 4761.126635] remove_vq_common+0x1a4/0x250 [virtio_net] [ 4761.131781] virtnet_remove+0x5d/0x70 [virtio_net] [ 4761.136580] virtio_dev_remove+0x3a/0x90 [ 4761.140509] device_release_driver_internal+0x19b/0x200 [ 4761.145742] bus_remove_device+0xc2/0x130 [ 4761.149755] device_del+0x158/0x3e0 [ 4761.153245] ? kernfs_find_ns+0x35/0xc0 [ 4761.157086] device_unregister+0x13/0x60 [ 4761.161010] unregister_virtio_device+0x11/0x20 [ 4761.165543] device_release_driver_internal+0x19b/0x200 [ 4761.170770] bus_remove_device+0xc2/0x130 [ 4761.174782] device_del+0x158/0x3e0 [ 4761.178276] ? __pfx_vdpa_name_match+0x10/0x10 [vdpa] [ 4761.183336] device_unregister+0x13/0x60 [ 4761.187260] vdpa_nl_cmd_dev_del_set_doit+0x63/0xe0 [vdpa] Fixes: 28f6288eb63d ("vduse: Support set_vq_affinity callback") Cc: xieyongji@bytedance.com Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20230622204851.318125-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-09mm/gup: remove vmas parameter from pin_user_pages()Lorenzo Stoakes
We are now in a position where no caller of pin_user_pages() requires the vmas parameter at all, so eliminate this parameter from the function and all callers. This clears the way to removing the vmas parameter from GUP altogether. Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib] Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian König <christian.koenig@amd.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-08vduse: avoid empty string for dev nameSheng Zhao
Syzkaller hits a kernel WARN when the first character of the dev name provided is NULL. Solution is to add a NULL check before calling cdev_device_add() in vduse_create_dev(). kobject: (0000000072042169): attempted to be registered with empty name! WARNING: CPU: 0 PID: 112695 at lib/kobject.c:236 Call Trace: kobject_add_varg linux/src/lib/kobject.c:390 [inline] kobject_add+0xf6/0x150 linux/src/lib/kobject.c:442 device_add+0x28f/0xc20 linux/src/drivers/base/core.c:2167 cdev_device_add+0x83/0xc0 linux/src/fs/char_dev.c:546 vduse_create_dev linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2254 [inline] vduse_ioctl+0x7b5/0xf30 linux/src/drivers/vdpa/vdpa_user/vduse_dev.c:2316 vfs_ioctl linux/src/fs/ioctl.c:47 [inline] file_ioctl linux/src/fs/ioctl.c:510 [inline] do_vfs_ioctl+0x14b/0xa80 linux/src/fs/ioctl.c:697 ksys_ioctl+0x7c/0xa0 linux/src/fs/ioctl.c:714 __do_sys_ioctl linux/src/fs/ioctl.c:721 [inline] __se_sys_ioctl linux/src/fs/ioctl.c:719 [inline] __x64_sys_ioctl+0x42/0x50 linux/src/fs/ioctl.c:719 do_syscall_64+0x94/0x330 linux/src/arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: "Xie Yongji" <xieyongji@bytedance.com> Reported-by: Xianjun Zeng <zengxianjun@bytedance.com> Signed-off-by: Sheng Zhao <sheng.zhao@bytedance.com> Message-Id: <20230530033626.1266794-1-sheng.zhao@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Cc: "Michael S. Tsirkin"<mst@redhat.com>, "Jason Wang"<jasowang@redhat.com>, Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-04-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "virtio,vhost,vdpa: features, fixes, and cleanups: - reduction in interrupt rate in virtio - perf improvement for VDUSE - scalability for vhost-scsi - non power of 2 ring support for packed rings - better management for mlx5 vdpa - suspend for snet - VIRTIO_F_NOTIFICATION_DATA - shared backend with vdpa-sim-blk - user VA support in vdpa-sim - better struct packing for virtio and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (52 commits) vhost_vdpa: fix unmap process in no-batch mode MAINTAINERS: make me a reviewer of VIRTIO CORE AND NET DRIVERS tools/virtio: fix build caused by virtio_ring changes virtio_ring: add a struct device forward declaration vdpa_sim_blk: support shared backend vdpa_sim: move buffer allocation in the devices vdpa/snet: use likely/unlikely macros in hot functions vdpa/snet: implement kick_vq_with_data callback virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support virtio: add VIRTIO_F_NOTIFICATION_DATA feature support vdpa/snet: support the suspend vDPA callback vdpa/snet: support getting and setting VQ state MAINTAINERS: add vringh.h to Virtio Core and Net Drivers vringh: address kdoc warnings vdpa: address kdoc warnings virtio_ring: don't update event idx on get_buf vdpa_sim: add support for user VA vdpa_sim: replace the spinlock with a mutex to protect the state vdpa_sim: use kthread worker vdpa_sim: make devices agnostic for work management ...
2023-04-21vduse: Support specifying bounce buffer size via sysfsXie Yongji
As discussed in [1], this adds sysfs interface to support specifying bounce buffer size in virtio-vdpa case. It would be a performance tuning parameter for high throughput workloads. [1] https://lore.kernel.org/netdev/e8f25a35-9d45-69f9-795d-bdbbb90337a3@redhat.com/ Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-12-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Delay iova domain creationXie Yongji
Delay creating iova domain until the vduse device is registered to vdpa bus. This is a preparation for adding sysfs interface to support specifying bounce buffer size for the iova domain. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-11-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Signal vq trigger eventfd directly if possibleXie Yongji
Now the vdpa callback will associate an trigger eventfd in some cases. For performance reasons, VDUSE can signal it directly during irq injection. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-10-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-04-21vduse: Add sysfs interface for irq callback affinityXie Yongji
Add sysfs interface for each vduse virtqueue to get/set the affinity for irq callback. This might be useful for performance tuning when the irq callback affinity mask contains more than one CPU. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-8-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support get_vq_affinity callbackXie Yongji
This implements get_vq_affinity callback so that the virtio-blk driver can build the blk-mq queues based on the irq callback affinity. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-7-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Support set_vq_affinity callbackXie Yongji
Since virtio-vdpa bus driver already support interrupt affinity spreading mechanism, let's implement the set_vq_affinity callback to bring it to vduse device. After we get the virtqueue's affinity, we can spread IRQs between CPUs in the affinity mask, in a round-robin manner, to run the irq callback. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20230323053043.35-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-04-21vduse: Refactor allocation for vduse virtqueuesXie Yongji
Allocate memory for vduse virtqueues one by one instead of doing one allocation for all of them. This is a preparation for adding sysfs interface for virtqueues. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230323053043.35-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-03-17driver core: class: remove module * from class_create()Greg Kroah-Hartman
The module pointer in class_create() never actually did anything, and it shouldn't have been requred to be set as a parameter even if it did something. So just remove it and fix up all callers of the function in the kernel tree at the same time. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-28vduse: Validate vq_num in vduse_validate_config()Harshit Mogalapalli
Add a limit to 'config->vq_num' which is user controlled data which comes from an vduse_ioctl to prevent large memory allocations. Micheal says - This limit is somewhat arbitrary. However, currently virtio pci and ccw are limited to a 16 bit vq number. While MMIO isn't it is also isn't used with lots of VQs due to current lack of support for per-vq interrupts. Thus, the 0xffff limit on number of VQs corresponding to a 16-bit VQ number seems sufficient for now. This is found using static analysis with smatch. Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Message-Id: <20221128155717.2579992-1-harshit.m.mogalapalli@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-11-24driver core: make struct class.devnode() take a const *Greg Kroah-Hartman
The devnode() in struct class should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Justin Sanders <justin@coraid.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> Cc: Liam Mark <lmark@codeaurora.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Brian Starkey <Brian.Starkey@arm.com> Cc: John Stultz <jstultz@google.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sean Young <sean@mess.org> Cc: Frank Haverkamp <haver@linux.ibm.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Xie Yongji <xieyongji@bytedance.com> Cc: Gautam Dawar <gautam.dawar@xilinx.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Eli Cohen <elic@nvidia.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: alsa-devel@alsa-project.org Cc: dri-devel@lists.freedesktop.org Cc: kvm@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-block@vger.kernel.org Cc: linux-input@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20221123122523.1332370-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-27vduse: prevent uninitialized memory accessesMaxime Coquelin
If the VDUSE application provides a smaller config space than the driver expects, the driver may use uninitialized memory from the stack. This patch prevents it by initializing the buffer passed by the driver to store the config value. This fix addresses CVE-2022-2308. Cc: stable@vger.kernel.org # v5.15+ Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20220831154923.97809-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-08-11vduse: Support querying information of IOVA regionsXie Yongji
This introduces a new ioctl: VDUSE_IOTLB_GET_INFO to support querying some information of IOVA regions. Now it can be used to query whether the IOVA region supports userspace memory registration. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220803045523.23851-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vduse: Support registering userspace memory for IOVA regionsXie Yongji
Introduce two ioctls: VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support registering and de-registering userspace memory for IOVA regions. Now it only supports registering userspace memory for bounce buffer region in virtio-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220803045523.23851-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-06-24vduse: Tie vduse mgmtdev and its deviceParav Pandit
vduse devices are not backed by any real devices such as PCI. Hence it doesn't have any parent device linked to it. Kernel driver model in [1] suggests to avoid an empty device release callback. Hence tie the mgmtdevice object's life cycle to an allocate dummy struct device instead of static one. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/core-api/kobject.rst?h=v5.18-rc7#n284 Signed-off-by: Parav Pandit <parav@nvidia.com> Message-Id: <20220613195223.473966-1-parav@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-06-08vduse: Fix NULL pointer dereference on sysfs accessXie Yongji
The control device has no drvdata. So we will get a NULL pointer dereference when accessing control device's msg_timeout attribute via sysfs: [ 132.841881][ T3644] BUG: kernel NULL pointer dereference, address: 00000000000000f8 [ 132.850619][ T3644] RIP: 0010:msg_timeout_show (drivers/vdpa/vdpa_user/vduse_dev.c:1271) [ 132.869447][ T3644] dev_attr_show (drivers/base/core.c:2094) [ 132.870215][ T3644] sysfs_kf_seq_show (fs/sysfs/file.c:59) [ 132.871164][ T3644] ? device_remove_bin_file (drivers/base/core.c:2088) [ 132.872082][ T3644] kernfs_seq_show (fs/kernfs/file.c:164) [ 132.872838][ T3644] seq_read_iter (fs/seq_file.c:230) [ 132.873578][ T3644] ? __vmalloc_area_node (mm/vmalloc.c:3041) [ 132.874532][ T3644] kernfs_fop_read_iter (fs/kernfs/file.c:238) [ 132.875513][ T3644] __kernel_read (fs/read_write.c:440 (discriminator 1)) [ 132.876319][ T3644] kernel_read (fs/read_write.c:459) [ 132.877129][ T3644] kernel_read_file (fs/kernel_read_file.c:94) [ 132.877978][ T3644] kernel_read_file_from_fd (include/linux/file.h:45 fs/kernel_read_file.c:186) [ 132.879019][ T3644] __do_sys_finit_module (kernel/module.c:4207) [ 132.879930][ T3644] __ia32_sys_finit_module (kernel/module.c:4189) [ 132.880930][ T3644] do_int80_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:132) [ 132.881847][ T3644] entry_INT80_compat (arch/x86/entry/entry_64_compat.S:419) To fix it, don't create the unneeded attribute for control device anymore. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Reported-by: kernel test robot <oliver.sang@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220426073656.229-1-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vdpa: multiple address spaces supportGautam Dawar
This patches introduces the multiple address spaces support for vDPA device. This idea is to identify a specific address space via an dedicated identifier - ASID. During vDPA device allocation, vDPA device driver needs to report the number of address spaces supported by the device then the DMA mapping ops of the vDPA device needs to be extended to support ASID. This helps to isolate the environments for the virtqueue that will not be assigned directly. E.g in the case of virtio-net, the control virtqueue will not be assigned directly to guest. As a start, simply claim 1 virtqueue groups and 1 address spaces for all vDPA devices. And vhost-vDPA will simply reject the device with more than 1 virtqueue groups or address spaces. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-7-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31vdpa: introduce virtqueue groupsGautam Dawar
This patch introduces virtqueue groups to vDPA device. The virtqueue group is the minimal set of virtqueues that must share an address space. And the address space identifier could only be attached to a specific virtqueue group. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Gautam Dawar <gdawar@xilinx.com> Message-Id: <20220330180436.24644-6-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Provide interface to read driver featuresEli Cohen
Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vduse: moving kvfree into callerGuanjun
This free action should be moved into caller 'vduse_ioctl' in concert with the allocation. No functional change. Signed-off-by: Guanjun <guanjun@linux.alibaba.com> Link: https://lore.kernel.org/r/1638780498-55571-1-git-send-email-guanjun@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-12-08vduse: check that offset is within bounds in get_config()Dan Carpenter
This condition checks "len" but it does not check "offset" and that could result in an out of bounds read if "offset > dev->config_size". The problem is that since both variables are unsigned the "dev->config_size - offset" subtraction would result in a very high unsigned value. I think these checks might not be necessary because "len" and "offset" are supposed to already have been validated using the vhost_vdpa_config_validate() function. But I do not know the code perfectly, and I like to be safe. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211208150956.GA29160@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org
2021-12-08vduse: fix memory corruption in vduse_dev_ioctl()Dan Carpenter
The "config.offset" comes from the user. There needs to a check to prevent it being out of bounds. The "config.offset" and "dev->config_size" variables are both type u32. So if the offset if out of bounds then the "dev->config_size - config.offset" subtraction results in a very high u32 value. The out of bounds offset can result in memory corruption. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211208103307.GA3778@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org
2021-11-01vdpa: Enable user to set mac and mtu of vdpa deviceParav Pandit
$ vdpa dev add name bar mgmtdev vdpasim_net mac 00:11:22:33:44:55 mtu 9000 $ vdpa dev config show bar: mac 00:11:22:33:44:55 link up link_announce false mtu 9000 $ vdpa dev config show -jp { "config": { "bar": { "mac": "00:11:22:33:44:55", "link ": "up", "link_announce ": false, "mtu": 9000, } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-5-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-10-22vduse: Fix race condition between resetting and irq injectingXie Yongji
The interrupt might be triggered after a reset since there is no synchronization between resetting and irq injecting. And it might break something if the interrupt is delayed until a new round of device initialization. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210929083050.88-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-22vduse: Disallow injecting interrupt before DRIVER_OK is setXie Yongji
The interrupt callback should not be triggered before DRIVER_OK is set. Otherwise, it might break the virtio device driver. So let's add a check to avoid the unexpected behavior. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210923075722.98-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-14vduse: Cleanup the old kernel states after reset failureXie Yongji
We should cleanup the old kernel states e.g. interrupt callback no matter whether the userspace handle the reset correctly or not since virtio-vdpa can't handle the reset failure now. Otherwise, the old state might be used after reset which might break something, e.g. the old interrupt callback might be triggered by userspace after reset, which can break the virtio device driver. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210906142158.181-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-09-14vduse: missing error code in vduse_init()Dan Carpenter
This should return -ENOMEM if alloc_workqueue() fails. Currently it returns success. Fixes: b66219796563 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20210907073223.GA18254@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-09-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - vduse driver ("vDPA Device in Userspace") supporting emulated virtio block devices - virtio-vsock support for end of record with SEQPACKET - vdpa: mac and mq support for ifcvf and mlx5 - vdpa: management netlink for ifcvf - virtio-i2c, gpio dt bindings - misc fixes and cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits) Documentation: Add documentation for VDUSE vduse: Introduce VDUSE - vDPA Device in Userspace vduse: Implement an MMU-based software IOTLB vdpa: Support transferring virtual addressing during DMA mapping vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() vhost-iotlb: Add an opaque pointer for vhost IOTLB vhost-vdpa: Handle the failure of vdpa_reset() vdpa: Add reset callback in vdpa_config_ops vdpa: Fix some coding style issues file: Export receive_fd() to modules eventfd: Export eventfd_wake_count to modules iova: Export alloc_iova_fast() and free_iova_fast() virtio-blk: remove unneeded "likely" statements virtio-balloon: Use virtio_find_vqs() helper vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro vsock_test: update message bounds test for MSG_EOR af_vsock: rename variables in receive loop virtio/vsock: support MSG_EOR bit processing vhost/vsock: support MSG_EOR bit processing ...
2021-09-06Documentation: Add documentation for VDUSEXie Yongji
VDUSE (vDPA Device in Userspace) is a framework to support implementing software-emulated vDPA devices in userspace. This document is intended to clarify the VDUSE design and usage. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-14-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vduse: Introduce VDUSE - vDPA Device in UserspaceXie Yongji
This VDUSE driver enables implementing software-emulated vDPA devices in userspace. The vDPA device is created by ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device interface (/dev/vduse/$NAME) is exported to userspace for device emulation. In order to make the device emulation more secure, the device's control path is handled in kernel. A message mechnism is introduced to forward some dataplane related control messages to userspace. And in the data path, the DMA buffer will be mapped into userspace address space through different ways depending on the vDPA bus to which the vDPA device is attached. In virtio-vdpa case, the MMU-based software IOTLB is used to achieve that. And in vhost-vdpa case, the DMA buffer is reside in a userspace memory region which can be shared to the VDUSE userspace processs via transferring the shmfd. For more details on VDUSE design and usage, please see the follow-on Documentation commit. NB(mst): when merging this with b542e383d8c0 ("eventfd: Make signal recursion protection a task bit") replace eventfd_signal_count with eventfd_signal_allowed, and drop the previous ("eventfd: Export eventfd_wake_count to modules"). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-13-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>