summaryrefslogtreecommitdiff
path: root/drivers/vfio
AgeCommit message (Collapse)Author
2022-10-04vfio/mdev: consolidate all the name sysfs into the core codeChristoph Hellwig
Every driver just emits a static string, simply add a field to the mdev_type for the driver to fill out or fall back to the sysfs name and provide a standard sysfs show function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-12-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: consolidate all the device_api sysfs into the core codeJason Gunthorpe
Every driver just emits a static string, simply feed it through the ops and provide a standard sysfs show function. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-11-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: remove mtype_get_parent_devChristoph Hellwig
Just open code the dereferences in the only user. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-10-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: remove mdev_parent_devChristoph Hellwig
Just open code the dereferences in the only user. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-9-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: unexport mdev_bus_typeChristoph Hellwig
mdev_bus_type is only used in mdev.ko now, so unexport it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-8-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: remove mdev_from_devChristoph Hellwig
Just open code it in the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-7-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: simplify mdev_type handlingChristoph Hellwig
Instead of abusing struct attribute_group to control initialization of struct mdev_type, just define the actual attributes in the mdev_driver, allocate the mdev_type structures in the caller and pass them to mdev_register_parent. This allows the caller to use container_of to get at the containing structure and thus significantly simplify the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-6-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: embedd struct mdev_parent in the parent data structureChristoph Hellwig
Simplify mdev_{un}register_device by requiring the caller to pass in a structure allocate as part of the parent device structure. This removes the need for a list of parents and the separate mdev_parent refcount as we can simplify rely on the reference to the parent device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220923092652.100656-5-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-10-04vfio/mdev: make mdev.h standalone includableChristoph Hellwig
Include <linux/device.h> and <linux/uuid.h> so that users of this headers don't need to do that and remove those includes that aren't needed any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Link: https://lore.kernel.org/r/20220923092652.100656-4-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Update some log and comment formatsLongfang Liu
1. Modify some annotation information formats to keep the entire driver annotation format consistent. 2. Modify some log description formats to be consistent with the format of the entire driver log. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-6-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Remove useless macro definitionsLongfang Liu
The QM_QUE_ISO_CFG macro definition is no longer used and needs to be deleted from the current driver. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-5-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Remove useless function parameterLongfang Liu
Remove unused function parameters for vf_qm_fun_reset() and ensure the device is enabled before the reset operation is performed. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-4-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Fix device data address combination problemLongfang Liu
The queue address of the accelerator device should be combined into a dma address in a way of combining the low and high bits. The previous combination is wrong and needs to be modified. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-3-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27hisi_acc_vfio_pci: Fixes error return code issueLongfang Liu
During the process of compatibility and matching of live migration device information, if the isolation status of the two devices is inconsistent, the live migration needs to be exited. The current driver does not return the error code correctly and needs to be fixed. Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20220926093332.28824-2-liulongfang@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-27vfio: Follow a strict lifetime for struct iommu_groupJason Gunthorpe
The iommu_group comes from the struct device that a driver has been bound to and then created a struct vfio_device against. To keep the iommu layer sane we want to have a simple rule that only an attached driver should be using the iommu API. Particularly only an attached driver should hold ownership. In VFIO's case since it uses the group APIs and it shares between different drivers it is a bit more complicated, but the principle still holds. Solve this by waiting for all users of the vfio_group to stop before allowing vfio_unregister_group_dev() to complete. This is done with a new completion to know when the users go away and an additional refcount to keep track of how many device drivers are sharing the vfio group. The last driver to be unregistered will clean up the group. This solves crashes in the S390 iommu driver that come because VFIO ends up racing releasing ownership (which attaches the default iommu_domain to the device) with the removal of that same device from the iommu driver. This is a side case that iommu drivers should not have to cope with. iommu driver failed to attach the default/blocking domain WARNING: CPU: 0 PID: 5082 at drivers/iommu/iommu.c:1961 iommu_detach_group+0x6c/0x80 Modules linked in: macvtap macvlan tap vfio_pci vfio_pci_core irqbypass vfio_virqfd kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink mlx5_ib sunrpc ib_uverbs ism smc uvdevice ib_core s390_trng eadm_sch tape_3590 tape tape_class vfio_ccw mdev vfio_iommu_type1 vfio zcrypt_cex4 sch_fq_codel configfs ghash_s390 prng chacha_s390 libchacha aes_s390 mlx5_core des_s390 libdes sha3_512_s390 nvme sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common nvme_core zfcp scsi_transport_fc pkey zcrypt rng_core autofs4 CPU: 0 PID: 5082 Comm: qemu-system-s39 Tainted: G W 6.0.0-rc3 #5 Hardware name: IBM 3931 A01 782 (LPAR) Krnl PSW : 0704c00180000000 000000095bb10d28 (iommu_detach_group+0x70/0x80) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000001 0000000900000027 0000000000000039 000000095c97ffe0 00000000fffeffff 00000009fc290000 00000000af1fda50 00000000af590b58 00000000af1fdaf0 0000000135c7a320 0000000135e52258 0000000135e52200 00000000a29e8000 00000000af590b40 000000095bb10d24 0000038004b13c98 Krnl Code: 000000095bb10d18: c020003d56fc larl %r2,000000095c2bbb10 000000095bb10d1e: c0e50019d901 brasl %r14,000000095be4bf20 #000000095bb10d24: af000000 mc 0,0 >000000095bb10d28: b904002a lgr %r2,%r10 000000095bb10d2c: ebaff0a00004 lmg %r10,%r15,160(%r15) 000000095bb10d32: c0f4001aa867 brcl 15,000000095be65e00 000000095bb10d38: c004002168e0 brcl 0,000000095bf3def8 000000095bb10d3e: eb6ff0480024 stmg %r6,%r15,72(%r15) Call Trace: [<000000095bb10d28>] iommu_detach_group+0x70/0x80 ([<000000095bb10d24>] iommu_detach_group+0x6c/0x80) [<000003ff80243b0e>] vfio_iommu_type1_detach_group+0x136/0x6c8 [vfio_iommu_type1] [<000003ff80137780>] __vfio_group_unset_container+0x58/0x158 [vfio] [<000003ff80138a16>] vfio_group_fops_unl_ioctl+0x1b6/0x210 [vfio] pci 0004:00:00.0: Removing from iommu group 4 [<000000095b5b62e8>] __s390x_sys_ioctl+0xc0/0x100 [<000000095be5d3b4>] __do_syscall+0x1d4/0x200 [<000000095be6c072>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<000000095be4bf80>] __warn_printk+0x60/0x68 It indicates that domain->ops->attach_dev() failed because the driver has already passed the point of destructing the device. Fixes: 9ac8545199a1 ("iommu: Fix use-after-free in iommu_release_device") Reported-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-a3c5f4429e2a+55-iommu_group_lifetime_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-26Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', ↵Joerg Roedel
'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next
2022-09-22vfio: Move container code into drivers/vfio/container.cJason Gunthorpe
All the functions that dereference struct vfio_container are moved into container.c. Simple code motion, no functional change. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Split the register_device ops call into functionsJason Gunthorpe
This is a container item. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Rename vfio_ioctl_check_extension()Jason Gunthorpe
To vfio_container_ioctl_check_extension(). A following patch will turn this into a non-static function, make it clear it is related to the container. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Split out container code from the init/cleanup functionsJason Gunthorpe
This miscdev, noiommu driver and a couple of globals are all container items. Move this init into its own functions. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMUJason Gunthorpe
This can all be accomplished using typical IS_ENABLED techniques, drop it all. Also rename the variable to vfio_noiommu so this can be made global in following patches. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Split the container logic into vfio_container_attach_group()Jason Gunthorpe
This splits up the ioctl of vfio_group_ioctl_set_container() so it determines the type of file then invokes a type specific attachment function. Future patches will add iommufd to this function as an alternative type. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Rename __vfio_group_unset_container()Jason Gunthorpe
To vfio_group_detach_container(). This function is really a container function. Fold the WARN_ON() into it as a precondition assertion. A following patch will move the vfio_container functions to their own .c file. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-22vfio: Add header guards and includes to drivers/vfio/vfio.hJason Gunthorpe
As is normal for headers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v3-297af71838d2+b9-vfio_container_split_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio: Add struct device to vfio_deviceYi Liu
and replace kref. With it a 'vfio-dev/vfioX' node is created under the sysfs path of the parent, indicating the device is bound to a vfio driver, e.g.: /sys/devices/pci0000\:6f/0000\:6f\:01.0/vfio-dev/vfio0 It is also a preparatory step toward adding cdev for supporting future device-oriented uAPI. Add Documentation/ABI/testing/sysfs-devices-vfio-dev. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-16-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio: Rename vfio_device_put() and vfio_device_try_get()Kevin Tian
With the addition of vfio_put_device() now the names become confusing. vfio_put_device() is clear from object life cycle p.o.v given kref. vfio_device_put()/vfio_device_try_get() are helpers for tracking users on a registered device. Now rename them: - vfio_device_put() -> vfio_device_put_registration() - vfio_device_try_get() -> vfio_device_try_get_registration() Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-15-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/ccw: Use the new device life cycle helpersKevin Tian
ccw is the only exception which cannot use vfio_alloc_device() because its private device structure is designed to serve both mdev and parent. Life cycle of the parent is managed by css_driver so vfio_ccw_private must be allocated/freed in css_driver probe/remove path instead of conforming to vfio core life cycle for mdev. Given that use a wait/completion scheme so the mdev remove path waits after vfio_put_device() until receiving a completion notification from @release. The completion indicates that all active references on vfio_device have been released. After that point although free of vfio_ccw_private is delayed to css_driver it's at least guaranteed to have no parallel reference on released vfio device part from other code paths. memset() in @probe is removed. vfio_device is either already cleared when probed for the first time or cleared in @release from last probe. The right fix is to introduce separate structures for mdev and parent, but this won't happen in short term per prior discussions. Remove vfio_init/uninit_group_dev() as no user now. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20220921104401.38898-14-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/amba: Use the new device life cycle helpersKevin Tian
Implement amba's own vfio_device_ops. Remove vfio_platform_probe/remove_common() given no user now. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-13-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/platform: Use the new device life cycle helpersKevin Tian
Move vfio_device_ops from platform core to platform drivers so device specific init/cleanup can be added. Introduce two new helpers vfio_platform_init/release_common() for the use in driver @init/@release. vfio_platform_probe/remove_common() will be deprecated. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-12-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/fsl-mc: Use the new device life cycle helpersYi Liu
Also add a comment to mark that vfio core releases device_set if @init fails. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-11-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/hisi_acc: Use the new device life cycle helpersYi Liu
Tidy up @probe so all migration specific initialization logic is moved to migration specific @init callback. Remove vfio_pci_core_{un}init_device() given no user now. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20220921104401.38898-5-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/mlx5: Use the new device life cycle helpersYi Liu
mlx5 has its own @init/@release for handling migration cap. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-4-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio/pci: Use the new device life cycle helpersYi Liu
Also introduce two pci core helpers as @init/@release for pci drivers: - vfio_pci_core_init_dev() - vfio_pci_core_release_dev() Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220921104401.38898-3-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-21vfio: Add helpers for unifying vfio_device life cycleKevin Tian
The idea is to let vfio core manage the vfio_device life cycle instead of duplicating the logic cross drivers. This is also a preparatory step for adding struct device into vfio_device. New pair of helpers together with a kref in vfio_device: - vfio_alloc_device() - vfio_put_device() Drivers can register @init/@release callbacks to manage any private state wrapping the vfio_device. However vfio-ccw doesn't fit this model due to a life cycle mess that its private structure mixes both parent and mdev info hence must be allocated/freed outside of the life cycle of vfio device. Per prior discussions this won't be fixed in short term by IBM folks. Instead of waiting for those modifications introduce another helper vfio_init_device() so ccw can call it to initialize a pre-allocated vfio_device. Further implication of the ccw trick is that vfio_device cannot be freed uniformly in vfio core. Instead, require *EVERY* driver to implement @release and free vfio_device inside. Then ccw can choose to delay the free at its own discretion. Another trick down the road is that kvzalloc() is used to accommodate the need of gvt which uses vzalloc() while all others use kzalloc(). So drivers should call a helper vfio_free_device() to free the vfio_device instead of assuming that kfree() or vfree() is appliable. Later once the ccw mess is fixed we can remove those tricks and fully handle structure alloc/free in vfio core. Existing vfio_{un}init_group_dev() will be deprecated after all existing usages are converted to the new model. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20220921104401.38898-2-kevin.tian@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-09Merge tag 'vfio-v6.0-rc5' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fix from Alex Williamson: - Fix zero page refcount leak (Alex Williamson) * tag 'vfio-v6.0-rc5' of https://github.com/awilliam/linux-vfio: vfio/type1: Unpin zero pages
2022-09-08vfio/mlx5: Set the driver DMA logging callbacksYishai Hadas
Now that everything is ready set the driver DMA logging callbacks if supported by the device. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-11-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Manage error scenarios on trackerYishai Hadas
Handle async error events and health/recovery flow to safely stop the tracker upon error scenarios. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-10-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Report dirty pages from trackerYishai Hadas
Report dirty pages from tracker. It includes: Querying for dirty pages in a given IOVA range, this is done by modifying the tracker into the reporting state and supplying the required range. Using the CQ event completion mechanism to be notified once data is ready on the CQ/QP to be processed. Once data is available turn on the corresponding bits in the bit map. This functionality will be used as part of the 'log_read_and_clear' driver callback in the next patches. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-9-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Create and destroy page tracker objectYishai Hadas
Add support for creating and destroying page tracker object. This object is used to control/report the device dirty pages. As part of creating the tracker need to consider the device capabilities for max ranges and adapt/combine ranges accordingly. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-8-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/mlx5: Init QP based resources for dirty trackingYishai Hadas
Init QP based resources for dirty tracking to be used upon start logging. It includes: Creating the host and firmware RC QPs, move each of them to its expected state based on the device specification, etc. Creating the relevant resources which are needed by both QPs as of UAR, PD, etc. Creating the host receive side resources as of MKEY, CQ, receive WQEs, etc. The above resources are cleaned-up upon stop logging. The tracker object that will be introduced by next patches will use those resources. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio: Introduce the DMA logging feature supportYishai Hadas
Introduce the DMA logging feature support in the vfio core layer. It includes the processing of the device start/stop/report DMA logging UAPIs and calling the relevant driver 'op' to do the work. Specifically, Upon start, the core translates the given input ranges into an interval tree, checks for unexpected overlapping, non aligned ranges and then pass the translated input to the driver for start tracking the given ranges. Upon report, the core translates the given input user space bitmap and page size into an IOVA kernel bitmap iterator. Then it iterates it and call the driver to set the corresponding bits for the dirtied pages in a specific IOVA range. Upon stop, the driver is called to stop the previous started tracking. The next patches from the series will introduce the mlx5 driver implementation for the logging ops. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio: Add an IOVA bitmap supportJoao Martins
The new facility adds a bunch of wrappers that abstract how an IOVA range is represented in a bitmap that is granulated by a given page_size. So it translates all the lifting of dealing with user pointers into its corresponding kernel addresses backing said user memory into doing finally the (non-atomic) bitmap ops to change various bits. The formula for the bitmap is: data[(iova / page_size) / 64] & (1ULL << (iova % 64)) Where 64 is the number of bits in a unsigned long (depending on arch) It introduces an IOVA iterator that uses a windowing scheme to minimize the pinning overhead, as opposed to pinning it on demand 4K at a time. Assuming a 4K kernel page and 4K requested page size, we can use a single kernel page to hold 512 page pointers, mapping 2M of bitmap, representing 64G of IOVA space. An example usage of these helpers for a given @base_iova, @page_size, @length and __user @data: bitmap = iova_bitmap_alloc(base_iova, page_size, length, data); if (IS_ERR(bitmap)) return -ENOMEM; ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn); iova_bitmap_free(bitmap); Each iteration of the @dirty_reporter_fn is called with a unique @iova and @length argument, indicating the current range available through the iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty areas (@iova_length) within that provided range, as following: iova_bitmap_set(bitmap, iova, iova_length); The facility is intended to be used for user bitmaps representing dirtied IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20220908183448.195262-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08Merge remote-tracking branch 'mlx5/mlx5-vfio' into v6.1/vfio/nextAlex Williamson
Merge net/mlx5 depedencies for device DMA logging and mlx5 variant driver suppport. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-08vfio/fsl-mc: Fix a typo in a messageChristophe JAILLET
L and S are swapped in the message. s/VFIO_FLS_MC/VFIO_FSL_MC/ Also use 'ret' instead of 'WARN_ON(ret)' to avoid a duplicated message. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Diana Craciun <diana.craciun@oss.nxp.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/a7c1394346725b7435792628c8d4c06a0a745e0b.1662134821.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-07iommu/dma: Move public interfaces to linux/iommu.hRobin Murphy
The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with only a couple more public interfaces left pertaining to MSI integration. Since these depend on the main IOMMU API header anyway, move their declarations there, taking the opportunity to update the half-baked comments to proper kerneldoc along the way. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-01hisi_acc_vfio_pci: Correct the function prefix for hssi_acc_drvdata()Shameer Kolothum
Commit 91be0bd6c6cf("vfio/pci: Have all VFIO PCI drivers store the vfio_pci_core_device in drvdata") introduced a helper function to retrieve the drvdata but used "hssi" instead of "hisi" for the function prefix. Correct that and also while at it, moved the function a bit down so that it's close to other hisi_ prefixed functions. No functional changes. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20220831085943.993-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio: Remove vfio_group dev_counterJason Gunthorpe
This counts the number of devices attached to a vfio_group, ie the number of items in the group->device_list. It is only read in vfio_pin_pages(), as some kind of protection against limitations in type1. However, with all the code cleanups in this area, now that vfio_pin_pages() accepts a vfio_device directly it is redundant. All drivers are already calling vfio_register_emulated_iommu_dev() which directly creates a group specifically for the device and thus it is guaranteed that there is a singleton group. Leave a note in the comment about this requirement and remove the logic. Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/0-v2-d4374a7bf0c9+c4-vfio_dev_counter_jgg@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUPAbhishek Sahu
This patch implements VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. In the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is any access for the VFIO device on the host side, then the device will be moved out of the low power state without the user's guest driver involvement. Once the device access has been finished, then the host can move the device again into low power state. With the low power entry happened through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP, the device will not be moved back into the low power state and a notification will be sent to the user by triggering wakeup eventfd. vfio_pci_core_pm_entry() will be called for both the variants of low power feature entry so add an extra argument for wakeup eventfd context and store locally in 'struct vfio_pci_core_device'. For the entry happened without wakeup eventfd, all the exit related handling will be done by the LOW_POWER_EXIT device feature only. When the LOW_POWER_EXIT will be called, then the vfio core layer vfio_device_pm_runtime_get() will increment the usage count and will resume the device. In the driver runtime_resume callback, the 'pm_wake_eventfd_ctx' will be NULL. Then vfio_pci_core_pm_exit() will call vfio_pci_runtime_pm_exit() and all the exit related handling will be done. For the entry happened with wakeup eventfd, in the driver resume callback, eventfd will be triggered and all the exit related handling will be done. When vfio_pci_runtime_pm_exit() will be called by vfio_pci_core_pm_exit(), then it will return early. But if the runtime suspend has not happened on the host side, then all the exit related handling will be done in vfio_pci_core_pm_exit() only. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-6-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY/EXITAbhishek Sahu
Currently, if the runtime power management is enabled for vfio-pci based devices in the guest OS, then the guest OS will do the register write for PCI_PM_CTRL register. This write request will be handled in vfio_pm_config_write() where it will do the actual register write of PCI_PM_CTRL register. With this, the maximum D3hot state can be achieved for low power. If we can use the runtime PM framework, then we can achieve the D3cold state (on the supported systems) which will help in saving maximum power. 1. D3cold state can't be achieved by writing PCI standard PM config registers. This patch implements the following newly added low power related device features: - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT The VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY feature will allow the device to make use of low power platform states on the host while the VFIO_DEVICE_FEATURE_LOW_POWER_EXIT will prevent further use of those power states. 2. The vfio-pci driver uses runtime PM framework for low power entry and exit. On the platforms where D3cold state is supported, the runtime PM framework will put the device into D3cold otherwise, D3hot or some other power state will be used. There are various cases where the device will not go into the runtime suspended state. For example, - The runtime power management is disabled on the host side for the device. - The user keeps the device busy after calling LOW_POWER_ENTRY. - There are dependent devices that are still in runtime active state. For these cases, the device will be in the same power state that has been configured by the user through PCI_PM_CTRL register. 3. The hypervisors can implement virtual ACPI methods. For example, in guest linux OS if PCI device ACPI node has _PR3 and _PR0 power resources with _ON/_OFF method, then guest linux OS invokes the _OFF method during D3cold transition and then _ON during D0 transition. The hypervisor can tap these virtual ACPI calls and then call the low power device feature IOCTL. 4. The 'pm_runtime_engaged' flag tracks the entry and exit to runtime PM. This flag is protected with 'memory_lock' semaphore. 5. All the config and other region access are wrapped under pm_runtime_resume_and_get() and pm_runtime_put(). So, if any device access happens while the device is in the runtime suspended state, then the device will be resumed first before access. Once the access has been finished, then the device will again go into the runtime suspended state. 6. The memory region access through mmap will not be allowed in the low power state. Since __vfio_pci_memory_enabled() is a common function, so check for 'pm_runtime_engaged' has been added explicitly in vfio_pci_mmap_fault() to block only mmap'ed access. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-09-01vfio/pci: Mask INTx during runtime suspendAbhishek Sahu
This patch adds INTx handling during runtime suspend/resume. All the suspend/resume related code for the user to put the device into the low power state will be added in subsequent patches. The INTx lines may be shared among devices. Whenever any INTx interrupt comes for the VFIO devices, then vfio_intx_handler() will be called for each device sharing the interrupt. Inside vfio_intx_handler(), it calls pci_check_and_mask_intx() and checks if the interrupt has been generated for the current device. Now, if the device is already in the D3cold state, then the config space can not be read. Attempt to read config space in D3cold state can cause system unresponsiveness in a few systems. To prevent this, mask INTx in runtime suspend callback, and unmask the same in runtime resume callback. If INTx has been already masked, then no handling is needed in runtime suspend/resume callbacks. 'pm_intx_masked' tracks this, and vfio_pci_intx_mask() has been updated to return true if the INTx vfio_pci_irq_ctx.masked value is changed inside this function. For the runtime suspend which is triggered for the no user of VFIO device, the 'irq_type' will be VFIO_PCI_NUM_IRQS and these callbacks won't do anything. The MSI/MSI-X are not shared so similar handling should not be needed for MSI/MSI-X. vfio_msihandler() triggers eventfd_signal() without doing any device-specific config access. When the user performs any config access or IOCTL after receiving the eventfd notification, then the device will be moved to the D0 state first before servicing any request. Another option was to check this flag 'pm_intx_masked' inside vfio_intx_handler() instead of masking the interrupts. This flag is being set inside the runtime_suspend callback but the device can be in non-D3cold state (for example, if the user has disabled D3cold explicitly by sysfs, the D3cold is not supported in the platform, etc.). Also, in D3cold supported case, the device will be in D0 till the PCI core moves the device into D3cold. In this case, there is a possibility that the device can generate an interrupt. Adding check in the IRQ handler will not clear the IRQ status and the interrupt line will still be asserted. This can cause interrupt flooding. Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com> Link: https://lore.kernel.org/r/20220829114850.4341-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>