summaryrefslogtreecommitdiff
path: root/drivers/iommu
AgeCommit message (Collapse)Author
2023-01-13iommu/amd: Do not allocate io_pgtable_ops for passthrough domainVasant Hegde
In passthrough mode we do not use IOMMU page table. Hence we don't need to allocate io_pgtable_ops. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20230105091728.42469-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu: Tidy up io-pgtable dependenciesRobin Murphy
Some io-pgtable implementations, and thus their users too, carry a slightly odd dependency to get around the GENERIC_ATOMIC64 version of cmpxchg64() often failing to compile. Since this is a functional dependency, it's a bit misleading and untidy to tie it explicitly to COMPILE_TEST while assuming that it's also implied by the other platform/architecture options. Make things clearer by separating these functional dependencies into distinct statements from those controlling visibility, and since they do look a bit non-obvious to the uninitiated, also commenting them for good measure. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/51d8c78e2ecc6696ac5907526580209ea6da167f.1673553587.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu: Remove detach_dev callbackLu Baolu
The detach_dev callback of domain ops is not called in the IOMMU core. Remove this callback to avoid dead code. The trace event for detaching domain from device is removed accordingly. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230110025408.667767-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu: Remove deferred attach check from __iommu_detach_device()Jason Gunthorpe
At the current moment, __iommu_detach_device() is only called via call chains that are after the device driver is attached - eg via explicit attach APIs called by the device driver. Commit bd421264ed30 ("iommu: Fix deferred domain attachment") has removed deferred domain attachment check from __iommu_attach_device() path, so it should just unconditionally work in the __iommu_detach_device() path. It actually looks like a bug that we were blocking detach on these paths since the attach was unconditional and the caller is going to free the (probably) UNAMANGED domain once this returns. The only place we should be testing for deferred attach is during the initial point the dma device is linked to the group, and then again during the dma api calls. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230110025408.667767-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu: Add set_platform_dma_ops callbacksLu Baolu
For those IOMMU drivers that don't provide default domain support, add an implementation of set_platform_dma_ops callback so that the IOMMU core could return the DMA control to platform DMA ops. At the same time, with the set_platform_dma_ops implemented, there is no need for detach_dev. Remove it to avoid dead code. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230110025408.667767-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu: Add set_platform_dma_ops iommu opsLu Baolu
When VFIO finishes assigning a device to user space and calls iommu_group_release_dma_owner() to return the device to kernel, the IOMMU core will attach the default domain to the device. Unfortunately, some IOMMU drivers don't support default domain, hence in the end, the core calls .detach_dev instead. This adds set_platform_dma_ops iommu ops to make it clear that what it does is returning control back to the platform DMA ops. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230110025408.667767-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu: Remove detach_dev callbacksLu Baolu
The iommu core calls the driver's detach_dev domain op callback only when a device is finished assigning to user space and iommu_group_release_dma_owner() is called to return the device to the kernel, where iommu core wants to set the default domain to the device but the driver didn't provide one. In other words, if any iommu driver provides default domain support, the .detach_dev callback will never be called. This removes the detach_dev callbacks in those IOMMU drivers that support default domain. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> # apple-dart Acked-by: Chunyan Zhang <zhang.lyra@gmail.com> # sprd Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> # amd Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230110025408.667767-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()Christophe JAILLET
A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in the error handling path of mtk_iommu_v1_probe(). Add the corresponding clk_disable_unprepare(), as already done in the remove function. Fixes: b17336c55d89 ("iommu/mediatek: add support for mtk iommu generation one HW") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu/iova: Fix alloc iova overflows issueYunfei Wang
In __alloc_and_insert_iova_range, there is an issue that retry_pfn overflows. The value of iovad->anchor.pfn_hi is ~0UL, then when iovad->cached_node is iovad->anchor, curr_iova->pfn_hi + 1 will overflow. As a result, if the retry logic is executed, low_pfn is updated to 0, and then new_pfn < low_pfn returns false to make the allocation successful. This issue occurs in the following two situations: 1. The first iova size exceeds the domain size. When initializing iova domain, iovad->cached_node is assigned as iovad->anchor. For example, the iova domain size is 10M, start_pfn is 0x1_F000_0000, and the iova size allocated for the first time is 11M. The following is the log information, new->pfn_lo is smaller than iovad->cached_node. Example log as follows: [ 223.798112][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range start_pfn:0x1f0000,retry_pfn:0x0,size:0xb00,limit_pfn:0x1f0a00 [ 223.799590][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range success start_pfn:0x1f0000,new->pfn_lo:0x1efe00,new->pfn_hi:0x1f08ff 2. The node with the largest iova->pfn_lo value in the iova domain is deleted, iovad->cached_node will be updated to iovad->anchor, and then the alloc iova size exceeds the maximum iova size that can be allocated in the domain. After judging that retry_pfn is less than limit_pfn, call retry_pfn+1 to fix the overflow issue. Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com> Signed-off-by: Yunfei Wang <yf.wang@mediatek.com> Cc: <stable@vger.kernel.org> # 5.15.* Fixes: 4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search fails") Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230111063801.25107-1-yf.wang@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu: Fix refcount leak in iommu_device_claim_dma_ownerMiaoqian Lin
iommu_group_get() returns the group with the reference incremented. Move iommu_group_get() after owner check to fix the refcount leak. Fixes: 89395ccedbc1 ("iommu: Add device-centric DMA ownership interfaces") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221230083100.1489569-1-linmq006@gmail.com [ joro: Remove *group = NULL initialization ] Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13iommu/arm-smmu-v3: Don't unregister on shutdownVladimir Oltean
Similar to SMMUv2, this driver calls iommu_device_unregister() from the shutdown path, which removes the IOMMU groups with no coordination whatsoever with their users - shutdown methods are optional in device drivers. This can lead to NULL pointer dereferences in those drivers' DMA API calls, or worse. Instead of calling the full arm_smmu_device_remove() from arm_smmu_device_shutdown(), let's pick only the relevant function call - arm_smmu_device_disable() - more or less the reverse of arm_smmu_device_reset() - and call just that from the shutdown path. Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221215141251.3688780-2-vladimir.oltean@nxp.com Signed-off-by: Will Deacon <will@kernel.org>
2023-01-13iommu/arm-smmu: Don't unregister on shutdownVladimir Oltean
Michael Walle says he noticed the following stack trace while performing a shutdown with "reboot -f". He suggests he got "lucky" and just hit the correct spot for the reboot while there was a packet transmission in flight. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000098 CPU: 0 PID: 23 Comm: kworker/0:1 Not tainted 6.1.0-rc5-00088-gf3600ff8e322 #1930 Hardware name: Kontron KBox A-230-LS (DT) pc : iommu_get_dma_domain+0x14/0x20 lr : iommu_dma_map_page+0x9c/0x254 Call trace: iommu_get_dma_domain+0x14/0x20 dma_map_page_attrs+0x1ec/0x250 enetc_start_xmit+0x14c/0x10b0 enetc_xmit+0x60/0xdc dev_hard_start_xmit+0xb8/0x210 sch_direct_xmit+0x11c/0x420 __dev_queue_xmit+0x354/0xb20 ip6_finish_output2+0x280/0x5b0 __ip6_finish_output+0x15c/0x270 ip6_output+0x78/0x15c NF_HOOK.constprop.0+0x50/0xd0 mld_sendpack+0x1bc/0x320 mld_ifc_work+0x1d8/0x4dc process_one_work+0x1e8/0x460 worker_thread+0x178/0x534 kthread+0xe0/0xe4 ret_from_fork+0x10/0x20 Code: d503201f f9416800 d503233f d50323bf (f9404c00) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt This appears to be reproducible when the board has a fixed IP address, is ping flooded from another host, and "reboot -f" is used. The following is one more manifestation of the issue: $ reboot -f kvm: exiting hardware virtualization cfg80211: failed to load regulatory.db arm-smmu 5000000.iommu: disabling translation sdhci-esdhc 2140000.mmc: Removing from iommu group 11 sdhci-esdhc 2150000.mmc: Removing from iommu group 12 fsl-edma 22c0000.dma-controller: Removing from iommu group 17 dwc3 3100000.usb: Removing from iommu group 9 dwc3 3110000.usb: Removing from iommu group 10 ahci-qoriq 3200000.sata: Removing from iommu group 2 fsl-qdma 8380000.dma-controller: Removing from iommu group 20 platform f080000.display: Removing from iommu group 0 etnaviv-gpu f0c0000.gpu: Removing from iommu group 1 etnaviv etnaviv: Removing from iommu group 1 caam_jr 8010000.jr: Removing from iommu group 13 caam_jr 8020000.jr: Removing from iommu group 14 caam_jr 8030000.jr: Removing from iommu group 15 caam_jr 8040000.jr: Removing from iommu group 16 fsl_enetc 0000:00:00.0: Removing from iommu group 4 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000 fsl_enetc 0000:00:00.1: Removing from iommu group 5 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000002, GFSYNR1 0x00000429, GFSYNR2 0x00000000 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x80000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000 fsl_enetc 0000:00:00.2: Removing from iommu group 6 fsl_enetc_mdio 0000:00:00.3: Removing from iommu group 8 mscc_felix 0000:00:00.5: Removing from iommu group 3 fsl_enetc 0000:00:00.6: Removing from iommu group 7 pcieport 0001:00:00.0: Removing from iommu group 18 arm-smmu 5000000.iommu: Blocked unknown Stream ID 0x429; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications arm-smmu 5000000.iommu: GFSR 0x00000002, GFSYNR0 0x00000000, GFSYNR1 0x00000429, GFSYNR2 0x00000000 pcieport 0002:00:00.0: Removing from iommu group 19 Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8 pc : iommu_get_dma_domain+0x14/0x20 lr : iommu_dma_unmap_page+0x38/0xe0 Call trace: iommu_get_dma_domain+0x14/0x20 dma_unmap_page_attrs+0x38/0x1d0 enetc_unmap_tx_buff.isra.0+0x6c/0x80 enetc_poll+0x170/0x910 __napi_poll+0x40/0x1e0 net_rx_action+0x164/0x37c __do_softirq+0x128/0x368 run_ksoftirqd+0x68/0x90 smpboot_thread_fn+0x14c/0x190 Code: d503201f f9416800 d503233f d50323bf (f9405400) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- The problem seems to be that iommu_group_remove_device() is allowed to run with no coordination whatsoever with the shutdown procedure of the enetc PCI device. In fact, it almost seems as if it implies that the pci_driver :: shutdown() method is mandatory if DMA is used with an IOMMU, otherwise this is inevitable. That was never the case; shutdown methods are optional in device drivers. This is the call stack that leads to iommu_group_remove_device() during reboot: kernel_restart -> device_shutdown -> platform_shutdown -> arm_smmu_device_shutdown -> arm_smmu_device_remove -> iommu_device_unregister -> bus_for_each_dev -> remove_iommu_group -> iommu_release_device -> iommu_group_remove_device I don't know much about the arm_smmu driver, but arm_smmu_device_shutdown() invoking arm_smmu_device_remove() looks suspicious, since it causes the IOMMU device to unregister and that's where everything starts to unravel. It forces all other devices which depend on IOMMU groups to also point their ->shutdown() to ->remove(), which will make reboot slower overall. There are 2 moments relevant to this behavior. First was commit b06c076ea962 ("Revert "iommu/arm-smmu: Make arm-smmu explicitly non-modular"") when arm_smmu_device_shutdown() was made to run the exact same thing as arm_smmu_device_remove(). Prior to that, there was no iommu_device_unregister() call in arm_smmu_device_shutdown(). However, that was benign until commit 57365a04c921 ("iommu: Move bus setup to IOMMU device registration"), which made iommu_device_unregister() call remove_iommu_group(). Restore the old shutdown behavior by making remove() call shutdown(), but shutdown() does not call the remove() specific bits. Fixes: 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") Reported-by: Michael Walle <michael@walle.cc> Tested-by: Michael Walle <michael@walle.cc> # on kontron-sl28 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221215141251.3688780-1-vladimir.oltean@nxp.com Signed-off-by: Will Deacon <will@kernel.org>
2023-01-13iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even bettererRobin Murphy
Although it's vanishingly unlikely that anyone would integrate an SMMU within a coherent interconnect without also making the pagetable walk interface coherent, the same effect happens if a coherent SMMU fails to advertise CTTW correctly. This turns out to be the case on some popular NXP SoCs, where VFIO started failing the IOMMU_CAP_CACHE_COHERENCY test, even though IOMMU_CACHE *was* previously achieving the desired effect anyway thanks to the underlying integration. While those SoCs stand to gain some more general benefits from a firmware update to override CTTW correctly in DT/ACPI, it's also easy to work around this in Linux as well, to avoid imposing too much on affected users - since the upstream client devices *are* correctly marked as coherent, we can trivially infer their coherent paths through the SMMU as well. Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Fixes: df198b37e72c ("iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY better") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/d6dc41952961e5c7b21acac08a8bf1eb0f69e124.1671123115.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2023-01-11iommu: Remove IOMMU_CAP_INTR_REMAPJason Gunthorpe
No iommu driver implements this any more, get rid of it. Link: https://lore.kernel.org/r/9-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11irq/s390: Add arch_is_isolated_msi() for s390Jason Gunthorpe
s390 doesn't use irq_domains, so it has no place to set IRQ_DOMAIN_FLAG_ISOLATED_MSI. Instead of continuing to abuse the iommu subsystem to convey this information add a simple define which s390 can make statically true. The define will cause msi_device_has_isolated() to return true. Remove IOMMU_CAP_INTR_REMAP from the s390 iommu driver. Link: https://lore.kernel.org/r/8-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11iommu/x86: Replace IOMMU_CAP_INTR_REMAP with IRQ_DOMAIN_FLAG_ISOLATED_MSIJason Gunthorpe
On x86 platforms when the HW can support interrupt remapping the iommu driver creates an irq_domain for the IR hardware and creates a child MSI irq_domain. When the global irq_remapping_enabled is set, the IR MSI domain is assigned to the PCI devices (by intel_irq_remap_add_device(), or amd_iommu_set_pci_msi_domain()) making those devices have the isolated MSI property. Due to how interrupt domains work, setting IRQ_DOMAIN_FLAG_ISOLATED_MSI on the parent IR domain will cause all struct devices attached to it to return true from msi_device_has_isolated_msi(). This replaces the IOMMU_CAP_INTR_REMAP flag as all places using IOMMU_CAP_INTR_REMAP also call msi_device_has_isolated_msi() Set the flag and delete the cap. Link: https://lore.kernel.org/r/7-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11iommufd: Convert to msi_device_has_isolated_msi()Jason Gunthorpe
Trivially use the new API. Link: https://lore.kernel.org/r/4-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-11iommu: Add iommu_group_has_isolated_msi()Jason Gunthorpe
Compute the isolated_msi over all the devices in the IOMMU group because iommufd and vfio both need to know that the entire group is isolated before granting access to it. Link: https://lore.kernel.org/r/2-v3-3313bb5dd3a3+10f11-secure_msi_jgg@nvidia.com Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-19Merge tag 'iommu-updates-v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core code: - map/unmap_pages() cleanup - SVA and IOPF refactoring - Clean up and document return codes from device/domain attachment AMD driver: - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and ivrs_acpihid command line options - Some smaller cleanups Intel driver: - Blocking domain support - Cleanups S390 driver: - Fixes and improvements for attach and aperture handling PAMU driver: - Resource leak fix and cleanup Rockchip driver: - Page table permission bit fix Mediatek driver: - Improve safety from invalid dts input - Smaller fixes and improvements Exynos driver: - Fix driver initialization sequence Sun50i driver: - Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever - Various other fixes" * tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits) iommu/mediatek: Fix forever loop in error handling iommu/mediatek: Fix crash on isr after kexec() iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY iommu/amd: Fix typo in macro parameter name iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data iommu/mediatek: Improve safety for mediatek,smi property in larb nodes iommu/mediatek: Validate number of phandles associated with "mediatek,larbs" iommu/mediatek: Add error path for loop of mm_dts_parse iommu/mediatek: Use component_match_add iommu/mediatek: Add platform_device_put for recovering the device refcnt iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe() iommu/vt-d: Use real field for indication of first level iommu/vt-d: Remove unnecessary domain_context_mapped() iommu/vt-d: Rename domain_add_dev_info() iommu/vt-d: Rename iommu_disable_dev_iotlb() iommu/vt-d: Add blocking domain support iommu/vt-d: Add device_block_translation() helper iommu/vt-d: Allocate pasid table in device probe path iommu/amd: Check return value of mmu_notifier_register() iommu/amd: Fix pci device refcount leak in ppr_notifier() ...
2022-12-17Merge tag 'x86_mm_for_6.2_v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Dave Hansen: "New Feature: - Randomize the per-cpu entry areas Cleanups: - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it - Move to "native" set_memory_rox() helper - Clean up pmd_get_atomic() and i386-PAE - Remove some unused page table size macros" * tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) x86/mm: Ensure forced page table splitting x86/kasan: Populate shadow for shared chunk of the CPU entry area x86/kasan: Add helpers to align shadow addresses up and down x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area x86/mm: Recompute physical address for every page of per-CPU CEA mapping x86/mm: Rename __change_page_attr_set_clr(.checkalias) x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias() x86/mm: Untangle __change_page_attr_set_clr(.checkalias) x86/mm: Add a few comments x86/mm: Fix CR3_ADDR_MASK x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros mm: Convert __HAVE_ARCH_P..P_GET to the new style mm: Remove pointless barrier() after pmdp_get_lockless() x86/mm/pae: Get rid of set_64bit() x86_64: Remove pointless set_64bit() usage x86/mm/pae: Be consistent with pXXp_get_and_clear() x86/mm/pae: Use WRITE_ONCE() x86/mm/pae: Don't (ab)use atomic64 mm/gup: Fix the lockless PMD access ...
2022-12-15x86_64: Remove pointless set_64bit() usagePeter Zijlstra
The use of set_64bit() in X86_64 only code is pretty pointless, seeing how it's a direct assignment. Remove all this nonsense. [nathanchance: unbreak irte] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114425.168036718%40infradead.org
2022-12-14Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd implementation from Jason Gunthorpe: "iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA" For more background, see the extended explanations in Jason's pull request: https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits) iommufd: Change the order of MSI setup iommufd: Improve a few unclear bits of code iommufd: Fix comment typos vfio: Move vfio group specific code into group.c vfio: Refactor dma APIs for emulated devices vfio: Wrap vfio group module init/clean code into helpers vfio: Refactor vfio_device open and close vfio: Make vfio_device_open() truly device specific vfio: Swap order of vfio_device_container_register() and open_device() vfio: Set device->group in helper function vfio: Create wrappers for group register/unregister vfio: Move the sanity check of the group to vfio_create_group() vfio: Simplify vfio_create_group() iommufd: Allow iommufd to supply /dev/vfio/vfio vfio: Make vfio_container optionally compiled vfio: Move container related MODULE_ALIAS statements into container.c vfio-iommufd: Support iommufd for emulated VFIO devices vfio-iommufd: Support iommufd for physical VFIO devices vfio-iommufd: Allow iommufd to be used in place of a container fd vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() ...
2022-12-13Merge tag 'dma-mapping-6.2-2022-12-13' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - reduce the swiotlb buffer size on allocation failure (Alexey Kardashevskiy) - clean up passing of bogus GFP flags to the dma-coherent allocator (Christoph Hellwig) * tag 'dma-mapping-6.2-2022-12-13' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: reject __GFP_COMP in dma_alloc_attrs ALSA: memalloc: don't pass bogus GFP_ flags to dma_alloc_* s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent cnic: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/qib: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/hfi1: don't pass bogus GFP_ flags to dma_alloc_coherent media: videobuf-dma-contig: use dma_mmap_coherent swiotlb: reduce the swiotlb buffer size on allocation failure
2022-12-12Merge tag 'irq-core-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
2022-12-12Merge tag 'hyperv-next-signed-20221208' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Drop unregister syscore from hyperv_cleanup to avoid hang (Gaurav Kohli) - Clean up panic path for Hyper-V framebuffer (Guilherme G. Piccoli) - Allow IRQ remapping to work without x2apic (Nuno Das Neves) - Fix comments (Olaf Hering) - Expand hv_vp_assist_page definition (Saurabh Sengar) - Improvement to page reporting (Shradha Gupta) - Make sure TSC clocksource works when Linux runs as the root partition (Stanislav Kinsburskiy) * tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Remove unregister syscore call from Hyper-V cleanup iommu/hyper-v: Allow hyperv irq remapping without x2apic clocksource: hyper-v: Add TSC page support for root partition clocksource: hyper-v: Use TSC PFN getter to map vvar page clocksource: hyper-v: Introduce TSC PFN getter clocksource: hyper-v: Introduce a pointer to TSC page x86/hyperv: Expand definition of struct hv_vp_assist_page PCI: hv: update comment in x86 specific hv_arch_irq_unmask hv: fix comment typo in vmbus_channel/low_latency drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers video: hyperv_fb: Avoid taking busy spinlock on panic path hv_balloon: Add support for configurable order free page reporting mm/page_reporting: Add checks for page_reporting_order param
2022-12-12Merge branches 'arm/allwinner', 'arm/exynos', 'arm/mediatek', ↵Joerg Roedel
'arm/rockchip', 'arm/smmu', 'ppc/pamu', 's390', 'x86/vt-d', 'x86/amd' and 'core' into next
2022-12-12iommu/mediatek: Fix forever loop in error handlingDan Carpenter
There is a typo so this loop does i++ where i-- was intended. It will result in looping until the kernel crashes. Fixes: 26593928564c ("iommu/mediatek: Add error path for loop of mm_dts_parse") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Link: https://lore.kernel.org/r/Y5C3mTam2nkbaz6o@kili Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-09iommufd: Change the order of MSI setupJason Gunthorpe
Eric points out this is wrong for the rare case of someone using allow_unsafe_interrupts on ARM. We always have to setup the MSI window in the domain if the iommu driver asks for it. Move the iommu_get_msi_cookie() setup to the top of the function and always do it, regardless of the security mode. Add checks to iommufd_device_setup_msi() to ensure the driver is not doing something incomprehensible. No current driver will set both a HW and SW MSI window, or have more than one SW MSI window. Fixes: e8d57210035b ("iommufd: Add kAPI toward external drivers for physical devices") Link: https://lore.kernel.org/r/3-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reported-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09iommufd: Improve a few unclear bits of codeJason Gunthorpe
Correct a few items noticed late in review: - We should assert that the math in batch_clear_carry() doesn't underflow - user->locked should be -1 not 0 sicne we just did mmput - npages should not have been recalculated, it already has that value No functional change. Fixes: 8d160cd4d506 ("iommufd: Algorithms for PFN storage") Link: https://lore.kernel.org/r/2-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09iommufd: Fix comment typosJason Gunthorpe
Repair some typos in comments that were noticed late in the review cycle. Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages") Link: https://lore.kernel.org/r/1-v1-0362a1a1c034+98-iommufd_fixes1_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reported-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-05iommu/amd: Enable PCI/IMSThomas Gleixner
PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag, but only when on real hardware. Virtualized IOMMUs need additional support. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232327.140571546@linutronix.de
2022-12-05iommu/vt-d: Enable PCI/IMSThomas Gleixner
PCI/IMS works like PCI/MSI-X in the remapping. Just add the feature flag, but only when on real hardware. Virtualized IOMMUs need additional support, e.g. for PASID. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232327.081482253@linutronix.de
2022-12-05iommu/amd: Switch to MSI base domainsThomas Gleixner
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.209212272@linutronix.de
2022-12-05iommu/vt-d: Switch to MSI parent domainsThomas Gleixner
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de
2022-12-05x86/apic/vector: Provide MSI parent domainThomas Gleixner
Enable MSI parent domain support in the x86 vector domain and fixup the checks in the iommu implementations to check whether device::msi::domain is the default MSI parent domain. That keeps the existing logic to protect e.g. devices behind VMD working. The interrupt remap PCI/MSI code still works because the underlying vector domain still provides the same functionality. None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are affected either. They still work the same way both at the low level and the PCI/MSI implementations they provide. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de
2022-12-05iommu/vt-d: Fix buggy QAT device maskJacob Pan
Impacted QAT device IDs that need extra dtlb flush quirk is ranging from 0x4940 to 0x4943. After bitwise AND device ID with 0xfffc the result should be 0x4940 instead of 0x494c to identify these devices. Fixes: e65a6897be5e ("iommu/vt-d: Add a fix for devices need extra dtlb flush") Reported-by: Raghunathan Srinivasan <raghunathan.srinivasan@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20221203005610.2927487-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/mediatek: Fix crash on isr after kexec()Ricardo Ribalda
If the system is rebooted via isr(), the IRQ handler might be triggered before the domain is initialized. Resulting on an invalid memory access error. Fix: [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070 [ 0.501166] Call trace: [ 0.501174] report_iommu_fault+0x28/0xfc [ 0.501180] mtk_iommu_isr+0x10c/0x1c0 Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20221125-mtk-iommu-v2-0-e168dff7d43e@chromium.org [ joro: Fixed spelling in commit message ] Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITYJason Gunthorpe
This driver treats IOMMU_DOMAIN_IDENTITY the same as UNMANAGED, which cannot possibly be correct. UNMANAGED domains are required to start out blocking all DMAs. This seems to be what this driver does as it allocates a first level 'dt' for the IO page table that is 0 filled. Thus UNMANAGED looks like a working IO page table, and so IDENTITY must be a mistake. Remove it. Fixes: 4100b8c229b3 ("iommu: Add Allwinner H6 IOMMU driver") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/0-v1-97f0adf27b5e+1f0-s50_identity_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/amd: Fix typo in macro parameter nameMichael Forney
IVRS_GET_SBDF_ID is only called with fn as the fourth parameter, so this had no effect, but fixing the name will avoid bugs if that ever changes. Signed-off-by: Michael Forney <mforney@mforney.org> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/381fbc430c0ccdd78b3b696cfc0c32b233526ca5.1669159392.git.mforney@mforney.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/mediatek: Remove unused "mapping" member from mtk_iommu_dataYong Wu
Just remove a unused variable that only is for mtk_iommu_v1. Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20221018024258.19073-7-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/mediatek: Improve safety for mediatek,smi property in larb nodesYong Wu
No functional change. Just improve safety from dts. All the larbs that connect to one IOMMU must connect with the same smi-common. This patch checks all the mediatek,smi property for each larb, If their mediatek,smi are different, it will return fails. Also avoid there is no available smi-larb nodes. Suggested-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20221018024258.19073-6-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/mediatek: Validate number of phandles associated with "mediatek,larbs"Guenter Roeck
Fix the smatch warnings: drivers/iommu/mtk_iommu.c:878 mtk_iommu_mm_dts_parse() error: uninitialized symbol 'larbnode'. If someone abuse the dtsi node(Don't follow the definition of dt-binding), for example "mediatek,larbs" is provided as boolean property, "larb_nr" will be zero and cause abnormal. To fix this problem and improve the code safety, add some checking for the invalid input from dtsi, e.g. checking the larb_nr/larbid valid range, and avoid "mediatek,larb-id" property conflicts in the smi-larb nodes. Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20221018024258.19073-5-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/mediatek: Add error path for loop of mm_dts_parseYong Wu
The mtk_iommu_mm_dts_parse will parse the smi larbs nodes. if the i+1 larb is parsed fail, we should put_device for the i..0 larbs. There are two places need to comment: 1) The larbid may be not linear mapping, we should loop whole the array in the error path. 2) I move this line position: "data->larb_imu[id].dev = &plarbdev->dev;" before "if (!plarbdev->dev.driver)", That means set data->larb_imu[id].dev before the error path. then we don't need "platform_device_put(plarbdev)" again in probe_defer case. All depend on "put_device" of the error path in error cases. Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE") Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20221018024258.19073-4-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/mediatek: Use component_match_addYong Wu
In order to simplify the error patch(avoid call of_node_put), Use component_match_add instead component_match_add_release since we are only interested in the "device" here. Then we could always call of_node_put in normal path. Strictly this is not a fixes patch, but it is a prepare for adding the error path, thus I add a Fixes tag too. Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE") Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20221018024258.19073-3-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-05iommu/mediatek: Add platform_device_put for recovering the device refcntYong Wu
Add platform_device_put to match with of_find_device_by_node. Meanwhile, I add a new variable "pcommdev" which is for smi common device. Otherwise, "platform_device_put(plarbdev)" for smi-common dev may be not readable. And add a checking for whether pcommdev is NULL. Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE") Signed-off-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20221018024258.19073-2-yong.wu@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02Merge tag 'v6.1-rc7' into iommufd.git for-nextJason Gunthorpe
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version. The rc fix was done a different way when iommufd patches reworked this code. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02iommufd: Allow iommufd to supply /dev/vfio/vfioJason Gunthorpe
If the VFIO container is compiled out, give a kconfig option for iommufd to provide the miscdev node with the same name and permissions as vfio uses. The compatibility node supports the same ioctls as VFIO and automatically enables the VFIO compatible pinned page accounting mode. Link: https://lore.kernel.org/r/10-v4-42cd2eb0e3eb+335a-vfio_iommufd_jgg@nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Yu He <yu.he@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-02iommu/vt-d: Fix PCI device refcount leak in dmar_dev_scope_init()Xiongfeng Wang
for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() for the error path to avoid reference count leak. Fixes: 2e4552893038 ("iommu/vt-d: Unify the way to process DMAR device scope array") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-3-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02iommu/vt-d: Fix PCI device refcount leak in has_external_pci()Xiongfeng Wang
for_each_pci_dev() is implemented by pci_get_device(). The comment of pci_get_device() says that it will increase the reference count for the returned pci_dev and also decrease the reference count for the input pci_dev @from if it is not NULL. If we break for_each_pci_dev() loop with pdev not NULL, we need to call pci_dev_put() to decrease the reference count. Add the missing pci_dev_put() before 'return true' to avoid reference count leak. Fixes: 89a6079df791 ("iommu/vt-d: Force IOMMU on for platform opt in hint") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Link: https://lore.kernel.org/r/20221121113649.190393-2-wangxiongfeng2@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-02iommu/vt-d: Fix PCI device refcount leak in prq_event_thread()Yang Yingliang
As comment of pci_get_domain_bus_and_slot() says, it returns a pci device with refcount increment, when finish using it, the caller must decrease the reference count by calling pci_dev_put(). So call pci_dev_put() after using the 'pdev' to avoid refcount leak. Besides, if the 'pdev' is null or intel_svm_prq_report() returns error, there is no need to trace this fault. Fixes: 06f4b8d09dba ("iommu/vt-d: Remove unnecessary SVA data accesses in page fault path") Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221119144028.2452731-1-yangyingliang@huawei.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>