summaryrefslogtreecommitdiff
path: root/drivers/iommu
AgeCommit message (Collapse)Author
2025-04-17iommu/pages: Move from struct page to struct ioptdesc and folioJason Gunthorpe
This brings the iommu page table allocator into the modern world of having its own private page descriptor and not re-using fields from struct page for its own purpose. It follows the basic pattern of struct ptdesc which did this transformation for the CPU page table allocator. Currently iommu-pages is pretty basic so this isn't a huge benefit, however I see a coming need for features that CPU allocator has, like sub PAGE_SIZE allocations, and RCU freeing. This provides the base infrastructure to implement those cleanly. Remove numa_node_id() calls from the inlines and instead use NUMA_NO_NODE which will get switched to numa_mem_id(), which seems to be the right ID to use for memory allocations. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/13-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/pages: Remove iommu_put_pages_list_old and the _GenericJason Gunthorpe
Nothing uses the old list_head path now, remove it. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/12-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu: Change iommu_iotlb_gather to use iommu_page_listJason Gunthorpe
This converts the remaining places using list of pages to the new API. The Intel free path was shared with its gather path, so it is converted at the same time. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/11-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/amd: Convert to use struct iommu_pages_listJason Gunthorpe
Change the internal freelist to use struct iommu_pages_list. AMD uses the freelist to batch free the entire table during domain destruction, and to replace table levels with leafs during map. Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/10-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/riscv: Convert to use struct iommu_pages_listJason Gunthorpe
Change the internal freelist to use struct iommu_pages_list. riscv uses this page list to free page table levels that are replaced with leaf ptes. Reviewed-by: Tomasz Jeznach <tjeznach@rivosinc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/pages: Formalize the freelist APIJason Gunthorpe
We want to get rid of struct page references outside the internal allocator implementation. The free list has the driver open code something like: list_add_tail(&virt_to_page(ptr)->lru, freelist); Move the above into a small inline and make the freelist into a wrapper type 'struct iommu_pages_list' so that the compiler can help check all the conversion. This struct has also proven helpful in some future ideas to convert to a singly linked list to get an extra pointer in the struct page, and to signal that the pages should be freed with RCU. Use a temporary _Generic so we don't need to rename the free function as the patches progress. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/8-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/pages: De-inline the substantial functionsJason Gunthorpe
These are called in a lot of places and are not trivial. Move them to the core module. Tidy some of the comments and function arguments, fold __iommu_alloc_account() into its only caller, change __iommu_free_account() into __iommu_free_page() to remove some duplication. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/pages: Remove iommu_free_page()Jason Gunthorpe
Use iommu_free_pages() instead. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/pages: Remove the order argument to iommu_free_pages()Jason Gunthorpe
Now that we have a folio under the allocation iommu_free_pages() can know the order of the original allocation and do the correct thing to free it. The next patch will rename iommu_free_page() to iommu_free_pages() so we have naming consistency with iommu_alloc_pages_node(). Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/pages: Make iommu_put_pages_list() work with high order allocationsJason Gunthorpe
alloc_pages_node(, order) needs to be paired with __free_pages(, order) to free all the allocated pages. For order != 0 the return from alloc_pages_node() is just a page list, it hasn't been formed into a folio. However iommu_put_pages_list() just calls put_page() on the head page of an allocation, which will end up leaking the tail pages if order != 0. Fix this by using __GFP_COMP to create a high order folio and then always use put_page() to free the full high order folio. __iommu_free_account() can get the order of the allocation via folio_order(), which corrects the accounting of high order allocations in iommu_put_pages_list(). This is the same technique slub uses. As far as I can tell, none of the places using high order allocations are also using the free list, so this not a current bug. Fixes: 06c375053cef ("iommu/vt-d: add wrapper functions for page allocations") Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/pages: Remove __iommu_alloc_pages()/__iommu_free_pages()Jason Gunthorpe
These were only used by tegra-smmu and leaked the struct page out of the API. Delete them since tega-smmu has been converted to the other APIs. In the process flatten the call tree so we have fewer one line functions calling other one line functions.. iommu_alloc_pages_node() is the real allocator and everything else can just call it directly. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/tegra: Do not use struct page as the handle for ptsJason Gunthorpe
Instead use the virtual address and dma_map_single() like as->pd uses. Introduce a small struct tegra_pt instead of void * to have some clarity what is using this API and add compile safety during the conversion. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/terga: Do not use struct page as the handle for as->pd memoryJason Gunthorpe
Instead use the virtual address. Change from dma_map_page() to dma_map_single() which works directly on a KVA. Add a type for the pd table level for clarity. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v4-c8663abbb606+3f7-iommu_pages_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17iommu/arm-smmu-v3: Fail aliasing StreamIDs more gracefullyRobin Murphy
We've never supported StreamID aliasing between devices, and as such they will never have had functioning DMA, but this is not fatal to the SMMU itself. Although aliasing between hard-wired platform device StreamIDs would tend to raise questions about the whole system, in practice it's far more likely to occur relatively innocently due to legacy PCI bridges, where the underlying StreamID mappings are still perfectly reasonable. As such, return a more benign -ENODEV when failing probe for such an unsupported device (and log a more obvious error message), so that it doesn't break the entire SMMU probe now that bus_iommu_probe() runs in the right order and can propagate that error back. The end result is still that the device doesn't get an IOMMU group and probably won't work, same as before. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/39d54e49c8476efc4653e352150d44b185d6d50f.1744380554.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream idsNicolin Chen
ASPEED VGA card has two built-in devices: 0008:06:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 06) 0008:07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 52) Its toplogy looks like this: +-[0008:00]---00.0-[01-09]--+-00.0-[02-09]--+-00.0-[03]----00.0 Sandisk Corp Device 5017 | +-01.0-[04]-- | +-02.0-[05]----00.0 NVIDIA Corporation Device | +-03.0-[06-07]----00.0-[07]----00.0 ASPEED Technology, Inc. ASPEED Graphics Family | +-04.0-[08]----00.0 Renesas Technology Corp. uPD720201 USB 3.0 Host Controller | \-05.0-[09]----00.0 Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller \-00.1 PMC-Sierra Inc. Device 4028 The IORT logic populaties two identical IDs into the fwspec->ids array via DMA aliasing in iort_pci_iommu_init() called by pci_for_each_dma_alias(). Though the SMMU driver had been able to handle this situation since commit 563b5cbe334e ("iommu/arm-smmu-v3: Cope with duplicated Stream IDs"), that got broken by the later commit cdf315f907d4 ("iommu/arm-smmu-v3: Maintain a SID->device structure"), which ended up with allocating separate streams with the same stuffing. On a kernel prior to v6.15-rc1, there has been an overlooked warning: pci 0008:07:00.0: vgaarb: setting as boot VGA device pci 0008:07:00.0: vgaarb: bridge control possible pci 0008:07:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none pcieport 0008:06:00.0: Adding to iommu group 14 ast 0008:07:00.0: stream 67328 already in tree <===== WARNING ast 0008:07:00.0: enabling device (0002 -> 0003) ast 0008:07:00.0: Using default configuration ast 0008:07:00.0: AST 2600 detected ast 0008:07:00.0: [drm] Using analog VGA ast 0008:07:00.0: [drm] dram MCLK=396 Mhz type=1 bus_width=16 [drm] Initialized ast 0.1.0 for 0008:07:00.0 on minor 0 ast 0008:07:00.0: [drm] fb0: astdrmfb frame buffer device With v6.15-rc, since the commit bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path"), the error returned with the warning is moved to the SMMU device probe flow: arm_smmu_probe_device+0x15c/0x4c0 __iommu_probe_device+0x150/0x4f8 probe_iommu_group+0x44/0x80 bus_for_each_dev+0x7c/0x100 bus_iommu_probe+0x48/0x1a8 iommu_device_register+0xb8/0x178 arm_smmu_device_probe+0x1350/0x1db0 which then fails the entire SMMU driver probe: pci 0008:06:00.0: Adding to iommu group 21 pci 0008:07:00.0: stream 67328 already in tree arm-smmu-v3 arm-smmu-v3.9.auto: Failed to register iommu arm-smmu-v3 arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22 Since SMMU driver had been already expecting a potential duplicated Stream ID in arm_smmu_install_ste_for_dev(), change the arm_smmu_insert_master() routine to ignore a duplicated ID from the fwspec->sids array as well. Note: this has been failing the iommu_device_probe() since 2021, although a recent iommu commit in v6.15-rc1 that moves iommu_device_probe() started to fail the SMMU driver probe. Since nobody has cared about DMA Alias support, leave that as it was but fix the fundamental iommu_device_probe() breakage. Fixes: cdf315f907d4 ("iommu/arm-smmu-v3: Maintain a SID->device structure") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20250415185620.504299-1-nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17iommu/arm-smmu-v3: Fix pgsize_bit for sva domainsBalbir Singh
UBSan caught a bug with IOMMU SVA domains, where the reported exponent value in __arm_smmu_tlb_inv_range() was >= 64. __arm_smmu_tlb_inv_range() uses the domain's pgsize_bitmap to compute the number of pages to invalidate and the invalidation range. Currently arm_smmu_sva_domain_alloc() does not setup the iommu domain's pgsize_bitmap. This leads to __ffs() on the value returning 64 and that leads to undefined behaviour w.r.t. shift operations Fix this by initializing the iommu_domain's pgsize_bitmap to PAGE_SIZE. Effectively the code needs to use the smallest page size for invalidation Cc: stable@vger.kernel.org Fixes: eb6c97647be2 ("iommu/arm-smmu-v3: Avoid constructing invalid range commands") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250412002354.3071449-1-balbirs@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2025-04-17iommu/arm-smmu-v3: Add missing S2FWB feature detectionAneesh Kumar K.V (Arm)
Commit 67e4fe398513 ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") introduced S2FWB usage but omitted the corresponding feature detection. As a result, vIOMMU allocation fails on FVP in arm_vsmmu_alloc(), due to the following check: if (!arm_smmu_master_canwbs(master) && !(smmu->features & ARM_SMMU_FEAT_S2FWB)) return ERR_PTR(-EOPNOTSUPP); This patch adds the missing detection logic to prevent allocation failure when S2FWB is supported. Fixes: 67e4fe398513 ("iommu/arm-smmu-v3: Use S2FWB for NESTED domains") Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250408033351.1012411-1-aneesh.kumar@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2025-04-11iommu/tegra241-cmdqv: Fix warnings due to dmam_free_coherent()Nicolin Chen
Two WARNINGs are observed when SMMU driver rolls back upon failure: arm-smmu-v3.9.auto: Failed to register iommu arm-smmu-v3.9.auto: probe with driver arm-smmu-v3 failed with error -22 ------------[ cut here ]------------ WARNING: CPU: 5 PID: 1 at kernel/dma/mapping.c:74 dmam_free_coherent+0xc0/0xd8 Call trace: dmam_free_coherent+0xc0/0xd8 (P) tegra241_vintf_free_lvcmdq+0x74/0x188 tegra241_cmdqv_remove_vintf+0x60/0x148 tegra241_cmdqv_remove+0x48/0xc8 arm_smmu_impl_remove+0x28/0x60 devm_action_release+0x1c/0x40 ------------[ cut here ]------------ 128 pages are still in use! WARNING: CPU: 16 PID: 1 at mm/page_alloc.c:6902 free_contig_range+0x18c/0x1c8 Call trace: free_contig_range+0x18c/0x1c8 (P) cma_release+0x154/0x2f0 dma_free_contiguous+0x38/0xa0 dma_direct_free+0x10c/0x248 dma_free_attrs+0x100/0x290 dmam_free_coherent+0x78/0xd8 tegra241_vintf_free_lvcmdq+0x74/0x160 tegra241_cmdqv_remove+0x98/0x198 arm_smmu_impl_remove+0x28/0x60 devm_action_release+0x1c/0x40 This is because the LVCMDQ queue memory are managed by devres, while that dmam_free_coherent() is called in the context of devm_action_release(). Jason pointed out that "arm_smmu_impl_probe() has mis-ordered the devres callbacks if ops->device_remove() is going to be manually freeing things that probe allocated": https://lore.kernel.org/linux-iommu/20250407174408.GB1722458@nvidia.com/ In fact, tegra241_cmdqv_init_structures() only allocates memory resources which means any failure that it generates would be similar to -ENOMEM, so there is no point in having that "falling back to standard SMMU" routine, as the standard SMMU would likely fail to allocate memory too. Remove the unwind part in tegra241_cmdqv_init_structures(), and return a proper error code to ask SMMU driver to call tegra241_cmdqv_remove() via impl_ops->device_remove(). Then, drop tegra241_vintf_free_lvcmdq() since devres will take care of that. Fixes: 483e0bd8883a ("iommu/tegra241-cmdqv: Do not allocate vcmdq until dma_set_mask_and_coherent") Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250407201908.172225-1-nicolinc@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu: remove unneeded semicolonPei Xiao
cocci warnings: drivers/iommu/dma-iommu.c:1788:2-3: Unneeded semicolon so remove unneeded semicolon to fix cocci warnings. Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/tencent_73EEE47E6ECCF538229C9B9E6A0272DA2B05@qq.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/mediatek: Fix NULL pointer deference in mtk_iommu_device_groupLouis-Alexis Eyraud
Currently, mtk_iommu calls during probe iommu_device_register before the hw_list from driver data is initialized. Since iommu probing issue fix, it leads to NULL pointer dereference in mtk_iommu_device_group when hw_list is accessed with list_first_entry (not null safe). So, change the call order to ensure iommu_device_register is called after the driver data are initialized. Fixes: 9e3a2a643653 ("iommu/mediatek: Adapt sharing and non-sharing pgtable case") Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Reviewed-by: Yong Wu <yong.wu@mediatek.com> Tested-by: Chen-Yu Tsai <wenst@chromium.org> # MT8183 Juniper, MT8186 Tentacruel Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Louis-Alexis Eyraud <louisalexis.eyraud@collabora.com> Link: https://lore.kernel.org/r/20250403-fix-mtk-iommu-error-v2-1-fe8b18f8b0a8@collabora.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/exynos: Fix suspend/resume with IDENTITY domainMarek Szyprowski
Commit bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") changed the sequence of probing the SYSMMU controller devices and calls to arm_iommu_attach_device(), what results in resuming SYSMMU controller earlier, when it is still set to IDENTITY mapping. Such change revealed the bug in IDENTITY handling in the exynos-iommu driver. When SYSMMU controller is set to IDENTITY mapping, data->domain is NULL, so adjust checks in suspend & resume callbacks to handle this case correctly. Fixes: b3d14960e629 ("iommu/exynos: Implement an IDENTITY domain") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20250401202731.2810474-1-m.szyprowski@samsung.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/ipmmu-vmsa: Register in a sensible orderRobin Murphy
IPMMU registers almost-initialised instances, but misses assigning the drvdata to make them fully functional, so initial calls back into ipmmu_probe_device() are likely to fail unnecessarily. Reorder this to work as it should, also pruning the long-out-of-date comment and adding the missing sysfs cleanup on error for good measure. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/53be6667544de65a15415b699e38a9a965692e45.1742481687.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu: Clear iommu-dma ops on cleanupRobin Murphy
If iommu_device_register() encounters an error, it can end up tearing down already-configured groups and default domains, however this currently still leaves devices hooked up to iommu-dma (and even historically the behaviour in this area was at best inconsistent across architectures/drivers...) Although in the case that an IOMMU is present whose driver has failed to probe, users cannot necessarily expect DMA to work anyway, it's still arguable that we should do our best to put things back as if the IOMMU driver was never there at all, and certainly the potential for crashing in iommu-dma itself is undesirable. Make sure we clean up the dev->dma_iommu flag along with everything else. Reported-by: Chen-Yu Tsai <wenst@chromium.org> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Closes: https://lore.kernel.org/all/CAGXv+5HJpTYmQ2h-GD7GjyeYT7bL9EBCvu0mz5LgpzJZtzfW0w@mail.gmail.com/ Tested-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/e788aa927f6d827dd4ea1ed608fada79f2bab030.1744284228.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/vt-d: Remove an unnecessary call set_dma_ops()Petr Tesarik
Do not touch per-device DMA ops when the driver has been converted to use the dma-iommu API. Fixes: c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Signed-off-by: Petr Tesarik <ptesarik@suse.com> Link: https://lore.kernel.org/r/20250403165605.278541-1-ptesarik@suse.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu/vt-d: Wire up irq_ack() to irq_move_irq() for posted MSIsSean Christopherson
Set the posted MSI irq_chip's irq_ack() hook to irq_move_irq() instead of a dummy/empty callback so that posted MSIs process pending changes to the IRQ's SMP affinity. Failure to honor a pending set-affinity results in userspace being unable to change the effective affinity of the IRQ, as IRQD_SETAFFINITY_PENDING is never cleared and so irq_set_affinity_locked() always defers moving the IRQ. The issue is most easily reproducible by setting /proc/irq/xx/smp_affinity multiple times in quick succession, as only the first update is likely to be handled in process context. Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs") Cc: Robert Lippert <rlippert@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Wentao Yang <wentaoyang@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20250321194249.1217961-1-seanjc@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-11iommu: Fix crash in report_iommu_fault()Fedor Pchelkin
The following crash is observed while handling an IOMMU fault with a recent kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: ffff8c708299f700 PGD 19ee01067 P4D 19ee01067 PUD 101c10063 PMD 80000001028001e3 Oops: Oops: 0011 [#1] SMP NOPTI CPU: 4 UID: 0 PID: 139 Comm: irq/25-AMD-Vi Not tainted 6.15.0-rc1+ #20 PREEMPT(lazy) Hardware name: LENOVO 21D0/LNVNB161216, BIOS J6CN50WW 09/27/2024 RIP: 0010:0xffff8c708299f700 Call Trace: <TASK> ? report_iommu_fault+0x78/0xd3 ? amd_iommu_report_page_fault+0x91/0x150 ? amd_iommu_int_thread+0x77/0x180 ? __pfx_irq_thread_fn+0x10/0x10 ? irq_thread_fn+0x23/0x60 ? irq_thread+0xf9/0x1e0 ? __pfx_irq_thread_dtor+0x10/0x10 ? __pfx_irq_thread+0x10/0x10 ? kthread+0xfc/0x240 ? __pfx_kthread+0x10/0x10 ? ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ? ret_from_fork_asm+0x1a/0x30 </TASK> report_iommu_fault() checks for an installed handler comparing the corresponding field to NULL. It can (and could before) be called for a domain with a different cookie type - IOMMU_COOKIE_DMA_IOVA, specifically. Cookie is represented as a union so we may end up with a garbage value treated there if this happens for a domain with another cookie type. Formerly there were two exclusive cookie types in the union. IOMMU_DOMAIN_SVA has a dedicated iommu_report_device_fault(). Call the fault handler only if the passed domain has a required cookie type. Found by Linux Verification Center (linuxtesting.org). Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250408213342.285955-1-pchelkin@ispras.ru Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-01Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Two significant new items: - Allow reporting IOMMU HW events to userspace when the events are clearly linked to a device. This is linked to the VIOMMU object and is intended to be used by a VMM to forward HW events to the virtual machine as part of emulating a vIOMMU. ARM SMMUv3 is the first driver to use this mechanism. Like the existing fault events the data is delivered through a simple FD returning event records on read(). - PASID support in VFIO. The "Process Address Space ID" is a PCI feature that allows the device to tag all PCI DMA operations with an ID. The IOMMU will then use the ID to select a unique translation for those DMAs. This is part of Intel's vIOMMU support as VT-D HW requires the hypervisor to manage each PASID entry. The support is generic so any VFIO user could attach any translation to a PASID, and the support should work on ARM SMMUv3 as well. AMD requires additional driver work. Some minor updates, along with fixes: - Prevent using nested parents with fault's, no driver support today - Put a single "cookie_type" value in the iommu_domain to indicate what owns the various opaque owner fields" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits) iommufd: Test attach before detaching pasid iommufd: Fix iommu_vevent_header tables markup iommu: Convert unreachable() to BUG() iommufd: Balance veventq->num_events inc/dec iommufd: Initialize the flags of vevent in iommufd_viommu_report_event() iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices ida: Add ida_find_first_range() iommufd/selftest: Add coverage for iommufd pasid attach/detach iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add a helper to get test device iommufd/selftest: Add set_dev_pasid in mock iommu iommufd: Allow allocating PASID-compatible domain iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support iommufd: Enforce PASID-compatible domain for RID iommufd: Support pasid attach/replace iommufd: Enforce PASID-compatible domain in PASID path iommufd/device: Add pasid_attach array to track per-PASID attach ...
2025-03-28iommufd: Test attach before detaching pasidYi Liu
Check if the pasid has been attached before going further in the detach path. This fixes a crash found by syzkaller. Add a selftest as well. Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 668 Comm: repro Not tainted 6.14.0-next-20250325-eb4bc4b07f66 #1 PREEMPT(voluntary) Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org4 RIP: 0010:iommufd_hw_pagetable_detach+0x8a/0x4d0 Code: 00 00 00 44 89 ee 48 89 c7 48 89 75 c8 48 89 45 c0 e8 ca 55 17 02 48 89 c2 49 89 c4 48 b8 00 00 00b RSP: 0018:ffff888021b17b78 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff888014b5a000 RCX: ffff888021b17a64 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88801dad07fc RBP: ffff888021b17bc8 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: ffff88801dad0e58 R12: 0000000000000000 R13: 0000000000000001 R14: ffff888021b17e18 R15: ffff8880132d3008 FS: 00007fca52013600(0000) GS:ffff8880e3684000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200006c0 CR3: 00000000112d0005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> iommufd_device_detach+0x2a/0x2e0 iommufd_test+0x2f99/0x5cd0 iommufd_fops_ioctl+0x38e/0x520 __x64_sys_ioctl+0x1ba/0x220 x64_sys_call+0x122e/0x2150 do_syscall_64+0x6d/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e Link: https://patch.msgid.link/r/20250328133448.22052-1-yi.l.liu@intel.com Reported-by: Lai Yi <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/linux-iommu/Z+X0tzxhiaupJT7b@ly-workstation Fixes: c0e301b2978d ("iommufd/device: Add pasid_attach array to track per-PASID attach") Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommu: Convert unreachable() to BUG()Josh Poimboeuf
Bare unreachable() should be avoided as it generates undefined behavior, e.g. falling through to the next function. Use BUG() instead so the error is defined. Fixes the following warnings: drivers/iommu/dma-iommu.o: warning: objtool: iommu_dma_sw_msi+0x92: can't find jump dest instruction at .text+0x54d5 vmlinux.o: warning: objtool: iommu_dma_get_msi_page() falls through to next function __iommu_dma_unmap() Link: https://patch.msgid.link/r/0c801ae017ec078cacd39f8f0898fc7780535f85.1743053325.git.jpoimboe@kernel.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/314f8809-cd59-479b-97d7-49356bf1c8d1@infradead.org Reported-by: Paul E. McKenney <paulmck@kernel.org> Closes: https://lore.kernel.org/5dd1f35e-8ece-43b7-ad6d-86d02d2718f6@paulmck-laptop Fixes: 6aa63a4ec947 ("iommu: Sort out domain user data") Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommufd: Balance veventq->num_events inc/decYi Liu
iommufd_veventq_fops_read() decrements veventq->num_events when a vevent is read out. However, the report path ony increments veventq->num_events for normal events. To be balanced, make the read path decrement num_events only for normal vevents. Fixes: e36ba5ab808e ("iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC") Link: https://patch.msgid.link/r/20250324120034.5940-3-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()Yi Liu
The vevent->header.flags is not initialized per allocation, hence the vevent read path may treat the vevent as lost_events_header wrongly. Use kzalloc() to alloc memory for new vevent. Fixes: e8e1ef9b77a7 ("iommufd/viommu: Add iommufd_viommu_report_event helper") Link: https://patch.msgid.link/r/20250324120034.5940-2-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-28iommufd: Extend IOMMU_GET_HW_INFO to report PASID capabilityYi Liu
PASID usage requires PASID support in both device and IOMMU. Since the iommu drivers always enable the PASID capability for the device if it is supported, this extends the IOMMU_GET_HW_INFO to report the PASID capability to userspace. Also, enhances the selftest accordingly. Link: https://patch.msgid.link/r/20250321180143.8468-5-yi.l.liu@intel.com Cc: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> #aarch64 platform Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-26Merge tag 'iommu-updates-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core iommufd dependencies from Jason: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API Core fixes for probe-issues from Robin Intel VT-d changes: - Checking for SVA support in domain allocation and attach paths - Move PCI ATS and PRI configuration into probe paths - Fix a pentential hang on reboot -f - Miscellaneous cleanups AMD-Vi changes: - Support for up to 2k IRQs per PCI device function - Set of smaller fixes ARM-SMMU changes: - SMMUv2 devicetree binding updates for Qualcomm implementations (QCS8300 GPU and MSM8937) - Clean up SMMUv2 runtime PM implementation to help with wider rework of pm_runtime_put_autosuspend() Rockchip driver changes: - Driver adjustments for recent DT probing changes S390 IOMMU changes: - Support for IOMMU passthrough Apple Dart changes: - Driver adjustments to meet ISP device requirements - Null-ptr deref fix - Disable subpage protection for DART 1" * tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommu/vt-d: Fix possible circular locking dependency iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled iommu: apple-dart: fix potential null pointer deref iommu/rockchip: Retire global dma_dev workaround iommu/rockchip: Register in a sensible order iommu/rockchip: Allocate per-device data sensibly iommu/mediatek-v1: Support COMPILE_TEST iommu/amd: Enable support for up to 2K interrupts per function iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro iommu/amd: Replace slab cache allocator with page allocator iommu/amd: Introduce generic function to set multibit feature value iommu: Don't warn prematurely about dodgy probes iommu/arm-smmu: Set rpm auto_suspend once during probe dt-bindings: arm-smmu: Document QCS8300 GPU SMMU iommu: Get DT/ACPI parsing into the proper probe path iommu: Keep dev->iommu state consistent iommu: Resolve ops in iommu_init_device() iommu: Handle race with default domain setup iommu: Unexport iommu_fwspec_free() ...
2025-03-25Merge tag 'hyperv-next-signed-20250324' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Add support for running as the root partition in Hyper-V (Microsoft Hypervisor) by exposing /dev/mshv (Nuno and various people) - Add support for CPU offlining in Hyper-V (Hamza Mahfooz) - Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael Kelley, Thorsten Blum) * tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: fix an indentation issue in mshyperv.h x86/hyperv: Add comments about hv_vpset and var size hypercall input args Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs hyperv: Add definitions for root partition driver to hv headers x86: hyperv: Add mshv_handler() irq handler and setup function Drivers: hv: Introduce per-cpu event ring tail Drivers: hv: Export some functions for use by root partition module acpi: numa: Export node_to_pxm() hyperv: Introduce hv_recommend_using_aeoi() arm64/hyperv: Add some missing functions to arm64 x86/mshyperv: Add support for extended Hyper-V features hyperv: Log hypercall status codes as strings x86/hyperv: Fix check of return value from snp_set_vmsa() x86/hyperv: Add VTL mode callback for restarting the system x86/hyperv: Add VTL mode emergency restart callback hyperv: Remove unused union and structs hyperv: Add CONFIG_MSHV_ROOT to gate root partition support hyperv: Change hv_root_partition into a function hyperv: Convert hypercall statuses to linux error codes drivers/hv: add CPU offlining support ...
2025-03-25iommufd/selftest: Add test ops to test pasid attach/detachYi Liu
This adds 4 test ops for pasid attach/replace/detach testing. There are ops to attach/detach pasid, and also op to check the attached hwpt of a pasid. Link: https://patch.msgid.link/r/20250321171940.7213-18-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/selftest: Add a helper to get test deviceYi Liu
There is need to get the selftest device (sobj->type == TYPE_IDEV) in multiple places, so have a helper to for it. Link: https://patch.msgid.link/r/20250321171940.7213-17-yi.l.liu@intel.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/selftest: Add set_dev_pasid in mock iommuYi Liu
The callback is needed to make pasid_attach/detach path complete for mock device. A nop is enough for set_dev_pasid. A MOCK_FLAGS_DEVICE_PASID is added to indicate a pasid-capable mock device for the pasid test cases. Other test cases will still create a non-pasid mock device. While the mock iommu always pretends to be pasid-capable. Link: https://patch.msgid.link/r/20250321171940.7213-16-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Allow allocating PASID-compatible domainYi Liu
The underlying infrastructure has supported the PASID attach and related enforcement per the requirement of the IOMMU_HWPT_ALLOC_PASID flag. This extends iommufd to support PASID compatible domain requested by userspace. Link: https://patch.msgid.link/r/20250321171940.7213-15-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID supportYi Liu
Intel iommu driver just treats it as a nop since Intel VT-d does not have special requirement on domains attached to either the PASID or RID of a PASID-capable device. Link: https://patch.msgid.link/r/20250321171940.7213-14-yi.l.liu@intel.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Enforce PASID-compatible domain for RIDYi Liu
Per the definition of IOMMU_HWPT_ALLOC_PASID, iommufd needs to enforce the RID to use PASID-compatible domain if PASID has been attached, and vice versa. The PASID path has already enforced it. This adds the enforcement in the RID path. This enforcement requires a lock across the RID and PASID attach path, the idev->igroup->lock is used as both the RID and the PASID path holds it. Link: https://patch.msgid.link/r/20250321171940.7213-13-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Support pasid attach/replaceYi Liu
This extends the below APIs to support PASID. Device drivers to manage pasid attach/replace/detach. int iommufd_device_attach(struct iommufd_device *idev, ioasid_t pasid, u32 *pt_id); int iommufd_device_replace(struct iommufd_device *idev, ioasid_t pasid, u32 *pt_id); void iommufd_device_detach(struct iommufd_device *idev, ioasid_t pasid); The pasid operations share underlying attach/replace/detach infrastructure with the device operations, but still have some different implications: - no reserved region per pasid otherwise SVA architecture is already broken (CPU address space doesn't count device reserved regions); - accordingly no sw_msi trick; Cache coherency enforcement is still applied to pasid operations since it is about memory accesses post page table walking (no matter the walk is per RID or per PASID). Link: https://patch.msgid.link/r/20250321171940.7213-12-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Enforce PASID-compatible domain in PASID pathYi Liu
AMD IOMMU requires attaching PASID-compatible domains to PASID-capable devices. This includes the domains attached to RID and PASIDs. Related discussions in link [1] and [2]. ARM also has such a requirement, Intel does not need it, but can live up with it. Hence, iommufd is going to enforce this requirement as it is not harmful to vendors that do not need it. Mark the PASID-compatible domains and enforce it in the PASID path. [1] https://lore.kernel.org/linux-iommu/20240709182303.GK14050@ziepe.ca/ [2] https://lore.kernel.org/linux-iommu/20240822124433.GD3468552@ziepe.ca/ Link: https://patch.msgid.link/r/20250321171940.7213-11-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Add pasid_attach array to track per-PASID attachYi Liu
PASIDs of PASID-capable device can be attached to hwpt separately, hence a pasid array to track per-PASID attachment is necessary. The index IOMMU_NO_PASID is used by the RID path. Hence drop the igroup->attach. Link: https://patch.msgid.link/r/20250321171940.7213-10-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Replace device_list with device_arrayYi Liu
igroup->attach->device_list is used to track attached device of a group in the RID path. Such tracking is also needed in the PASID path in order to share path with the RID path. While there is only one list_head in the iommufd_device. It cannot work if the device has been attached in both RID path and PASID path. To solve it, replacing the device_list with an xarray. The attached iommufd_device is stored in the entry indexed by the idev->obj.id. Link: https://patch.msgid.link/r/20250321171940.7213-9-yi.l.liu@intel.com Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Wrap igroup->hwpt and igroup->device_list into attach structYi Liu
The igroup->hwpt and igroup->device_list are used to track the hwpt attach of a group in the RID path. While the coming PASID path also needs such tracking. To be prepared, wrap igroup->hwpt and igroup->device_list into attach struct which is allocated per attaching the first device of the group and freed per detaching the last device of the group. Link: https://patch.msgid.link/r/20250321171940.7213-8-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Add helper to detect the first attach of a groupYi Liu
The existing code detects the first attach by checking the igroup->device_list. However, the igroup->hwpt can also be used to detect the first attach. In future modifications, it is better to check the igroup->hwpt instead of the device_list. To improve readbility and also prepare for further modifications on this part, this adds a helper for it. Link: https://patch.msgid.link/r/20250321171940.7213-7-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Replace idev->igroup with local variableYi Liu
With more use of the fields of igroup, use a local vairable instead of using the idev->igroup heavily. No functional change expected. Link: https://patch.msgid.link/r/20250321171940.7213-6-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd/device: Only add reserved_iova in non-pasid pathYi Liu
As the pasid is passed through the attach/replace/detach helpers, it is necessary to ensure only the non-pasid path adds reserved_iova. Link: https://patch.msgid.link/r/20250321171940.7213-5-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-25iommufd: Pass @pasid through the device attach/replace pathYi Liu
Most of the core logic before conducting the actual device attach/ replace operation can be shared with pasid attach/replace. So pass @pasid through the device attach/replace helpers to prepare adding pasid attach/replace. So far the @pasid should only be IOMMU_NO_PASID. No functional change. Link: https://patch.msgid.link/r/20250321171940.7213-4-yi.l.liu@intel.com Signed-off-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>