path: root/drivers/iommu/Kconfig
Age  Commit message  Author
12 days  Merge branch 'apple/dart' into next  (Will Deacon)
* apple/dart: iommu/apple-dart: Drop default ARCH_APPLE in Kconfig
2025-07-04  iommu/intel: Convert to msi_create_parent_irq_domain() helper  (Marc Zyngier)
Now that we have a concise helper to create an MSI parent domain, switch the Intel IOMMU remapping over to that. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nam Cao <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241204124549.607054-10-maz@kernel.org Link: https://lore.kernel.org/r/169c793c50be8493cfd9d11affb00e9ed6341c36.1750858125.git.namcao@linutronix.de Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-06-27  iommu/apple-dart: Drop default ARCH_APPLE in Kconfig  (Sven Peter)
When the first driver for Apple Silicon was upstreamed we accidentally included `default ARCH_APPLE` in its Kconfig which then spread to almost every subsequent driver. As soon as ARCH_APPLE is set to y this will pull in many drivers as built-ins which is not what we want. Thus, drop `default ARCH_APPLE` from Kconfig. Signed-off-by: Sven Peter <sven@kernel.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Janne Grunau <j@jannau.net> Link: https://lore.kernel.org/r/20250612-apple-kconfig-defconfig-v1-7-0e6f9cb512c1@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
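For illustration, a minimal sketch of the shape such a driver entry takes once the default is dropped (the prompt text and select list are approximations based on the commit message, not the actual diff):

    config APPLE_DART
    	tristate "Apple DART IOMMU Support"
    	depends on ARCH_APPLE || COMPILE_TEST
    	select IOMMU_API
    	# no "default ARCH_APPLE" line any more, so ARCH_APPLE=y merely makes
    	# the driver selectable instead of pulling it in as a built-in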
2025-05-16  iommu: remove duplicate selection of DMAR_TABLE  (Rolf Eike Beer)
This is already done in intel/Kconfig. Fixes: 70bad345e622 ("iommu: Fix compilation without CONFIG_IOMMU_INTEL") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/2232605.Mh6RI2rZIc@devpool92.emlix.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-04-17  iommu: Split out and tidy up Arm Kconfig  (Robin Murphy)
There are quite a lot of options for the Arm drivers, still all buried in the top-level Kconfig. For ease of use and consistency with all the other subdirectories, break these out into drivers/arm. For similar clarity and self-consistency, also tweak the ARM_SMMU sub-options to use "if" instead of "depends", to match ARM_SMMU_V3. Lastly also clean up the slightly messy description of ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT as highlighted by Geert - by now we really shouldn't need commentary on v4.x kernel behaviour anyway - and downgrade it to EXPERT as the first step in the 6-year-old threat to remove it entirely. Cc: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/a614ec86ba78c09cd16e348f633f6bb38793391f.1742480488.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
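Roughly, the "if" grouping described above replaces repeated per-option "depends on" lines like this (a sketch with an illustrative sub-option, not the actual resulting file):

    config ARM_SMMU
    	tristate "ARM Ltd. System MMU (SMMU) Support"

    if ARM_SMMU

    config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
    	bool "Disable bypass streams by default" if EXPERT
    	default y
    	# grouped under "if ARM_SMMU", so no per-option "depends on ARM_SMMU"

    endif # ARM_SMMU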
2025-03-13  iommu/mediatek-v1: Support COMPILE_TEST  (Robin Murphy)
Add COMPILE_TEST support to Mediatek v1 so I have less chance of breaking it again in future. This just needs the ARM dma-iommu API stubbing out for now. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/a8d7c0b4393b360ce556f5f15e229d1e2fe73c83.1741702556.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
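The dependency change amounts to something like the following (the symbol name is real, the exact dependency expression is an approximation):

    config MTK_IOMMU_V1
    	tristate "MediaTek IOMMU Version 1 (M4U gen1) Support"
    	depends on (ARCH_MEDIATEK && ARM) || COMPILE_TEST
    	select IOMMU_API
    	# COMPILE_TEST builds rely on the ARM dma-iommu API being stubbed out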
2025-02-21  irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need it  (Jason Gunthorpe)
Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll make more sense for irqchips that call prepare/compose to select it, and that will trigger all the additional code and data to be compiled into the kernel. If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the prepare/compose() will be NOP stubs. If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on the iommu side is compiled out. Link: https://patch.msgid.link/r/a2620f67002c5cdf974e89ca3bf905f5c0817be6.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-07  iommu/arm-smmu: Re-enable context caching in smmu reset operation  (Bibek Kumar Patro)
Default MMU-500 reset operation disables context caching in prefetch buffer. It is however expected for context banks using the ACTLR register to retain their prefetch value during reset and runtime suspend. Add config 'ARM_SMMU_MMU_500_CPRE_ERRATA' to gate this errata workaround in default MMU-500 reset operation which defaults to 'Y' and provide option to disable workaround for context caching in prefetch buffer as and when needed. Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Bibek Kumar Patro <quic_bibekkum@quicinc.com> Link: https://lore.kernel.org/r/20241212151402.159102-2-quic_bibekkum@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
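A sketch of the new option as described in the message (prompt and help wording approximated; the symbol name and "default y" come from the commit message):

    config ARM_SMMU_MMU_500_CPRE_ERRATA
    	bool "Enable MMU-500 errata workaround for context caching in prefetch buffer"
    	depends on ARM_SMMU
    	default y
    	help
    	  Say Y (the default) to apply the workaround so context banks using
    	  the ACTLR register retain their prefetch setting across the
    	  MMU-500 reset operation; say N only if the workaround causes
    	  problems on your platform.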
2024-11-15  Merge branches 'intel/vt-d', 'amd/amd-vi' and 'iommufd/arm-smmuv3-nested' into next  (Joerg Roedel)
2024-11-05  iommu/arm-smmu-v3: Support IOMMU_GET_HW_INFO via struct arm_smmu_hw_info  (Nicolin Chen)
For virtualization cases the IDR/IIDR/AIDR values of the actual SMMU instance need to be available to the VMM so it can construct an appropriate vSMMUv3 that reflects the correct HW capabilities. For userspace page tables these values are required to constrain the valid values within the CD table and the IOPTEs. The kernel does not sanitize these values. If building a VMM then userspace is required to only forward bits into a VM that it knows it can implement. Some bits will also require a VMM to detect if appropriate kernel support is available such as for ATS and BTM. Start a new file and kconfig for the advanced iommufd support. This lets it be compiled out for kernels that are not intended to support virtualization, and allows distros to leave it disabled until they are shipping a matching qemu too. Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Donald Dutile <ddutile@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v4-9e99b76f3518+3a8-smmuv3_nesting_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
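The new kconfig mentioned above plausibly looks like this (the symbol name and help wording are assumptions derived from the commit message):

    config ARM_SMMU_V3_IOMMUFD
    	bool "Enable IOMMUFD features for ARM SMMUv3 (EXPERIMENTAL)"
    	depends on IOMMUFD
    	help
    	  Support for IOMMUFD features intended for virtualization, such as
    	  reporting IDR/IIDR/AIDR values via IOMMU_GET_HW_INFO. Distros can
    	  leave this disabled until a matching VMM (e.g. qemu) is shipped.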
2024-10-29  iommu/riscv: Add RISC-V IOMMU platform device driver  (Tomasz Jeznach)
Introduce platform device driver for implementation of RISC-V IOMMU architected hardware. Hardware interface definition located in file iommu-bits.h is based on ratified RISC-V IOMMU Architecture Specification version 1.0.0. This patch implements platform device initialization, early check and configuration of the IOMMU interfaces and enables global pass-through address translation mode (iommu_mode == BARE), without registering hardware instance in the IOMMU subsystem. Link: https://github.com/riscv-non-isa/riscv-iommu Co-developed-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/2f2e4530c0ee4a81385efa90f1da932f5179f3fb.1729059707.git.tjeznach@rivosinc.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
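A rough sketch of the kind of entry this adds (the dependency and default lines are assumptions, not taken from the actual diff):

    config RISCV_IOMMU
    	bool "RISC-V IOMMU Support"
    	depends on RISCV && 64BIT
    	default y
    	select IOMMU_API
    	help
    	  Support for the platform device flavour of the RISC-V IOMMU
    	  architected hardware (ratified specification version 1.0.0).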
2024-09-19  Merge tag 'dma-mapping-6.12-2024-09-19' of …  (Linus Torvalds)
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - support DMA zones for arm64 systems where memory starts at > 4GB (Baruch Siach, Catalin Marinas) - support direct calls into dma-iommu and thus obsolete dma_map_ops for many common configurations (Leon Romanovsky) - add DMA-API tracing (Sean Anderson) - remove the not very useful return value from various dma_set_* APIs (Christoph Hellwig) - misc cleanups and minor optimizations (Chen Y, Yosry Ahmed, Christoph Hellwig) * tag 'dma-mapping-6.12-2024-09-19' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: reflow dma_supported dma-mapping: reliably inform about DMA support for IOMMU dma-mapping: add tracing for dma-mapping API calls dma-mapping: use IOMMU DMA calls for common alloc/free page calls dma-direct: optimize page freeing when it is not addressable dma-mapping: clearly mark DMA ops as an architecture feature vdpa_sim: don't select DMA_OPS arm64: mm: keep low RAM dma zone dma-mapping: don't return errors from dma_set_max_seg_size dma-mapping: don't return errors from dma_set_seg_boundary dma-mapping: don't return errors from dma_set_min_align_mask scsi: check that busses support the DMA API before setting dma parameters arm64: mm: fix DMA zone when dma-ranges is missing dma-mapping: direct calls for dma-iommu dma-mapping: call ->unmap_page and ->unmap_sg unconditionally arm64: support DMA zone above 4GB dma-mapping: replace zone_dma_bits by zone_dma_limit dma-mapping: use bit masking to check VM_DMA_COHERENT
2024-08-30  iommu/arm-smmu-v3: Add in-kernel support for NVIDIA Tegra241 (Grace) CMDQV  (Nate Watterson)
NVIDIA's Tegra241 Soc has a CMDQ-Virtualization (CMDQV) hardware, extending the standard ARM SMMU v3 IP to support multiple VCMDQs with virtualization capabilities. In terms of command queue, they are very like a standard SMMU CMDQ (or ECMDQs), but only support CS_NONE in the CS field of CMD_SYNC. Add a new tegra241-cmdqv driver, and insert its structure pointer into the existing arm_smmu_device, and then add related function calls in the SMMUv3 driver to interact with the CMDQV driver. In the CMDQV driver, add a minimal part for the in-kernel support: reserve VINTF0 for in-kernel use, and assign some of the VCMDQs to the VINTF0, and select one VCMDQ based on the current CPU ID to execute supported commands. This multi-queue design for in-kernel use gives some limited improvements: up to 20% reduction of invalidation time was measured by a multi-threaded DMA unmap benchmark, compared to a single queue. The other part of the CMDQV driver will be user-space support that gives a hypervisor running on the host OS to talk to the driver for virtualization use cases, allowing VMs to use VCMDQs without trappings, i.e. no VM Exits. This is designed based on IOMMUFD, and its RFC series is also under review. It will provide a guest OS a bigger improvement: 70% to 90% reductions of TLB invalidation time were measured by DMA unmap tests running in a guest, compared to nested SMMU CMDQ (with trappings). As the initial version, the CMDQV driver only supports ACPI configurations. Signed-off-by: Nate Watterson <nwatterson@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/dce50490b2c10b7254fb36aa73ed7ffd812b283a.1724970714.git.nicolinc@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-22  dma-mapping: direct calls for dma-iommu  (Leon Romanovsky)
Directly call into dma-iommu just like we have been doing for dma-direct for a while. This avoids the indirect call overhead for IOMMU ops and removes the need to have DMA ops entirely for many common configurations. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2024-07-03  iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping  (Kunkun Jiang)
If io-pgtable quirk flag indicates support for hardware update of dirty state, enable HA/HD bits in the SMMU CD and also set the DBM bit in the page descriptor. Now report the dirty page tracking capability of SMMUv3 and select IOMMUFD_DRIVER for ARM_SMMU_V3 if IOMMUFD is enabled. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Link: https://lore.kernel.org/r/20240703101604.2576-6-shameerali.kolothum.thodi@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
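The Kconfig side of this change is essentially one conditional select on the existing driver entry (surrounding lines abbreviated and approximated):

    config ARM_SMMU_V3
    	tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
    	depends on ARM64
    	select IOMMU_API
    	select IOMMUFD_DRIVER if IOMMUFD	# dirty-tracking support for iommufd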
2024-05-10  iommu/arm-smmu-v3: Make the kunit into a module  (Jason Gunthorpe)
It turns out kconfig has problems ensuring the SMMU module and the KUNIT module are consistently y/m to allow linking. It will permit KUNIT to be a module while SMMU is built in. Also, Fedora apparently enables kunit on production kernels. So, put the entire kunit in its own module using the VISIBLE_IF_KUNIT/EXPORT_SYMBOL_IF_KUNIT machinery. This keeps it out of vmlinux on Fedora and makes the kconfig work in the normal way. There is no cost if kunit is disabled. Fixes: 56e1a4cc2588 ("iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry") Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Link: https://lore.kernel.org/all/aeea8546-5bce-4c51-b506-5d2008e52fef@leemhuis.info Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/0-v1-24cba6c0f404+2ae-smmu_kunit_module_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
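With the test moved into its own module, the option becomes an ordinary tristate gated on KUNIT, roughly as follows (prompt text and the exact dependency line are approximated):

    config ARM_SMMU_V3_KUNIT_TEST
    	tristate "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS
    	depends on KUNIT
    	depends on ARM_SMMU_V3
    	default KUNIT_ALL_TESTS
    	# a tristate lets kconfig keep the test y/m-consistent with the SMMU
    	# driver, and keeps the test code out of vmlinux when built as =m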
2024-05-01  iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry  (Jason Gunthorpe)
Add tests for some of the more common STE update operations that we expect to see, as well as some artificial STE updates to test the edges of arm_smmu_write_entry. These also serve as a record of which common operation is expected to be hitless, and how many syncs they require. arm_smmu_write_entry implements a generic algorithm that updates an STE/CD to any other arbitrary STE/CD configuration. The update requires a sequence of write+sync operations with some invariants that must be held true after each sync. arm_smmu_write_entry lends itself well to unit-testing since the function's interaction with the STE/CD is already abstracted by input callbacks that we can hook to introspect into the sequence of operations. We can use these hooks to guarantee that invariants are held throughout the entire update operation. Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Michael Shavit <mshavit@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/9-v9-5040dc602008+177d7-smmuv3_newapi_p2_jgg@nvidia.com Signed-off-by: Will Deacon <will@kernel.org>
2024-05-01  iommu/arm-smmu-qcom: Don't build debug features as a kernel module  (Will Deacon)
The Qualcomm TBU debug support introduced by 414ecb030870 ("iommu/arm-smmu-qcom-debug: Add support for TBUs") provides its own driver initialisation function, which breaks the link when the core SMMU driver is built as a module: ld.lld: error: duplicate symbol: init_module >>> defined at arm-smmu.c >>> drivers/iommu/arm/arm-smmu/arm-smmu.o:(init_module) >>> defined at arm-smmu-qcom-debug.c >>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom-debug.o:(.init.text+0x4) Since we're late in the cycle, just make the debug features depend on a non-modular SMMU driver for now while the initialisation is reworked to hang off qcom_smmu_impl_init(). Signed-off-by: Will Deacon <will@kernel.org>
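The stop-gap described above boils down to a "=y" dependency, roughly like this (prompt wording approximated):

    config ARM_SMMU_QCOM_DEBUG
    	bool "ARM SMMU QCOM implementation defined debug support"
    	depends on ARM_SMMU_QCOM=y
    	# "=y" forbids a modular SMMU driver, avoiding the duplicate
    	# init_module symbol until the debug initialisation is reworked to
    	# hang off qcom_smmu_impl_init()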
2024-04-18  iommu/arm-smmu-qcom-debug: Add support for TBUs  (Georgi Djakov)
Operating the TBUs (Translation Buffer Units) from Linux on Qualcomm platforms can help with debugging context faults. To help with that, the TBUs can run ATOS (Address Translation Operations) to manually trigger address translation of IOVA to physical address in hardware and provide more details when a context fault happens. The driver will control the resources needed by the TBU to allow running the debug operations such as ATOS, check for outstanding transactions, do snapshot capture etc. Signed-off-by: Georgi Djakov <quic_c_gdjako@quicinc.com> Link: https://lore.kernel.org/r/20240417133731.2055383-3-quic_c_gdjako@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2024-03-13  Merge tag 'iommu-updates-v6.9' of …  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Constification of bus_type pointer - Preparations for user-space page-fault delivery - Use a named kmem_cache for IOVA magazines Intel VT-d changes from Lu Baolu: - Add RBTree to track iommu probed devices - Add Intel IOMMU debugfs document - Cleanup and refactoring ARM-SMMU Updates from Will Deacon: - Device-tree binding updates for a bunch of Qualcomm SoCs - SMMUv2: Support for Qualcomm X1E80100 MDSS - SMMUv3: Significant rework of the driver's STE manipulation and domain handling code. This is the initial part of a larger scale rework aiming to improve the driver's implementation of the IOMMU-API in preparation for hooking up IOMMUFD support. AMD-Vi Updates: - Refactor GCR3 table support for SVA - Cleanups Some smaller cleanups and fixes" * tag 'iommu-updates-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (88 commits) iommu: Fix compilation without CONFIG_IOMMU_INTEL iommu/amd: Fix sleeping in atomic context iommu/dma: Document min_align_mask assumption iommu/vt-d: Remove scalabe mode in domain_context_clear_one() iommu/vt-d: Remove scalable mode context entry setup from attach_dev iommu/vt-d: Setup scalable mode context entry in probe path iommu/vt-d: Fix NULL domain on device release iommu: Add static iommu_ops->release_domain iommu/vt-d: Improve ITE fault handling if target device isn't present iommu/vt-d: Don't issue ATS Invalidation request when device is disconnected PCI: Make pci_dev_is_disconnected() helper public for other drivers iommu/vt-d: Use device rbtree in iopf reporting path iommu/vt-d: Use rbtree to track iommu probed devices iommu/vt-d: Merge intel_svm_bind_mm() into its caller iommu/vt-d: Remove initialization for dynamically heap-allocated rcu_head iommu/vt-d: Remove treatment for revoking PASIDs with pending page faults iommu/vt-d: Add the document for Intel IOMMU debugfs iommu/vt-d: Use kcalloc() instead of kzalloc() iommu/vt-d: Remove INTEL_IOMMU_BROKEN_GFX_WA iommu: re-use local fwnode variable in iommu_ops_from_fwnode() ...
2024-03-08  Merge branches 'arm/mediatek', 'arm/renesas', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next  (Joerg Roedel)
2024-03-08  iommu: Fix compilation without CONFIG_IOMMU_INTEL  (Bert Karwatzki)
When the kernel is compiled with CONFIG_IRQ_REMAP=y but without CONFIG_IOMMU_INTEL, compilation fails since commit def054b01a8678 with an undefined reference to device_rbtree_find(). This patch makes sure that Intel-specific code is only compiled with CONFIG_IOMMU_INTEL=y. Signed-off-by: Bert Karwatzki <spasswolf@web.de> Fixes: 80a9b50c0b9e ("iommu/vt-d: Improve ITE fault handling if target device isn't present") Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20240307194419.15801-1-spasswolf@web.de Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-02-16  iommu: Separate SVA and IOPF  (Lu Baolu)
Add CONFIG_IOMMU_IOPF for page fault handling framework and select it from its real consumer. Move iopf function declaration from iommu-sva.h to iommu.h and remove iommu-sva.h as it's empty now. Consolidate all SVA related code into iommu-sva.c: - Move iommu_sva_domain_alloc() from iommu.c to iommu-sva.c. - Move sva iopf handling code from io-pgfault.c to iommu-sva.c. Consolidate iommu_report_device_fault() and iommu_page_response() into io-pgfault.c. Export iopf_free_group() and iopf_group_response() for iopf handlers implemented in modules. Some functions are renamed with more meaningful names. No other intentional functionality changes. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20240212012227.119381-11-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
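The split leaves IOPF as a hidden symbol that its real consumers select, along the lines of the following sketch (not the exact hunk):

    config IOMMU_IOPF
    	bool
    	# page fault handling framework; no prompt, selected by its users

    config IOMMU_SVA
    	bool
    	select IOMMU_IOPF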
2024-02-06  iommu/msm-iommu: don't limit the driver too much  (Dmitry Baryshkov)
In preparation of dropping most of ARCH_QCOM subtypes, stop limiting the driver just to those machines. Allow it to be built for any 32-bit Qualcomm platform (ARCH_QCOM). Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Link: https://lore.kernel.org/r/20231216162700.863456-2-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2023-12-12  iommu: Change kconfig around IOMMU_SVA  (Jason Gunthorpe)
Linus suggested that the kconfig here is confusing: https://lore.kernel.org/all/CAHk-=wgUiAtiszwseM1p2fCJ+sC4XWQ+YN4TanFhUgvUqjr9Xw@mail.gmail.com/ Let's break it into three kconfigs controlling distinct things: - CONFIG_IOMMU_MM_DATA controls if the mm_struct has the additional fields for the IOMMU. Currently only PASID, but later patches store a struct iommu_mm_data * - CONFIG_ARCH_HAS_CPU_PASID controls if the arch needs the scheduling bit for keeping track of the ENQCMD instruction. x86 will select this if IOMMU_SVA is enabled - IOMMU_SVA controls if the IOMMU core compiles in the SVA support code for iommu driver use and the IOMMU exported API This way ARM will not enable CONFIG_ARCH_HAS_CPU_PASID Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231027000525.1278806-2-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
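Sketched out, the three symbols relate roughly like this (the x86 hookup shown in the comment is an assumption based on the message):

    config IOMMU_MM_DATA
    	# mm_struct carries the extra IOMMU fields (currently the PASID)
    	bool

    config ARCH_HAS_CPU_PASID
    	# arch needs the scheduling bit for tracking ENQCMD
    	bool
    	select IOMMU_MM_DATA

    config IOMMU_SVA
    	# core SVA support code and the exported API for drivers
    	bool
    	select IOMMU_MM_DATA

    # arch/x86 would then "select ARCH_HAS_CPU_PASID if IOMMU_SVA", while
    # ARM never enables ARCH_HAS_CPU_PASID at all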
2023-11-09  Merge tag 'iommu-updates-v6.7' of …  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Make default-domains mandatory for all IOMMU drivers - Remove group refcounting - Add generic_single_device_group() helper and consolidate drivers - Cleanup map/unmap ops - Scaling improvements for the IOVA rcache depot - Convert dart & iommufd to the new domain_alloc_paging() ARM-SMMU: - Device-tree binding update: - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC - SMMUv2: - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs - SMMUv3: - Large refactoring of the context descriptor code to move the CD table into the master, paving the way for '->set_dev_pasid()' support on non-SVA domains - Minor cleanups to the SVA code Intel VT-d: - Enable debugfs to dump domain attached to a pasid - Remove an unnecessary inline function AMD IOMMU: - Initial patches for SVA support (not complete yet) S390 IOMMU: - DMA-API conversion and optimized IOTLB flushing And some smaller fixes and improvements" * tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits) iommu/dart: Remove the force_bypass variable iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging() iommu/dart: Convert to domain_alloc_paging() iommu/dart: Move the blocked domain support to a global static iommu/dart: Use static global identity domains iommufd: Convert to alloc_domain_paging() iommu/vt-d: Use ops->blocked_domain iommu/vt-d: Update the definition of the blocking domain iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain Revert "iommu/vt-d: Remove unused function" iommu/amd: Remove DMA_FQ type from domain allocation path iommu: change iommu_map_sgtable to return signed values iommu/virtio: Add __counted_by for struct viommu_request and use struct_size() iommu/vt-d: debugfs: Support dumping a specified page table iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid} iommu/vt-d: debugfs: Dump entry pointing to huge page iommu/vt-d: Remove unused function iommu/arm-smmu-v3-sva: Remove bond refcount iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle iommu/arm-smmu-v3: Rename cdcfg to cd_table ...
2023-11-01  Merge tag 'for-linus-iommufd' of …  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This brings three new iommufd capabilities: - Dirty tracking for DMA. AMD/ARM/Intel CPUs can now record if a DMA writes to a page in the IOPTEs within the IO page table. This can be used to generate a record of what memory is being dirtied by DMA activities during a VM migration process. A VMM like qemu will combine the IOMMU dirty bits with the CPU's dirty log to determine what memory to transfer. VFIO already has a DMA dirty tracking framework that requires PCI devices to implement tracking HW internally. The iommufd version provides an alternative that the VMM can select, if available. The two are designed to have very similar APIs. - Userspace controlled attributes for hardware page tables (HWPT/iommu_domain). There are currently a few generic attributes for HWPTs (support dirty tracking, and parent of a nest). This is an entry point for the userspace iommu driver to control the HW in detail. - Nested translation support for HWPTs. This is a 2D translation scheme similar to the CPU where a DMA goes through a first stage to determine an intermediate address which is then translated trough a second stage to a physical address. Like for CPU translation the first stage table would exist in VM controlled memory and the second stage is in the kernel and matches the VM's guest to physical map. As every IOMMU has a unique set of parameter to describe the S1 IO page table and its associated parameters the userspace IOMMU driver has to marshal the information into the correct format. This is 1/3 of the feature, it allows creating the nested translation and binding it to VFIO devices, however the API to support IOTLB and ATC invalidation of the stage 1 io page table, and forwarding of IO faults are still in progress. The series includes AMD and Intel support for dirty tracking. Intel support for nested translation. Along the way are a number of internal items: - New iommu core items: ops->domain_alloc_user(), ops->set_dirty_tracking, ops->read_and_clear_dirty(), IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user - UAF fix in iopt_area_split() - Spelling fixes and some test suite improvement" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (52 commits) iommufd: Organize the mock domain alloc functions closer to Joerg's tree iommufd/selftest: Fix page-size check in iommufd_test_dirty() iommufd: Add iopt_area_alloc() iommufd: Fix missing update of domains_itree after splitting iopt_area iommu/vt-d: Disallow read-only mappings to nest parent domain iommu/vt-d: Add nested domain allocation iommu/vt-d: Set the nested domain to a device iommu/vt-d: Make domain attach helpers to be extern iommu/vt-d: Add helper to setup pasid nested translation iommu/vt-d: Add helper for nested domain allocation iommu/vt-d: Extend dmar_domain to support nested domain iommufd: Add data structure for Intel VT-d stage-1 domain allocation iommu/vt-d: Enhance capability check for nested parent domain allocation iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs iommufd/selftest: Add nested domain allocation for mock domain iommu: Add iommu_copy_struct_from_user helper iommufd: Add a nested HW pagetable object iommu: Pass in parent domain with user_data to domain_alloc_user op iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable ...
2023-10-24  vfio: Move iova_bitmap into iommufd  (Joao Martins)
Both VFIO and IOMMUFD will need iova bitmap for storing dirties and walking the user bitmaps, so move to the common dependency into IOMMUFD. In doing so, create the symbol IOMMUFD_DRIVER which designates the builtin code that will be used by drivers when selected. Today this means MLX5_VFIO_PCI and PDS_VFIO_PCI. IOMMU drivers will do the same (in future patches) when supporting dirty tracking and select IOMMUFD_DRIVER accordingly. Given that the symbol maybe be disabled, add header definitions in iova_bitmap.h for when IOMMUFD_DRIVER=n Link: https://lore.kernel.org/r/20231024135109.73787-3-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
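The new symbol is a hidden bool that users select, approximately:

    config IOMMUFD_DRIVER
    	bool
    	default n

    # selected by its users, e.g. the VFIO variant drivers named above would
    # carry a "select IOMMUFD_DRIVER" line; when IOMMUFD_DRIVER=n,
    # iova_bitmap.h provides stub definitions instead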
2023-10-02  s390/pci: Use dma-iommu layer  (Niklas Schnelle)
While s390 already has a standard IOMMU driver and previous changes have added I/O TLB flushing operations, this driver is currently only used for user-space PCI access such as vfio-pci. For the DMA API s390 instead utilizes its own implementation in arch/s390/pci/pci_dma.c which drives the same hardware and shares some code but requires a complex and fragile hand-over between DMA API and IOMMU API use of a device and despite code sharing still leads to significant duplication and maintenance effort. Let's utilize the common code DMA API implementation from drivers/iommu/dma-iommu.c instead, allowing us to get rid of arch/s390/pci/pci_dma.c. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-3-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25  iommu/tegra-gart: Remove tegra-gart  (Jason Gunthorpe)
Thierry says this is not used anymore, and doesn't think it makes sense as an iommu driver. The HW it supports is about 10 years old now and newer HW uses different IOMMU drivers. As this is the only driver with a GART approach, and it doesn't really meet the driver expectations from the IOMMU core, let's just remove it so we don't have to think about how to make it fit in. It has a number of identified problems: - The assignment of iommu_groups doesn't match the HW behavior - It claims to have an UNMANAGED domain but it is really an IDENTITY domain with a translation aperture. This is inconsistent with the core expectation for security sensitive operations - It doesn't implement a SW page table under struct iommu_domain so * It can't accept a map until the domain is attached * It forgets about all maps after the domain is detached * It doesn't clear the HW of maps once the domain is detached (made worse by having the wrong groups) Cc: Thierry Reding <treding@nvidia.com> Cc: Dmitry Osipenko <digetx@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v8-81230027b2fa+9d-iommu_all_defdom_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-11  arch: Remove Itanium (IA-64) architecture  (Ard Biesheuvel)
The Itanium architecture is obsolete, and an informal survey [0] reveals that any residual use of Itanium hardware in production is mostly HP-UX or OpenVMS based. The use of Linux on Itanium appears to be limited to enthusiasts that occasionally boot a fresh Linux kernel to see whether things are still working as intended, and perhaps to churn out some distro packages that are rarely used in practice. None of the original companies behind Itanium still produce or support any hardware or software for the architecture, and it is listed as 'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers that contributed on behalf of those companies (nor anyone else, for that matter) have been willing to support or maintain the architecture upstream or even be responsible for applying the odd fix. The Intel firmware team removed all IA-64 support from the Tianocore/EDK2 reference implementation of EFI in 2018. (Itanium is the original architecture for which EFI was developed, and the way Linux supports it deviates significantly from other architectures.) Some distros, such as Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have dropped support years ago. While the argument is being made [1] that there is a 'for the common good' angle to being able to build and run existing projects such as the Grid Community Toolkit [2] on Itanium for interoperability testing, the fact remains that none of those projects are known to be deployed on Linux/ia64, and very few people actually have access to such a system in the first place. Even if there were ways imaginable in which Linux/ia64 could be put to good use today, what matters is whether anyone is actually doing that, and this does not appear to be the case. There are no emulators widely available, and so boot testing Itanium is generally infeasible for ordinary contributors. GCC still supports IA-64 but its compile farm [3] no longer has any IA-64 machines. GLIBC would like to get rid of IA-64 [4] too because it would permit some overdue code cleanups. In summary, the benefits to the ecosystem of having IA-64 be part of it are mostly theoretical, whereas the maintenance overhead of keeping it supported is real. So let's rip off the band aid, and remove the IA-64 arch code entirely. This follows the timeline proposed by the Debian/ia64 maintainer [5], which removes support in a controlled manner, leaving IA-64 in a known good state in the most recent LTS release. Other projects will follow once the kernel support is removed. [0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/ [1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/ [2] https://gridcf.org/gct-docs/latest/index.html [3] https://cfarm.tetaneutral.net/machines/list/ [4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/ [5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/ Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-06-28  Merge tag 'mm-stable-2023-06-24-19-15' of …  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-19  iommu/dma: force bouncing if the size is not cacheline-aligned  (Catalin Marinas)
Similarly to the direct DMA, bounce small allocations as they may have originated from a kmalloc() cache not safe for DMA. Unlike the direct DMA, iommu_dma_map_sg() cannot call iommu_dma_map_sg_swiotlb() for all non-coherent devices as this would break some cases where the iova is expected to be contiguous (dmabuf). Instead, scan the scatterlist for any small sizes and only go the swiotlb path if any element of the list needs bouncing (note that iommu_dma_map_page() would still only bounce those buffers which are not DMA-aligned). To avoid scanning the scatterlist on the 'sync' operations, introduce an SG_DMA_SWIOTLB flag set by iommu_dma_map_sg_swiotlb(). The dev_use_swiotlb() function together with the newly added dev_use_sg_swiotlb() now check for both untrusted devices and unaligned kmalloc() buffers (suggested by Robin Murphy). Link: https://lkml.kernel.org/r/20230612153201.554742-16-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Mike Snitzer <snitzer@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-22  iommu: Make IPMMU_VMSA dependencies more strict  (Randy Dunlap)
On riscv64, linux-next-20230330 (and for several days earlier), there is a kconfig warning: WARNING: unmet direct dependencies detected for IOMMU_IO_PGTABLE_LPAE Depends on [n]: IOMMU_SUPPORT [=y] && (ARM || ARM64 || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n] Selected by [y]: - IPMMU_VMSA [=y] && IOMMU_SUPPORT [=y] && (ARCH_RENESAS [=y] || COMPILE_TEST [=n]) && !GENERIC_ATOMIC64 [=n] and build errors: riscv64-linux-ld: drivers/iommu/io-pgtable-arm.o: in function `.L140': io-pgtable-arm.c:(.init.text+0x1e8): undefined reference to `alloc_io_pgtable_ops' riscv64-linux-ld: drivers/iommu/io-pgtable-arm.o: in function `.L168': io-pgtable-arm.c:(.init.text+0xab0): undefined reference to `free_io_pgtable_ops' riscv64-linux-ld: drivers/iommu/ipmmu-vmsa.o: in function `.L140': ipmmu-vmsa.c:(.text+0xbc4): undefined reference to `free_io_pgtable_ops' riscv64-linux-ld: drivers/iommu/ipmmu-vmsa.o: in function `.L0 ': ipmmu-vmsa.c:(.text+0x145e): undefined reference to `alloc_io_pgtable_ops' Add ARM || ARM64 || COMPILE_TEST dependencies to IPMMU_VMSA to prevent these issues, i.e., so that ARCH_RENESAS on RISC-V is not allowed. This makes the ARCH dependencies become: depends on (ARCH_RENESAS && (ARM || ARM64)) || COMPILE_TEST but that can be a bit hard to read. Fixes: 8292493c22c8 ("riscv: Kconfig.socs: Add ARCH_RENESAS kconfig option") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: iommu@lists.linux.dev Cc: Conor Dooley <conor@kernel.org> Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230330165817.21920-1-rdunlap@infradead.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
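Put together, the dependencies quoted in the message look roughly like this in Kconfig form (the select list is an approximation):

    config IPMMU_VMSA
    	bool "Renesas VMSA-compatible IPMMU"
    	depends on ARCH_RENESAS || COMPILE_TEST
    	depends on ARM || ARM64 || COMPILE_TEST
    	depends on !GENERIC_ATOMIC64	# for IOMMU_IO_PGTABLE_LPAE
    	select IOMMU_API
    	select IOMMU_IO_PGTABLE_LPAE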
2023-05-17  s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMU  (Jason Gunthorpe)
These don't do anything anymore, the only user of the symbol was VFIO_CCW/AP which already "depends on VFIO" and VFIO itself selects IOMMU_API. When this was added VFIO was wrongly doing "depends on IOMMU_API" which required some contortions like this to ensure IOMMU_API was turned on. Reviewed-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v2-eb322ce2e547+188f-rm_iommu_ccw_jgg@nvidia.com Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-03-31  iommu: Remove ioasid infrastructure  (Jason Gunthorpe)
This has no use anymore, delete it all. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230322200803.869130-8-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-22  iommu: Spelling s/cpmxchg64/cmpxchg64/  (Geert Uytterhoeven)
Fix misspellings of "cmpxchg64" Fixes: d286a58bc8f4d5cf ("iommu: Tidy up io-pgtable dependencies") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/eab156858147249d44463662eb9192202c39ab9f.1678295792.git.geert+renesas@glider.be Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-13  iommu: Tidy up io-pgtable dependencies  (Robin Murphy)
Some io-pgtable implementations, and thus their users too, carry a slightly odd dependency to get around the GENERIC_ATOMIC64 version of cmpxchg64() often failing to compile. Since this is a functional dependency, it's a bit misleading and untidy to tie it explicitly to COMPILE_TEST while assuming that it's also implied by the other platform/architecture options. Make things clearer by separating these functional dependencies into distinct statements from those controlling visibility, and since they do look a bit non-obvious to the uninitiated, also commenting them for good measure. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/51d8c78e2ecc6696ac5907526580209ea6da167f.1673553587.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
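The pattern this describes, with visibility and functional dependencies on separate commented lines, is roughly (entry abbreviated):

    config IOMMU_IO_PGTABLE_LPAE
    	bool "ARMv7/v8 Long Descriptor Format"
    	select IOMMU_IO_PGTABLE
    	depends on ARM || ARM64 || COMPILE_TEST
    	depends on !GENERIC_ATOMIC64	# for cmpxchg64()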
2022-12-14  Merge tag 'for-linus-iommufd' of …  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd implementation from Jason Gunthorpe: "iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA" For more background, see the extended explanations in Jason's pull request: https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits) iommufd: Change the order of MSI setup iommufd: Improve a few unclear bits of code iommufd: Fix comment typos vfio: Move vfio group specific code into group.c vfio: Refactor dma APIs for emulated devices vfio: Wrap vfio group module init/clean code into helpers vfio: Refactor vfio_device open and close vfio: Make vfio_device_open() truly device specific vfio: Swap order of vfio_device_container_register() and open_device() vfio: Set device->group in helper function vfio: Create wrappers for group register/unregister vfio: Move the sanity check of the group to vfio_create_group() vfio: Simplify vfio_create_group() iommufd: Allow iommufd to supply /dev/vfio/vfio vfio: Make vfio_container optionally compiled vfio: Move container related MODULE_ALIAS statements into container.c vfio-iommufd: Support iommufd for emulated VFIO devices vfio-iommufd: Support iommufd for physical VFIO devices vfio-iommufd: Allow iommufd to be used in place of a container fd vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() ...
2022-12-12  Merge tag 'irq-core-2022-12-10' of …  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. 
This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... 
|--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
2022-11-30  iommufd: File descriptor, context, kconfig and makefiles  (Jason Gunthorpe)
This is the basic infrastructure of a new miscdevice to hold the iommufd IOCTL API. It provides: - A miscdevice to create file descriptors to run the IOCTL interface over - A table based ioctl dispatch and centralized extendable pre-validation step - An xarray mapping userspace ID's to kernel objects. The design has multiple inter-related objects held within in a single IOMMUFD fd - A simple usage count to build a graph of object relations and protect against hostile userspace racing ioctls The only IOCTL provided in this patch is the generic 'destroy any object by handle' operation. Link: https://lore.kernel.org/r/6-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Tested-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Lixiao Yang <lixiao.yang@intel.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-28  iommu/hyper-v: Allow hyperv irq remapping without x2apic  (Nuno Das Neves)
If x2apic is not available, hyperv-iommu skips remapping irqs. This breaks root partition which always needs irqs remapped. Fix this by allowing irq remapping regardless of x2apic, and change hyperv_enable_irq_remapping() to return IRQ_REMAP_XAPIC_MODE in case x2apic is missing. Tested with root and non-root hyperv partitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1668715899-8971-1-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-17  genirq: Get rid of GENERIC_MSI_IRQ_DOMAIN  (Thomas Gleixner)
Adjust to reality and remove another layer of pointless Kconfig indirection. CONFIG_GENERIC_MSI_IRQ is good enough to serve all purposes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122014.524842979@linutronix.de
2022-09-26  Merge branches 'apple/dart', 'arm/mediatek', 'arm/omap', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next  (Joerg Roedel)
2022-09-26  iommu/io-pgtable: Move Apple DART support to its own file  (Janne Grunau)
The pte format used by the DARTs found in the Apple M1 (t8103) is not fully compatible with io-pgtable-arm. The 24 MSB are used for subpage protection (mapping only parts of page) and conflict with the address mask. In addition bit 1 is not available for tagging entries but disables subpage protection. Subpage protection could be useful to support a CPU granule of 4k with the fixed IOMMU page size of 16k. The DARTs found on Apple M1 Pro/Max/Ultra use another different pte format which is even less compatible. To support an output address size of 42 bit the address is shifted down by 4. Subpage protection is mandatory and bit 1 signifies uncached mappings used by the display controller. It would be advantageous to share code for all known Apple DART variants to support common features. The page table allocator for DARTs is less complex since it uses a two levels of translation table without support for huge pages. Signed-off-by: Janne Grunau <j@jannau.net> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Sven Peter <sven@svenpeter.dev> Acked-by: Hector Martin <marcan@marcan.st> Link: https://lore.kernel.org/r/20220916094152.87137-3-j@jannau.net [ joro: Fix compile warning in __dart_alloc_pages()] Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-07  iommu/dma: Clean up Kconfig  (Robin Murphy)
Although iommu-dma is a per-architecture choice, that is currently implemented in a rather haphazard way. Selecting from the arch Kconfig was the original logical approach, but is complicated by having to manage dependencies; conversely, selecting from drivers ends up hiding the architecture dependency *too* well. Instead, let's just have it enable itself automatically when IOMMU API support is enabled for the relevant architectures. It can't get much clearer than that. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/2e33c8bc2b1bb478157b7964bfed976cb7466139.1660668998.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
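In other words, the symbol switches from being selected by arch or driver Kconfigs to enabling itself, roughly as below (the architecture list and select lines are approximations for the time of the commit; the entry sits under the top-level "if IOMMU_SUPPORT" block, which supplies the IOMMU API dependency):

    config IOMMU_DMA
    	def_bool ARM64 || X86
    	select IOMMU_IOVA
    	select NEED_SG_DMA_LENGTH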
2022-08-06  Revert "iommu/dma: Add config for PCI SAC address trick"  (Linus Torvalds)
This reverts commit 4bf7fda4dce22214c70c49960b1b6438e6260b67. It turns out that it was hopelessly naive to think that this would work, considering that we've always done this. The first machine I actually tested this on broke at bootup, getting to Reached target cryptsetup.target - Local Encrypted Volumes. and then hanging. It's unclear what actually fails, since there's a lot else going on around that time (eg amdgpu probing also happens around that same time, but it could be some other random init thing that didn't complete earlier and just caused the boot to hang at that point). The expectations that we should default to some unsafe and untested mode seems entirely unfounded, and the belief that this wouldn't affect modern systems is clearly entirely false. The machine in question is about two years old, so it's not exactly shiny, but it's also not some dusty old museum piece PDP-11 in a closet. Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: John Garry <john.garry@huawei.com> Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-29  Merge branches 'arm/exynos', 'arm/mediatek', 'arm/msm', 'arm/smmu', 'virtio', 'x86/vt-d', 'x86/amd' and 'core' into next  (Joerg Roedel)
2022-07-08  iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts  (Sai Prakash Ranjan)
TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20220708094230.4349-1-quic_saipraka@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2022-06-22  iommu/dma: Add config for PCI SAC address trick  (Robin Murphy)
For devices stuck behind a conventional PCI bus, saving extra cycles at 33MHz is probably fairly significant. However since native PCI Express is now the norm for high-performance devices, the optimisation to always prefer 32-bit addresses for the sake of avoiding DAC is starting to look rather anachronistic. Technically 32-bit addresses do have shorter TLPs on PCIe, but unless the device is saturating its link bandwidth with small transfers it seems unlikely that the difference is appreciable. What definitely is appreciable, however, is that the IOVA allocator doesn't behave all that well once the 32-bit space starts getting full. As DMA working sets get bigger, this optimisation increasingly backfires and adds considerable overhead to the dma_map path for use-cases like high-bandwidth networking. We've increasingly bandaged the allocator in attempts to mitigate this, but it remains fundamentally at odds with other valid requirements to try as hard as possible to satisfy a request within the given limit; what we really need is to just avoid this odd notion of a speculative allocation when it isn't beneficial anyway. Unfortunately that's where things get awkward... Having been present on x86 for 15 years or so now, it turns out there are systems which fail to properly define the upper limit of usable IOVA space for certain devices and this trick was the only thing letting them work OK. I had a similar ulterior motive for a couple of early arm64 systems when originally adding it to iommu-dma, but those really should be fixed with proper firmware bindings by now. Let's be brave and default it to off in the hope that CI systems and developers will find and fix those bugs, but expect that desktop-focused distro configs are likely to want to turn it back on for maximum compatibility. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/3f06994f9f370f9d35b2630ab75171ecd2065621.1654782107.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
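The option this adds (and that the 2022-08-06 revert above removes again) is a simple bool, something like the following sketch (prompt and help wording are approximated, not the actual text; the commit only says it defaults to off):

    config IOMMU_DMA_PCI_SAC
    	bool "Prefer 32-bit (single address cycle) PCI IOVAs by default"
    	depends on IOMMU_DMA && PCI
    	# intentionally off by default
    	help
    	  Keep the legacy behaviour of allocating IOVAs below 4GB for PCI
    	  devices even when they advertise a 64-bit DMA mask. This can help
    	  systems whose firmware mis-describes the usable IOVA range, at the
    	  cost of extra pressure on the 32-bit IOVA space; desktop-focused
    	  distros may want to turn it on for maximum compatibility.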