summaryrefslogtreecommitdiff
path: root/drivers/iommu/intel/iommu.h
AgeCommit message (Collapse)Author
2024-02-16iommu: Merge iommu_fault_event and iopf_faultLu Baolu
The iommu_fault_event and iopf_fault data structures store the same information about an iopf fault. They are also used in the same way. Merge these two data structures into a single one to make the code more concise and easier to maintain. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Yan Zhao <yan.y.zhao@intel.com> Tested-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20240212012227.119381-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2024-01-03Merge branches 'apple/dart', 'arm/rockchip', 'arm/smmu', 'virtio', ↵Joerg Roedel
'x86/vt-d', 'x86/amd' and 'core' into next
2023-12-19iommu/vt-d: Move inline helpers to header filesLu Baolu
Move inline helpers to header files so that other files can use them without duplicating the code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231116015048.29675-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19iommu/vt-d: Remove unused vcmd interfacesLu Baolu
Commit 99b5726b4423 ("iommu: Remove ioasid infrastructure") has removed ioasid allocation interfaces from the iommu subsystem. As a result, these vcmd interfaces have become obsolete. Remove them to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231116015048.29675-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-19iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directlyLu Baolu
The device_to_iommu() helper was originally designed to look up the DMAR ACPI table to retrieve the iommu device and the request ID for a given device. However, it was also being used in other places where there was no need to lookup the ACPI table at all. Retrieve the iommu device directly from the per-device iommu private data in functions called after device is probed. Rename the original device_to_iommu() function to a more meaningful name, device_lookup_iommu(), to avoid mis-using it. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20231116015048.29675-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-27iommu/vt-d: Support enforce_cache_coherency only for empty domainsLu Baolu
The enforce_cache_coherency callback ensures DMA cache coherency for devices attached to the domain. Intel IOMMU supports enforced DMA cache coherency when the Snoop Control bit in the IOMMU's extended capability register is set. Supporting it differs between legacy and scalable modes. In legacy mode, it's supported page-level by setting the SNP field in second-stage page-table entries. In scalable mode, it's supported in PASID-table granularity by setting the PGSNP field in PASID-table entries. In legacy mode, mappings before attaching to a device have SNP fields cleared, while mappings after the callback have them set. This means partial DMAs are cache coherent while others are not. One possible fix is replaying mappings and flipping SNP bits when attaching a domain to a device. But this seems to be over-engineered, given that all real use cases just attach an empty domain to a device. To meet practical needs while reducing mode differences, only support enforce_cache_coherency on a domain without mappings if SNP field is used. Fixes: fc0051cb9590 ("iommu/vt-d: Check domain force_snooping against attached devices") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20231114011036.70142-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-11-09Merge tag 'iommu-updates-v6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Make default-domains mandatory for all IOMMU drivers - Remove group refcounting - Add generic_single_device_group() helper and consolidate drivers - Cleanup map/unmap ops - Scaling improvements for the IOVA rcache depot - Convert dart & iommufd to the new domain_alloc_paging() ARM-SMMU: - Device-tree binding update: - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC - SMMUv2: - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs - SMMUv3: - Large refactoring of the context descriptor code to move the CD table into the master, paving the way for '->set_dev_pasid()' support on non-SVA domains - Minor cleanups to the SVA code Intel VT-d: - Enable debugfs to dump domain attached to a pasid - Remove an unnecessary inline function AMD IOMMU: - Initial patches for SVA support (not complete yet) S390 IOMMU: - DMA-API conversion and optimized IOTLB flushing And some smaller fixes and improvements" * tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits) iommu/dart: Remove the force_bypass variable iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging() iommu/dart: Convert to domain_alloc_paging() iommu/dart: Move the blocked domain support to a global static iommu/dart: Use static global identity domains iommufd: Convert to alloc_domain_paging() iommu/vt-d: Use ops->blocked_domain iommu/vt-d: Update the definition of the blocking domain iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain Revert "iommu/vt-d: Remove unused function" iommu/amd: Remove DMA_FQ type from domain allocation path iommu: change iommu_map_sgtable to return signed values iommu/virtio: Add __counted_by for struct viommu_request and use struct_size() iommu/vt-d: debugfs: Support dumping a specified page table iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid} iommu/vt-d: debugfs: Dump entry pointing to huge page iommu/vt-d: Remove unused function iommu/arm-smmu-v3-sva: Remove bond refcount iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle iommu/arm-smmu-v3: Rename cdcfg to cd_table ...
2023-11-01Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "This brings three new iommufd capabilities: - Dirty tracking for DMA. AMD/ARM/Intel CPUs can now record if a DMA writes to a page in the IOPTEs within the IO page table. This can be used to generate a record of what memory is being dirtied by DMA activities during a VM migration process. A VMM like qemu will combine the IOMMU dirty bits with the CPU's dirty log to determine what memory to transfer. VFIO already has a DMA dirty tracking framework that requires PCI devices to implement tracking HW internally. The iommufd version provides an alternative that the VMM can select, if available. The two are designed to have very similar APIs. - Userspace controlled attributes for hardware page tables (HWPT/iommu_domain). There are currently a few generic attributes for HWPTs (support dirty tracking, and parent of a nest). This is an entry point for the userspace iommu driver to control the HW in detail. - Nested translation support for HWPTs. This is a 2D translation scheme similar to the CPU where a DMA goes through a first stage to determine an intermediate address which is then translated trough a second stage to a physical address. Like for CPU translation the first stage table would exist in VM controlled memory and the second stage is in the kernel and matches the VM's guest to physical map. As every IOMMU has a unique set of parameter to describe the S1 IO page table and its associated parameters the userspace IOMMU driver has to marshal the information into the correct format. This is 1/3 of the feature, it allows creating the nested translation and binding it to VFIO devices, however the API to support IOTLB and ATC invalidation of the stage 1 io page table, and forwarding of IO faults are still in progress. The series includes AMD and Intel support for dirty tracking. Intel support for nested translation. Along the way are a number of internal items: - New iommu core items: ops->domain_alloc_user(), ops->set_dirty_tracking, ops->read_and_clear_dirty(), IOMMU_DOMAIN_NESTED, and iommu_copy_struct_from_user - UAF fix in iopt_area_split() - Spelling fixes and some test suite improvement" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (52 commits) iommufd: Organize the mock domain alloc functions closer to Joerg's tree iommufd/selftest: Fix page-size check in iommufd_test_dirty() iommufd: Add iopt_area_alloc() iommufd: Fix missing update of domains_itree after splitting iopt_area iommu/vt-d: Disallow read-only mappings to nest parent domain iommu/vt-d: Add nested domain allocation iommu/vt-d: Set the nested domain to a device iommu/vt-d: Make domain attach helpers to be extern iommu/vt-d: Add helper to setup pasid nested translation iommu/vt-d: Add helper for nested domain allocation iommu/vt-d: Extend dmar_domain to support nested domain iommufd: Add data structure for Intel VT-d stage-1 domain allocation iommu/vt-d: Enhance capability check for nested parent domain allocation iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs iommufd/selftest: Add nested domain allocation for mock domain iommu: Add iommu_copy_struct_from_user helper iommufd: Add a nested HW pagetable object iommu: Pass in parent domain with user_data to domain_alloc_user op iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable ...
2023-10-27Merge branches 'iommu/fixes', 'arm/tegra', 'arm/smmu', 'virtio', 'x86/vt-d', ↵Joerg Roedel
'x86/amd', 'core' and 's390' into next
2023-10-26iommu/vt-d: Add nested domain allocationLu Baolu
This adds the support for IOMMU_HWPT_DATA_VTD_S1 type. And 'nested_parent' is added to mark the nested parent domain to sanitize the input parent domain. Link: https://lore.kernel.org/r/20231026044216.64964-8-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Make domain attach helpers to be externYi Liu
This makes the helpers visible to nested.c. Link: https://lore.kernel.org/r/20231026044216.64964-6-yi.l.liu@intel.com Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Add helper for nested domain allocationLu Baolu
This adds helper for accepting user parameters and allocate a nested domain. Link: https://lore.kernel.org/r/20231026044216.64964-4-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Extend dmar_domain to support nested domainLu Baolu
The nested domain fields are exclusive to those that used for a DMA remapping domain. Use union to avoid memory waste. Link: https://lore.kernel.org/r/20231026044216.64964-3-yi.l.liu@intel.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-26iommu/vt-d: Enhance capability check for nested parent domain allocationYi Liu
This adds the scalable mode check before allocating the nested parent domain as checking nested capability is not enough. User may turn off scalable mode which also means no nested support even if the hardware supports it. Fixes: c97d1b20d383 ("iommu/vt-d: Add domain_alloc_user op") Link: https://lore.kernel.org/r/20231024150011.44642-1-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-24iommu/vt-d: Access/Dirty bit support for SS domainsJoao Martins
IOMMU advertises Access/Dirty bits for second-stage page table if the extended capability DMAR register reports it (ECAP, mnemonic ECAP.SSADS). The first stage table is compatible with CPU page table thus A/D bits are implicitly supported. Relevant Intel IOMMU SDM ref for first stage table "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second stage table "3.7.2 Accessed and Dirty Flags". First stage page table is enabled by default so it's allowed to set dirty tracking and no control bits needed, it just returns 0. To use SSADS, set bit 9 (SSADE) in the scalable-mode PASID table entry and flush the IOTLB via pasid_flush_caches() following the manual. Relevant SDM refs: "3.7.2 Accessed and Dirty Flags" "6.5.3.3 Guidance to Software for Invalidations, Table 23. Guidance to Software for Invalidations" PTE dirty bit is located in bit 9 and it's cached in the IOTLB so flush IOTLB to make sure IOMMU attempts to set the dirty bit again. Note that iommu_dirty_bitmap_record() will add the IOVA to iotlb_gather and thus the caller of the iommu op will flush the IOTLB. Relevant manuals over the hardware translation is chapter 6 with some special mention to: "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations" "6.2.4 IOTLB" Select IOMMUFD_DRIVER only if IOMMUFD is enabled, given that IOMMU dirty tracking requires IOMMUFD. Link: https://lore.kernel.org/r/20231024135109.73787-13-joao.m.martins@oracle.com Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-10-16iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid}Jingqi Liu
Add a debugfs directory per pair of {device, pasid} if the mappings of its page table are created and destroyed by the iommu_map/unmap() interfaces. i.e. /sys/kernel/debug/iommu/intel/<device source id>/<pasid>. Create a debugfs file in the directory for users to dump the page table corresponding to {device, pasid}. e.g. /sys/kernel/debug/iommu/intel/0000:00:02.0/1/domain_translation_struct. For the default domain without pasid, it creates a debugfs file in the debugfs device directory for users to dump its page table. e.g. /sys/kernel/debug/iommu/intel/0000:00:02.0/domain_translation_struct. When setting a domain to a PASID of device, create a debugfs file in the pasid debugfs directory for users to dump the page table of the specified pasid. Remove the debugfs device directory of the device when releasing a device. e.g. /sys/kernel/debug/iommu/intel/0000:00:01.0 Signed-off-by: Jingqi Liu <Jingqi.liu@intel.com> Link: https://lore.kernel.org/r/20231013135811.73953-3-Jingqi.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-25iommu/vt-d: Avoid memory allocation in iommu_suspend()Zhang Rui
The iommu_suspend() syscore suspend callback is invoked with IRQ disabled. Allocating memory with the GFP_KERNEL flag may re-enable IRQs during the suspend callback, which can cause intermittent suspend/hibernation problems with the following kernel traces: Calling iommu_suspend+0x0/0x1d0 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 15 at kernel/time/timekeeping.c:868 ktime_get+0x9b/0xb0 ... CPU: 0 PID: 15 Comm: rcu_preempt Tainted: G U E 6.3-intel #r1 RIP: 0010:ktime_get+0x9b/0xb0 ... Call Trace: <IRQ> tick_sched_timer+0x22/0x90 ? __pfx_tick_sched_timer+0x10/0x10 __hrtimer_run_queues+0x111/0x2b0 hrtimer_interrupt+0xfa/0x230 __sysvec_apic_timer_interrupt+0x63/0x140 sysvec_apic_timer_interrupt+0x7b/0xa0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1f/0x30 ... ------------[ cut here ]------------ Interrupts enabled after iommu_suspend+0x0/0x1d0 WARNING: CPU: 0 PID: 27420 at drivers/base/syscore.c:68 syscore_suspend+0x147/0x270 CPU: 0 PID: 27420 Comm: rtcwake Tainted: G U W E 6.3-intel #r1 RIP: 0010:syscore_suspend+0x147/0x270 ... Call Trace: <TASK> hibernation_snapshot+0x25b/0x670 hibernate+0xcd/0x390 state_store+0xcf/0xe0 kobj_attr_store+0x13/0x30 sysfs_kf_write+0x3f/0x50 kernfs_fop_write_iter+0x128/0x200 vfs_write+0x1fd/0x3c0 ksys_write+0x6f/0xf0 __x64_sys_write+0x1d/0x30 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc Given that only 4 words memory is needed, avoid the memory allocation in iommu_suspend(). CC: stable@kernel.org Fixes: 33e07157105e ("iommu/vt-d: Avoid GFP_ATOMIC where it is not needed") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Ooi, Chin Hao <chin.hao.ooi@intel.com> Link: https://lore.kernel.org/r/20230921093956.234692-1-rui.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230925120417.55977-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09iommu/vt-d: Add set_dev_pasid callback for dma domainLu Baolu
This allows the upper layers to set a domain to a PASID of a device if the PASID feature is supported by the IOMMU hardware. The typical use cases are, for example, kernel DMA with PASID and hardware assisted mediated device drivers. The attaching device and pasid information is tracked in a per-domain list and is used for IOTLB and devTLB invalidation. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-8-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-08-09iommu/vt-d: Make prq draining code genericLu Baolu
Currently draining page requests and responses for a pasid is part of SVA implementation. This is because the driver only supports attaching an SVA domain to a device pasid. As we are about to support attaching other types of domains to a device pasid, the prq draining code becomes generic. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230802212427.1497170-6-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-04-14Merge branches 'iommu/fixes', 'arm/allwinner', 'arm/exynos', 'arm/mediatek', ↵Joerg Roedel
'arm/omap', 'arm/renesas', 'arm/rockchip', 'arm/smmu', 'ppc/pamu', 'unisoc', 'x86/vt-d', 'x86/amd', 'core' and 'platform-remove_new' into next
2023-04-13iommu/vt-d: Remove extern from function prototypesLu Baolu
The kernel coding style does not require 'extern' in function prototypes in .h files, so remove them from drivers/iommu/intel/iommu.h as they are not needed. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230331045452.500265-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplugKan Liang
A warning can be triggered when hotplug CPU 0. $ echo 0 > /sys/devices/system/cpu/cpu0/online ------------[ cut here ]------------ Voluntary context switch within RCU read-side critical section! WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4f4/0x580 RIP: 0010:rcu_note_context_switch+0x4f4/0x580 Call Trace: <TASK> ? perf_event_update_userpage+0x104/0x150 __schedule+0x8d/0x960 ? perf_event_set_state.part.82+0x11/0x50 schedule+0x44/0xb0 schedule_timeout+0x226/0x310 ? __perf_event_disable+0x64/0x1a0 ? _raw_spin_unlock+0x14/0x30 wait_for_completion+0x94/0x130 __wait_rcu_gp+0x108/0x130 synchronize_rcu+0x67/0x70 ? invoke_rcu_core+0xb0/0xb0 ? __bpf_trace_rcu_stall_warning+0x10/0x10 perf_pmu_migrate_context+0x121/0x370 iommu_pmu_cpu_offline+0x6a/0xa0 ? iommu_pmu_del+0x1e0/0x1e0 cpuhp_invoke_callback+0x129/0x510 cpuhp_thread_fun+0x94/0x150 smpboot_thread_fn+0x183/0x220 ? sort_range+0x20/0x20 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> ---[ end trace 0000000000000000 ]--- The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(), when migrating a PMU to a new CPU. However, the current for_each_iommu() is within RCU read-side critical section. Two methods were considered to fix the issue. - Use the dmar_global_lock to replace the RCU read lock when going through the drhd list. But it triggers a lockdep warning. - Use the cpuhp_setup_state_multi() to set up a dedicated state for each IOMMU PMU. The lock can be avoided. The latter method is implemented in this patch. Since each IOMMU PMU has a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track the state. The state can be dynamically allocated now. Remove the CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE. Fixes: 46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230328182028.1366416-1-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230329134721.469447-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31iommu: Remove ioasid infrastructureJason Gunthorpe
This has no use anymore, delete it all. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230322200803.869130-8-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-31iommu/vt-d: Remove virtual command interfaceJacob Pan
Virtual command interface was introduced to allow using host PASIDs inside VMs. It is unused and abandoned due to architectural change. With this patch, we can safely remove this feature and the related helpers. Link: https://lore.kernel.org/r/20230210230206.3160144-2-jacob.jun.pan@linux.intel.com Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230322200803.869130-2-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-18Merge branches 'apple/dart', 'arm/exynos', 'arm/renesas', 'arm/smmu', ↵Joerg Roedel
'x86/vt-d', 'x86/amd' and 'core' into next
2023-02-03iommu/vt-d: Add IOMMU perfmon overflow handler supportKan Liang
While enabled to count events and an event occurrence causes the counter value to increment and roll over to or past zero, this is termed a counter overflow. The overflow can trigger an interrupt. The IOMMU perfmon needs to handle the case properly. New HW IRQs are allocated for each IOMMU device for perfmon. The IRQ IDs are after the SVM range. In the overflow handler, the counter is not frozen. It's very unlikely that the same counter overflows again during the period. But it's possible that other counters overflow at the same time. Read the overflow register at the end of the handler and check whether there are more. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-7-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Add IOMMU perfmon supportKan Liang
Implement the IOMMU performance monitor capability, which supports the collection of information about key events occurring during operation of the remapping hardware, to aid performance tuning and debug. The IOMMU perfmon support is implemented as part of the IOMMU driver and interfaces with the Linux perf subsystem. The IOMMU PMU has the following unique features compared with the other PMUs. - Support counting. Not support sampling. - Does not support per-thread counting. The scope is system-wide. - Support per-counter capability register. The event constraints can be enumerated. - The available event and event group can also be enumerated. - Extra Enhanced Commands are introduced to control the counters. Add a new variable, struct iommu_pmu *pmu, to in the struct intel_iommu to track the PMU related information. Add iommu_pmu_register() and iommu_pmu_unregister() to register and unregister a IOMMU PMU. The register function setup the IOMMU PMU ops and invoke the standard perf_pmu_register() interface to register a PMU in the perf subsystem. This patch only exposes the functions. The following patch will enable them in the IOMMU driver. The IOMMU PMUs can be found under /sys/bus/event_source/devices/dmar* The available filters and event format can be found at the format folder $ ls /sys/bus/event_source/devices/dmar1/format/ event event_group filter_ats filter_ats_en filter_page_table filter_page_table_en The supported events can be found at the events folder $ ls /sys/bus/event_source/devices/dmar1/events/ ats_blocked fs_nonleaf_hit int_cache_hit_posted iommu_mem_blocked iotlb_hit pasid_cache_lookup ss_nonleaf_hit ctxt_cache_hit fs_nonleaf_lookup int_cache_lookup iommu_mrds iotlb_lookup pg_req_posted ss_nonleaf_lookup ctxt_cache_lookup int_cache_hit_nonposted iommu_clocks iommu_requests pasid_cache_hit pw_occupancy The command below illustrates filter usage with a simple example. $ perf stat -e dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/ -a sleep 1 Performance counter stats for 'system wide': 368,947 dmar1/iommu_requests,filter_ats_en=0x1,filter_ats=0x1/ 1.002592074 seconds time elapsed Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-5-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Support Enhanced Command InterfaceKan Liang
The Enhanced Command Register is to submit command and operand of enhanced commands to DMA Remapping hardware. It can supports up to 256 enhanced commands. There is a HW register to indicate the availability of all 256 enhanced commands. Each bit stands for each command. But there isn't an existing interface to read/write all 256 bits. Introduce the u64 ecmdcap[4] to store the existence of each enhanced command. Read 4 times to get all of them in map_iommu(). Add a helper to facilitate an enhanced command launch. Make sure hardware complete the command. Also add a helper to facilitate the check of PMU essentials. These helpers will be used later. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-4-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Retrieve IOMMU perfmon capability informationKan Liang
The performance monitoring infrastructure, perfmon, is to support collection of information about key events occurring during operation of the remapping hardware, to aid performance tuning and debug. Each remapping hardware unit has capability registers that indicate support for performance monitoring features and enumerate the capabilities. Add alloc_iommu_pmu() to retrieve IOMMU perfmon capability information for each iommu unit. The information is stored in the iommu->pmu data structure. Capability registers are read-only, so it's safe to prefetch and store them in the pmu structure. This could avoid unnecessary VMEXIT when this code is running in the virtualization environment. Add free_iommu_pmu() to free the saved capability information when freeing the iommu unit. Add a kernel config option for the IOMMU perfmon feature. Unless a user explicitly uses the perf tool to monitor the IOMMU perfmon event, there isn't any impact for the existing IOMMU. Enable it by default. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230128200428.1459118-3-kan.liang@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Remove sva from intel_svm_devLu Baolu
After commit be51b1d6bbff ("iommu/sva: Refactoring iommu_sva_bind/unbind_device()"), the iommu driver doesn't need to return an iommu_sva pointer anymore. This removes the sva field from intel_svm_dev and cleanups the code accordingly. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230109014955.147068-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Remove users from intel_svm_devLu Baolu
It was used as a reference counter of an existing bond between device and user application memory address. Commit be51b1d6bbff ("iommu/sva: Refactoring iommu_sva_bind/unbind_device()") has added this in iommu core. Remove it to avoid duplicate code. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230109014955.147068-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Remove unused fields in svm structuresLu Baolu
They aren't used anywhere. Remove them to avoid dead code. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230109014955.147068-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-02-03iommu/vt-d: Remove include/linux/intel-svm.hLu Baolu
There's no need to have a public header for Intel SVA implementation. The device driver should interact with Intel SVA implementation via the IOMMU generic APIs. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20230109014955.147068-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-01-25iommu/intel: Add a gfp parameter to alloc_pgtable_page()Jason Gunthorpe
This is eventually called by iommufd through intel_iommu_map_pages() and it should not be forced to atomic. Push the GFP_ATOMIC to all callers. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v3-76b587fe28df+6e3-iommu_map_gfp_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-12-19Merge tag 'iommu-updates-v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core code: - map/unmap_pages() cleanup - SVA and IOPF refactoring - Clean up and document return codes from device/domain attachment AMD driver: - Rework and extend parsing code for ivrs_ioapic, ivrs_hpet and ivrs_acpihid command line options - Some smaller cleanups Intel driver: - Blocking domain support - Cleanups S390 driver: - Fixes and improvements for attach and aperture handling PAMU driver: - Resource leak fix and cleanup Rockchip driver: - Page table permission bit fix Mediatek driver: - Improve safety from invalid dts input - Smaller fixes and improvements Exynos driver: - Fix driver initialization sequence Sun50i driver: - Remove IOMMU_DOMAIN_IDENTITY as it has not been working forever - Various other fixes" * tag 'iommu-updates-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (74 commits) iommu/mediatek: Fix forever loop in error handling iommu/mediatek: Fix crash on isr after kexec() iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY iommu/amd: Fix typo in macro parameter name iommu/mediatek: Remove unused "mapping" member from mtk_iommu_data iommu/mediatek: Improve safety for mediatek,smi property in larb nodes iommu/mediatek: Validate number of phandles associated with "mediatek,larbs" iommu/mediatek: Add error path for loop of mm_dts_parse iommu/mediatek: Use component_match_add iommu/mediatek: Add platform_device_put for recovering the device refcnt iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe() iommu/vt-d: Use real field for indication of first level iommu/vt-d: Remove unnecessary domain_context_mapped() iommu/vt-d: Rename domain_add_dev_info() iommu/vt-d: Rename iommu_disable_dev_iotlb() iommu/vt-d: Add blocking domain support iommu/vt-d: Add device_block_translation() helper iommu/vt-d: Allocate pasid table in device probe path iommu/amd: Check return value of mmu_notifier_register() iommu/amd: Fix pci device refcount leak in ppr_notifier() ...
2022-12-14Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd implementation from Jason Gunthorpe: "iommufd is the user API to control the IOMMU subsystem as it relates to managing IO page tables that point at user space memory. It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO container) which is the VFIO specific interface for a similar idea. We see a broad need for extended features, some being highly IOMMU device specific: - Binding iommu_domain's to PASID/SSID - Userspace IO page tables, for ARM, x86 and S390 - Kernel bypassed invalidation of user page tables - Re-use of the KVM page table in the IOMMU - Dirty page tracking in the IOMMU - Runtime Increase/Decrease of IOPTE size - PRI support with faults resolved in userspace Many of these HW features exist to support VM use cases - for instance the combination of PASID, PRI and Userspace IO Page Tables allows an implementation of DMA Shared Virtual Addressing (vSVA) within a guest. Dirty tracking enables VM live migration with SRIOV devices and PASID support allow creating "scalable IOV" devices, among other things. As these features are fundamental to a VM platform they need to be uniformly exposed to all the driver families that do DMA into VMs, which is currently VFIO and VDPA" For more background, see the extended explanations in Jason's pull request: https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/ * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits) iommufd: Change the order of MSI setup iommufd: Improve a few unclear bits of code iommufd: Fix comment typos vfio: Move vfio group specific code into group.c vfio: Refactor dma APIs for emulated devices vfio: Wrap vfio group module init/clean code into helpers vfio: Refactor vfio_device open and close vfio: Make vfio_device_open() truly device specific vfio: Swap order of vfio_device_container_register() and open_device() vfio: Set device->group in helper function vfio: Create wrappers for group register/unregister vfio: Move the sanity check of the group to vfio_create_group() vfio: Simplify vfio_create_group() iommufd: Allow iommufd to supply /dev/vfio/vfio vfio: Make vfio_container optionally compiled vfio: Move container related MODULE_ALIAS statements into container.c vfio-iommufd: Support iommufd for emulated VFIO devices vfio-iommufd: Support iommufd for physical VFIO devices vfio-iommufd: Allow iommufd to be used in place of a container fd vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent() ...
2022-12-12Merge tag 'irq-core-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
2022-12-12Merge branches 'arm/allwinner', 'arm/exynos', 'arm/mediatek', ↵Joerg Roedel
'arm/rockchip', 'arm/smmu', 'ppc/pamu', 's390', 'x86/vt-d', 'x86/amd' and 'core' into next
2022-12-05iommu/vt-d: Switch to MSI parent domainsThomas Gleixner
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de
2022-12-02iommu/vt-d: Add a fix for devices need extra dtlb flushJacob Pan
QAT devices on Intel Sapphire Rapids and Emerald Rapids have a defect in address translation service (ATS). These devices may inadvertently issue ATS invalidation completion before posted writes initiated with translated address that utilized translations matching the invalidation address range, violating the invalidation completion ordering. This patch adds an extra device TLB invalidation for the affected devices, it is needed to ensure no more posted writes with translated address following the invalidation completion. Therefore, the ordering is preserved and data-corruption is prevented. Device TLBs are invalidated under the following six conditions: 1. Device driver does DMA API unmap IOVA 2. Device driver unbind a PASID from a process, sva_unbind_device() 3. PASID is torn down, after PASID cache is flushed. e.g. process exit_mmap() due to crash 4. Under SVA usage, called by mmu_notifier.invalidate_range() where VM has to free pages that were unmapped 5. userspace driver unmaps a DMA buffer 6. Cache invalidation in vSVA usage (upcoming) For #1 and #2, device drivers are responsible for stopping DMA traffic before unmap/unbind. For #3, iommu driver gets mmu_notifier to invalidate TLB the same way as normal user unmap which will do an extra invalidation. The dTLB invalidation after PASID cache flush does not need an extra invalidation. Therefore, we only need to deal with #4 and #5 in this patch. #1 is also covered by this patch due to common code path with #5. Tested-by: Yuzhang Luo <yuzhang.luo@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20221130062449.1360063-1-jacob.jun.pan@linux.intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-22iommu/vt-d: Use real field for indication of first levelLu Baolu
The dmar_domain uses bit field members to indicate the behaviors. Add a bit field for using first level and remove the flags member to avoid duplication. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20221118132451.114406-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03iommu: Remove SVA related callbacks from iommu opsLu Baolu
These ops'es have been deprecated. There's no need for them anymore. Remove them to avoid dead code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-11-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03iommu/vt-d: Add SVA domain supportLu Baolu
Add support for SVA domain allocation and provide an SVA-specific iommu_domain_ops. This implementation is based on the existing SVA code. Possible cleanup and refactoring are left for incremental changes later. The VT-d driver will also need to support setting a DMA domain to a PASID of device. Current SVA implementation uses different data structures to track the domain and device PASID relationship. That's the reason why we need to check the domain type in remove_dev_pasid callback. Eventually we'll consolidate the data structures and remove the need of domain type check. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03iommu: Remove SVM_FLAG_SUPERVISOR_MODE supportLu Baolu
The current kernel DMA with PASID support is based on the SVA with a flag SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address space to a PASID of the device. The device driver programs the device with kernel virtual address (KVA) for DMA access. There have been security and functional issues with this approach: - The lack of IOTLB synchronization upon kernel page table updates. (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.) - Other than slight more protection, using kernel virtual address (KVA) has little advantage over physical address. There are also no use cases yet where DMA engines need kernel virtual addresses for in-kernel DMA. This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface. The device drivers are suggested to handle kernel DMA with PASID through the kernel DMA APIs. The drvdata parameter in iommu_sva_bind_device() and all callbacks is not needed anymore. Cleanup them as well. Link: https://lore.kernel.org/linux-iommu/20210511194726.GP1002214@nvidia.com/ Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-03iommu: Add max_pasids field in struct iommu_deviceLu Baolu
Use this field to keep the number of supported PASIDs that an IOMMU hardware is able to support. This is a generic attribute of an IOMMU and lifting it into the per-IOMMU device structure makes it possible to allocate a PASID for device without calls into the IOMMU drivers. Any iommu driver that supports PASID related features should set this field before enabling them on the devices. In the Intel IOMMU driver, intel_iommu_sm is moved to CONFIG_INTEL_IOMMU enclave so that the pasid_supported() helper could be used in dmar.c without compilation errors. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org> Tested-by: Tony Zhu <tony.zhu@intel.com> Link: https://lore.kernel.org/r/20221031005917.45690-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26iommu/vt-d: Avoid unnecessary global DMA cache invalidationLu Baolu
Some VT-d hardware implementations invalidate all DMA remapping hardware translation caches as part of SRTP flow. The VT-d spec adds a ESRTPS (Enhanced Set Root Table Pointer Support, section 11.4.2 in VT-d spec) capability bit to indicate this. With this bit set, software has no need to issue the global invalidation request. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220919062523.3438951-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26iommu/vt-d: Avoid unnecessary global IRTE cache invalidationLu Baolu
Some VT-d hardware implementations invalidate all interrupt remapping hardware translation caches as part of SIRTP flow. The VT-d spec adds a ESIRTPS (Enhanced Set Interrupt Remap Table Pointer Support, section 11.4.2 in VT-d spec) capability bit to indicate this. The spec also states in 11.4.4 that hardware also performs global invalidation on all interrupt remapping caches as part of Interrupt Remapping Disable operation if ESIRTPS capability bit is set. This checks the ESIRTPS capability bit and skip software global cache invalidation if it's set. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220921065741.3572495-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26iommu/vt-d: Rename cap_5lp_support to cap_fl5lp_supportYi Liu
This renaming better describes it is for first level page table (a.k.a first stage page table since VT-d spec 3.4). Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220916071326.2223901-1-yi.l.liu@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26iommu/vt-d: Decouple PASID & PRI enabling from SVALu Baolu
Previously the PCI PASID and PRI capabilities are enabled in the path of iommu device probe only if INTEL_IOMMU_SVM is configured and the device supports ATS. As we've already decoupled the I/O page fault handler from SVA, we could also decouple PASID and PRI enabling from it to make room for growth of new features like kernel DMA with PASID, SIOV and nested translation. At the same time, the iommu_enable_dev_iotlb() helper is also called in iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) path. It's unnecessary and duplicate. This cleanups this helper to make the code neat. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220915085814.2261409-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-09-26iommu/vt-d: Remove unnecessary SVA data accesses in page fault pathLu Baolu
The existing I/O page fault handling code accesses the per-PASID SVA data structures. This is unnecessary and makes the fault handling code only suitable for SVA scenarios. This removes the SVA data accesses from the I/O page fault reporting and responding code, so that the fault handling code could be generic. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20220914011821.400986-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>