summaryrefslogtreecommitdiff
path: root/drivers/iommu
AgeCommit message (Collapse)Author
2025-03-10iommu/amd/pgtbl_v2: Improve error handlingVasant Hegde
Return -ENOMEM if v2_alloc_pte() fails to allocate memory. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-4-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10iommu/amd: Remove unused variableVasant Hegde
Remove 'amd_iommu_aperture_order' as its not used since commit d9cfed925448. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-3-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10iommu/amd: Log IOMMU control register in event log pathVasant Hegde
Useful for debugging ILLEGAL_DEV_TABLE_ENTRY events as some of the DTE settings depends on Control register settings. Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250227162320.5805-2-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-10iommu/dma: Remove redundant lockingRobin Murphy
This reverts commit ac9a5d522bb80be50ea84965699e1c8257d745ce. iommu_dma_init_domain() is now only called under the group mutex, as it should be given that that the default domain belongs to the group, so the additional internal locking is no longer needed. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/a943d4c198e6a1fffe998337d577dc3aa7f660a9.1740585469.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-03-07iommufd: Fail replace if device has not been attachedYi Liu
The current implementation of iommufd_device_do_replace() implicitly assumes that the input device has already been attached. However, there is no explicit check to verify this assumption. If another device within the same group has been attached, the replace operation might succeed, but the input device itself may not have been attached yet. As a result, the input device might not be tracked in the igroup->device_list, and its reserved IOVA might not be added. Despite this, the caller might incorrectly assume that the device has been successfully replaced, which could lead to unexpected behavior or errors. To address this issue, add a check to ensure that the input device has been attached before proceeding with the replace operation. This check will help maintain the integrity of the device tracking system and prevent potential issues arising from incorrect assumptions about the device's attachment status. Fixes: e88d4ec154a8 ("iommufd: Add iommufd_device_replace()") Link: https://patch.msgid.link/r/20250306034842.5950-1-yi.l.liu@intel.com Cc: stable@vger.kernel.org Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-07iommufd: Set domain->iommufd_hwpt in all hwpt->domain allocatorsNicolin Chen
Setting domain->iommufd_hwpt in iommufd_hwpt_alloc() only covers the HWPT allocations from user space, but not for an auto domain. This resulted in a NULL pointer access in the auto domain pathway: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 pc : iommufd_sw_msi+0x54/0x2b0 lr : iommufd_sw_msi+0x40/0x2b0 Call trace: iommufd_sw_msi+0x54/0x2b0 (P) iommu_dma_prepare_msi+0x64/0xa8 its_irq_domain_alloc+0xf0/0x2c0 irq_domain_alloc_irqs_parent+0x2c/0xa8 msi_domain_alloc+0xa0/0x1a8 Since iommufd_sw_msi() requires to access the domain->iommufd_hwpt, it is better to set that explicitly prior to calling iommu_domain_set_sw_msi(). Fixes: 748706d7ca06 ("iommu: Turn fault_data to iommufd private pointer") Link: https://patch.msgid.link/r/20250305211800.229465-1-nicolinc@nvidia.com Reported-by: Ankit Agrawal <ankita@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Ankit Agrawal <ankita@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-04iommufd: Fix uninitialized rc in iommufd_access_rw()Nicolin Chen
Reported by smatch: drivers/iommu/iommufd/device.c:1392 iommufd_access_rw() error: uninitialized symbol 'rc'. Fixes: 8d40205f6093 ("iommufd: Add kAPI toward external drivers for kernel access") Link: https://patch.msgid.link/r/20250227200729.85030-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202502271339.a2nWr9UA-lkp@intel.com/ [nicolinc: can't find an original report but only in "old smatch warnings"] Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-04iommufd: Disallow allocating nested parent domain with fault IDYi Liu
Allocating a domain with a fault ID indicates that the domain is faultable. However, there is a gap for the nested parent domain to support PRI. Some hardware lacks the capability to distinguish whether PRI occurs at stage 1 or stage 2. This limitation may require software-based page table walking to resolve. Since no in-tree IOMMU driver currently supports this functionality, it is disallowed. For more details, refer to the related discussion at [1]. [1] https://lore.kernel.org/linux-iommu/bd1655c6-8b2f-4cfa-adb1-badc00d01811@intel.com/ Link: https://patch.msgid.link/r/20250226104012.82079-1-yi.l.liu@intel.com Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-28Merge tag 'for-joerg' of ↵Joerg Roedel
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu shared branch with iommufd The three dependent series on a shared branch: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API
2025-02-28iommu: Swap the order of setting group->pasid_array and calling attach op of ↵Yi Liu
iommu drivers The current implementation stores entry to the group->pasid_array before the underlying iommu driver has successfully set the new domain. This can lead to issues where PRIs are received on the new domain before the attach operation is completed. This patch swaps the order of operations to ensure that the domain is set in the underlying iommu driver before updating the group->pasid_array. Link: https://patch.msgid.link/r/20250226011849.5102-5-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-28iommu: Store either domain or handle in group->pasid_arrayYi Liu
iommu_attach_device_pasid() only stores handle to group->pasid_array when there is a valid handle input. However, it makes the iommu_attach_device_pasid() unable to detect if the pasid has been attached or not previously. To be complete, let the iommu_attach_device_pasid() store the domain to group->pasid_array if no valid handle. The other users of the group->pasid_array should be updated to be consistent. e.g. the iommu_attach_group_handle() and iommu_replace_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-4-yi.l.liu@intel.com Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-28iommu: Drop iommu_group_replace_domain()Yi Liu
iommufd does not use it now, so drop it. Link: https://patch.msgid.link/r/20250226011849.5102-3-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-28iommu: Make @handle mandatory in iommu_{attach|replace}_group_handle()Yi Liu
Caller of the two APIs always provide a valid handle, make @handle as mandatory parameter. Take this chance incoporate the handle->domain set under the protection of group->mutex in iommu_attach_group_handle(). Link: https://patch.msgid.link/r/20250226011849.5102-2-yi.l.liu@intel.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-28iommu/vt-d: Fix suspicious RCU usageLu Baolu
Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts locally") moved the call to enable_drhd_fault_handling() to a code path that does not hold any lock while traversing the drhd list. Fix it by ensuring the dmar_global_lock lock is held when traversing the drhd list. Without this fix, the following warning is triggered: ============================= WARNING: suspicious RCU usage 6.14.0-rc3 #55 Not tainted ----------------------------- drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by cpuhp/1/23: #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 stack backtrace: CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55 Call Trace: <TASK> dump_stack_lvl+0xb7/0xd0 lockdep_rcu_suspicious+0x159/0x1f0 ? __pfx_enable_drhd_fault_handling+0x10/0x10 enable_drhd_fault_handling+0x151/0x180 cpuhp_invoke_callback+0x1df/0x990 cpuhp_thread_fun+0x1ea/0x2c0 smpboot_thread_fn+0x1f5/0x2e0 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0x12a/0x2d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x4a/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat about a possible deadlock between dmar_global_lock and cpu_hotplug_lock. This is avoided by not holding dmar_global_lock when calling iommu_device_register(), which initiates the device probe process. Fixes: d74169ceb0d2 ("iommu/vt-d: Allocate DMAR fault interrupts locally") Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/ Tested-by: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28iommu/vt-d: Remove device comparison in context_setup_pass_through_cbJerry Snitselaar
Remove the device comparison check in context_setup_pass_through_cb. pci_for_each_dma_alias already makes a decision on whether the callback function should be called for a device. With the check in place it will fail to create context entries for aliases as it walks up to the root bus. Fixes: 2031c469f816 ("iommu/vt-d: Add support for static identity domain") Closes: https://lore.kernel.org/linux-iommu/82499eb6-00b7-4f83-879a-e97b4144f576@linux.intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20250224180316.140123-1-jsnitsel@redhat.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-28iommu/amd: Preserve default DTE fields when updating Host Page Table RootAlejandro Jimenez
When updating the page table root field on the DTE, avoid overwriting any bits that are already set. The earlier call to make_clear_dte() writes default values that all DTEs must have set (currently DTE[V]), and those must be preserved. Currently this doesn't cause problems since the page table root update is the first field that is set after make_clear_dte() is called, and DTE_FLAG_V is set again later along with the permission bits (IR/IW). Remove this redundant assignment too. Fixes: fd5dff9de4be ("iommu/amd: Modify set_dte_entry() to use 256-bit DTE helpers") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250106191413.3107140-1-alejandro.j.jimenez@oracle.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-27iommufd: Implement sw_msi support nativelyJason Gunthorpe
iommufd has a model where the iommu_domain can be changed while the VFIO device is attached. In this case, the MSI should continue to work. This corner case has not worked because the dma-iommu implementation of sw_msi is tied to a single domain. Implement the sw_msi mapping directly and use a global per-fd table to associate assigned IOVA to the MSI pages. This allows the MSI pages to be loaded into a domain before it is attached ensuring that MSI is not disrupted. Link: https://patch.msgid.link/r/e13d23eeacd67c0a692fc468c85b483f4dd51c57.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-22hyperv: Change hv_root_partition into a functionNuno Das Neves
Introduce hv_curr_partition_type to store the partition type as an enum. Right now this is limited to guest or root partition, but there will be other kinds in future and the enum is easily extensible. Set up hv_curr_partition_type early in Hyper-V initialization with hv_identify_partition_type(). hv_root_partition() just queries this value, and shouldn't be called before that. Making this check into a function sets the stage for adding a config option to gate the compilation of root partition code. In particular, hv_root_partition() can be stubbed out always be false if root partition support isn't desired. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com>
2025-02-21iommu: Turn fault_data to iommufd private pointerNicolin Chen
A "fault_data" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases, along with the attach_handle. Now, the iommufd version of the sw_msi function will reuse the attach_handle and fault_data for a non-fault case. Rename "fault_data" to "iommufd_hwpt" so as not to confine it to a "fault" case. Move it into a union to be the iommufd private pointer. A following patch will move the iova_cookie to the union for dma-iommu too after the iommufd_sw_msi implementation is added. Since we have two unions now, add some simple comments for readability. Link: https://patch.msgid.link/r/ee5039503f28a16590916e9eef28b917e2d1607a.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need itJason Gunthorpe
Currently, IRQ_MSI_IOMMU is selected if DMA_IOMMU is available to provide an implementation for iommu_dma_prepare/compose_msi_msg(). However, it'll make more sense for irqchips that call prepare/compose to select it, and that will trigger all the additional code and data to be compiled into the kernel. If IRQ_MSI_IOMMU is selected with no IOMMU side implementation, then the prepare/compose() will be NOP stubs. If IRQ_MSI_IOMMU is not selected by an irqchip, then the related code on the iommu side is compiled out. Link: https://patch.msgid.link/r/a2620f67002c5cdf974e89ca3bf905f5c0817be6.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21iommu: Make iommu_dma_prepare_msi() into a generic operationJason Gunthorpe
SW_MSI supports IOMMU to translate an MSI message before the MSI message is delivered to the interrupt controller. On such systems, an iommu_domain must have a translation for the MSI message for interrupts to work. The IRQ subsystem will call into IOMMU to request that a physical page be set up to receive MSI messages, and the IOMMU then sets an IOVA that maps to that physical page. Ultimately the IOVA is programmed into the device via the msi_msg. Generalize this by allowing iommu_domain owners to provide implementations of this mapping. Add a function pointer in struct iommu_domain to allow a domain owner to provide its own implementation. Have dma-iommu supply its implementation for IOMMU_DOMAIN_DMA types during the iommu_get_dma_cookie() path. For IOMMU_DOMAIN_UNMANAGED types used by VFIO (and iommufd for now), have the same iommu_dma_sw_msi set as well in the iommu_get_msi_cookie() path. Hold the group mutex while in iommu_dma_prepare_msi() to ensure the domain doesn't change or become freed while running. Races with IRQ operations from VFIO and domain changes from iommufd are possible here. Replace the msi_prepare_lock with a lockdep assertion for the group mutex as documentation. For the dmau_iommu.c each iommu_domain is unique to a group. Link: https://patch.msgid.link/r/4ca696150d2baee03af27c4ddefdb7b0b0280e7b.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21genirq/msi: Refactor iommu_dma_compose_msi_msg()Jason Gunthorpe
The two-step process to translate the MSI address involves two functions, iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). Previously iommu_dma_compose_msi_msg() needed to be in the iommu layer as it had to dereference the opaque cookie pointer. Now, the previous patch changed the cookie pointer into an integer so there is no longer any need for the iommu layer to be involved. Further, the call sites of iommu_dma_compose_msi_msg() all follow the same pattern of setting an MSI message address_hi/lo to non-translated and then immediately calling iommu_dma_compose_msi_msg(). Refactor iommu_dma_compose_msi_msg() into msi_msg_set_addr() that directly accepts the u64 version of the address and simplifies all the callers. Move the new helper to linux/msi.h since it has nothing to do with iommu. Aside from refactoring, this logically prepares for the next patch, which allows multiple implementation options for iommu_dma_prepare_msi(). So, it does not make sense to have the iommu_dma_compose_msi_msg() in dma-iommu.c as it no longer provides the only iommu_dma_prepare_msi() implementation. Link: https://patch.msgid.link/r/eda62a9bafa825e9cdabd7ddc61ad5a21c32af24.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of iommu_cookieJason Gunthorpe
The IOMMU translation for MSI message addresses has been a 2-step process, separated in time: 1) iommu_dma_prepare_msi(): A cookie pointer containing the IOVA address is stored in the MSI descriptor when an MSI interrupt is allocated. 2) iommu_dma_compose_msi_msg(): this cookie pointer is used to compute a translated message address. This has an inherent lifetime problem for the pointer stored in the cookie that must remain valid between the two steps. However, there is no locking at the irq layer that helps protect the lifetime. Today, this works under the assumption that the iommu domain is not changed while MSI interrupts being programmed. This is true for normal DMA API users within the kernel, as the iommu domain is attached before the driver is probed and cannot be changed while a driver is attached. Classic VFIO type1 also prevented changing the iommu domain while VFIO was running as it does not support changing the "container" after starting up. However, iommufd has improved this so that the iommu domain can be changed during VFIO operation. This potentially allows userspace to directly race VFIO_DEVICE_ATTACH_IOMMUFD_PT (which calls iommu_attach_group()) and VFIO_DEVICE_SET_IRQS (which calls into iommu_dma_compose_msi_msg()). This potentially causes both the cookie pointer and the unlocked call to iommu_get_domain_for_dev() on the MSI translation path to become UAFs. Fix the MSI cookie UAF by removing the cookie pointer. The translated IOVA address is already known during iommu_dma_prepare_msi() and cannot change. Thus, it can simply be stored as an integer in the MSI descriptor. The other UAF related to iommu_get_domain_for_dev() will be addressed in patch "iommu: Make iommu_dma_prepare_msi() into a generic operation" by using the IOMMU group mutex. Link: https://patch.msgid.link/r/a4f2cd76b9dc1833ee6c1cf325cba57def22231c.1740014950.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21iommu/io-pgtable-dart: Only set subpage protection disable for DART 1Asahi Lina
Subpage protection can't be disabled on t6000-style darts, as such the disable flag no longer applies, and probably even affects something else. Fixes: dc09fe1c5edd ("iommu/io-pgtable-dart: Add DART PTE support for t6000") Signed-off-by: Asahi Lina <lina@asahilina.net> Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com> Reviewed-by: Sven Peter <sven@svenpeter.dev> Link: https://lore.kernel.org/r/20250219-dart2-no-sp-disable-v1-1-9f324cfa4e70@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-21iommu/s390: implement iommu passthrough via identity domainMatthew Rosato
Enabled via the kernel command-line 'iommu.passthrough=1' option. Introduce the concept of identity domains to s390-iommu, which relies on the bus_dma_region to offset identity mappings to the start of the DMA aperture advertized by CLP. Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250212213418.182902-5-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-21iommu/s390: handle IOAT registration based on domainMatthew Rosato
At this point, the dma_table is really a property of the s390-iommu domain. Rather than checking its contents elsewhere in the codebase, move the code that registers the table with firmware into s390-iommu and make a decision what to register with firmware based upon the type of domain in use for the device in question. Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20250212213418.182902-4-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Large set of fixes for vector handling, especially in the interactions between host and guest state. This fixes a number of bugs affecting actual deployments, and greatly simplifies the FP/SIMD/SVE handling. Thanks to Mark Rutland for dealing with this thankless task. - Fix an ugly race between vcpu and vgic creation/init, resulting in unexpected behaviours - Fix use of kernel VAs at EL2 when emulating timers with nVHE - Small set of pKVM improvements and cleanups x86: - Fix broken SNP support with KVM module built-in, ensuring the PSP module is initialized before KVM even when the module infrastructure cannot be used to order initcalls - Reject Hyper-V SEND_IPI hypercalls if the local APIC isn't being emulated by KVM to fix a NULL pointer dereference - Enter guest mode (L2) from KVM's perspective before initializing the vCPU's nested NPT MMU so that the MMU is properly tagged for L2, not L1 - Load the guest's DR6 outside of the innermost .vcpu_run() loop, as the guest's value may be stale if a VM-Exit is handled in the fastpath" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits) x86/sev: Fix broken SNP support with KVM module built-in KVM: SVM: Ensure PSP module is initialized if KVM module is built-in crypto: ccp: Add external API interface for PSP module initialization KVM: arm64: vgic: Hoist SGI/PPI alloc from vgic_init() to kvm_create_vgic() KVM: arm64: timer: Drop warning on failed interrupt signalling KVM: arm64: Fix alignment of kvm_hyp_memcache allocations KVM: arm64: Convert timer offset VA when accessed in HYP code KVM: arm64: Simplify warning in kvm_arch_vcpu_load_fp() KVM: arm64: Eagerly switch ZCR_EL{1,2} KVM: arm64: Mark some header functions as inline KVM: arm64: Refactor exit handlers KVM: arm64: Refactor CPTR trap deactivation KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN KVM: arm64: Remove host FPSIMD saving for non-protected KVM KVM: arm64: Unconditionally save+flush host FPSIMD/SVE/SME state KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop KVM: nSVM: Enter guest mode before initializing nested NPT MMU KVM: selftests: Add CPUID tests for Hyper-V features that need in-kernel APIC KVM: selftests: Manage CPUID array in Hyper-V CPUID test's core helper ...
2025-02-14x86/sev: Fix broken SNP support with KVM module built-inAshish Kalra
Fix issues with enabling SNP host support and effectively SNP support which is broken with respect to the KVM module being built-in. SNP host support is enabled in snp_rmptable_init() which is invoked as device_initcall(). SNP check on IOMMU is done during IOMMU PCI init (IOMMU_PCI_INIT stage). And for that reason snp_rmptable_init() is currently invoked via device_initcall() and cannot be invoked via subsys_initcall() as core IOMMU subsystem gets initialized via subsys_initcall(). Now, if kvm_amd module is built-in, it gets initialized before SNP host support is enabled in snp_rmptable_init() : [ 10.131811] kvm_amd: TSC scaling supported [ 10.136384] kvm_amd: Nested Virtualization enabled [ 10.141734] kvm_amd: Nested Paging enabled [ 10.146304] kvm_amd: LBR virtualization supported [ 10.151557] kvm_amd: SEV enabled (ASIDs 100 - 509) [ 10.156905] kvm_amd: SEV-ES enabled (ASIDs 1 - 99) [ 10.162256] kvm_amd: SEV-SNP enabled (ASIDs 1 - 99) [ 10.171508] kvm_amd: Virtual VMLOAD VMSAVE supported [ 10.177052] kvm_amd: Virtual GIF supported ... ... [ 10.201648] kvm_amd: in svm_enable_virtualization_cpu And then svm_x86_ops->enable_virtualization_cpu() (svm_enable_virtualization_cpu) programs MSR_VM_HSAVE_PA as following: wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa); So VM_HSAVE_PA is non-zero before SNP support is enabled on all CPUs. snp_rmptable_init() gets invoked after svm_enable_virtualization_cpu() as following : ... [ 11.256138] kvm_amd: in svm_enable_virtualization_cpu ... [ 11.264918] SEV-SNP: in snp_rmptable_init This triggers a #GP exception in snp_rmptable_init() when snp_enable() is invoked to set SNP_EN in SYSCFG MSR: [ 11.294289] unchecked MSR access error: WRMSR to 0xc0010010 (tried to write 0x0000000003fc0000) at rIP: 0xffffffffaf5d5c28 (native_write_msr+0x8/0x30) ... [ 11.294404] Call Trace: [ 11.294482] <IRQ> [ 11.294513] ? show_stack_regs+0x26/0x30 [ 11.294522] ? ex_handler_msr+0x10f/0x180 [ 11.294529] ? search_extable+0x2b/0x40 [ 11.294538] ? fixup_exception+0x2dd/0x340 [ 11.294542] ? exc_general_protection+0x14f/0x440 [ 11.294550] ? asm_exc_general_protection+0x2b/0x30 [ 11.294557] ? __pfx_snp_enable+0x10/0x10 [ 11.294567] ? native_write_msr+0x8/0x30 [ 11.294570] ? __snp_enable+0x5d/0x70 [ 11.294575] snp_enable+0x19/0x20 [ 11.294578] __flush_smp_call_function_queue+0x9c/0x3a0 [ 11.294586] generic_smp_call_function_single_interrupt+0x17/0x20 [ 11.294589] __sysvec_call_function+0x20/0x90 [ 11.294596] sysvec_call_function+0x80/0xb0 [ 11.294601] </IRQ> [ 11.294603] <TASK> [ 11.294605] asm_sysvec_call_function+0x1f/0x30 ... [ 11.294631] arch_cpu_idle+0xd/0x20 [ 11.294633] default_idle_call+0x34/0xd0 [ 11.294636] do_idle+0x1f1/0x230 [ 11.294643] ? complete+0x71/0x80 [ 11.294649] cpu_startup_entry+0x30/0x40 [ 11.294652] start_secondary+0x12d/0x160 [ 11.294655] common_startup_64+0x13e/0x141 [ 11.294662] </TASK> This #GP exception is getting triggered due to the following errata for AMD family 19h Models 10h-1Fh Processors: Processor may generate spurious #GP(0) Exception on WRMSR instruction: Description: The Processor will generate a spurious #GP(0) Exception on a WRMSR instruction if the following conditions are all met: - the target of the WRMSR is a SYSCFG register. - the write changes the value of SYSCFG.SNPEn from 0 to 1. - One of the threads that share the physical core has a non-zero value in the VM_HSAVE_PA MSR. The document being referred to above: https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/revision-guides/57095-PUB_1_01.pdf To summarize, with kvm_amd module being built-in, KVM/SVM initialization happens before host SNP is enabled and this SVM initialization sets VM_HSAVE_PA to non-zero, which then triggers a #GP when SYSCFG.SNPEn is being set and this will subsequently cause SNP_INIT(_EX) to fail with INVALID_CONFIG error as SYSCFG[SnpEn] is not set on all CPUs. Essentially SNP host enabling code should be invoked before KVM initialization, which is currently not the case when KVM is built-in. Add fix to call snp_rmptable_init() early from iommu_snp_enable() directly and not invoked via device_initcall() which enables SNP host support before KVM initialization with kvm_amd module built-in. Add additional handling for `iommu=off` or `amd_iommu=off` options. Note that IOMMUs need to be enabled for SNP initialization, therefore, if host SNP support is enabled but late IOMMU initialization fails then that will cause PSP driver's SNP_INIT to fail as IOMMU SNP sanity checks in SNP firmware will fail with invalid configuration error as below: [ 9.723114] ccp 0000:23:00.1: sev enabled [ 9.727602] ccp 0000:23:00.1: psp enabled [ 9.732527] ccp 0000:a2:00.1: enabling device (0000 -> 0002) [ 9.739098] ccp 0000:a2:00.1: no command queues available [ 9.745167] ccp 0000:a2:00.1: psp enabled [ 9.805337] ccp 0000:23:00.1: SEV-SNP: failed to INIT rc -5, error 0x3 [ 9.866426] ccp 0000:23:00.1: SEV API:1.53 build:5 Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature") Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Co-developed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Acked-by: Joerg Roedel <jroedel@suse.de> Message-ID: <138b520fb83964782303b43ade4369cd181fdd9c.1739226950.git.ashish.kalra@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-14iommu/vt-d: Make intel_iommu_drain_pasid_prq() cover faults for RIDLu Baolu
This driver supports page faults on PCI RID since commit <9f831c16c69e> ("iommu/vt-d: Remove the pasid present check in prq_event_thread") by allowing the reporting of page faults with the pasid_present field cleared to the upper layer for further handling. The fundamental assumption here is that the detach or replace operations act as a fence for page faults. This implies that all pending page faults associated with a specific RID or PASID are flushed when a domain is detached or replaced from a device RID or PASID. However, the intel_iommu_drain_pasid_prq() helper does not correctly handle faults for RID. This leads to faults potentially remaining pending in the iommu hardware queue even after the domain is detached, thereby violating the aforementioned assumption. Fix this issue by extending intel_iommu_drain_pasid_prq() to cover faults for RID. Fixes: 9f831c16c69e ("iommu/vt-d: Remove the pasid present check in prq_event_thread") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250121023150.815972-1-baolu.lu@linux.intel.com Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20250211005512.985563-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-14iommu/exynos: Fix typosAndrew Kreimer
There are some typos in comments/messages: - modyfying -> modifying - Unabled -> Unable Fix them via codespell. Signed-off-by: Andrew Kreimer <algonell@gmail.com> Link: https://lore.kernel.org/r/20250210112027.29791-1-algonell@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-14iommu: Fix a spelling errorEaswar Hariharan
Fix spelling error IDENITY -> IDENTITY in drivers/iommu/iommu.c. Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250128190522.70800-1-eahariha@linux.microsoft.com [ joro: Add commit message ] Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-14iommu/amd: Expicitly enable CNTRL.EPHEn bit in resume pathVasant Hegde
With recent kernel, AMDGPU failed to resume after suspend on certain laptop. Sample log: ----------- Nov 14 11:52:19 Thinkbook kernel: iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:06:00.0 pasid=0x00000 address=0x135300000 flags=0x0080] Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[0]: 7d90000000000003 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[1]: 0000100103fc0009 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[2]: 2000000117840013 Nov 14 11:52:19 Thinkbook kernel: AMD-Vi: DTE[3]: 0000000000000000 This is because in resume path, CNTRL[EPHEn] is not set. Fix this by setting CNTRL[EPHEn] to 1 in resume path if EFR[EPHSUP] is set. Note May be better approach is to save the control register in suspend path and restore it in resume path instead of trying to set indivisual bits. We will have separate patch for that. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219499 Fixes: c4cb23111103 ("iommu/amd: Add support for enable/disable IOPF") Tested-by: Hamish McIntyre-Bhatty <kernel-bugzilla@regd.hamishmb.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/20250127094411.5931-1-vasant.hegde@amd.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-02-11iommufd/fault: Remove iommufd_fault_domain_attach/detach/replace_dev()Nicolin Chen
There are new attach/detach/replace helpers in device.c taking care of both the attach_handle and the fault specific routines for iopf_enable/disable() and auto response. Clean up these redundant functions in the fault.c file. Link: https://patch.msgid.link/r/3ca94625e9d78270d9a715fa0809414fddd57e58.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-11iommufd: Make attach_handle generic than fault specificNicolin Chen
"attach_handle" was added exclusively for the iommufd_fault_iopf_handler() used by IOPF/PRI use cases. Now, both the MSI and PASID series require to reuse the attach_handle for non-fault cases. Add a set of new attach/detach/replace helpers that does the attach_handle allocation/releasing/replacement in the common path and also handles those fault specific routines such as iopf enabling/disabling and auto response. This covers both non-fault and fault cases in a clean way, replacing those inline helpers in the header. The following patch will clean up those old helpers in the fault.c file. Link: https://patch.msgid.link/r/32687df01c02291d89986a9fca897bbbe2b10987.1738645017.git.nicolinc@nvidia.com Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-10iommu: Fix potential memory leak in iopf_queue_remove_device()Lu Baolu
The iopf_queue_remove_device() helper removes a device from the per-iommu iopf queue when PRI is disabled on the device. It responds to all outstanding iopf's with an IOMMU_PAGE_RESP_INVALID code and detaches the device from the queue. However, it fails to release the group structure that represents a group of iopf's awaiting for a response after responding to the hardware. This can cause a memory leak if iopf_queue_remove_device() is called with pending iopf's. Fix it by calling iopf_free_group() after the iopf group is responded. Fixes: 199112327135 ("iommu: Track iopf group instead of last fault") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20250117055800.782462-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-25Merge tag 'hyperv-next-signed-20250123' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Introduce a new set of Hyper-V headers in include/hyperv and replace the old hyperv-tlfs.h with the new headers (Nuno Das Neves) - Fixes for the Hyper-V VTL mode (Roman Kisel) - Fixes for cpu mask usage in Hyper-V code (Michael Kelley) - Document the guest VM hibernation behaviour (Michael Kelley) - Miscellaneous fixes and cleanups (Jacob Pan, John Starks, Naman Jain) * tag 'hyperv-next-signed-20250123' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Documentation: hyperv: Add overview of guest VM hibernation hyperv: Do not overlap the hvcall IO areas in hv_vtl_apicid_to_vp_id() hyperv: Do not overlap the hvcall IO areas in get_vtl() hyperv: Enable the hypercall output page for the VTL mode hv_balloon: Fallback to generic_online_page() for non-HV hot added mem Drivers: hv: vmbus: Log on missing offers if any Drivers: hv: vmbus: Wait for boot-time offers during boot and resume uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup iommu/hyper-v: Don't assume cpu_possible_mask is dense Drivers: hv: Don't assume cpu_possible_mask is dense x86/hyperv: Don't assume cpu_possible_mask is dense hyperv: Remove the now unused hyperv-tlfs.h files hyperv: Switch from hyperv-tlfs.h to hyperv/hvhdk.h hyperv: Add new Hyper-V headers in include/hyperv hyperv: Clean up unnecessary #includes hyperv: Move hv_connection_id to hyperv-tlfs.h
2025-01-24Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix struct iommu_hwpt_pgfault init and padding iommufd/fault: Use a separate spinlock to protect fault->deliver list iommufd/fault: Destroy response and mutex in iommufd_fault_destroy() iommufd: Keep OBJ/IOCTL lists in an alphabetical order iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index() iommu: iommufd: fix WARNING in iommufd_device_unbind iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core iommufd/selftest: Remove domain_alloc_paging()
2025-01-24Merge tag 'iommu-updates-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core changes: - PASID support for the blocked_domain ARM-SMMU Updates: - SMMUv2: - Implement per-client prefetcher configuration on Qualcomm SoCs - Support for the Adreno SMMU on Qualcomm's SDM670 SOC - SMMUv3: - Pretty-printing of event records - Drop the ->domain_alloc_paging implementation in favour of domain_alloc_paging_flags(flags==0) - IO-PGTable: - Generalisation of the page-table walker to enable external walkers (e.g. for debugging unexpected page-faults from the GPU) - Minor fix for handling concatenated PGDs at stage-2 with 16KiB pages - Misc: - Clean-up device probing and replace the crufty probe-deferral hack with a more robust implementation of arm_smmu_get_by_fwnode() - Device-tree binding updates for a bunch of Qualcomm platforms Intel VT-d Updates: - Remove domain_alloc_paging() - Remove capability audit code - Draining PRQ in sva unbind path when FPD bit set - Link cache tags of same iommu unit together AMD-Vi Updates: - Use CMPXCHG128 to update DTE - Cleanups of the domain_alloc_paging() path RiscV IOMMU: - Platform MSI support - Shutdown support Rockchip IOMMU: - Add DT bindings for Rockchip RK3576 More smaller fixes and cleanups" * tag 'iommu-updates-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits) iommu: Use str_enable_disable-like helpers iommu/amd: Fully decode all combinations of alloc_paging_flags iommu/amd: Move the nid to pdom_setup_pgtable() iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_mode iommu/amd: Remove type argument from do_iommu_domain_alloc() and related iommu/amd: Remove dev == NULL checks iommu/amd: Remove domain_alloc() iommu/amd: Remove unused amd_iommu_domain_update() iommu/riscv: Fixup compile warning iommu/arm-smmu-v3: Add missing #include of linux/string_choices.h iommu/arm-smmu-v3: Use str_read_write helper w/ logs iommu/io-pgtable-arm: Add way to debug pgtable walk iommu/io-pgtable-arm: Re-use the pgtable walk for iova_to_phys iommu/io-pgtable-arm: Make pgtable walker more generic iommu/arm-smmu: Add ACTLR data and support for qcom_smmu_500 iommu/arm-smmu: Introduce ACTLR custom prefetcher settings iommu/arm-smmu: Add support for PRR bit setup iommu/arm-smmu: Refactor qcom_smmu structure to include single pointer iommu/arm-smmu: Re-enable context caching in smmu reset operation iommu/vt-d: Link cache tags of same iommu unit together ...
2025-01-21Merge tag 'irq-core-2025-01-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: - Consolidate the machine_kexec_mask_interrupts() by providing a generic implementation and replacing the copy & pasta orgy in the relevant architectures. - Prevent unconditional operations on interrupt chips during kexec shutdown, which can trigger warnings in certain cases when the underlying interrupt has been shut down before. - Make the enforcement of interrupt handling in interrupt context unconditionally available, so that it actually works for non x86 related interrupt chips. The earlier enablement for ARM GIC chips set the required chip flag, but did not notice that the check was hidden behind a config switch which is not selected by ARM[64]. - Decrapify the handling of deferred interrupt affinity setting. Some interrupt chips require that affinity changes are made from the context of handling an interrupt to avoid certain race conditions. For x86 this was the default, but with interrupt remapping this requirement was lifted and a flag was introduced which tells the core code that affinity changes can be done in any context. Unrestricted affinity changes are the default for the majority of interrupt chips. RISCV has the requirement to add the deferred mode to one of it's interrupt controllers, but with the original implementation this would require to add the any context flag to all other RISC-V interrupt chips. That's backwards, so reverse the logic and require that chips, which need the deferred mode have to be marked accordingly. That avoids chasing the 'sane' chips and marking them. - Add multi-node support to the Loongarch AVEC interrupt controller driver. - The usual tiny cleanups, fixes and improvements all over the place. * tag 'irq-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/generic_chip: Export irq_gc_mask_disable_and_ack_set() genirq/timings: Add kernel-doc for a function parameter genirq: Remove IRQ_MOVE_PCNTXT and related code x86/apic: Convert to IRQCHIP_MOVE_DEFERRED genirq: Provide IRQCHIP_MOVE_DEFERRED hexagon: Remove GENERIC_PENDING_IRQ leftover ARC: Remove GENERIC_PENDING_IRQ genirq: Remove handle_enforce_irqctx() wrapper genirq: Make handle_enforce_irqctx() unconditionally available irqchip/loongarch-avec: Add multi-nodes topology support irqchip/ts4800: Replace seq_printf() by seq_puts() irqchip/ti-sci-inta : Add module build support irqchip/ti-sci-intr: Add module build support irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic function irqchip: keystone: Use syscon_regmap_lookup_by_phandle_args genirq/kexec: Prevent redundant IRQ masking by checking state before shutdown kexec: Consolidate machine_kexec_mask_interrupts() implementation genirq: Reuse irq_thread_fn() for forced thread case genirq: Move irq_thread_fn() further up in the code
2025-01-21iommufd: Fix struct iommu_hwpt_pgfault init and paddingNicolin Chen
The iommu_hwpt_pgfault is used to report IO page fault data to userspace, but iommufd_fault_fops_read was never zeroing its padding. This leaks the content of the kernel stack memory to userspace. Also, the iommufd uAPI requires explicit padding and use of __aligned_u64 to ensure ABI compatibility's with 32 bit. pahole result, before: struct iommu_hwpt_pgfault { __u32 flags; /* 0 4 */ __u32 dev_id; /* 4 4 */ __u32 pasid; /* 8 4 */ __u32 grpid; /* 12 4 */ __u32 perm; /* 16 4 */ /* XXX 4 bytes hole, try to pack */ __u64 addr; /* 24 8 */ __u32 length; /* 32 4 */ __u32 cookie; /* 36 4 */ /* size: 40, cachelines: 1, members: 8 */ /* sum members: 36, holes: 1, sum holes: 4 */ /* last cacheline: 40 bytes */ }; pahole result, after: struct iommu_hwpt_pgfault { __u32 flags; /* 0 4 */ __u32 dev_id; /* 4 4 */ __u32 pasid; /* 8 4 */ __u32 grpid; /* 12 4 */ __u32 perm; /* 16 4 */ __u32 __reserved; /* 20 4 */ __u64 addr __attribute__((__aligned__(8))); /* 24 8 */ __u32 length; /* 32 4 */ __u32 cookie; /* 36 4 */ /* size: 40, cachelines: 1, members: 9 */ /* forced alignments: 1 */ /* last cacheline: 40 bytes */ } __attribute__((__aligned__(8))); Fixes: c714f15860fc ("iommufd: Add fault and response message definitions") Link: https://patch.msgid.link/r/20250120195051.2450-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-20iommufd/fault: Use a separate spinlock to protect fault->deliver listNicolin Chen
The fault->mutex serializes the fault read()/write() fops and the iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it was conveniently used to fence the fault->deliver in poll() fop and iommufd_fault_iopf_handler(). However, copy_from/to_user() may sleep if pagefaults are enabled. Thus, they could take a long time to wait for user pages to swap in, blocking iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ handler of an IOMMU driver, resulting in a potential global DOS. Instead of reusing the mutex to protect the fault->deliver list, add a separate spinlock, nested under the mutex, to do the job. iommufd_fault_iopf_handler() would no longer be blocked by copy_from/to_user(). Add a free_list in iommufd_auto_response_faults(), so the spinlock can simply fence a fast list_for_each_entry_safe routine. Provide two deliver list helpers for iommufd_fault_fops_read() to use: - Fetch the first iopf_group out of the fault->deliver list - Restore an iopf_group back to the head of the fault->deliver list Lastly, move the mutex closer to the response in the fault structure, and update its kdoc accordingly. Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-01-17Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', ↵Joerg Roedel
'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next
2025-01-17iommu: Use str_enable_disable-like helpersKrzysztof Kozlowski
Replace ternary (condition ? "enable" : "disable") syntax with helpers from string_choices.h because: 1. Simple function call with one argument is easier to read. Ternary operator has three arguments and with wrapping might lead to quite long code. 2. Is slightly shorter thus also easier to read. 3. It brings uniformity in the text - same string. 4. Allows deduping by the linker, which results in a smaller binary file. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Pranjal Shrivastava <praan@google.com> Link: https://lore.kernel.org/r/20250114192642.912331-1-krzysztof.kozlowski@linaro.org Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17iommu/amd: Fully decode all combinations of alloc_paging_flagsJason Gunthorpe
Currently AMD does not support IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_DIRTY_TRACKING It should be rejected. Instead it creates a V1 domain without dirty tracking support. Use a switch to fully decode the flags. Fixes: ce2cd175469f ("iommu/amd: Enhance amd_iommu_domain_alloc_user()") Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/7-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17iommu/amd: Move the nid to pdom_setup_pgtable()Jason Gunthorpe
The only thing that uses the nid is the io_pgtable code, and it should be set before calling alloc_io_pgtable_ops() to ensure that the top levels are allocated on the correct nid. Since dev is never NULL now we can just do this trivially and remove the other uses of nid. SVA and identity code paths never use it since they don't use io_pgtable. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/6-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17iommu/amd: Change amd_iommu_pgtable to use enum protection_domain_modeJason Gunthorpe
Currently it uses enum io_pgtable_fmt which is from the io pagetable code and most of the enum values are invalid. protection_domain_mode is internal the driver and has the only two valid values. Fix some signatures and variables to use the right type as well. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/5-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17iommu/amd: Remove type argument from do_iommu_domain_alloc() and relatedJason Gunthorpe
do_iommu_domain_alloc() is only called from amd_iommu_domain_alloc_paging_flags() so type is always IOMMU_DOMAIN_UNMANAGED. Remove type and all the dead conditionals checking it. IOMMU_DOMAIN_IDENTITY checks are similarly obsolete as the conversion to the global static identity domain removed those call paths. The caller of protection_domain_alloc() should set the type, fix the miss in the SVA code. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/4-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17iommu/amd: Remove dev == NULL checksJason Gunthorpe
This is no longer possible, amd_iommu_domain_alloc_paging_flags() is never called with dev = NULL from the core code. Similarly get_amd_iommu_from_dev() can never be NULL either. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17iommu/amd: Remove domain_alloc()Jason Gunthorpe
IOMMU drivers should not be sensitive to the domain type, a paging domain should be created based only on the flags passed in, the same for all callers. AMD was using the domain_alloc() path to force VFIO into a v1 domain type, because v1 gives higher performance. However now that IOMMU_HWPT_ALLOC_PASID is present, and a NULL device is not possible, domain_alloc_paging_flags() will do the right thing for VFIO. When invoked from VFIO flags will be 0 and the amd_iommu_pgtable type of domain will be selected. This is v1 by default unless the kernel command line has overridden it to v2. If the admin is forcing v2 assume they know what they are doing so force it everywhere, including for VFIO. Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2025-01-17iommu/amd: Remove unused amd_iommu_domain_update()Alejandro Jimenez
All the callers have been removed by the below commit, remove the implementation and prototypes. Fixes: 322d889ae7d3 ("iommu/amd: Remove amd_iommu_domain_update() from page table freeing") Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v2-9776c53c2966+1c7-amd_paging_flags_jgg@nvidia.com Signed-off-by: Joerg Roedel <jroedel@suse.de>