linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
3 days	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx " instead of a "void ", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-08	PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function	Lorenzo Pieralisi
	IRQchip drivers need a PCI/MSI function to map a RID to a MSI controller deviceID namespace and at the same time retrieve the struct device_node pointer of the MSI controller the RID is mapped to. Add pci_msi_map_rid_ctlr_node() to achieve this purpose. Cc Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-25-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-06-17	PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations	Shradha Gupta
	For supporting dynamic MSI-X vector allocation by PCI controllers, enabling the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() to prepare the MSI descriptor is also needed. Export pci_msix_prepare_desc() to allow PCI controllers to support dynamic MSI-X vector allocation. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2025-06-04	PCI/MSI: Size device MSI domain with the maximum number of vectors	Marc Zyngier
	Zenghui reports that since 1396e89e09f0 ("genirq/msi: Move prepare() call to per-device allocation"), his Multi-MSI capable device isn't working anymore. This is a consequence of 15c72f824b32 ("PCI/MSI: Add support for per device MSI[X] domains"), which always creates a MSI domain of size 1, even in the presence of Multi-MSI. While this was somehow working until then, moving the .prepare() call ends up sizing the ITS table with a tiny value for this device, and making the endpoint driver unhappy. Instead, always create the domain and call the .prepare() helper with the maximum expected size. Fixes: 1396e89e09f0 ("genirq/msi: Move prepare() call to per-device allocation") Fixes: 15c72f824b32 ("PCI/MSI: Add support for per device MSI[X] domains") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Link: https://lore.kernel.org/all/20250603141801.915305-1-maz@kernel.org Closes: https://lore.kernel.org/r/0b1d7aec-1eac-a9cd-502a-339e216e08a1@huawei.com
2024-12-16	PCI/MSI: Handle lack of irqdomain gracefully	Thomas Gleixner
	Alexandre observed a warning emitted from pci_msi_setup_msi_irqs() on a RISCV platform which does not provide PCI/MSI support: WARNING: CPU: 1 PID: 1 at drivers/pci/msi/msi.h:121 pci_msi_setup_msi_irqs+0x2c/0x32 __pci_enable_msix_range+0x30c/0x596 pci_msi_setup_msi_irqs+0x2c/0x32 pci_alloc_irq_vectors_affinity+0xb8/0xe2 RISCV uses hierarchical interrupt domains and correctly does not implement the legacy fallback. The warning triggers from the legacy fallback stub. That warning is bogus as the PCI/MSI layer knows whether a PCI/MSI parent domain is associated with the device or not. There is a check for MSI-X, which has a legacy assumption. But that legacy fallback assumption is only valid when legacy support is enabled, but otherwise the check should simply return -ENOTSUPP. Loongarch tripped over the same problem and blindly enabled legacy support without implementing the legacy fallbacks. There are weak implementations which return an error, so the problem was papered over. Correct pci_msi_domain_supports() to evaluate the legacy mode and add the missing supported check into the MSI enable path to complete it. Fixes: d2a463b29741 ("PCI/MSI: Reject multi-MSI early") Reported-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/87ed2a8ow5.ffs@tglx
2024-07-18	PCI/MSI: Provide MSI_FLAG_PCI_MSI_MASK_PARENT	Shivamurthy Shastri
	Most ARM(64) PCI/MSI domains mask and unmask in the parent domain after or before the PCI mask/unmask operation takes place. So there are more than a dozen of the same wrapper implementation all over the place. Don't make the same mistake with the new per device PCI/MSI domains and provide a new MSI feature flag, which lets the domain implementation enable this sequence in the PCI/MSI code. Signed-off-by: Shivamurthy Shastri <shivamurthy.shastri@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/87ed8j34pj.ffs@tglx
2024-05-15	Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support"	Bjorn Helgaas
	This reverts commit 0194425af0c87acaad457989a2c6d90dba58e776. IMS (Interrupt Message Store) support appeared in v6.2, but there are no users yet. Remove it for now. We can add it back when a user comes along. If this is re-added later, the relevant part of 41efa431244f ("PCI/MSI: Provide stubs for IMS functions") should be squashed into it. Link: https://lore.kernel.org/r/20240410221307.2162676-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2024-02-19	PCI/MSI: Prevent MSI hardware interrupt number truncation	Vidya Sagar
	While calculating the hardware interrupt number for a MSI interrupt, the higher bits (i.e. from bit-5 onwards a.k.a domain_nr >= 32) of the PCI domain number gets truncated because of the shifted value casting to return type of pci_domain_nr() which is 'int'. This for example is resulting in same hardware interrupt number for devices 0019:00:00.0 and 0039:00:00.0. To address this cast the PCI domain number to 'irq_hw_number_t' before left shifting it to calculate the hardware interrupt number. Please note that this fixes the issue only on 64-bit systems and doesn't change the behavior for 32-bit systems i.e. the 32-bit systems continue to have the issue. Since the issue surfaces only if there are too many PCIe controllers in the system which usually is the case in modern server systems and they don't tend to run 32-bit kernels. Fixes: 3878eaefb89a ("PCI/MSI: Enhance core to support hierarchy irqdomain") Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115135649.708536-1-vidyas@nvidia.com
2023-08-25	PCI: Fix typos in docs and comments	Bjorn Helgaas
	Fix typos in docs and comments. Link: https://lore.kernel.org/r/20230824193712.542167-11-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2022-12-05	PCI/MSI: Provide IMS (Interrupt Message Store) support	Thomas Gleixner
	IMS (Interrupt Message Store) is a new specification which allows implementation specific storage of MSI messages contrary to the strict standard specified MSI and MSI-X message stores. This requires new device specific interrupt domains to handle the implementation defined storage which can be an array in device memory or host/guest memory which is shared with hardware queues. Add a function to create IMS domains for PCI devices. IMS domains are using the new per device domain mechanism and are configured by the device driver via a template. IMS domains are created as secondary device domains so they work side on side with MSI[-X] on the same device. The IMS domains have a few constraints: - The index space is managed by the core code. Device memory based IMS provides a storage array with a fixed size which obviously requires an index. But there is no association between index and functionality so the core can randomly allocate an index in the array. System memory based IMS does not have the concept of an index as the storage is somewhere in memory. In that case the index is purely software based to keep track of the allocations. - There is no requirement for consecutive index ranges This is currently a limitation of the MSI core and can be implemented if there is a justified use case by changing the internal storage from xarray to maple_tree. For now it's single vector allocation. - The interrupt chip must provide the following callbacks: - irq_mask() - irq_unmask() - irq_write_msi_msg() - The interrupt chip must provide the following optional callbacks when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks cannot operate directly on hardware, e.g. in the case that the interrupt message store is in queue memory: - irq_bus_lock() - irq_bus_unlock() These callbacks are invoked from preemptible task context and are allowed to sleep. In this case the mandatory callbacks above just store the information. The irq_bus_unlock() callback is supposed to make the change effective before returning. - Interrupt affinity setting is handled by the underlying parent interrupt domain and communicated to the IMS domain via irq_write_msi_msg(). IMS domains cannot have a irq_set_affinity() callback. That's a reasonable restriction similar to the PCI/MSI device domain implementations. The domain is automatically destroyed when the PCI device is removed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.904316841@linutronix.de
2022-12-05	PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X	Thomas Gleixner
	MSI-X vectors can be allocated after the initial MSI-X enablement, but this needs explicit support of the underlying interrupt domains. Provide a function to query the ability and functions to allocate/free individual vectors post-enable. The allocation can either request a specific index in the MSI-X table or with the index argument MSI_ANY_INDEX it allocates the next free vector. The return value is a struct msi_map which on success contains both index and the Linux interrupt number. In case of failure index is negative and the Linux interrupt number is 0. The allocation function is for a single MSI-X index at a time as that's sufficient for the most urgent use case VFIO to get rid of the 'disable MSI-X, reallocate, enable-MSI-X' cycle which is prone to lost interrupts and redirections to the legacy and obviously unhandled INTx. As single index allocation is also sufficient for the use cases Jason Gunthorpe pointed out: Allocation of a MSI-X or IMS vector for a network queue. See Link below. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/20211126232735.547996838@linutronix.de Link: https://lore.kernel.org/r/20221124232326.731233614@linutronix.de
2022-12-05	PCI/MSI: Provide prepare_desc() MSI domain op	Thomas Gleixner
	The setup of MSI descriptors for PCI/MSI-X interrupts depends partially on the MSI index for which the descriptor is initialized. Dynamic MSI-X vector allocation post MSI-X enablement allows to allocate vectors at a given index or at any free index in the available table range. The latter requires that the descriptor is initialized after the MSI core has chosen an index. Implement the prepare_desc() op in the PCI/MSI-X specific msi_domain_ops which is invoked before the core interrupt descriptor and the associated Linux interrupt number is allocated. That callback is also provided for the upcoming PCI/IMS implementations so the implementation specific interrupt domain can do their domain specific initialization of the MSI descriptors. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.673658806@linutronix.de
2022-12-05	PCI/MSI: Remove unused pci_dev_has_special_msi_domain()	Thomas Gleixner
	The check for special MSI domains like VMD which prevents the interrupt remapping code to overwrite device::msi::domain is not longer required and has been replaced by an x86 specific version which is aware of MSI parent domains. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.093093200@linutronix.de
2022-12-05	PCI/MSI: Add support for per device MSI[X] domains	Thomas Gleixner
	Provide a template and the necessary callbacks to create PCI/MSI and PCI/MSI-X domains. The domains are created when MSI or MSI-X is enabled. The domain's lifetime is either the device lifetime or in case that e.g. MSI-X was tried first and failed, then the MSI-X domain is removed and a MSI domain is created as both are mutually exclusive and reside in the default domain ID slot of the per device domain pointer array. Also expand pci_msi_domain_supports() to handle feature checks correctly even in the case that the per device domain was not yet created by checking the features supported by the MSI parent. Add the necessary setup calls into the MSI and MSI-X enable code path. These setup calls are backwards compatible. They return success when there is no parent domain found, which means the existing global domains or the legacy allocation path keep just working. Co-developed-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.975388241@linutronix.de
2022-12-05	PCI/MSI: Use msi_domain_alloc/free_irqs_all_locked()	Thomas Gleixner
	Switch to the new domain id aware interfaces to phase out the previous ones. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124230314.455168748@linutronix.de
2022-11-17	PCI/MSI: Remove redundant msi_check() callback	Thomas Gleixner
	All these sanity checks are now done _before_ any allocation work happens. No point in doing it twice. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122015.749446904@linutronix.de
2022-11-17	PCI/MSI: Reject multi-MSI early	Thomas Gleixner
	When hierarchical MSI interrupt domains are enabled then there is no point to do tons of work and detect the missing support for multi-MSI late in the allocation path. Just query the domain feature flags right away. The query function is going to be used for other purposes later and has a mode argument which influences the result: ALLOW_LEGACY returns true when: - there is no irq domain attached (legacy support) - there is a irq domain attached which has the feature flag set DENY_LEGACY returns only true when: - there is a irq domain attached which has the feature flag set This allows to use the function universally without ifdeffery in the calling code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122015.574339988@linutronix.de
2022-11-17	PCI/MSI: Let the MSI core free descriptors	Ahmed S. Darwish
	Let the core do the freeing of descriptors and just keep it around for the legacy case. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122014.409654736@linutronix.de
2022-11-17	PCI/MSI: Use msi_domain_info:: Bus_token	Ahmed S. Darwish
	Set the bus token in the msi_domain_info structure and let the core code handle the update. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20221111122014.352437595@linutronix.de
2022-01-21	PCI/MSI: Prevent UAF in error path	Thomas Gleixner
	When the core MSI allocation fails, then the PCI/MSI code uses an already freed MSI descriptor to unmask the MSI mask register in order to bring it back into reset state. Remove MSI_FLAG_FREE_MSI_DESCS from the PCI/MSI irqdomain flags and let the PCI/MSI code free the MSI descriptors after usage. Fixes: 0f62d941acf9 ("genirq/msi: Provide msi_domain_alloc/free_irqs_descs_locked()") Reported-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Tong Zhang <ztong0001@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/87r1938vbn.ffs@tglx
2021-12-16	PCI/MSI: Use msi_on_each_desc()	Thomas Gleixner
	Use the new iterator functions which pave the way for dynamically extending MSI-X vectors. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211206210748.142603657@linutronix.de
2021-12-16	PCI/MSI: Let core code free MSI descriptors	Thomas Gleixner
	Set the domain info flag which tells the core code to free the MSI descriptors from msi_domain_free_irqs() and add an explicit call to the core function into the legacy code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211206210748.089085131@linutronix.de
2021-12-16	PCI/MSI: Protect MSI operations	Thomas Gleixner
	To prepare for dynamic extension of MSI-X vectors, protect the MSI operations for MSI and MSI-X. This requires to move the invocation of irq_create_affinity_masks() out of the descriptor lock section to avoid reverse lock ordering vs. CPU hotplug lock as some callers of the PCI/MSI allocation interfaces already hold it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211206210747.982292705@linutronix.de
2021-12-16	PCI/MSI: Provide MSI_FLAG_MSIX_CONTIGUOUS	Thomas Gleixner
	Provide a domain info flag which makes the core code check for a contiguous MSI-X index on allocation. That's simpler than checking it at some other domain callback in architecture code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211210221814.662401116@linutronix.de
2021-12-16	PCI/MSI: Use msi_desc::msi_index	Thomas Gleixner
	The usage of msi_desc::pci::entry_nr is confusing at best. It's the index into the MSI[X] descriptor table. Use msi_desc::msi_index which is shared between all MSI incarnations instead of having a PCI specific storage for no value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211210221814.602911509@linutronix.de
2021-12-16	PCI/MSI: Let the irq code handle sysfs groups	Thomas Gleixner
	Set the domain info flag which makes the core code handle sysfs groups and put an explicit invocation into the legacy code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Michael Kelley <mikelley@microsoft.com> Tested-by: Nishanth Menon <nm@ti.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211210221814.048612053@linutronix.de
2021-12-09	genirq/msi: Handle PCI/MSI allocation fail in core code	Thomas Gleixner
	Get rid of yet another irqdomain callback and let the core code return the already available information of how many descriptors could be allocated. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI Link: https://lore.kernel.org/r/20211206210225.046615302@linutronix.de
2021-12-09	PCI/MSI: Make pci_msi_domain_check_cap() static	Thomas Gleixner
	No users outside of that file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211206210224.980989243@linutronix.de
2021-12-09	PCI/MSI: Split out irqdomain code	Thomas Gleixner
	Move the irqdomain specific code into its own file. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/20211206210224.817754783@linutronix.de