summaryrefslogtreecommitdiff
path: root/drivers/pci
AgeCommit message (Collapse)Author
2024-07-19Merge branch 'pci/controller/al'Bjorn Helgaas
- Check IORESOURCE_BUS existence to avoid NULL pointer dereference (Aleksandr Mishin) * pci/controller/al: PCI: al: Check IORESOURCE_BUS existence during probe
2024-07-19Merge branch 'pci/controller/dwc'Bjorn Helgaas
- Use msleep() in DWC core instead of usleep_range() for ~100 ms sleep (Konrad Dybcio) - Fix iATU slot management to avoid using the wrong slot after PERST# assert/deassert, which could potentially cause DMA to go the wrong place (Frank Li) - Consolidate dw_pcie_prog_outbound_atu() arguments into a struct to ease adding new functionality like initiating Message TLPs (Yoshihiro Shimoda) - Add support for endpoints to initiate PCIe messages (Yoshihiro Shimoda) - Add #defines for PCIe INTx messages (Yoshihiro Shimoda) - Add support for endpoints to initiate PCIe PME_Turn_Off messages for system suspend (Frank Li) - Add dw_pcie_ep_linkdown() to reinitialize registers that are lost when the link goes down (Manivannan Sadhasivam) - Use dw_pcie_ep_linkdown() to reinitialize qcom non-sticky registers that are lost when the link goes down (Manivannan Sadhasivam) - Enforce DWC limitation that 64-bit BARs must start with the even numbered BAR (Niklas Cassel) * pci/controller/dwc: PCI: dwc: ep: Enforce DWC specific 64-bit BAR limitation PCI: layerscape-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link Down event PCI: qcom-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link Down event PCI: dwc: ep: Remove dw_pcie_ep_init_notify() wrapper PCI: dwc: ep: Add a generic dw_pcie_ep_linkdown() API to handle Link Down event PCI: dwc: Add generic MSG TLP support for sending PME_Turn_Off when system suspend PCI: Add PCIE_MSG_CODE_PME_TURN_OFF message macro PCI: Add PCIE_MSG_CODE_ASSERT_INTx message macros PCI: dwc: Add outbound MSG TLPs support PCI: dwc: Consolidate args of dw_pcie_prog_outbound_atu() into a structure PCI: dwc: Fix index 0 incorrectly being interpreted as a free ATU slot PCI: dwc: Use msleep() in dw_pcie_wait_for_link()
2024-07-19Merge branch 'pci/controller/gpio'Bjorn Helgaas
- Include <linux/irqchip/chained_irq.h> in dra7xx to avoid implicitly including it elsewhere (Andy Shevchenko) - Remove unused <linux/of_gpio.h> from aardvark and dwc drivers (dra7xx, meson, qcom, tegra194) (Andy Shevchenko) - Convert kirin to use scoped for_each_available_child_of_node() to ease future error exits (Javier Carrasco) - Convert imx6 and kirin to use the agnostic GPIO API to simplify GPIO setup and remove usage of the deprecated of_gpio.h API (Andy Shevchenko) * pci/controller/gpio: PCI: kirin: Convert to use agnostic GPIO API PCI: kirin: Convert kirin_pcie_parse_port() to scoped iterator PCI: imx6: Convert to use agnostic GPIO API PCI: dwc: Remove unused of_gpio.h inclusion PCI: aardvark: Remove unused of_gpio.h inclusion PCI: dra7xx: Add missing chained IRQ header inclusion
2024-07-19Merge branch 'pci/endpoint'Bjorn Helgaas
- Remove unused struct pci_epf_group.type_group (Christophe JAILLET) - Use cached epc_features instead of pci_epc_get_features() to avoid having to check for failure (potential NULL pointer dereference) (Manivannan Sadhasivam) - Drop pointless local msix_capable variable in pci_epf_test_alloc_space() (Manivannan Sadhasivam) - Rename struct pci_epc_event_ops.core_init to .epc_init, since "core" is no longer meaningful here (Manivannan Sadhasivam) - Rename pci_epc_bme_notify(), pci_epf_mhi_bme(), pci_epc_bme_notify() to spell out "bus_master_enable" instead of "bme" (Manivannan Sadhasivam) - Factor pci_epf_test_clear_bar() and pci_epf_test_free_space() out of pci_epf_test_unbind() so they can be reused elsewhere (Manivannan Sadhasivam) - Move DMA initialization to the pci_epf_mhi_epc_init() callback so endpoint drivers do this uniformly (Manivannan Sadhasivam) - Add endpoint testing for Link Down events (Manivannan Sadhasivam) - Add 'epc_deinit' event so endpoints that can be reset via PERST# (qcom, tegra194) can notify EPF drivers when this happens (Manivannan Sadhasivam) - Make pci_epc_class constant (Greg Kroah-Hartman) - Fix vpci_scan_bus() error checking to print error for failure (not success) and clean up after failure (Dan Carpenter) - Fix epf_ntb_epc_cleanup() error handling to clean up scratchpad BARs and clean up in mirror order of allocation (Dan Carpenter) - Add rk3588, which requires 64KB BAR alignment, to pci_endpoint_test (Niklas Cassel) - Use memcpy_toio()/memcpy_fromio() for endpoint BAR tests to improve performance (Niklas Cassel) - Set DMA mask to 48 bits always to simplify endpoint test, since there's there's no need to check for error or to fallback to 32 bits (Frank Li) - Suggest using programmable Vendor/Device ID (when supported) to use pci_endpoint_test without having to add new entries (Yoshihiro Shimoda) - Remove unused pci_endpoint_test_bar_{readl,writel}() (Jiapeng Chong) - Remove 'linkup' and add 'add_cfs' to the endpoint function driver 'ops' documentation to match the code (Alexander Stein) - * pci/endpoint: Documentation: PCI: pci-endpoint: Fix EPF ops list misc: pci_endpoint_test: Remove unused pci_endpoint_test_bar_{readl,writel} functions misc: pci_endpoint_test: Document policy about adding pci_device_id misc: pci_endpoint_test: Refactor dma_set_mask_and_coherent() logic misc: pci_endpoint_test: Use memcpy_toio()/memcpy_fromio() for BAR tests misc: pci_endpoint_test: Add support for Rockchip rk3588 PCI: endpoint: Fix error handling in epf_ntb_epc_cleanup() PCI: endpoint: Clean up error handling in vpci_scan_bus() PCI: endpoint: Make pci_epc_class struct constant PCI: endpoint: Introduce 'epc_deinit' event and notify the EPF drivers PCI: endpoint: pci-epf-test: Handle Link Down event PCI: endpoint: pci-epf-{mhi/test}: Move DMA initialization to EPC init callback PCI: endpoint: pci-epf-test: Refactor pci_epf_test_unbind() function PCI: endpoint: Rename BME to Bus Master Enable PCI: endpoint: Rename core_init() callback in 'struct pci_epc_event_ops' to epc_init() PCI: endpoint: pci-epf-test: Use 'msix_capable' flag directly in pci_epf_test_alloc_space() PCI: endpoint: pci-epf-test: Make use of cached 'epc_features' in pci_epf_test_core_init() PCI: endpoint: Remove unused field in struct pci_epf_group
2024-07-19Merge branch 'pci/resource'Bjorn Helgaas
- Rename find_resource() to find_resource_space() to make it more descriptive for exporting outside resource.c (Ilpo Järvinen) - Document find_resource_space() and the resource_constraint struct it uses (Ilpo Järvinen) - Add typedef resource_alignf to make it simpler to declare allocation constraint alignf callbacks (Ilpo Järvinen) - Open-code the no-constraint simple alignment case to make the simple_align_resource() default callback unnecessary (Ilpo Järvinen) - Export find_resource_space() because PCI bridge window allocation needs to learn whether there's space for a window (Ilpo Järvinen) - Fix a double-counting problem in PCI calculate_memsize() that led to allocating larger windows each time a bus was removed and rescanned (Ilpo Järvinen) - When we don't have space to allocate larger bridge windows, allocate windows only large enough for the downstream devices to prevent cases where a device worked originally, but not after being removed and re-added (Ilpo Järvinen) * pci/resource: PCI: Relax bridge window tail sizing rules PCI: Make minimum bridge window alignment reference more obvious PCI: Fix resource double counting on remove & rescan resource: Export find_resource_space() resource: Handle simple alignment inside __find_resource_space() resource: Use typedef for alignf callback resource: Document find_resource_space() and resource_constraint resource: Rename find_resource() to find_resource_space()
2024-07-19Merge branch 'pci/reset'Bjorn Helgaas
- Warn about doing a Secondary Bus Reset without holding the device lock (Dan Williams) - Lock bridge in addition to downstream hierarchy before doing a Secondary Bus Reset (Dan Williams) * pci/reset: PCI: Add missing bridge lock to pci_bus_lock() PCI: Warn on missing cfg_access_lock during secondary bus reset
2024-07-19Merge branch 'pci/hotplug'Bjorn Helgaas
- Detect if a device was removed or replaced during system sleep so we don't assume a new device is the one that used to be there. This uses Vendor/Device/Subsystem/Class/Revision and Device Serial Number (if implemented), so it's not fool-proof and drivers may know how to detect more cases (Lukas Wunner) - Add missing MODULE_DESCRIPTION() macro (Jeff Johnson) * pci/hotplug: PCI: acpiphp: Add missing MODULE_DESCRIPTION() macro PCI: pciehp: Detect device replacement during system sleep
2024-07-19Merge branch 'pci/err'Bjorn Helgaas
- Disable AER and DPC during suspend so that if they share an interrupt with PME and errors occur during suspend, the AER or DPC interrupt doesn't cause spurious wakeups (Kai-Heng Feng) * pci/err: PCI/DPC: Disable DPC service on suspend PCI/AER: Disable AER service on suspend
2024-07-19Merge branch 'pci/enumeration'Bjorn Helgaas
- Move the PRESERVE_BOOT_CONFIG ACPI _DSM evaluation from drivers/acpi to drivers/pci so we can unify with similar DT functionality (Vidya Sagar) - Add of_pci_preserve_config() to check for a DT "linux,pci-probe-only" property on a per-host bridge basis in addition to a global basis (Vidya Sagar) - Unify ACPI PRESERVE_BOOT_CONFIG _DSM and DT "linux,pci-probe-only" in a generic pci_preserve_config() path (Vidya Sagar) * pci/enumeration: PCI: Use preserve_config in place of pci_flags PCI: Unify ACPI and DT 'preserve config' support PCI: of: Add of_pci_preserve_config() for per-host bridge support PCI: Move PRESERVE_BOOT_CONFIG _DSM evaluation to pci_register_host_bridge()
2024-07-19Merge branch 'pci/dpc'Bjorn Helgaas
- If there's a device below a bridge, prevent a use-after-free by holding a reference to the device while waiting for the secondary bus to be ready in case the device is concurrently removed, e.g., by DPC (Lukas Wunner) * pci/dpc: PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
2024-07-19Merge branch 'pci/devres'Bjorn Helgaas
- Add pcim_add_mapping_to_legacy_table() and pcim_remove_mapping_from_legacy_table() helper functions to simplify devres iomap table (Philipp Stanner) - Reimplement devres that take a bit mask of BARs in a way that can be used to map partial BARs as well as entire BARs (Philipp Stanner) - Deprecate pcim_iomap_table() and pcim_iomap_regions_request_all() in favor of pcim_* request plus pcim_* mapping (Philipp Stanner) - Add pcim_request_region(), a managed interface to request a single BAR (Philipp Stanner) - Use the existing pci_is_enabled() interface to replace the struct devres.enabled bit (Philipp Stanner) - Move the struct pci_devres.pinned bit to struct pci_dev (Philipp Stanner) - Reimplement pcim_set_mwi() so it uses its own devres cleanup callback instead of a special-purpose bit in struct pci_devres (Philipp Stanner) - Add pcim_intx(), which is unambiguously managed, unlike pci_intx(), which is managed if pcim_enable_device() has been called but unmanaged otherwise (Philipp Stanner) - Remove pcim_release(), which is no longer needed after previous cleanups of pcim_set_mwi() and pci_intx() (Philipp Stanner) - Add pcim_iomap_range(), a managed interface to map part of a BAR (Philipp Stanner) - Fix vboxvideo leak by using the new pcim_iomap_range() instead of the unmanaged pci_iomap_range() (Philipp Stanner) * pci/devres: drm/vboxvideo: fix mapping leaks PCI: Add managed pcim_iomap_range() PCI: Remove legacy pcim_release() PCI: Add managed pcim_intx() PCI: Give pcim_set_mwi() its own devres cleanup callback PCI: Move struct pci_devres.pinned bit to struct pci_dev PCI: Remove struct pci_devres.enabled status bit PCI: Document hybrid devres hazards PCI: Add managed pcim_request_region() PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all() PCI: Add managed partial-BAR request and map infrastructure PCI: Add devres helpers for iomap table PCI: Add and use devres helper for bit masks
2024-07-12PCI: Extend ACS configurabilityVidya Sagar
PCIe ACS settings control the level of isolation and the possible P2P paths between devices. With greater isolation the kernel will create smaller iommu_groups and with less isolation there is more HW that can achieve P2P transfers. From a virtualization perspective all devices in the same iommu_group must be assigned to the same VM as they lack security isolation. There is no way for the kernel to automatically know the correct ACS settings for any given system and workload. Existing command line options (e.g., disable_acs_redir) allow only for large scale change, disabling all isolation, but this is not sufficient for more complex cases. Add a kernel command-line option 'config_acs' to directly control all the ACS bits for specific devices, which allows the operator to setup the right level of isolation to achieve the desired P2P configuration. The definition is future proof; when new ACS bits are added to the spec the open syntax can be extended. ACS needs to be setup early in the kernel boot as the ACS settings affect how iommu_groups are formed. iommu_group formation is a one time event during initial device discovery, so changing ACS bits after kernel boot can result in an inaccurate view of the iommu_groups compared to the current isolation configuration. ACS applies to PCIe Downstream Ports and multi-function devices. The default ACS settings are strict and deny any direct traffic between two functions. This results in the smallest iommu_group the HW can support. Frequently these values result in slow or non-working P2PDMA. ACS offers a range of security choices controlling how traffic is allowed to go directly between two devices. Some popular choices: - Full prevention - Translated requests can be direct, with various options - Asymmetric direct traffic, A can reach B but not the reverse - All traffic can be direct Along with some other less common ones for special topologies. The intention is that this option would be used with expert knowledge of the HW capability and workload to achieve the desired configuration. Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> [bhelgaas: add example, tidy printk formats] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-12PCI: Add missing bridge lock to pci_bus_lock()Dan Williams
One of the true positives that the cfg_access_lock lockdep effort identified is this sequence: WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70 RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70 Call Trace: <TASK> ? __warn+0x8c/0x190 ? pci_bridge_secondary_bus_reset+0x5d/0x70 ? report_bug+0x1f8/0x200 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? pci_bridge_secondary_bus_reset+0x5d/0x70 pci_reset_bus+0x1d8/0x270 vmd_probe+0x778/0xa10 pci_device_probe+0x95/0x120 Where pci_reset_bus() users are triggering unlocked secondary bus resets. Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses pci_bus_lock() before issuing the reset which locks everything *but* the bridge itself. For the same motivation as adding: bridge = pci_upstream_bridge(dev); if (bridge) pci_dev_lock(bridge); to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add pci_dev_lock() for @bus->self to pci_bus_lock(). Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com Reported-by: Imre Deak <imre.deak@intel.com> Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org> [bhelgaas: squash in recursive locking deadlock fix from Keith Busch: https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11PCI: Add managed pcim_iomap_range()Philipp Stanner
The only managed mapping function currently is pcim_iomap() which doesn't allow for mapping an area starting at a certain offset, which many drivers want. Add pcim_iomap_range() as an exported function. Link: https://lore.kernel.org/r/20240613115032.29098-13-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-11PCI: Remove legacy pcim_release()Philipp Stanner
Thanks to preceding cleanup steps, pcim_release() is now not needed anymore and can be replaced by pcim_disable_device(), which is the exact counterpart to pcim_enable_device(). This permits removing further parts of the old PCI devres implementation. Replace pcim_release() with pcim_disable_device(). Remove the now unused function get_pci_dr(). Remove the struct pci_devres from pci.h. Link: https://lore.kernel.org/r/20240613115032.29098-12-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-11PCI: Add managed pcim_intx()Philipp Stanner
pci_intx() is a "hybrid" function, i.e., it is managed if pcim_enable_device() has been called, but unmanaged otherwise. Add pcim_intx(), which is always managed, and implement pci_intx() using it. Remove the now-unused struct pci_devres.orig_intx and .restore_intx and find_pci_dr(). Link: https://lore.kernel.org/r/20240613115032.29098-11-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> [kwilczynski: squashed in https://lore.kernel.org/r/426645d40776198e0fcc942f4a6cac4433c7a9aa.camel@redhat.com to fix problem reported and tested by Ashish Kalra <Ashish.Kalra@amd.com>: https://lore.kernel.org/r/20240708214656.4721-1-Ashish.Kalra@amd.com https://lore.kernel.org/r/8c4634e9-4f02-4c54-9c89-d75e2f4bf026@amd.com/] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Give pcim_set_mwi() its own devres cleanup callbackPhilipp Stanner
Managing pci_set_mwi() with devres can easily be done with its own callback, without the necessity to store any state about it in a device-related struct. Remove the MWI state from struct pci_devres. Give pcim_set_mwi() a separate devres cleanup callback. Link: https://lore.kernel.org/r/20240613115032.29098-10-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Move struct pci_devres.pinned bit to struct pci_devPhilipp Stanner
The bit describing whether the PCI device is currently pinned is stored in struct pci_devres. To clean up and simplify the PCI devres API, it's better if this information is stored in struct pci_dev. This will later permit simplifying pcim_enable_device(). Move the 'pinned' boolean bit to struct pci_dev. Restructure bits in struct pci_dev so the pm / pme fields are next to each other. Link: https://lore.kernel.org/r/20240613115032.29098-9-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Remove struct pci_devres.enabled status bitPhilipp Stanner
The struct pci_devres has a separate boolean to track whether a device is enabled. That, however, can easily be tracked in an agnostic manner through the function pci_is_enabled(). Using it allows for simplifying the PCI devres implementation. Replace the separate 'enabled' status bit from struct pci_devres with calls to pci_is_enabled() at the appropriate places. Link: https://lore.kernel.org/r/20240613115032.29098-8-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Document hybrid devres hazardsPhilipp Stanner
These functions: pci_request_region() pci_request_regions() pci_request_regions_exclusive() pci_request_selected_regions() pci_request_selected_regions_exclusive() pci_intx() are "hybrid" functions that are managed if pcim_enable_device() has been called, but unmanaged otherwise. This is confusing and has already caused a bug (in 8558de401b5f ("drm/vboxvideo: use managed pci functions")) because users believe all PCI functions, such as pci_iomap_range(), can become managed that way, which is not the case. Add comments to the relevant functions' docstrings that warn users about this behavior. Link: https://lore.kernel.org/r/20240613115032.29098-7-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add managed pcim_request_region()Philipp Stanner
These existing functions: pci_request_region() pci_request_selected_regions() pci_request_selected_regions_exclusive() are "hybrid" functions built on __pci_request_region() and are managed if pcim_enable_device() has been called, but unmanaged otherwise. Add these new functions: pcim_request_region() pcim_request_region_exclusive() These are *always* managed and use the new pcim_addr_devres tracking infrastructure instead of find_pci_dr() and struct pci_devres.region_mask. Implement the hybrid functions using the new "pure" functions and remove struct pci_devres.region_mask, which is no longer needed. Link: https://lore.kernel.org/r/20240613115032.29098-6-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Deprecate pcim_iomap_table(), pcim_iomap_regions_request_all()Philipp Stanner
Deprecate pcim_iomap_table(). It returns a pointer to a table of ioremapped BARs, or NULL if it fails. This makes uses like this: addr = pcim_iomap_table(pdev)[0]; problematic because it causes a NULL pointer dereference on failure. Callers should use pcim_iomap() instead. Deprecate pcim_iomap_regions_request_all() because it is built on __pci_request_region() and is managed if pcim_enable_device() has been called, but unmanaged otherwise, which is prone to errors. Callers should either use pcim_iomap_regions() to request and map BARs, or use pcim_request_region() followed by pcim_iomap(). Link: https://lore.kernel.org/r/20240613115032.29098-5-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log, sphinx markup] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add managed partial-BAR request and map infrastructurePhilipp Stanner
The pcim_iomap_devres table tracks entire-BAR mappings, so we can't use it to build a managed version of pci_iomap_range(), which maps partial BARs. Add struct pcim_addr_devres, which can track request and mapping of both entire BARs and partial BARs. Add the following internal devres functions based on struct pcim_addr_devres: pcim_iomap_region() # request & map entire BAR pcim_iounmap_region() # unmap & release entire BAR pcim_request_region() # request entire BAR pcim_release_region() # release entire BAR pcim_request_all_regions() # request all entire BARs pcim_release_all_regions() # release all entire BARs Rework the following public interfaces using the new infrastructure listed above: pcim_iomap() # map partial BAR pcim_iounmap() # unmap partial BAR pcim_iomap_regions() # request & map specified BARs pcim_iomap_regions_request_all() # request all BARs, map specified BARs pcim_iounmap_regions() # unmap & release specified BARs Link: https://lore.kernel.org/r/20240613115032.29098-4-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add devres helpers for iomap tablePhilipp Stanner
The pcim_iomap_devres.table administrated by pcim_iomap_table() has its entries set and unset at several places throughout devres.c using manual iterations which are effectively code duplications. Add pcim_add_mapping_to_legacy_table() and pcim_remove_mapping_from_legacy_table() helper functions and use them where possible. Link: https://lore.kernel.org/r/20240613115032.29098-3-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-10PCI: Add and use devres helper for bit masksPhilipp Stanner
The current devres implementation uses manual shift operations to check whether a bit in a mask is set. The code can be made more readable by writing a small helper function for that. Implement mask_contains_bar() and use it where applicable. Link: https://lore.kernel.org/r/20240613115032.29098-2-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-07-09PCI: dwc: ep: Enforce DWC specific 64-bit BAR limitationNiklas Cassel
From the DWC EP databook 5.96a, section "3.5.7.1.4 General Rules for BAR Setup (Fixed Mask or Programmable Mask Schemes Only)": "Any pair (for example BARs 0 and 1) can be configured as one 64-bit BAR, two 32-bit BARs, or one 32-bit BAR." "BAR pairs cannot overlap to form a 64-bit BAR. For example, you cannot combine BARs 1 and 2 to form a 64-bit BAR." While this limitation does exist in some other PCI endpoint controllers, e.g. cdns_pcie_ep_set_bar(), the limitation does not appear to be defined in the PCIe specification itself, thus add an explicit check for this in dw_pcie_ep_set_bar() (rather than pci_epc_set_bar()). Link: https://lore.kernel.org/linux-pci/20240528134839.8817-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-07-09PCI: layerscape-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link ↵Manivannan Sadhasivam
Down event Now that dw_pcie_ep_linkdown() is available, use it. This also handles the reinitialization of DWC non-sticky registers in addition to sending the notification to EPF drivers. Closes: https://lore.kernel.org/linux-pci/20240528195539.GA458945@bhelgaas Link: https://lore.kernel.org/linux-pci/20240606-pci-deinit-v1-5-4395534520dc@linaro.org Reported-by: Bjorn Helgaas <helgaas@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Niklas Cassel <cassel@kernel.org>
2024-07-09PCI: qcom-ep: Use the generic dw_pcie_ep_linkdown() API to handle Link Down ↵Manivannan Sadhasivam
event Now that the generic dw_pcie_ep_linkdown() API is available, use it. This also handles the reinitialization of DWC non-sticky registers in addition to sending the notification to EPF drivers. Link: https://lore.kernel.org/linux-pci/20240430-pci-epf-rework-v4-9-22832d0d456f@linaro.org Tested-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org>
2024-07-09PCI: dwc: ep: Remove dw_pcie_ep_init_notify() wrapperManivannan Sadhasivam
Currently dw_pcie_ep_init_notify() wrapper just calls pci_epc_init_notify() directly, so this wrapper provides no benefit to the glue drivers. Remove it and call pci_epc_init_notify() directly from glue drivers. Suggested-by: Bjorn Helgaas <helgaas@kernel.org> Link: https://lore.kernel.org/linux-pci/20240606-pci-deinit-v1-1-4395534520dc@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Niklas Cassel <cassel@kernel.org>
2024-07-09PCI: dwc: ep: Add a generic dw_pcie_ep_linkdown() API to handle Link Down eventManivannan Sadhasivam
Per PCIe r6.0, sec 5.2, a Link Down event can happen under any of the following circumstances: 1. Fundamental/Hot reset 2. Link disable transmission by upstream component 3. Moving from L2/L3 to L0 In those cases, Link Down causes some non-sticky DWC registers to lose the state (like REBAR, etc.), so drivers need to reinitialize them to function properly once the link comes back again. This is not a problem for drivers supporting PERST# IRQ, since they can reinitialize the registers in the PERST# IRQ callback. But for the drivers not supporting PERST#, there is no way they can reinitialize the registers other than relying on Link Down IRQ received when the link goes down. So add a DWC generic API dw_pcie_ep_linkdown() that reinitializes the non-sticky registers and also notifies the EPF drivers about link going down. This API can also be used by the drivers supporting PERST# to handle the scenario (2) mentioned above. NOTE: For the sake of code organization, move the dw_pcie_ep_linkup() definition just above dw_pcie_ep_linkdown(). Link: https://lore.kernel.org/linux-pci/20240430-pci-epf-rework-v4-8-22832d0d456f@linaro.org Tested-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: update spec citation] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org>
2024-07-09PCI: dwc: Add generic MSG TLP support for sending PME_Turn_Off when system ↵Frank Li
suspend Instead of relying on the vendor specific implementations to send the PME_Turn_Off message, introduce a generic way of sending the message using the MSG TLP. This is achieved by reserving a region for MSG TLP of size 'pci->region_align', at the end of the first IORESOURCE_MEM window of the host bridge. And then sending the PME_Turn_Off message during system suspend with the help of iATU. The reason for reserving the MSG TLP region at the end of the IORESOURCE_MEM is to avoid generating holes in between, because when the region is allocated using allocate_resource(), memory will be allocated from the start of the window. Later, if memory gets allocated for an endpoint of size bigger than 'region_align', there will be a hole between MSG TLP region and endpoint memory. This generic implementation is optional for the glue drivers and can be overridden by a custom 'pme_turn_off' callback. Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-5-a54265c39742@nxp.com Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-07-09PCI: Add PCIE_MSG_CODE_PME_TURN_OFF message macroFrank Li
Add PCIE_MSG_CODE_PME_TURN_OFF macros to enable a PCIe host driver to send PME_Turn_Off messages. Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-4-a54265c39742@nxp.com Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-07-09PCI: Add PCIE_MSG_CODE_ASSERT_INTx message macrosYoshihiro Shimoda
Add "Message Routing" and "INTx Mechanism Messages" macros to enable a PCIe driver to send messages for INTx Interrupt Signaling. Values from PCIe r6.1, sec 2.2.8 and 2.2.8.1. Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-1-a54265c39742@nxp.com Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
2024-07-09PCI: dwc: Add outbound MSG TLPs supportYoshihiro Shimoda
Add "code" and "routing" into struct dw_pcie_ob_atu_cfg for triggering INTx IRQs by iATU in the PCIe endpoint mode in near the future. PCIE_ATU_INHIBIT_PAYLOAD is set to issue TLP type of Msg instead of MsgD. This implementation supports the data-less messages only for now. Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-3-a54265c39742@nxp.com Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
2024-07-09PCI: dwc: Consolidate args of dw_pcie_prog_outbound_atu() into a structureYoshihiro Shimoda
This is a preparation before adding the Msg-type outbound iATU mapping. The respective update will require two more arguments added to __dw_pcie_prog_outbound_atu(). That will make the already complicated function prototype even more hard to comprehend accepting _eight_ arguments. To prevent that and keep the code more-or-less readable, move all the outbound iATU-related arguments to a new config structure: struct dw_pcie_ob_atu_cfg, and pass a pointer to dw_pcie_prog_outbound_atu(). The structure should be locally defined and populated with the outbound iATU settings implied by the caller context. As a result of this change there is no longer need in having the two distinctive methods for the Host and Endpoint outbound iATU setups since the code can directly call the dw_pcie_prog_outbound_atu() method with the config structure populated, so drop dw_pcie_prog_ep_outbound_atu(). [kwilczynski: commit log] Link: https://lore.kernel.org/linux-pci/20240418-pme_msg-v8-2-a54265c39742@nxp.com Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
2024-07-09PCI: dwc: Fix index 0 incorrectly being interpreted as a free ATU slotFrank Li
When PERST# assert and deassert happens on the PERST# supported platforms, both iATU0 and iATU6 will map inbound window to BAR0. DMA will access the area that was previously allocated (iATU0) for BAR0, instead of the new area (iATU6) for BAR0. Right now, this isn't an issue because both iATU0 and iATU6 should translate inbound accesses to BAR0 to the same allocated memory area. However, having two separate inbound mappings for the same BAR is a disaster waiting to happen. The mappings between PCI BAR and iATU inbound window are maintained in the dw_pcie_ep::bar_to_atu[] array. While allocating a new inbound iATU map for a BAR, dw_pcie_ep_inbound_atu() API checks for the availability of the existing mapping in the array and if it is not found (i.e., value in the array indexed by the BAR is found to be 0), it allocates a new map value using find_first_zero_bit(). The issue is the existing logic failed to consider the fact that the map value '0' is a valid value for BAR0, so find_first_zero_bit() will return '0' as the map value for BAR0 (note that it returns the first zero bit position). Due to this, when PERST# assert + deassert happens on the PERST# supported platforms, the inbound window allocation restarts from BAR0 and the existing logic to find the BAR mapping will return '6' for BAR0 instead of '0' due to the fact that it considers '0' as an invalid map value. Fix this issue by always incrementing the map value before assigning to bar_to_atu[] array and then decrementing it while fetching. This will make sure that the map value '0' always represents the invalid mapping." Fixes: 4284c88fff0e ("PCI: designware-ep: Allow pci_epc_set_bar() update inbound map address") Closes: https://lore.kernel.org/linux-pci/ZXsRp+Lzg3x%2Fnhk3@x1-carbon/ Link: https://lore.kernel.org/linux-pci/20240412160841.925927-1-Frank.Li@nxp.com Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com> Tested-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
2024-07-09PCI: dwc: Use msleep() in dw_pcie_wait_for_link()Konrad Dybcio
According to [1], msleep should be used for large sleeps, such as the 100-ish ms one in this function. Comply with the guide and use it. [1] https://docs.kernel.org/timers/timers-howto.html [kwilczynski: commit log] Link: https://lore.kernel.org/linux-pci/20240215-topic-pci_sleep-v2-1-79334884546b@linaro.org Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-07-09PCI: kirin: Convert to use agnostic GPIO APIAndy Shevchenko
The of_gpio.h legacy API is going to be removed. In preparation for that, convert the driver to the agnostic API. [kwilczynski: commit log] Link: https://lore.kernel.org/linux-pci/20240506142142.4042810-6-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org>
2024-07-09PCI: kirin: Convert kirin_pcie_parse_port() to scoped iteratorJavier Carrasco
Convert loops in kirin_pcie_parse_port() to use the _scoped() version of for_each_available_child_of_node() so the refcounts of children are implicitly decremented when the loop is exited. No functional change intended here, but it will make future error exits from these loops easier. Link: https://lore.kernel.org/linux-pci/20240609-pcie-kirin-memleak-v1-1-62b45b879576@gmail.com Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: move to GPIO series to avoid bisection hole, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-07-04PCI: endpoint: Fix error handling in epf_ntb_epc_cleanup()Dan Carpenter
There are two issues related to epf_ntb_epc_cleanup(): 1) It should call epf_ntb_config_sspad_bar_clear() 2) The epf_ntb_bind() function should call epf_ntb_epc_cleanup() to cleanup. I also changed the ordering a bit. Unwinding should be done in the mirror order from how they are allocated. Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP") Link: https://lore.kernel.org/linux-pci/aaffbe8d-7094-4083-8146-185f4a84e8a1@moroto.mountain Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-04PCI: endpoint: Clean up error handling in vpci_scan_bus()Dan Carpenter
Smatch complains about inconsistent NULL checking in vpci_scan_bus(): drivers/pci/endpoint/functions/pci-epf-vntb.c:1024 vpci_scan_bus() error: we previously assumed 'vpci_bus' could be null (see line 1021) Instead of printing an error message and then crashing we should return an error code and clean up. Also the NULL check is reversed so it prints an error for success instead of failure. Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP") Link: https://lore.kernel.org/linux-pci/68e0f6a4-fd57-45d0-945b-0876f2c8cb86@moroto.mountain Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2024-07-04PCI: endpoint: Make pci_epc_class struct constantGreg Kroah-Hartman
Now that the driver core allows for struct class to be in read-only memory, we should make all 'class' structures declared at build time placing them into read-only memory, instead of having to be dynamically allocated at runtime. Link: https://lore.kernel.org/linux-pci/2024061011-citable-herbicide-1095@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2024-07-04PCI: endpoint: Introduce 'epc_deinit' event and notify the EPF driversManivannan Sadhasivam
As like the 'epc_init' event, that is used to signal the EPF drivers about the EPC initialization, let's introduce 'epc_deinit' event that is used to signal EPC deinitialization. The EPC deinitialization applies only when any sort of fundamental reset is supported by the endpoint controller as per the PCIe spec. Reference: PCIe r6.0, sec 4.2.5.9.1 and 6.6.1. Currently, some EPC drivers like pcie-qcom-ep and pcie-tegra194 support PERST# as the fundamental reset. So the 'deinit' event will be notified to the EPF drivers when PERST# assert happens in the above mentioned EPC drivers. The EPF drivers, on receiving the event through the epc_deinit() callback should reset the EPF state machine and also cleanup any configuration that got affected by the fundamental reset like BAR, DMA etc... This change also warrants skipping the cleanups in unbind() if already done in epc_deinit(). Link: https://lore.kernel.org/r/20240606-pci-deinit-v1-2-4395534520dc@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2024-07-01PCI/DPC: Fix use-after-free on concurrent DPC and hot-removalLukas Wunner
Keith reports a use-after-free when a DPC event occurs concurrently to hot-removal of the same portion of the hierarchy: The dpc_handler() awaits readiness of the secondary bus below the Downstream Port where the DPC event occurred. To do so, it polls the config space of the first child device on the secondary bus. If that child device is concurrently removed, accesses to its struct pci_dev cause the kernel to oops. That's because pci_bridge_wait_for_secondary_bus() neglects to hold a reference on the child device. Before v6.3, the function was only called on resume from system sleep or on runtime resume. Holding a reference wasn't necessary back then because the pciehp IRQ thread could never run concurrently. (On resume from system sleep, IRQs are not enabled until after the resume_noirq phase. And runtime resume is always awaited before a PCI device is removed.) However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset"), which introduced that, failed to appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a reference on the child device because dpc_handler() and pciehp may indeed run concurrently. The commit was backported to v5.10+ stable kernels, so that's the oldest one affected. Add the missing reference acquisition. Abridged stack trace: BUG: unable to handle page fault for address: 00000000091400c0 CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0 RIP: pci_bus_read_config_dword+0x17/0x50 pci_dev_wait() pci_bridge_wait_for_secondary_bus() dpc_reset_link() pcie_do_recovery() dpc_handler() Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset") Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/ Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.1718707743.git.lukas@wunner.de Reported-by: Keith Busch <kbusch@kernel.org> Tested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org # v5.10+
2024-06-18PCI/DPC: Disable DPC service on suspendKai-Heng Feng
If the link is powered off during suspend, electrical noise may cause errors that trigger DPC. If the DPC interrupt is enabled and shares an IRQ with PME, that causes a spurious wakeup during suspend. Disable DPC triggering and the DPC interrupt during suspend to prevent this. Clear DPC interrupt status before re-enabling DPC interrupts during resume so we don't get an interrupt for errors that occurred during the suspend/resume process. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Link: https://lore.kernel.org/r/20240416043225.1462548-3-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [bhelgaas: clear status on resume, add comments, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-18PCI/AER: Disable AER service on suspendKai-Heng Feng
If the link is powered off during suspend, electrical noise may cause errors that are logged via AER. If the AER interrupt is enabled and shares an IRQ with PME, that causes a spurious wakeup during suspend. Disable the AER interrupt during suspend to prevent this. Clear error status before re-enabling IRQ interrupts during resume so we don't get an interrupt for errors that occurred during the suspend/resume process. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Link: https://lore.kernel.org/r/20240416043225.1462548-2-kai.heng.feng@canonical.com Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> [bhelgaas: drop pci_ancestor_pr3_present() etc, commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-18PCI: acpiphp: Add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=arm64, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/pci/hotplug/acpiphp_ampere_altra.o Add the missing MODULE_DESCRIPTION(). Link: https://lore.kernel.org/r/20240612-md-drivers-pci-hotplug-v1-1-2b30d14d783d@quicinc.com Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2024-06-12PCI: Relax bridge window tail sizing rulesIlpo Järvinen
During remove & rescan cycle, PCI subsystem will recalculate and adjust the bridge window sizing that was initially done by "BIOS". The size calculation is based on the required alignment of the largest resource among the downstream resources as per pbus_size_mem() (unimportant or zero parameters marked with "..."): min_align = calculate_mem_align(aligns, max_order); size0 = calculate_memsize(size, ..., min_align); inside calculate_memsize(), for the largest alignment: min_align = align1 >> 1; ... return min_align; and then in calculate_memsize(): return ALIGN(max(size, ...), align); If the original bridge window sizing tried to conserve space, this will lead to massive increase of the required bridge window size when the downstream has a large disparity in BAR sizes. E.g., with 16MiB and 16GiB BARs this results in 24GiB bridge window size even if 16MiB BAR does not require gigabytes of space to fit. When doing remove & rescan for a bus that contains such a PCI device, a larger bridge window is suddenly required on rescan but when there is a bridge window upstream that is already assigned based on the original size, it cannot be enlarged to the new requirement. This causes the allocation of the bridge window to fail (0x600000000 > 0x400ffffff): pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] pci 0000:01:00.0: PCI bridge to [bus 02-04] pci 0000:01:00.0: bridge window [mem 0x40400000-0x406fffff] pci 0000:01:00.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] pci 0000:03:00.0: device released pci 0000:02:01.0: device released pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 0 pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 0 pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref] pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref] pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref] pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1 pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1 pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: can't assign; no space pci 0000:02:01.0: bridge window [mem size 0x600000000 64bit pref]: failed to assign pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space pci 0000:03:00.0: BAR 2 [mem size 0x400000000 64bit pref]: failed to assign pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space pci 0000:03:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] This is a major surprise for users who are suddenly left with a device that was working fine with the original bridge window sizing. Even if the already assigned bridge window could be enlarged by reallocation in some cases (something the current code does not attempt to do), it is not possible in general case and the large amount of wasted space at the tail of the bridge window may lead to other resource exhaustion problems on Root Complex level (think of multiple PCIe cards with VFs and BAR size disparity in a single system). PCI BARs only need natural alignment (PCIe r6.1, sec 7.5.1.2.1) and bridge memory windows need 1MiB (sec 7.5.1.3). The current bridge window tail alignment rule was introduced in the commit 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI allocation code (alpha, arm, parisc) [2/2]") that only states: "pbus_size_mem: core stuff; tested with randomly generated sets of resources". It does not explain the motivation for the extra tail space allocated that is not truly needed by the downstream resources. As such, it is far from clear if it ever has been required by any HW. To prevent devices with BAR size disparity from becoming unusable after remove & rescan cycle, attempt to do a truly minimal allocation for memory resources if needed. First check if the normally calculated bridge window will not fit into an already assigned upstream resource. In such case, try with relaxed bridge window tail sizing rules instead where no extra tail space is requested beyond what the downstream resources require. Only enforce the alignment requirement of the bridge window itself (normally 1MiB). With this patch, the resources are successfully allocated: pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: scanning [bus 03-03] behind bridge, pass 1 pcieport 0000:01:00.0: scanning [bus 02-04] behind bridge, pass 1 pcieport 0000:01:00.0: Assigned bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 02-04] cannot fit 0x600000000 required for 0000:02:01.0 bridging to [bus 03] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] to [bus 03] requires relaxed alignment rules pcieport 0000:01:00.0: Assigned bridge window [mem 0x40400000-0x406fffff] to [bus 02-04] free space at [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref]: assigned pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff]: assigned pci 0000:03:00.0: BAR 2 [mem 0x6000000000-0x63ffffffff 64bit pref]: assigned pci 0000:03:00.0: BAR 0 [mem 0x6400000000-0x6400ffffff 64bit pref]: assigned pci 0000:03:00.0: ROM [mem 0x40400000-0x405fffff pref]: assigned pci 0000:02:01.0: PCI bridge to [bus 03] pci 0000:02:01.0: bridge window [mem 0x40400000-0x405fffff] pci 0000:02:01.0: bridge window [mem 0x6000000000-0x6400ffffff 64bit pref] This patch draws inspiration from the initial investigations and work by Mika Westerberg. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216795 Link: https://lore.kernel.org/linux-pci/20190812144144.2646-1-mika.westerberg@linux.intel.com/ Fixes: 5d0a8965aea9 ("[PATCH] 2.5.14: New PCI allocation code (alpha, arm, parisc) [2/2]") Link: https://lore.kernel.org/r/20240507102523.57320-9-ilpo.jarvinen@linux.intel.com Tested-by: Lidong Wang <lidong.wang@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2024-06-12PCI: Make minimum bridge window alignment reference more obviousIlpo Järvinen
Calculations related to bridge window size contain literal 20 that is the minimum alignment for a bridge window. Make the code more obvious by converting the literal 20 to __ffs(SZ_1M). Link: https://lore.kernel.org/r/20240507102523.57320-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash https://lore.kernel.org/r/20240612093250.17544-1-ilpo.jarvinen@linux.intel.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2024-06-04PCI: Warn on missing cfg_access_lock during secondary bus resetDan Williams
The recent adventure with adding lockdep tracking for cfg_access_lock, while it yielded many false positives [1], did catch a true positive in the pci_reset_bus() path [2]. So, while lockdep is difficult to deploy, open coding a check that cfg_access_lock is held during the reset is feasible. While this does not offer a full backtrace, it should be sufficient to implicate the caller of pci_bridge_secondary_bus_reset() as a path that needs investigation. Link: https://lore.kernel.org/r/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com Link: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html [1] Link: http://lore.kernel.org/r/cfb50601-5d2a-4676-a958-1bd3f1b06654@intel.com [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com>