summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-27Merge branch 'pci/controller/amd-mdb'Bjorn Helgaas
- Add DT binding and driver for AMD MDB (Multimedia DMA Bridge) (Thippeswamy Havalige) * pci/controller/amd-mdb: PCI: amd-mdb: Add AMD MDB Root Port driver dt-bindings: PCI: amd-mdb: Add AMD Versal2 MDB PCIe Root Port Bridge dt-bindings: PCI: dwc: Add AMD Versal2 MDB SLCR support
2025-03-27Merge branch 'pci/controller/altera'Bjorn Helgaas
- Add DT binding for Agilex family (P-Tile, F-Tile, R-Tile) (Matthew Gerlach) - Add driver support for Agilex family (P-Tile, F-Tile, R-Tile) (D M, Sharath Kumar) * pci/controller/altera: PCI: altera: Add Agilex support dt-bindings: PCI: altera: Add binding for Agilex
2025-03-27Merge branch 'pci/scoped-cleanup'Bjorn Helgaas
- Use for_each_available_child_of_node_scoped() to simplify apple, kirin, mediatek, mt7621, tegra drivers (Zhang Zekun) * pci/scoped-cleanup: PCI: tegra: Use helper function for_each_child_of_node_scoped() PCI: apple: Use helper function for_each_child_of_node_scoped() PCI: mt7621: Use helper function for_each_available_child_of_node_scoped() PCI: mediatek: Use helper function for_each_available_child_of_node_scoped() PCI: kirin: Tidy up _probe() related function with dev_err_probe() PCI: kirin: Use helper function for_each_available_child_of_node_scoped()
2025-03-27Merge branch 'pci/epf-mhi'Bjorn Helgaas
- Update SA8775P device ID (Mrinmay Sarkar) * pci/epf-mhi: PCI: epf-mhi: Update device ID for SA8775P
2025-03-27Merge branch 'pci/endpoint-test'Bjorn Helgaas
- Fix endpoint BAR testing so the test can skip disabled BARs instead of reporting them as failures (Niklas Cassel) - Verify that pci_endpoint interrupt tests set the correct IRQ type (Kunihiko Hayashi) - Fix interpretation of pci_endpoint_test_bars_read_bar() error returns (Niklas Cassel) - Fix potential string truncation in pci_endpoint_test_probe() (Niklas Cassel) - Increase endpoint test BAR size variable to accommodate BARs larger than INT_MAX (Niklas Cassel) - Release IRQs to avoid leak in pci_endpoint interrupt tests (Kunihiko Hayashi) - Log the correct IRQ type when pci_endpoint IRQ request test fails (Kunihiko Hayashi) - Remove pci_endpoint_test irq_type and no_msi globals; instead use test->irq_type (Kunihiko Hayashi) - Remove unnecessary use of managed IRQ functions in pci_endpoint_test (Kunihiko Hayashi) - Add and use IRQ_TYPE_* defines in pci_endpoint_test (Niklas Cassel) - Add struct pci_epc_features.intx_capable and note that RK3568 and RK3588 can't raise INTx interrupts (Niklas Cassel) - Expose supported IRQ types in CAPS so pci_endpoint_test can set appropriate type (Niklas Cassel) - Add PCITEST_IRQ_TYPE_AUTO to pci_endpoint_test for cases where the IRQ type doesn't matter (Niklas Cassel) * pci/endpoint-test: misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts PCI: endpoint: Add intx_capable to epc_features struct selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI header misc: pci_endpoint_test: Use IRQ_TYPE_* defines from UAPI header PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI header misc: pci_endpoint_test: Do not use managed IRQ functions misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi' misc: pci_endpoint_test: Fix 'irq_type' to convey the correct type misc: pci_endpoint_test: Fix displaying 'irq_type' after 'request_irq' error misc: pci_endpoint_test: Avoid issue of interrupts remaining after request_irq error misc: pci_endpoint_test: Handle BAR sizes larger than INT_MAX misc: pci_endpoint_test: Give disabled BARs a distinct error code misc: pci_endpoint_test: Fix potential truncation in pci_endpoint_test_probe() misc: pci_endpoint_test: Fix pci_endpoint_test_bars_read_bar() error handling selftests: pci_endpoint: Add GET_IRQTYPE checks to each interrupt test selftests: pci_endpoint: Skip disabled BARs
2025-03-27Merge branch 'pci/endpoint'Bjorn Helgaas
- Convert PCI device data so pci-epf-test works correctly on big-endian endpoint systems (Niklas Cassel) - Add BAR_RESIZABLE type to endpoint framework (Niklas Cassel) - Add pci_epc_bar_size_to_rebar_cap() to convert a size to the Resizable BAR Capability so endpoint drivers can configure what the Capability register advertises (Niklas Cassel) - Add DWC core support for EPF drivers to set BAR_RESIZABLE type and size via dw_pcie_ep_set_bar() (Niklas Cassel) - Describe TI AM65x (keystone) BARs 2 and 5 as Resizable, not Fixed (Niklas Cassel) - Reduce TI AM65x (keystone) BAR alignment requirement from 1MB to 64KB (Niklas Cassel) - Describe Rockchip rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas Cassel) - Drop unused devm_pci_epc_destroy() (Zijun Hu) - Fix pci-epf-test double free that causes an oops if the host reboots and PERST# deassertion restarts endpoint BAR allocation (Christian Bruel) - Drop dw_pcie_ep_find_ext_capability() and use dw_pcie_find_ext_capability() instead (Niklas Cassel) * pci/endpoint: PCI: dwc: ep: Remove superfluous function dw_pcie_ep_find_ext_capability() PCI: endpoint: pci-epf-test: Fix double free that causes kernel to oops PCI: endpoint: Remove unused devm_pci_epc_destroy() PCI: dw-rockchip: Describe Resizable BARs as Resizable BARs PCI: keystone: Specify correct alignment requirement PCI: keystone: Describe Resizable BARs as Resizable BARs PCI: dwc: ep: Allow EPF drivers to configure the size of Resizable BARs PCI: dwc: ep: Move dw_pcie_ep_find_ext_capability() PCI: endpoint: Add pci_epc_bar_size_to_rebar_cap() PCI: endpoint: Allow EPF drivers to configure the size of Resizable BARs PCI: endpoint: pci-epf-test: Handle endianness properly
2025-03-27Merge branch 'pci/dt-bindings'Bjorn Helgaas
- Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan) - Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer) - Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander Stein) - Drop deprecated layerscape 'num-ib-windows' and 'num-ob-windows' from example (Krzysztof Kozlowski) - Drop unnecessary layerscape 'status' from example (Krzysztof Kozlowski) - Add common pci-ep-bus.yaml schema for exporting several peripherals of a single PCI function via devicetree (Andrea della Porta) * pci/dt-bindings: dt-bindings: PCI: Add common schema for devices accessible through PCI BARs dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop unnecessary status from example dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop deprecated windows dt-bindings: PCI: fsl,imx6q-pcie: Add optional DMA interrupt dt-bindings: PCI: Convert fsl,mpc83xx-pcie to YAML dt-bindings: PCI: qcom: Document the IPQ5332 PCIe controller
2025-03-27Merge branch 'pci/devtree-create'Bjorn Helgaas
- Add device_add_of_node() to set dev->of_node and dev->fwnode only if they haven't been set already (Herve Codina) - Allow of_pci_set_address() to set the DT address property for root bus nodes, where there is no PCI bridge to supply the PCI bus/device/function part of the property (Herve Codina) - Create DT nodes for PCI host bridges to enable loading device tree overlays to create platform devices for PCI devices that have several features that require multiple drivers (Herve Codina) * pci/devtree-create: PCI: of: Create device tree PCI host bridge node PCI: of_property: Constify parameter in of_pci_get_addr_flags() PCI: of_property: Add support for NULL pdev in of_pci_set_address() PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device driver core: Introduce device_{add,remove}_of_node()
2025-03-27Merge branch 'pci/resource'Bjorn Helgaas
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo Järvinen) - Fix typo that repeatedly distributed resources to a bridge instead of iterating over subordinate bridges, which resulted in too little space to assign some BARs (Kai-Heng Feng) - Relax bridge window tail sizing for optional resources, e.g., IOV BARs, to avoid failures when removing and re-adding devices (Ilpo Järvinen) - Fix a double counting error for I/O resources, as we previously did for memory resources (Ilpo Järvinen) - Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen) - Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen) - Add pci_resource_num() to look up the BAR number from the resource pointer (Ilpo Järvinen) - Add restore_dev_resource() to simplify code that resources saved device resources (Ilpo Järvinen) - Allow drivers to enable devices even if we haven't assigned optional IOV resources to them (Ilpo Järvinen) - Improve debug output during resource reallocation (Ilpo Järvinen) - Rework handling of optional resources (IOV BARs, ROMs) to reduce failures if we can't allocate them (Ilpo Järvinen) - Move declarations of pci_rescan_bus_bridge_resize(), pci_reassign_bridge_resources(), and CardBus-related sizes from include/linux/pci.h to drivers/pci/pci.h since they're not used outside the PCI core (Ilpo Järvinen) - Make pci_setup_bridge() static (Ilpo Järvinen) - Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory) - Fix s390 mmio_read/write syscalls, which didn't cause page faults in some cases, which broke vfio-pci lazy mapping on first access (Niklas Schnelle) - Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was disabled only for s390 (Niklas Schnelle) - Support mmap of PCI resources on s390 except for ISM devices (Niklas Schnelle) * pci/resource: s390/pci: Support mmap() of PCI resources except for ISM devices s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP s390/pci: Fix s390_mmio_read/write syscall page fault handling PCI: Fix NULL dereference in SR-IOV VF creation error path PCI: Move cardbus IO size declarations into pci/pci.h PCI: Make pci_setup_bridge() static PCI: Move resource reassignment func declarations into pci/pci.h PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h PCI: Fix BAR resizing when VF BARs are assigned PCI: Do not claim to release resource falsely PCI: Increase Resizable BAR support from 512 GB to 128 TB PCI: Rework optional resource handling PCI: Perform reset_resource() and build fail list in sync PCI: Use res->parent to check if resource is assigned PCI: Add debug print when releasing resources before retry PCI: Indicate optional resource assignment failures PCI: Always have realloc_head in __assign_resources_sorted() PCI: Extend enable to check for any optional resource PCI: Add restore_dev_resource() PCI: Remove incorrect comment from pci_reassign_resource() PCI: Consolidate assignment loop next round preparation PCI: Rename retval to ret PCI: Use while loop and break instead of gotos PCI: Refactor pdev_sort_resources() & __dev_sort_resources() PCI: Converge return paths in __assign_resources_sorted() PCI: Add dev & res local variables to resource assignment funcs PCI: Add pci_resource_num() helper PCI: Check resource_size() separately PCI: Add pci_resource_is_iov() to identify IOV resources PCI: Use resource_set_{range,size}() helpers PCI: Use SZ_* instead of literals in setup-bus.c PCI: Fix old_size lower bound in calculate_iosize() too PCI: Allow relaxed bridge window tail sizing for optional resources PCI: Simplify size1 assignment logic PCI: Use min_align, not unrelated add_align, for size0 PCI: Remove add_align overwrite unrelated to size0 PCI: Use downstream bridges for distributing resources PCI: Cleanup dev->resource + resno to use pci_resource_n()
2025-03-27Merge branch 'pci/reset'Bjorn Helgaas
- Log debug messages about reset methods being used (Bjorn Helgaas) - Avoid reset when it has been disabled via sysfs (Nishanth Aravamudan) * pci/reset: PCI: Avoid reset when disabled via sysfs PCI: Log debug messages about reset method
2025-03-27Merge branch 'pci/pwrctrl'Bjorn Helgaas
- Create pwrctrl devices in pci_scan_device() to make it more symmetric with pci_pwrctrl_unregister() and make pwrctrl devices for PCI bridges possible (Manivannan Sadhasivam) - Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc. can still access devices after pci_stop_dev() (Manivannan Sadhasivam) - If there's a pwrctrl device for a PCI device, skip scanning it because the pwrctrl core will rescan the bus after the device is powered on (Manivannan Sadhasivam) - Add a pwrctrl driver for PCI slots based on voltage regulators described via devicetree (Manivannan Sadhasivam) * pci/pwrctrl: PCI/pwrctrl: Add pwrctrl driver for PCI slots dt-bindings: vendor-prefixes: Document the 'pciclass' prefix PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev() PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
2025-03-27Merge branch 'pci/pm'Bjorn Helgaas
- Allow PCI bridges to go to D3Hot on all non-x86 systems (Manivannan Sadhasivam) * pci/pm: PCI: Allow PCI bridges to go to D3Hot on all non-x86
2025-03-27Merge branch 'pci/hotplug'Bjorn Helgaas
- Drop shpchp module init/exit logging (Ilpo Järvinen) - Replace shpchp dbg() with ctrl_dbg() and remove unused dbg(), err(), info(), warn() wrappers (Ilpo Järvinen) - Drop 'shpchp_debug' module parameter in favor of standard dynamic debugging (Ilpo Järvinen) - Drop unused .get_power(), .set_power() function pointers (Guilherme Giacomo Simoes) - Drop superfluous pci_hotplug_slot_list (Lukas Wunner) - Drop superfluous try_module_get() calls (Lukas Wunner) - Drop superfluous NULL pointer checks (Lukas Wunner) - Pass struct hotplug_slot pointers directly to avoid backpointer dereferencing in has_*_file() (Lukas Wunner) - Inline pci_hp_{create,remove}_module_link() to reduce exported symbols (Lukas Wunner) - Disable hotplug interrupts in portdrv only when pciehp is not enabled to prevent issuing two hotplug commands too close together (Feng Tang) - Skip pciehp 'device replaced' check if the device has been removed to address a common deadlock when resuming after a device was removed during system sleep (Lukas Wunner) - Don't enable pciehp hotplug interupt when resuming in poll mode (Ilpo Järvinen) * pci/hotplug: PCI: pciehp: Don't enable HPIE when resuming in poll mode PCI: pciehp: Avoid unnecessary device replacement check PCI/portdrv: Only disable pciehp interrupts early when needed PCI: hotplug: Inline pci_hp_{create,remove}_module_link() PCI: hotplug: Avoid backpointer dereferencing in has_*_file() PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file() PCI: hotplug: Drop superfluous try_module_get() calls PCI: hotplug: Drop superfluous pci_hotplug_slot_list PCI: cpcihp: Remove unused .get_power() and .set_power() PCI: shpchp: Remove 'shpchp_debug' module parameter PCI: shpchp: Remove unused logging wrappers PCI: shpchp: Change dbg() -> ctrl_dbg() PCI: shpchp: Remove logging from module init/exit functions
2025-03-27Merge branch 'pci/enumeration'Bjorn Helgaas
- Enable Configuration RRS SV early instead of during child bus scanning (Bjorn Helgaas) - Cache offset of Resizable BAR capability to avoid redundant searches for it (Bjorn Helgaas) - Fix reference leaks in pci_register_host_bridge() and pci_alloc_child_bus() (Ma Ke) - Drop put_device() in pci_register_host_bridge() left over from converting device_register() to device_add() (Dan Carpenter) * pci/enumeration: PCI: Remove stray put_device() in pci_register_host_bridge() PCI: Fix reference leak in pci_alloc_child_bus() PCI: Fix reference leak in pci_register_host_bridge() PCI: Cache offset of Resizable BAR capability PCI: Enable Configuration RRS SV early
2025-03-27Merge branch 'pci/doe'Bjorn Helgaas
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair Francis) - Expose supported DOE features via sysfs (Alistair Francis) - Allow DOE support to be enabled even if CXL isn't enabled (Alistair Francis) * pci/doe: PCI/DOE: Allow enabling DOE without CXL PCI/DOE: Expose DOE features via sysfs PCI/DOE: Rename Discovery Response Data Object Contents to type PCI/DOE: Rename DOE protocol to feature
2025-03-27Merge branch 'pci/devres'Bjorn Helgaas
- Enlarge the devres table[] to accommodate bridge windows, ROM, IOV BARs, etc (Philipp Stanner) - Validate BAR index in devres interfaces (Philipp Stanner) * pci/devres: PCI: Check BAR index for validity PCI: Fix wrong length of devres array
2025-03-27Merge branch 'pci/bwctrl'Bjorn Helgaas
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the set_pcie_cooling_state.sh test case (Yi Lai) - Fix the pcie_bwctrl_select_speed() return value in cases where a non-compliant device doesn't advertise valid supported speeds (Ilpo Järvinen) - Avoid a NULL pointer dereference when we run out of bus numbers to assign for a bridge secondary bus (Lukas Wunner) * pci/bwctrl: PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
2025-03-27Merge branch 'pci/aspm'Bjorn Helgaas
- Delay pcie_link_state deallocation to avoid dangling pointers that cause invalid references during hot-unplug (Daniel Stodden) * pci/aspm: PCI/ASPM: Fix link state exit during switch upstream function removal
2025-03-27Merge branch 'pci/aer'Bjorn Helgaas
- Implement local aer_printk() since AER is the only place that prints a message with level depending on the error severity (Ilpo Järvinen) * pci/aer: PCI/ERR: Handle TLP Log in Flit mode PCI: Track Flit Mode Status & print it with link status PCI/AER: Descope pci_printk() to aer_printk()
2025-03-27Merge branch 'pci/acs'Bjorn Helgaas
- Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar Dave) * pci/acs: PCI/ACS: Fix 'pci=config_acs=' parameter
2025-03-26misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTONiklas Cassel
For PCITEST_MSI we really want to set PCITEST_SET_IRQTYPE explicitly to PCITEST_IRQ_TYPE_MSI, since we want to test if MSI works. For PCITEST_MSIX we really want to set PCITEST_SET_IRQTYPE explicitly to PCITEST_IRQ_TYPE_MSIX, since we want to test if MSI works. For PCITEST_LEGACY_IRQ we really want to set PCITEST_SET_IRQTYPE explicitly to PCITEST_IRQ_TYPE_INTX, since we want to test if INTx works. However, for PCITEST_WRITE, PCITEST_READ, PCITEST_COPY, we really don't care which IRQ type that is used, we just want to use a IRQ type that is supported by the EPC. The old behavior was to always use MSI for PCITEST_WRITE, PCITEST_READ, PCITEST_COPY, was to always set IRQ type to MSI before doing the actual test, however, there are EPC drivers that do not support MSI. Add a new PCITEST_IRQ_TYPE_AUTO, that will use the CAPS register to see which IRQ types the endpoint supports, and use one of the supported IRQ types. For backwards compatibility, if the endpoint does not expose any supported IRQ type in the CAPS register, simply fallback to using MSI, as it was unconditionally done before. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-16-cassel@kernel.org
2025-03-26PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS registerNiklas Cassel
Expose the supported IRQ types in the CAPS register. This way, the host side driver (drivers/misc/pci_endpoint_test.c) can know which IRQ types that the endpoint supports. The host side driver will make use of this information in a follow-up commit. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-15-cassel@kernel.org
2025-03-26PCI: dw-rockchip: Endpoint mode cannot raise INTx interruptsNiklas Cassel
Neither RK3568 or RK3588 supports INTx interrupts. Since epc_features is zero initialized, this is strictly not needed. However, setting intx_capable explicitly to false makes it more clear that neither RK3568 or RK3588 supports INTx interrupts. No functional change. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-14-cassel@kernel.org
2025-03-26PCI: endpoint: Add intx_capable to epc_features structNiklas Cassel
In struct pci_epc_features, an EPC driver can already specify if they support MSI (by setting msi_capable) and MSI-X (by setting msix_capable). Thus, for consistency, allow an EPC driver to specify if it supports INTx interrupts as well (by setting intx_capable). Since this struct is zero initialized, EPC drivers that want to claim INTx support will need to set intx_capable to true. Signed-off-by: Niklas Cassel <cassel@kernel.org> [kwilczynski: add missing kernel-doc for "intx_capable"] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-13-cassel@kernel.org
2025-03-24dt-bindings: PCI: Add common schema for devices accessible through PCI BARsAndrea della Porta
Common YAML schema for devices that exports internal peripherals through PCI BARs. The BARs are exposed as simple-buses through which the peripherals can be accessed. This is not intended to be used as a standalone binding, but should be included by device specific bindings. Signed-off-by: Andrea della Porta <andrea.porta@suse.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: fix typo] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://lore.kernel.org/r/096ab7addb39e498e28ac2526c07157cc9327c42.1742418429.git.andrea.porta@suse.com
2025-03-23PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustionLukas Wunner
When BIOS neglects to assign bus numbers to PCI bridges, the kernel attempts to correct that during PCI device enumeration. If it runs out of bus numbers, no pci_bus is allocated and the "subordinate" pointer in the bridge's pci_dev remains NULL. The PCIe bandwidth controller erroneously does not check for a NULL subordinate pointer and dereferences it on probe. Bandwidth control of unusable devices below the bridge is of questionable utility, so simply error out instead. This mirrors what PCIe hotplug does since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp probe"). The PCI core emits a message with KERN_INFO severity if it has run out of bus numbers. PCIe hotplug emits an additional message with KERN_ERR severity to inform the user that hotplug functionality is disabled at the bridge. A similar message for bandwidth control does not seem merited, given that its only purpose so far is to expose an up-to-date link speed in sysfs and throttle the link speed on certain laptops with limited Thermal Design Power. So error out silently. User-visible messages: pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring [...] pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74 pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them [...] pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring [...] BUG: kernel NULL pointer dereference RIP: pcie_update_link_speed pcie_bwnotif_enable pcie_bwnotif_probe pcie_port_probe_service really_probe Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl> Cc: stable@vger.kernel.org # v6.13+ Link: https://lore.kernel.org/r/3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de
2025-03-23PCI: amd-mdb: Add AMD MDB Root Port driverThippeswamy Havalige
Add support for AMD MDB (Multimedia DMA Bridge) IP core as Root Port. The Versal2 devices include MDB Module. The integrated block for MDB along with the integrated bridge can function as PCIe Root Port controller at Gen5 32-GT/s operation per lane. Bridge supports error and INTx interrupts and are handled using platform specific interrupt line in Versal2. Signed-off-by: Thippeswamy Havalige <thippeswamy.havalige@amd.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250228093351.923615-4-thippeswamy.havalige@amd.com [bhelgaas: only present on ARM64-based SoCs; squash Kconfig dependency on ARM64 from Geert Uytterhoeven <geert+renesas@glider.be>: https://lore.kernel.org/r/eaef1dea7edcf146aa377d5e5c5c85a76ff56bae.1742306383.git.geert+renesas@glider.be] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> [kwilczynski: commit log, code comments and error messages clean-up, drop redundant "depends on PCI" from Kconfig, expose the error code as part of error messages where appropriatie, change "depends on" expression to match existing style from other drivers] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-21PCI/DOE: Allow enabling DOE without CXLAlistair Francis
PCIe devices (not CXL) can support DOE as well, so allow DOE to be enabled even if CXL isn't. Link: https://lore.kernel.org/r/20250306075211.1855177-4-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21PCI/DOE: Expose DOE features via sysfsAlistair Francis
PCIe r6.0 added support for Data Object Exchange (DOE). When DOE is supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec 6.30.1.1. DOE allows a requester to obtain information about the other DOE features supported by the device. The kernel already queries the DOE features supported and caches the values. Expose the values in sysfs to allow user space to determine which DOE features are supported by the PCIe device. By exposing the information to userspace, tools like lspci can relay the information to users. By listing all of the supported features we can allow userspace to parse the list, which might include vendor specific features as well as yet to be supported features. As the DOE Discovery feature must always be supported we treat it as a special named attribute case. This allows the usual PCI attribute_group handling to correctly create the doe_features directory when registering pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group() will seg fault). After this patch is supported you can see something like this when attaching a DOE device: $ ls /sys/devices/pci0000:00/0000:00:02.0//doe* 0001:01 0001:02 doe_discovery Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me Signed-off-by: Alistair Francis <alistair@alistair23.me> [bhelgaas: drop pci_doe_sysfs_init() stub return, make DEVICE_ATTR_RO(doe_discovery) static] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-03-21s390/pci: Support mmap() of PCI resources except for ISM devicesNiklas Schnelle
So far s390 does not allow mmap() of PCI resources to user-space via the usual mechanisms, though it does use it for RDMA. For the PCI sysfs resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars for all devices. This is partly because access to mapped PCI resources from user-space requires special PCI load/store memory-I/O (MIO) instructions, or the special MMIO syscalls when these are not available. Still, such access is possible and useful not just for RDMA, in fact not being able to mmap() PCI resources has previously caused extra work when testing devices. One thing that doesn't work with PCI resources mapped to user-space though is the s390 specific virtual ISM device. Not only because the BAR size of 256 TiB prevents mapping the whole BAR but also because access requires use of the legacy PCI instructions which are not accessible to user-space on systems with the newer MIO PCI instructions. Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping its resources while making this functionality available for all other PCI devices. To this end introduce a minimal implementation of PCI_QUIRKS and use that to set pdev->non_mappable_bars for ISM devices only. Then also set ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic implementation of pci_mmap_resource_range() enabling only the newer sysfs mmap() interface. This follows the recommendation in Documentation/PCI/sysfs-pci.rst. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-3-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAPNiklas Schnelle
The ability to map PCI resources to user-space is controlled by global defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and controls mapping of PCI resources using vfio-pci with a fallback option via the pread()/pwrite() interface. For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a generic implementation for mapping PCI resources plus the newer sysfs interface. Then there is HAVE_PCI_MMAP which can be used with custom definitions of pci_mmap_resource_range() and the historical /proc/bus/pci interface. Both mechanisms are all or nothing. For s390 mapping PCI resources is possible and useful for testing and certain applications such as QEMU's vfio-pci based user-space NVMe driver. For certain devices, however access to PCI resources via mappings to user-space is not possible and these must be excluded from the general PCI resource mapping mechanisms. Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can not be accessed via mappings to user-space. In the future this enables per-device restrictions of PCI resource mapping. For now, set this flag for all PCI devices on s390 in line with the existing, general disable of PCI resource mapping. As s390 is the only user of the VFI_PCI_MMAP Kconfig options this can already be replaced with a check of this new flag. Also add similar checks in the other code protected by HAVE_PCI_MMAP respectively ARCH_GENERIC_PCI_MMAP in preparation for enabling these for supported devices. Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/ Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21s390/pci: Fix s390_mmio_read/write syscall page fault handlingNiklas Schnelle
The s390 MMIO syscalls when using the classic PCI instructions do not cause a page fault when follow_pfnmap_start() fails due to the page not being present. Besides being a general deficiency this breaks vfio-pci's mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on first access. Fix this by following a failed follow_pfnmap_start() with fixup_user_page() and retrying the follow_pfnmap_start(). Also fix a VM_READ vs VM_WRITE mixup in the read syscall. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-1-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
2025-03-21PCI: Fix NULL dereference in SR-IOV VF creation error pathShay Drory
Clean up when virtfn setup fails to prevent NULL pointer dereference during device removal. The kernel oops below occurred due to incorrect error handling flow when pci_setup_device() fails. Add pci_iov_scan_device(), which handles virtfn allocation and setup and cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need to call pci_stop_and_remove_bus_device(). This prevents accessing partially initialized virtfn devices during removal. BUG: kernel NULL pointer dereference, address: 00000000000000d0 RIP: 0010:device_del+0x3d/0x3d0 Call Trace: pci_remove_bus_device+0x7c/0x100 pci_iov_add_virtfn+0xfa/0x200 sriov_enable+0x208/0x420 mlx5_core_sriov_configure+0x6a/0x160 [mlx5_core] sriov_numvfs_store+0xae/0x1a0 Link: https://lore.kernel.org/r/20250310084524.599225-1-shayd@nvidia.com Fixes: e3f30d563a38 ("PCI: Make pci_destroy_dev() concurrent safe") Signed-off-by: Shay Drory <shayd@nvidia.com> [bhelgaas: commit log, return ERR_PTR(-ENOMEM) directly] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Keith Busch <kbusch@kernel.org>
2025-03-21PCI/bwctrl: Fix pcie_bwctrl_select_speed() return typeIlpo Järvinen
pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply assign the fallback speed (2.5GT/s) into supported_speeds variable to share the normal return path that calls pcie_supported_speeds2target_speed() to calculate __fls(). This code path is not very likely to execute because pcie_get_supported_speeds() should provide valid ->supported_speeds but a spec violating device could fail to synthesize any speed in pcie_get_supported_speeds(). It could also happen in case the supported_speeds intersection is empty (also a violation of the current PCIe specs). Link: https://lore.kernel.org/r/20250321163103.5145-1-ilpo.jarvinen@linux.intel.com Fixes: de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21PCI: pciehp: Don't enable HPIE when resuming in poll modeIlpo Järvinen
PCIe hotplug can operate in poll mode without interrupt handlers using a polling kthread only. eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend") failed to consider that and enables HPIE (Hot-Plug Interrupt Enable) unconditionally when resuming the Port. Only set HPIE if non-poll mode is in use. This makes pcie_enable_interrupt() match how pcie_enable_notification() already handles HPIE. Link: https://lore.kernel.org/r/20250321162114.3939-1-ilpo.jarvinen@linux.intel.com Fixes: eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
2025-03-20PCI: Move cardbus IO size declarations into pci/pci.hIlpo Järvinen
For some reason, cardbus related io/mem size declarations are in linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h. Move all them into one place in pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Make pci_setup_bridge() staticIlpo Järvinen
pci_setup_bridge() is only used within setup-bus.c. Therefore, make it a static function. Link: https://lore.kernel.org/r/20250311174701.3586-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Move resource reassignment func declarations into pci/pci.hIlpo Järvinen
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used outside of the PCI subsystem. They seem to be naturally static functions but since resource fitting/assignment is split between setup-bus.c and setup-res.c, they fall into different sides of the divide and need to be declared. Move the declarations of pci_reassign_bridge_resources() and pci_reassign_resource() into pci/pci.h to keep them internal to PCI subsystem. Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.hIlpo Järvinen
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem. The comment also falsely advertises it to be for hotplug drivers, yet the only caller is from sysfs store function. Move the function declaration into pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Fix BAR resizing when VF BARs are assignedIlpo Järvinen
__resource_resize_store() attempts to release all resources of the device before attempting the resize. The loop, however, only covers standard BARs (< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned, pci_reassign_bridge_resources() finds the bridge window still has some assigned child resources and returns -NOENT which makes pci_resize_resource() to detect an error and abort the resize. Change the release loop to cover all resources up to VF BARs which allows the resize operation to release the bridge windows and attempt to assigned them again with the different size. If SR-IOV is enabled, disallow resize as it requires releasing also IOV resources. Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com Fixes: 91fa127794ac ("PCI: Expose PCIe Resizable BAR support via sysfs") Reported-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2025-03-20PCI: Allow PCI bridges to go to D3Hot on all non-x86Manivannan Sadhasivam
Currently, pci_bridge_d3_possible() encodes a variety of decision factors when deciding whether a given bridge can be put into D3. A particular one of note is for "recent enough PCIe ports." Per Rafael [0]: "There were hardware issues related to PM on x86 platforms predating the introduction of Connected Standby in Windows. For instance, programming a port into D3hot by writing to its PMCSR might cause the PCIe link behind it to go down and the only way to revive it was to power cycle the Root Complex. And similar." Thus, this function contains a DMI-based check for post-2015 BIOS. The above factors (Windows, x86) don't really apply to non-x86 systems, and also, many such systems don't have BIOS or DMI. However, we'd like to be able to suspend bridges on non-x86 systems too. Restrict the "recent enough" check to x86. If we find further incompatibilities, it probably makes sense to expand on the deny-list approach (i.e., bridge_d3_blacklist or similar). Link: https://lore.kernel.org/r/20250320110604.v6.1.Id0a0e78ab0421b6bce51c4b0b87e6aebdfc69ec7@changeid Link: https://lore.kernel.org/linux-pci/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/ [0] Link: https://lore.kernel.org/linux-pci/20240227225442.GA249898@bhelgaas/ [1] Link: https://lore.kernel.org/linux-pci/20240828210705.GA37859@bhelgaas/ [2] [Brian: rewrite to !X86 based on Rafael's suggestions] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-14selftests: pci_endpoint: Use IRQ_TYPE_* defines from UAPI headerNiklas Cassel
In order to improve readability, use the IRQ_TYPE_* defines from the UAPI header rather than using raw values. No functional change. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-12-cassel@kernel.org
2025-03-14misc: pci_endpoint_test: Use IRQ_TYPE_* defines from UAPI headerNiklas Cassel
Use the IRQ_TYPE_* defines from the UAPI header rather than duplicating these defines in the driver itself. No functional change. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-11-cassel@kernel.org
2025-03-14PCI: endpoint: pcitest: Add IRQ_TYPE_* defines to UAPI headerNiklas Cassel
These IRQ_TYPE_* defines are used by both drivers/misc/pci_endpoint_test.c and tools/testing/selftests/pci_endpoint/pci_endpoint_test.c. Considering that both the misc driver and the selftest already includes the pcitest.h UAPI header, it makes sense for these IRQ_TYPE_* defines to be located in the pcitest.h UAPI header. Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Link: https://lore.kernel.org/r/20250310111016.859445-10-cassel@kernel.org
2025-03-14misc: pci_endpoint_test: Do not use managed IRQ functionsKunihiko Hayashi
The pci_endpoint_test_request_irq() and pci_endpoint_test_release_irq() are called repeatedly by the users through pci_endpoint_test_set_irq(). So using the managed version of IRQ functions within these functions has no effect. Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250225110252.28866-7-hayashi.kunihiko@socionext.com Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2025-03-14misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'Kunihiko Hayashi
The global variable "irq_type" preserves the current value of ioctl(GET_IRQTYPE). However, all tests that use interrupts first call ioctl(SET_IRQTYPE) to set "test->irq_type", then write the value of test->irq_type into the register pointed by test_reg_bar, and request the interrupt to the endpoint. The endpoint function driver, pci-epf-test, refers to the register, and determine which type of interrupt to raise. The global variable "irq_type" is never used in the actual test, so remove the variable and replace it with "test->irq_type". Also, for the same reason, the variable "no_msi" can be removed. Initially, "test->irq_type" has IRQ_TYPE_UNDEFINED, and the ioctl(GET_IRQTYPE) before calling ioctl(SET_IRQTYPE) will return an error. Suggested-by: Niklas Cassel <cassel@kernel.org> Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250225110252.28866-6-hayashi.kunihiko@socionext.com
2025-03-14misc: pci_endpoint_test: Fix 'irq_type' to convey the correct typeKunihiko Hayashi
There are two variables that indicate the interrupt type to be used in the next test execution, "irq_type" as global and "test->irq_type". The global is referenced from pci_endpoint_test_get_irq() to preserve the current type for ioctl(PCITEST_GET_IRQTYPE). The type set in this function isn't reflected in the global "irq_type", so ioctl(PCITEST_GET_IRQTYPE) returns the previous type. As a result, the wrong type is displayed in old version of "pcitest" as follows: - Result of running "pcitest -i 0" SET IRQ TYPE TO LEGACY: OKAY - Result of running "pcitest -I" GET IRQ TYPE: MSI Whereas running the new version of "pcitest" in kselftest results in an error as follows: # RUN pci_ep_basic.LEGACY_IRQ_TEST ... # pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Expected 0 (0) == ret (1) # pci_endpoint_test.c:104:LEGACY_IRQ_TEST:Can't get Legacy IRQ type Fix this issue by propagating the current type to the global "irq_type". Fixes: b2ba9225e031 ("misc: pci_endpoint_test: Avoid using module parameter to determine irqtype") Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250225110252.28866-5-hayashi.kunihiko@socionext.com
2025-03-14PCI: Check BAR index for validityPhilipp Stanner
Many functions in PCI use accessor macros such as pci_resource_len(), which take a BAR index. That index, however, is never checked for validity, potentially resulting in undefined behavior by overflowing the array pci_dev.resource in the macro pci_resource_n(). Since many users of those macros directly assign the accessed value to an unsigned integer, the macros cannot be changed easily anymore to return -EINVAL for invalid indexes. Consequently, the problem has to be mitigated in higher layers. Add pci_bar_index_valid(). Use it where appropriate. Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/ Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com> Signed-off-by: Philipp Stanner <phasta@kernel.org> [kwilczynski: correct if-statement condition the pci_bar_index_is_valid() helper function uses, tidy up code comments] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> [bhelgaas: fix typo] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-14PCI: pciehp: Avoid unnecessary device replacement checkLukas Wunner
Hot-removal of nested PCI hotplug ports suffers from a long-standing race condition which can lead to a deadlock: A parent hotplug port acquires pci_lock_rescan_remove(), then waits for pciehp to unbind from a child hotplug port. Meanwhile that child hotplug port tries to acquire pci_lock_rescan_remove() as well in order to remove its own children. The deadlock only occurs if the parent acquires pci_lock_rescan_remove() first, not if the child happens to acquire it first. Several workarounds to avoid the issue have been proposed and discarded over the years, e.g.: https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/ A proper fix is being worked on, but needs more time as it is nontrivial and necessarily intrusive. Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") provokes more frequent occurrence of the deadlock when removing more than one Thunderbolt device during system sleep. The commit sought to detect device replacement, but also triggered on device removal. Differentiating reliably between replacement and removal is impossible because pci_get_dsn() returns 0 both if the device was removed, as well as if it was replaced with one lacking a Device Serial Number. Avoid the more frequent occurrence of the deadlock by checking whether the hotplug port itself was hot-removed. If so, there's no sense in checking whether its child device was replaced. This works because the ->resume_noirq() callback is invoked in top-down order for the entire hierarchy: A parent hotplug port detecting device replacement (or removal) marks all children as removed using pci_dev_set_disconnected() and a child hotplug port can then reliably detect being removed. Link: https://lore.kernel.org/r/02f166e24c87d6cde4085865cce9adfdfd969688.1741674172.git.lukas@wunner.de Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") Reported-by: Kenneth Crudup <kenny@panix.com> Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@panix.com/ Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@canonical.com/ Tested-by: Kenneth Crudup <kenny@panix.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: stable@vger.kernel.org # v6.11+
2025-03-14PCI: Fix wrong length of devres arrayPhilipp Stanner
The array for the iomapping cookie addresses has a length of PCI_STD_NUM_BARS. This constant, however, only describes standard BARs; while PCI can allow for additional, special BARs. The total number of PCI resources is described by constant PCI_NUM_RESOURCES, which is also used in, e.g., pci_select_bars(). Thus, the devres array has so far been too small. Change the length of the devres array to PCI_NUM_RESOURCES. Link: https://lore.kernel.org/r/20250312080634.13731-3-phasta@kernel.org Fixes: bbaff68bf4a4 ("PCI: Add managed partial-BAR request and map infrastructure") Signed-off-by: Philipp Stanner <phasta@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org # v6.11+