summaryrefslogtreecommitdiff
path: root/drivers/pci/hotplug
AgeCommit message (Collapse)Author
2018-09-18PCI: hotplug: Embed hotplug_slotLukas Wunner
When the PCI hotplug core and its first user, cpqphp, were introduced in February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot struct for its internal use plus a hotplug_slot struct to be registered with the hotplug core and linked the two with pointers: https://git.kernel.org/tglx/history/c/a8a2069f432c Nowadays, the predominant pattern in the tree is to embed ("subclass") such structures in one another and cast to the containing struct with container_of(). But it wasn't until July 2002 that container_of() was introduced with historic commit ec4f214232cf: https://git.kernel.org/tglx/history/c/ec4f214232cf pnv_php, introduced in 2016, did the right thing and embedded struct hotplug_slot in its internal struct pnv_php_slot, but all other drivers cargo-culted cpqphp's design and linked separate structs with pointers. Embedding structs is preferrable to linking them with pointers because it requires fewer allocations, thereby reducing overhead and simplifying error paths. Casting an embedded struct to the containing struct becomes a cheap subtraction rather than a dereference. And having fewer pointers reduces the risk of them pointing nowhere either accidentally or due to an attack. Convert all drivers to embed struct hotplug_slot in their internal slot struct. The "private" pointer in struct hotplug_slot thereby becomes unused, so drop it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa* Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390* Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86 Cc: Len Brown <lenb@kernel.org> Cc: Scott Murray <scott@spiteful.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Corentin Chary <corentin.chary@gmail.com> Cc: Darren Hart <dvhart@infradead.org>
2018-09-18PCI: hotplug: Drop hotplug_slot_infoLukas Wunner
Ever since the PCI hotplug core was introduced in 2002, drivers had to allocate and register a struct hotplug_slot_info for every slot: https://git.kernel.org/tglx/history/c/a8a2069f432c Apparently the idea was that drivers furnish the hotplug core with an up-to-date card presence status, power status, latch status and attention indicator status as well as notify the hotplug core of changes thereof. However only 4 out of 12 hotplug drivers bother to notify the hotplug core with pci_hp_change_slot_info() and the hotplug core never made any use of the information: There is just a single macro in pci_hotplug_core.c, GET_STATUS(), which uses the hotplug_slot_info if the driver lacks the corresponding callback in hotplug_slot_ops. The macro is called when the user reads the attribute via sysfs. Now, if the callback isn't defined, the attribute isn't exposed in sysfs in the first place (see e.g. has_power_file()). There are only two situations when the hotplug_slot_info would actually be accessed: * If the driver defines ->enable_slot or ->disable_slot but not ->get_power_status. * If the driver defines ->set_attention_status but not ->get_attention_status. There is no driver doing the former and just a single driver doing the latter, namely pnv_php.c. Amend it with a ->get_attention_status callback. With that, the hotplug_slot_info becomes completely unused by the PCI hotplug core. But a few drivers use it internally as a cache: cpcihp uses it to cache the latch_status and adapter_status. cpqhp uses it to cache the adapter_status. pnv_php and rpaphp use it to cache the attention_status. shpchp uses it to cache all four values. Amend these drivers to cache the information in their private slot struct. shpchp's slot struct already contains members to cache the power_status and adapter_status, so additional members are only needed for the other two values. In the case of cpqphp, the cached value is only accessed in a single place, so instead of caching it, read the current value from the hardware. Caution: acpiphp, cpci, cpqhp, shpchp, asus-wmi and eeepc-laptop populate the hotplug_slot_info with initial values on probe. That code is herewith removed. There is a theoretical chance that the code has side effects without which the driver fails to function, e.g. if the ACPI method to read the adapter status needs to be executed at least once on probe. That seems unlikely to me, still maintainers should review the changes carefully for this possibility. Rafael adds: "I'm not aware of any case in which it will break anything, [...] but if that happens, it may be necessary to add the execution of the control methods in question directly to the initialization part." Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa* Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390* Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86 Cc: Len Brown <lenb@kernel.org> Cc: Scott Murray <scott@spiteful.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Corentin Chary <corentin.chary@gmail.com> Cc: Darren Hart <dvhart@infradead.org>
2018-09-18PCI: hotplug: Constify hotplug_slot_opsLukas Wunner
Hotplug drivers cannot declare their hotplug_slot_ops const, making them attractive targets for attackers, because upon registration of a hotplug slot, __pci_hp_initialize() writes to the "owner" and "mod_name" members in that struct. Fix by moving these members to struct hotplug_slot and constify every driver's hotplug_slot_ops except for pciehp. pciehp constructs its hotplug_slot_ops at runtime based on the PCIe port's capabilities, hence cannot declare them const. It can be converted to __write_rarely once that's mainlined: http://www.openwall.com/lists/kernel-hardening/2016/11/16/3 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa* Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86 Cc: Len Brown <lenb@kernel.org> Cc: Scott Murray <scott@spiteful.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Corentin Chary <corentin.chary@gmail.com> Cc: Darren Hart <dvhart@infradead.org>
2018-09-18PCI: pciehp: Reshuffle controller struct for clarityLukas Wunner
The members in pciehp's controller struct are arranged in a seemingly arbitrary order and have grown to an amount that I no longer consider easily graspable by contributors. Sort the members into 5 rubrics: * Slot Capabilities register and quirks * Slot Control register access * Slot Status register event handling * state machine * hotplug core interface Obviously, this is just my personal bikeshed color and if anyone has a better idea, please come forward. Any ordering will do as long as the information is presented in a manageable manner. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-09-18PCI: pciehp: Rename controller struct members for clarityLukas Wunner
Of the members which were just moved from pciehp's slot struct to the controller struct, rename "lock" to "state_lock" and rename "work" to "button_work" for clarity. Perform the rename separately to the unification of the two structs per Sinan's request. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Sinan Kaya <okaya@kernel.org>
2018-09-18PCI: pciehp: Unify controller and slot structsLukas Wunner
pciehp was originally introduced together with shpchp in a single commit, c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI Express hot-plug drivers"): https://git.kernel.org/tglx/history/c/c16b4b14d980 shpchp supports up to 31 slots per controller, hence uses separate slot and controller structs. pciehp has a 1:1 relationship between slot and controller and therefore never required this separation. Nevertheless, because much of the code had been copy-pasted between the two drivers, pciehp likewise uses separate structs to this very day. The artificial separation of data structures adds unnecessary complexity and bloat to pciehp and requires constantly chasing pointers at runtime. Simplify the driver by merging struct slot into struct controller. Merge the slot constructor pcie_init_slot() and the destructor pcie_cleanup_slot() into the controller counterparts. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-09-18PCI: pciehp: Tolerate Presence Detect hardwired to zeroLukas Wunner
The WiGig Bus Extension (WBE) specification allows tunneling PCIe over IEEE 802.11. A product implementing this spec is the wil6210 from Wilocity (now part of Qualcomm Atheros). It integrates a PCIe switch with a wireless network adapter: 00.0-+ [1ae9:0101] Upstream Port +-00.0-+ [1ae9:0200] Downstream Port | +-00.0 [168c:0034] Atheros AR9462 Wireless Network Adapter +-02.0 [1ae9:0201] Downstream Port +-03.0 [1ae9:0201] Downstream Port Wirelessly attached devices presumably appear below the hotplug ports with device ID [1ae9:0201]. Oddly, the Downstream Port [1ae9:0200] leading to the wireless network adapter is likewise Hotplug Capable, but has its Presence Detect State bit hardwired to zero. Even if the Link Active bit is set, Presence Detect is zero, so this cannot be caused by in-band presence detection but only by broken hardware. pciehp assumes an empty slot if Presence Detect State is zero, regardless of Link Active being one. Consequently, up until v4.18 it removes the wireless network adapter in pciehp_resume(). From v4.19 it already does so in pciehp_probe(). Be lenient towards broken hardware and assume the slot is occupied if Link Active is set: Introduce pciehp_card_present_or_link_active() and use it in lieu of pciehp_get_adapter_status() everywhere, except in pciehp_handle_presence_or_link_change() whose log messages depend on which of Presence Detect State or Link Active is set. Remove the Presence Detect State check from __pciehp_enable_slot() because it is only called if either of Presence Detect State or Link Active is set. Caution: There is a possibility that broken hardware exists which has working Presence Detect but hardwires Link Active to one. On such hardware the slot will now incorrectly be considered always occupied. If such hardware is discovered, this commit can be rolled back and a quirk can be added which sets is_hotplug_bridge = 0 for [1ae9:0200]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200839 Reported-and-tested-by: David Yang <mmyangfl@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rajat Jain <rajatja@google.com> Cc: Ashok Raj <ashok.raj@intel.com>
2018-09-17PCI: pciehp: Drop hotplug_slot_ops wrappersLukas Wunner
pciehp's ->enable_slot, ->disable_slot, ->get_attention_status and ->reset_slot callbacks are currently implemented by wrapper functions that do nothing else but call down to a backend function. The backends are not called from anywhere else, so drop the wrappers and use the backends directly as callbacks, thereby shaving off a few lines of unnecessary code. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-09-17PCI: pciehp: Drop unnecessary includesLukas Wunner
Drop the following includes from pciehp source files which no longer use any of the included symbols: * <linux/sched/signal.h> in pciehp.h <linux/signal.h> in pciehp_hpc.c Added by commit de25968cc87c ("fix more missing includes") to accommodate for a call to signal_pending(). The call was removed by commit 262303fe329a ("pciehp: fix wait command completion"). * <linux/interrupt.h> in pciehp_core.c Added by historic commit f308a2dfbe63 ("PCI: add PCI Express Port Bus Driver subsystem") to accommodate for a call to free_irq(): https://git.kernel.org/tglx/history/c/f308a2dfbe63 The call was removed by commit 407f452b05f9 ("pciehp: remove unnecessary free_irq"). * <linux/time.h> in pciehp_core.c and pciehp_hpc.c Added by commit 34d03419f03b ("PCIEHP: Add Electro Mechanical Interlock (EMI) support to the PCIE hotplug driver."), which was reverted by commit bd3d99c17039 ("PCI: Remove untested Electromechanical Interlock (EMI) support in pciehp."). * <linux/module.h> in pciehp_ctrl.c, pciehp_hpc.c and pciehp_pci.c Added by historic commit c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI Express hot-plug drivers"): https://git.kernel.org/tglx/history/c/c16b4b14d980 Module-related symbols were neither used back then in those files, nor are they used today. * <linux/slab.h> in pciehp_ctrl.c Added by commit 5a0e3ad6af86 ("include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h") to accommodate for calls to kmalloc(). The calls were removed by commit 0e94916e6091 ("PCI: pciehp: Handle events synchronously"). * "../pci.h" in pciehp_ctrl.c Added by historic commit 67f4660b72f2 ("PCI: ASPM patch for") to accommodate for usage of the global variable pcie_mch_quirk: https://git.kernel.org/tglx/history/c/67f4660b72f2 The global variable was removed by commit 0ba379ec0fb1 ("PCI: Simplify hotplug mch quirk"). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-09-17PCI: pciehp: Differentiate between surprise and safe removalLukas Wunner
When removing PCI devices below a hotplug bridge, pciehp marks them as disconnected if the card is no longer present in the slot or it quiesces them if the card is still present (by disabling INTx interrupts, bus mastering and SERR# reporting). To detect whether the card is still present, pciehp checks the Presence Detect State bit in the Slot Status register. The problem with this approach is that even if the card is present, the link to it may be down, and it that case it would be better to mark the devices as disconnected instead of trying to quiesce them. Moreover, if the card in the slot was quickly replaced by another one, the Presence Detect State bit would be set, yet trying to quiesce the new card's devices would be wrong and the correct thing to do is to mark the previous card's devices as disconnected. Instead of looking at the Presence Detect State bit, it is better to differentiate whether the card was surprise removed versus safely removed (via sysfs or an Attention Button press). On surprise removal, the devices should be marked as disconnected, whereas on safe removal it is correct to quiesce the devices. The knowledge whether a surprise removal or a safe removal is at hand does exist further up in the call stack: A surprise removal is initiated by pciehp_handle_presence_or_link_change(), a safe removal by pciehp_handle_disable_request(). Pass that information down to pciehp_unconfigure_device() and use it in lieu of the Presence Detect State bit. While there, add kernel-doc to pciehp_unconfigure_device() and pciehp_configure_device(). Tested-by: Alexandru Gagniuc <mr.nuke.me@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Keith Busch <keith.busch@intel.com>
2018-09-17PCI: Simplify disconnected markingLukas Wunner
Commit 89ee9f768003 ("PCI: Add device disconnected state") iterates over the devices on a parent bus, marks each as disconnected, then marks each device's children as disconnected using pci_walk_bus(). The same can be achieved more succinctly by calling pci_walk_bus() on the parent bus. Moreover, this does not need to wait until acquiring pci_lock_rescan_remove(), so move it out of that critical section. The critical section in err.c contains a pci_dev_get() / pci_dev_put() pair which was apparently copy-pasted from pciehp_pci.c. In the latter it serves the purpose of holding the struct pci_dev in place until the Command register is updated. err.c doesn't do anything like that, hence the pair is unnecessary. Remove it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Oza Pawandeep <poza@codeaurora.org> Cc: Sinan Kaya <okaya@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2018-09-11PCI: pciehp: Fix hot-add vs powerfault detection orderKeith Busch
If both hot-add and power fault were observed in a single interrupt, we handled the hot-add first, then the power fault, in this path: pciehp_ist if (events & (PDC | DLLSC)) pciehp_handle_presence_or_link_change case OFF_STATE: pciehp_enable_slot __pciehp_enable_slot board_added pciehp_power_on_slot ctrl->power_fault_detected = 0 pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC) pciehp_green_led_on(p_slot) # power LED on pciehp_set_attention_status(p_slot, 0) # attention LED off if ((events & PFD) && !ctrl->power_fault_detected) ctrl->power_fault_detected = 1 pciehp_set_attention_status(1) # attention LED on pciehp_green_led_off(slot) # power LED off This left the attention indicator on (even though the hot-add succeeded) and the power indicator off (even though the slot power was on). Fix this by checking for power faults before checking for new devices. Prior to 0e94916e6091, this was successful because everything was chained through work queues and the order was: INT_PRESENCE_ON -> INT_POWER_FAULT -> ENABLE_REQ The ENABLE_REQ cleared the power fault at the end, but now everything is handled inline with the interrupt thread, such that the work ENABLE_REQ was doing happens before power fault handling now. Fixes: 0e94916e6091 ("PCI: pciehp: Handle events synchronously") Signed-off-by: Keith Busch <keith.busch@intel.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
2018-08-15Merge branch 'pci/virtualization'Bjorn Helgaas
- To avoid bus errors, enable PASID only if entire path supports End-End TLP prefixes (Sinan Kaya) - Unify slot and bus reset functions and remove hotplug knowledge from callers (Sinan Kaya) - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to fix guest reboot issues (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller (Bjorn Helgaas) * pci/virtualization: PCI: Add function 1 DMA alias quirk for Marvell 88SS9183 PCI: Delay after FLR of Intel DC P3700 NVMe PCI: Disable Samsung SM961/PM961 NVMe before FLR PCI: Export pcie_has_flr() PCI: Rename pci_try_reset_bus() to pci_reset_bus() PCI: Deprecate pci_reset_bus() and pci_reset_slot() functions PCI: Unify try slot and bus reset API PCI: Hide pci_reset_bridge_secondary_bus() from drivers IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset PCI: Handle error return from pci_reset_bridge_secondary_bus() PCI/IOV: Tidy pci_sriov_set_totalvfs() PCI: Enable PASID only if entire path supports End-End TLP prefixes # Conflicts: # drivers/pci/hotplug/pciehp_hpc.c
2018-08-15Merge branch 'pci/misc'Bjorn Helgaas
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough (Gustavo A. R. Silva) - Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig) - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied (Heiner Kallweit) - Unify PCI and DMA direction #defines (Shunyong Yang) - Add PCI_DEVICE_DATA() macro (Andy Shevchenko) - Check for VPD completion before checking for timeout (Bert Kenward) - Limit Netronome NFP5000 config space size to work around erratum (Jakub Kicinski) * pci/misc: PCI: Limit config space size for Netronome NFP5000 PCI/VPD: Check for VPD access completion before checking for timeout PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry PCI: Unify PCI and normal DMA direction definitions PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough # Conflicts: # drivers/pci/hotplug/pciehp_ctrl.c
2018-08-15Merge branch 'pci/hotplug'Bjorn Helgaas
- Simplify SHPC existence/permission checks (Bjorn Helgaas) - Remove hotplug sample skeleton driver (Lukas Wunner) - Convert pciehp to threaded IRQ handling (Lukas Wunner) - Improve pciehp tolerance of missed events and initially unstable links (Lukas Wunner) - Clear spurious pciehp events on resume (Lukas Wunner) - Add pciehp runtime PM support, including for Thunderbolt controllers (Lukas Wunner) - Support interrupts from pciehp bridges in D3hot (Lukas Wunner) * pci/hotplug: PCI: pciehp: Deduplicate presence check on probe & resume PCI: pciehp: Avoid implicit fallthroughs in switch statements PCI: Whitelist Thunderbolt ports for runtime D3 PCI: Whitelist native hotplug ports for runtime D3 PCI: sysfs: Resume to D0 on function reset PCI: pciehp: Resume parent to D0 on config space access PCI: pciehp: Resume to D0 on enable/disable PCI: pciehp: Support interrupts sent from D3hot PCI: pciehp: Obey compulsory command delay after resume PCI: pciehp: Clear spurious events earlier on resume PCI: portdrv: Deduplicate PM callback iterator PCI: pciehp: Avoid slot access during reset PCI: pciehp: Always enable occupied slot on probe PCI: pciehp: Become resilient to missed events PCI: pciehp: Tolerate initially unstable link PCI: pciehp: Declare pciehp_enable/disable_slot() static PCI: pciehp: Drop enable/disable lock PCI: pciehp: Enable/disable exclusively from IRQ thread PCI: pciehp: Track enable/disable status PCI: pciehp: Publish to user space last on probe PCI: hotplug: Demidlayer registration with the core PCI: pciehp: Drop slot workqueue PCI: pciehp: Handle events synchronously PCI: pciehp: Stop blinking on slot enable failure PCI: pciehp: Convert to threaded polling PCI: pciehp: Convert to threaded IRQ PCI: pciehp: Document struct slot and struct controller PCI: pciehp: Declare pciehp_unconfigure_device() void PCI: pciehp: Drop unnecessary NULL pointer check PCI: pciehp: Fix unprotected list iteration in IRQ handler PCI: pciehp: Fix use-after-free on unplug PCI: hotplug: Don't leak pci_slot on registration failure PCI: hotplug: Delete skeleton driver PCI: shpchp: Separate existence of SHPC and permission to use it
2018-07-31PCI: pciehp: Deduplicate presence check on probe & resumeLukas Wunner
On driver probe and on resume from system sleep, pciehp checks the Presence Detect State bit in the Slot Status register to bring up an occupied slot or bring down an unoccupied slot. Both code paths are identical, so deduplicate them per Mika's request. On probe, an additional check is performed to disable power of an unoccupied slot. This can e.g. happen if power was enabled by BIOS. It cannot happen once pciehp has taken control, hence is not necessary on resume: The Slot Control register is set to the same value that it had on suspend by pci_restore_state(), so if the slot was occupied, power is enabled and if it wasn't, power is disabled. Should occupancy have changed during the system sleep transition, power is adjusted by bringing up or down the slot per the paragraph above. To allow for deduplication of the presence check, move the power check to pcie_init(). This seems safer anyway, because right now it is performed while interrupts are already enabled, and although I can't think of a scenario where pciehp_power_off_slot() and the IRQ thread collide, it does feel brittle. However this means that pcie_init() may now write to the Slot Control register before the IRQ is requested. If both the CCIE and HPIE bits happen to be set, pcie_wait_cmd() will wait for an interrupt (instead of polling the Command Completed bit) and eventually emit a timeout message. Additionally, if a level-triggered INTx interrupt is used, the user may see a spurious interrupt splat. Avoid by disabling interrupts before disabling power. (Normally the HPIE and CCIE bits should be clear on probe, but conceivably they may already have been set e.g. by BIOS.) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-31PCI: pciehp: Avoid implicit fallthroughs in switch statementsLukas Wunner
Per Mika's request, add an explicit break to the last case of switch statements everywhere in pciehp to be more defensive towards future amendments. Per Gustavo's request, mark all non-empty implicit fallthroughs with a comment to silence warnings triggered by -Wimplicit-fallthrough=2. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2018-07-31PCI: Fix is_added/is_busmaster race conditionHari Vyas
When a PCI device is detected, pdev->is_added is set to 1 and proc and sysfs entries are created. When the device is removed, pdev->is_added is checked for one and then device is detached with clearing of proc and sys entries and at end, pdev->is_added is set to 0. is_added and is_busmaster are bit fields in pci_dev structure sharing same memory location. A strange issue was observed with multiple removal and rescan of a PCIe NVMe device using sysfs commands where is_added flag was observed as zero instead of one while removing device and proc,sys entries are not cleared. This causes issue in later device addition with warning message "proc_dir_entry" already registered. Debugging revealed a race condition between the PCI core setting the is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue setting the is_busmaster bit in pci_set_master(). As these fields are not handled atomically, that clears the is_added bit. Move the is_added bit to a separate private flag variable and use atomic functions to set and retrieve the device addition state. This avoids the race because is_added no longer shares a memory location with is_busmaster. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283 Signed-off-by: Hari Vyas <hari.vyas@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31PCI: pciehp: Resume parent to D0 on config space accessLukas Wunner
Ensure accessibility of a hotplug port's config space when accessed via sysfs by resuming its parent to D0. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31PCI: pciehp: Resume to D0 on enable/disableLukas Wunner
pciehp's IRQ thread ensures accessibility of the port by runtime resuming its parent to D0. However when the slot is enabled/disabled, the port itself needs to be in D0 because its secondary bus is accessed in: pciehp_check_link_status(), pciehp_configure_device() (both called from board_added()) and pciehp_unconfigure_device() (called from remove_board()). Thus, acquire a runtime PM ref on enable/disablement of the slot. Yinghai Lu additionally discovered that some SkyLake servers feature a Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8) which requires the port to be in D0 when invoking pciehp_power_on_slot() (likewise called from board_added()). If slot power is turned on while in D3hot, link training later fails: https://lkml.kernel.org/r/20170205073454.GA253@wunner.de The spec is silent about such a requirement, but it seems prudent to assume that any hotplug port with a Power Controller may need this. The present commit holds a runtime PM ref whenever slot power is turned on and off, but it doesn't keep the port in D0 as long as slot power is on. If vendors determine that's necessary, they need to amend pciehp to acquire a runtime PM ref in pciehp_power_on_slot() and release one in pciehp_power_off_slot(). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31PCI: pciehp: Support interrupts sent from D3hotLukas Wunner
If a hotplug port is able to send an interrupt, one would naively assume that it is accessible at that moment. After all, if it wouldn't be accessible, i.e. if its parent is in D3hot and the link to the hotplug port is thus down, how should an interrupt come through? It turns out that assumption is wrong at least for Thunderbolt: Even though its parents are in D3hot, a Thunderbolt hotplug port is able to signal interrupts. Because the port's config space is inaccessible and resuming the parents may sleep, the hard IRQ handler has to defer runtime resuming the parents and reading the Slot Status register to the IRQ thread. If the hotplug port uses a level-triggered INTx interrupt, it needs to be masked until the IRQ thread has cleared the signaled events. For simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts. Note that if the interrupt is shared (which can only happen for INTx), other devices are starved from receiving interrupts until the IRQ thread is scheduled, has runtime resumed the hotplug port's parents and has read and cleared the Slot Status register. That delay is dominated by the 10 ms D3hot->D0 transition time of each parent port. The worst case is a Thunderbolt downstream port at the end of a daisy chain: There may be up to six Thunderbolt controllers in-between it and the root port, each comprising an upstream and downstream port, plus its own upstream port. That's 13 x 10 = 130 ms. Possible mitigations are polling the interrupt while it's disabled or reducing the d3_delay of Thunderbolt ports if possible. Open code masking of the interrupt instead of requesting it with the IRQF_ONESHOT flag to minimize the period during which it is masked. (IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.) PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the associated form factor specification, a hotplug capable Downstream Port must support generation of a wakeup event (using the PME mechanism) on hotplug events that occur when the system is in a sleep state or the Port is in device state D1, D2, or D3Hot." This would seem to imply that PME needs to be enabled on the hotplug port when it is runtime suspended. pci_enable_wake() currently doesn't enable PME on bridges, it may be necessary to add an exemption for hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the PME_Status bit is not set when an interrupt occurs while the hotplug port is in D3hot, even if PME is enabled. (I've tested this on a Mac and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in negotiate_os_control(), modifying it to 1 didn't change the behavior.) (Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event interrupts (when both are implemented) always share the same MSI or MSI-X vector". That would only seem to apply to Root Ports, however the section never mentions Root Ports, only Downstream Ports. This is explained in the definition of "Downstream Port" in the "Terms and Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that are not the Upstream Port are Downstream Ports. All Ports on a Root Complex are Downstream Ports.") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-31PCI: pciehp: Obey compulsory command delay after resumeLukas Wunner
Upon resume from system sleep, the Slot Control register is written via: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_restore_state() pci_restore_pcie_state() PCIe r4.0, sec 6.7.3.2 says that after "issuing a write transaction that targets any portion of the Port's Slot Control register, [...] software must wait for [the] command to complete before issuing the next command". pciehp currently fails to enforce that rule after the above-mentioned write. Fix it. (Moving restoration of the Slot Control register to pciehp doesn't seem to make sense because the other PCIe hotplug drivers may need it as well.) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-31PCI: pciehp: Clear spurious events earlier on resumeLukas Wunner
Thunderbolt hotplug ports that were occupied before system sleep resume with their downstream link in "off" state. Only after the Thunderbolt controller has reestablished the PCIe tunnels does the link go up. As a result, a spurious Presence Detect Changed and/or Data Link Layer State Changed event occurs. The events are not immediately acted upon because tunnel reestablishment happens in the ->resume_noirq phase, when interrupts are still disabled. Also, notification of events may initially be disabled in the Slot Control register when coming out of system sleep and is reenabled in the ->resume_noirq phase through: pci_pm_resume_noirq() pci_pm_default_resume_early() pci_restore_state() pci_restore_pcie_state() It is not guaranteed that the events are acted upon at all: PCIe r4.0, sec 6.7.3.4 says that "a port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." Note the "optionally". If an MSI is sent, pciehp will gratuitously turn the slot off and back on once the ->resume_early phase has commenced. If an MSI is not sent, the extant, unacknowledged events in the Slot Status register will prevent future notification of presence or link changes. Commit 13c65840feab ("PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume") fixed the latter by clearing the events in the ->resume phase. Move this to the ->resume_noirq phase to also fix the gratuitous disable/enablement of the slot. The commit further restored the Slot Control register in the ->resume phase, but that's dispensable because as shown above it's already been done in the ->resume_noirq phase. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-31PCI: pciehp: Avoid slot access during resetLukas Wunner
The ->reset_slot callback introduced by commits: 2e35afaefe64 ("PCI: pciehp: Add reset_slot() method") and 06a8d89af551 ("PCI: pciehp: Disable link notification across slot reset") disables notification of Presence Detect Changed and Data Link Layer State Changed events for the duration of a secondary bus reset. However a bus reset not only triggers these events, but may also clear the Presence Detect State bit in the Slot Status register and the Data Link Layer Link Active bit in the Link Status register momentarily. According to Sinan Kaya: "I know for a fact that bus reset clears the Data Link Layer Active bit as soon as link goes down. It gets set again following link up. Presence detect depends on the HW implementation. QDT root ports don't change presence detect for instance since nobody actually removed the card. If an implementation supports in-band presence detect, the answer is yes. As soon as the link goes down, presence detect bit will get cleared until recovery." https://lkml.kernel.org/r/42e72f83-3b24-f7ef-e5bc-290fae99259a@codeaurora.org In-band presence detect is also covered in Table 4-15 in PCIe r4.0, sec 4.2.6. pciehp should therefore ensure that any parts of the driver that access those bits do not run concurrently to a bus reset. The only precaution the commits took to that effect was to halt interrupt polling. They made no effort to drain the slot workqueue, cancel an outstanding Attention Button work, or block slot enable/disable requests via sysfs and in the ->probe hook. Now that pciehp is converted to enable/disable the slot exclusively from the IRQ thread, the only places accessing the two above-mentioned bits are the IRQ thread and the ->probe hook. Add locking to serialize them with a bus reset. This obviates the need to halt interrupt polling. Do not add locking to the ->get_adapter_status sysfs callback to afford users unfettered access to that bit. Use an rw_semaphore in lieu of a regular mutex to allow parallel execution of the non-reset code paths accessing the critical bits, i.e. the IRQ thread and the ->probe hook. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rajat Jain <rajatja@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Sinan Kaya <okaya@kernel.org>
2018-07-23PCI: pciehp: Always enable occupied slot on probeLukas Wunner
Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when there are hot-plug events that occur while interrupt generation is disabled, and interrupt generation is subsequently enabled." On probe, we currently clear all event bits in the Slot Status register with the notable exception of the Presence Detect Changed bit. Thereby we seek to receive an interrupt for an already occupied slot once event notification is enabled. But because the interrupt is optional, users may have to specify the pciehp_force parameter on the command line, which is inconvenient. Moreover, now that pciehp's event handling has become resilient to missed events, a Presence Detect Changed interrupt for a slot which is powered on is interpreted as removal of the card. If the slot has already been brought up by the BIOS, receiving such an interrupt on probe causes the slot to be powered off and immediately back on, which is likewise undesirable. Avoid both issues by making the behavior of pciehp_force the default and clearing the Presence Detect Changed bit on probe. Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC ("Force pciehp, even if OSHP is missing") seems nonsensical because the OSHP control method is only relevant for SHCP slots according to the PCI Firmware specification r3.0, sec 4.8. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-23PCI: pciehp: Become resilient to missed eventsLukas Wunner
A hotplug port's Slot Status register does not count how often each type of event occurred, it only records the fact *that* an event has occurred. Previously pciehp queued a work item for each event. But if it missed an event, e.g. removal of a card in-between two back-to-back insertions, it queued up the wrong work item or no work item at all. Commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") sought to improve the situation by shrinking the window during which events may be missed. But Stefan Roese reports unbalanced Card present and Link Up events, suggesting that we're still missing events if they occur very rapidly. Bjorn Helgaas responds that he considers pciehp's event handling "baroque" and calls for its simplification and rationalization: https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com It gets worse once a hotplug port is runtime suspended: The port can signal an interrupt while it and its parents are in D3hot, i.e. while it is inaccessible. By the time we've runtime resumed all parents to D0 and read the port's Slot Status register, we may have missed an arbitrary number of events. Event handling therefore needs to be reworked to become resilient to missed events. Assume that a Presence Detect Changed event has occurred. Consider the following truth table: - Slot is in OFF_STATE and is currently empty. => Do nothing. (The event is trailing a Link Down or we've missed an insertion and subsequent removal.) - Slot is in OFF_STATE and is currently occupied. => Turn the slot on. - Slot is in ON_STATE and is currently empty. => Turn the slot off. - Slot is in ON_STATE and is currently occupied. => Turn the slot off, (Be cautious and assume the card in then back on. the slot isn't the same as before.) This leads to the following simple algorithm: 1 If the slot is in ON_STATE, turn it off unconditionally. 2 If the slot is currently occupied, turn it on. Because those actions are now carried out synchronously, rather than by scheduled work items, pciehp reacts to the *current* situation and missed events no longer matter. Data Link Layer State Changed events can be handled identically to Presence Detect Changed events. Note that in the above truth table, a Link Up trailing a Card present event didn't have to be accounted for: It is filtered out by pciehp_check_link_status(). As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says: "Once the Power Indicator begins blinking, a 5-second abort interval exists during which a second depression of the Attention Button cancels the operation." In other words, the user can only expect the system to react to a button press after it starts blinking. Missed button presses that occur in-between are irrelevant. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefan Roese <sr@denx.de> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-23PCI: pciehp: Tolerate initially unstable linkLukas Wunner
When a device is hotplugged, Presence Detect and Link Up events often do not occur simultaneously, but with a lag of a few milliseconds. Only the first event received is relevant, the other one can be disregarded. Moreover, Stefan Roese reports that on certain platforms, Link State and Presence Detect may flap for up to 100 ms before stabilizing, suggesting that such events should be disregarded for at least this long: https://lkml.kernel.org/r/20180130084121.18653-1-sr@denx.de On slot enablement, pciehp_check_link_status() waits for 100 ms per PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor register for up to 1 second. If this succeeds, the link is definitely up, so ignore any Presence Detect or Link State events that occurred up to this point. pciehp_check_link_status() then checks the Link Training bit in the Link Status register. This is the final opportunity to detect inaccessibility of the device and abort slot enablement. Any link or presence change that occurs afterwards will cause the slot to be disabled again immediately after attempting to enable it. The astute reviewer may appreciate that achieving this behavior would be more complicated had pciehp not just been converted to enable/disable the slot exclusively from the IRQ thread: When the slot is enabled via sysfs, each link or presence flap would otherwise cause the IRQ thread to run and it would have to sense that those events are belonging to a concurrent slot enablement operation and disregard them. It would be much more difficult than this mere 3 line change. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefan Roese <sr@denx.de>
2018-07-23PCI: pciehp: Declare pciehp_enable/disable_slot() staticLukas Wunner
No callers of pciehp_enable/disable_slot() outside of pciehp_ctrl.c remain, so declare the functions static. For now this requires forward declarations. Those can be eliminated by reshuffling functions once the ongoing effort to refactor the driver has settled. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Drop enable/disable lockLukas Wunner
Previously slot enablement and disablement could happen concurrently. But now it's under the exclusive control of the IRQ thread, rendering the locking obsolete. Drop it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Enable/disable exclusively from IRQ threadLukas Wunner
Besides the IRQ thread, there are several other places in the driver which enable or disable the slot: - pciehp_probe() enables the slot if it's occupied and the pciehp_force module parameter is used. - pciehp_resume() enables or disables the slot after system sleep. - pciehp_queue_pushbutton_work() enables or disables the slot after the 5 second delay following an Attention Button press. - pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or disable the slot on sysfs write. This requires locking and complicates pciehp's state machine. A simplification can be achieved by enabling and disabling the slot exclusively from the IRQ thread. Amend the functions listed above to request slot enable/disablement from the IRQ thread by either synthesizing a Presence Detect Changed event or, in the case of a disable user request (via sysfs or an Attention Button press), submitting a newly introduced force disable request. The latter is needed because the slot shall be forced off despite being occupied. For this force disable request, avoid colliding with Slot Status register bits by using a bit number greater than 16. For synchronous execution of requests (on sysfs write), wait for the request to finish and retrieve the result. There can only ever be one sysfs write in flight due to the locking in kernfs_fop_write(), hence there is no risk of returning the result of a different sysfs request to user space. The POWERON_STATE and POWEROFF_STATE is now no longer entered by the above-listed functions, but solely by the IRQ thread when it begins a power transition. Afterwards, it moves to STATIC_STATE. The same applies to canceling the Attention Button work, it likewise becomes an IRQ thread only operation. An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is never observed by the IRQ thread itself, only by functions called in a different context, such as pciehp_sysfs_enable_slot(). So remove handling of these states from pciehp_handle_button_press() and pciehp_handle_link_change() which are exclusively called from the IRQ thread. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Track enable/disable statusLukas Wunner
handle_button_press_event() currently determines whether the slot has been turned on or off by looking at the Power Controller Control bit in the Slot Control register. This assumes that an attention button implies presence of a power controller even though that's not mandated by the spec. Moreover the Power Controller Control bit is unreliable when a power fault occurs (PCIe r4.0, sec 6.7.1.8). This issue has existed since the driver was introduced in 2004. Fix by replacing STATIC_STATE with ON_STATE and OFF_STATE and tracking whether the slot has been turned on or off. This is also a required ingredient to make pciehp resilient to missed events, which is the object of an upcoming commit. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Publish to user space last on probeLukas Wunner
The PCI hotplug core has just been refactored to separate slot initialization for in-kernel use from publication to user space. Take advantage of it in pciehp by publishing to user space last on probe. This will allow enable/disablement of the slot exclusively from the IRQ thread because the IRQ is requested after initialization for in-kernel use (thereby getting its unique name needed by the IRQ thread) but before user space is able to submit enable/disable requests. On teardown, the order is the same in reverse: The user space interface is removed prior to freeing the IRQ and destroying the slot. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: hotplug: Demidlayer registration with the coreLukas Wunner
When a hotplug driver calls pci_hp_register(), all steps necessary for registration are carried out in one go, including creation of a kobject and addition to sysfs. That's a problem for pciehp once it's converted to enable/disable the slot exclusively from the IRQ thread: The thread needs to be spawned after creation of the kobject (because it uses the kobject's name), but before addition to sysfs (because it will handle enable/disable requests submitted via sysfs). pci_hp_deregister() does offer a ->release callback that's invoked after deletion from sysfs and before destruction of the kobject. But because pci_hp_register() doesn't offer a counterpart, hotplug drivers' ->probe and ->remove code becomes asymmetric, which is error prone as recently discovered use-after-free bugs in pciehp's ->remove hook have shown. In a sense, this appears to be a case of the midlayer antipattern: "The core thesis of the "midlayer mistake" is that midlayers are bad and should not exist. That common functionality which it is so tempting to put in a midlayer should instead be provided as library routines which can [be] used, augmented, or ignored by each bottom level driver independently. Thus every subsystem that supports multiple implementations (or drivers) should provide a very thin top layer which calls directly into the bottom layer drivers, and a rich library of support code that eases the implementation of those drivers. This library is available to, but not forced upon, those drivers." -- Neil Brown (2009), https://lwn.net/Articles/336262/ The presence of midlayer traits in the PCI hotplug core might be ascribed to its age: When it was introduced in February 2002, the blessings of a library approach might not have been well known: https://git.kernel.org/tglx/history/c/a8a2069f432c For comparison, the driver core does offer split functions for creating a kobject (device_initialize()) and addition to sysfs (device_add()) as an alternative to carrying out everything at once (device_register()). This was introduced in October 2002: https://git.kernel.org/tglx/history/c/8b290eb19962 The odd ->release callback in the PCI hotplug core was added in 2003: https://git.kernel.org/tglx/history/c/69f8d663b595 Clearly, a library approach would not force every hotplug driver to implement a ->release callback, but rather allow the driver to remove the sysfs files, release its data structures and finally destroy the kobject. Alternatively, a driver may choose to remove everything with pci_hp_deregister(), then release its data structures. To this end, offer drivers pci_hp_initialize() and pci_hp_add() as a split-up version of pci_hp_register(). Likewise, offer pci_hp_del() and pci_hp_destroy() as a split-up version of pci_hp_deregister(). Eliminate the ->release callback and move its code into each driver's teardown routine. Declare pci_hp_deregister() void, in keeping with the usual kernel pattern that enablement can fail, but disablement cannot. It only returned an error if the caller passed in a NULL pointer or a slot which has never or is no longer registered or is sharing its name with another slot. Those would be bugs, so WARN about them. Few hotplug drivers actually checked the return value and those that did only printed a useless error message to dmesg. Remove that. For most drivers the conversion was straightforward since it doesn't matter whether the code in the ->release callback is executed before or after destruction of the kobject. But in the case of ibmphp, it was unclear to me whether setting slot_cur->ctrl and slot_cur->bus_on to NULL needs to happen before the kobject is destroyed, so I erred on the side of caution and ensured that the order stays the same. Another nontrivial case is pnv_php, I've found the list and kref logic difficult to understand, however my impression was that it is safe to delete the list element and drop the references until after the kobject is destroyed. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86 Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Scott Murray <scott@spiteful.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Corentin Chary <corentin.chary@gmail.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Andy Shevchenko <andy@infradead.org>
2018-07-23PCI: pciehp: Drop slot workqueueLukas Wunner
Previously the slot workqueue was used to handle events and enable or disable the slot. That's no longer the case as those tasks are done synchronously in the IRQ thread. The slot workqueue is thus merely used to handle a button press after the 5 second delay and only one such work item may be in flight at any given time. A separate workqueue isn't necessary for this simple task, so use the system workqueue instead. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Handle events synchronouslyLukas Wunner
Up until now, pciehp's IRQ handler schedules a work item for each event, which in turn schedules a work item to enable or disable the slot. This double indirection was necessary because sleeping wasn't allowed in the IRQ handler. However it is now that pciehp has been converted to threaded IRQ handling and polling, so handle events synchronously in pciehp_ist() and remove the work item infrastructure (with the exception of work items to handle a button press after the 5 second delay). For link or presence change events, move the register read to determine the current link or presence state behind acquisition of the slot lock to prevent it from becoming stale while the lock is contended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Stop blinking on slot enable failureLukas Wunner
If the attention button is pressed to power on the slot AND the user powers on the slot via sysfs before 5 seconds have elapsed AND powering on the slot fails because either the slot is unoccupied OR the latch is open, we neglect turning off the green LED so it keeps on blinking. That's because the error path of pciehp_sysfs_enable_slot() doesn't call pciehp_green_led_off(), unlike pciehp_power_thread() which does. The bug has been present since 2004 when the driver was introduced. Fix by deduplicating common code in pciehp_sysfs_enable_slot() and pciehp_power_thread() into a wrapper function pciehp_enable_slot() and renaming the existing function to __pciehp_enable_slot(). Same for pciehp_disable_slot(). This will also simplify the upcoming rework of pciehp's event handling. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Convert to threaded pollingLukas Wunner
We've just converted pciehp to threaded IRQ handling, but still cannot sleep in pciehp_ist() because the function is also called in poll mode, which runs in softirq context (from a timer). Convert poll mode to a kthread so that pciehp_ist() always runs in task context. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de>
2018-07-23PCI: pciehp: Convert to threaded IRQLukas Wunner
pciehp's IRQ handler queues up a work item for each event signaled by the hardware. A more modern alternative is to let a long running kthread service the events. The IRQ handler's sole job is then to check whether the IRQ originated from the device in question, acknowledge its receipt to the hardware to quiesce the interrupt and wake up the kthread. One benefit is reduced latency to handle the IRQ, which is a necessity for realtime environments. Another benefit is that we can make pciehp simpler and more robust by handling events synchronously in process context, rather than asynchronously by queueing up work items. pciehp's usage of work items is a historic artifact, it predates the introduction of threaded IRQ handlers by two years. (The former was introduced in 2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt handler support").) Convert pciehp to threaded IRQ handling by retrieving the pending events in pciehp_isr(), saving them for later consumption by the thread handler pciehp_ist() and clearing them in the Slot Status register. By clearing the Slot Status (and thereby acknowledging the events) in pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which would have the unpleasant side effect of starving devices sharing the IRQ until pciehp_ist() has finished. pciehp_isr() does not count how many times each event occurred, but merely records the fact *that* an event occurred. If the same event occurs a second time before pciehp_ist() is woken, that second event will not be recorded separately, which is problematic according to commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") because we may miss removal of a card in-between two back-to-back insertions. We're about to make pciehp_ist() resilient to missed events. The present commit regresses the driver's behavior temporarily in order to separate the changes into reviewable chunks. This doesn't affect regular slow-motion hotplug, only plug-unplug-plug operations that happen in a timespan shorter than wakeup of the IRQ thread. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-23PCI: pciehp: Document struct slot and struct controllerLukas Wunner
Document the driver's data structures to lower the barrier to entry for contributors. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Declare pciehp_unconfigure_device() voidLukas Wunner
Since commit 0f4bd8014db5 ("PCI: hotplug: Drop checking of PCI_BRIDGE_ CONTROL in *_unconfigure_device()"), pciehp_unconfigure_device() can no longer fail, so declare it and its sole caller remove_board() void, in keeping with the usual kernel pattern that enablement can fail, but disablement cannot. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-23PCI: pciehp: Drop unnecessary NULL pointer checkLukas Wunner
pciehp_disable_slot() checks if the ctrl attribute of the slot is NULL and bails out if so. However the function is not called prior to the attribute being set in pcie_init_slot(), and pcie_init_slot() is not called if ctrl is NULL. So the check is unnecessary. Drop it. It has been present ever since the driver was introduced in 2004, but it was already unnecessary back then: https://git.kernel.org/tglx/history/c/c16b4b14d980 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-23PCI: pciehp: Fix unprotected list iteration in IRQ handlerLukas Wunner
Commit b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") iterates over the devices on a hotplug port's subordinate bus in pciehp's IRQ handler without acquiring pci_bus_sem. It is thus possible for a user to cause a crash by concurrently manipulating the device list, e.g. by disabling slot power via sysfs on a different CPU or by initiating a remove/rescan via sysfs. This can't be fixed by acquiring pci_bus_sem because it may sleep. The simplest fix is to avoid the list iteration altogether and just check the ignore_hotplug flag on the port itself. This works because pci_ignore_hotplug() sets the flag both on the device as well as on its parent bridge. We do lose the ability to print the name of the device blocking hotplug in the debug message, but that's probably bearable. Fixes: b440bde74f04 ("PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org
2018-07-23PCI: pciehp: Fix use-after-free on unplugLukas Wunner
When pciehp is unbound (e.g. on unplug of a Thunderbolt device), the hotplug_slot struct is deregistered and thus freed before freeing the IRQ. The IRQ handler and the work items it schedules print the slot name referenced from the freed structure in various informational and debug log messages, each time resulting in a quadruple dereference of freed pointers (hotplug_slot -> pci_slot -> kobject -> name). At best the slot name is logged as "(null)", at worst kernel memory is exposed in logs or the driver crashes: pciehp 0000:10:00.0:pcie204: Slot((null)): Card not present An attacker may provoke the bug by unplugging multiple devices on a Thunderbolt daisy chain at once. Unplugging can also be simulated by powering down slots via sysfs. The bug is particularly easy to trigger in poll mode. It has been present since the driver's introduction in 2004: https://git.kernel.org/tglx/history/c/c16b4b14d980 Fix by rearranging teardown such that the IRQ is freed first. Run the work items queued by the IRQ handler to completion before freeing the hotplug_slot struct by draining the work queue from the ->release_slot callback which is invoked by pci_hp_deregister(). Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v2.6.4
2018-07-23PCI: hotplug: Don't leak pci_slot on registration failureLukas Wunner
If addition of sysfs files fails on registration of a hotplug slot, the struct pci_slot as well as the entry in the slot_list is leaked. The issue has been present since the hotplug core was introduced in 2002: https://git.kernel.org/tglx/history/c/a8a2069f432c Perhaps the idea was that even though sysfs addition fails, the slot should still be usable. But that's not how drivers use the interface, they abort probe if a non-zero value is returned. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v2.4.15+ Cc: Greg Kroah-Hartman <greg@kroah.com>
2018-07-23PCI: hotplug: Delete skeleton driverLukas Wunner
Ten years ago, commit 58319b802a61 ("PCI: Hotplug core: remove 'name'") dropped the name element from struct hotplug_slot but neglected to update the skeleton driver. That same year, commit f46753c5e354 ("PCI: introduce pci_slot") raised the number of arguments to pci_hp_register() from one to four. Fourteen years ago, historic commit 7ab60fc1b8e7 ("PCI Hotplug skeleton: final cleanups") removed all usages of the retval variable from pcihp_skel_init() but not the variable itself, provoking a compiler warning: https://git.kernel.org/tglx/history/c/7ab60fc1b8e7 It seems fair to assume the driver hasn't been used as a template for a new driver in a while. Per Bjorn's and Christoph's preference, delete it. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Christoph Hellwig <hch@lst.de>
2018-07-19PCI: Hide pci_reset_bridge_secondary_bus() from driversSinan Kaya
Rename pci_reset_bridge_secondary_bus() to pci_bridge_secondary_bus_reset() and move the declaration from linux/pci.h to drivers/pci.h to be used internally in PCI directory only. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-19PCI: Handle error return from pci_reset_bridge_secondary_bus()Sinan Kaya
Commit 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()") added a return value to the function to return if a device is accessible following a reset. Callers are not checking the value. Pass error code up high in the stack if device is not accessible. Fixes: 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()") Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-12PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthroughGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Warning level 2 was used: -Wimplicit-fallthrough=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-06-26PCI: shpchp: Separate existence of SHPC and permission to use itBjorn Helgaas
The shpchp driver registers for all PCI bridge devices. Its probe method should fail if either (1) the bridge doesn't have an SHPC or (2) the OS isn't allowed to use it (the platform firmware may be operating the SHPC itself). Separate these two tests into: - A new shpc_capable() that looks for the SHPC hardware and is applicable on all systems (ACPI and non-ACPI), and - A simplified acpi_get_hp_hw_control_from_firmware() that we call only when we already know an SHPC exists and there may be ACPI methods to either request permission to use it (_OSC) or transfer control to the OS (OSHP). acpi_get_hp_hw_control_from_firmware() is implemented when CONFIG_ACPI=y, but does nothing if the current platform doesn't support ACPI. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-06-26PCI: shpchp: Manage SHPC unconditionally on non-ACPI systemsBjorn Helgaas
An SHPC can be operated either by platform firmware or by the OS. The OS uses a host bridge ACPI _OSC method to negotiate for control of SHPC. If firmware wants to prevent an OS from operating an SHPC, it must supply an _OSC method that declines to grant SHPC ownership to the OS. If acpi_pci_find_root() returns NULL, it means there's no ACPI host bridge device (PNP0A03 or PNP0A08) and hence no _OSC method, so the OS is always allowed to manage the SHPC. Fix a NULL pointer dereference when CONFIG_ACPI=y but the current hardware/firmware platform doesn't support ACPI. In that case, acpi_get_hp_hw_control_from_firmware() is implemented but acpi_pci_find_root() returns NULL. Fixes: 90cc0c3cc709 ("PCI: shpchp: Add shpchp_is_native()") Link: https://lkml.kernel.org/r/20180621164715.28160-1-marc.zyngier@arm.com Reported-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>