summaryrefslogtreecommitdiff
path: root/drivers/cxl/core/core.h
AgeCommit message (Collapse)Author
2025-05-23Merge branch 'for-6.16/cxl-features-ras' into cxl-for-nextDave Jiang
Add CXL RAS Features support. Features include "patrol scrub control", "error check scrub", "perform maintenance", and "memory sparing". This support connects the RAS Featurs to EDAC.
2025-05-23cxl: Update prototype of function get_support_feature_info()Shiju Jose
Add following changes to function get_support_feature_info() 1. Make generic to share between cxl-fwctl and cxl-edac paths. 2. Rename get_support_feature_info() to cxl_feature_info() 3. Change parameter const struct fwctl_rpc_cxl *rpc_in to const uuid_t *uuid. Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-3-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-05-09cxl/hdm: Clean up a debug printkDan Carpenter
Smatch complains that %pa is for phys_addr_t types and "size" is a u64. drivers/cxl/core/hdm.c:521 cxl_dpa_alloc() error: '%pa' expects argument of type 'phys_addr_t*', argument 4 has type 'ullong* Looking at this, to me it seems more useful to print the sizes as decimal instead of hex. Let's do that. [dj: Adjusted based on latest code changes. ] Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/3d3d969d-651d-4e9d-a892-900876a60ab5@moroto.mountain Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-04-09cxl/core: Fix caching dport GPF DVSEC issueLi Ming
Per Table 8-2 in CXL r3.2 section 8.1.1 and CXL r3.2 section 8.1.6, only CXL Downstream switch ports and CXL root ports have GPF DVSEC for CXL Port(DVSEC ID 04h). CXL subsystem has a gpf_dvsec in struct cxl_port which is used to cache the offset of a GPF DVSEC in PCIe configuration space. It will be updated during the first EP attaching to the cxl_port, so the gpf_dvsec can only cache the GPF DVSEC offset of the dport which the first EP is under. Will not have chance to update it during other EPs attaching. That means CXL subsystem will use the same GPF DVSEC offset for all dports under the port, it will be a problem if the GPF DVSEC offset cached in cxl_port is not the right offset for a dport. Moving gpf_dvsec from struct cxl_port to struct cxl_dport, make every cxl dport has their own GPF DVSEC offset caching, and each cxl dport uses its own GPF DVSEC offset for GPF DVSEC accessing. Fixes: a52b6a2c1c99 ("cxl/pci: Support Global Persistent Flush (GPF)") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://patch.msgid.link/20250323093110.233040-2-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-17Merge branch 'for-6.15/features' into cxl-for-nextDave Jiang
Add CXL Features support. Setup code for enabling in kernel usage of CXL Features. Expecting EDAC/RAS to utilize CXL Features in kernel for things such as memory sparing. Also prepartion for enabling of CXL FWCTL support to issue allowed Features from user space.
2025-03-14Merge branch 'for-6.15/extended-linear-cache' into cxl-for-next2Dave Jiang
Add support for Extended Linear Cache for CXL. Add enumeration support of the cache. Add MCE notification of the aliased memory address.
2025-03-14Merge branch 'for-6.15/dirty-shutdown' into cxl-for-next2Dave Jiang
Add support for Global Persistent Flush (GPF) and dirty shutdown accounting.
2025-03-14cxl/pci: Support Global Persistent Flush (GPF)Davidlohr Bueso
Add support for GPF flows. It is found that the CXL specification around this to be a bit too involved from the driver side. And while this should really all handled by the hardware, this patch takes things with a grain of salt. Upon respective port enumeration, both phase timeouts are set to a max of 20 seconds, which is the NMI watchdog default for lockup detection. The premise is that the kernel does not have enough information to set anything better than a max across the board and hope devices finish their GPF flows within the platform energy budget. Timeout detection is based on dirty Shutdown semantics. The driver will mark it as dirty, expecting that the device clear it upon a successful GPF event. The admin may consult the device Health and check the dirty shutdown counter to see if there was a problem with data integrity. [ davej: Explicitly set return to 0 in update_gpf_port_dvsec() ] [ davej: Add spec reference for 'struct cxl_mbox_set_shutdown_state_in ] [ davej: Fix 0-day reported issue ] Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250124233533.910535-1-dave@stgolabs.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-03-14Merge branch 'for-6.15/fw-first-error-logging' into cxl-for-next2Dave Jiang
Add logging support for CXL CPER endpoint and port protocol errors. Including the 2 patches that was completed later. Link: https://lore.kernel.org/linux-cxl/20250123084421.127697-1-Smita.KoralahalliChannabasappa@amd.com/ Link: https://lore.kernel.org/linux-cxl/20250310223839.31342-1-Smita.KoralahalliChannabasappa@amd.com/
2025-03-14acpi/ghes, cxl/pci: Process CXL CPER Protocol ErrorsSmita Koralahalli
When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol errors. The defined trace events cxl_aer_uncorrectable_error and cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them to trace FW-First Protocol errors. Since the CXL code is required to be called from process context and GHES is in interrupt context, use workqueues for processing. Similar to CXL CPER event handling, use kfifo to handle errors as it simplifies queue processing by providing lock free fifo operations. Add the ability for the CXL sub-system to register a workqueue to process CXL CPER protocol errors. [DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()] Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26acpi/hmat / cxl: Add extended linear cache support for CXLDave Jiang
The current cxl region size only indicates the size of the CXL memory region without accounting for the extended linear cache size. Retrieve the cache size from HMAT and append that to the cxl region size for the cxl region range that matches the SRAT range that has extended linear cache enabled. The SRAT defines the whole memory range that includes the extended linear cache and the CXL memory region. The new HMAT ECN/ECR to the Memory Side Cache Information Structure defines the size of the extended linear cache size and matches to the SRAT Memory Affinity Structure by the memory proxmity domain. Add a helper to match the cxl range to the SRAT memory range in order to retrieve the cache size. There are several places that checks the cxl region range against the decoder range. Use new helper to check between the two ranges and address the new cache size. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20250226162224.3633792-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26cxl/mbox: Add SET_FEATURE mailbox commandShiju Jose
Add support for SET_FEATURE mailbox command. CXL spec r3.2 section 8.2.9.6 describes optional device specific features. CXL devices supports features with changeable attributes. The settings of a feature can be optionally modified using Set Feature command. CXL spec r3.2 section 8.2.9.6.3 describes Set Feature command. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-6-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26cxl/mbox: Add GET_FEATURE mailbox commandShiju Jose
Add support for GET_FEATURE mailbox command. CXL spec r3.2 section 8.2.9.6 describes optional device specific features. The settings of a feature can be retrieved using Get Feature command. CXL spec r3.2 section 8.2.9.6.2 describes Get Feature command. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-5-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-24cxl: Refactor user ioctl command path from mds to mailboxDave Jiang
With 'struct cxl_mailbox' context introduced, the helper functions cxl_query_cmd() and cxl_send_cmd() can take a cxl_mailbox directly rather than a cxl_memdev parameter. Refactor to use cxl_mailbox directly. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250204220430.4146187-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-04cxl: Kill enum cxl_decoder_modeDan Williams
Now that the operational mode of DPA capacity (ram vs pmem... etc) is tracked in the partition, and no code paths have dependencies on the mode implying the partition index, the ambiguous 'enum cxl_decoder_mode' can be cleaned up, specifically this ambiguity on whether the operation mode implied anything about the partition order. Endpoint decoders simply reference their assigned partition where the operational mode can be retrieved as partition mode. With this in place PMEM can now be partition0 which happens today when the RAM capacity size is zero. Dynamic RAM can appear above PMEM when DCD arrives, etc. Code sequences that hard coded the "PMEM after RAM" assumption can now just iterate partitions and consult the partition mode after the fact. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Alejandro Lucero <alucerop@amd.com> Link: https://patch.msgid.link/173864306972.668823.3327008645125276726.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-10-28cxl/core/regs: Add rcd_pcie_cap initializationKobayashi,Daisuke
Add rcd_pcie_cap and its initialization to cache the offset of cxl1.1 device link status information. By caching it, avoid the walking memory map area to find the offset when output the register value. Given that this solution involves port lookups via cxl_pci_find_port() and multiple exit paths where that reference needs to be dropped, introduce a new put_cxl_root() scope-based-free handler. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Kobayashi,Daisuke <kobayashi.da-06@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20241002011549.408412-2-kobayashi.da-06@fujitsu.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-22cxl: Calculate region bandwidth of targets with shared upstream linkDave Jiang
The current bandwidth calculation aggregates all the targets. This simple method does not take into account where multiple targets sharing under a switch or a root port where the aggregated bandwidth can be greater than the upstream link of the switch. To accurately account for the shared upstream uplink cases, a new update function is introduced by walking from the leaves to the root of the hierarchy and clamp the bandwidth in the process as needed. This process is done when all the targets for a region are present but before the final values are send to the HMAT handling code cached access_coordinate targets. The original perf calculation path was kept to calculate the latency performance data that does not require the shared link consideration. The shared upstream link calculation is done as a second pass when all the endpoints have arrived. Testing is done via qemu with CXL hierarchy. run_qemu[1] is modified to support several CXL hierarchy layouts. The following layouts are tested: HB: Host Bridge RP: Root Port SW: Switch EP: End Point 2 HB 2 RP 2 EP: resulting bandwidth: 624 1 HB 2 RP 2 EP: resulting bandwidth: 624 2 HB 2 RP 2 SW 4 EP: resulting bandwidth: 624 Current testing, perf number from SRAT/HMAT is hacked into the kernel code. However with new QEMU support of Generic Target Port that's incoming, the perf data injection is no longer needed. [1]: https://github.com/pmem/run_qemu Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/linux-cxl/20240501152503.00002e60@Huawei.com/ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20240904001316.1688225-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()Alison Schofield
Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA addresses the work it performs is a DPA to HPA translation not a trace. Tidy up this naming by moving the minimal work done in cxl_trace_hpa() into cxl_dpa_to_hpa() and use cxl_dpa_to_hpa() for trace event callbacks. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://patch.msgid.link/452a9b0c525b774c72d9d5851515ffa928750132.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/region: Move cxl_trace_hpa() work to the region driverAlison Schofield
This work belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/183222631f11a43c5e6debc42ec22fe1bd4b818a.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/region: Move cxl_dpa_to_region() work to the region driverAlison Schofield
This helper belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-12cxl/region: Deal with numa nodes not enumerated by SRATDave Jiang
For the numa nodes that are not created by SRAT, no memory_target is allocated and is not managed by the HMAT_REPORTING code. Therefore hmat_callback() memory hotplug notifier will exit early on those NUMA nodes. The CXL memory hotplug notifier will need to call node_set_perf_attrs() directly in order to setup the access sysfs attributes. In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is stored. Add a helper function acpi_node_backed_by_real_pxm() in order to check if a NUMA node id is defined by SRAT or created by CFMWS. node_set_perf_attrs() symbol is exported to allow update of perf attribs for a node. The sysfs path of /sys/devices/system/node/nodeX/access0/initiators/* is created by node_set_perf_attrs() for the various attributes where nodeX is matched to the NUMA node of the CXL region. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Add memory hotplug notifier for cxl regionDave Jiang
When the CXL region is formed, the driver computes the performance data for the region. However this data is not available at the node data collection that has been populated by the HMAT during kernel initialization. Add a memory hotplug notifier to update the access coordinates to the 'struct memory_target' context kept by the HMAT_REPORTING code. Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the priority number to be called before HMAT_CALLBACK_PRI. The CXL update must happen before hmat_callback(). A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in order to allow CXL to update the memory_target access coordinates. A new ext_updated member is added to the memory_target to indicate that the access coordinates within the memory_target has been updated by an external agent such as CXL. This prevents data being overwritten by the hmat_update_target_attrs() triggered by hmat_callback(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Huang, Ying <ying.huang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Calculate and store PCI link latency for the downstream portsDave Jiang
The latency is calculated by dividing the flit size over the bandwidth. Add support to retrieve the flit size for the CXL switch device and calculate the latency of the PCIe link. Cache the latency number with cxl_dport. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319621931.2212653.6800240203604822886.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-31Merge branch 'for-6.7/cxl-rch-eh' into cxl/nextDan Williams
Restricted CXL Host (RCH) Error Handling undoes the topology munging of CXL 1.1 to enabled some AER recovery, and lands some base infrastructure for handling Root-Complex-Event-Collectors (RCECs) with CXL. Include this long running series finally for v6.7.
2023-10-27cxl/pci: Add RCH downstream port AER register discoveryRobert Richter
Restricted CXL host (RCH) downstream port AER information is not currently logged while in the error state. One problem preventing the error logging is the AER and RAS registers are not accessible. The CXL driver requires changes to find RCH downstream port AER and RAS registers for purpose of error logging. RCH downstream ports are not enumerated during a PCI bus scan and are instead discovered using system firmware, ACPI in this case.[1] The downstream port is implemented as a Root Complex Register Block (RCRB). The RCRB is a 4k memory block containing PCIe registers based on the PCIe root port.[2] The RCRB includes AER extended capability registers used for reporting errors. Note, the RCH's AER Capability is located in the RCRB memory space instead of PCI configuration space, thus its register access is different. Existing kernel PCIe AER functions can not be used to manage the downstream port AER capabilities and RAS registers because the port was not enumerated during PCI scan and the registers are not PCI config accessible. Discover RCH downstream port AER extended capability registers. Use MMIO accesses to search for extended AER capability in RCRB register space. [1] CXL 3.0 Spec, 9.11.2 - System Firmware View of CXL 1.1 Hierarchy [2] CXL 3.0 Spec, 8.2.1.1 - RCH Downstream Port RCRB Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-12-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-06cxl/memdev: Fix sanitize vs decoder setup lockingDan Williams
The sanitize operation is destructive and the expectation is that the device is unmapped while in progress. The current implementation does a lockless check for decoders being active, but then does nothing to prevent decoders from racing to be committed. Introduce state tracking to resolve this race. This incidentally cleans up unpriveleged userspace from triggering mmio read cycles by spinning on reading the 'security/state' attribute. Which at a minimum is a waste since the kernel state machine can cache the completion result. Lastly cxl_mem_sanitize() was mistakenly marked EXPORT_SYMBOL() in the original implementation, but an export was never required. Fixes: 0c36b6ad436a ("cxl/mbox: Add sanitization handling machinery") Cc: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxlDan Williams
Pick up the first half of the RCH error handling series. The back half needs some fixups for test regressions. Small conflicts with the PMU work around register enumeration and setup helpers.
2023-06-25cxl/rch: Prepare for caching the MMIO mapped PCIe AER capabilityDan Williams
Prepare cxl_probe_rcrb() for retrieving more than just the component register block. The RCH AER handling code wants to get back to the AER capability that happens to be MMIO mapped rather then configuration cycles. Move RCRB specific downstream port data, like the RCRB base and the AER capability offset, into its own data structure ('struct cxl_rcrb_info') for cxl_probe_rcrb() to fill. Extend 'struct cxl_dport' to include a 'struct cxl_rcrb_info' attribute. This centralizes all RCRB scanning in one routine. Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-4-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/acpi: Probe RCRB later during RCH downstream port creationRobert Richter
The RCRB is extracted already during ACPI CEDT table parsing while the data of this is needed not earlier than dport creation. This implementation comes with drawbacks: During ACPI table scan there is already MMIO access including mapping and unmapping, but only ACPI data should be collected here. The collected data must be transferred through a couple of interfaces until it is finally consumed when creating the dport. This causes complex data structures and function interfaces. Additionally, RCRB parsing will be extended to also extract AER data, it would be much easier do this at a later point during port and dport creation when the data structures are available to hold that data. To simplify all that, probe the RCRB at a later point during RCH downstream port creation. Change ACPI table parser to only extract the base address of either the component registers or the RCRB. Parse and extract the RCRB in devm_cxl_add_rch_dport(). This is in preparation to centralize all RCRB scanning. Signed-off-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-2-terry.bowman@amd.com Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20230622205523.85375-3-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-05-30cxl/pci: Find and register CXL PMU devicesJonathan Cameron
CXL PMU devices can be found from entries in the Register Locator DVSEC. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230526095824.16336-4-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23cxl/memdev: Trace inject and clear poison as cxl_poison eventsAlison Schofield
The cxl_poison trace event allows users to view the history of poison list reads. With the addition of inject and clear poison capabilities, users will expect similar tracing. Add trace types 'Inject' and 'Clear' to the cxl_poison trace_event and trace successful operations only. If the driver finds that the DPA being injected or cleared of poison is mapped in a region, that region info is included in the cxl_poison trace event. Region reconfigurations can make this extra info useless if the debug operations are not carefully managed. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/e20eb7c3029137b480ece671998c183da0477e2e.1681874357.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23cxl/region: Provide region info to the cxl_poison trace eventAlison Schofield
User space may need to know which region, if any, maps the poison address(es) logged in a cxl_poison trace event. Since the mapping of DPAs (device physical addresses) to a region can change, the kernel must provide this information at the time the poison list is read. The event informs user space that at event <timestamp> this <region> mapped to this <DPA>, which is poisoned. The cxl_poison trace event is already wired up to log the region name and uuid if it receives param 'struct cxl_region'. In order to provide that cxl_region, add another method for gathering poison - by committed endpoint decoder mappings. This method is only available with CONFIG_CXL_REGION and is only used if a region actually maps the memdev where poison is being read. After the region driver reads the poison list for all the mapped resources, poison is read for any remaining unmapped resources. The default method remains: read the poison by memdev resource. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/438b01ccaa70592539e8eda4eb2b1d617ba03160.1681838292.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23cxl/trace: Add TRACE support for CXL media-error recordsAlison Schofield
CXL devices may support the retrieval of a device poison list. Add a new trace event that the CXL subsystem may use to log the media-error records returned in the poison list. Log each media-error record as a cxl_poison trace event of type 'List'. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/de6196f5269483d886ab1834744f82d27189a666.1681838291.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/dax: Create dax devices for CXL RAM regionsDan Williams
While platform firmware takes some responsibility for mapping the RAM capacity of CXL devices present at boot, the OS is responsible for mapping the remainder and hot-added devices. Platform firmware is also responsible for identifying the platform general purpose memory pool, typically DDR attached DRAM, and arranging for the remainder to be 'Soft Reserved'. That reservation allows the CXL subsystem to route the memory to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated access (device-dax). The new 'struct cxl_dax_region' object allows for a CXL memory resource (region) to be published, but also allow for udev and module policy to act on that event. It also prevents cxl_core.ko from having a module loading dependency on any drivers/dax/ modules. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10tools/testing/cxl: Define a fixed volatile configuration to parseDan Williams
Take two endpoints attached to the first switch on the first host-bridge in the cxl_test topology and define a pre-initialized region. This is a x2 interleave underneath a x1 CXL Window. $ modprobe cxl_test $ # cxl list -Ru { "region":"region3", "resource":"0xf010000000", "size":"512.00 MiB (536.87 MB)", "interleave_ways":2, "interleave_granularity":4096, "decode_state":"commit" } Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602000547.1924368.11613151863880268868.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Add volatile region creation supportDan Williams
Expand the region creation infrastructure to enable 'ram' (volatile-memory) regions. The internals of create_pmem_region_store() and create_pmem_region_show() are factored out into helpers __create_region() and __create_region_show() for the 'ram' case to reuse. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601995775.1924368.352616146815830591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_memDan Williams
tl;dr: Clean up an unnecessary export and enable cxl_test. An RCD (Restricted CXL Device), in contrast to a typical CXL device in a VH topology, obtains its component registers from the bottom half of the associated CXL host bridge RCRB (Root Complex Register Block). In turn this means that cxl_rcrb_to_component() needs to be called from devm_cxl_add_endpoint(). Presently devm_cxl_add_endpoint() is part of the CXL core, but the only user is the CXL mem module. Move it from cxl_core to cxl_mem to not only get rid of an unnecessary export, but to also enable its call out to cxl_rcrb_to_component(), in a subsequent patch, to be mocked by cxl_test. Recall that cxl_test can only mock exported symbols, and since cxl_rcrb_to_component() is itself inside the core, all callers must be outside of cxl_core to allow cxl_test to mock it. Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/166993045072.1882361.13944923741276843683.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/region: Introduce cxl_pmem_region objectsDan Williams
The LIBNVDIMM subsystem is a platform agnostic representation of system NVDIMM / persistent memory resources. To date, the CXL subsystem's interaction with LIBNVDIMM has been to register an nvdimm-bridge device and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM subsystem mechanics. With regions the approach is the same. Create a new cxl_pmem_region object to proxy CXL region details into a LIBNVDIMM definition. With this enabling LIBNVDIMM can partition CXL persistent memory regions with legacy namespace labels. A follow-on patch will add CXL region label and CXL namespace label support to persist region configurations across driver reload / system-reset events. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-26cxl/region: Add region driver boiler plateDan Williams
The CXL region driver is responsible for routing fully formed CXL regions to one of libnvdimm, for persistent memory regions, device-dax for volatile memory regions, or just act as an enumeration placeholder if the region was setup and configuration locked by platform firmware. In the platform-firmware-setup case the expectation is that region is already accounted in the system memory map, i.e. already enabled as "System RAM". For now, just attach to CXL regions in the CXL_CONFIG_COMMIT state, and take no further action. Given this driver is just a small / simple router, include it in the core rather than its own module. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220624041950.559155-18-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Program target listsDan Williams
Once the region's interleave geometry (ways, granularity, size) is established and all the endpoint decoder targets are assigned, the next phase is to program all the intermediate decoders. Specifically, each CXL switch in the path between the endpoint and its CXL host-bridge (including the logical switch internal to the host-bridge) needs to have its decoders programmed and the target list order assigned. The difficulty in this implementation lies in determining which endpoint decoder ordering combinations are valid. Consider the cxl_test case of 2 host bridges, each of those host-bridges attached to 2 switches, and each of those switches attached to 2 endpoints for a potential 8-way interleave. The x2 interleave at the host-bridge level requires that all even numbered endpoint decoder positions be located on the "left" hand side of the topology tree, and the odd numbered positions on the other. The endpoints that are peers on the same switch need to have a position that can be routed with a dedicated address bit per-endpoint. See check_last_peer() for the details. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337827.1758207.132121746122685208.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Attach endpoint decodersDan Williams
CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-25cxl/region: Enable the assignment of endpoint decoders to regionsDan Williams
The region provisioning process involves allocating DPA to a set of endpoint decoders, and HPA plus the region geometry to a region device. Then the decoder is assigned to the region. At this point several validation steps can be performed to validate that the decoder is suitable to participate in the region. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/165784336184.1758207.16403282029203949622.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/region: Add region creation supportBen Widawsky
CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-21cxl/hdm: Add support for allocating DPA to an endpoint decoderDan Williams
The region provisioning flow will roughly follow a sequence of: 1/ Allocate DPA to a set of decoders 2/ Allocate HPA to a region 3/ Associate decoders with a region and validate that the DPA allocations and topologies match the parameters of the region. For now, this change (step 1) arranges for DPA capacity to be allocated and deleted from non-committed decoders based on the decoder's mode / partition selection. Capacity is allocated from the lowest DPA in the partition and any 'pmem' allocation blocks out all remaining ram capacity in its 'skip' setting. DPA allocations are enforced in decoder instance order. I.e. decoder N + 1 always starts at a higher DPA than instance N, and deleting allocations must proceed from the highest-instance allocated decoder to the lowest. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784329399.1758207.16732038126938632700.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-10cxl/debug: Move debugfs init to cxl_core_init()Dan Williams
In preparation for a new cxl debugfs file, move 'cxl' directory establishment and teardown to the core and let subsequent init routines reference that setup. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165603884654.551046.4962104601691723080.stgit@dwillia2-xfh Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-08cxl/core/hdm: Add CXL standard decoder enumeration to the coreDan Williams
Unlike the decoder enumeration for "root decoders" described by platform firmware, standard decoders can be enumerated from the component registers space once the base address has been identified (via PCI, ACPI, or another mechanism). Add common infrastructure for HDM (Host-managed-Device-Memory) Decoder enumeration and share it between host-bridge, upstream switch port, and cxl_test defined decoders. The locking model for switch level decoders is to hold the port lock over the enumeration. This facilitates moving the dport and decoder enumeration to a 'port' driver. For now, the only enumerator of decoder resources is the cxl_acpi root driver. Co-developed-by: Ben Widawsky <ben.widawsky@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/164374688404.395335.9239248252443123526.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-21cxl/core: Split decoder setup into alloc + addDan Williams
The kbuild robot reports: drivers/cxl/core/bus.c:516:1: warning: stack frame size (1032) exceeds limit (1024) in function 'devm_cxl_add_decoder' It is also the case the devm_cxl_add_decoder() is unwieldy to use for all the different decoder types. Fix the stack usage by splitting the creation into alloc and add steps. This also allows for context specific construction before adding. With the split the caller is responsible for registering a devm callback to trigger device_unregister() for the decoder rather than it being implicit in the decoder registration. I.e. the routine that calls alloc is responsible for calling put_device() if the "add" operation fails. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/163225205828.3038145.6831131648369404859.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-21cxl/mbox: Move mailbox and other non-PCI specific infrastructure to the coreDan Williams
Now that the internals of mailbox operations are abstracted from the PCI specifics a bulk of infrastructure can move to the core. The CXL_PMEM driver intends to proxy LIBNVDIMM UAPI and driver requests to the equivalent functionality provided by the CXL hardware mailbox interface. In support of that intent move the mailbox implementation to a shared location for the CXL_PCI driver native IOCTL path and CXL_PMEM nvdimm command proxy path to share. A unit test framework seeks to implement a unit test backend transport for mailbox commands to communicate mocked up payloads. It can reuse all of the mailbox infrastructure minus the PCI specifics, so that also gets moved to the core. Finally with the mailbox infrastructure and ioctl handling being transport generic there is no longer any need to pass file file_operations to devm_cxl_add_memdev(). That allows all the ioctl boilerplate to move into the core for unit test reuse. No functional change intended, just code movement. Acked-by: Ben Widawsky <ben.widawsky@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/163116435233.2460985.16197340449713287180.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-06cxl/core: Move memdev management to coreBen Widawsky
The motivation for moving cxl_memdev allocation to the core (beyond better file organization of sysfs attributes in core/ and drivers in cxl/), is that device lifetime is longer than module lifetime. The cxl_pci module should be free to come and go without needing to coordinate with devices that need the text associated with cxl_memdev_release() to stay resident. The move fixes a use after free bug when looping driver load / unload with CONFIG_DEBUG_KOBJECT_RELEASE=y. Another motivation for disconnecting cxl_memdev creation from cxl_pci is to enable other drivers, like a unit test driver, to registers memdevs. Fixes: b39cb1052a5c ("cxl/mem: Register CXL memX devices") Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162792540495.368511.9748638751088219595.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-08-06cxl/core: Move pmem functionalityDan Williams
Refactor the pmem / nvdimm-bridge functionality from core/bus.c to core/pmem.c. Introduce drivers/core/core.h to communicate data structures and helpers between the core bus and other functionality that registers devices on the bus. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162792538899.368511.3881663908293411300.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>