summaryrefslogtreecommitdiff
path: root/drivers/cxl/cxl.h
AgeCommit message (Collapse)Author
2025-01-29Merge tag 'cxl-for-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) updates from Dave Jiang: "A tweak to the HMAT output that was acked by Rafael, a prep patch for CXL type2 devices support that's coming soon, refactoring of the CXL regblock enumeration code, and a series of patches to update the event records to CXL spec r3.1: - Move HMAT printouts to pr_debug() - Add CXL type2 support to cxl_dvsec_rr_decode() in preparation for type2 support - A series that updates CXL event records to spec r3.1 and related changes - Refactoring of cxl_find_regblock_instance() to count regblocks" * tag 'cxl-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/core/regs: Refactor out functions to count regblocks of given type cxl/test: Update test code for event records to CXL spec rev 3.1 cxl/events: Update Memory Module Event Record to CXL spec rev 3.1 cxl/events: Update DRAM Event Record to CXL spec rev 3.1 cxl/events: Update General Media Event Record to CXL spec rev 3.1 cxl/events: Add Component Identifier formatting for CXL spec rev 3.1 cxl/events: Update Common Event Record to CXL spec rev 3.1 cxl/pci: Add CXL Type 1/2 support to cxl_dvsec_rr_decode() ACPI/HMAT: Move HMAT messages to pr_debug()
2025-01-22cxl/core/regs: Refactor out functions to count regblocks of given typeHuaisheng Ye
cxl_find_regblock_instance() counts the number of instances of a register block as a side effect of searching through all available register blocks. cxl_count_regblock() throws away that work and recounts all the register blocks by asking cxl_find_regblock_instance() to redo work it has already done until it finally returns an error, that is needlessly wasteful. Let cxl_count_regblock() leverage the counting that cxl_find_regblock_instance() already does by passing in a sentinel value (CXL_INSTANCES_COUNT) that triggers the count to be returned. [ davej: Updated to more concise commit log supplied by djbw ] [ davej: Fix up checkpatch formatting warnings ] Signed-off-by: Huaisheng Ye <huaisheng.ye@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250115152600.26482-2-huaisheng.ye@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-03cxl/pmem: Remove is_cxl_nvdimm_bridge()Zijun Hu
Remove is_cxl_nvdimm_bridge() which has no caller now. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20241224-const_dfc_done-v5-11-6623037414d4@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-02cxl/pci: Add CXL Type 1/2 support to cxl_dvsec_rr_decode()Alejandro Lucero
In cxl_dvsec_rr_decode() the pci driver expects to retrieve a cxlds, struct cxl_dev_state, from the driver_data field of struct device. While that works for Type 3, drivers for Type 1/2 devices may not put a cxlds in the driver_data field. In preparation for supporting Type 1/2 devices, replace parameter 'struct device' with 'struct cxl_dev_state' in cxl_dvsec_rr_decode(). Remove the unused parameter 'cxl_port' in cxl_dvsec_rr_decode(). Signed-off-by: Alejandro Lucero <alucerop@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20241203162112.5088-1-alucerop@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-11-22Merge tag 'cxl-for-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl updates from Dave Jiang: - Constify range_contains() input parameters to prevent changes - Add support for displaying RCD capabilities in sysfs to support lspci for CXL device - Downgrade warning message to debug in cxl_probe_component_regs() - Add support for adding a printf specifier '%pra' to emit 'struct range' content: - Add sanity tests for 'struct resource' - Add documentation for special case - Add %pra for 'struct range' - Add %pra usage in CXL code - Add preparation code for DCD support: - Add range_overlaps() - Add CDAT DSMAS table shared and read only flag in ACPICA - Add documentation to 'struct dev_dax_range' - Delay event buffer allocation in CXL PCI code until needed - Use guard() in cxl_dpa_set_mode() - Refactor create region code to consolidate common code * tag 'cxl-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Refactor common create region code cxl/hdm: Use guard() in cxl_dpa_set_mode() cxl/pci: Delay event buffer allocation dax: Document struct dev_dax_range ACPI/CDAT: Add CDAT/DSMAS shared and read only flag values range: Add range_overlaps() cxl/cdat: Use %pra for dpa range outputs printf: Add print format (%pra) for struct range Documentation/printf: struct resource add start == end special case test printf: Add very basic struct resource tests cxl: downgrade a warning message to debug level in cxl_probe_component_regs() cxl/pci: Add sysfs attribute for CXL 1.1 device link status cxl/core/regs: Add rcd_pcie_cap initialization kernel/range: Const-ify range_contains parameters
2024-10-28cxl/core/regs: Add rcd_pcie_cap initializationKobayashi,Daisuke
Add rcd_pcie_cap and its initialization to cache the offset of cxl1.1 device link status information. By caching it, avoid the walking memory map area to find the offset when output the register value. Given that this solution involves port lookups via cxl_pci_find_port() and multiple exit paths where that reference needs to be dropped, introduce a new put_cxl_root() scope-based-free handler. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Kobayashi,Daisuke <kobayashi.da-06@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20241002011549.408412-2-kobayashi.da-06@fujitsu.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-10-25cxl/port: Fix use-after-free, permit out-of-order decoder shutdownDan Williams
In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-09-22cxl: Calculate region bandwidth of targets with shared upstream linkDave Jiang
The current bandwidth calculation aggregates all the targets. This simple method does not take into account where multiple targets sharing under a switch or a root port where the aggregated bandwidth can be greater than the upstream link of the switch. To accurately account for the shared upstream uplink cases, a new update function is introduced by walking from the leaves to the root of the hierarchy and clamp the bandwidth in the process as needed. This process is done when all the targets for a region are present but before the final values are send to the HMAT handling code cached access_coordinate targets. The original perf calculation path was kept to calculate the latency performance data that does not require the shared link consideration. The shared upstream link calculation is done as a second pass when all the endpoints have arrived. Testing is done via qemu with CXL hierarchy. run_qemu[1] is modified to support several CXL hierarchy layouts. The following layouts are tested: HB: Host Bridge RP: Root Port SW: Switch EP: End Point 2 HB 2 RP 2 EP: resulting bandwidth: 624 1 HB 2 RP 2 EP: resulting bandwidth: 624 2 HB 2 RP 2 SW 4 EP: resulting bandwidth: 624 Current testing, perf number from SRAT/HMAT is hacked into the kernel code. However with new QEMU support of Generic Target Port that's incoming, the perf data injection is no longer needed. [1]: https://github.com/pmem/run_qemu Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/linux-cxl/20240501152503.00002e60@Huawei.com/ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20240904001316.1688225-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-09cxl/pci: Remove duplicated implementation of waiting for memory_info_validYanfei Xu
commit ce17ad0d5498 ("cxl: Wait Memory_Info_Valid before access memory related info") added another implementation, which is cxl_dvsec_mem_range_valid(), of waiting for memory_info_valid without realizing it duplicated wait_for_valid(). Remove wait_for_valid() and retain cxl_dvsec_mem_range_valid() as the former is hardcoded to check only the Memory_Info_Valid bit of DVSEC range 1, while the latter allows for selection between DVSEC range 1 or 2 via parameter. Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Yanfei Xu <yanfei.xu@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240828084231.1378789-3-yanfei.xu@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-03cxl/pci: Rename cxl_setup_parent_dport() and cxl_dport_map_regs()Li Ming
The name of cxl_setup_parent_dport() function is not clear, the function is used to initialize AER and RAS capabilities on a dport, therefore, rename the function to cxl_dport_init_ras_reporting(), it is easier for user to understand what the function does. Besides, adjust the order of the function parameters, the subject of cxl_dport_init_ras_reporting() is a cxl dport, so a struct cxl_dport as the first parameter of the function should be better. cxl_dport_map_regs() is used to map CXL RAS capability on a cxl dport, using cxl_dport_map_ras() as the function name. Signed-off-by: Li Ming <ming4.li@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240830061308.2327065-1-ming4.li@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-03cxl/port: Use __free() to drop put_device() for cxl_portLi Ming
Using scope-based resource management __free() marco with a new helper called put_cxl_port() to drop open coded the put_device() used to dereference the 'struct device' in cxl_port. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Ming <ming4.li@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240830013138.2256244-1-ming4.li@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-28Merge tag 'cxl-for-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL updates from Dave Jiang: "Core: - A CXL maturity map has been added to the documentation to detail the current state of CXL enabling. It provides the status of the current state of various CXL features to inform current and future contributors of where things are and which areas need contribution. - A notifier handler has been added in order for a newly created CXL memory region to trigger the abstract distance metrics calculation. This should bring parity for CXL memory to the same level vs hotplugged DRAM for NUMA abstract distance calculation. The abstract distance reflects relative performance used for memory tiering handling. - An addition for XOR math has been added to address the CXL DPA to SPA translation. CXL address translation did not support address interleave math with XOR prior to this change. Fixes: - Fix to address race condition in the CXL memory hotplug notifier - Add missing MODULE_DESCRIPTION() for CXL modules - Fix incorrect vendor debug UUID define Misc: - A warning has been added to inform users of an unsupported configuration when mixing CXL VH and RCH/RCD hierarchies - The ENXIO error code has been replaced with EBUSY for inject poison limit reached via debugfs and cxl-test support - Moving the PCI config read in cxl_dvsec_rr_decode() to avoid unnecessary PCI config reads - A refactor to a common struct for DRAM and general media CXL events" * tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/core/pci: Move reading of control register to immediately before usage cxl: Remove defunct code calculating host bridge target positions cxl/region: Verify target positions using the ordered target list cxl: Restore XOR'd position bits during address translation cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa() cxl/test: Replace ENXIO with EBUSY for inject poison limit reached cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy cxl/core: Fix incorrect vendor debug UUID define Documentation: CXL Maturity Map cxl/region: Simplify cxl_region_nid() cxl/region: Support to calculate memory tier abstract distance cxl/region: Fix a race condition in memory hotplug notifier cxl: add missing MODULE_DESCRIPTION() macros cxl/events: Use a common struct for DRAM and General Media events
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-11Merge branch 'for-6.11/xor_fixes' into cxl-for-nextDave Jiang
Series to fix XOR math for DPA to SPA translation - Refactor and fold cxl_trace_hpa() into cxl_dpa_to_hpa() - Complete DPA->HPA->SPA translation and correct XOR translation issue - Add new method to verify a CXL target position - Remove old method of CXL target position verifiation
2024-07-11cxl: Remove defunct code calculating host bridge target positionsAlison Schofield
The CXL Spec 3.1 Table 9-22 requires that the BIOS populate the CFMWS target list in interleave target order. This means the calculations the CXL driver added to determine positions when XOR math is in use, along with the entire XOR vs Modulo call back setup is not needed. A prior patch added a common method to verify positions. Remove the now unused code related to the cxl_calc_hb_fn. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/2e2c32a2d0f1007e920b58712d15edad2e48d857.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-11cxl: Restore XOR'd position bits during address translationAlison Schofield
When a device reports a DPA in events like poison, general_media, and dram, the driver translates that DPA back to an HPA. Presently, the CXL driver translation only considers the Modulo position and will report the wrong HPA for XOR configured root decoders. Add a helper function that restores the XOR'd bits during DPA->HPA address translation. Plumb a root decoder callback to the new helper when XOR interleave arithmetic is in use. For Modulo arithmetic, just let the callback be NULL - as in no extra work required. Upon completion of a DPA->HPA translation a couple of checks are performed on the result. One simply confirms that the calculated HPA is within the address range of the region. That test is useful for both Modulo and XOR interleave arithmetic decodes. A second check confirms that the HPA is within an expected chunk based on the endpoints position in the region and the region granularity. An XOR decode disrupts the Modulo pattern making the chunk check useless. To align the checks with the proper decode, pull the region range check inline and use the helper to do the chunk check for Modulo decodes only. A cxl-test unit test is posted for upstream review here: https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/ Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/1a1ac880d9f889bd6384e657e810431b9a0a72e5.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-03driver core: have match() callback in struct bus_type take a const *Greg Kroah-Hartman
In the match() callback, the struct device_driver * should not be changed, so change the function callback to be a const *. This is one step of many towards making the driver core safe to have struct device_driver in read-only memory. Because the match() callback is in all busses, all busses are modified to handle this properly. This does entail switching some container_of() calls to container_of_const() to properly handle the constant *. For some busses, like PCI and USB and HV, the const * is cast away in the match callback as those busses do want to modify those structures at this point in time (they have a local lock in the driver structure.) That will have to be changed in the future if they wish to have their struct device * in read-only-memory. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Alex Elder <elder@kernel.org> Acked-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-02cxl/region: Support to calculate memory tier abstract distanceHuang Ying
An abstract distance value must be assigned by the driver that makes the memory available to the system. It reflects relative performance and is used to place memory nodes backed by CXL regions in the appropriate memory tiers allowing promotion/demotion within the existing memory tiering mechanism. The abstract distance is calculated based on the memory access latency and bandwidth of CXL regions. Signed-off-by: Huang, Ying <ying.huang@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Bharata B Rao <bharata@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240618084639.1419629-3-ying.huang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-25cxl/region: check interleave capabilityYao Xingtao
Since interleave capability is not verified, if the interleave capability of a target does not match the region need, committing decoder should have failed at the device end. In order to checkout this error as quickly as possible, driver needs to check the interleave capability of target during attaching it to region. Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register), bits 11 and 12 indicate the capability to establish interleaving in 3, 6, 12 and 16 ways. If these bits are not set, the target cannot be attached to a region utilizing such interleave ways. Additionally, bits 8 and 9 represent the capability of the bits used for interleaving in the address, Linux tracks this in the cxl_port interleave_mask. Per CXL specification r3.1(8.2.4.20.13 Decoder Protection): eIW means encoded Interleave Ways. eIG means encoded Interleave Granularity. in HPA: if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used, the interleave bits are none, the following check is ignored. if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits start at bit position eIG + 8 and end at eIG + eIW + 8 - 1. if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits start at bit position eIG + 8 and end at eIG + eIW - 1. if the interleave mask is insufficient to cover the required interleave bits, the target cannot be attached to the region. Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-18cxl/mem: Fix no cxl_nvd during pmem region auto-assemblingLi Ming
When CXL subsystem is auto-assembling a pmem region during cxl endpoint port probing, always hit below calltrace. BUG: kernel NULL pointer dereference, address: 0000000000000078 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x160 ? do_user_addr_fault+0x65/0x6b0 ? exc_page_fault+0x7d/0x170 ? asm_exc_page_fault+0x26/0x30 ? cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] ? cxl_pmem_region_probe+0x1ac/0x360 [cxl_pmem] cxl_bus_probe+0x1b/0x60 [cxl_core] really_probe+0x173/0x410 ? __pfx___device_attach_driver+0x10/0x10 __driver_probe_device+0x80/0x170 driver_probe_device+0x1e/0x90 __device_attach_driver+0x90/0x120 bus_for_each_drv+0x84/0xe0 __device_attach+0xbc/0x1f0 bus_probe_device+0x90/0xa0 device_add+0x51c/0x710 devm_cxl_add_pmem_region+0x1b5/0x380 [cxl_core] cxl_bus_probe+0x1b/0x60 [cxl_core] The cxl_nvd of the memdev needs to be available during the pmem region probe. Currently the cxl_nvd is registered after the endpoint port probe. The endpoint probe, in the case of autoassembly of regions, can cause a pmem region probe requiring the not yet available cxl_nvd. Adjust the sequence so this dependency is met. This requires adding a port parameter to cxl_find_nvdimm_bridge() that can be used to query the ancestor root port. The endpoint port is not yet available, but will share a common ancestor with its parent, so start the query from there instead. Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Ming <ming4.li@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240612064423.2567625-1-ming4.li@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-21Merge tag 'pci-v6.10-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Skip E820 checks for MCFG ECAM regions for new (2016+) machines, since there's no requirement to describe them in E820 and some platforms require ECAM to work (Bjorn Helgaas) - Rename PCI_IRQ_LEGACY to PCI_IRQ_INTX to be more specific (Damien Le Moal) - Remove last user and pci_enable_device_io() (Heiner Kallweit) - Wait for Link Training==0 to avoid possible race (Ilpo Järvinen) - Skip waiting for devices that have been disconnected while suspended (Ilpo Järvinen) - Clear Secondary Status errors after enumeration since Master Aborts and Unsupported Request errors are an expected part of enumeration (Vidya Sagar) MSI: - Remove unused IMS (Interrupt Message Store) support (Bjorn Helgaas) Error handling: - Mask Genesys GL975x SD host controller Replay Timer Timeout correctable errors caused by a hardware defect; the errors cause interrupts that prevent system suspend (Kai-Heng Feng) - Fix EDR-related _DSM support, which previously evaluated revision 5 but assumed revision 6 behavior (Kuppuswamy Sathyanarayanan) ASPM: - Simplify link state definitions and mask calculation (Ilpo Järvinen) Power management: - Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports, where BIOS apparently doesn't know how to put them back in D0 (Mario Limonciello) CXL: - Support resetting CXL devices; special handling required because CXL Ports mask Secondary Bus Reset by default (Dave Jiang) DOE: - Support DOE Discovery Version 2 (Alexey Kardashevskiy) Endpoint framework: - Set endpoint BAR to be 64-bit if the driver says that's all the device supports, in addition to doing so if the size is >2GB (Niklas Cassel) - Simplify endpoint BAR allocation and setting interfaces (Niklas Cassel) Cadence PCIe controller driver: - Drop DT binding redundant msi-parent and pci-bus.yaml (Krzysztof Kozlowski) Cadence PCIe endpoint driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) Freescale Layerscape PCIe controller driver: - Convert DT binding to YAML (Frank Li) MediaTek MT7621 PCIe controller driver: - Add DT binding missing 'reg' property for child Root Ports (Krzysztof Kozlowski) - Fix theoretical string truncation in PHY name (Sergio Paracuellos) NVIDIA Tegra194 PCIe controller driver: - Return success for endpoint probe instead of falling through to the failure path (Vidya Sagar) Renesas R-Car PCIe controller driver: - Add DT binding missing IOMMU properties (Geert Uytterhoeven) - Add DT binding R-Car V4H compatible for host and endpoint mode (Yoshihiro Shimoda) Rockchip PCIe controller driver: - Configure endpoint BARs to be 64-bit based on the BAR type, not the BAR value (Niklas Cassel) - Add DT binding missing maxItems to ep-gpios (Krzysztof Kozlowski) - Set the Subsystem Vendor ID, which was previously zero because it was masked incorrectly (Rick Wertenbroek) Synopsys DesignWare PCIe controller driver: - Restructure DBI register access to accommodate devices where this requires Refclk to be active (Manivannan Sadhasivam) - Remove the deinit() callback, which was only need by the pcie-rcar-gen4, and do it directly in that driver (Manivannan Sadhasivam) - Add dw_pcie_ep_cleanup() so drivers that support PERST# can clean up things like eDMA (Manivannan Sadhasivam) - Rename dw_pcie_ep_exit() to dw_pcie_ep_deinit() to make it parallel to dw_pcie_ep_init() (Manivannan Sadhasivam) - Rename dw_pcie_ep_init_complete() to dw_pcie_ep_init_registers() to reflect the actual functionality (Manivannan Sadhasivam) - Call dw_pcie_ep_init_registers() directly from all the glue drivers, not just those that require active Refclk from the host (Manivannan Sadhasivam) - Remove the "core_init_notifier" flag, which was an obscure way for glue drivers to indicate that they depend on Refclk from the host (Manivannan Sadhasivam) TI J721E PCIe driver: - Add DT binding J784S4 SoC Device ID (Siddharth Vadapalli) - Add DT binding J722S SoC support (Siddharth Vadapalli) TI Keystone PCIe controller driver: - Add DT binding missing num-viewport, phys and phy-name properties (Jan Kiszka) Miscellaneous: - Constify and annotate with __ro_after_init (Heiner Kallweit) - Convert DT bindings to YAML (Krzysztof Kozlowski) - Check for kcalloc() failure in of_pci_prop_intr_map() (Duoming Zhou)" * tag 'pci-v6.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits) PCI: Do not wait for disconnected devices when resuming x86/pci: Skip early E820 check for ECAM region PCI: Remove unused pci_enable_device_io() ata: pata_cs5520: Remove unnecessary call to pci_enable_device_io() PCI: Update pci_find_capability() stub return types PCI: Remove PCI_IRQ_LEGACY scsi: vmw_pvscsi: Do not use PCI_IRQ_LEGACY instead of PCI_IRQ_LEGACY scsi: pmcraid: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: mpt3sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: megaraid_sas: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: ipr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: hpsa: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY scsi: arcmsr: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY wifi: rtw89: Use PCI_IRQ_INTX instead of PCI_IRQ_LEGACY dt-bindings: PCI: rockchip,rk3399-pcie: Add missing maxItems to ep-gpios Revert "genirq/msi: Provide constants for PCI/IMS support" Revert "x86/apic/msi: Enable PCI/IMS" Revert "iommu/vt-d: Enable PCI/IMS" Revert "iommu/amd: Enable PCI/IMS" Revert "PCI/MSI: Provide IMS (Interrupt Message Store) support" ...
2024-05-08cxl: Add post-reset warning if reset results in loss of previously committed ↵Dave Jiang
HDM decoders Secondary Bus Reset (SBR) is equivalent to a device being hot removed and inserted again. Doing a SBR on a CXL type 3 device is problematic if the exported device memory is part of system memory that cannot be offlined. The event is equivalent to violently ripping out that range of memory from the kernel. While the hardware requires the "Unmask SBR" bit set in the Port Control Extensions register and the kernel currently does not unmask it, user can unmask this bit via setpci or similar tool. The driver does not have a way to detect whether a reset coming from the PCI subsystem is a Function Level Reset (FLR) or SBR. The only way to detect is to note if a decoder is marked as enabled in software but the decoder control register indicates it's not committed. Add a helper function to find discrepancy between the decoder software state versus the hardware register state. Suggested-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240502165851.1948523-6-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
2024-05-01cxl/acpi: Cleanup __cxl_parse_cfmws()Dan Williams
As a follow on to the recent rework of __cxl_parse_cfmws() to always return errors [1], use cleanup.h helpers to remove goto and other cleanups now that logging is moved to the cxl_parse_cfmws() wrapper. This ends up adding more code than it deletes, but __cxl_parse_cfmws() itself does get smaller. The takeaway from the cond_no_free_ptr() discussion [2] was to not add new macros to handle the cases where no_free_ptr() is awkward, instead rework the code to have helpers and clearer delineation of responsibility. Now one might say that __free(del_cxl_resource) is excessive given it is immediately registered with add_or_reset_cxl_resource(). The rationale for keeping it is that it forces use of "no_free_ptr()" on the argument passed to add_or_reset_cxl_resource(). That in turn makes it clear that @res is NULL for the rest of the function which is part of the point of the cleanup helpers, to turn subtle use after free errors [3] into loud NULL pointer de-references. Link: http://lore.kernel.org/r/170820177238.631006.1012639681618409284.stgit@dwillia2-xfh.jf.intel.com [1] Link: http://lore.kernel.org/r/CAHk-=whBVhnh=KSeBBRet=E7qJAwnPR_aj5em187Q3FiD+LXnA@mail.gmail.com [2] Link: http://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org [3] Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20240219124041.00002bda@Huawei.com Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/171235474028.2718248.14109646123143505522.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl: Fix compile warning for cxl_security_ops externDave Jiang
Jonathan reported he has observed compiler warning when using running with W=1 C=1 for cxl_security_ops that is declared as an extern in cxl/pmem.c. Move to cxl.h to make it visible to all cxl sources. Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/linux-cxl/167771196186.3285982.18283746206612049722.stgit@djiang5-mobl3.local/ Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coordDave Jiang
The driver stores access_coordinate for host bridge in ->hb_coord and switch CDAT access_coordinate in ->sw_coord. Since neither of these access_coordinate clobber each other, the variable name can be consolidated into ->coord to simplify the code. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240403154844.3403859-5-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-08cxl: Fix incorrect region perf data calculationDave Jiang
Current math in cxl_region_perf_data_calculate divides the latency by 1000 every time the function gets called. This causes the region latency to be divided by 1000 per memory device and the math is incorrect. This is user visible as the latency access_coordinate exposed via sysfs will show incorrect latency data. Normalize values from CDAT to nanoseconds. Adjust sub-nanoseconds latency to at least 1. Remove adjustment of perf numbers from the generic target since hmat handling code has already normalized those numbers. Now all computation and stored numbers should be in nanoseconds. cxl_hb_get_perf_coordinates() is removed and HB coords are calculated in the port access_coordinate calculation path since it no longer need to be treated special. Fixes: 3d9f4a197230 ("cxl/region: Calculate performance data for a region") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20240403154844.3403859-4-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-12cxl/region: Add memory hotplug notifier for cxl regionDave Jiang
When the CXL region is formed, the driver computes the performance data for the region. However this data is not available at the node data collection that has been populated by the HMAT during kernel initialization. Add a memory hotplug notifier to update the access coordinates to the 'struct memory_target' context kept by the HMAT_REPORTING code. Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the priority number to be called before HMAT_CALLBACK_PRI. The CXL update must happen before hmat_callback(). A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in order to allow CXL to update the memory_target access coordinates. A new ext_updated member is added to the memory_target to indicate that the access coordinates within the memory_target has been updated by an external agent such as CXL. This prevents data being overwritten by the hmat_update_target_attrs() triggered by hmat_callback(). Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Huang, Ying <ying.huang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl/region: Calculate performance data for a regionDave Jiang
Calculate and store the performance data for a CXL region. Find the worst read and write latency for all the included ranges from each of the devices that attributes to the region and designate that as the latency data. Sum all the read and write bandwidth data for each of the device region and that is the total bandwidth for the region. The perf list is expected to be constructed before the endpoint decoders are registered and thus there should be no early reading of the entries from the region assemble action. The calling of the region qos calculate function is under the protection of cxl_dpa_rwsem and will ensure that all DPA associated work has completed. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl: Split out host bridge access coordinatesDave Jiang
The difference between access class 0 and access class 1 for 'struct access_coordinate', if any, is that class 0 is for the distance from the target to the closest initiator and that class 1 is for the distance from the target to the closest CPU. For CXL memory, the nearest initiator may not necessarily be a CPU node. The performance path from the CXL endpoint to the host bridge should remain the same. However, the numbers extracted and stored from HMAT is the difference for the two access classes. Split out the performance numbers for the host bridge (generic target) from the calculation of the entire path in order to allow calculation of both access classes for a CXL region. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-7-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12cxl: Split out combine_coordinates() for common shared usageDave Jiang
Refactor the common code of combining coordinates in order to reduce code. Create a new function cxl_cooordinates_combine() it combine two 'struct access_coordinate'. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-6-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-03-12ACPI: HMAT / cxl: Add retrieval of generic port coordinates for both access ↵Dave Jiang
classes Update acpi_get_genport_coordinates() to allow retrieval of both access classes of the 'struct access_coordinate' for a generic target. The update will allow CXL code to compute access coordinates for both access class. Cc: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240308220055.2172956-5-dave.jiang@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-02-16cxl: Fix sysfs export of qos_class for memdevDave Jiang
Current implementation exports only to /sys/bus/cxl/devices/.../memN/qos_class. With both ram and pmem exposed, the second registered sysfs attribute is rejected as duplicate. It's not possible to create qos_class under the dev_groups via the driver due to the ram and pmem sysfs sub-directories already created by the device sysfs groups. Move the ram and pmem qos_class to the device sysfs groups and add a call to sysfs_update() after the perf data are validated so the qos_class can be visible. The end results should be /sys/bus/cxl/devices/.../memN/ram/qos_class and /sys/bus/cxl/devices/.../memN/pmem/qos_class. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20240206190431.1810289-4-dave.jiang@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05Merge branch 'for-6.7/cxl' into for-6.8/cxlDan Williams
Pick up a late locking change + fixup that is better as merge window material than rc material.
2024-01-05cxl: Convert find_cxl_root() to return a 'struct cxl_root *'Dave Jiang
Commit 790815902ec6 ("cxl: Add support for _DSM Function for retrieving QTG ID") introduced 'struct cxl_root', however all usages have been worked indirectly through cxl_port. Refactor code such as find_cxl_root() function to use 'struct cxl_root' directly. Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170449246044.3779673.13035770941393418591.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-05cxl: Introduce put_cxl_root() helperDave Jiang
Add a helper function put_cxl_root() to maintain symmetry for find_cxl_root() function instead of relying on open coding of the put_device() in order to dereference the 'struct device' that happens via get_device() in find_cxl_root(). Suggested-by: Robert Richter <rrichter@amd.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/170449245417.3779673.4566146351673989387.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-04cxl/port: Fix missing target list lockDan Williams
cxl_port_setup_targets() modifies the ->targets[] array of a switch decoder. target_list_show() expects to be able to emit a coherent snapshot of that array by "holding" ->target_lock for read. The target_lock is held for write during initialization of the ->targets[] array, but it is not held for write during cxl_port_setup_targets(). The ->target_lock() predates the introduction of @cxl_region_rwsem. That semaphore protects changes to host-physical-address (HPA) decode which is precisely what writes to a switch decoder's target list affects. Replace ->target_lock with @cxl_region_rwsem. Now the side-effect of snapshotting a unstable view of a decoder's target list is likely benign so the Fixes: tag is presumptive. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Add helper function that calculate performance data for downstream portsDave Jiang
The CDAT information from the switch, Switch Scoped Latency and Bandwidth Information Structure (SSLBIS), is parsed and stored under a cxl_dport based on the correlated downstream port id from the SSLBIS entry. Walk the entire CXL port paths and collect all the performance data. Also pick up the link latency number that's stored under the dports. The entire path PCIe bandwidth can be retrieved using the pcie_bandwidth_available() call. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319623824.2212653.10302079766473698427.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Store the access coordinates for the generic portsDave Jiang
Each CXL host bridge is represented by an ACPI0016 device. A generic port device handle that is an ACPI device is represented by a string of ACPI0016 device HID and UID. Create a device handle from the ACPI device and retrieve the access coordinates from the stored memory targets. The access coordinates are stored under the cxl_dport that is associated with the CXL host bridge. The access coordinates struct is dynamically allocated under cxl_dport in order for code later on to detect whether the data exists or not. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319623196.2212653.17916695743464172534.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Calculate and store PCI link latency for the downstream portsDave Jiang
The latency is calculated by dividing the flit size over the bandwidth. Add support to retrieve the flit size for the CXL switch device and calculate the latency of the PCIe link. Cache the latency number with cxl_dport. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319621931.2212653.6800240203604822886.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Add support for _DSM Function for retrieving QTG IDDave Jiang
CXL spec v3.0 9.17.3 CXL Root Device Specific Methods (_DSM) Add support to retrieve QTG ID via ACPI _DSM call. The _DSM call requires an input of an ACPI package with 4 dwords (read latency, write latency, read bandwidth, write bandwidth). The call returns a package with 1 WORD that provides the max supported QTG ID and a package that may contain 0 or more WORDs as the recommended QTG IDs in the recommended order. Create a cxl_root container for the root cxl_port and provide a callback ->get_qos_class() in order to retrieve the QoS class. For the ACPI case, the _DSM helper is used to retrieve the QTG ID and returned. A devm_cxl_add_root() function is added for root port setup and registration of the cxl_root callback operation(s). Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/170319621294.2212653.1649682083061569256.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Add callback to parse the SSLBIS subtable from CDATDave Jiang
Provide a callback to parse the Switched Scoped Latency and Bandwidth Information Structure (SSLBIS) in the CDAT structures. The SSLBIS contains the bandwidth and latency information that's tied to the CXL switch that the data table has been read from. The extracted values are stored to the cxl_dport correlated by the port_id depending on the SSLBIS entry. Coherent Device Attribute Table 1.03 2.1 Switched Scoped Latency and Bandwidth Information Structure (DSLBIS) Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319620635.2212653.5194389158785365150.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-12-22cxl: Add callback to parse the DSMAS subtables from CDATDave Jiang
Provide a callback function to the CDAT parser in order to parse the Device Scoped Memory Affinity Structure (DSMAS). Each DSMAS structure contains the DPA range and its associated attributes in each entry. See the CDAT specification for details. The device handle and the DPA range is saved and to be associated with the DSLBIS locality data when the DSLBIS entries are parsed. The xarray is a local variable. When the total path performance data is calculated and storred this xarray can be discarded. Coherent Device Attribute Table 1.03 2.1 Device Scoped memory Affinity Structure (DSMAS) Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/170319619355.2212653.2675953129671561293.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-31Merge branch 'for-6.7/cxl-commited' into cxl/nextDan Williams
Add the committed decoder sysfs attribute for v6.7.
2023-10-31Merge branch 'for-6.7/cxl-qtg' into cxl/nextDan Williams
Merge some prep-work for CXL QOS class support. This cycle saw large collisions with mm on this topic, so the bulk of this topic needs to wait.
2023-10-27cxl: Export QTG ids from CFMWS to sysfs as qos_class attributeDave Jiang
Export the QoS Throttling Group ID from the CXL Fixed Memory Window Structure (CFMWS) under the root decoder sysfs attributes as qos_class. CXL rev3.0 9.17.1.3 CXL Fixed Memory Window Structure (CFMWS) cxl cli will use this id to match with the _DSM retrieved id for a hot-plugged CXL memory device DPA memory range to make sure that the DPA range is under the right CFMWS window. Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/169713681699.2205276.14475306324720093079.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl: Add cxl_decoders_committed() helperDave Jiang
Add a helper to retrieve the number of decoders committed for the port. Replace all the open coding of the calculation with the helper. Link: https://lore.kernel.org/linux-cxl/651c98472dfed_ae7e729495@dwillia2-xfh.jf.intel.com.notmuch/ Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jim Harris <jim.harris@samsung.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/169747906849.272156.1729290904857372335.stgit@djiang5-mobl3 Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/core/regs: Rework cxl_map_pmu_regs() to use map->dev for devmRobert Richter
struct cxl_register_map carries a @dev parameter for devm operations. Simplify the function interface to use that instead of a separate @dev argument. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-21-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Map RCH downstream AER registers for logging protocol errorsTerry Bowman
The restricted CXL host (RCH) error handler will log protocol errors using AER and RAS status registers. The AER and RAS registers need to be virtually memory mapped before enabling interrupts. Create the initializer function devm_cxl_setup_parent_dport() for this when the endpoint is connected with the dport. The initialization sets up the RCH RAS and AER mappings. Add 'struct cxl_regs' to 'struct cxl_dport' for saving a pointer to the RCH downstream port's AER and RAS registers. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20231018171713.1883517-15-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/pci: Add RCH downstream port AER register discoveryRobert Richter
Restricted CXL host (RCH) downstream port AER information is not currently logged while in the error state. One problem preventing the error logging is the AER and RAS registers are not accessible. The CXL driver requires changes to find RCH downstream port AER and RAS registers for purpose of error logging. RCH downstream ports are not enumerated during a PCI bus scan and are instead discovered using system firmware, ACPI in this case.[1] The downstream port is implemented as a Root Complex Register Block (RCRB). The RCRB is a 4k memory block containing PCIe registers based on the PCIe root port.[2] The RCRB includes AER extended capability registers used for reporting errors. Note, the RCH's AER Capability is located in the RCRB memory space instead of PCI configuration space, thus its register access is different. Existing kernel PCIe AER functions can not be used to manage the downstream port AER capabilities and RAS registers because the port was not enumerated during PCI scan and the registers are not PCI config accessible. Discover RCH downstream port AER extended capability registers. Use MMIO accesses to search for extended AER capability in RCRB register space. [1] CXL 3.0 Spec, 9.11.2 - System Firmware View of CXL 1.1 Hierarchy [2] CXL 3.0 Spec, 8.2.1.1 - RCH Downstream Port RCRB Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-12-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27cxl/port: Remove Component Register base address from struct cxl_portRobert Richter
The Component Register base address @component_reg_phys is no longer used after the rework of the Component Register setup which now uses struct member @reg_map instead. Remove the base address. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-10-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>