summaryrefslogtreecommitdiff
path: root/drivers/cxl/core/region.c
AgeCommit message (Collapse)Author
2023-06-25Merge branch 'for-6.5/cxl-rch-eh' into for-6.5/cxlDan Williams
Pick up the first half of the RCH error handling series. The back half needs some fixups for test regressions. Small conflicts with the PMU work around register enumeration and setup helpers.
2023-06-25Merge branch 'for-6.5/cxl-region-fixes' into for-6.5/cxlDan Williams
Pick up the recent fixes to how CPU caches are managed relative to region setup / teardown, and make sure that all decoders transition successfully before updating the region state from COMMIT => ACTIVE.
2023-06-25cxl/region: Manage decoder target_type at decoder-attach timeDan Williams
Switch-level (mid-level) decoders between the platform root and an endpoint can dynamically switch modes between HDM-H and HDM-D[B] depending on which region they target. Use the region type to fixup each decoder that gets allocated to map the given region. Note that endpoint decoders are meant to determine the region type, so warn if those ever need to be fixed up, but since it is possible to continue do so. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679262543.3436160.13053831955768440312.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/port: Rename CXL_DECODER_{EXPANDER, ACCELERATOR} => {HOSTONLYMEM, DEVMEM}Dan Williams
In preparation for support for HDM-D and HDM-DB configuration (device-memory, and device-memory with back-invalidate). Rename the current type designators to use HOSTONLYMEM and DEVMEM as a suffix. HDM-DB can be supported by devices that are not accelerators, so DEVMEM is a more generic term for that case. Fixup one location where this type value was open coded. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168679261369.3436160.7042443847605280593.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Fix state transitions after reset failureDan Williams
Jonathan reports that failed attempts to reset a region (teardown its HDM decoder configuration) mistakenly advance the state of the region to "not committed". Revert to the previous state of the region on reset failure so that the reset can be re-attempted. Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507968.3590522.14484000711718573626.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Flag partially torn down regions as unusableDan Williams
cxl_region_decode_reset() walks all the decoders associated with a given region and disables them. Due to decoder ordering rules it is possible that a switch in the topology notices that a given decoder can not be shutdown before another region with a higher HPA is shutdown first. That can leave the region in a partially committed state. Capture that state in a new CXL_REGION_F_NEEDS_RESET flag and require that a successful cxl_region_decode_reset() attempt must be completed before cxl_region_probe() accepts the region. This is a corollary for the bug that Jonathan identified in "CXL/region : commit reset of out of order region appears to succeed." [1]. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Link: http://lore.kernel.org/r/20230316171441.0000205b@Huawei.com [1] Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696507423.3590522.16254212607926684429.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl/region: Move cache invalidation before region teardown, and before setupDan Williams
Vikram raised a concern with the theoretical case of a CPU sending MemClnEvict to a device that is not prepared to receive. MemClnEvict is a message that is sent after a CPU has taken ownership of a cacheline from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder reconfiguration it is possible that the CPU is holding old contents for a new device that has taken over the physical address range being cached by the CPU. To avoid this scenario, invalidate caches prior to tearing down an HDM decoder configuration. Now, this poses another problem that it is possible for something to speculate into that space while the decode configuration is still up, so to close that gap also invalidate prior to establish new contents behind a given physical address range. With this change the cache invalidation is now explicit and need not be checked in cxl_region_probe(), and that obviates the need for CXL_REGION_F_INCOHERENT. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Fixes: d18bc74aced6 ("cxl/region: Manage CPU caches relative to DPA invalidation events") Reported-by: Vikram Sethi <vsethi@nvidia.com> Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl: Rename 'uport' to 'uport_dev'Dan Williams
For symmetry with the recent rename of ->dport_dev for a 'struct cxl_dport', add the "_dev" suffix to the ->uport property of a 'struct cxl_port'. These devices represent the downstream-port-device and upstream-port-device respectively in the CXL/PCIe topology. Signed-off-by: Terry Bowman <terry.bowman@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-6-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-25cxl: Rename member @dport of struct cxl_dport to @dport_devRobert Richter
Reading code like dport->dport does not immediately suggest that this points to the corresponding device structure of the dport. Rename struct member @dport to @dport_dev. While at it, also rename @new argument of add_dport() to @dport. This better describes the variable as a dport (e.g. new->dport becomes to dport->dport_dev). Co-developed-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20230622205523.85375-5-terry.bowman@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23cxl/region: Provide region info to the cxl_poison trace eventAlison Schofield
User space may need to know which region, if any, maps the poison address(es) logged in a cxl_poison trace event. Since the mapping of DPAs (device physical addresses) to a region can change, the kernel must provide this information at the time the poison list is read. The event informs user space that at event <timestamp> this <region> mapped to this <DPA>, which is poisoned. The cxl_poison trace event is already wired up to log the region name and uuid if it receives param 'struct cxl_region'. In order to provide that cxl_region, add another method for gathering poison - by committed endpoint decoder mappings. This method is only available with CONFIG_CXL_REGION and is only used if a region actually maps the memdev where poison is being read. After the region driver reads the poison list for all the mapped resources, poison is read for any remaining unmapped resources. The default method remains: read the poison by memdev resource. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/438b01ccaa70592539e8eda4eb2b1d617ba03160.1681838292.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-04cxl/region: Move coherence tracking into cxl_region_attach()Dan Williams
Each time the contents of a given HPA are potentially changed in a cache incoherent manner the CXL core sets CXL_REGION_F_INCOHERENT to invalidate CPU caches before the region is used. Successful invocation of attach_target() indicates that DPA has been newly assigned to a given HPA in the dynamic region creation flow. However, attach_target() is also reused in the autodiscovery flow where the region was activated by platform firmware. In that case there is no need to invalidate caches because that region is already in active use and nothing about the autodiscovery flow modifies the HPA-to-DPA relationship. In the autodiscovery case cxl_region_attach() exits early after determining the endpoint decoder is already correctly attached to the region. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168002858817.50647.1217607907088920888.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-04cxl/region: Fix region setup/teardown for RCDsDan Williams
RCDs (CXL memory devices that link train without VH capability and show up as root complex integrated endpoints), hide the presence of the link between the endpoint and the host-bridge. The CXL region setup/teardown paths assume that a link hop is present and go looking for at least one 'struct cxl_port' instance between the CXL root port-object and an endpoint port-object leading to crashes of the form: BUG: kernel NULL pointer dereference, address: 0000000000000008 [..] RIP: 0010:cxl_region_setup_targets+0x3e9/0xae0 [cxl_core] [..] Call Trace: <TASK> cxl_region_attach+0x46c/0x7a0 [cxl_core] cxl_create_region+0x20b/0x270 [cxl_core] cxl_mock_mem_probe+0x641/0x800 [cxl_mock_mem] platform_probe+0x5b/0xb0 Detect RCDs explicitly and skip walking the non-existent port hierarchy between root and endpoint in that case. While this has been a problem since: commit 0a19bfc8de93 ("cxl/port: Add RCD endpoint port enumeration") ...it becomes a more reliable crash scenario with the new autodiscovery implementation. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168002858268.50647.728091521032131326.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-04cxl/port: Fix find_cxl_root() for RCDs and simplify itDan Williams
The find_cxl_root() helper is used to lookup root decoders and other CXL platform topology information for a given endpoint. It turns out that for RCDs it has never worked. The result of find_cxl_root(&cxlmd->dev) is always NULL for the RCH topology case because it expects to find a cxl_port at the host-bridge. RCH topologies only have the root cxl_port object with the host-bridge as a dport. While there are no reports of this being a problem to date, by inspection region enumeration should crash as a result of this problem, and it does in a local unit test for this scenario. However, an observation that ever since: commit f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") ...all callers of find_cxl_root() occur after the memdev connection to the port topology has been established. That means that find_cxl_root() can be simplified to a walk of the endpoint port topology to the root. Switch to that arrangement which also fixes the RCD bug. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168002857715.50647.344876437247313909.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-14Merge branch 'for-6.3/cxl-ram-region' into cxl/nextDan Williams
Pick up some fixes from exposure of for-6.3/cxl-ram-region in linux-next.
2023-02-14cxl: avoid returning uninitialized error codeArnd Bergmann
The new cxl_add_to_region() function returns an uninitialized value on success: drivers/cxl/core/region.c:2628:6: error: variable 'rc' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (IS_ERR(cxlr)) { ^~~~~~~~~~~~ drivers/cxl/core/region.c:2654:9: note: uninitialized use occurs here return rc; Simplify the logic to have the rc variable always initialized in the same place. Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230213101220.3821689-1-arnd@kernel.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10Merge branch 'for-6.3/cxl-ram-region' into cxl/nextDan Williams
Include the support for enumerating and provisioning ram regions for v6.3. This also include a default policy change for ram / volatile device-dax instances to assign them to the dax_kmem driver by default.
2023-02-10cxl/dax: Create dax devices for CXL RAM regionsDan Williams
While platform firmware takes some responsibility for mapping the RAM capacity of CXL devices present at boot, the OS is responsible for mapping the remainder and hot-added devices. Platform firmware is also responsible for identifying the platform general purpose memory pool, typically DDR attached DRAM, and arranging for the remainder to be 'Soft Reserved'. That reservation allows the CXL subsystem to route the memory to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated access (device-dax). The new 'struct cxl_dax_region' object allows for a CXL memory resource (region) to be published, but also allow for udev and module policy to act on that event. It also prevents cxl_core.ko from having a module loading dependency on any drivers/dax/ modules. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Add region autodiscoveryDan Williams
Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Move region-position validation to a helperDan Williams
In preparation for region autodiscovery, that needs all devices discovered before their relative position in the region can be determined, consolidate all position dependent validation in a helper. Recall that in the on-demand region creation flow the end-user picks the position of a given endpoint decoder in a region. In the autodiscovery case the position of an endpoint decoder can only be determined after all other endpoint decoders that claim to decode the region's address range have been enumerated and attached. So, in the autodiscovery case endpoint decoders may be attached before their relative position is known. Once all decoders arrive, then positions can be determined and validated with cxl_region_validate_position() the same as user initiated on-demand creation. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601997584.1924368.4615769326126138969.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Cleanup target list on attach errorDan Williams
Jonathan noticed that the target list setup is not unwound completely upon error. Undo all the setup in the 'err_decrement:' exit path. Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Link: http://lore.kernel.org/r/20230208123031.00006990@Huawei.com Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601996980.1924368.390423634911157277.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Refactor attach_target() for autodiscoveryDan Williams
Region autodiscovery is the process of kernel creating 'struct cxl_region' object to represent active CXL memory ranges it finds already active in hardware when the driver loads. Typically this happens when platform firmware establishes CXL memory regions and then publishes them in the memory map. However, this can also happen in the case of kexec-reboot after the kernel has created regions. In the autodiscovery case the region creation process starts with a known endpoint decoder. Refactor attach_target() into a helper that is suitable to be called from either sysfs, for runtime region creation, or from cxl_port_probe() after it has enumerated all endpoint decoders. The cxl_port_probe() context is an async device-core probing context, so it is not appropriate to allow SIGTERM to interrupt the assembly process. Refactor attach_target() to take @cxled and @state as arguments where @state indicates whether waiting from the region rwsem is interruptible or not. No behavior change is intended. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601996393.1924368.2202255054618600069.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Add volatile region creation supportDan Williams
Expand the region creation infrastructure to enable 'ram' (volatile-memory) regions. The internals of create_pmem_region_store() and create_pmem_region_show() are factored out into helpers __create_region() and __create_region_show() for the 'ram' case to reuse. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601995775.1924368.352616146815830591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Validate region mode vs decoder modeDan Williams
In preparation for a new region mode, do not, for example, allow 'ram' decoders to be assigned to 'pmem' regions and vice versa. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601995111.1924368.7459128614177994602.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Support empty uuids for non-pmem regionsDan Williams
Shipping versions of the cxl-cli utility expect all regions to have a 'uuid' attribute. In preparation for 'ram' regions, update the 'uuid' attribute to return an empty string which satisfies the current expectations of 'cxl list -R'. Otherwise, 'cxl list -R' fails in the presence of regions with the 'uuid' attribute missing. Force the attribute to be read-only as there is no facility or expectation for a 'ram' region to recall its uuid from one boot to the next. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601994558.1924368.12612811533724694444.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-10cxl/region: Add a mode attribute for regionsDan Williams
In preparation for a new region type, "ram" regions, add a mode attribute to clarify the mode of the decoders that can be added to a region. Share the internals of mode_show() (for decoders) with the region case. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/167601993930.1924368.4305018565539515665.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-07Merge branch 'for-6.3/cxl' into cxl/nextDan Williams
Merge the general CXL updates with fixes targeting v6.2-rc for v6.3. Resolve a conflict with the fix and move of cxl_report_and_clear() from pci.c to core/pci.c.
2023-02-07cxl/region: Fix passthrough-decoder detectionDan Williams
A passthrough decoder is a decoder that maps only 1 target. It is a special case because it does not impose any constraints on the interleave-math as compared to a decoder with multiple targets. Extend the passthrough case to multi-target-capable decoders that only have one target selected. I.e. the current code was only considering passthrough *ports* which are only a subset of the potential passthrough decoder scenarios. Fixes: e4f6dfa9ef75 ("cxl/region: Fix 'distance' calculation with passthrough ports") Cc: <stable@vger.kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167564540422.847146.13816934143225777888.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-06cxl/region: Fix null pointer dereference for resetting decoderFan Ni
Not all decoders have a reset callback. The CXL specification allows a host bridge with a single root port to have no explicit HDM decoders. Currently the region driver assumes there are none. As such the CXL core creates a special pass through decoder instance without a commit/reset callback. Prior to this patch, the ->reset() callback was called unconditionally when calling cxl_region_decode_reset. Thus a configuration with 1 Host Bridge, 1 Root Port, and one directly attached CXL type 3 device or multiple CXL type 3 devices attached to downstream ports of a switch can cause a null pointer dereference. Before the fix, a kernel crash was observed when we destroy the region, and a pass through decoder is reset. The issue can be reproduced as below, 1) create a region with a CXL setup which includes a HB with a single root port under which a memdev is attached directly. 2) destroy the region with cxl destroy-region regionX -f. Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Cc: <stable@vger.kernel.org> Signed-off-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Gregory Price <gregory.price@memverge.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Link: https://lore.kernel.org/r/20221215170909.2650271-1-fan.ni@samsung.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-26cxl: fix spelling mistakesRandy Dunlap
Correct spelling mistakes (reported by codespell). Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-cxl@vger.kernel.org Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/20230125032221.21277-1-rdunlap@infradead.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-25cxl/region: Clarify when a cxld->commit() callback is mandatoryDan Williams
Both cxl_switch_decoders() and cxl_endpoint_decoders() are considered by cxl_region_decode_commit(). Flag cases where cxl_switch_decoders with multiple targets, or cxl_endpoint_decoders do not have a commit callback set. The switch case is unlikely to happen since switches are only enumerated by the CXL core, but the endpoint case may support decoders defined by drivers outside of drivers/cxl, like accerator drivers. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167124081824.1626103.1555704405392757219.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-05cxl/region: Only warn about cpu_cache_invalidate_memregion() onceDavidlohr Bueso
No need for more than once per module load. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221215183836.24136-1-dave@stgolabs.net Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-08cxl/region: Fix memdev reuse checkFan Ni
Due to a typo, the check of whether or not a memdev has already been used as a target for the region (above code piece) will always be skipped. Given a memdev with more than one HDM decoder, an interleaved region can be created that maps multiple HPAs to the same DPA. According to CXL spec 3.0 8.1.3.8.4, "Aliasing (mapping more than one Host Physical Address (HPA) to a single Device Physical Address) is forbidden." Fix this by using existing iterator for memdev reuse check. Cc: <stable@vger.kernel.org> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Fan Ni <fan.ni@samsung.com> Link: https://lore.kernel.org/r/20221107212153.745993-1-fan.ni@samsung.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05cxl: update names for interleave ways conversion macrosDave Jiang
Change names for interleave ways macros to clearly indicate which variable is encoded and which is the actual ways value. ways == interleave ways eiw == encoded interleave ways Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167027516228.3124679.11265039496968588580.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05cxl: update names for interleave granularity conversion macrosDave Jiang
Change names for granularity macros to clearly indicate which variable is encoded and which is the actual granularity. granularity == interleave granularity eig == encoded interleave granularity Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167027493237.3124429.8948852388671827664.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05cxl/region: Fix spelling mistake "memergion" -> "memregion"Colin Ian King
There is a spelling mistake in a dev_warn message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20221205091819.1943564-1-colin.i.king@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05Merge branch 'for-6.2/cxl-security' into for-6.2/cxlDan Williams
Pick CXL PMEM security commands for v6.2. Resolve conflicts with the removal of the cxl_pmem_wq.
2022-12-03cxl/region: Manage CPU caches relative to DPA invalidation eventsDan Williams
A "DPA invalidation event" is any scenario where the contents of a DPA (Device Physical Address) is modified in a way that is incoherent with CPU caches, or if the HPA (Host Physical Address) to DPA association changes due to a remapping event. PMEM security events like Unlock and Passphrase Secure Erase already manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events that need cache management by the CXL core. Those only happen when the boot time CXL configuration has changed. That event occurs when userspace attaches an endpoint decoder to a region configuration, and that region is subsequently activated. The implications of not invalidating caches between remap events is that reads from the region at different points in time may return different results due to stale cached data from the previous HPA to DPA mapping. Without a guarantee that the region contents after cxl_region_probe() are written before being read (a layering-violation assumption that cxl_region_probe() can not make) the CXL subsystem needs to ensure that reads that precede writes see consistent results. A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug and unit testing of the CXL implementation in QEMU or other environments where cpu_cache_has_invalidate_memregion() returns false. This may prove too restrictive for QEMU where the HDM decoders are emulated, but in that case the CXL subsystem needs some new mechanism / indication that the HDM decoder is emulated and not a passthrough of real hardware. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/166993222098.1995348.16604163596374520890.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-02cxl/region: Fix missing probe failureDan Williams
cxl_region_probe() allows for regions not in the 'commit' state to be enabled. Fail probe when the region is not committed otherwise the kernel may indicate that an address range is active when none of the decoders are active. Fixes: 8d48817df6ac ("cxl/region: Add region driver boiler plate") Cc: <stable@vger.kernel.org> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/166993220462.1995348.1698008475198427361.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-02cxl/pmem: Refactor nvdimm device registration, delete the workqueueDan Williams
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and 'struct cxl_pmem_region' manage CXL persistent memory resources. The bridge represents base platform resources, the nvdimm represents one or more endpoints, and the region is a collection of nvdimms that contribute to an assembled address range. Their relationship is such that a region is torn down if any component endpoints are removed. All regions and endpoints are torn down if the foundational bridge device goes down. A workqueue was deployed to manage these interdependencies, but it is difficult to reason about, and fragile. A recent attempt to take the CXL root device lock in the cxl_mem driver was reported by lockdep as colliding with the flush_work() in the cxl_pmem flows. Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn down immediately and hierarchically. A similar change is made to both the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both changes are made in the same patch which unfortunately makes the patch bigger than desired. Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and cxl_pmem_region as a devres release action of the bridge device. Additionally, include a devres release action of the cxl_memdev or cxl_region device that triggers the bridge's release action if an endpoint exits before the bridge. I.e. this allows either unplugging the bridge, or unplugging and endpoint to result in the same cleanup actions. To keep the patch smaller the cleanup of the now defunct workqueue infrastructure is saved for a follow-on patch. Tested-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/region: Recycle region idsDan Williams
At region creation time the next region-id is atomically cached so that there is predictability of region device names. If that region is destroyed and then a new one is created the region id increments. That ends up looking like a memory leak, or is otherwise surprising that identifiers roll forward even after destroying all previously created regions. Try to reuse rather than free old region ids at region release time. While this fixes a cosmetic issue, the needlessly advancing memory region-id gives the appearance of a memory leak, hence the "Fixes" tag, but no "Cc: stable" tag. Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Fixes: 779dd20cfb56 ("cxl/region: Add region creation support") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752186062.947915.13200195701224993317.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/region: Fix 'distance' calculation with passthrough portsDan Williams
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/region: Fix cxl_region leak, cleanup targets at region deleteDan Williams
When a region is deleted any targets that have been previously assigned to that region hold references to it. Trigger those references to drop by detaching all targets at unregister_region() time. Otherwise that region object will leak as userspace has lost the ability to detach targets once region sysfs is torn down. Cc: <stable@vger.kernel.org> Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752183055.947915.17681995648556534844.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04cxl/region: Fix region HPA ordering validationDan Williams
Some regions may not have any address space allocated. Skip them when validating HPA order otherwise a crash like the following may result: devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9 BUG: kernel NULL pointer dereference, address: 0000000000000000 [..] RIP: 0010:store_targetN+0x655/0x1740 [cxl_core] [..] Call Trace: <TASK> kernfs_fop_write_iter+0x144/0x200 vfs_write+0x24a/0x4d0 ksys_write+0x69/0xf0 do_syscall_64+0x3a/0x90 store_targetN+0x655/0x1740: alloc_region_ref at drivers/cxl/core/region.c:676 (inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850 (inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290 (inlined by) attach_target at drivers/cxl/core/region.c:1410 (inlined by) store_targetN at drivers/cxl/core/region.c:1453 Cc: <stable@vger.kernel.org> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/166752182461.947915.497032805239915067.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-01cxl/region: Fix decoder allocation crashVishal Verma
When an intermediate port's decoders have been exhausted by existing regions, and creating a new region with the port in question in it's hierarchical path is attempted, cxl_port_attach_region() fails to find a port decoder (as would be expected), and drops into the failure / cleanup path. However, during cleanup of the region reference, a sanity check attempts to dereference the decoder, which in the above case didn't exist. This causes a NULL pointer dereference BUG. To fix this, refactor the decoder allocation and de-allocation into helper routines, and in this 'free' routine, check that the decoder, @cxld, is valid before attempting any operations on it. Cc: <stable@vger.kernel.org> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Link: https://lore.kernel.org/r/20221101074100.1732003-1-vishal.l.verma@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-10-20cxl/region: Fix null pointer dereference due to pass through decoder commitJonathan Cameron
Not all decoders have a commit callback. The CXL specification allows a host bridge with a single root port to have no explicit HDM decoders. Currently the region driver assumes there are none. As such the CXL core creates a special pass through decoder instance without a commit callback. Prior to this patch, the ->commit() callback was called unconditionally. Thus a configuration with 1 Host Bridge, 1 Root Port, 1 switch with multiple downstream ports below which there are multiple CXL type 3 devices results in a situation where committing the region causes a null pointer dereference. Reported-by: Bobo WL <lmw.bobo@gmail.com> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20220818164210.2084-1-Jonathan.Cameron@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Disallow region granularity != window granularityDan Williams
The endpoint decode granularity must be <= the window granularity otherwise capacity in the endpoints is lost in the decode. Consider an attempt to have a region granularity of 512 with 4 devices within a window that maps 2 host bridges at a granularity of 256 bytes: HPA DPA Offset HB Port EP 0x0 0x0 0 0 0 0x100 0x0 1 0 2 0x200 0x100 0 0 0 0x300 0x100 1 0 2 0x400 0x200 0 1 1 0x500 0x200 1 1 3 0x600 0x300 0 1 1 0x700 0x300 1 1 3 0x800 0x400 0 0 0 0x900 0x400 1 0 2 0xA00 0x500 0 0 0 0xB00 0x500 1 0 2 Notice how endpoint0 maps HPA 0x0 and 0x200 correctly, but then at HPA 0x800 it results in DPA 0x200-0x400 on being skipped. Fix this by restricing the region granularity to be equal to the window granularity resulting in the following for a x4 region under a x2 window at a granularity of 256. HPA DPA Offset HB Port EP 0x0 0x0 0 0 0 0x100 0x0 1 0 2 0x200 0x0 0 1 1 0x300 0x0 1 1 3 0x400 0x100 0 0 0 0x500 0x100 1 0 2 0x600 0x100 0 1 1 0x700 0x100 1 1 3 Not that it ever made practical sense to support region granularity > window granularity. The window rotates host bridges causing endpoints to never see a consecutive stream of requests at the desired granularity without breaks to issue cycles to the other host bridge. Fixes: 80d10a6cee05 ("cxl/region: Add interleave geometry attributes") Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973127171.1526540.9923273539049172976.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix x1 interleave to greater than x1 interleave routingDan Williams
In cases where the decode fans out as it traverses downstream, the interleave granularity needs to increment to identify the port selector bits out of the remaining address bits. For example, recall that with an x2 parent port intereleave (IW == 1), the downstream decode for children of those ports will either see address bit IG+8 always set, or address bit IG+8 always clear. So if the child port needs to select a downstream port it can only use address bits starting at IG+9 (where IG and IW are the CXL encoded values for interleave granularity (ilog2(ig) - 8) and ways (ilog2(iw))). When the parent port interleave is x1 no such masking occurs and the child port can maintain the granularity that was routed to the parent port. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973126583.1526540.657948655360009242.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Move HPA setup to cxl_region_attach()Dan Williams
A recent bug fix added the setup of the endpoint decoder interleave geometry settings to cxl_region_attach(). Move the HPA setup there as well to keep all endpoint decoder parameter setting in a central location. For symmetry, move endpoint HPA teardown to cxl_region_detach(), and for switches move HPA setup / teardown to cxl_port_{setup,reset}_targets(). Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/165973126020.1526540.14701949254436069807.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/region: Fix decoder interleave programmingDan Williams
Jonathan notes: "Curiously interleave ways = 1 for the EPs which is obviously wrong" ...while testing the latest CXL development branch on QEMU. It turns out the region creation process failed to program the endpoint decoders. This was missed because the default settings of x1 at 4K intereleave still results in the region appearing to function. Jonathan caught the bug by reverse mapping the translations that need to happen for the QEMU support. Link: https://lore.kernel.org/r/62e95fdf9f6e2_30440294e4@dwillia2-xfh.jf.intel.com.notmuch Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165951146336.967013.11160153960900111443.stgit@dwillia2-xfh.jf.intel.com Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-08-05cxl/regions: add padding for cxl_rr_ep_add nested listsBagas Sanjaya
Sphinx reported indentation warnings: Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:732: WARNING: Unexpected indentation. Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:733: WARNING: Block quote ends without a blank line; unexpected unindent. Documentation/driver-api/cxl/memory-devices:457: ./drivers/cxl/core/region.c:735: WARNING: Unexpected indentation. These warnings above are due to missing blank line padding in the nested list in kernel-doc comment for cxl_rr_ep_add(). Add the paddings to fix the warnings. Fixes: 384e624bb211b4 ("cxl/region: Attach endpoint decoders") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220804075448.98241-2-bagasdotme@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>