Age | Commit message (Collapse) | Author |
|
We need the debugfs / driver-core fixes in here as well for testing and
to build on top of.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
For API device_for_each_child_reverse_from(..., const void *data,
int (*fn)(struct device *dev, const void *data))
- Type of @data is const pointer, and means caller's data @*data is not
allowed to be modified, but that usually is not proper for such non
finding device iterating API.
- Types for both @data and @fn are not consistent with all other
for_each device iterating APIs device_for_each_child(_reverse)(),
bus_for_each_dev() and (driver|class)_for_each_device().
Correct its prototype by removing const from parameter types, then adapt
for various existing usages.
An dedicated typedef device_iter_t will be introduced as @fn() type for
various for_each device interating APIs later.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20250105-class_fix-v6-6-3a2f1768d4d4@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Constify the following API:
struct device *device_find_child(struct device *dev, void *data,
int (*match)(struct device *dev, void *data));
To :
struct device *device_find_child(struct device *dev, const void *data,
device_match_t match);
typedef int (*device_match_t)(struct device *dev, const void *data);
with the following reasons:
- Protect caller's match data @*data which is for comparison and lookup
and the API does not actually need to modify @*data.
- Make the API's parameters (@match)() and @data have the same type as
all of other device finding APIs (bus|class|driver)_find_device().
- All kinds of existing device match functions can be directly taken
as the API's argument, they were exported by driver core.
Constify the API and adapt for various existing usages.
BTW, various subsystem changes are squashed into this commit to meet
'git bisect' requirement, and this commit has the minimal and simplest
changes to complement squashing shortcoming, and that may bring extra
code improvement.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Acked-by: Uwe Kleine-König <ukleinek@kernel.org> # for drivers/pwm
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20241224-const_dfc_done-v5-4-6623037414d4@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The cxl_port_setup_targets() algorithm fails to identify valid target list
ordering in the presence of 4-way and above switches resulting in
'cxl create-region' failures of the form:
$ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0
cxl region: create_region: region0: failed to set target7 to mem0
cxl region: cmd_create_region: created 0 regions
[kernel debug message]
check_last_peer:1213: cxl region0: pci0000:0c:port1: cannot host mem6:decoder7.0 at 2
bus_remove_device:574: bus: 'cxl': remove device region0
QEMU can create this failing topology:
ACPI0017:00 [root0]
|
HB_0 [port1]
/ \
RP_0 RP_1
| |
USP [port2] USP [port3]
/ / \ \ / / \ \
DSP DSP DSP DSP DSP DSP DSP DSP
| | | | | | | |
mem4 mem6 mem2 mem7 mem1 mem3 mem5 mem0
Pos: 0 2 4 6 1 3 5 7
HB: Host Bridge
RP: Root Port
USP: Upstream Port
DSP: Downstream Port
...with the following command steps:
$ qemu-system-x86_64 -machine q35,cxl=on,accel=tcg \
-smp cpus=8 \
-m 8G \
-hda /home/work/vm-images/centos-stream8-02.qcow2 \
-object memory-backend-ram,size=4G,id=m0 \
-object memory-backend-ram,size=4G,id=m1 \
-object memory-backend-ram,size=2G,id=cxl-mem0 \
-object memory-backend-ram,size=2G,id=cxl-mem1 \
-object memory-backend-ram,size=2G,id=cxl-mem2 \
-object memory-backend-ram,size=2G,id=cxl-mem3 \
-object memory-backend-ram,size=2G,id=cxl-mem4 \
-object memory-backend-ram,size=2G,id=cxl-mem5 \
-object memory-backend-ram,size=2G,id=cxl-mem6 \
-object memory-backend-ram,size=2G,id=cxl-mem7 \
-numa node,memdev=m0,cpus=0-3,nodeid=0 \
-numa node,memdev=m1,cpus=4-7,nodeid=1 \
-netdev user,id=net0,hostfwd=tcp::2222-:22 \
-device virtio-net-pci,netdev=net0 \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device cxl-type3,bus=swport0,volatile-memdev=cxl-mem0,id=cxl-vmem0 \
-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-device cxl-type3,bus=swport1,volatile-memdev=cxl-mem1,id=cxl-vmem1 \
-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-device cxl-type3,bus=swport2,volatile-memdev=cxl-mem2,id=cxl-vmem2 \
-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-device cxl-type3,bus=swport3,volatile-memdev=cxl-mem3,id=cxl-vmem3 \
-device cxl-upstream,bus=root_port1,id=us1 \
-device cxl-downstream,port=4,bus=us1,id=swport4,chassis=0,slot=8 \
-device cxl-type3,bus=swport4,volatile-memdev=cxl-mem4,id=cxl-vmem4 \
-device cxl-downstream,port=5,bus=us1,id=swport5,chassis=0,slot=9 \
-device cxl-type3,bus=swport5,volatile-memdev=cxl-mem5,id=cxl-vmem5 \
-device cxl-downstream,port=6,bus=us1,id=swport6,chassis=0,slot=10 \
-device cxl-type3,bus=swport6,volatile-memdev=cxl-mem6,id=cxl-vmem6 \
-device cxl-downstream,port=7,bus=us1,id=swport7,chassis=0,slot=11 \
-device cxl-type3,bus=swport7,volatile-memdev=cxl-mem7,id=cxl-vmem7 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=32G &
In Guest OS:
$ cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0
Fix the method to calculate @distance by iterativeley multiplying the
number of targets per switch port. This also follows the algorithm
recommended here [1].
Fixes: 27b3f8d13830 ("cxl/region: Program target lists")
Link: http://lore.kernel.org/6538824b52349_7258329466@dwillia2-xfh.jf.intel.com.notmuch [1]
Signed-off-by: Huaisheng Ye <huaisheng.ye@intel.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
[djbw: add a comment explaining 'distance']
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/173378716722.1270362.9546805175813426729.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Clean up the existing export namespace code along the same lines of
commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.
Scripted using
git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file;
do
awk -i inplace '
/^#define EXPORT_SYMBOL_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/^#define MODULE_IMPORT_NS/ {
gsub(/__stringify\(ns\)/, "ns");
print;
next;
}
/MODULE_IMPORT_NS/ {
$0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g");
}
/EXPORT_SYMBOL_NS/ {
if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) {
if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ &&
$0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ &&
$0 !~ /^my/) {
getline line;
gsub(/[[:space:]]*\\$/, "");
gsub(/[[:space:]]/, "", line);
$0 = $0 " " line;
}
$0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/,
"\\1(\\2, \"\\3\")", "g");
}
}
{ print }' $file;
done
Requested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc
Acked-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl updates from Dave Jiang:
- Constify range_contains() input parameters to prevent changes
- Add support for displaying RCD capabilities in sysfs to support lspci
for CXL device
- Downgrade warning message to debug in cxl_probe_component_regs()
- Add support for adding a printf specifier '%pra' to emit 'struct
range' content:
- Add sanity tests for 'struct resource'
- Add documentation for special case
- Add %pra for 'struct range'
- Add %pra usage in CXL code
- Add preparation code for DCD support:
- Add range_overlaps()
- Add CDAT DSMAS table shared and read only flag in ACPICA
- Add documentation to 'struct dev_dax_range'
- Delay event buffer allocation in CXL PCI code until needed
- Use guard() in cxl_dpa_set_mode()
- Refactor create region code to consolidate common code
* tag 'cxl-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Refactor common create region code
cxl/hdm: Use guard() in cxl_dpa_set_mode()
cxl/pci: Delay event buffer allocation
dax: Document struct dev_dax_range
ACPI/CDAT: Add CDAT/DSMAS shared and read only flag values
range: Add range_overlaps()
cxl/cdat: Use %pra for dpa range outputs
printf: Add print format (%pra) for struct range
Documentation/printf: struct resource add start == end special case
test printf: Add very basic struct resource tests
cxl: downgrade a warning message to debug level in cxl_probe_component_regs()
cxl/pci: Add sysfs attribute for CXL 1.1 device link status
cxl/core/regs: Add rcd_pcie_cap initialization
kernel/range: Const-ify range_contains parameters
|
|
create_pmem_region_store() and create_ram_region_store() are identical
with the exception of the region mode. With the addition of DC region
mode this would end up being 3 copies of the same code.
Refactor create_pmem_region_store() and create_ram_region_store() to use
a single common function to be used in subsequent DC code.
Suggested-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Li Ming <ming4.li@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20241107-dcd-type2-upstream-v7-6-56a84e66bc36@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
With the recent change to allow out-of-order decoder de-commit it
highlights a need to strengthen the in-order decoder commit guarantees.
As it stands match_free_decoder() ensures that if 2 regions are racing
decoder allocations the one that wins the race will get the lower id
decoder, but that still leaves the race to *commit* the decoder.
Rather than have this complicated case of "reserved in-order, but may
still commit out-of-order", just arrange for the reservation order to
match the commit-order. In other words, prevent subsequent allocations
until the last reservation is committed.
This precludes overlapping region creation events and requires the
previous regionN to either move forward to the decoder commit stage or
drop its reservation before regionN+1 can move forward. That is,
provided that regionN and regionN+1 decode through the same switch port.
As a side effect this allows match_free_decoder() to drop its dependency
on needing write access to the device_find_child() @data parameter [1].
Reported-by: Zijun Hu <quic_zijuhu@quicinc.com>
Closes: http://lore.kernel.org/20240905-const_dfc_prepare-v4-0-4180e1d5a244@quicinc.com
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/172964783668.81806.14962699553881333486.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
[..]
cxld_unregister: cxl decoder14.0:
cxl_region_decode_reset: cxl_region region3:
mock_decoder_reset: cxl_port port3: decoder3.0 reset
2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
cxl_endpoint_decoder_release: cxl decoder14.0:
[..]
cxld_unregister: cxl decoder7.0:
3) cxl_region_decode_reset: cxl_region region3:
Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
[..]
RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_region_decode_reset+0x69/0x190 [cxl_core]
cxl_region_detach+0xe8/0x210 [cxl_core]
cxl_decoder_kill_region+0x27/0x40 [cxl_core]
cxld_unregister+0x5d/0x60 [cxl_core]
At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it in-tact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.
The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.
In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.
A new device_for_each_child_reverse_from() is added to cleanup
port->commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
port->commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
Cc: stable@vger.kernel.org
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
The current bandwidth calculation aggregates all the targets. This simple
method does not take into account where multiple targets sharing under
a switch or a root port where the aggregated bandwidth can be greater than
the upstream link of the switch.
To accurately account for the shared upstream uplink cases, a new update
function is introduced by walking from the leaves to the root of the
hierarchy and clamp the bandwidth in the process as needed. This process
is done when all the targets for a region are present but before the
final values are send to the HMAT handling code cached access_coordinate
targets.
The original perf calculation path was kept to calculate the latency
performance data that does not require the shared link consideration.
The shared upstream link calculation is done as a second pass when all
the endpoints have arrived.
Testing is done via qemu with CXL hierarchy. run_qemu[1] is modified to
support several CXL hierarchy layouts. The following layouts are tested:
HB: Host Bridge
RP: Root Port
SW: Switch
EP: End Point
2 HB 2 RP 2 EP: resulting bandwidth: 624
1 HB 2 RP 2 EP: resulting bandwidth: 624
2 HB 2 RP 2 SW 4 EP: resulting bandwidth: 624
Current testing, perf number from SRAT/HMAT is hacked into the kernel
code. However with new QEMU support of Generic Target Port that's
incoming, the perf data injection is no longer needed.
[1]: https://github.com/pmem/run_qemu
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://lore.kernel.org/linux-cxl/20240501152503.00002e60@Huawei.com/
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20240904001316.1688225-3-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
In testing Dynamic Capacity Device (DCD) support, a lockdep splat
revealed an ABBA issue between the memory notifiers and the DCD extent
processing code.[0] Changing the lock ordering within DCD proved
difficult because regions must be stable while searching for the proper
region and then the device lock must be held to properly notify the DAX
region driver of memory changes.
Dan points out in the thread that notifiers should be able to trust that
it is safe to access static data. Region data is static once the device
is realized and until it's destruction. Thus it is better to manage the
notifiers within the region driver.
Remove the need for a lock by ensuring the notifiers are active only
during the region's lifetime.
Furthermore, remove cxl_region_nid() because resource can't be NULL
while the region is stable.
Link: https://lore.kernel.org/all/66b4cf539a79b_a36e829416@iweiny-mobl.notmuch/ [0]
Cc: Ying Huang <ying.huang@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://patch.msgid.link/20240904-fix-notifiers-v3-1-576b4e950266@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
A device_lock() and device_unlock() pair can be replaced by a cleanup
helper scoped_guard() or guard(), that can enhance code readability. In
CXL subsystem, still use device_lock() and device_unlock() pairs for cxl
port resource protection, most of them can be replaced by a
scoped_guard() or a guard() simply.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Ming <ming4.li@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240830013138.2256244-2-ming4.li@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
"Core:
- A CXL maturity map has been added to the documentation to detail
the current state of CXL enabling.
It provides the status of the current state of various CXL features
to inform current and future contributors of where things are and
which areas need contribution.
- A notifier handler has been added in order for a newly created CXL
memory region to trigger the abstract distance metrics calculation.
This should bring parity for CXL memory to the same level vs
hotplugged DRAM for NUMA abstract distance calculation. The
abstract distance reflects relative performance used for memory
tiering handling.
- An addition for XOR math has been added to address the CXL DPA to
SPA translation.
CXL address translation did not support address interleave math
with XOR prior to this change.
Fixes:
- Fix to address race condition in the CXL memory hotplug notifier
- Add missing MODULE_DESCRIPTION() for CXL modules
- Fix incorrect vendor debug UUID define
Misc:
- A warning has been added to inform users of an unsupported
configuration when mixing CXL VH and RCH/RCD hierarchies
- The ENXIO error code has been replaced with EBUSY for inject poison
limit reached via debugfs and cxl-test support
- Moving the PCI config read in cxl_dvsec_rr_decode() to avoid
unnecessary PCI config reads
- A refactor to a common struct for DRAM and general media CXL
events"
* tag 'cxl-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/core/pci: Move reading of control register to immediately before usage
cxl: Remove defunct code calculating host bridge target positions
cxl/region: Verify target positions using the ordered target list
cxl: Restore XOR'd position bits during address translation
cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()
cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
cxl/acpi: Warn on mixed CXL VH and RCH/RCD Hierarchy
cxl/core: Fix incorrect vendor debug UUID define
Documentation: CXL Maturity Map
cxl/region: Simplify cxl_region_nid()
cxl/region: Support to calculate memory tier abstract distance
cxl/region: Fix a race condition in memory hotplug notifier
cxl: add missing MODULE_DESCRIPTION() macros
cxl/events: Use a common struct for DRAM and General Media events
|
|
Series to fix XOR math for DPA to SPA translation
- Refactor and fold cxl_trace_hpa() into cxl_dpa_to_hpa()
- Complete DPA->HPA->SPA translation and correct XOR translation issue
- Add new method to verify a CXL target position
- Remove old method of CXL target position verifiation
|
|
When a root decoder is configured the interleave target list is read
from the BIOS populated CFMWS structure. Per the CXL spec 3.1 Table
9-22 the target list is in interleave order. The CXL driver populates
its decoder target list in the same order and stores it in 'struct
cxl_switch_decoder' field "@target: active ordered target list in
current decoder configuration"
Given the promise of an ordered list, the driver can stop duplicating
the work of BIOS and simply check target positions against the ordered
list during region configuration.
The simplified check against the ordered list is presented here.
A follow-on patch will remove the unused code.
For Modulo arithmetic this is not a fix, only a simplification.
For XOR arithmetic this is a fix for HB IW of 3,6,12.
Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/35d08d3aba08fee0f9b86ab1cef0c25116ca8a55.1719980933.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
When a device reports a DPA in events like poison, general_media,
and dram, the driver translates that DPA back to an HPA. Presently,
the CXL driver translation only considers the Modulo position and
will report the wrong HPA for XOR configured root decoders.
Add a helper function that restores the XOR'd bits during DPA->HPA
address translation. Plumb a root decoder callback to the new helper
when XOR interleave arithmetic is in use. For Modulo arithmetic, just
let the callback be NULL - as in no extra work required.
Upon completion of a DPA->HPA translation a couple of checks are
performed on the result. One simply confirms that the calculated
HPA is within the address range of the region. That test is useful
for both Modulo and XOR interleave arithmetic decodes.
A second check confirms that the HPA is within an expected chunk
based on the endpoints position in the region and the region
granularity. An XOR decode disrupts the Modulo pattern making the
chunk check useless.
To align the checks with the proper decode, pull the region range
check inline and use the helper to do the chunk check for Modulo
decodes only.
A cxl-test unit test is posted for upstream review here:
https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/
Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/1a1ac880d9f889bd6384e657e810431b9a0a72e5.1719980933.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA
addresses the work it performs is a DPA to HPA translation not a
trace. Tidy up this naming by moving the minimal work done in
cxl_trace_hpa() into cxl_dpa_to_hpa() and use cxl_dpa_to_hpa()
for trace event callbacks.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Link: https://patch.msgid.link/452a9b0c525b774c72d9d5851515ffa928750132.1719980933.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The node ID of the region can be gotten via resource start address
directly. This simplifies the implementation of cxl_region_nid().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Suggested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240618084639.1419629-4-ying.huang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
An abstract distance value must be assigned by the driver that makes
the memory available to the system. It reflects relative performance
and is used to place memory nodes backed by CXL regions in the appropriate
memory tiers allowing promotion/demotion within the existing memory tiering
mechanism.
The abstract distance is calculated based on the memory access latency
and bandwidth of CXL regions.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240618084639.1419629-3-ying.huang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
In the memory hotplug notifier function of the CXL region,
cxl_region_perf_attrs_callback(), the node ID is obtained by checking
the host address range of the region. However, the address range
information is not available when the region is registered in
devm_cxl_add_region(). Additionally, this information may be removed
or added under the protection of cxl_region_rwsem during runtime. If
the memory notifier is called for nodes other than that backed by the
region, a race condition may occur, potentially leading to a NULL
dereference or an invalid address range.
The race condition is addressed by checking the availability of the
address range information under the protection of cxl_region_rwsem. To
enhance code readability and use guard(), the relevant code has been
moved into a newly added function: cxl_region_nid().
Fixes: 067353a46d8c ("cxl/region: Add memory hotplug notifier for cxl region")
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Bharata B Rao <bharata@amd.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240618084639.1419629-2-ying.huang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Since interleave capability is not verified, if the interleave
capability of a target does not match the region need, committing decoder
should have failed at the device end.
In order to checkout this error as quickly as possible, driver needs
to check the interleave capability of target during attaching it to
region.
Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register),
bits 11 and 12 indicate the capability to establish interleaving in 3, 6,
12 and 16 ways. If these bits are not set, the target cannot be attached to
a region utilizing such interleave ways.
Additionally, bits 8 and 9 represent the capability of the bits used for
interleaving in the address, Linux tracks this in the cxl_port
interleave_mask.
Per CXL specification r3.1(8.2.4.20.13 Decoder Protection):
eIW means encoded Interleave Ways.
eIG means encoded Interleave Granularity.
in HPA:
if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used,
the interleave bits are none, the following check is ignored.
if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits
start at bit position eIG + 8 and end at eIG + eIW + 8 - 1.
if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits
start at bit position eIG + 8 and end at eIG + eIW - 1.
if the interleave mask is insufficient to cover the required interleave
bits, the target cannot be attached to the region.
Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders")
Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
cxl_dpa_to_region() looks up a region based on a memdev and DPA.
It wrongly assumes an endpoint found mapping the DPA is also of
a fully assembled region. When not true it leads to a null pointer
dereference looking up the region name.
This appears during testing of region lookup after a failure to
assemble a BIOS defined region or if the lookup raced with the
assembly of the BIOS defined region.
Failure to clean up BIOS defined regions that fail assembly is an
issue in itself and a fix to that problem will alleviate some of
the impact. It will not alleviate the race condition so let's harden
this path.
The behavior change is that the kernel oops due to a null pointer
dereference is replaced with a dev_dbg() message noting that an
endpoint was mapped.
Additional comments are added so that future users of this function
can more clearly understand what it provides.
Fixes: 0a105ab28a4d ("cxl/memdev: Warn of poison inject or clear to a mapped region")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/20240604003609.202682-1-alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
When CXL subsystem is auto-assembling a pmem region during cxl
endpoint port probing, always hit below calltrace.
BUG: kernel NULL pointer dereference, address: 0000000000000078
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
RIP: 0010:cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem]
Call Trace:
<TASK>
? __die+0x24/0x70
? page_fault_oops+0x82/0x160
? do_user_addr_fault+0x65/0x6b0
? exc_page_fault+0x7d/0x170
? asm_exc_page_fault+0x26/0x30
? cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem]
? cxl_pmem_region_probe+0x1ac/0x360 [cxl_pmem]
cxl_bus_probe+0x1b/0x60 [cxl_core]
really_probe+0x173/0x410
? __pfx___device_attach_driver+0x10/0x10
__driver_probe_device+0x80/0x170
driver_probe_device+0x1e/0x90
__device_attach_driver+0x90/0x120
bus_for_each_drv+0x84/0xe0
__device_attach+0xbc/0x1f0
bus_probe_device+0x90/0xa0
device_add+0x51c/0x710
devm_cxl_add_pmem_region+0x1b5/0x380 [cxl_core]
cxl_bus_probe+0x1b/0x60 [cxl_core]
The cxl_nvd of the memdev needs to be available during the pmem region
probe. Currently the cxl_nvd is registered after the endpoint port probe.
The endpoint probe, in the case of autoassembly of regions, can cause a
pmem region probe requiring the not yet available cxl_nvd. Adjust the
sequence so this dependency is met.
This requires adding a port parameter to cxl_find_nvdimm_bridge() that
can be used to query the ancestor root port. The endpoint port is not
yet available, but will share a common ancestor with its parent, so
start the query from there instead.
Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Ming <ming4.li@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://patch.msgid.link/20240612064423.2567625-1-ming4.li@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Move the mode verification to __create_region() before allocating the
memregion to avoid the memregion leaks.
Fixes: 6e099264185d ("cxl/region: Add volatile region creation support")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20240507053421.456439-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
A recent bugfix to cxl_pmem_region_alloc() to fix an
error-unwind-memleak [1], highlighted a use case for scope-based resource
management.
Delete the goto for releasing @cxl_region_rwsem, and return error codes
directly from error condition paths.
The caller, devm_cxl_add_pmem_region(), is no longer given @cxlr_pmem
directly it must retrieve it from @cxlr->cxlr_pmem. This retrieval from
@cxlr was already in place for @cxlr->cxl_nvb, and converting
cxl_pmem_region_alloc() to return an int makes it less awkward to handle
no_free_ptr().
Cc: Li Zhijian <lizhijian@fujitsu.com>
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Closes: http://lore.kernel.org/r/20240430174540.000039ce@Huawei.com
Link: http://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/171451430965.1147997.15782562063090960666.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Before this error path, cxlr_pmem pointed to a kzalloc() memory, free
it to avoid this memory leaking.
Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240428030748.318985-1-lizhijian@fujitsu.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
This work belongs in the region driver as it is only useful with
CONFIG_CXL_REGION. Add a stub in core.h for when the region driver
is not built.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/183222631f11a43c5e6debc42ec22fe1bd4b818a.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
This helper belongs in the region driver as it is only useful
with CONFIG_CXL_REGION. Add a stub in core.h for when the region
driver is not built.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/05e30f788d62b3dd398aff2d2ea50a6aaa7c3313.1714496730.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
For the numa nodes that are not created by SRAT, no memory_target is
allocated and is not managed by the HMAT_REPORTING code. Therefore
hmat_callback() memory hotplug notifier will exit early on those NUMA
nodes. The CXL memory hotplug notifier will need to call
node_set_perf_attrs() directly in order to setup the access sysfs
attributes.
In acpi_numa_init(), the last proximity domain (pxm) id created by SRAT is
stored. Add a helper function acpi_node_backed_by_real_pxm() in order to
check if a NUMA node id is defined by SRAT or created by CFMWS.
node_set_perf_attrs() symbol is exported to allow update of perf attribs
for a node. The sysfs path of
/sys/devices/system/node/nodeX/access0/initiators/* is created by
node_set_perf_attrs() for the various attributes where nodeX is matched
to the NUMA node of the CXL region.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-13-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When the CXL region is formed, the driver computes the performance data
for the region. However this data is not available at the node data
collection that has been populated by the HMAT during kernel
initialization. Add a memory hotplug notifier to update the access
coordinates to the 'struct memory_target' context kept by the
HMAT_REPORTING code.
Add CXL_CALLBACK_PRI for a memory hotplug callback priority. Set the
priority number to be called before HMAT_CALLBACK_PRI. The CXL update must
happen before hmat_callback().
A new HMAT_REPORTING helper hmat_update_target_coordinates() is added in
order to allow CXL to update the memory_target access coordinates.
A new ext_updated member is added to the memory_target to indicate that
the access coordinates within the memory_target has been updated by an
external agent such as CXL. This prevents data being overwritten by the
hmat_update_target_attrs() triggered by hmat_callback().
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Huang, Ying <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-12-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Add read/write latencies and bandwidth sysfs attributes for the enabled CXL
region. The bandwidth is the aggregated bandwidth of all devices that
contribute to the CXL region. The latency is the worst latency of the
device amongst all the devices that contribute to the CXL region.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-11-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Calculate and store the performance data for a CXL region. Find the worst
read and write latency for all the included ranges from each of the devices
that attributes to the region and designate that as the latency data. Sum
all the read and write bandwidth data for each of the device region and
that is the total bandwidth for the region.
The perf list is expected to be constructed before the endpoint decoders
are registered and thus there should be no early reading of the entries
from the region assemble action. The calling of the region qos calculate
function is under the protection of cxl_dpa_rwsem and will ensure that
all DPA associated work has completed.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20240308220055.2172956-10-dave.jiang@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Autodiscovered regions can fail to assemble if they are not discovered
in HPA decode order. The user will see failure messages like:
[] cxl region0: endpoint5: HPA order violation region1
[] cxl region0: endpoint5: failed to allocate region reference
The check that is causing the failure helps the CXL driver enforce
a CXL spec mandate that decoders be committed in HPA order. The
check is needless for autodiscovered regions since their decoders
are already programmed. Trying to enforce order in the assembly of
these regions is useless because they are assembled once all their
member endpoints arrive, and there is no guarantee on the order in
which endpoints are discovered during probe.
Keep the existing check, but for autodiscovered regions, allow the
out of order assembly after a sanity check that the lesser numbered
decoder has the lesser HPA starting address.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Link: https://lore.kernel.org/r/3dec69ee97524ab229a20c6739272c3000b18408.1706736863.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
In preparation for adding a new caller of cxl_region_find_decoders()
teach it to find a decoder from a cxl_endpoint_decoder structure.
Combining switch and endpoint decoder lookup in one function prevents
code duplication in call sites.
Update the existing caller.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Wonjae Lee <wj28.lee@samsung.com>
Link: https://lore.kernel.org/r/79ae6d72978ef9f3ceec9722e1cb793820553c8e.1706736863.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Creating a region with 16 memory devices caused a problem. The div_u64_rem
function, used for dividing an unsigned 64-bit number by a 32-bit one,
faced an issue when SZ_256M * p->interleave_ways. The result surpassed
the maximum limit of the 32-bit divisor (4G), leading to an overflow
and a remainder of 0.
note: At this point, p->interleave_ways is 16, meaning 16 * 256M = 4G
To fix this issue, I replaced the div_u64_rem function with div64_u64_rem
and adjusted the type of the remainder.
Signed-off-by: Quanquan Cao <caoqq@fujitsu.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Pick up some miscellaneous fixups for v6.8.
|
|
CXL supports x3, x6 and x12 - not x9.
Fixes: 80d10a6cee050 ("cxl/region: Add interleave geometry attributes")
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/169904271254.204936.8580772404462743630.stgit@ubuntu
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Pick up the CDAT parsing and QOS class infrastructure for v6.8.
|
|
Use "%pap" to print a resource_size_t (phys_addr_t derived type)
to prevent build warnings on 32-bit arches (seen on i386 and
riscv-32).
../drivers/cxl/core/region.c: In function 'alloc_hpa':
../drivers/cxl/core/region.c:556:25: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 5 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=]
556 | "HPA allocation error (%ld) for size:%#llx in %s %pr\n",
Fixes: 7984d22f1315 ("cxl/region: Add dev_dbg() detail on failure to allocate HPA space")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Fan Ni <fan.ni@samsung.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>
Link: https://lore.kernel.org/r/20240102173917.19718-1-rdunlap@infradead.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When the region driver fails while allocating HPA space for a
new region it can be because the parent resource, the CXL Window,
has no more available space.
In that case, the debug user sees this message:
cxl_core:alloc_hpa:555: cxl region2: failed to allocate HPA: -34
Expand the message like this:
cxl_core:alloc_hpa:555: cxl region8: HPA allocation error (-34) for size:0x20000000 in CXL Window 0 [mem 0xf010000000-0xf04fffffff flags 0x200]
Now the debug user can examine /proc/iomem and consider actions
like removing other allocations in that space or reducing the
size of their region request.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20231223004740.1401858-1-alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
In devm_cxl_add_region(), devm_add_action_or_reset() is called by
passing in unregister_region() with data ptr of 'cxlr'. However, in
unregister_region(), the passed in parameter is incorrectly assumed to
be a 'struct device' rather than the 'cxlr' pointer. The code has been
working because 'struct device' is the first member of 'struct
cxl_region'. Issue found by inspection. Fix the assignment so that cxlr
is pointing directly to the passed in parameter.
Not flagged for -stable since there is no functional impact of this fix.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/170258123810.952211.3907381447996426480.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A read of a device poison list is triggered via a sysfs attribute
and the results are logged as kernel trace events of type cxl_poison.
The work is managed by either: a) the region driver when one of more
regions map the device, or by b) the memdev driver when no regions
map the device.
In the case of a) the region driver holds the region_rwsem while
reading the poison by committed endpoint decoder mappings and for
any unmapped resources. This makes sure that the cxl_poison trace
event trace reports valid region info. (Region name, HPA, and UUID).
In the case of b) the memdev driver holds the dpa_rwsem preventing
new DPA resources from being attached to a region. However, it leaves
a gap between region attach and decoder commit actions. If a DPA in
the gap is in the poison list, the cxl_poison trace event will omit
the region info.
Close the gap by holding the region_rwsem and the dpa_rwsem when
reading poison per memdev. Since both methods now hold both locks,
down_read both from the caller. Doing so also addresses the lockdep
assert that found this issue:
Commit 458ba8189cb4 ("cxl: Add cxl_decoders_committed() helper")
Fixes: f0832a586396 ("cxl/region: Provide region info to the cxl_poison trace event")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/08e8e7ec9a3413b91d51de39e385653494b1eed0.1701041440.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Dan reports that cxl_decoder_commit() potentially leaks a hold of
cxl_dpa_rwsem. The potential error case is a "should not" happen
scenario, turn it into a "can not" happen scenario by adding the error
check to cxl_port_setup_targets() where other setting validation occurs.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: http://lore.kernel.org/r/63295673-5d63-4919-b851-3b06d48734c0@moroto.mountain
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Pickup some misc. CXL updates for v6.7.
|
|
Root decoder granularity must match value from CFWMS, which may not
be the region's granularity for non-interleaved root decoders.
So when calculating granularities for host bridge decoders, use the
region's granularity instead of the root decoder's granularity to ensure
the correct granularities are set for the host bridge decoders and any
downstream switch decoders.
Test configuration is 1 host bridge * 2 switches * 2 endpoints per switch.
Region created with 2048 granularity using following command line:
cxl create-region -m -d decoder0.0 -w 4 mem0 mem2 mem1 mem3 \
-g 2048 -s 2048M
Use "cxl list -PDE | grep granularity" to get a view of the granularity
set at each level of the topology.
Before this patch:
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":512,
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":512,
"interleave_granularity":256,
After:
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":4096,
"interleave_granularity":2048,
"interleave_granularity":2048,
"interleave_granularity":4096,
"interleave_granularity":2048,
Fixes: 27b3f8d13830 ("cxl/region: Program target lists")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/169824893473.1403938.16110924262989774582.stgit@bgt-140510-bm03.eng.stellus.in
[djbw: fixup the prebuilt cxl_test region]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Fix a missed "goto out" to unlock on error to cleanup this splat:
WARNING: lock held when returning to user space!
6.6.0-rc3-lizhijian+ #213 Not tainted
------------------------------------------------
cxl/673 is leaving the kernel with locks still held!
1 lock held by cxl/673:
#0: ffffffffa013b9d0 (cxl_region_rwsem){++++}-{3:3}, at: commit_store+0x7d/0x3e0 [cxl_core]
In terms of user visible impact of this bug for backports:
cxl_region_invalidate_memregion() on x86 invokes wbinvd which is a
problematic instruction for virtualized environments. So, on virtualized
x86, cxl_region_invalidate_memregion() returns an error. This failure
case got missed because CXL memory-expander device passthrough is not a
production use case, and emulation of CXL devices is typically limited
to kernel development builds with CONFIG_CXL_REGION_INVALIDATION_TEST=y,
that makes cxl_region_invalidate_memregion() succeed.
In other words, the expected exposure of this bug is limited to CXL
subsystem development environments using QEMU that neglected
CONFIG_CXL_REGION_INVALIDATION_TEST=y.
Fixes: d1257d098a5a ("cxl/region: Move cache invalidation before region teardown, and before setup")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231025085450.2514906-1-lizhijian@fujitsu.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
For auto-discovered regions the driver must assign each target to
a valid position in the region interleave set based on the decoder
topology.
The current implementation fails to parse valid decode topologies,
as it does not consider the child offset into a parent port. The sort
put all targets of one port ahead of another port when an interleave
was expected, causing the region assembly to fail.
Replace the existing relative sort with cxl_calc_interleave_pos() that
finds the exact position in a region interleave for an endpoint based
on a walk up the ancestral tree from endpoint to root decoder.
cxl_calc_interleave_pos() was introduced in a prior patch, so the work
here is to use it in cxl_region_sort_targets().
Remove the obsoleted helper functions from the prior sort.
Testing passes on pre-production hardware with BIOS defined regions
that natively trigger this autodiscovery path of the region driver.
Testing passes a CXL unit test using the dev_dbg() calculation test
(see cxl_region_attach()) across an expanded set of region configs:
1, 1, 1+1, 1+1+1, 2, 2+2, 2+2+2, 2+2+2+2, 4, 4+4, where each number
represents the count of endpoints per host bridge.
Fixes: a32320b71f08 ("cxl/region: Add region autodiscovery")
Reported-by: Dmytro Adamenko <dmytro.adamenko@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/3946cc55ddc19678733eddc9de2c317749f43f3b.1698263080.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Introduce a calculation to find a target's position in a region
interleave. Perform a self-test of the calculation on user-defined
regions.
The region driver uses the kernel sort() function to put region
targets in relative order. Positions are assigned based on each
target's index in that sorted list. That relative sort doesn't
consider the offset of a port into its parent port which causes
some auto-discovered regions to fail creation. In one failure case,
a 2 + 2 config (2 host bridges each with 2 endpoints), the sort
puts all the targets of one port ahead of another port when they
were expected to be interleaved.
In preparation for repairing the autodiscovery region assembly,
introduce a new method for discovering a target position in the
region interleave.
cxl_calc_interleave_pos() adds a method to find the target position by
ascending from an endpoint to a root decoder. The calculation starts
with the endpoint's local position and position in the parent port. It
traverses towards the root decoder and examines both position and ways
in order to allow the position to be refined all the way to the root
decoder.
This calculation: position = position * parent_ways + parent_pos;
applied iteratively yields the correct position.
Include a self-test that exercises this new position calculation against
every successfully configured user-defined region.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/0ac32c75cf81dd8b86bf07d70ff139d33c2300bc.1698263080.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
match_decoder_by_range() and decoder_match_range() both determine
if an HPA range matches a decoder. The first does it for root
decoders and the second one operates on switch decoders.
Tidy these up with clear naming and make the switch helper more
like the root decoder helper in style and functionality. Make it
take the actual range, rather than an endpoint decoder from which
it extracts the range. Require an exact match on switch decoders,
because unlike a root decoder that maps an entire region, Linux
only supports 1:1 mapping of switch to endpoint decoders. Note that
root-decoders are a super-set of switch-decoders and the range they
cover is a super-set of a region, hence the use of range_contains() for
that case.
Aside from aesthetics and maintainability, this is in preparation
for reuse.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Link: https://lore.kernel.org/r/011b1f498e1758bb8df17c5951be00bd8d489e3b.1698263080.git.alison.schofield@intel.com
[djbw: fixup root decoder vs switch decoder range checks]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Commit 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in
cxl_region_attach()") tried to avoid 'eiw' initialization errors when
->nr_targets exceeded 16, by just decrementing ->nr_targets when
cxl_region_setup_targets() failed.
Commit 86987c766276 ("cxl/region: Cleanup target list on attach error")
extended that cleanup to also clear cxled->pos and p->targets[pos]. The
initialization error was incidentally fixed separately by:
Commit 8d4285425714 ("cxl/region: Fix port setup uninitialized variable
warnings") which was merged a few days after 5e42bcbc3fef.
But now the original cleanup when cxl_region_setup_targets() fails
prevents endpoint and switch decoder resources from being reused:
1) the cleanup does not set the decoder's region to NULL, which results
in future dpa_size_store() calls returning -EBUSY
2) the decoder is not properly freed, which results in future commit
errors associated with the upstream switch
Now that the initialization errors were fixed separately, the proper
cleanup for this case is to just return immediately. Then the resources
associated with this target get cleanup up as normal when the failed
region is deleted.
The ->nr_targets decrement in the error case also helped prevent
a p->targets[] array overflow, so add a new check to prevent against
that overflow.
Tested by trying to create an invalid region for a 2 switch * 2 endpoint
topology, and then following up with creating a valid region.
Fixes: 5e42bcbc3fef ("cxl/region: decrement ->nr_targets on error in cxl_region_attach()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/169703589120.1202031.14696100866518083806.stgit@bgt-140510-bm03.eng.stellus.in
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|