summaryrefslogtreecommitdiff
path: root/drivers/cxl/core/trace.h
AgeCommit message (Collapse)Author
2025-03-14Merge branch 'for-6.15/extended-linear-cache' into cxl-for-next2Dave Jiang
Add support for Extended Linear Cache for CXL. Add enumeration support of the cache. Add MCE notification of the aliased memory address.
2025-03-14cxl/pci: Add trace logging for CXL PCIe Port RAS errorsSmita Koralahalli
The CXL drivers use kernel trace functions for logging endpoint and Restricted CXL host (RCH) Downstream Port RAS errors. Similar functionality is required for CXL Root Ports, CXL Downstream Switch Ports, and CXL Upstream Switch Ports. Introduce trace logging functions for both RAS correctable and uncorrectable errors specific to CXL PCIe Ports. Use them to trace FW-First Protocol errors. Co-developed-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-3-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-02-26cxl: Add extended linear cache address alias emission for cxl eventsDave Jiang
Add the aliased address of extended linear cache when emitting event trace for poison, DRAM and general media of CXL events. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20250226162224.3633792-4-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-13cxl/events: Update Memory Module Event Record to CXL spec rev 3.1Shiju Jose
CXL spec 3.1 section 8.2.9.2.1.3 Table 8-47, Memory Module Event Record has updated with following new fields and new info for Device Event Type and Device Health Information fields. 1. Validity Flags 2. Component Identifier 3. Device Event Sub-Type Update the Memory Module event record and Memory Module trace event for the above spec changes. The new fields are inserted in logical places. Example trace print of cxl_memory_module trace event, cxl_memory_module: memdev=mem3 host=0000:0f:00.0 serial=3 log=Fatal : \ time=371709344709 uuid=fe927475-dd59-4339-a586-79bab113b774 len=128 \ flags='0x1' handle=2 related_handle=0 maint_op_class=0 \ maint_op_sub_class=0 : event_type='Temperature Change' \ event_sub_type='Unsupported Config Data' \ health_status='MAINTENANCE_NEEDED|REPLACEMENT_NEEDED' \ media_status='All Data Loss in Event of Power Loss' as_life_used=0x3 \ as_dev_temp=Normal as_cor_vol_err_cnt=Normal as_cor_per_err_cnt=Normal \ life_used=8 device_temp=3 dirty_shutdown_cnt=33 cor_vol_err_cnt=25 \ cor_per_err_cnt=45 validity_flags='COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='Resource ID' \ pldm_entity_id=0x00 pldm_resource_id=fc d2 7e 2f Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-6-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-13cxl/events: Update DRAM Event Record to CXL spec rev 3.1Shiju Jose
CXL spec 3.1 section 8.2.9.2.1.2 Table 8-46, DRAM Event Record has updated with following new fields and new types for Memory Event Type, Transaction Type and Validity Flags fields. 1. Component Identifier 2. Sub-channel 3. Advanced Programmable Corrected Memory Error Threshold Event Flags 4. Corrected Memory Error Count at Event 5. Memory Event Sub-Type Update DRAM events record and DRAM trace event for the above spec changes. The new fields are inserted in logical places. Includes trivial consistency of white space improvements. Example trace print of cxl_dram trace event, cxl_dram: memdev=mem0 host=0000:0f:00.0 serial=3 log=Informational : \ time=54799339519 uuid=601dcbb3-9c06-4eab-b8af-4e9bfb5c9624 len=128 \ flags='0x1' handle=1 related_handle=0 maint_op_class=1 \ maint_op_sub_class=3 : dpa=18680 dpa_flags='' \ descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT' type='Data Path Error' \ sub_type='Media Link CRC Error' transaction_type='Internal Media Scrub' \ channel=3 rank=17 nibble_mask=3b00b2 bank_group=7 bank=11 row=2 \ column=77 cor_mask=21 00 00 00 00 00 00 00 2c 00 00 00 00 00 00 00 37 00 \ 00 00 00 00 00 00 42 00 00 00 00 00 00 00 validity_flags='CHANNEL|RANK|NIBBLE|\ BANK GROUP|BANK|ROW|COLUMN|CORRECTION MASK|COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=01 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='PLDM Entity ID' pldm_entity_id=74 c5 08 9a 1a 0b \ pldm_resource_id=0x00 hpa=ffffffffffffffff region= \ region_uuid=00000000-0000-0000-0000-000000000000 sub_channel=5 \ cme_threshold_ev_flags='Corrected Memory Errors in Multiple Media Components|\ Exceeded Programmable Threshold' cvme_count=148 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250111091756.1682-5-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-13cxl/events: Update General Media Event Record to CXL spec rev 3.1Shiju Jose
CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record has updated with following new fields and new types for Memory Event Type and Transaction Type fields. 1. Advanced Programmable Corrected Memory Error Threshold Event Flags 2. Corrected Memory Error Count at Event 3. Memory Event Sub-Type The format of component identifier has changed (CXL spec 3.1 section 8.2.9.2.1 Table 8-44). Update the general media event record and general media trace event for the above spec changes. The new fields are inserted in logical places. Example trace log of cxl_general_media trace event, cxl_general_media: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \ time=156831237413 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 \ flags='0x1' handle=1 related_handle=0 maint_op_class=2 \ maint_op_sub_class=4 : dpa=30d40 dpa_flags='' \ descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' \ type='TE State Violation' sub_type='Media Link Command Training Error' \ transaction_type='Host Inject Poison' channel=3 rank=33 device=5 \ validity_flags='CHANNEL|RANK|DEVICE|COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='PLDM Entity ID | Resource ID' \ pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \ hpa=ffffffffffffffff region= \ region_uuid=00000000-0000-0000-0000-000000000000 \ cme_threshold_ev_flags='Corrected Memory Errors in Multiple Media \ Components|Exceeded Programmable Threshold' cme_count=120 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-4-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-13cxl/events: Add Component Identifier formatting for CXL spec rev 3.1Shiju Jose
Add Component Identifier formatting for CXL spec rev 3.1, Section 8.2.9.2.1, Table 8-44. Examples for Component Identifier format in trace log, validity_flags='CHANNEL|RANK|DEVICE|COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='PLDM Entity ID | Resource ID' \ pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \ validity_flags='COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=02 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='Resource ID' \ pldm_entity_id=0x00 pldm_resource_id=fc d2 7e 2f If the validity flags for component ID/component ID format or PLDM ID or resource ID are not set, then pldm_entity_id=0x00 or pldm_resource_id=0x00 would be printed. Component identifier formatting is used in the subsequent patches. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-3-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-01-13cxl/events: Update Common Event Record to CXL spec rev 3.1Shiju Jose
CXL spec 3.1 section 8.2.9.2.1 Table 8-42, Common Event Record format has updated with Maintenance Operation Subclass information. Add updates for the above spec change in the CXL events record and CXL common trace event implementations. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://patch.msgid.link/20250111091756.1682-2-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-10-25cxl/events: Fix Trace DRAM Event RecordShiju Jose
CXL spec rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record. Fix decode memory event type field of DRAM Event Record. For e.g. if value is 0x1 it will be reported as an Invalid Address (General Media Event Record - Memory Event Type) instead of Scrub Media ECC Error (DRAM Event Record - Memory Event Type) and so on. Fixes: 2d6c1e6d60ba ("cxl/mem: Trace DRAM Event Record") Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20241014143003.1170-1-shiju.jose@huawei.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-07-11Merge branch 'for-6.11/xor_fixes' into cxl-for-nextDave Jiang
Series to fix XOR math for DPA to SPA translation - Refactor and fold cxl_trace_hpa() into cxl_dpa_to_hpa() - Complete DPA->HPA->SPA translation and correct XOR translation issue - Add new method to verify a CXL target position - Remove old method of CXL target position verifiation
2024-07-11cxl/core: Fold cxl_trace_hpa() into cxl_dpa_to_hpa()Alison Schofield
Although cxl_trace_hpa() is used to populate TRACE EVENTs with HPA addresses the work it performs is a DPA to HPA translation not a trace. Tidy up this naming by moving the minimal work done in cxl_trace_hpa() into cxl_dpa_to_hpa() and use cxl_dpa_to_hpa() for trace event callbacks. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Link: https://patch.msgid.link/452a9b0c525b774c72d9d5851515ffa928750132.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-02cxl/events: Use a common struct for DRAM and General Media eventsFabio M. De Francesco
cxl_event_common was an unfortunate naming choice and caused confusion with the existing Common Event Record. Furthermore, its fields didn't map all the common information between DRAM and General Media Events. Remove cxl_event_common and introduce cxl_event_media_hdr to record common information between DRAM and General Media events. cxl_event_media_hdr, which is embedded in both cxl_event_gen_media and cxl_event_dram, leverages the commonalities between the two events to simplify their respective handling. Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240607144423.48681-1-fabio.m.de.francesco@linux.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-05-22tracing/treewide: Remove second parameter of __assign_str()Steven Rostedt (Google)
With the rework of how the __string() handles dynamic strings where it saves off the source string in field in the helper structure[1], the assignment of that value to the trace event field is stored in the helper value and does not need to be passed in again. This means that with: __string(field, mystring) Which use to be assigned with __assign_str(field, mystring), no longer needs the second parameter and it is unused. With this, __assign_str() will now only get a single parameter. There's over 700 users of __assign_str() and because coccinelle does not handle the TRACE_EVENT() macro I ended up using the following sed script: git grep -l __assign_str | while read a ; do sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file; mv /tmp/test-file $a; done I then searched for __assign_str() that did not end with ';' as those were multi line assignments that the sed script above would fail to catch. Note, the same updates will need to be done for: __assign_str_len() __assign_rel_str() __assign_rel_str_len() I tested this with both an allmodconfig and an allyesconfig (build only for both). [1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/ Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts. Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs Tested-by: Guenter Roeck <linux@roeck-us.net>
2024-04-30cxl/core: Add region info to cxl_general_media and cxl_dram eventsAlison Schofield
User space may need to know which region, if any, maps the DPAs (device physical addresses) reported in a cxl_general_media or cxl_dram event. Since the mapping can change, the kernel provides this information at the time the event occurs. This informs user space that at event <timestamp> this <region> mapped this <DPA> to this <HPA>. Add the same region info that is included in the cxl_poison trace event: the DPA->HPA translation, region name, and region uuid. The new fields are inserted in the trace event and no existing fields are modified. If the DPA is not mapped, user will see: hpa=ULLONG_MAX, region="", and uuid=0 This work must be protected by dpa_rwsem & region_rwsem since it is looking up region mappings. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/dd8d708b7a7ebfb64a27020a5eb338091336b34d.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/region: Move cxl_trace_hpa() work to the region driverAlison Schofield
This work belongs in the region driver as it is only useful with CONFIG_CXL_REGION. Add a stub in core.h for when the region driver is not built. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/183222631f11a43c5e6debc42ec22fe1bd4b818a.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-30cxl/trace: Correct DPA field masks for general_media & dram eventsAlison Schofield
The length of Physical Address in General Media and DRAM event records is 64-bit, so the field mask for extracting the DPA should be 64-bit also, otherwise the trace event reports DPA's with the upper 32 bits of a DPA address masked off. If users do DPA-to-HPA translations this could lead to incorrect page retirement decisions. Use GENMASK_ULL() for CXL_DPA_MASK to get all the DPA address bits. Tidy up CXL_DPA_FLAGS_MASK by using GENMASK() to only mask the exact flag bits. These bits are defined as part of the event record physical address descriptions of General Media and DRAM events in CXL Spec 3.1 Section 8.2.9.2 Events. Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") Co-developed-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/2867fc43c57720a4a15a3179431829b8dbd2dc16.1714496730.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-03-18cxl/trace: Properly initialize cxl_poison region nameAlison Schofield
The TP_STRUCT__entry that gets assigned the region name, or an empty string if no region is present, is erroneously initialized to the cxl_region pointer. It needs to be properly initialized otherwise it's length is wrong and garbage chars can appear in the kernel trace output: /sys/kernel/tracing/trace The bad initialization was due in part to a naming conflict with the parameter: struct cxl_region *region. The field 'region' is already exposed externally as the region name, so changing that to something logical, like 'region_name' is not an option. Instead rename the internal only struct cxl_region to the commonly used 'cxlr'. Impact is that tooling depending on that trace data can miss picking up a valid event when searching by region name. The TP_printk() output, if enabled, does emit the correct region names in the dmesg log. This was found during testing of the cxl-list option to report media-errors for a region. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: stable@vger.kernel.org Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-03cxl/trace: Remove unnecessary memcpy'sIra Weiny
CPER events don't have UUIDs. Therefore UUIDs were removed from the records passed to trace events and replaced with hard coded values. As pointed out by Jonathan, the new defines for the UUIDs present a more efficient way to assign UUID in trace records.[1] Replace memcpy's with the use of static data. [1] https://lore.kernel.org/all/20240108132325.00000e9c@Huawei.com/ Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-01-09cxl/events: Create a CXL event unionIra Weiny
The CXL CPER and event log records share everything but a UUID/GUID in their structures. Define a cxl_event union without the UUID/GUID to be shared between the CPER and event log record formats. Adjust the code to use this union. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-6-1bb8a4ca2c7a@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-09cxl/events: Remove passing a UUID to known event tracesIra Weiny
The UUID data is redundant in the known event trace types. The addition of static defines allows the trace macros to create the UUID data inside the trace thus removing unnecessary code. Have well known trace events use static data to set the uuid field based on the event type. Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-4-1bb8a4ca2c7a@intel.com Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-03cxl/trace: Pass UUID explicitly to event tracesIra Weiny
CXL CPER events are identified by the CPER Section Type GUID. The GUID correlates with the CXL UUID for the event record. It turns out that a CXL CPER record is a strict subset of the CXL event record, only the UUID header field is chopped. In order to unify handling between native and CPER flavors of CXL events, prepare the code for the UUID to be passed in rather than inferred from the record itself. Later patches update the passed in record to only refer to the common data between the formats. Pass the UUID explicitly to each trace event to be able to remove the UUID from the event structures. Originally it was desirable to remove the UUID from the well known event because the UUID value was redundant. However, the trace API was already in place.[1] Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/all/36f2d12934d64a278f2c0313cbd01abc@huawei.com [1] Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-1-1bb8a4ca2c7a@intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23cxl/memdev: Trace inject and clear poison as cxl_poison eventsAlison Schofield
The cxl_poison trace event allows users to view the history of poison list reads. With the addition of inject and clear poison capabilities, users will expect similar tracing. Add trace types 'Inject' and 'Clear' to the cxl_poison trace_event and trace successful operations only. If the driver finds that the DPA being injected or cleared of poison is mapped in a region, that region info is included in the cxl_poison trace event. Region reconfigurations can make this extra info useless if the debug operations are not carefully managed. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/e20eb7c3029137b480ece671998c183da0477e2e.1681874357.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23cxl/trace: Add an HPA to cxl_poison trace eventsAlison Schofield
When a cxl_poison trace event is reported for a region, the poisoned Device Physical Address (DPA) can be translated to a Host Physical Address (HPA) for consumption by user space. Translate and add the resulting HPA to the cxl_poison trace event. Follow the device decode logic as defined in the CXL Spec 3.0 Section 8.2.4.19.13. If no region currently maps the poison, assign ULLONG_MAX to the cxl_poison event hpa field. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/6d3cd726f9042a59902785b0a2cb3ddfb70e0219.1681838292.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-04-23cxl/trace: Add TRACE support for CXL media-error recordsAlison Schofield
CXL devices may support the retrieval of a device poison list. Add a new trace event that the CXL subsystem may use to log the media-error records returned in the poison list. Log each media-error record as a cxl_poison trace event of type 'List'. Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/de6196f5269483d886ab1834744f82d27189a666.1681838291.git.alison.schofield@intel.com Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-16cxl/trace: Add serial number to trace pointsIra Weiny
Device serial numbers are useful information for the user. Add device serial numbers to all the trace points. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20230208-cxl-event-names-v2-3-fca130c2c68b@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-16cxl/trace: Add host output to trace pointsIra Weiny
The host parameter of where the memdev is connected is useful information. Report host consistently in all trace points. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20230208-cxl-event-names-v2-2-fca130c2c68b@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-16cxl/trace: Standardize device information outputIra Weiny
The trace points were written to take a struct device input for the trace. In CXL multiple device objects are associated with each CXL hardware device. Using different device objects in the trace point can lead to confusion for users. The PCIe device is nice to have, but the user space tooling relies on the memory device naming. It is better to have those device names reported. Change all trace points to take struct cxl_memdev as a standard and report that name. Furthermore, standardize on the name 'memdev' in both /sys/kernel/tracing/trace and cxl-cli monitor output. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20230208-cxl-event-names-v2-1-fca130c2c68b@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-26cxl/mem: Trace Memory Module Event RecordIra Weiny
CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record. Determine if the event read is memory module record and if so trace the record. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-5-2316a5c8f7d8@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-26cxl/mem: Trace DRAM Event RecordIra Weiny
CXL rev 3.0 section 8.2.9.2.1.2 defines the DRAM Event Record. Determine if the event read is a DRAM event record and if so trace the record. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-4-2316a5c8f7d8@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-26cxl/mem: Trace General Media Event RecordIra Weiny
CXL rev 3.0 section 8.2.9.2.1.1 defines the General Media Event Record. Determine if the event read is a general media record and if so trace the record as a General Media Event Record. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-3-2316a5c8f7d8@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-26cxl/mem: Read, trace, and clear events on driver loadIra Weiny
CXL devices have multiple event logs which can be queried for CXL event records. Devices are required to support the storage of at least one event record in each event log type. Devices track event log overflow by incrementing a counter and tracking the time of the first and last overflow event seen. Software queries events via the Get Event Record mailbox command; CXL rev 3.0 section 8.2.9.2.2 and clears events via CXL rev 3.0 section 8.2.9.2.3 Clear Event Records mailbox command. If the result of negotiating CXL Error Reporting Control is OS control, read and clear all event logs on driver load. Ensure a clean slate of events by reading and clearing the events on driver load. The status register is not used because a device may continue to trigger events and the only requirement is to empty the log at least once. This allows for the required transition from empty to non-empty for interrupt generation. Handling of interrupts is in a follow on patch. The device can return up to 1MB worth of event records per query. Allocate a shared large buffer to handle the max number of records based on the mailbox payload size. This patch traces a raw event record and leaves specific event record type tracing to subsequent patches. Macros are created to aid in tracing the common CXL Event header fields. Each record is cleared explicitly. A clear all bit is specified but is only valid when the log overflows. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-1-2316a5c8f7d8@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-04cxl/pci: Move tracepoint definitions to drivers/cxl/core/Dan Williams
CXL is using tracepoints for reporting RAS capability register payloads for AER events, and has plans to use tracepoints for the output payload of Get Poison List and Get Event Records commands. For organization purposes it would be nice to keep those all under a single + local CXL trace system. This also organization also potentially helps in the future when CXL drivers expand beyond generic memory expanders, however that would also entail a move away from the expander-specific cxl_dev_state context, save that for later. Note that the powerpc-specific drivers/misc/cxl/ also defines a 'cxl' trace system, however, it is unlikely that a single platform will ever load both drivers simultaneously. Cc: Steven Rostedt <rostedt@goodmis.org> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167051869176.436579.9728373544811641087.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>