Age | Commit message (Collapse) | Author |
|
Intel Wildcat Lake is a mobile derivative of Panther Lake with one
memory controller. Wildcat Lake SoCs share the same IBECC registers
with Meteor Lake-P SoCs.
Add a compute die ID and a new configuration structure for Wildcat
Lake SoCs with In-Band ECC capability for EDAC support.
Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250704151609.7833-3-qiuxu.zhuo@intel.com
|
|
The current KERN_WARNING level message for detecting absent memory
controllers is overly dramatic. The BIOS likely had valid reasons to
disable the memory controller (e.g. it isn't connected to any DIMM
slots on the motherboard for this system). So there's nothing actually
wrong that needs to be fixed.
Reduce the log level to KERN_DEBUG to eliminate the false warning.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250618162307.1523736-2-qiuxu.zhuo@intel.com
|
|
A kernel panic was reported with the following kernel log:
EDAC igen6: Expected 2 mcs, but only 1 detected.
BUG: unable to handle page fault for address: 000000000000d570
...
Hardware name: Notebook V54x_6x_TU/V54x_6x_TU, BIOS Dasharo (coreboot+UEFI) v0.9.0 07/17/2024
RIP: e030:ecclog_handler+0x7e/0xf0 [igen6_edac]
...
igen6_probe+0x2a0/0x343 [igen6_edac]
...
igen6_init+0xc5/0xff0 [igen6_edac]
...
This issue occurred because one memory controller was disabled by
the BIOS but the igen6_edac driver still checked all the memory
controllers, including this absent one, to identify the source of
the error. Accessing the null MMIO for the absent memory controller
resulted in the oops above.
Fix this issue by reverting the configuration structure to non-const
and updating the field 'res_cfg->num_imc' to reflect the number of
detected memory controllers.
Fixes: 20e190b1c1fd ("EDAC/igen6: Skip absent memory controllers")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Closes: https://lore.kernel.org/all/aFFN7RlXkaK_loQb@mail-itl/
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Link: https://lore.kernel.org/r/20250618162307.1523736-1-qiuxu.zhuo@intel.com
|
|
Intel Amston Lake is a series of SoCs tailored for edge computing needs.
The Amston Lake SoCs, equipped with IBECC(In-Band ECC) capability, share
the same IBECC registers with Alder Lake-N SoCs. Add the Intel Amston Lake
SoC compute die ID for EDAC support.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250408132455.489046-4-qiuxu.zhuo@intel.com
|
|
The Intel Arizona Beach SoC series is oriented toward network computing.
Some types of these SoCs are equipped with IBECC(In-Band ECC) and share
the same IBECC registers with Alder Lake-N SoCs. Add a die ID for Arizona
Beach SoC for EDAC support.
[Tony: s/Arizona Lake/Arizona Beach/ in commit message]
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250408132455.489046-3-qiuxu.zhuo@intel.com
|
|
Some BIOS versions may fuse off certain memory controllers and set the
registers of these absent memory controllers to ~0. The current igen6_edac
mistakenly enumerates these absent memory controllers and registers them
with the EDAC core.
Skip the absent memory controllers to avoid mistakenly enumerating them.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250408132455.489046-2-qiuxu.zhuo@intel.com
|
|
'ras/edac-misc' into edac-updates
* ras/edac-cxl:
EDAC/device: Fix dev_set_name() format string
EDAC: Update memory repair control interface for memory sparing feature
EDAC: Add a memory repair control feature
EDAC: Add a Error Check Scrub control feature
EDAC: Add scrub control feature
EDAC: Add support for EDAC device features control
* ras/edac-drivers:
EDAC/ie31200: Switch Raptor Lake-S to interrupt mode
EDAC/ie31200: Add Intel Raptor Lake-S SoCs support
EDAC/ie31200: Break up ie31200_probe1()
EDAC/ie31200: Fold the two channel loops into one loop
EDAC/ie31200: Make struct dimm_data contain decoded information
EDAC/ie31200: Make the memory controller resources configurable
EDAC/ie31200: Simplify the pci_device_id table
EDAC/ie31200: Fix the 3rd parameter name of *populate_dimm_info()
EDAC/ie31200: Fix the error path order of ie31200_init()
EDAC/ie31200: Fix the DIMM size mask for several SoCs
EDAC/ie31200: Fix the size of EDAC_MC_LAYER_CHIP_SELECT layer
EDAC/{skx_common,i10nm}: Fix some missing error reports on Emerald Rapids
EDAC/igen6: Fix the flood of invalid error reports
EDAC/ie31200: work around false positive build warning
* ras/edac-misc:
MAINTAINERS: Add a secondary maintainer for bluefield_edac
EDAC/pnd2: Make read-only const array intlv static
EDAC/igen6: Constify struct res_config
EDAC/amd64: Simplify return statement in dct_ecc_enabled()
EDAC: Use string choice helper functions
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
The res_config structs are not modified in this driver.
Constifying these structures moves some data to a read-only section, so
increase overall security, especially when the structure holds some function
pointers.
On a x86_64, with allmodconfig, as an example:
Before:
======
text data bss dec hex filename
36777 2479 4304 43560 aa28 drivers/edac/igen6_edac.o
After:
=====
text data bss dec hex filename
37297 1959 4304 43560 aa28 drivers/edac/igen6_edac.o
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/a06153870951a64b438e76adf97d440e02c1a1fc.1738355198.git.christophe.jaillet@wanadoo.fr
|
|
The ECC_ERROR_LOG register of certain SoCs may contain the invalid value
~0, which results in a flood of invalid error reports in polling mode.
Fix the flood of invalid error reports by skipping the invalid ECC error
log value ~0.
Fixes: e14232afa944 ("EDAC/igen6: Add polling support")
Reported-by: Ramses <ramses@well-founded.dev>
Closes: https://lore.kernel.org/all/OISL8Rv--F-9@well-founded.dev/
Tested-by: Ramses <ramses@well-founded.dev>
Reported-by: John <therealgraysky@proton.me>
Closes: https://lore.kernel.org/all/p5YcxOE6M3Ncxpn2-Ia_wCt61EM4LwIiN3LroQvT_-G2jMrFDSOW5k2A9D8UUzD2toGpQBN1eI0sL5dSKnkO8iteZegLoQEj-DwQaMhGx4A=@proton.me/
Tested-by: John <therealgraysky@proton.me>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250212083354.31919-1-qiuxu.zhuo@intel.com
|
|
* edac-misc:
MAINTAINERS: Change FSL DDR EDAC maintainership
RAS/AMD/ATL: Add debug prints for DF register reads
EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2
EDAC/bluefield: Fix potential integer overflow
EDAC/igen6: Add Intel Panther Lake-H SoCs support
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Some PCs with Intel N100 (with PCI device 8086:461c, DID_ADL_N_SKU4)
experienced issues with error interrupts not working, even with the
following configuration in the BIOS.
In-Band ECC Support: Enabled
In-Band ECC Operation Mode: 2 (make all requests protected and
ignore range checks)
IBECC Error Injection Control: Inject Correctable Error on insertion
counter
Error Injection Insertion Count: 251658240 (0xf000000)
Add polling mode support for these machines to ensure that memory error
events are handled.
Signed-off-by: Orange Kao <orange@aiven.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/all/20241106114024.941659-3-orange@aiven.io
|
|
Currently, igen6_edac sets edac_op_state to EDAC_OPSTATE_NMI, while the
driver also supports memory errors reported from Machine Check. Initialize
edac_op_state to the correct value according to the configuration data
that the driver probed.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20241106114024.941659-2-orange@aiven.io
|
|
The segmentation fault happens because:
During modprobe:
1. In igen6_probe(), igen6_pvt will be allocated with kzalloc()
2. In igen6_register_mci(), mci->pvt_info will point to
&igen6_pvt->imc[mc]
During rmmod:
1. In mci_release() in edac_mc.c, it will kfree(mci->pvt_info)
2. In igen6_remove(), it will kfree(igen6_pvt);
Fix this issue by setting mci->pvt_info to NULL to avoid the double
kfree.
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360
Signed-off-by: Orange Kao <orange@aiven.io>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io
|
|
Panther Lake-H SoCs share the same IBECC registers with Meteor Lake-P
SoCs. Add Panther Lake-H SoC compute die IDs for EDAC support.
Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241012071439.54165-1-qiuxu.zhuo@intel.com
|
|
The conversion of system address to physical memory address (as viewed by
the memory controller) by igen6_edac is incorrect when the system address
is above the TOM (Total amount Of populated physical Memory) for Elkhart
Lake and Ice Lake (Neural Network Processor). Fix this conversion.
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- The AMD memory controllers data fabric version 4.5 supports
non-power-of-2 denormalization in the sense that certain bits of the
system physical address cannot be reconstructed from the normalized
address reported by the RAS hardware. Add support for handling such
addresses
- Switch the EDAC drivers to the new Intel CPU model defines
- The usual fixes and cleanups all over the place
* tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC: Add missing MODULE_DESCRIPTION() macros
EDAC/dmc520: Use devm_platform_ioremap_resource()
EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support
RAS/AMD/FMPM: Use atl internal.h for INVALID_SPA
RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization
RAS/AMD/ATL: Validate address map when information is gathered
RAS/AMD/ATL: Expand helpers for adding and removing base and hole
RAS/AMD/ATL: Read DRAM hole base early
RAS/AMD/ATL: Add amd_atl pr_fmt() prefix
RAS/AMD/ATL: Add a missing module description
EDAC, i10nm: make skx_common.o a separate module
EDAC/skx: Switch to new Intel CPU model defines
EDAC/sb_edac: Switch to new Intel CPU model defines
EDAC, pnd2: Switch to new Intel CPU model defines
EDAC/i10nm: Switch to new Intel CPU model defines
EDAC/ghes: Add missing newline to pr_info() statement
RAS/AMD/ATL: Add missing newline to pr_info() statement
EDAC/thunderx: Remove unused struct error_syndrome
|
|
Arrow Lake-U/H SoCs share same IBECC registers with Meteor Lake-P
SoCs. Add Arrow Lake-U/H SoC compute die IDs for EDAC support.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20240614030354.69180-1-qiuxu.zhuo@intel.com
|
|
errcmd_enable_error_reporting() uses pci_{read,write}_config_word()
that return PCIBIOS_* codes. The return code is then returned all the
way into the probe function igen6_probe() that returns it as is. The
probe functions, however, should return normal errnos.
Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal
errno before returning it from errcmd_enable_error_reporting().
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com
|
|
Add a new Intel Alder Lake-N SoC compute die ID for EDAC support.
Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20240129062040.60809-2-qiuxu.zhuo@intel.com
|
|
Add Intel Meteor Lake-P SoC compute die IDs for EDAC support.
These Meteor Lake-P SoCs share similar IBECC registers with
Alder Lake-P SoCs.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Add Intel Meteor Lake-PS SoC compute die IDs for EDAC support.
These SoCs share similar IBECC registers with Alder Lake-P SoCs.
The only difference is that IBECC presence is detected through an
MMIO-mapped register instead of the capability register in the
PCI configuration space.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Add Intel Raptor Lake-P SoC compute die IDs for EDAC support.
These Raptor Lake-P SoCs share similar IBECC registers with Alder Lake-P
SoCs but extend the most significant bit of the error address logged
in IBECC from bit 38 to bit 45.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Add Intel Alder Lake-N SoC compute die IDs for EDAC support.
Alder Lake-N, with one memory controller, is a reduced version of
Alder Lake-P, which has two memory controllers.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Make get_mchbar() helper function to retrieve the BAR address of
the memory controller. No function changes.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Current igen6_edac checks for pending errors before the registration
of the error handler. However, there is a possibility that the error
occurs during the registration process, leading to unhandled pending
errors and no future error events. This issue can be reproduced by
repeatedly injecting errors during the loading of the igen6_edac.
Fix this issue by moving the pending error handler after the registration
of the error handler, ensuring that no pending errors are left unhandled.
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: Ee Wey Lim <ee.wey.lim@intel.com>
Tested-by: Ee Wey Lim <ee.wey.lim@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com
|
|
Return -EBUSY instead of -ENODEV just like the other EDAC drivers do.
[ bp: Rewrite text. ]
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221018082214.569504-8-justin.he@arm.com
|
|
Call ghes_get_devices() to check whether ghes_edac should be used on the
platform where it is preferred over the corresponding chipset-specific
EDAC driver.
Unlike the existing edac_get_owner() check, the ghes_get_devices() check
works independent to the module_init ordering.
[ bp: Massage. ]
Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-6-justin.he@arm.com
|
|
Alder Lake SoC shares the same memory controller and In-Band ECC
(IBECC) IP with Tiger Lake SoC. Like Tiger Lake, it also has two
memory controllers each associated one IBECC instance. The minor
differences include the MMIO offset of each memory controller and
the type of memory error address logged in the IBECC.
So add Alder Lake compute die IDs, adjust the MMIO offset for each
memory controller and handle the type of memory error address logged
in the IBECC for Alder Lake EDAC support.
Tested-by: Vrukesh V Panse <vrukesh.v.panse@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-7-tony.luck@intel.com
|
|
Tiger Lake SoC shares the same memory controller and In-Band ECC
(IBECC) IP with Elkhart Lake SoC. The main differences are that Tiger
Lake has two memory controllers each associated with one IBECC and
uses Machine Check for the memory error notification.
So add Tiger Lake compute die IDs, MCE decoding chain registration,
and memory slice decoding for Tiger Lake EDAC support.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-6-tony.luck@intel.com
|
|
The Ice Lake Neural Network Processor for Deep Learning Inference
(ICL-NNPI) SoC shares the same memory controller and In-Band ECC with
Elkhart Lake SoC. Add the ICL-NNPI compute die IDs for EDAC support.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210611170123.1057025-5-tony.luck@intel.com
|
|
Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20201123031850.GA20416@aef56166e5fc
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Add debugfs support to fake memory correctable errors to test the
error reporting path and the error address decoding logic in the
igen6_edac driver.
Please note that the fake errors are also reported to EDAC core and
then the CE counter in EDAC sysfs is also increased.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
This driver supports Intel client SoC with integrated memory controller
using In-Band ECC(IBECC). The memory correctable and uncorrectable errors
are reported via NMIs. The driver handles the NMIs and decodes the memory
error address to platform specific address. The first IBECC-supported SoC
is Elkhart Lake.
[Tony: s/#include <linux/nmi.h>/#include <asm/nmi.h>/ to fix randconfig build]
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|