summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2025-05-09EDAC/altera: Set DDR and SDMMC interrupt mask before registrationNiravkumar L Rabara
commit 6dbe3c5418c4368e824bff6ae4889257dd544892 upstream. Mask DDR and SDMMC in probe function to avoid spurious interrupts before registration. Removed invalid register write to system manager. Fixes: 1166fde93d5b ("EDAC, altera: Add Arria10 ECC memory init functions") Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@altera.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@kernel.org Link: https://lore.kernel.org/20250425142640.33125-3-matthew.gerlach@altera.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-09EDAC/altera: Test the correct error reg offsetNiravkumar L Rabara
commit 4fb7b8fceb0beebbe00712c3daf49ade0386076a upstream. Test correct structure member, ecc_cecnt_offset, before using it. [ bp: Massage commit message. ] Fixes: 73bcc942f427 ("EDAC, altera: Add Arria10 EDAC support") Signed-off-by: Niravkumar L Rabara <niravkumar.l.rabara@altera.com> Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: stable@kernel.org Link: https://lore.kernel.org/20250425142640.33125-2-matthew.gerlach@altera.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-10EDAC/ie31200: Fix the error path order of ie31200_init()Qiuxu Zhuo
[ Upstream commit 231e341036d9988447e3b3345cf741a98139199e ] The error path order of ie31200_init() is incorrect, fix it. Fixes: 709ed1bcef12 ("EDAC/ie31200: Fallback if host bridge device is already initialized") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Gary Wang <gary.c.wang@intel.com> Link: https://lore.kernel.org/r/20250310011411.31685-4-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10EDAC/ie31200: Fix the DIMM size mask for several SoCsQiuxu Zhuo
[ Upstream commit 3427befbbca6b19fe0e37f91d66ce5221de70bf1 ] The DIMM size mask for {Sky, Kaby, Coffee} Lake is not bits{7:0}, but bits{5:0}. Fix it. Fixes: 953dee9bbd24 ("EDAC, ie31200_edac: Add Skylake support") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Gary Wang <gary.c.wang@intel.com> Link: https://lore.kernel.org/r/20250310011411.31685-3-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10EDAC/ie31200: Fix the size of EDAC_MC_LAYER_CHIP_SELECT layerQiuxu Zhuo
[ Upstream commit d59d844e319d97682c8de29b88d2d60922a683b3 ] The EDAC_MC_LAYER_CHIP_SELECT layer pertains to the rank, not the DIMM. Fix its size to reflect the number of ranks instead of the number of DIMMs. Also delete the unused macros IE31200_{DIMMS,RANKS}. Fixes: 7ee40b897d18 ("ie31200_edac: Introduce the driver") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Gary Wang <gary.c.wang@intel.com> Link: https://lore.kernel.org/r/20250310011411.31685-2-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10EDAC/{skx_common,i10nm}: Fix some missing error reports on Emerald RapidsQiuxu Zhuo
[ Upstream commit d9207cf7760f5f5599e9ff7eb0fedf56821a1d59 ] When doing error injection to some memory DIMMs on certain Intel Emerald Rapids servers, the i10nm_edac missed error reports for some memory DIMMs. Certain BIOS configurations may hide some memory controllers, and the i10nm_edac doesn't enumerate these hidden memory controllers. However, the ADXL decodes memory errors using memory controller physical indices even if there are hidden memory controllers. Therefore, the memory controller physical indices reported by the ADXL may mismatch the logical indices enumerated by the i10nm_edac, resulting in missed error reports for some memory DIMMs. Fix this issue by creating a mapping table from memory controller physical indices (used by the ADXL) to logical indices (used by the i10nm_edac) and using it to convert the physical indices to the logical indices during the error handling process. Fixes: c545f5e41225 ("EDAC/i10nm: Skip the absent memory controllers") Reported-by: Kevin Chang <kevin1.chang@intel.com> Tested-by: Kevin Chang <kevin1.chang@intel.com> Reported-by: Thomas Chen <Thomas.Chen@intel.com> Tested-by: Thomas Chen <Thomas.Chen@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20250214002728.6287-1-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27EDAC/qcom: Correct interrupt enable register configurationKomal Bajaj
commit c158647c107358bf1be579f98e4bb705c1953292 upstream. The previous implementation incorrectly configured the cmn_interrupt_2_enable register for interrupt handling. Using cmn_interrupt_2_enable to configure Tag, Data RAM ECC interrupts would lead to issues like double handling of the interrupts (EL1 and EL3) as cmn_interrupt_2_enable is meant to be configured for interrupts which needs to be handled by EL3. EL1 LLCC EDAC driver needs to use cmn_interrupt_0_enable register to configure Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable. Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs") Signed-off-by: Komal Bajaj <quic_kbajaj@quicinc.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20241119064608.12326-1-quic_kbajaj@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-27EDAC/amd64: Simplify ECC check on unified memory controllersBorislav Petkov (AMD)
commit 747367340ca6b5070728b86ae36ad6747f66b2fb upstream. The intent of the check is to see whether at least one UMC has ECC enabled. So do that instead of tracking which ones are enabled in masks which are too small in size anyway and lead to not loading the driver on Zen4 machines with UMCs enabled over UMC8. Fixes: e2be5955a886 ("EDAC/amd64: Add support for AMD Family 19h Models 10h-1Fh and A0h-AFh") Reported-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Avadhut Naik <avadhut.naik@amd.com> Reviewed-by: Avadhut Naik <avadhut.naik@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20241210212054.3895697-1-avadhut.naik@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-05EDAC/igen6: Avoid segmentation fault on module unloadOrange Kao
[ Upstream commit fefaae90398d38a1100ccd73b46ab55ff4610fba ] The segmentation fault happens because: During modprobe: 1. In igen6_probe(), igen6_pvt will be allocated with kzalloc() 2. In igen6_register_mci(), mci->pvt_info will point to &igen6_pvt->imc[mc] During rmmod: 1. In mci_release() in edac_mc.c, it will kfree(mci->pvt_info) 2. In igen6_remove(), it will kfree(igen6_pvt); Fix this issue by setting mci->pvt_info to NULL to avoid the double kfree. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219360 Signed-off-by: Orange Kao <orange@aiven.io> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20241104124237.124109-2-orange@aiven.io Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05EDAC/{skx_common,i10nm}: Fix incorrect far-memory error source indicatorQiuxu Zhuo
[ Upstream commit a36667037a0c0e36c59407f8ae636295390239a5 ] The Granite Rapids CPUs with Flat2LM memory configurations may mistakenly report near-memory errors as far-memory errors, resulting in the invalid decoded ADXL results: EDAC skx: Bad imc -1 Fix this incorrect far-memory error source indicator by prefetching the decoded far-memory controller ID, and adjust the error source indicator to near-memory if the far-memory controller ID is invalid. Fixes: ba987eaaabf9 ("EDAC/i10nm: Add Intel Granite Rapids server support") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Link: https://lore.kernel.org/r/20241015072236.24543-3-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05EDAC/skx_common: Differentiate memory error sourcesQiuxu Zhuo
[ Upstream commit 2397f795735219caa9c2fe61e7bcdd0652e670d3 ] The current skx_common determines whether the memory error source is the near memory of the 2LM system and then retrieves the decoded error results from the ADXL components (near-memory vs. far-memory) accordingly. However, some memory controllers may have limitations in correctly reporting the memory error source, leading to the retrieval of incorrect decoded parts from the ADXL. To address these limitations, instead of simply determining whether the memory error is from the near memory of the 2LM system, it is necessary to distinguish the memory error source details as follows: Memory error from the near memory of the 2LM system. Memory error from the far memory of the 2LM system. Memory error from the 1LM system. Not a memory error. This will enable the i10nm_edac driver to take appropriate actions for those memory controllers that have limitations in reporting the memory error source. Fixes: ba987eaaabf9 ("EDAC/i10nm: Add Intel Granite Rapids server support") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Link: https://lore.kernel.org/r/20241015072236.24543-2-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05EDAC/fsl_ddr: Fix bad bit shift operationsPriyanka Singh
[ Upstream commit 9ec22ac4fe766c6abba845290d5139a3fbe0153b ] Fix undefined behavior caused by left-shifting a negative value in the expression: cap_high ^ (1 << (bad_data_bit - 32)) The variable bad_data_bit ranges from 0 to 63. When it is less than 32, bad_data_bit - 32 becomes negative, and left-shifting by a negative value in C is undefined behavior. Fix this by combining cap_high and cap_low into a 64-bit variable. [ bp: Massage commit message, simplify error bits handling. ] Fixes: ea2eb9a8b620 ("EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx") Signed-off-by: Priyanka Singh <priyanka.singh@nxp.com> Signed-off-by: Li Yang <leoyang.li@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241016-imx95_edac-v3-3-86ae6fc2756a@nxp.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05EDAC/bluefield: Fix potential integer overflowDavid Thompson
[ Upstream commit 1fe774a93b46bb029b8f6fa9d1f25affa53f06c6 ] The 64-bit argument for the "get DIMM info" SMC call consists of mem_ctrl_idx left-shifted 16 bits and OR-ed with DIMM index. With mem_ctrl_idx defined as 32-bits wide the left-shift operation truncates the upper 16 bits of information during the calculation of the SMC argument. The mem_ctrl_idx stack variable must be defined as 64-bits wide to prevent any potential integer overflow, i.e. loss of data from upper 16 bits. Fixes: 82413e562ea6 ("EDAC, mellanox: Add ECC support for BlueField DDR4") Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com> Link: https://lore.kernel.org/r/20240930151056.10158-1-davthompson@nvidia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-05EDAC/qcom: Make irq configuration optionalRajendra Nayak
On most modern qualcomm SoCs, the configuration necessary to enable the Tag/Data RAM related irqs being propagated to the SoC irq controller is already done in firmware (in DSF or 'DDR System Firmware') On some like the x1e80100, these registers aren't even accesible to the kernel causing a crash when edac device is probed. Hence, make the irq configuration optional in the driver and mark x1e80100 as the SoC on which this should be avoided. Fixes: af16b00578a7 ("arm64: dts: qcom: Add base X1E80100 dtsi and the QCP dts") Reported-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Rajendra Nayak <quic_rjendra@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20240903101510.3452734-1-quic_rjendra@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-09-16Merge tag 'edac_updates_for_v6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Drop a now obsolete ppc4xx_edac driver - Fix conversion to physical memory addresses on Intel's Elkhart Lake and Ice Lake hardware when the system address is above the (Top-Of-Memory) TOM address - Pay attention to the memory hole on Zynq UltraScale+ MPSoC DDR controllers when injecting errors for testing purposes - Add support for translating normalized error addresses reported by an AMD memory controller into system physical addresses using an UEFI mechanism called platform runtime mechanism (PRM). - The usual cleanups and fixes * tag 'edac_updates_for_v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC: Drop obsolete PPC4xx driver EDAC/sb_edac: Fix the compile warning of large frame size EDAC/{skx_common,i10nm}: Remove the AMAP register for determing DDR5 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common EDAC/igen6: Fix conversion of system address to physical memory address EDAC/synopsys: Fix error injection on Zynq UltraScale+ RAS/AMD/ATL: Translate normalized to system physical addresses using PRM ACPI: PRM: Add PRM handler direct call support
2024-09-09Merge remote-tracking branches 'ras/edac-amd-atl', 'ras/edac-misc' and ↵Borislav Petkov (AMD)
'ras/edac-drivers' into edac-updates * ras/edac-amd-atl: RAS/AMD/ATL: Translate normalized to system physical addresses using PRM ACPI: PRM: Add PRM handler direct call support * ras/edac-misc: EDAC/synopsys: Fix error injection on Zynq UltraScale+ * ras/edac-drivers: EDAC: Drop obsolete PPC4xx driver EDAC/sb_edac: Fix the compile warning of large frame size EDAC/{skx_common,i10nm}: Remove the AMAP register for determing DDR5 EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_common EDAC/igen6: Fix conversion of system address to physical memory address Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-09-05EDAC: Drop obsolete PPC4xx driverRob Herring (Arm)
Since 47d13a269bbd ("powerpc/40x: Remove 40x platforms.") support for PPC40x platforms has been removed. While the EDAC driver also mentions PPC440 and PPC460 processors, the driver refuses to probe on anything other than PPC405. It's unlikely support will ever be added at this point for these other old platforms, so the driver can be removed. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20240904192224.3060307-2-robh@kernel.org
2024-09-03EDAC/sb_edac: Fix the compile warning of large frame sizeQiuxu Zhuo
Compiling sb_edac driver with GCC 11.4.0 and the W=1 option reported the following warning: drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’: drivers/edac/sb_edac.c:3249:1: warning: the frame size of 1032 bytes is larger than 1024 bytes [-Wframe-larger-than=] As there is no concurrent invocation of sbridge_mce_output_error(), fix this warning by moving the large-size variables 'msg' and 'msg_full' from the stack to the pre-allocated data segment. [Tony: Fix checkpatch warnings for code alignment & use of strcpy()] Reported-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20240829120903.84152-1-qiuxu.zhuo@intel.com
2024-09-03EDAC/{skx_common,i10nm}: Remove the AMAP register for determing DDR5Qiuxu Zhuo
The configuration flag 'res_config->support_ddr5 = true' sufficiently indicates DDR5 memory support for Sapphire Rapids and Granite Rapids. Additionally, the i10nm_edac driver doesn't need to use the AMAP register for setting the 'fine_grain_bank' of each DIMM. Therefore, remove the AMAP register for determining DDR5. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20240829061309.57738-1-qiuxu.zhuo@intel.com
2024-09-03EDAC/{skx_common,skx,i10nm}: Move the common debug code to skx_commonQiuxu Zhuo
Commit afdb82fd763c ("EDAC, i10nm: make skx_common.o a separate module") made skx_common.o a separate module. With skx_common.o now a separate module, move the common debug code setup_{skx,i10nm}_debug() and teardown_{skx,i10nm}_debug() in {skx,i10nm}_base.c to skx_common.c to reduce code duplication. Additionally, prefix these function names with 'skx' to maintain consistency with other names in the file. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20240829055101.56245-1-qiuxu.zhuo@intel.com
2024-09-03EDAC/igen6: Fix conversion of system address to physical memory addressQiuxu Zhuo
The conversion of system address to physical memory address (as viewed by the memory controller) by igen6_edac is incorrect when the system address is above the TOM (Total amount Of populated physical Memory) for Elkhart Lake and Ice Lake (Neural Network Processor). Fix this conversion. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/stable/20240814061011.43545-1-qiuxu.zhuo%40intel.com
2024-08-01EDAC/synopsys: Fix error injection on Zynq UltraScale+Shubhrajyoti Datta
The Zynq UltraScale+ MPSoC DDR has a disjoint memory from 2GB to 32GB. The DDR host interface has a contiguous memory so while injecting errors, the driver should remove the hole else the injection fails as the address translation is incorrect. Introduce a get_mem_info() function pointer and set it for Zynq UltraScale+ platform to return host address. Fixes: 1a81361f75d8 ("EDAC, synopsys: Add Error Injection support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240711100656.31376-1-shubhrajyoti.datta@amd.com
2024-07-28minmax: make generic MIN() and MAX() macros available everywhereLinus Torvalds
This just standardizes the use of MIN() and MAX() macros, with the very traditional semantics. The goal is to use these for C constant expressions and for top-level / static initializers, and so be able to simplify the min()/max() macros. These macro names were used by various kernel code - they are very traditional, after all - and all such users have been fixed up, with a few different approaches: - trivial duplicated macro definitions have been removed Note that 'trivial' here means that it's obviously kernel code that already included all the major kernel headers, and thus gets the new generic MIN/MAX macros automatically. - non-trivial duplicated macro definitions are guarded with #ifndef This is the "yes, they define their own versions, but no, the include situation is not entirely obvious, and maybe they don't get the generic version automatically" case. - strange use case #1 A couple of drivers decided that the way they want to describe their versioning is with #define MAJ 1 #define MIN 2 #define DRV_VERSION __stringify(MAJ) "." __stringify(MIN) which adds zero value and I just did my Alexander the Great impersonation, and rewrote that pointless Gordian knot as #define DRV_VERSION "1.2" instead. - strange use case #2 A couple of drivers thought that it's a good idea to have a random 'MIN' or 'MAX' define for a value or index into a table, rather than the traditional macro that takes arguments. These values were re-written as C enum's instead. The new function-line macros only expand when followed by an open parenthesis, and thus don't clash with enum use. Happily, there weren't really all that many of these cases, and a lot of users already had the pattern of using '#ifndef' guarding (or in one case just using '#undef MIN') before defining their own private version that does the same thing. I left such cases alone. Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28minmax: add a few more MIN_T/MAX_T usersLinus Torvalds
Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant expressions in VM code") added the simpler MIN_T/MAX_T macros in order to avoid some excessive expansion from the rather complicated regular min/max macros. The complexity of those macros stems from two issues: (a) trying to use them in situations that require a C constant expression (in static initializers and for array sizes) (b) the type sanity checking and MIN_T/MAX_T avoids both of these issues. Now, in the whole (long) discussion about all this, it was pointed out that the whole type sanity checking is entirely unnecessary for min_t/max_t which get a fixed type that the comparison is done in. But that still leaves min_t/max_t unnecessarily complicated due to worries about the C constant expression case. However, it turns out that there really aren't very many cases that use min_t/max_t for this, and we can just force-convert those. This does exactly that. Which in turn will then allow for much simpler implementations of min_t()/max_t(). All the usual "macros in all upper case will evaluate the arguments multiple times" rules apply. We should do all the same things for the regular min/max() vs MIN/MAX() cases, but that has the added complexity of various drivers defining their own local versions of MIN/MAX, so that needs another level of fixes first. Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/ Cc: David Laight <David.Laight@aculab.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-15Merge tag 'x86_misc_for_v6.11_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Borislav Petkov: - Make error checking of AMD SMN accesses more robust in the callers as they're the only ones who can interpret the results properly - The usual cleanups and fixes, left and right * tag 'x86_misc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kmsan: Fix hook for unaligned accesses x86/platform/iosf_mbi: Convert PCIBIOS_* return codes to errnos x86/pci/xen: Fix PCIBIOS_* return code handling x86/pci/intel_mid_pci: Fix PCIBIOS_* return code handling x86/of: Return consistent error type from x86_of_pci_irq_enable() hwmon: (k10temp) Rename _data variable hwmon: (k10temp) Remove unused HAVE_TDIE() macro hwmon: (k10temp) Reduce k10temp_get_ccd_support() parameters hwmon: (k10temp) Define a helper function to read CCD temperature x86/amd_nb: Enhance SMN access error checking hwmon: (k10temp) Check return value of amd_smn_read() EDAC/amd64: Check return value of amd_smn_read() EDAC/amd64: Remove unused register accesses tools/x86/kcpuid: Add missing dir via Makefile x86, arm: Add missing license tag to syscall tables files
2024-07-15Merge tag 'edac_updates_for_v6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - The AMD memory controllers data fabric version 4.5 supports non-power-of-2 denormalization in the sense that certain bits of the system physical address cannot be reconstructed from the normalized address reported by the RAS hardware. Add support for handling such addresses - Switch the EDAC drivers to the new Intel CPU model defines - The usual fixes and cleanups all over the place * tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC: Add missing MODULE_DESCRIPTION() macros EDAC/dmc520: Use devm_platform_ioremap_resource() EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support RAS/AMD/FMPM: Use atl internal.h for INVALID_SPA RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization RAS/AMD/ATL: Validate address map when information is gathered RAS/AMD/ATL: Expand helpers for adding and removing base and hole RAS/AMD/ATL: Read DRAM hole base early RAS/AMD/ATL: Add amd_atl pr_fmt() prefix RAS/AMD/ATL: Add a missing module description EDAC, i10nm: make skx_common.o a separate module EDAC/skx: Switch to new Intel CPU model defines EDAC/sb_edac: Switch to new Intel CPU model defines EDAC, pnd2: Switch to new Intel CPU model defines EDAC/i10nm: Switch to new Intel CPU model defines EDAC/ghes: Add missing newline to pr_info() statement RAS/AMD/ATL: Add missing newline to pr_info() statement EDAC/thunderx: Remove unused struct error_syndrome
2024-06-29EDAC: Add missing MODULE_DESCRIPTION() macrosJeff Johnson
With ARCH=arm64 make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/edac/layerscape_edac_mod.o Add the missing invocation of the MODULE_DESCRIPTION() macro to all files which have a MODULE_LICENSE(). This includes mpc85xx_edac.c and four octeon_edac-*.c files which, although they did not produce a warning with the arm64 allmodconfig configuration, may cause this warning with other configurations. [ bp: s/module/driver/ for layerscape_edac ] Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240617-md-arm64-drivers-edac-v2-1-6d6c5dd1e5da@quicinc.com
2024-06-23EDAC/dmc520: Use devm_platform_ioremap_resource()Jai Arora
platform_get_resource() and devm_ioremap_resource() are wrapped up in the devm_platform_ioremap_resource() helper. Use the helper and get rid of the local variable for struct resource *. Signed-off-by: Jai Arora <jai.arora@samsung.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240618110226.97395-1-jai.arora@samsung.com
2024-06-14EDAC/igen6: Add Intel Arrow Lake-U/H SoCs supportQiuxu Zhuo
Arrow Lake-U/H SoCs share same IBECC registers with Meteor Lake-P SoCs. Add Arrow Lake-U/H SoC compute die IDs for EDAC support. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240614030354.69180-1-qiuxu.zhuo@intel.com
2024-06-12EDAC/amd64: Check return value of amd_smn_read()Yazen Ghannam
Check the return value of amd_smn_read() before saving a value. This ensures invalid values aren't saved. The struct umc instance is initialized to 0 during memory allocation. Therefore, a bad read will keep the value as 0 providing the expected Read-as-Zero behavior. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20240606-fix-smn-bad-read-v4-2-ffde21931c3f@amd.com
2024-06-12EDAC/amd64: Remove unused register accessesYazen Ghannam
A number of UMC registers are read only for the purpose of debug printing. They are not used in any calculations. Nor do they have any specific debug value. Remove them. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20240606-fix-smn-bad-read-v4-1-ffde21931c3f@amd.com
2024-06-04EDAC/igen6: Convert PCIBIOS_* return codes to errnosIlpo Järvinen
errcmd_enable_error_reporting() uses pci_{read,write}_config_word() that return PCIBIOS_* codes. The return code is then returned all the way into the probe function igen6_probe() that returns it as is. The probe functions, however, should return normal errnos. Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal errno before returning it from errcmd_enable_error_reporting(). Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com
2024-06-04EDAC/amd64: Convert PCIBIOS_* return codes to errnosIlpo Järvinen
gpu_get_node_map() uses pci_read_config_dword() that returns PCIBIOS_* codes. The return code is then returned all the way into the module init function amd64_edac_init() that returns it as is. The module init functions, however, should return normal errnos. Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal errno before returning it from gpu_get_node_map(). For consistency, convert also the other similar cases which return PCIBIOS_* codes even if they do not have any bugs at the moment. Fixes: 4251566ebc1c ("EDAC/amd64: Cache and use GPU node map") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240527132236.13875-1-ilpo.jarvinen@linux.intel.com
2024-05-29EDAC, i10nm: make skx_common.o a separate moduleArnd Bergmann
Commit 598afa050403 ("kbuild: warn objects shared among multiple modules") was added to track down cases where the same object is linked into multiple modules. This can cause serious problems if some modules are builtin while others are not. That test triggers this warning: scripts/Makefile.build:236: drivers/edac/Makefile: skx_common.o is added to multiple modules: i10nm_edac skx_edac Make this a separate module instead. [Tony: Added more background details to commit message] Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20240529095132.1929397-1-arnd@kernel.org/
2024-05-28EDAC/skx: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240520224620.9480-39-tony.luck@intel.com
2024-05-28EDAC/sb_edac: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240520224620.9480-38-tony.luck@intel.com
2024-05-28EDAC, pnd2: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240520224620.9480-37-tony.luck@intel.com
2024-05-28EDAC/i10nm: Switch to new Intel CPU model definesTony Luck
New CPU #defines encode vendor and family as well as model. Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240520224620.9480-36-tony.luck@intel.com
2024-05-28EDAC/ghes: Add missing newline to pr_info() statementVasyl Gomonovych
Add a missing newline character even if printk() adds newlines to non-\n-terminated strings because in the unlikely case a KERN_CONT print statement is added after the unterminated statement, the two will get glued together which is not the expected behavior. [ bp: Rewrite commit message. ] Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240517204951.2019031-1-gomonovych@gmail.com
2024-05-27EDAC/thunderx: Remove unused struct error_syndromeDr. David Alan Gilbert
struct error_syndrome appears never to have been used. Remove it, together with the MAX_SYNDROME_REGS it used. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240516133404.251397-1-linux@treblig.org
2024-05-14Merge tag 'edac_updates_for_v6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Have skx_edac decode error addresses belonging to SGX properly - Remove a bunch of unused struct members - Other cleanups * tag 'edac_updates_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/skx_common: Allow decoding of SGX addresses EDAC/mc_sysfs: Convert sprintf()/snprintf() to sysfs_emit() EDAC: Remove unused struct members EDAC: Remove dynamic attributes from edac_device_alloc_ctl_info() EDAC/device: Remove edac_dev_sysfs_block_attribute::store() EDAC/device: Remove edac_dev_sysfs_block_attribute::{block,value} EDAC/amd64: Remove unused struct member amd64_pvt::ext_nbcfg
2024-05-06EDAC/synopsys: Fix ECC status and IRQ control race conditionSerge Semin
The race condition around the ECCCLR register access happens in the IRQ disable method called in the device remove() procedure and in the ECC IRQ handler: 1. Enable IRQ: a. ECCCLR = EN_CE | EN_UE 2. Disable IRQ: a. ECCCLR = 0 3. IRQ handler: a. ECCCLR = CLR_CE | CLR_CE_CNT | CLR_CE | CLR_CE_CNT b. ECCCLR = 0 c. ECCCLR = EN_CE | EN_UE So if the IRQ disabling procedure is called concurrently with the IRQ handler method the IRQ might be actually left enabled due to the statement 3c. The root cause of the problem is that ECCCLR register (which since v3.10a has been called as ECCCTL) has intermixed ECC status data clear flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and handling (write 1 to clear ECC status data) procedures must be serialised around the ECCCTL register modification to prevent the race. So fix the problem described above by adding the spin-lock around the ECCCLR modifications and preventing the IRQ-handler from modifying the IRQs enable flags (there is no point in disabling the IRQ and then re-enabling it again within a single IRQ handler call, see the statements 3a/3b and 3c above). Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com
2024-04-25EDAC/versal: Do not log total error countsShubhrajyoti Datta
When logging errors, the driver currently logs the total error count. However, it should log the current error only. Fix it. [ bp: Rewrite text. ] Fixes: 6f15b178cd63 ("EDAC/versal: Add a Xilinx Versal memory controller driver") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240425121942.26378-4-shubhrajyoti.datta@amd.com
2024-04-25EDAC/versal: Check user-supplied data before injecting an errorShubhrajyoti Datta
The function inject_data_ue_store() lacks a NULL check for the user passed values. To prevent below kernel crash include a NULL check. Call trace: kstrtoull kstrtou8 inject_data_ue_store full_proxy_write vfs_write ksys_write __arm64_sys_write invoke_syscall el0_svc_common.constprop.0 do_el0_svc el0_svc el0t_64_sync_handler el0t_64_sync Fixes: 83bf24051a60 ("EDAC/versal: Make the bit position of injected errors configurable") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240425121942.26378-3-shubhrajyoti.datta@amd.com
2024-04-25EDAC/versal: Do not register for NOC errorsShubhrajyoti Datta
The NOC errors are not handled in the driver. Remove the request for registration. Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240425121942.26378-2-shubhrajyoti.datta@amd.com
2024-04-08EDAC/skx_common: Allow decoding of SGX addressesQiuxu Zhuo
There are no "struct page" associations with SGX pages, causing the check pfn_to_online_page() to fail. This results in the inability to decode the SGX addresses and warning messages like: Invalid address 0x34cc9a98840 in IA32_MC17_ADDR Add an additional check to allow the decoding of the error address and to skip the warning message, if the error address is an SGX address. Fixes: 1e92af09fab1 ("EDAC/skx_common: Filter out the invalid address") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240408120419.50234-1-qiuxu.zhuo@intel.com
2024-04-04EDAC/mc_sysfs: Convert sprintf()/snprintf() to sysfs_emit()Li Zhijian
Per Documentation/filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Generated by: make coccicheck M=<path/to/file> MODE=patch \ COCCI=scripts/coccinelle/api/device_attr_show.cocci No functional change intended. [ bp: Massage. ] Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240314084628.1322006-1-lizhijian@fujitsu.com
2024-03-27EDAC: Remove unused struct membersJiri Slaby (SUSE)
Remove unused - edac_pci_ctl_info::edac_subsys - edac_pci_ctl_info::complete - edac_device_ctl_info::removal_complete members. Found by https://github.com/jirislaby/clang-struct. [ bp: Squash three almost identical trivial patches into one. ] Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240213112051.27715-6-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-27EDAC: Remove dynamic attributes from edac_device_alloc_ctl_info()Jiri Slaby (SUSE)
Dynamic attributes are not passed from any caller of edac_device_alloc_ctl_info(). Drop this unused/untested functionality completely. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240213112051.27715-5-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-03-27EDAC/device: Remove edac_dev_sysfs_block_attribute::store()Jiri Slaby (SUSE)
No one uses this store hook (both BLOCK_ATTR() pass NULL). It actually never was since its addition in fd309a9d8e63 ("drivers/edac: fix leaf sysfs attribute") so drop it. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240213112051.27715-4-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>