summaryrefslogtreecommitdiff
path: root/drivers/thermal/intel
AgeCommit message (Collapse)Author
2023-02-02thermal: intel: intel_pch: Rename device operations callbacksRafael J. Wysocki
Because the same device operations callbacks are used for all supported boards, they are in fact generic, so rename them to reflect that. Also rename the operations object itself for consistency. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02thermal: intel: intel_pch: Eliminate redundant return pointersRafael J. Wysocki
Both pch_wpt_init() and pch_wpt_get_temp() can return the proper result via their return values, so they do not need to use return pointers. Modify them accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02thermal: intel: intel_pch: Make pch_wpt_add_acpi_psv_trip() return intRafael J. Wysocki
Modify pch_wpt_add_acpi_psv_trip() to return an int value instead of using a return pointer for that. While at it, drop an excessive empty code line. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02thermal: intel: int340x: Improve int340x_thermal_set_trip_temp()Rafael J. Wysocki
Instead of using snprintf() to populate the ACPI object name in int340x_thermal_set_trip_temp(), use an appropriate initializer and make the function fail if its trip argument is greater than 9, because ACPI object names can only be 4 characters long and it does not make sense to even try to evaluate objects with longer names (that argument is guaranteed to be non-negative, because it comes from the thermal code that will not pass negative trip numbers to zone callbacks). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02thermal: intel: int340x: Drop pointless cast to unsigned longRafael J. Wysocki
The explicit casting from int to unsigned long in int340x_thermal_get_zone_temp() is pointless, becuase the multiplication result is cast back to int by the assignment in the same statement, so drop it. No expected functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02thermal: intel: int340x: Rename variable in int340x_thermal_zone_add()Rafael J. Wysocki
Rename local variables int34x_thermal_zone in int340x_thermal_zone_add() and int340x_thermal_zone_remove() to int34x_zone which allows a number of code lines to be shorter and easier to read and adjust some white space for consistency. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02thermal: intel: int340x: Assorted minor cleanupsRafael J. Wysocki
Improve some inconsistent usage of white space in int340x_thermal_zone.c, fix up one coding style issue in it (missing braces around an else branch of a conditional) and while at it replace a !ACPI_FAILURE() check with an equivalent ACPI_SUCCESS() one. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02thermal: ACPI: Make helpers retrieve temperature onlyRafael J. Wysocki
It is slightly better to make the ACPI thermal helper functions retrieve the trip point temperature only instead of doing the full trip point initialization, because they are also used for updating some already registered trip points, in which case initializing a new trip just in order to update the temperature of an existing one is somewhat wasteful. Modify the ACPI thermal helpers accordingly and update their users. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-01-27thermal: intel: int340x: Use generic trip points tableRafael J. Wysocki
Modify int340x_thermal_zone_add() to register the thermal zone along with a trip points table, which allows the trip-related zone callbacks to be dropped, because they are not needed any more. In order to consolidate the code, use ACPI trip library functions to populate generic trip points in int340x_thermal_read_trips() and to update them in int340x_thermal_update_trips(). Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27thermal: intel: int340x: Use zone lock for synchronizationRafael J. Wysocki
Because the ->get_trip_temp() and ->get_trip_type() thermal zone callbacks are only invoked from __thermal_zone_get_trip() which is always called by the thermal core under the zone lock, it is sufficient for int340x_thermal_update_trips() to acquire the zone lock for mutual exclusion with those callbacks. Accordingly, modify int340x_thermal_update_trips() to use the zone lock instead of the internal trip_mutex and drop the latter which is not necessary any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27thermal: intel: int340x: Rework updating trip pointsRafael J. Wysocki
It is generally invalid to change the trip point indices after they have been exposed via sysfs. Moreover, the thermal objects in the ACPI namespace cannot go away and appear on the fly. In practice, the only thing that can happen when the INT3403_PERF_TRIP_POINT_CHANGED notification is sent by the platform firmware is a change of the return values of those thermal objects. For this reason, add a special function for updating the trip point temperatures after re-evaluating the respective ACPI thermal objects and change int3403_notify() to invoke it instead of int340x_thermal_read_trips() that would change the trip point indices on errors. Also remove the locking from the latter, because it is only called before registering the thermal zone and it cannot race with the zone's callbacks. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27thermal: ACPI: Initialize trips if temperature is out of rangeRafael J. Wysocki
In some cases it is still useful to register a trip point if the temperature returned by the corresponding ACPI thermal object (for example, _HOT) is invalid to start with, because the same ACPI thermal object may start to return a valid temperature after a system configuration change (for example, from an AC power source to battery an vice versa). For this reason, if the ACPI thermal object evaluated by thermal_acpi_trip_init() successfully returns a temperature value that is out of the range of values taken into account, initialize the trip point using THERMAL_TEMP_INVALID as the temperature value instead of returning an error to allow the user of the trip point to decide what to do with it. Also update pch_wpt_add_acpi_psv_trip() to reject trip points with invalid temperature values. Fixes: 7a0e39748861 ("thermal: ACPI: Add ACPI trip point routines") Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27Merge back Intel thermal control changes for 6.3.Rafael J. Wysocki
2023-01-26thermal: intel: processor_thermal_device_pci: Use generic trip pointDaniel Lezcano
Make proc_thermal_pci_probe() register the TCPU_PCI thermal zone along with the trip point used by it and drop the zone callbacks related to this trip point that are not needed any more. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-26thermal: intel: int340x: Add production mode attributeSrinivas Pandruvada
It is possible that the system manufacturer locks down thermal tuning beyond what is usually done on the given platform. In that case user space calibration tools should not try to adjust the thermal configuration of the system. To allow user space to check if that is the case, add a new sysfs attribute "production_mode" that will be present when the ACPI DCFG method is present under the INT3400 device object in the ACPI Namespace. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-25thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()Rafael J. Wysocki
In order to prevent int340x_thermal_get_trip_type() from possibly racing with int340x_thermal_read_trips() invoked by int3403_notify() add locking to it in analogy with int340x_thermal_get_trip_temp(). Fixes: 6757a7abe47b ("thermal: intel: int340x: Protect trip temperature from concurrent updates") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-24thermal: intel: int340x: Protect trip temperature from concurrent updatesSrinivas Pandruvada
Trip temperatures are read using ACPI methods and stored in the memory during zone initializtion and when the firmware sends a notification for change. This trip temperature is returned when the thermal core calls via callback get_trip_temp(). But it is possible that while updating the memory copy of the trips when the firmware sends a notification for change, thermal core is reading the trip temperature via the callback get_trip_temp(). This may return invalid trip temperature. To address this add a mutex to protect the invalid temperature reads in the callback get_trip_temp() and int340x_thermal_read_trips(). Fixes: 5fbf7f27fa3d ("Thermal/int340x: Add common thermal zone handler") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.0+ <stable@vger.kernel.org> # 5.0+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-24thermal: intel: intel_pch: Use generic trip pointsDaniel Lezcano
The thermal framework gives the possibility to register the trip points along with the thermal zone. When that is done, no get_trip_* callbacks are needed and they can be removed. Convert the existing callbacks content logic into generic trip points initialization code and register them along with the thermal zone. In order to consolidate the code, use an ACPI trip library function to populate a generic trip point. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Zhang Rui <rui.zhang@intel.com> [ rjw: Subject and changelog edits, rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-01-24thermal: intel: intel_pch: Add support for Wellsburg PCHTim Zimmermann
Add the PCI ID for the Wellsburg C610 series chipset PCH. The driver can read the temperature from the Wellsburg PCH with only the PCI ID added and no other modifications. Signed-off-by: Tim Zimmermann <tim@linux4.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-24Merge back other thermal control material for 6.3.Rafael J. Wysocki
* thermal: (734 commits) thermal: core: call put_device() only after device_register() fails Linux 6.2-rc4 kbuild: Fix CFI hash randomization with KASAN firmware: coreboot: Check size of table entry and use flex-array kallsyms: Fix scheduling with interrupts disabled in self-test ata: pata_cs5535: Don't build on UML lockref: stop doing cpu_relax in the cmpxchg loop x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space efi: tpm: Avoid READ_ONCE() for accessing the event log io_uring: lock overflowing for IOPOLL ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() iommu/iova: Fix alloc iova overflows issue iommu: Fix refcount leak in iommu_device_claim_dma_owner iommu/arm-smmu-v3: Don't unregister on shutdown iommu/arm-smmu: Don't unregister on shutdown iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate() platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode ...
2023-01-20thermal: int340x_thermal: Use sysfs_emit_at() instead of scnprintf()ye xingchen
Follow the advice of the Documentation/filesystems/sysfs.rst that show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> [ rjw: Subject rewrite ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-18thermal: intel: menlow: Update function descriptionsDeming Wang
Update function parameter descriptions for sensor_get_auxtrip() and sensor_set_auxtrip(). [ rjw: New changelog, subject edits ] Signed-off-by: Deming Wang <wangdeming@inspur.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-10thermal: intel: Fix unsigned comparison with less than zeroYang Li
The return value from the call to intel_tcc_get_tjmax() is int, which can be a negative error code. However, the return value is being assigned to an u32 variable 'tj_max', so making 'tj_max' an int. Eliminate the following warning: ./drivers/thermal/intel/intel_soc_dts_iosf.c:394:5-11: WARNING: Unsigned expression compared with zero: tj_max < 0 Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3637 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-06thermal/drivers/intel: Use generic thermal_zone_get_trip() functionDaniel Lezcano
The thermal framework gives the possibility to register the trip points with the thermal zone. When that is done, no get_trip_* ops are needed and they can be removed. Convert ops content logic into generic trip points and register them with the thermal zone. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20221003092602.1323944-30-daniel.lezcano@linaro.org
2023-01-06thermal/intel/int340x: Replace parameter to simplifyDaniel Lezcano
In the process of replacing the get_trip_* ops by the generic trip points, the current code has an 'override' property to add another indirection to a different ops. Rework this approach to prevent this indirection and make the code ready for the generic trip points conversion. Actually the get_temp() is different regarding the platform, so it is pointless to add a new set of ops but just create dynamically the ops at init time. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://lore.kernel.org/r/20221003092602.1323944-29-daniel.lezcano@linaro.org
2022-12-30thermal/x86_pkg_temp_thermal: Add support for handling dynamic tjmaxZhang Rui
Tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET can be changed at runtime when the Intel SST-PP (Intel Speed Select Technology - Performance Profile) level is changed. Enhance the code to use updated tjmax when programming the thermal interrupt thresholds. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30thermal/x86_pkg_temp_thermal: Use Intel TCC libraryZhang Rui
Cleanup the code by using Intel TCC library for TCC (Thermal Control Circuitry) MSR access. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30thermal/intel/intel_tcc_cooling: Use Intel TCC libraryZhang Rui
Cleanup the code by using Intel TCC library for TCC (Thermal Control Circuitry) MSR access. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30thermal/intel/intel_soc_dts_iosf: Use Intel TCC libraryZhang Rui
Cleanup the code by using Intel TCC library for TCC (Thermal Control Circuitry) MSR access. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30thermal/int340x/processor_thermal: Use Intel TCC libraryZhang Rui
Cleanup the code by using Intel TCC library for TCC (Thermal Control Circuitry) MSR access. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30thermal/intel: Introduce Intel TCC libraryZhang Rui
There are several different drivers that accesses the Intel TCC (thermal control circuitry) MSRs, and each of them has its own implementation for the same functionalities, e.g. getting the current temperature, getting the tj_max, and getting/setting the tj_max offset. Introduce a library to unify the code for Intel CPU TCC MSR access. At the same time, ensure the temperature is got based on the updated tjmax value because tjmax can be changed at runtime for cases like the Intel SST-PP (Intel Speed Select Technology - Performance Profile) level change. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30thermal: int340x: Add missing attribute for data rate baseSrinivas Pandruvada
Commit 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver")' added rfi_restriction_data_rate_base string, mmio details and documentation, but missed adding attribute to sysfs. Add missing sysfs attribute. Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver") Cc: 5.11+ <stable@vger.kernel.org> # v5.11+ Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-15Merge tag 'thermal-6.2-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more thermal control updates from Rafael Wysocki: "These are updates of assorted thermal drivers, mostly for ARM platforms, generally isolated and fairly straightforward, and the recent Intel HFI driver fix for systems without HFI support. Specifics: - Avoid clearing the HFI status bit on systems without HFI support which triggers unchecked MSR access errors (Srinivas Pandruvada) - Add sm8450 and sm8550 QCom compatible string to DT bindings (Luca Weiss, Neil Armstrong) - Use devm_platform_get_and_ioremap_resource on the ST platform to group two calls into a single one (Minghao Chi) - Use GENMASK instead of bitmaps and validate the temperature after reading it in the imx8mm_thermal driver (Marcus Folkesson) - Convert generic-adc-thermal to DT schema (Rob Herring) - Fix debug print message with inverted logic in the k3_j72xx_bandgap driver (Keerthy) - Fix memory leak on thermal_of_zone_register() failure (Ido Schimmel) - Add support for IPQ8074 in the tsens thermal driver along with the DT bindings (Robert Marko) - Fix and rework the debugfs code in the tsens driver (Christian Marangi) - Add calibration and DT documentation for the imx8mm driver (Marek Vasut) - Add DT bindings and compatible for the Mediatek SoCs mt7981 and mt7983 (Daniel Golle) - Don't show an error message if it happens at probe time while it will be deferred on the QCom SPMI ADC driver (Johan Hovold) - Add HWMon support for the imx8mm board (Alexander Stein) - Remove pointless include from the power allocator governor (Christophe JAILLET) - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450 (Krzysztof Kozlowski) - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss) - Demote error log of thermal zone register to debug in the tsens QCom driver (Manivannan Sadhasivam) - Consolidate the the efuse values and the errata handling in the TI Bandgap driver (Bryan Brattlof) - Document Renesas RZ/Five as compatible with RZ/G2UL in the DT bindings (Lad Prabhakar) - Fix the irq handler return value in the LMh driver (Bjorn Andersson) - Delete empty platform remove callback from imx_sc_thermal (Uwe Kleine-König)" * tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) thermal/drivers/imx_sc_thermal: Drop empty platform remove function thermal/drivers/qcom/lmh: Fix irq handler return value dt-bindings: thermal: qcom-tsens: Add compatible for sm8550 thermal/drivers/st: Use devm_platform_get_and_ioremap_resource() dt-bindings: thermal: rzg2l-thermal: Document RZ/Five SoC dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range dt-bindings: thermal: k3-j72xx: elaborate on binding description thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function thermal/drivers/qcom: Demote error log of thermal zone register to debug thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2 dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450 thermal/core/power allocator: Remove a useless include thermal/drivers/imx8mm: Add hwmon support thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC thermal: ti-soc-thermal: Drop comma after SoC match table sentinel thermal/drivers/imx: Add support for loading calibration data from OCOTP ...
2022-12-14thermal: intel: Don't set HFI status bit to 1Srinivas Pandruvada
When CPU doesn't support HFI (Hardware Feedback Interface), don't include BIT 26 in the mask to prevent clearing. otherwise this results in: unchecked MSR access error: WRMSR to 0x1b1 (tried to write 0x0000000004000aa8) at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0) Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status") Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-12Merge tag 'thermal-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These include thermal core fixes to protect thermal device operations against thermal device removal, other thermal core fixes and updates of Intel thermal control drivers. Specifics: - Fix race conditions related to thermal device operations that are not protected against thermal device removal (Guenter Roeck) - Fix error code in __thermal_cooling_device_register() (Dan Carpenter) - Validate new cooling device state (coming from user space) in cur_state_store() and reuse the max_state value from cooling device structure in the sysfs interface (Viresh Kumar) - Fix some possible name leaks in error paths in the thermal control core code (Yang Yingliang) - Detect TCC lock bit set in the intel_tcc_cooling driver and make it refuse to update the TCC offset in that case (Zhang Rui) - Add TCC cooling support for RaptorLake-S (Zhang Rui) - Prevent accidental clearing of HFI status by one of the other drivers using the same status register (Srinivas Pandruvada) - Protect clearing of thermal status bits in Intel thermal control drivers (Srinivas Pandruvada) - Allow the HFI thermal control driver to ACK an HFI event for the previously observed timestamp (Srinivas Pandruvada) - Remove a pointless die_id check from the HFI thermal driver and adjust the definition a data structure used by it (Ricardo Neri)" * tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: hfi: Remove a pointless die_id check thermal: core: fix some possible name leaks in error paths thermal: intel: hfi: ACK HFI for the same timestamp thermal: intel: Protect clearing of thermal status bits thermal: intel: Prevent accidental clearing of HFI status thermal/core: Protect thermal device operations against thermal device removal thermal/core: Remove thermal_zone_set_trips() thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex thermal/core: Introduce locked version of thermal_zone_device_update thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp thermal/core: Delete device under thermal device zone lock thermal/core: Destroy thermal zone device mutex in release function thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S thermal: intel: intel_tcc_cooling: Detect TCC lock bit thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages thermal/core: fix error code in __thermal_cooling_device_register() thermal: sysfs: Reuse cdev->max_state thermal: Validate new state in cur_state_store()
2022-12-02thermal: intel: hfi: Remove a pointless die_id checkRicardo Neri
die_id is an u16 quantity. On single-die systems the default value of die_id is 0. No need to check for negative values. Plus, removing this check makes Coverity happy. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23thermal: intel: hfi: ACK HFI for the same timestampSrinivas Pandruvada
Some processors issue more than one HFI interrupt with the same timestamp. Each interrupt must be acknowledged to let the hardware issue new HFI interrupts. But this can't be done without some additional flow modification in the existing interrupt handling. For background, the HFI interrupt is a package level thermal interrupt delivered via a LVT. This LVT is common for both the CPU and package level interrupts. Hence, all CPUs receive the HFI interrupts. But only one CPU should process interrupt and others simply exit by issuing EOI to LAPIC. The current HFI interrupt processing flow: 1. Receive Thermal interrupt 2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS 3. Try and get spinlock, one CPU will enter spinlock and others will simply return from here to issue EOI. (Let's assume CPU 4 is processing interrupt) 4. Check the stored time-stamp from the HFI memory time-stamp 5. if same 6. ignore interrupt, unlock and return 7. Copy the HFI message to local buffer 8. unlock spinlock 9. ACK HFI interrupt 10. Queue the message for processing in a work-queue It is tempting to simply acknowledge all the interrupts even if they have the same timestamp. This may cause some interrupts to not be processed. Let's say CPU5 is slightly late and reaches step 4 while CPU4 is between steps 8 and 9. Currently we simply ignore interrupts with the same timestamp. No issue here for CPU5. When CPU4 acknowledges the interrupt, the next HFI interrupt can be delivered. If we acknowledge interrupts with the same timestamp (at step 6), there is a race condition. Under the same scenario, CPU 5 will acknowledge the HFI interrupt. This lets hardware generate another HFI interrupt, before CPU 4 start executing step 9. Once CPU 4 complete step 9, it will acknowledge the newly arrived HFI interrupt, without actually processing it. Acknowledge the interrupt when holding the spinlock. This avoids contention of the interrupt acknowledgment. Updated flow: 1. Receive HFI Thermal interrupt 2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS 3. Try and get spin-lock Let's assume CPU 4 is processing interrupt 4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit 4.2 If hfi status is 0 4.3 unlock spinlock 4.4 return 4.5 Check the stored time-stamp from the HFI memory time-stamp 5. if same 6.1 ACK HFI Interrupt, 6.2 unlock spinlock 6.3 return 7. Copy the HFI message to local buffer 8. ACK HFI interrupt 9. unlock spinlock 10. Queue the message for processing in a work-queue To avoid taking the lock unnecessarily, intel_hfi_process_event() checks the status of the HFI interrupt before taking the lock. If CPU5 is late, when it starts processing the interrupt there are two scenarios: a) CPU4 acknowledged the HFI interrupt before CPU5 read MSR_IA32_THERM_STATUS. CPU5 exits. b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the interrupt. CPU5 will take the lock if CPU4 has released it. It then re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt, the HFI status bit is clear and CPU5 exits. If a new HFI interrupt was generated it will find that the status bit is set and it will continue to process the interrupt. In this case even if timestamp is not changed, the ACK can be issued as this is a new interrupt. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Tested-by: Arshad, Adeel<adeel.arshad@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23thermal: intel: Protect clearing of thermal status bitsSrinivas Pandruvada
The clearing of the package thermal status is done by Read-Modify-Write operation. This may result in clearing of some new status bits which are being or about to be processed. For example, while clearing of HFI status, after read of thermal status register, a new thermal status bit is set by the hardware. But during write back, the newly generated status bit will be set to 0 or cleared. So, it is not safe to do read-modify-write. Since thermal status Read-Write bits can be set to only 0 not 1, it is safe to set all other bits to 1 which are not getting cleared. Create a common interface for clearing package thermal status bits. Use this interface to replace existing code to clear thermal package status bits. It is safe to call from different CPUs without protection as there is no read-modify-write. Also wrmsrl results in just single instruction. For example while CPU 0 and CPU 3 are clearing bit 1 and 3 respectively. If CPU 3 wins the race, it will write 0x4000aa2, then CPU 1 will write 0x4000aa8. The bits which are not part of clear are set to 1. The default mask for bits, which can be written here is 0x4000aaa. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23thermal: intel: Prevent accidental clearing of HFI statusSrinivas Pandruvada
When there is a package thermal interrupt with PROCHOT log, it will be processed and cleared. It is possible that there is an active HFI event status, which is about to get processed or getting processed. While clearing PROCHOT log bit, it will also clear HFI status bit. This means that hardware is free to update HFI memory. When clearing a package thermal interrupt, some processors will generate a "general protection fault" when any of the read only bit is set to 1. The driver maintains a mask of all read-write bits which can be set. This mask doesn't include HFI status bit. This bit will also be cleared, as it will be assumed read-only bit. So, add HFI status bit 26 to the mask. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23ACPI: make remove callback of ACPI driver voidDawei Li
For bus-based driver, device removal is implemented as: 1 device_remove()-> 2 bus->remove()-> 3 driver->remove() Driver core needs no inform from callee(bus driver) about the result of remove callback. In that case, commit fc7a6209d571 ("bus: Make remove callback return void") forces bus_type::remove be void-returned. Now we have the situation that both 1 & 2 of calling chain are void-returned, so it does not make much sense for 3(driver->remove) to return non-void to its caller. So the basic idea behind this change is making remove() callback of any bus-based driver to be void-returned. This change, for itself, is for device drivers based on acpi-bus. Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Lee Jones <lee@kernel.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Dawei Li <set_pte_at@outlook.com> Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com> # for drivers/platform/surface/* Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-09thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-SZhang Rui
Add RaptorLake to the list of processor models supported by the Intel TCC cooling driver. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-09thermal: intel: intel_tcc_cooling: Detect TCC lock bitZhang Rui
When MSR_IA32_TEMPERATURE_TARGET is locked, TCC Offset can not be updated even if the PROGRAMMABE Bit is set. Yield the driver on platforms with MSR_IA32_TEMPERATURE_TARGET locked. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28thermal: intel: hfi: Improve the type of hfi_features::nr_table_pagesRicardo Neri
A Coverity static code scan raised a potential overflow_before_widen warning when hfi_features::nr_table_pages is used as an argument to memcpy in intel_hfi_process_event(). Even though the overflow can never happen (the maximum number of pages of the HFI table is 0x10 and 0x10 << PAGE_SHIFT = 0x10000), using size_t as the data type of hfi_features::nr_table_pages makes Coverity happy and matches the data type of the argument 'size' of memcpy(). Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-15thermal: intel_powerclamp: Use first online CPU as control_cpuRafael J. Wysocki
Commit 68b99e94a4a2 ("thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash") fixed an issue related to using smp_processor_id() in preemptible context by replacing it with a pair of get_cpu()/put_cpu(), but what is needed there really is any online CPU and not necessarily the one currently running the code. Arguably, getting the one that's running the code in there is confusing. For this reason, simply give the control CPU role to the first online one which automatically will be CPU0 if it is online, so one check can be dropped from the code for an added benefit. Link: https://lore.kernel.org/linux-pm/20221011113646.GA12080@duo.ucw.cz/ Fixes: 68b99e94a4a2 ("thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Chen Yu <yu.c.chen@intel.com>
2022-09-24thermal: int340x: processor_thermal: Use module_pci_driver() macroShang XiaoJing
Since PCI provides helper macro module_pci_driver(), the module_init/exit code can be replaced with it. Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21thermal: intel_powerclamp: Remove accounting for IRQ wakesSrinivas Pandruvada
There is a static variable "idle_wakeup_counter", which accounts for number of wake ups because of IRQs and take actions to compensate idle injection. This is now read and reset to 0, but never incremented. So all the usage of this counter for idle injection has no use. Also another static variable "reduce_irq", which depends on "idle_wakeup_counter", so remove usage of "reduce_irq" also. Commit feb6cd6a0f9f ("thermal/intel_powerclamp: stop sched tick in forced idle") replaced the local use of "mwait_idle_with_hints" with play_idle(). This removed possibility of updating "idle_wakeup_counter" without change in play_idle(). This change was made in Linux 4.10. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to ↵Srinivas Pandruvada
avoid crash When CPU 0 is offline and intel_powerclamp is used to inject idle, it generates kernel BUG: BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687 caller is debug_smp_processor_id+0x17/0x20 CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57 Call Trace: <TASK> dump_stack_lvl+0x49/0x63 dump_stack+0x10/0x16 check_preemption_disabled+0xdd/0xe0 debug_smp_processor_id+0x17/0x20 powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp] ... ... Here CPU 0 is the control CPU by default and changed to the current CPU, if CPU 0 offlined. This check has to be performed under cpus_read_lock(), hence the above warning. Use get_cpu() instead of smp_processor_id() to avoid this BUG. Suggested-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-03thermal: int340x_thermal: Consolidate priv->data_vault checksRafael J. Wysocki
It is sufficient to check priv->data_vault once in the error code path of int3400_thermal_probe(), so do that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-23thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTRLee, Chun-Yi
In some case, the GDDV returns a package with a buffer which has zero length. It causes that kmemdup() returns ZERO_SIZE_PTR (0x10). Then the data_vault_read() got NULL point dereference problem when accessing the 0x10 value in data_vault. [ 71.024560] BUG: kernel NULL pointer dereference, address: 0000000000000010 This patch uses ZERO_OR_NULL_PTR() for checking ZERO_SIZE_PTR or NULL value in data_vault. Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-03thermal: intel: Add TCC cooling support for Alder Lake-N and Raptor Lake-PSumeet Pawnikar
Add Alder Lake-N and Raptor Lake-P to the list of processor models supported by the Intel TCC cooling driver. Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>