summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/i915/i915_hwmon.c
AgeCommit message (Collapse)Author
2024-09-13drm/i915/hwmon: expose package temperatureRaag Jadav
Add hwmon support for temp1_input attribute, which will expose package temperature in millidegree Celsius. With this in place we can monitor package temperature using lm-sensors tool. $ sensors i915-pci-0300 Adapter: PCI adapter in0: 990.00 mV fan1: 1260 RPM temp1: +45.0°C power1: N/A (max = 35.00 W) energy1: 12.62 kJ v2: Use switch case (Anshuman) v3: Comment adjustment (Riana) Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11276 Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910105242.3357276-1-raag.jadav@intel.com
2024-08-28drm/i915/hwmon: expose fan speedRaag Jadav
Add hwmon support for fan1_input attribute, which will expose fan speed in RPM. With this in place we can monitor fan speed using lm-sensors tool. $ sensors i915-pci-0300 Adapter: PCI adapter in0: 653.00 mV fan1: 3833 RPM power1: N/A (max = 43.00 W) energy1: 32.02 kJ v2: Handle overflow, add mutex protection and ABI documentation Aesthetic adjustments (Riana) v3: Change rotations data type, ABI date and version v4: Fix wakeref leak Drop switch case and simplify hwm_fan_xx() (Andi) v5: Rework time calculation, aesthetic adjustments (Andy) v6: Drop redundant overflow logic (Andy) Split fan_input_read() into dedicated helper (Badal) v7: Fix undefined reference to __udivdi3 for i386 (Andy) Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823034548.2670032-1-raag.jadav@intel.com
2024-04-24Merge tag 'drm-xe-next-2024-04-23' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Remove unused flags (Francois Dugast) - Extend uAPI to query HuC micro-controler firmware version (Francois Dugast) - drm/xe/uapi: Define topology types as indexes rather than masks (Francois Dugast) - drm/xe/uapi: Restore flags VM_BIND_FLAG_READONLY and VM_BIND_FLAG_IMMEDIATE (Francois Dugast) - devcoredump updates. Some touching the output format. (José Roberto de Souza, Matthew Brost) - drm/xe/hwmon: Add infra to support card power and energy attributes - Improve LRC, HWSP and HWCTX error capture. (Maarten Lankhorst) - drm/xe/uapi: Add IP version and stepping to GT list query (Matt roper) - Invalidate userptr VMA on page pin fault (Matthew Brost) - Improve xe_bo_move tracepoint (Priyanka Danamudi) - Align fence output format in ftrace log Cross-driver Changes: - drm/i915/hwmon: Get rid of devm (Ashutosh Dixit) (Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>) - drm/i915/display: convert inner wakeref get towards get_if_in_use (SOB Rodrigo Vivi) - drm/i915: Convert intel_runtime_pm_get_noresume towards raw wakeref (Committer, SOB Jani Nikula) Driver Changes: - Fix for unneeded CCS metadata allocation (Akshata Jahagirdar) - Fix for fix multicast support for Xe_LP platforms (Andrzej Hajda) - A couple of build fixes (Arnd Bergmann) - Fix register definition (Ashutosh Dixit) - Add BMG mocs table (Balasubramani Vivekanandan) - Replace sprintf() across driver (Bommu Krishnaiah) - Add an xe2 workaround (Bommu Krishnaiah) - Makefile fix (Dafna Hirschfeld) - force_wake_get error value check (Daniele Ceraolo Spurio) - Handle GSCCS ER interrupt (Daniele Ceraolo Spurio) - GSC Workaround (Daniele Ceraolo Spurio) - Build error fix (Dawei Li) - drm/xe/gt: Add L3 bank mask to GT topology (Francois Dugast) - Implement xe2- and GuC workarounds (Gustavo Sousa, Haridhar Kalvala, Himal rasad Ghimiray, John Harrison, Matt Roper, Radhakrishna Sripada, Vinay Belgaumkar, Badal Nilawar) - xe2hpg compression (Himal Ghimiray Prasad) - Error code cleanups and fixes (Himal Prasad Ghimiray) - struct xe_device cleanup (Jani Nikula) - Avoid validating bos when only requesting an exec dma-fence (José Roberto de Souza) - Remove debug message from migrate_clear (José Roberto de Souza) - Nuke EXEC_QUEUE_FLAG_PERSISTENT leftover internal flag (José Roberto de Souza) - Mark dpt and related vma as uncached (Juha-Pekka Heikkila) - Hwmon updates (Karthik Poosa) - KConfig fix when ACPI_WMI selcted (Lu Yao) - Update intel_uncore_read*() return types (Luca Coelho) - Mocs updates (Lucas De Marchi, Matt Roper) - Drop dynamic load-balancing workaround (Lucas De Marchi) - Fix a PVC workaround (Lucas De Marchi) - Group live kunit tests into a single module (Lucas De Marchi) - Various code cleanups (Lucas De Marchi) - Fix a ggtt init error patch and move ggtt invalidate out of ggtt lock (Maarten Lankhorst) - Fix a bo leak (Marten Lankhorst) - Add LRC parsing for more GPU instructions (Matt Roper) - Add various definitions for hardware and IP (Matt Roper) - Define all possible engines in media IP descriptors (Matt Roper) - Various cleanups, asserts and code fixes (Matthew Auld) - Various cleanups and code fixes (Matthew Brost) - Increase VM_BIND number of per-ioctl Ops (Matthew Brost, Paulo Zanoni) - Don't support execlists in xe_gt_tlb_invalidation layer (Matthew Brost) - Handle timing out of already signaled jobs gracefully (Matthew Brost) - Pipeline evict / restore of pinned BOs during suspend / resume (Matthew Brost) - Do not grab forcewakes when issuing GGTT TLB invalidation via GuC (Matthew Brost) - Drop ggtt invalidate from display code (Matthew Brost) - drm/xe: Add XE_BO_GGTT_INVALIDATE flag (Matthew Brost) - Add debug messages for MMU notifier and VMA invalidate (Matthew Brost) - Use ordered wq for preempt fence waiting (Matthew Brost) - Initial development for SR-IOV support including some refactoring (Michal Wajdeczko) - Various GuC- and GT- related cleanups and fixes (Michal Wajdeczko) - Move userptr over to start using hmm_range_fault (Oak Zeng) - Add new PCI IDs to DG2 platform (Ravi Kumar Vodapalli) - Pcode - and VRAM initialization check update (Riana Tauro) - Large PM update including i915 display patches, and a fix for one of those. (Rodrigo Vivi) - Introduce performance tuning changes for Xe2_HPG (Shekhar Chauhan) - GSC / HDCP updates (Suraj Kandpal) - Minor code cleanup (Tejas Upadhyay) - Rework / fix rebind TLB flushing and move rebind into the drm_exec locking loop (Thomas Hellström) - Backmerge (Thomas Hellström) - GuC updates and fixes (Vinay Belgaumkar, Zhanjun Dong) Signed-off-by: Dave Airlie <airlied@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iHUEABYKAB0WIQRskUM7w1oG5rx2IZO4FpNVCsYGvwUCZiestQAKCRC4FpNVCsYG # v8dLAQCDFUR7R5rwSdfqzNy+Djg+9ZgmtzVEfHZ+rI2lTReaCwEAhWeK7UooIMV0 # vGsSdsqGsJQm4VLRzE6H1yemCCQOBgM= # =HouD # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Apr 2024 22:42:29 AEST # gpg: using EDDSA key 6C91433BC35A06E6BC762193B81693550AC606BF # gpg: Can't check signature: No public key # Conflicts: # drivers/gpu/drm/xe/xe_device_types.h # drivers/gpu/drm/xe/xe_vm.c # drivers/gpu/drm/xe/xe_vm_types.h From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zievlb1wvqDg1ovi@fedora
2024-04-18drm/i915/hwmon: Get rid of devmAshutosh Dixit
When both hwmon and hwmon drvdata (on which hwmon depends) are device managed resources, the expectation, on device unbind, is that hwmon will be released before drvdata. However, in i915 there are two separate code paths, which both release either drvdata or hwmon and either can be released before the other. These code paths (for device unbind) are as follows (see also the bug referenced below): Call Trace: release_nodes+0x11/0x70 devres_release_group+0xb2/0x110 component_unbind_all+0x8d/0xa0 component_del+0xa5/0x140 intel_pxp_tee_component_fini+0x29/0x40 [i915] intel_pxp_fini+0x33/0x80 [i915] i915_driver_remove+0x4c/0x120 [i915] i915_pci_remove+0x19/0x30 [i915] pci_device_remove+0x32/0xa0 device_release_driver_internal+0x19c/0x200 unbind_store+0x9c/0xb0 and Call Trace: release_nodes+0x11/0x70 devres_release_all+0x8a/0xc0 device_unbind_cleanup+0x9/0x70 device_release_driver_internal+0x1c1/0x200 unbind_store+0x9c/0xb0 This means that in i915, if use devm, we cannot gurantee that hwmon will always be released before drvdata. Which means that we have a uaf if hwmon sysfs is accessed when drvdata has been released but hwmon hasn't. The only way out of this seems to be do get rid of devm_ and release/free everything explicitly during device unbind. v2: Change commit message and other minor code changes v3: Cleanup from i915_hwmon_register on error (Armin Wolf) v4: Eliminate potential static analyzer warning (Rodrigo) Eliminate fetch_and_zero (Jani) v5: Restore previous logic for ddat_gt->hwmon_dev error return (Andi) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10366 Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417145646.793223-1-ashutosh.dixit@intel.com
2024-03-28drm/i915/hwmon: Fix locking inversion in sysfs getterJanusz Krzysztofik
In i915 hwmon sysfs getter path we now take a hwmon_lock, then acquire an rpm wakeref. That results in lock inversion: <4> [197.079335] ====================================================== <4> [197.085473] WARNING: possible circular locking dependency detected <4> [197.091611] 6.8.0-rc7-Patchwork_129026v7-gc4dc92fb1152+ #1 Not tainted <4> [197.098096] ------------------------------------------------------ <4> [197.104231] prometheus-node/839 is trying to acquire lock: <4> [197.109680] ffffffff82764d80 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x9a/0x350 <4> [197.116939] but task is already holding lock: <4> [197.122730] ffff88811b772a40 (&hwmon->hwmon_lock){+.+.}-{3:3}, at: hwm_energy+0x4b/0x100 [i915] <4> [197.131543] which lock already depends on the new lock. ... <4> [197.507922] Chain exists of: fs_reclaim --> &gt->reset.mutex --> &hwmon->hwmon_lock <4> [197.518528] Possible unsafe locking scenario: <4> [197.524411] CPU0 CPU1 <4> [197.528916] ---- ---- <4> [197.533418] lock(&hwmon->hwmon_lock); <4> [197.537237] lock(&gt->reset.mutex); <4> [197.543376] lock(&hwmon->hwmon_lock); <4> [197.549682] lock(fs_reclaim); ... <4> [197.632548] Call Trace: <4> [197.634990] <TASK> <4> [197.637088] dump_stack_lvl+0x64/0xb0 <4> [197.640738] check_noncircular+0x15e/0x180 <4> [197.652968] check_prev_add+0xe9/0xce0 <4> [197.656705] __lock_acquire+0x179f/0x2300 <4> [197.660694] lock_acquire+0xd8/0x2d0 <4> [197.673009] fs_reclaim_acquire+0xa1/0xd0 <4> [197.680478] __kmalloc+0x9a/0x350 <4> [197.689063] acpi_ns_internalize_name.part.0+0x4a/0xb0 <4> [197.694170] acpi_ns_get_node_unlocked+0x60/0xf0 <4> [197.720608] acpi_ns_get_node+0x3b/0x60 <4> [197.724428] acpi_get_handle+0x57/0xb0 <4> [197.728164] acpi_has_method+0x20/0x50 <4> [197.731896] acpi_pci_set_power_state+0x43/0x120 <4> [197.736485] pci_power_up+0x24/0x1c0 <4> [197.740047] pci_pm_default_resume_early+0x9/0x30 <4> [197.744725] pci_pm_runtime_resume+0x2d/0x90 <4> [197.753911] __rpm_callback+0x3c/0x110 <4> [197.762586] rpm_callback+0x58/0x70 <4> [197.766064] rpm_resume+0x51e/0x730 <4> [197.769542] rpm_resume+0x267/0x730 <4> [197.773020] rpm_resume+0x267/0x730 <4> [197.776498] rpm_resume+0x267/0x730 <4> [197.779974] __pm_runtime_resume+0x49/0x90 <4> [197.784055] __intel_runtime_pm_get+0x19/0xa0 [i915] <4> [197.789070] hwm_energy+0x55/0x100 [i915] <4> [197.793183] hwm_read+0x9a/0x310 [i915] <4> [197.797124] hwmon_attr_show+0x36/0x120 <4> [197.800946] dev_attr_show+0x15/0x60 <4> [197.804509] sysfs_kf_seq_show+0xb5/0x100 Acquire the wakeref before the lock and hold it as long as the lock is also held. Follow that pattern across the whole source file where similar lock inversion can happen. v2: Keep hardware read under the lock so the whole operation of updating energy from hardware is still atomic (Guenter), - instead, acquire the rpm wakeref before the lock and hold it as long as the lock is held, - use the same aproach for other similar places across the i915_hwmon.c source file (Rodrigo). Fixes: 1b44019a93e2 ("drm/i915/guc: Disable PL1 power limit when loading GuC firmware") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: <stable@vger.kernel.org> # v6.5+ Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240311203500.518675-2-janusz.krzysztofik@linux.intel.com (cherry picked from commit 71b218771426ea84c0e0148a2b7ac52c1f76e792) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-03-22drm/i915: Drop dead code for xehpsdvLucas De Marchi
PCI IDs for XEHPSDV were never added and platform always marked with force_probe. Drop what's not used and rename some places to either be xehp or dg2, depending on the platform/IP checks. The registers not used anymore are also removed. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Tvrtko Ursulin <tursulin@ursulin.net> Link: https://patchwork.freedesktop.org/patch/msgid/20240320060543.4034215-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-03-13drm/i915/hwmon: Fix locking inversion in sysfs getterJanusz Krzysztofik
In i915 hwmon sysfs getter path we now take a hwmon_lock, then acquire an rpm wakeref. That results in lock inversion: <4> [197.079335] ====================================================== <4> [197.085473] WARNING: possible circular locking dependency detected <4> [197.091611] 6.8.0-rc7-Patchwork_129026v7-gc4dc92fb1152+ #1 Not tainted <4> [197.098096] ------------------------------------------------------ <4> [197.104231] prometheus-node/839 is trying to acquire lock: <4> [197.109680] ffffffff82764d80 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x9a/0x350 <4> [197.116939] but task is already holding lock: <4> [197.122730] ffff88811b772a40 (&hwmon->hwmon_lock){+.+.}-{3:3}, at: hwm_energy+0x4b/0x100 [i915] <4> [197.131543] which lock already depends on the new lock. ... <4> [197.507922] Chain exists of: fs_reclaim --> &gt->reset.mutex --> &hwmon->hwmon_lock <4> [197.518528] Possible unsafe locking scenario: <4> [197.524411] CPU0 CPU1 <4> [197.528916] ---- ---- <4> [197.533418] lock(&hwmon->hwmon_lock); <4> [197.537237] lock(&gt->reset.mutex); <4> [197.543376] lock(&hwmon->hwmon_lock); <4> [197.549682] lock(fs_reclaim); ... <4> [197.632548] Call Trace: <4> [197.634990] <TASK> <4> [197.637088] dump_stack_lvl+0x64/0xb0 <4> [197.640738] check_noncircular+0x15e/0x180 <4> [197.652968] check_prev_add+0xe9/0xce0 <4> [197.656705] __lock_acquire+0x179f/0x2300 <4> [197.660694] lock_acquire+0xd8/0x2d0 <4> [197.673009] fs_reclaim_acquire+0xa1/0xd0 <4> [197.680478] __kmalloc+0x9a/0x350 <4> [197.689063] acpi_ns_internalize_name.part.0+0x4a/0xb0 <4> [197.694170] acpi_ns_get_node_unlocked+0x60/0xf0 <4> [197.720608] acpi_ns_get_node+0x3b/0x60 <4> [197.724428] acpi_get_handle+0x57/0xb0 <4> [197.728164] acpi_has_method+0x20/0x50 <4> [197.731896] acpi_pci_set_power_state+0x43/0x120 <4> [197.736485] pci_power_up+0x24/0x1c0 <4> [197.740047] pci_pm_default_resume_early+0x9/0x30 <4> [197.744725] pci_pm_runtime_resume+0x2d/0x90 <4> [197.753911] __rpm_callback+0x3c/0x110 <4> [197.762586] rpm_callback+0x58/0x70 <4> [197.766064] rpm_resume+0x51e/0x730 <4> [197.769542] rpm_resume+0x267/0x730 <4> [197.773020] rpm_resume+0x267/0x730 <4> [197.776498] rpm_resume+0x267/0x730 <4> [197.779974] __pm_runtime_resume+0x49/0x90 <4> [197.784055] __intel_runtime_pm_get+0x19/0xa0 [i915] <4> [197.789070] hwm_energy+0x55/0x100 [i915] <4> [197.793183] hwm_read+0x9a/0x310 [i915] <4> [197.797124] hwmon_attr_show+0x36/0x120 <4> [197.800946] dev_attr_show+0x15/0x60 <4> [197.804509] sysfs_kf_seq_show+0xb5/0x100 Acquire the wakeref before the lock and hold it as long as the lock is also held. Follow that pattern across the whole source file where similar lock inversion can happen. v2: Keep hardware read under the lock so the whole operation of updating energy from hardware is still atomic (Guenter), - instead, acquire the rpm wakeref before the lock and hold it as long as the lock is held, - use the same aproach for other similar places across the i915_hwmon.c source file (Rodrigo). Fixes: 1b44019a93e2 ("drm/i915/guc: Disable PL1 power limit when loading GuC firmware") Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: <stable@vger.kernel.org> # v6.5+ Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240311203500.518675-2-janusz.krzysztofik@linux.intel.com
2023-12-18drm/i915/hwmon: Fix static analysis tool reported issuesKarthik Poosa
Updated i915 hwmon with fixes for issues reported by static analysis tool. Fixed integer overflow with upcasting. v2: - Added Fixes tag (Badal). - Updated commit message as per review comments (Anshuman). Fixes: 4c2572fe0ae7 ("drm/i915/hwmon: Expose power1_max_interval") Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231204144809.1518704-1-karthik.poosa@intel.com (cherry picked from commit ac3420d3d428443a08b923f9118121c170192b62) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-12-13drm/i915/hwmon: Fix static analysis tool reported issuesKarthik Poosa
Updated i915 hwmon with fixes for issues reported by static analysis tool. Fixed integer overflow with upcasting. v2: - Added Fixes tag (Badal). - Updated commit message as per review comments (Anshuman). Fixes: 4c2572fe0ae7 ("drm/i915/hwmon: Expose power1_max_interval") Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231204144809.1518704-1-karthik.poosa@intel.com
2023-05-31Merge drm/drm-next into drm-intel-nextJani Nikula
Sync the drm-intel-gt-next changes back to drm-intel-next via drm-next. Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2023-05-22drm/i915: constify pointers to hwmon_channel_infoKrzysztof Kozlowski
Statically allocated array of pointers to hwmon_channel_info can be made const for safety. Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230511175446.282041-1-krzysztof.kozlowski@linaro.org
2023-04-26drm/i915/hwmon: Block waiting for GuC reset to completeAshutosh Dixit
Instead of erroring out when GuC reset is in progress, block waiting for GuC reset to complete which is a more reasonable uapi behavior. v2: Avoid race between wake_up_all and waiting for wakeup (Rodrigo) v3: Remove timeout when blocked (Tvrtko) Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420164041.1428455-4-ashutosh.dixit@intel.com
2023-04-26drm/i915/guc: Disable PL1 power limit when loading GuC firmwareAshutosh Dixit
On dGfx, the PL1 power limit being enabled and set to a low value results in a low GPU operating freq. It also negates the freq raise operation which is done before GuC firmware load. As a result GuC firmware load can time out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power limit was enabled and set to a low value). Therefore disable the PL1 power limit when allowed by HW when loading GuC firmware. v2: - Take mutex (to disallow writes to power1_max) across GuC reset/fw load - Add hwm_power_max_restore to error return code path v3 (Jani N): - Add/remove explanatory comments - Function renames - Type corrections - Locking annotation v4: - Don't hold the lock across GuC reset (Rodrigo) - New locking scheme (suggested by Rodrigo) - Eliminate rpm_get in power_max_disable/restore, not needed (Tvrtko) v5: - Fix uninitialized pl1en variable compile warning reported by kernel build robot by creating new err_rps label Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062 Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420164041.1428455-3-ashutosh.dixit@intel.com
2023-04-26drm/i915/hwmon: Get mutex and rpm ref just once in hwm_power_max_writeAshutosh Dixit
In preparation for follow-on patches, refactor hwm_power_max_write to take hwmon_lock and runtime pm wakeref at start of the function and release them at the end, therefore acquiring these just once each. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420164041.1428455-2-ashutosh.dixit@intel.com
2023-04-03drm/i915/hwmon: Use 0 to designate disabled PL1 power limitAshutosh Dixit
On ATSM the PL1 limit is disabled at power up. The previous uapi assumed that the PL1 limit is always enabled and therefore did not have a notion of a disabled PL1 limit. This results in erroneous PL1 limit values when the PL1 limit is disabled. For example at power up, the disabled ATSM PL1 limit was previously shown as 0 which means a low PL1 limit whereas the limit being disabled actually implies a high effective PL1 limit value. To get round this problem, the PL1 limit uapi is expanded to include a special value 0 to designate a disabled PL1 limit. A read value of 0 means that the PL1 power limit is disabled, writing 0 disables the limit. The link between this patch and the bugs mentioned below is as follows: * Because on ATSM the PL1 power limit is disabled on power up and there were no means to enable it, we previously implemented the means to enable the limit when the PL1 hwmon entry (power1_max) was written to. * Now there is a IGT igt@i915_hwmon@hwmon_write which (a) reads orig value from all hwmon sysfs (b) does a bunch of random writes and finally (c) restores the orig value read. On ATSM since the orig value is 0, when the IGT restores the 0 value, the PL1 limit is now enabled with a value of 0. * PL1 limit of 0 implies a low PL1 limit which causes GPU freq to fall to 100 MHz. This causes GuC FW load and several IGT's to start timing out and gives rise to these Intel CI bugs. After this patch, writing 0 would disable the PL1 limit instead of enabling it, avoiding the freq drop issue. v2: Add explanation for bugs mentioned below (Rodrigo) v3: Eliminate race during PL1 disable and verify (Tvrtko) Change return to -ENODEV if verify fails (Tvrtko) Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062 Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8060 Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230401024146.1826092-1-ashutosh.dixit@intel.com
2023-03-01drm/i915/hwmon: Accept writes of value 0 to power1_max_intervalAshutosh Dixit
The value shown by power1_max_interval in millisec is essentially: ((1.x * power(2,y)) * 1000) >> 10 Where x and y are read from a HW register. On ATSM, x and y are 0 on power-up so the value shown is 0. Writes of 0 to power1_max_interval had previously been disallowed to avoid computing ilog2(0) but this resulted in the corner-case bug below. Therefore allow writes of 0 now but special case that write to x = y = 0. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7754 Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230228044334.3630391-1-ashutosh.dixit@intel.com
2023-02-16drm/i915/hwmon: Enable PL1 limit when writing limit value to HWAshutosh Dixit
Previous documentation suggested that the PL1 power limit is always enabled in HW. However we now find this not to be the case on some platforms (such as ATSM). Therefore enable the PL1 power limit (by setting the enable bit) when writing the PL1 limit value to HW. Bspec: 51864 Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230216164944.2366150-3-ashutosh.dixit@intel.com
2023-02-16drm/i915/hwmon: Replace hwm_field_scale_and_write with hwm_power_max_writeAshutosh Dixit
hwm_field_scale_and_write has a single caller hwm_power_write and is specific to hwm_power_write but makes it appear that it is a general function which can have multiple callers. Replace the function with hwm_power_max_write which is specific to hwm_power_write and use that in future patches where the function needs to be extended. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230216164944.2366150-2-ashutosh.dixit@intel.com
2023-02-08Revert "drm/i915/hwmon: Enable PL1 power limit"Ashutosh Dixit
This reverts commit 0349c41b05968befaffa5fbb7e73d0ee6004f610. 0349c41b0596 ("drm/i915/hwmon: Enable PL1 power limit") is incorrect and caused a major regression on ATSM. The change enabled the PL1 power limit but FW sets the default value of the PL1 limit to 0 which implies HW now works at minimum power and therefore the lowest effective frequency. This means all workloads now run slower resulting in even GuC FW load operations timing out, rendering ATSM unusable. A different solution to the original issue of the PL1 limit being disabled on ATSM is needed but till that is developed, revert 0349c41b0596. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8062 Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230208190312.1611335-1-ashutosh.dixit@intel.com
2023-02-06drm/i915/hwmon: Enable PL1 power limitAshutosh Dixit
Previous documentation suggested that PL1 power limit is always enabled. However we now find this not to be the case on some platforms (such as ATSM). Therefore enable PL1 power limit during hwmon initialization. Bspec: 51864 v2: Add Bspec reference (Gwan-gyeong) v3: Add Fixes tag Fixes: 99f55efb79114 ("drm/i915/hwmon: Power PL1 limit and TDP setting") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230203155309.1042297-1-ashutosh.dixit@intel.com
2023-01-06drm/i915/hwmon: Display clamped PL1 limitAshutosh Dixit
HW allows arbitrary PL1 limits to be set but silently clamps these values to "typical but not guaranteed" min/max values in pkg_power_sku register. Follow the same pattern for sysfs, allow arbitrary PL1 limits to be set but display clamped values when read, so that users see PL1 limits HW is likely using. Otherwise users think HW is using arbitrarily high/low PL1 limits they might have set. The previous write/read I1 power1_crit limit also follows the same clamping pattern. v2: Explain "why" in commit message and include bug link (Jani Nikula) Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7704 Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221215191727.2468770-1-ashutosh.dixit@intel.com
2022-12-07drm/i915/hwmon: Silence "mailbox access failed" warning in snb_pcode_readAshutosh Dixit
hwm_pcode_read_i1 is called during i915 load. This results in the following warning from snb_pcode_read because POWER_SETUP_SUBCOMMAND_READ_I1 is unsupported on DG1/DG2. [drm:snb_pcode_read [i915]] warning: pcode (read from mbox 47c) \ mailbox access failed for snb_pcode_read_p [i915]: -6 The code handles the unsupported command but the warning in dmesg is a red herring which has resulted in a couple of bugs being filed. Therefore silence the warning by avoiding calling snb_pcode_read_p for DG1/DG2. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221203031454.1280538-1-ashutosh.dixit@intel.com
2022-11-03drm/i915/hwmon: Fix a build error used with clang compilerGwan-gyeong Mun
Use REG_FIELD_PREP() and a constant value for hwm_field_scale_and_write() If the first argument of FIELD_PREP() is not a compile-time constant value or unsigned long long type, this routine of the __BF_FIELD_CHECK() macro used internally by the FIELD_PREP() macro always returns false. BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) > \ __bf_cast_unsigned(_reg, ~0ull), \ _pfx "type of reg too small for mask"); \ And it returns a build error by the option among the clang compilation options. [-Werror,-Wtautological-constant-out-of-range-compare] Reported build error while using clang compiler: drivers/gpu/drm/i915/i915_hwmon.c:115:16: error: result of comparison of constant 18446744073709551615 with expression of type 'typeof (_Generic((field_msk), char: (unsigned char)0, unsigned char: (unsigned char)0, signed char: (unsigned char)0, unsigned short: (unsigned short)0, short: (unsigned short)0, unsigned int: (unsigned int)0, int: (unsigned int)0, unsigned long: (unsigned long)0, long: (unsigned long)0, unsigned long long: (unsigned long long)0, long long: (unsigned long long)0, default: (field_msk)))' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] bits_to_set = FIELD_PREP(field_msk, nval); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/bitfield.h:114:3: note: expanded from macro 'FIELD_PREP' __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/bitfield.h:71:53: note: expanded from macro '__BF_FIELD_CHECK' BUILD_BUG_ON_MSG(__bf_cast_unsigned(_mask, _mask) > \ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~ ./include/linux/build_bug.h:39:58: note: expanded from macro 'BUILD_BUG_ON_MSG' ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ ./include/linux/compiler_types.h:357:22: note: expanded from macro 'compiletime_assert' _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/compiler_types.h:345:23: note: expanded from macro '_compiletime_assert' __compiletime_assert(condition, msg, prefix, suffix) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./include/linux/compiler_types.h:337:9: note: expanded from macro '__compiletime_assert' if (!(condition)) \ v2: Use REG_FIELD_PREP() macro instead of FIELD_PREP() (Jani) Fixes: 99f55efb7911 ("drm/i915/hwmon: Power PL1 limit and TDP setting") Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> [Joonas: Wrapped commit message error line length to be more reasonable] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221029044230.32128-1-gwan-gyeong.mun@intel.com
2022-10-17drm/i915/hwmon: Extend power/energy for XEHPSDVDale B Stimson
Extend hwmon power/energy for XEHPSDV especially per gt level energy usage. v2: Update to latest HWMON spec (Ashutosh) v3: Fix review comments (Ashutosh) v4: Fix review comments (Anshuman) v5: s/hwmon_device_register_with_info/ devm_hwmon_device_register_with_info/ (Ashutosh) v6: Change contact to intel-gfx (Rodrigo) GEN12_RPSTAT1 is available for all Gen12+ (Andi) Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-8-ashutosh.dixit@intel.com
2022-10-17drm/i915/hwmon: Expose power1_max_intervalAshutosh Dixit
Expose power1_max_interval, that is the tau corresponding to PL1, as a custom hwmon attribute. Some bit manipulation is needed because of the format of PKG_PWR_LIM_1_TIME in GT0_PACKAGE_RAPL_LIMIT register (1.x * power(2,y)). v2: Update date and kernel version in Documentation (Badal) v3: Cleaned up hwm_power1_max_interval_store() (Badal) v4: - Fixed review comments (Anshuman) - In hwm_power1_max_interval_store() get PKG_MAX_WIN from pkg_power_sku when it is valid (Ashutosh) - KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko) v5: On some of the DGFX setups it is seen that although pkg_power_sku is valid the field PKG_WIN_MAX is not populated. So it is decided to stick to default value of PKG_WIN_MAX (Ashutosh) v6: Change contact to intel-gfx (Rodrigo) Fixed variable types in hwm_power1_max_interval_store (Andi) Documented PKG_MAX_WIN_DEFAULT (Andi) Removed else in hwm_attributes_visible (Andi) Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-7-ashutosh.dixit@intel.com
2022-10-17drm/i915/hwmon: Expose card reactive critical powerAshutosh Dixit
Expose the card reactive critical (I1) power. I1 is exposed as power1_crit in microwatts (typically for client products) or as curr1_crit in milliamperes (typically for server). v2: Add curr1_crit functionality (Ashutosh) v3: Use HWMON_CHANNEL_INFO to define power1_crit, curr1_crit (Badal) v4: Use hwm_ prefix for static functions (Ashutosh) v5: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko) v6: Change contact to intel-gfx (Rodrigo) Cc: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-6-ashutosh.dixit@intel.com
2022-10-17drm/i915/hwmon: Show device level energy usageDale B Stimson
Use i915 HWMON to display device level energy input. v2: Updated the date and kernel version in feature description v3: - Cleaned up hwm_energy function and removed unused function i915_hwmon_energy_status_get (Ashutosh) v4: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko) v5: Change contact to intel-gfx (Rodrigo) Change return type of hwm_energy to void (Andi) Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-5-ashutosh.dixit@intel.com
2022-10-17drm/i915/hwmon: Power PL1 limit and TDP settingDale B Stimson
Use i915 HWMON to display/modify dGfx power PL1 limit and TDP setting. v2: - Fix review comments (Ashutosh) - Do not restore power1_max upon module unload/load sequence because on production systems modules are always loaded and not unloaded/reloaded (Ashutosh) - Fix review comments (Jani) - Remove endianness conversion (Ashutosh) v3: Add power1_rated_max (Ashutosh) v4: - Use macro HWMON_CHANNEL_INFO to define power channel (Guenter) - Update the date and kernel version in Documentation (Badal) v5: Use hwm_ prefix for static functions (Ashutosh) v6: Fix review comments (Ashutosh) v7: - Define PCU_PACKAGE_POWER_SKU for DG1,DG2 and move PKG_PKG_TDP to intel_mchbar_regs.h (Anshuman) - KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko) v8: Change contact to intel-gfx (Rodrigo) Minor change to val_sku_unit init (Andi) Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-4-ashutosh.dixit@intel.com
2022-10-17drm/i915/hwmon: Add HWMON current voltage supportRiana Tauro
Use i915 HWMON subsystem to display current input voltage. v2: - Updated date and kernel version in feature description - Fixed review comments (Ashutosh) v3: Use macro HWMON_CHANNEL_INFO to define hwmon channel (Guenter) v4: - Fixed review comments (Ashutosh) - Use hwm_ prefix for static functions (Ashutosh) v5: Added unit of voltage as millivolts (Ashutosh) v6: KernelVersion: 6.2, Date: February 2023 in doc (Tvrtko) v7: Change contact to intel-gfx (Rodrigo) GEN12_RPSTAT1 is available for all Gen12+ (Andi) Added Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon to MAINTAINERS Cc: Guenter Roeck <linux@roeck-us.net> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-3-ashutosh.dixit@intel.com
2022-10-17drm/i915/hwmon: Add HWMON infrastructureDale B Stimson
The i915 HWMON module will be used to expose voltage, power and energy values for dGfx. Here we set up i915 hwmon infrastructure including i915 hwmon registration, basic data structures and functions. v2: - Create HWMON infra patch (Ashutosh) - Fixed review comments (Jani) - Remove "select HWMON" from i915/Kconfig (Jani) v3: Use hwm_ prefix for static functions (Ashutosh) v4: s/#ifdef CONFIG_HWMON/#if IS_REACHABLE(CONFIG_HWMON)/ since the former doesn't work if hwmon is compiled as a module (Guenter) v5: Fixed review comments (Jani) v6: s/kzalloc/devm_kzalloc/ (Andi) v7: s/hwmon_device_register_with_info/ devm_hwmon_device_register_with_info/ (Ashutosh) Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Dale B Stimson <dale.b.stimson@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221013154526.2105579-2-ashutosh.dixit@intel.com