summaryrefslogtreecommitdiff
path: root/kernel/power
AgeCommit message (Collapse)Author
2025-06-19PM: sleep: Print PM debug messages during hibernationRafael J. Wysocki
[ Upstream commit 1b17d4525bca3916644c41e01522df8fa0f8b90b ] Commit cdb8c100d8a4 ("include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume") caused PM debug messages to only be printed during system-wide suspend and resume in progress, but it forgot about hibernation. Address this by adding a check for hibernation in progress to pm_debug_messages_should_print(). Fixes: cdb8c100d8a4 ("include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/4998903.GXAFRqVoOG@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks()Zijun Hu
[ Upstream commit f0050a3e214aa941b78ad4caf122a735a24d81a6 ] pm_show_wakelocks() is called to generate a string when showing attributes /sys/power/wake_(lock|unlock), but the string ends with an unwanted space that was added back by mistake by commit c9d967b2ce40 ("PM: wakeup: simplify the output logic of pm_show_wakelocks()"). Remove the unwanted space. Fixes: c9d967b2ce40 ("PM: wakeup: simplify the output logic of pm_show_wakelocks()") Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://patch.msgid.link/20250505-fix_power-v1-1-0f7f2c2f338c@quicinc.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19PM: EM: Fix potential division-by-zero error in em_compute_costs()Yaxiong Tian
[ Upstream commit 179c0c7044a378198adb36f2a12410ab68cc730a ] When the device is of a non-CPU type, table[i].performance won't be initialized in the previous em_init_performance(), resulting in division by zero when calculating costs in em_compute_costs(). Since the 'cost' algorithm is only used for EAS energy efficiency calculations and is currently not utilized by other device drivers, we should add the _is_cpu_device(dev) check to prevent this division-by-zero issue. Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division") Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/tencent_7F99ED4767C1AF7889D0D8AD50F34859CE06@qq.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02PM: EM: Address RCU-related sparse warningsRafael J. Wysocki
[ Upstream commit 3ee7be9e10dd5f79448788b899591d4bd2bf0c19 ] The usage of __rcu in the Energy Model code is quite inconsistent which causes the following sparse warnings to trigger: kernel/power/energy_model.c:169:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:169:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:169:15: got struct em_perf_table * kernel/power/energy_model.c:171:9: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:171:9: expected struct callback_head *head kernel/power/energy_model.c:171:9: got struct callback_head [noderef] __rcu * kernel/power/energy_model.c:171:9: warning: cast removes address space '__rcu' of expression kernel/power/energy_model.c:182:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:182:19: expected struct kref *kref kernel/power/energy_model.c:182:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:200:15: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:200:15: expected struct em_perf_table [noderef] __rcu *table kernel/power/energy_model.c:200:15: got void *[assigned] _res kernel/power/energy_model.c:204:20: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:204:20: expected struct kref *kref kernel/power/energy_model.c:204:20: got struct kref [noderef] __rcu * kernel/power/energy_model.c:320:19: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:320:19: expected struct kref *kref kernel/power/energy_model.c:320:19: got struct kref [noderef] __rcu * kernel/power/energy_model.c:325:45: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:325:45: expected struct em_perf_state *table kernel/power/energy_model.c:325:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:425:45: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:425:45: expected struct em_perf_state *table kernel/power/energy_model.c:425:45: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:442:15: warning: incorrect type in argument 1 (different address spaces) kernel/power/energy_model.c:442:15: expected void const *objp kernel/power/energy_model.c:442:15: got struct em_perf_table [noderef] __rcu *[assigned] em_table kernel/power/energy_model.c:626:55: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:626:55: expected struct em_perf_state *table kernel/power/energy_model.c:626:55: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:681:16: warning: incorrect type in assignment (different address spaces) kernel/power/energy_model.c:681:16: expected struct em_perf_state *new_ps kernel/power/energy_model.c:681:16: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:699:37: warning: incorrect type in argument 2 (different address spaces) kernel/power/energy_model.c:699:37: expected struct em_perf_state *table kernel/power/energy_model.c:699:37: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:733:38: warning: incorrect type in argument 3 (different address spaces) kernel/power/energy_model.c:733:38: expected struct em_perf_state *table kernel/power/energy_model.c:733:38: got struct em_perf_state [noderef] __rcu * kernel/power/energy_model.c:855:53: warning: dereference of noderef expression kernel/power/energy_model.c:864:32: warning: dereference of noderef expression This is because the __rcu annotation for sparse is only applicable to pointers that need rcu_dereference() or equivalent for protection, which basically means pointers assigned with rcu_assign_pointer(). Make all of the above sparse warnings go away by cleaning up the usage of __rcu and using rcu_dereference_protected() where applicable. Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/5885405.DvuYhMxLoT@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02PM: EM: use kfree_rcu() to simplify the codeLi RongQing
[ Upstream commit 1618f635bdf56f3ac158171114e9bf18db234cbf ] The callback function of call_rcu() just calls kfree(), so use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/20250218082021.2766-1-lirongqing@baidu.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Stable-dep-of: 3ee7be9e10dd ("PM: EM: Address RCU-related sparse warnings") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-20PM: hibernate: Avoid deadlock in hibernate_compressor_param_set()Lizhi Xu
[ Upstream commit 52323ed1444ea5c2a5f1754ea0a2d9c8c216ccdf ] syzbot reported a deadlock in lock_system_sleep() (see below). The write operation to "/sys/module/hibernate/parameters/compressor" conflicts with the registration of ieee80211 device, resulting in a deadlock when attempting to acquire system_transition_mutex under param_lock. To avoid this deadlock, change hibernate_compressor_param_set() to use mutex_trylock() for attempting to acquire system_transition_mutex and return -EBUSY when it fails. Task flags need not be saved or adjusted before calling mutex_trylock(&system_transition_mutex) because the caller is not going to end up waiting for this mutex and if it runs concurrently with system suspend in progress, it will be frozen properly when it returns to user space. syzbot report: syz-executor895/5833 is trying to acquire lock: ffffffff8e0828c8 (system_transition_mutex){+.+.}-{4:4}, at: lock_system_sleep+0x87/0xa0 kernel/power/main.c:56 but task is already holding lock: ffffffff8e07dc68 (param_lock){+.+.}-{4:4}, at: kernel_param_lock kernel/params.c:607 [inline] ffffffff8e07dc68 (param_lock){+.+.}-{4:4}, at: param_attr_store+0xe6/0x300 kernel/params.c:586 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (param_lock){+.+.}-{4:4}: __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730 ieee80211_rate_control_ops_get net/mac80211/rate.c:220 [inline] rate_control_alloc net/mac80211/rate.c:266 [inline] ieee80211_init_rate_ctrl_alg+0x18d/0x6b0 net/mac80211/rate.c:1015 ieee80211_register_hw+0x20cd/0x4060 net/mac80211/main.c:1531 mac80211_hwsim_new_radio+0x304e/0x54e0 drivers/net/wireless/virtual/mac80211_hwsim.c:5558 init_mac80211_hwsim+0x432/0x8c0 drivers/net/wireless/virtual/mac80211_hwsim.c:6910 do_one_initcall+0x128/0x700 init/main.c:1257 do_initcall_level init/main.c:1319 [inline] do_initcalls init/main.c:1335 [inline] do_basic_setup init/main.c:1354 [inline] kernel_init_freeable+0x5c7/0x900 init/main.c:1568 kernel_init+0x1c/0x2b0 init/main.c:1457 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 -> #2 (rtnl_mutex){+.+.}-{4:4}: __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730 wg_pm_notification drivers/net/wireguard/device.c:80 [inline] wg_pm_notification+0x49/0x180 drivers/net/wireguard/device.c:64 notifier_call_chain+0xb7/0x410 kernel/notifier.c:85 notifier_call_chain_robust kernel/notifier.c:120 [inline] blocking_notifier_call_chain_robust kernel/notifier.c:345 [inline] blocking_notifier_call_chain_robust+0xc9/0x170 kernel/notifier.c:333 pm_notifier_call_chain_robust+0x27/0x60 kernel/power/main.c:102 snapshot_open+0x189/0x2b0 kernel/power/user.c:77 misc_open+0x35a/0x420 drivers/char/misc.c:179 chrdev_open+0x237/0x6a0 fs/char_dev.c:414 do_dentry_open+0x735/0x1c40 fs/open.c:956 vfs_open+0x82/0x3f0 fs/open.c:1086 do_open fs/namei.c:3830 [inline] path_openat+0x1e88/0x2d80 fs/namei.c:3989 do_filp_open+0x20c/0x470 fs/namei.c:4016 do_sys_openat2+0x17a/0x1e0 fs/open.c:1428 do_sys_open fs/open.c:1443 [inline] __do_sys_openat fs/open.c:1459 [inline] __se_sys_openat fs/open.c:1454 [inline] __x64_sys_openat+0x175/0x210 fs/open.c:1454 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #1 ((pm_chain_head).rwsem){++++}-{4:4}: down_read+0x9a/0x330 kernel/locking/rwsem.c:1524 blocking_notifier_call_chain_robust kernel/notifier.c:344 [inline] blocking_notifier_call_chain_robust+0xa9/0x170 kernel/notifier.c:333 pm_notifier_call_chain_robust+0x27/0x60 kernel/power/main.c:102 snapshot_open+0x189/0x2b0 kernel/power/user.c:77 misc_open+0x35a/0x420 drivers/char/misc.c:179 chrdev_open+0x237/0x6a0 fs/char_dev.c:414 do_dentry_open+0x735/0x1c40 fs/open.c:956 vfs_open+0x82/0x3f0 fs/open.c:1086 do_open fs/namei.c:3830 [inline] path_openat+0x1e88/0x2d80 fs/namei.c:3989 do_filp_open+0x20c/0x470 fs/namei.c:4016 do_sys_openat2+0x17a/0x1e0 fs/open.c:1428 do_sys_open fs/open.c:1443 [inline] __do_sys_openat fs/open.c:1459 [inline] __se_sys_openat fs/open.c:1454 [inline] __x64_sys_openat+0x175/0x210 fs/open.c:1454 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (system_transition_mutex){+.+.}-{4:4}: check_prev_add kernel/locking/lockdep.c:3163 [inline] check_prevs_add kernel/locking/lockdep.c:3282 [inline] validate_chain kernel/locking/lockdep.c:3906 [inline] __lock_acquire+0x249e/0x3c40 kernel/locking/lockdep.c:5228 lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5851 __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0x19b/0xb10 kernel/locking/mutex.c:730 lock_system_sleep+0x87/0xa0 kernel/power/main.c:56 hibernate_compressor_param_set+0x1c/0x210 kernel/power/hibernate.c:1452 param_attr_store+0x18f/0x300 kernel/params.c:588 module_attr_store+0x55/0x80 kernel/params.c:924 sysfs_kf_write+0x117/0x170 fs/sysfs/file.c:139 kernfs_fop_write_iter+0x33d/0x500 fs/kernfs/file.c:334 new_sync_write fs/read_write.c:586 [inline] vfs_write+0x5ae/0x1150 fs/read_write.c:679 ksys_write+0x12b/0x250 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Chain exists of: system_transition_mutex --> rtnl_mutex --> param_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(param_lock); lock(rtnl_mutex); lock(param_lock); lock(system_transition_mutex); *** DEADLOCK *** Reported-by: syzbot+ace60642828c074eb913@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ace60642828c074eb913 Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Link: https://patch.msgid.link/20250224013139.3994500-1-lizhi.xu@windriver.com [ rjw: New subject matching the code changes, changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08PM: hibernate: Add error handling for syscore_suspend()Wentao Liang
[ Upstream commit e20a70c572539a486dbd91b225fa6a194a5e2122 ] In hibernation_platform_enter(), the code did not check the return value of syscore_suspend(), potentially leading to a situation where syscore_resume() would be called even if syscore_suspend() failed. This could cause unpredictable behavior or system instability. Modify the code sequence in question to properly handle errors returned by syscore_suspend(). If an error occurs in the suspend path, the code now jumps to label 'Enable_irqs' skipping the syscore_resume() call and only enabling interrupts after setting the system state to SYSTEM_RUNNING. Fixes: 40dc166cb5dd ("PM / Core: Introduce struct syscore_ops for core subsystems PM") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Link: https://patch.msgid.link/20250119143205.2103-1-vulab@iscas.ac.cn [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-27[tree-wide] finally take no_llseek outAl Viro
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-10PM: hibernate: Remove unused stub for saveable_highmem_page()Andy Shevchenko
When saveable_highmem_page() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y: kernel/power/snapshot.c:1369:21: error: unused function 'saveable_highmem_page' [-Werror,-Wunused-function] 1369 | static inline void *saveable_highmem_page(struct zone *z, unsigned long p) | ^~~~~~~~~~~~~~~~~~~~~ Fix this by removing unused stub. See also commit 6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build"). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20240905184848.318978-1-andriy.shevchenko@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02PM: sleep: Use sysfs_emit() and sysfs_emit_at() in "show" functionsXueqin Luo
As Documentation/filesystems/sysfs.rst suggested, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. No functional change intended. Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn> Link: https://patch.msgid.link/20240801083156.2513508-3-luoxueqin@kylinos.cn [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-02PM: hibernate: Use sysfs_emit() and sysfs_emit_at() in "show" functionsXueqin Luo
As Documentation/filesystems/sysfs.rst suggested, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. No functional change intended. Signed-off-by: Xueqin Luo <luoxueqin@kylinos.cn> Link: https://patch.msgid.link/20240801083156.2513508-2-luoxueqin@kylinos.cn Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-07-03mm: remove the implementation of swap_free() and always use swap_free_nr()Barry Song
To streamline maintenance efforts, we propose removing the implementation of swap_free(). Instead, we can simply invoke swap_free_nr() with nr set to 1. swap_free_nr() is designed with a bitmap consisting of only one long, resulting in overhead that can be ignored for cases where nr equals 1. A prime candidate for leveraging swap_free_nr() lies within kernel/power/swap.c. Implementing this change facilitates the adoption of batch processing for hibernation. Link: https://lkml.kernel.org/r/20240529082824.150954-3-21cnbao@gmail.com Signed-off-by: Barry Song <v-songbaohua@oppo.com> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chuanhua Han <hanchuanhua@oppo.com> Cc: David Hildenbrand <david@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Gao Xiang <xiang@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-27Merge tag 'vfs-6.10-rc2.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix io_uring based write-through after converting cifs to use the netfs library - Fix aio error handling when doing write-through via netfs library - Fix performance regression in iomap when used with non-large folio mappings - Fix signalfd error code - Remove obsolete comment in signalfd code - Fix async request indication in netfs_perform_write() by raising BDP_ASYNC when IOCB_NOWAIT is set - Yield swap device immediately to prevent spurious EBUSY errors - Don't cross a .backup mountpoint from backup volumes in afs to avoid infinite loops - Fix a race between umount and async request completion in 9p after 9p was converted to use the netfs library * tag 'vfs-6.10-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs, 9p: Fix race between umount and async request completion afs: Don't cross .backup mountpoint from backup volume swap: yield device immediately netfs: Fix setting of BDP_ASYNC from iocb flags signalfd: drop an obsolete comment signalfd: fix error return code iomap: fault in smaller chunks for non-large folio mappings filemap: add helper mapping_max_folio_size() netfs: Fix AIO error handling when doing write-through netfs: Fix io_uring based write-through
2024-05-24swap: yield device immediatelyChristian Brauner
Otherwise we can cause spurious EBUSY issues when trying to mount the rootfs later on. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218845 Reported-by: Petri Kaukasoina <petri.kaukasoina@tuni.fi> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-21Merge tag 'pull-set_blocksize' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs blocksize updates from Al Viro: "This gets rid of bogus set_blocksize() uses, switches it over to be based on a 'struct file *' and verifies that the caller has the device opened exclusively" * tag 'pull-set_blocksize' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make set_blocksize() fail unless block device is opened exclusive set_blocksize(): switch to passing struct file * btrfs_get_bdev_and_sb(): call set_blocksize() only for exclusive opens swsusp: don't bother with setting block size zram: don't bother with reopening - just use O_EXCL for open swapon(2): open swap with O_EXCL swapon(2)/swapoff(2): don't bother with block size pktcdvd: sort set_blocksize() calls out bcache_register(): don't bother with set_blocksize()
2024-05-15Merge tag 'cgroup-for-6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - The locking around cpuset hotplug processing has always been a bit of mess which was worked around by making hotplug processing asynchronous. The asynchronity isn't great and led to other issues. We tried to make the behavior synchronous a while ago but that led to lockdep splats. Waiman took another stab at cleaning up and making it synchronous. The patch has been in -next for well over a month and there haven't been any complaints, so fingers crossed. - Tracepoints added to help understanding rstat lock contentions. - A bunch of minor changes - doc updates, code cleanups and selftests. * tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits) cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints selftests/cgroup: Drop define _GNU_SOURCE docs: cgroup-v1: Update page cache removal functions selftests/cgroup: fix uninitialized variables in test_zswap.c selftests/cgroup: cpu_hogger init: use {} instead of {NULL} selftests/cgroup: fix clang warnings: uninitialized fd variable selftests/cgroup: fix clang build failures for abs() calls cgroup/cpuset: Remove outdated comment in sched_partition_write() cgroup/cpuset: Fix incorrect top_cpuset flags cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice cgroup/cpuset: Statically initialize more members of top_cpuset cgroup: Avoid unnecessary looping in cgroup_no_v1() cgroup, legacy_freezer: update comment for freezer_css_offline() docs, cgroup: add entries for pids to cgroup-v2.rst cgroup: don't call cgroup1_pidlist_destroy_all() for v2 cgroup_freezer: update comment for freezer_css_online() cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints cgroup/pids: Remove superfluous zeroing docs: cgroup-v1: Fix description for css_online ...
2024-05-13Merge branches 'pm-em' and 'pm-docs'Rafael J. Wysocki
Merge Enery Model update and a power management documentation update for 6.10: - Make the Samsung exynos-asv driver update the Energy Model after adjusting voltage on top of some preliminary changes of the OPP and Enery Model generic code (Lukasz Luba). - Remove a reference to a function that has been dropped from the power management documentation (Bjorn Helgaas). * pm-em: soc: samsung: exynos-asv: Update Energy Model after adjusting voltage PM: EM: Add em_dev_update_chip_binning() PM: EM: Refactor em_adjust_new_capacity() OPP: OF: Export dev_opp_pm_calc_power() for usage from EM * pm-docs: Documentation: PM: Update platform_pci_wakeup_init() reference
2024-05-02swsusp: don't bother with setting block sizeAl Viro
same as with the swap... Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-30PM: hibernate: replace deprecated strncpy() with strscpy()Justin Stitt
strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. This kernel config option is simply assigned with the resume_file buffer. It should be NUL-terminated but not necessarily NUL-padded as per its further usage with other string apis: | static int __init find_resume_device(void) | { | if (!strlen(resume_file)) | return -ENOENT; | | pm_pr_dbg("Checking hibernation image partition %s\n", resume_file); Use strscpy() [2] as it guarantees NUL-termination on the destination buffer. Specifically, use the new 2-argument version of strscpy() introduced in Commit e6584c3964f2f ("string: Allow 2-argument strscpy()"). Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Dhruva Gole <d-gole@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-08cgroup/cpuset: Make cpuset hotplug processing synchronousWaiman Long
Since commit 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()"), cpuset hotplug was done asynchronously via a work function. This is to avoid recursive locking of cgroup_mutex. Since then, the cgroup locking scheme has changed quite a bit. A cpuset_mutex was introduced to protect cpuset specific operations. The cpuset_mutex is then replaced by a cpuset_rwsem. With commit d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock order"), cpu_hotplug_lock is acquired before cpuset_rwsem. Later on, cpuset_rwsem is reverted back to cpuset_mutex. All these locking changes allow the hotplug code to call into cpuset core directly. The following commits were also merged due to the asynchronous nature of cpuset hotplug processing. - commit b22afcdf04c9 ("cpu/hotplug: Cure the cpusets trainwreck") - commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs") - commit 28b89b9e6f7b ("cpuset: handle race between CPU hotplug and cpuset_hotplug_work") Clean up all these bandages by making cpuset hotplug processing synchronous again with the exception that the call to cgroup_transfer_tasks() to transfer tasks out of an empty cgroup v1 cpuset, if necessary, will still be done via a work function due to the existing cgroup_mutex -> cpu_hotplug_lock dependency. It is possible to reverse that dependency, but that will require updating a number of different cgroup controllers. This special hotplug code path should be rarely taken anyway. As all the cpuset states will be updated by the end of the hotplug operation, we can revert most the above commits except commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs") which is partially reverted. Also removing some cpus_read_lock trylock attempts in the cpuset partition code as they are no longer necessary since the cpu_hotplug_lock is now held for the whole duration of the cpuset hotplug code path. Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-04-08PM: EM: Add em_dev_update_chip_binning()Lukasz Luba
Add a function which allows to modify easily the EM after the new voltage information is available. The device drivers for the chip can adjust the voltage values after setup. The voltage for the same frequency in OPP can be different due to chip binning. The voltage impacts the power usage and the EM power values can be updated to reflect that. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-08PM: EM: Refactor em_adjust_new_capacity()Lukasz Luba
Extract em_table_dup() and em_recalc_and_update() from em_adjust_new_capacity(). Both functions will be later reused by the 'update EM due to chip binning' functionality. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-04-08PM: s2idle: Make sure CPUs will wakeup directly on resumeAnna-Maria Behnsen
s2idle works like a regular suspend with freezing processes and freezing devices. All CPUs except the control CPU go into idle. Once this is completed the control CPU kicks all other CPUs out of idle, so that they reenter the idle loop and then enter s2idle state. The control CPU then issues an swait() on the suspend state and therefore enters the idle loop as well. Due to being kicked out of idle, the other CPUs leave their NOHZ states, which means the tick is active and the corresponding hrtimer is programmed to the next jiffie. On entering s2idle the CPUs shut down their local clockevent device to prevent wakeups. The last CPU which enters s2idle shuts down its local clockevent and freezes timekeeping. On resume, one of the CPUs receives the wakeup interrupt, unfreezes timekeeping and its local clockevent and starts the resume process. At that point all other CPUs are still in s2idle with their clockevents switched off. They only resume when they are kicked by another CPU or after resuming devices and then receiving a device interrupt. That means there is no guarantee that all CPUs will wakeup directly on resume. As a consequence there is no guarantee that timers which are queued on those CPUs and should expire directly after resume, are handled. Also timer list timers which are remotely queued to one of those CPUs after resume will not result in a reprogramming IPI as the tick is active. Queueing a hrtimer will also not result in a reprogramming IPI because the first hrtimer event is already in the past. The recent introduction of the timer pull model (7ee988770326 ("timers: Implement the hierarchical pull model")) amplifies this problem, if the current migrator is one of the non woken up CPUs. When a non pinned timer list timer is queued and the queuing CPU goes idle, it relies on the still suspended migrator CPU to expire the timer which will happen by chance. The problem exists since commit 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path"). There the cpuidle_pause() call which in turn invoked a wakeup for all idle CPUs was moved to a later point in the resume process. This might not be reached or reached very late because it waits on a timer of a still suspended CPU. Address this by kicking all CPUs out of idle after the control CPU returns from swait() so that they resume their timers and restore consistent system state. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218641 Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path") Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Cc: 5.16+ <stable@kernel.org> # 5.16+ Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-21Merge tag 'rtc-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "Subsytem: - rtc_class is now const Drivers: - ds1511: cleanup, set date and time range and alarm offset limit - max31335: fix interrupt handler - pcf8523: improve suspend support" * tag 'rtc-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (28 commits) MAINTAINER: Include linux-arm-msm for Qualcomm RTC patches dt-bindings: rtc: zynqmp: Add support for Versal/Versal NET SoCs rtc: class: make rtc_class constant dt-bindings: rtc: abx80x: Improve checks on trickle charger constraints MAINTAINERS: adjust file entry in ARM/Mediatek RTC DRIVER rtc: nct3018y: fix possible NULL dereference rtc: max31335: fix interrupt status reg rtc: mt6397: select IRQ_DOMAIN instead of depending on it dt-bindings: rtc: abx80x: convert to yaml rtc: m41t80: Use the unified property API get the wakeup-source property dt-bindings: at91rm9260-rtt: add sam9x7 compatible dt-bindings: rtc: convert MT7622 RTC to the json-schema dt-bindings: rtc: convert MT2717 RTC to the json-schema rtc: pcf8523: add suspend handlers for alarm IRQ rtc: ds1511: set alarm offset limit rtc: ds1511: set range rtc: ds1511: drop inline/noinline hints rtc: ds1511: rename pdata rtc: ds1511: implement ds1511_rtc_read_alarm properly rtc: ds1511: remove partial alarm support ...
2024-03-13PM: EM: Force device drivers to provide power in uWLukasz Luba
The EM only supports power in uW. Make sure that it is not possible to register some downstream driver which doesn't provide power in uW. The only exception is artificial EM, but that EM is ignored by the rest of kernel frameworks (thermal, powercap, etc). Reported-by: PoShao Chen <poshao.chen@mediatek.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-13Merge tag 'pm-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "From the functional perspective, the most significant change here is the addition of support for Energy Models that can be updated dynamically at run time. There is also the addition of LZ4 compression support for hibernation, the new preferred core support in amd-pstate, new platforms support in the Intel RAPL driver, new model-specific EPP handling in intel_pstate and more. Apart from that, the cpufreq default transition delay is reduced from 10 ms to 2 ms (along with some related adjustments), the system suspend statistics code undergoes a significant rework and there is a usual bunch of fixes and code cleanups all over. Specifics: - Allow the Energy Model to be updated dynamically (Lukasz Luba) - Add support for LZ4 compression algorithm to the hibernation image creation and loading code (Nikhil V) - Fix and clean up system suspend statistics collection (Rafael Wysocki) - Simplify device suspend and resume handling in the power management core code (Rafael Wysocki) - Fix PCI hibernation support description (Yiwei Lin) - Make hibernation take set_memory_ro() return values into account as appropriate (Christophe Leroy) - Set mem_sleep_current during kernel command line setup to avoid an ordering issue with handling it (Maulik Shah) - Fix wake IRQs handling when pm_runtime_force_suspend() is used as a driver's system suspend callback (Qingliang Li) - Simplify pm_runtime_get_if_active() usage and add a replacement for pm_runtime_put_autosuspend() (Sakari Ailus) - Add a tracepoint for runtime_status changes tracking (Vilas Bhat) - Fix section title markdown in the runtime PM documentation (Yiwei Lin) - Enable preferred core support in the amd-pstate cpufreq driver (Meng Li) - Fix min_perf assignment in amd_pstate_adjust_perf() and make the min/max limit perf values in amd-pstate always stay within the (highest perf, lowest perf) range (Tor Vic, Meng Li) - Allow intel_pstate to assign model-specific values to strings used in the EPP sysfs interface and make it do so on Meteor Lake (Srinivas Pandruvada) - Drop long-unused cpudata::prev_cummulative_iowait from the intel_pstate cpufreq driver (Jiri Slaby) - Prevent scaling_cur_freq from exceeding scaling_max_freq when the latter is an inefficient frequency (Shivnandan Kumar) - Change default transition delay in cpufreq to 2ms (Qais Yousef) - Remove references to 10ms minimum sampling rate from comments in the cpufreq code (Pierre Gondois) - Honour transition_latency over transition_delay_us in cpufreq (Qais Yousef) - Stop unregistering cpufreq cooling on CPU hot-remove (Viresh Kumar) - General enhancements / cleanups to ARM cpufreq drivers (tianyu2, Nícolas F. R. A. Prado, Erick Archer, Arnd Bergmann, Anastasia Belova) - Update cpufreq-dt-platdev to block/approve devices (Richard Acayan) - Make the SCMI cpufreq driver get a transition delay value from firmware (Pierre Gondois) - Prevent the haltpoll cpuidle governor from shrinking guest poll_limit_ns below grow_start (Parshuram Sangle) - Avoid potential overflow in integer multiplication when computing cpuidle state parameters (C Cheng) - Adjust MWAIT hint target C-state computation in the ACPI cpuidle driver and in intel_idle to return a correct value for C0 (He Rongguang) - Address multiple issues in the TPMI RAPL driver and add support for new platforms (Lunar Lake-M, Arrow Lake) to Intel RAPL (Zhang Rui) - Fix freq_qos_add_request() return value check in dtpm_cpu (Daniel Lezcano) - Fix kernel-doc for dtpm_create_hierarchy() (Yang Li) - Fix file leak in get_pkg_num() in x86_energy_perf_policy (Samasth Norway Ananda) - Fix cpupower-frequency-info.1 man page typo (Jan Kratochvil) - Fix a couple of warnings in the OPP core code related to W=1 builds (Viresh Kumar) - Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h (Viresh Kumar) - Extend dev_pm_opp_data with turbo support (Sibi Sankar) - dt-bindings: drop maxItems from inner items (David Heidelberg)" * tag 'pm-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (95 commits) dt-bindings: opp: drop maxItems from inner items OPP: debugfs: Fix warning around icc_get_name() OPP: debugfs: Fix warning with W=1 builds cpufreq: Move dev_pm_opp_{init|free}_cpufreq_table() to pm_opp.h OPP: Extend dev_pm_opp_data with turbo support Fix cpupower-frequency-info.1 man page typo cpufreq: scmi: Set transition_delay_us firmware: arm_scmi: Populate fast channel rate_limit firmware: arm_scmi: Populate perf commands rate_limit cpuidle: ACPI/intel: fix MWAIT hint target C-state computation PM: sleep: wakeirq: fix wake irq warning in system suspend powercap: dtpm: Fix kernel-doc for dtpm_create_hierarchy() function cpufreq: Don't unregister cpufreq cooling on CPU hotplug PM: suspend: Set mem_sleep_current during kernel command line setup cpufreq: Honour transition_latency over transition_delay_us cpufreq: Limit resolving a frequency to policy min/max Documentation: PM: Fix runtime_pm.rst markdown syntax cpufreq: amd-pstate: adjust min/max limit perf cpufreq: Remove references to 10ms min sampling rate cpufreq: intel_pstate: Update default EPPs for Meteor Lake ...
2024-03-11Merge branch 'pm-em'Rafael J. Wysocki
Merge Enery Model changes for 6.9-rc1: - Allow the Energy Model to be updated dynamically (Lukasz Luba). * pm-em: (24 commits) PM: EM: Fix nr_states warnings in static checks Documentation: EM: Update with runtime modification design PM: EM: Add em_dev_compute_costs() PM: EM: Remove old table PM: EM: Change debugfs configuration to use runtime EM table data drivers/thermal/devfreq_cooling: Use new Energy Model interface drivers/thermal/cpufreq_cooling: Use new Energy Model interface powercap/dtpm_devfreq: Use new Energy Model interface to get table powercap/dtpm_cpu: Use new Energy Model interface to get table PM: EM: Optimize em_cpu_energy() and remove division PM: EM: Support late CPUs booting and capacity adjustment PM: EM: Add performance field to struct em_perf_state and optimize PM: EM: Add em_perf_state_from_pd() to get performance states table PM: EM: Introduce em_dev_update_perf_domain() for EM updates PM: EM: Add functions for memory allocations for new EM tables PM: EM: Use runtime modified EM for CPUs energy estimation in EAS PM: EM: Introduce runtime modifiable table PM: EM: Split the allocation and initialization of the EM table PM: EM: Check if the get_cost() callback is present in em_compute_costs() PM: EM: Introduce em_compute_costs() ...
2024-03-08rtc: class: make rtc_class constantRicardo B. Marliere
Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the rtc_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Link: https://lore.kernel.org/r/20240305-class_cleanup-abelloni-v1-1-944c026137c8@marliere.net Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-02-29PM: suspend: Set mem_sleep_current during kernel command line setupMaulik Shah
psci_init_system_suspend() invokes suspend_set_ops() very early during bootup even before kernel command line for mem_sleep_default is setup. This leads to kernel command line mem_sleep_default=s2idle not working as mem_sleep_current gets changed to deep via suspend_set_ops() and never changes back to s2idle. Set mem_sleep_current along with mem_sleep_default during kernel command line setup as default suspend mode. Fixes: faf7ec4a92c0 ("drivers: firmware: psci: add system suspend support") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-25power: port block device access to fileChristian Brauner
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-6-adbd023e19cc@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-22PM: EM: Fix nr_states warnings in static checksLukasz Luba
During the static checks nr_states has been mentioned by the kernel test robot. Fix the warning in those 2 places. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-22PM: hibernate: Don't ignore return from set_memory_ro()Christophe Leroy
set_memory_ro() and set_memory_rw() can fail, leaving memory unprotected. Take the returned value into account and abort in case of failure. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-22PM: hibernate: Support to select compression algorithmNikhil V
Currently the default compression algorithm is selected based on compile time options. Introduce a module parameter "hibernate.compressor" to override this behaviour. Different compression algorithms have different characteristics and hibernation may benefit when it uses any of these algorithms, especially when a secondary algorithm(LZ4) offers better decompression speeds over a default algorithm(LZO), which in turn reduces hibernation image restore time. Users can override the default algorithm in two ways: 1) Passing "hibernate.compressor" as kernel command line parameter. Usage: LZO: hibernate.compressor=lzo LZ4: hibernate.compressor=lz4 2) Specifying the algorithm at runtime. Usage: LZO: echo lzo > /sys/module/hibernate/parameters/compressor LZ4: echo lz4 > /sys/module/hibernate/parameters/compressor Currently LZO and LZ4 are the supported algorithms. LZO is the default compression algorithm used with hibernation. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add em_dev_compute_costs()Lukasz Luba
The device drivers can modify EM at runtime by providing a new EM table. The EM is used by the EAS and the em_perf_state::cost stores pre-calculated value to avoid overhead. This patch provides the API for device drivers to calculate the cost values properly (and not duplicate the same code). Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Remove old tableLukasz Luba
Remove the old EM table which wasn't able to modify the data. Clean the unneeded function and refactor the code a bit. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Change debugfs configuration to use runtime EM table dataLukasz Luba
Dump the runtime EM table values which can be modified in time. In order to do that allocate chunk of debug memory which can be later freed automatically thanks to devm_kcalloc(). This design can handle the fact that the EM table memory can change after EM update, so debug code cannot use the pointer from initialization phase. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Optimize em_cpu_energy() and remove divisionLukasz Luba
The Energy Model (EM) can be modified at runtime which brings new possibilities. The em_cpu_energy() is called by the Energy Aware Scheduler (EAS) in its hot path. The energy calculation uses power value for a given performance state (ps) and the CPU busy time as percentage for that given frequency. It is possible to avoid the division by 'scale_cpu' at runtime, because EM is updated whenever new max capacity CPU is set in the system. Use that feature and do the needed division during the calculation of the coefficient 'ps->cost'. That enhanced 'ps->cost' value can be then just multiplied simply by utilization: pd_nrg = ps->cost * \Sum cpu_util to get the needed energy for whole Performance Domain (PD). With this optimization and earlier removal of map_util_freq(), the em_cpu_energy() should run faster on the Big CPU by 1.43x and on the Little CPU by 1.69x (RockPi 4B board). Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Support late CPUs booting and capacity adjustmentLukasz Luba
The patch adds needed infrastructure to handle the late CPUs boot, which might change the previous CPUs capacity values. With this changes the new CPUs which try to register EM will trigger the needed re-calculations for other CPUs EMs. Thanks to that the em_per_state::performance values will be aligned with the CPU capacity information after all CPUs finish the boot and EM registrations. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add performance field to struct em_perf_state and optimizeLukasz Luba
The performance doesn't scale linearly with the frequency. Also, it may be different in different workloads. Some CPUs are designed to be particularly good at some applications e.g. images or video processing and other CPUs in different. When those different types of CPUs are combined in one SoC they should be properly modeled to get max of the HW in Energy Aware Scheduler (EAS). The Energy Model (EM) provides the power vs. performance curves to the EAS, but assumes the CPUs capacity is fixed and scales linearly with the frequency. This patch allows to adjust the curve on the 'performance' axis as well. Code speed optimization: Removing map_util_freq() allows to avoid one division and one multiplication operations from the EAS hot code path. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Introduce em_dev_update_perf_domain() for EM updatesLukasz Luba
Add API function em_dev_update_perf_domain() which allows the EM to be changed safely. Concurrent updaters are serialized with a mutex and the removal of memory that will not be used any more is carried out with the help of RCU. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add functions for memory allocations for new EM tablesLukasz Luba
The runtime modified EM table can be provided from drivers. Create mechanism which allows safely allocate and free the table for device drivers. The same table can be used by the EAS in task scheduler code paths, so make sure the memory is not freed when the device driver module is unloaded. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Introduce runtime modifiable tableLukasz Luba
The new runtime table can be populated with a new power data to better reflect the actual efficiency of the device e.g. CPU. The power can vary over time e.g. due to the SoC temperature change. Higher temperature can increase power values. For longer running scenarios, such as game or camera, when also other devices are used (e.g. GPU, ISP) the CPU power can change. The new EM framework is able to addresses this issue and change the EM data at runtime safely. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Split the allocation and initialization of the EM tableLukasz Luba
Split the process of allocation and data initialization for the EM table. The upcoming changes for modifiable EM will use it. This change is not expected to alter the general functionality. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Check if the get_cost() callback is present in em_compute_costs()Lukasz Luba
Subsequent changes will introduce a case in which 'cb->get_cost' may not be set in em_compute_costs(), so add a check to ensure that it is not NULL before attempting to dereference it. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Introduce em_compute_costs()Lukasz Luba
Move the EM costs computation code into a new dedicated function, em_compute_costs(), that can be reused in other places in the future. This change is not expected to alter the general functionality. Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Find first CPU active while updating OPP efficiencyLukasz Luba
The Energy Model might be updated at runtime and the energy efficiency for each OPP may change. Thus, there is a need to update also the cpufreq framework and make it aligned to the new values. In order to do that, use a first active CPU from the Performance Domain. This is needed since the first CPU in the cpumask might be offline when we run this code path. Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Extend em_cpufreq_update_efficiencies() argument listLukasz Luba
In order to prepare the code for the modifiable EM perf_state table, make em_cpufreq_update_efficiencies() take a pointer to the EM table as its second argument and modify it to use that new argument instead of the 'table' member of dev->em_pd. No functional impact. Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-08PM: EM: Add missing newline for the message logLukasz Luba
Fix missing newline for the string long in the error code path. Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-05PM: hibernate: Add support for LZ4 compression for hibernationNikhil V
Extend the support for LZ4 compression to be used with hibernation. The main idea is that different compression algorithms have different characteristics and hibernation may benefit when it uses any of these algorithms: a default algorithm, having higher compression rate but is slower(compression/decompression) and a secondary algorithm, that is faster(compression/decompression) but has lower compression rate. LZ4 algorithm has better decompression speeds over LZO. This reduces the hibernation image restore time. As per test results: LZO LZ4 Size before Compression(bytes) 682696704 682393600 Size after Compression(bytes) 146502402 155993547 Decompression Rate 335.02 MB/s 501.05 MB/s Restore time 4.4s 3.8s LZO is the default compression algorithm used for hibernation. Enable CONFIG_HIBERNATION_COMP_LZ4 to set the default compressor as LZ4. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-02-05PM: hibernate: Move to crypto APIs for LZO compressionNikhil V
Currently for hibernation, LZO is the only compression algorithm available and uses the existing LZO library calls. However, there is no flexibility to switch to other algorithms which provides better results. The main idea is that different compression algorithms have different characteristics and hibernation may benefit when it uses alternate algorithms. By moving to crypto based APIs, it lays a foundation to use other compression algorithms for hibernation. There are no functional changes introduced by this approach. Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>