summaryrefslogtreecommitdiff
path: root/drivers/cpufreq/intel_pstate.c
AgeCommit message (Collapse)Author
2015-06-16intel_pstate: Fix overflow in busy_scaled due to long delayPrarit Bhargava
The kernel may delay interrupts for a long time which can result in timers being delayed. If this occurs the intel_pstate driver will crash with a divide by zero error: divide error: 0000 [#1] SMP Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1 Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014 task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000 RIP: 0010:[<ffffffff814a9279>] [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0 RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206 RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001 R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5 R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071 ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100 Call Trace: <IRQ> [<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840 [<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130 [<ffffffff8107df56>] call_timer_fn+0x36/0x110 [<ffffffff814a9100>] ? pid_param_set+0x130/0x130 [<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320 [<ffffffff81077b2f>] __do_softirq+0xef/0x280 [<ffffffff816156dc>] call_softirq+0x1c/0x30 [<ffffffff81015d95>] do_softirq+0x65/0xa0 [<ffffffff81077ec5>] irq_exit+0x115/0x120 [<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80 <EOI> [<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0 [<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0 [<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200 [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30 [<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290 [<ffffffff8104228a>] start_secondary+0x1ba/0x230 Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29 RIP [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0 RSP <ffff883fff4e3db8> The kernel values for cpudata for CPU 113 were: struct cpudata { cpu = 113, timer = { entry = { next = 0x0, prev = 0xdead000000200200 }, expires = 8357799745, base = 0xffff883fe84ec001, function = 0xffffffff814a9100 <intel_pstate_timer_func>, data = 18446612406765768960, <snip> i_gain = 0, d_gain = 0, deadband = 0, last_err = 22489 }, last_sample_time = { tv64 = 4063132438017305 }, prev_aperf = 287326796397463, prev_mperf = 251427432090198, sample = { core_pct_busy = 23081, aperf = 2937407, mperf = 3257884, freq = 2524484, time = { tv64 = 4063149215234118 } } } which results in the time between samples = last_sample_time - sample.time = 4063149215234118 - 4063132438017305 = 16777216813 which is 16.777 seconds. The duration between reads of the APERF and MPERF registers overflowed a s32 sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The result is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0. While the kernel shouldn't be delaying for a long time, it can and does happen and the intel_pstate driver should not panic in this situation. This patch changes the div_fp() function to use div64_s64() to allow for "long" division. This will avoid the overflow condition on long delays. [v2]: use div64_s64() in div_fp() Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-10intel_pstate: Force setting target pstate when requiredDoug Smythies
During initialization and exit it is possible that the target pstate might not actually be set. Furthermore, the result can be that the driver and the processor are out of synch and, under some conditions, the driver might never send the processor the proper target pstate. This patch adds a bypass or do_checks flag to the call to intel_pstate_set_pstate. If bypass, then specifically bypass clamp checks and the do not send if it is the same as last time check. If do_checks, then, and as before, do the current policy clamp checks, and do not do actual send if the new target is the same as the old. Signed-off-by: Doug Smythies <dsmythies@telus.net> Reported-by: Marien Zwart <marien.zwart@gmail.com> Reported-by: Alex Lochmann <alexander.lochmann@tu-dortmund.de> Reported-by: Piotr Ko?aczkowski <pkolaczk@gmail.com> Reported-by: Clemens Eisserer <linuxhippy@gmail.com> Tested-by: Marien Zwart <marien.zwart@gmail.com> Tested-by: Doug Smythies <dsmythies@telus.net> [ rjw: Dropped pointless symbol definitions, rebased ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-10intel_pstate: change some inconsistent debug informationDoug Smythies
Commit ce717613f3fb (intel_pstate: Turn per cpu printk into pr_debug) turned per cpu printk into pr_debug. However, only half of the change was done, introducing an inconsistency between entry and exit from driver pstate control. This patch changes the exit message to pr_debug also. The various messages are inconsistent with respect to any identifier text that can be used to help isolate the desired information from a huge log. This patch makes a consistent identifier portion of the string. Amends: ce717613f3fb (intel_pstate: Turn per cpu printk into pr_debug) Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-03x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h>Stephen Rothwell
Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so remove it from there and fix up the resulting build problems triggered on x86 {64|32}-bit {def|allmod|allno}configs. The breakages were triggering in places where x86 builds relied on vmalloc() facilities but did not include <linux/vmalloc.h> explicitly and relied on the implicit inclusion via <asm/io.h>. Also add: - <linux/init.h> to <linux/io.h> - <asm/pgtable_types> to <asm/io.h> ... which were two other implicit header file dependencies. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [ Tidied up the changelog. ] Acked-by: David Miller <davem@davemloft.net> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vinod Koul <vinod.koul@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Colin Cross <ccross@android.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: James E.J. Bottomley <JBottomley@odin.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Suma Ramars <sramars@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-12intel_pstate: set BYT MSR with wrmsrl_on_cpu()Joe Konno
Commit 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.) introduced byt_set_pstate() with the assumption that it would always be run by the CPU whose MSR is to be written by it. It turns out, however, that is not always the case in practice, so modify byt_set_pstate() to enforce the MSR write done by it to always happen on the right CPU. Fixes: 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.) Signed-off-by: Joe Konno <joe.konno@intel.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-05-05intel_pstate: Add tsc collection and keep previous target pstateDoug Smythies
The intel_pstate driver is difficult to debug and investigate without tsc. Also, it is likely use of tsc, and some version of C0 percentage, will be re-introdcued in futute. There have also been occasions where it is desirebale to know, and confirm, the previous target pstate. This patch brings back tsc, adds previous target pstate, and adds both to the trace data collection. Signed-off-by: Doug Smythies <dsmythies@telus.net> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-15cpufreq: intel_pstate: Fix an annoying !CONFIG_SMP warningBorislav Petkov
I keep seeing drivers/cpufreq/intel_pstate.c: In function ‘intel_pstate_init’: drivers/cpufreq/intel_pstate.c:1187:26: warning: initialization from incompatible pointer type struct cpuinfo_x86 *c = &boot_cpu_data; when doing randconfig builds. This is caused by the fact that when !CONFIG_SMP, asm/processor.h defines cpu_info to boot_cpu_data and the local variable struct cpu_defaults *cpu_info overshadows it leading to this unfortunate assignment in the preprocessed source: struct cpu_defaults *boot_cpu_data; struct cpuinfo_x86 *c = &boot_cpu_data; Rename the local variable and use static_cpu_has_safe() which alleviates the need for defining a local cpuinfo_x86 pointer. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-15intel_pstate: Change the setpoint for Atom paramsKristen Carlson Accardi
Change the setpoint for the Baytrail and Cherrytrail CPUs. This will cause more aggressive pstate selection and improves performance on a variety of workloads with little power penalty. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-11intel_pstate: Knights Landing supportDasaratharaman Chandramouli
1. Add Knights Landing (KNL) CPUID to the list of CPUIDs supported by the intel_pstate driver. 2. Add a new cpu_default structure for KNL since KNL has a slightly different mechanism to get turbo pstates from MSRs. Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> [ rjw: Subject and changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-11intel_pstate: remove MSR testKristen Carlson Accardi
x86_match_cpu will not match our cpuid unless APERF/MPERF flag is set, so there is no need to do the manual check for this MSR. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-06intel_pstate: provide option to only use intel_pstate with HWPKristen Carlson Accardi
Allow users the option to disable the driver for any hardware which does not support HWP. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-30intel_pstate: honor user space min_perf_pct override on resumeKristen Carlson Accardi
If the user has requested an override of the min_perf_pct via sysfs, then it should be restored whenever policy is updated, such as on resume. Take the max of whatever the user requested and whatever the policy is. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-30intel_pstate: respect cpufreq policy requestSrinivas Pandruvada
When thermal or other subsystem requests to change the policy, use that irrepective of whether cpufreq policy is PERFORMANCE or not. Without this change, when thermal subsystem passive policy wants to reduce performance, it still runs at 100%. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-30intel_pstate: Add num_pstates to sysfsKristen Carlson Accardi
Add a sysfs interface to display the total number of supported pstates. This value is independent of whether turbo has been enabled or disabled. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-30intel_pstate: expose turbo range to sysfsKristen Carlson Accardi
This patch adds "turbo_pct" to the intel_pstate sysfs interface. turbo_pct will display the percentage of the total supported pstates that are in the turbo range. This value is independent of whether turbo has been disabled or not. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-30intel_pstate: Add support for SkyLakeKristen Carlson Accardi
Add SKL cpuid. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-11intel_pstate: Add a few commentsKristen Carlson Accardi
Add a few comments in the code which calculates busyness to clarify parts of the algorithm. Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-11intel_pstate: add kernel parameter to force loadingEthan Zhao
To force loading on Oracle Sun X86 servers, provide one kernel command line parameter intel_pstate = force For those who are aware of the risk of no power capping capabily working and try to get better performance with this driver. Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Reviewed-by: Linda Knippers <linda.knippers@hp.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-03intel_pstate: skip this driver if Sun server has _PPC methodethan zhao
Oracle Sun X86 servers have dynamic power capping capability that works via ACPI _PPC method etc, so skip loading this driver if Sun server has ACPI _PPC enabled. Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com> Tested-by: Linda Knippers <linda.knippers@hp.com> Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12intel_pstate: Add CPUID for BDW-H CPUDirk Brandewie
Add BDW-H to the list of supported processors. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12intel_pstate: Add support for HWPDirk Brandewie
Add support of Hardware Managed Performance States (HWP) described in Volume 3 section 14.4 of the SDM. With HWP enbaled intel_pstate will no longer be responsible for selecting P states for the processor. intel_pstate will continue to register to the cpufreq core as the scaling driver for CPUs implementing HWP. In HWP mode intel_pstate provides three functions reporting frequency to the cpufreq core, support for the set_policy() interface from the core and maintaining the intel_pstate sysfs interface in /sys/devices/system/cpu/intel_pstate. User preferences expressed via the set_policy() interface or the sysfs interface are forwared to the CPU via the HWP MSR interface. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-23intel_pstate: Correct BYT VID values.Dirk Brandewie
Using a VID value that is not high enough for the requested P state can cause machine checks. Add a ceiling function to ensure calulated VIDs with fractional values are set to the next highest integer VID value. The algorythm for calculating the non-trubo VID from the BIOS writers guide is: vid_ratio = (vid_max - vid_min) / (max_pstate - min_pstate) vid = ceiling(vid_min + (req_pstate - min_pstate) * vid_ratio) Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-23intel_pstate: Fix BYT frequency reportingDirk Brandewie
BYT has a different conversion from P state to frequency than the core processors. This causes the min/max and current frequency to be misreported on some BYT SKUs. Tested on BYT N2820, Ivybridge and Haswell processors. Link: https://bugzilla.yoctoproject.org/show_bug.cgi?id=6663 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-23intel_pstate: Don't lose sysfs settings during cpu offlineDirk Brandewie
The user may have custom settings don't destroy them during suspend. Link: https://bugzilla.kernel.org/show_bug.cgi?id=80651 Reported-by: Tobias Jakobi <liquid.acid@gmx.net> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-23cpufreq: intel_pstate: Reflect current no_turbo state correctlyGabriele Mazzotta
Some BIOSes modify the state of MSR_IA32_MISC_ENABLE_TURBO_DISABLE based on the current power source for the system battery AC vs battery. Reflect the correct current state and ability to modify the no_turbo sysfs file based on current state of MSR_IA32_MISC_ENABLE_TURBO_DISABLE. Link: https://bugzilla.kernel.org/show_bug.cgi?id=83151 Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-23cpufreq: intel_pstate: Fix setting max_perf_pct in performance policyPali Rohár
Code which changes policy to powersave changes also max_policy_pct based on max_freq. Code which change max_perf_pct has upper limit base on value max_policy_pct. When policy is changing from powersave back to performance then max_policy_pct is not changed. Which means that changing max_perf_pct is not possible to high values if max_freq was too low in powersave policy. Test case: $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq 800000 $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 3300000 $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor performance $ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct 100 $ echo powersave > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor $ echo 800000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq $ echo 20 > /sys/devices/system/cpu/intel_pstate/max_perf_pct $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor powersave $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 800000 $ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct 20 $ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor $ echo 3300000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq $ echo 100 > /sys/devices/system/cpu/intel_pstate/max_perf_pct $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor performance $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 3300000 $ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct 24 And now intel_pstate driver allows to set maximal value for max_perf_pct based on max_policy_pct which is 24 for previous powersave max_freq 800000. This patch will set default value for max_policy_pct when setting policy to performance so it will allow to set also max value for max_perf_pct. Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Cc: All applicable <stable@vger.kernel.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-03cpufreq: intel_pstate: Remove unneeded variableGabriele Mazzotta
It should have been removed with commit d1b6848590af ("cpufreq / intel_pstate: Optimize intel_pstate_set_policy") Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-28cpufreq: intel_pstate: Add CPU ID for Braswell processorMika Westerberg
This is pretty much the same as Intel Baytrail, only the CPU ID is different. Add the new ID to the supported CPU list. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-08-28intel_pstate: Turn per cpu printk into pr_debugAndi Kleen
On larger systems intel_pstate currently spams the boot up log with its "Intel pstate controlling ..." message for each CPU. It's the only subsystem that prints a message for each CPU. Turn the message into a pr_debug. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Remove core_pct roundingStratos Karafotis
The specific rounding adds conditionally only 1/256 to fractional part of core_pct. We can safely remove it without any noticeable impact in calculations. Use div64_u64 instead of div_u64 to avoid possible overflow of sample->mperf as divisor Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Simplify P state adjustment logic.Stratos Karafotis
Simplify the code by removing the inline functions pstate_increase and pstate_decrease and use directly the intel_pstate_set_pstate. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Keep values in aperf/mperf in full precisionStratos Karafotis
Currently we shift right aperf and mperf variables by FRAC_BITS to prevent overflow when we convert them to fix point numbers (shift left by FRAC_BITS). But this is not necessary, because we actually use delta aperf and mperf which are much less than APERF and MPERF values. So, use the unmodified APERF and MPERF values in calculation. This also adds 8 bits in precision, although the gain is insignificant. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Disable interrupts during MSRs readingStratos Karafotis
According to Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF. This should increase the accuracy of the calculations. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Align multiple lines to open parenthesisStratos Karafotis
Suppress checkpatch.pl --strict warnings: CHECK: Alignment should match open parenthesis Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Remove unnecessary intermediate variable sample_timeStratos Karafotis
Remove the unnecessary intermediate assignment and use directly the pid_params.sample_rate_ms variable. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Cleanup parenthesesStratos Karafotis
Remove unnecessary parentheses. Also, add parentheses in one case for better readability. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Fit code in a single line where possibleStratos Karafotis
We can fit these lines in a single one, under the 80 characters limit. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Add missing blank lines after declarationsStratos Karafotis
Also, remove unnecessary blank lines. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Remove unnecessary type casting in div_s64() callStratos Karafotis
div_s64() accepts the divisor parameter as s32. Helper div_fp() also accepts divisor as int32_t. So, remove the unnecessary int64_t type casting. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21cpufreq: intel_pstate: Make intel_pstate_kobject and debugfs_parent localsStratos Karafotis
Since we never remove sysfs entry and debugfs files, we can make the intel_pstate_kobject and debugfs_parent locals. Also, annotate with __init intel_pstate_sysfs_expose_params() and intel_pstate_debug_expose_params() in order to be freed after bootstrap. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-07intel_pstate: Set CPU number before accessing MSRsVincent Minet
Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU initialization. Otherwise only cpu0 has its P-state set and all other cores are left with their values unchanged. In most cases, this is not too serious because the P-states will be set correctly when the timer function is run. But when the default governor is set to performance, the per-CPU current_pstate stays the same forever and no attempts are made to write the MSRs again. Signed-off-by: Vincent Minet <vincent@vincent-minet.net> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-07intel_pstate: don't touch turbo bit if turbo disabled or unavailable.Dirk Brandewie
If turbo is disabled in the BIOS bit 38 should be set in MSR_IA32_MISC_ENABLE register per section 14.3.2.1 of the SDM Vol 3 document 325384-050US Feb 2014. If this bit is set do *not* attempt to disable trubo via the MSR_IA32_PERF_CTL register. On some systems trying to disable turbo via MSR_IA32_PERF_CTL will cause subsequent writes to MSR_IA32_PERF_CTL not take affect, in fact reading MSR_IA32_PERF_CTL will not show the IDA/Turbo DISENGAGE bit(32) as set. A write of bit 32 to zero returns to normal operation. Also deal with the case where the processor does not support turbo and the BIOS does not report the fact in MSR_IA32_MISC_ENABLE but does report the max and turbo P states as the same value. Link: https://bugzilla.kernel.org/show_bug.cgi?id=64251 Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-07intel_pstate: Fix setting VIDDirk Brandewie
Commit 21855ff5 (intel_pstate: Set turbo VID for BayTrail) introduced setting the turbo VID which is required to prevent a machine check on some Baytrail SKUs under heavy graphics based workloads. The docmumentation update that brought the requirement to light also changed the bit mask used for enumerating P state and VID values from 0x7f to 0x3f. This change returns the mask value to 0x7f. Tested with the Intel NUC DN2820FYK, BIOS version FYBYT10H.86A.0034.2014.0513.1413 with v3.16-rc1 and v3.14.8 kernel versions. Fixes: 21855ff5 (intel_pstate: Set turbo VID for BayTrail) Link: https://bugzilla.kernel.org/show_bug.cgi?id=77951 Reported-and-tested-by: Rune Reterson <rune@megahurts.dk> Reported-and-tested-by: Eric Eickmeyer <erich@ericheickmeyer.com> Cc: 3.13+ <stable@vger.kernel.org> # 3.13+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-17intel_pstate: Correct rounding in busy calculationDoug Smythies
There was a mistake in the actual rounding portion this previous patch: f0fe3cd7e12d (intel_pstate: Correct rounding in busy calculation) such that the rounding was asymetric and incorrect. Severity: Not very serious, but can increase target pstate by one extra value. For real world work flows the issue should self correct (but I have no proof). It is the equivalent of different PID gains for positive and negative numbers. Examples: -3.000000 used to round to -4, rounds to -3 with this patch. -3.503906 used to round to -5, rounds to -4 with this patch. Fixes: f0fe3cd7e12d (intel_pstate: Correct rounding in busy calculation) Signed-off-by: Doug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-10cpufreq: intel_pstate: Remove duplicate CPU ID checkStratos Karafotis
We check the CPU ID during driver init. There is no need to do it again per logical CPU initialization. So, remove the duplicate check. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-03Merge back earlier cpufreq material.Rafael J. Wysocki
Conflicts: arch/mips/loongson/lemote-2f/clock.c drivers/cpufreq/intel_pstate.c
2014-06-02intel_pstate: Improve initial busy calculationDoug Smythies
This change makes the busy calculation using 64 bit math which prevents overflow for large values of aperf/mperf. Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02intel_pstate: add sample time scalingDirk Brandewie
The PID assumes that samples are of equal time, which for a deferable timers this is not true when the system goes idle. This causes the PID to take a long time to converge to the min P state and depending on the pattern of the idle load can make the P state appear stuck. The hold-off value of three sample times before using the scaling is to give a grace period for applications that have high performance requirements and spend a lot of time idle, The poster child for this behavior is the ffmpeg benchmark in the Phoronix test suite. Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02intel_pstate: Correct rounding in busy calculationDirk Brandewie
Changing to fixed point math throughout the busy calculation in commit e66c1768 (Change busy calculation to use fixed point math.) Introduced some inaccuracies by rounding the busy value at two points in the calculation. This change removes roundings and moves the rounding to the output of the PID where the calculations are complete and the value returned as an integer. Fixes: e66c17683746 (intel_pstate: Change busy calculation to use fixed point math.) Reported-by: Doug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02intel_pstate: Remove C0 trackingDirk Brandewie
Commit fcb6a15c (intel_pstate: Take core C0 time into account for core busy calculation) introduced a regression referenced below. The issue with "lockup" after suspend that this commit was addressing is now dealt with in the suspend path. Fixes: fcb6a15c2e7e (intel_pstate: Take core C0 time into account for core busy calculation) Link: https://bugzilla.kernel.org/show_bug.cgi?id=66581 Link: https://bugzilla.kernel.org/show_bug.cgi?id=75121 Reported-by: Doug Smythies <dsmythies@telus.net> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>