summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2016-09-14Merge tag 'irqchip-4.9-1' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Merge the first drop of irqchip updates for 4.9 from Marc Zyngier: - ACPI IORT core code - IORT support for the GICv3 ITS - A few of GIC cleanups
2016-09-14genirq: Expose interrupt information through sysfsCraig Gallek
Information about interrupts is exposed via /proc/interrupts, but the format of that file has changed over kernel versions and differs across architectures. It also has varying column numbers depending on hardware. That all makes it hard for tools to parse. To solve this, expose the information through sysfs so each irq attribute is in a separate file in a consistent, machine parsable way. This feature is only available when both CONFIG_SPARSE_IRQ and CONFIG_SYSFS are enabled. Examples: /sys/kernel/irq/18/actions: i801_smbus,ehci_hcd:usb1,uhci_hcd:usb7 /sys/kernel/irq/18/chip_name: IR-IO-APIC /sys/kernel/irq/18/hwirq: 18 /sys/kernel/irq/18/name: fasteoi /sys/kernel/irq/18/per_cpu_count: 0,0 /sys/kernel/irq/18/type: level /sys/kernel/irq/25/actions: ahci0 /sys/kernel/irq/25/chip_name: IR-PCI-MSI /sys/kernel/irq/25/hwirq: 512000 /sys/kernel/irq/25/name: edge /sys/kernel/irq/25/per_cpu_count: 29036,0 /sys/kernel/irq/25/type: edge [ tglx: Moved kobject_del() under sparse_irq_lock, massaged code comments and changelog ] Signed-off-by: Craig Gallek <kraig@google.com> Cc: David Decotigny <decot@google.com> Link: http://lkml.kernel.org/r/1473783291-122873-1-git-send-email-kraigatgoog@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13cpufreq: schedutil: Add iowait boostingRafael J. Wysocki
Modify the schedutil cpufreq governor to boost the CPU frequency if the SCHED_CPUFREQ_IOWAIT flag is passed to it via cpufreq_update_util(). If that happens, the frequency is set to the maximum during the first update after receiving the SCHED_CPUFREQ_IOWAIT flag and then the boost is reduced by half during each following update. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Looks-good-to: Steve Muckle <smuckle@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-09-13cpufreq / sched: SCHED_CPUFREQ_IOWAIT flag to indicate iowait conditionRafael J. Wysocki
Testing indicates that it is possible to improve performace significantly without increasing energy consumption too much by teaching cpufreq governors to bump up the CPU performance level if the in_iowait flag is set for the task in enqueue_task_fair(). For this purpose, define a new cpufreq_update_util() flag SCHED_CPUFREQ_IOWAIT and modify enqueue_task_fair() to pass that flag to cpufreq_update_util() in the in_iowait case. That generally requires cpufreq_update_util() to be called directly from there, because update_load_avg() may not be invoked in that case. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Looks-good-to: Steve Muckle <smuckle@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-09-13Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "A try_to_wake_up() memory ordering race fix causing a busy-loop in ttwu()" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix a race between try_to_wake_up() and a woken up task
2016-09-13Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This contains: - a set of fixes found by directed-random perf fuzzing efforts by Vince Weaver, Alexander Shishkin and Peter Zijlstra - a cqm driver crash fix - an AMD uncore driver use after free fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix PEBSv3 record drain perf/x86/intel/bts: Kill a silly warning perf/x86/intel/bts: Fix BTS PMI detection perf/x86/intel/bts: Fix confused ordering of PMU callbacks perf/core: Fix aux_mmap_count vs aux_refcount order perf/core: Fix a race between mmap_close() and set_output() of AUX events perf/x86/amd/uncore: Prevent use after free perf/x86/intel/cqm: Check cqm/mbm enabled state in event init perf/core: Remove WARN from perf_event_read()
2016-09-13tick/nohz: Prevent stopping the tick on an offline CPUWanpeng Li
can_stop_full_tick() has no check for offline cpus. So it allows to stop the tick on an offline cpu from the interrupt return path, which is wrong and subsequently makes irq_work_needs_cpu() warn about being called for an offline cpu. Commit f7ea0fd639c2c4 ("tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline") added prevention for can_stop_idle_tick(), but forgot to do the same in can_stop_full_tick(). Add it. [ tglx: Massaged changelog ] Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1473245473-4463-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13cpuset: handle race between CPU hotplug and cpuset_hotplug_workJoonwoo Park
A discrepancy between cpu_online_mask and cpuset's effective_cpus mask is inevitable during hotplug since cpuset defers updating of effective_cpus mask using a workqueue, during which time nothing prevents the system from more hotplug operations. For that reason guarantee_online_cpus() walks up the cpuset hierarchy until it finds an intersection under the assumption that top cpuset's effective_cpus mask intersects with cpu_online_mask even with such a race occurring. However a sequence of CPU hotplugs can open a time window, during which none of the effective CPUs in the top cpuset intersect with cpu_online_mask. For example when there are 4 possible CPUs 0-3 and only CPU0 is online: ======================== =========================== cpu_online_mask top_cpuset.effective_cpus ======================== =========================== echo 1 > cpu2/online. CPU hotplug notifier woke up hotplug work but not yet scheduled. [0,2] [0] echo 0 > cpu0/online. The workqueue is still runnable. [2] [0] ======================== =========================== Now there is no intersection between cpu_online_mask and top_cpuset.effective_cpus. Thus invoking sys_sched_setaffinity() at this moment can cause following: Unable to handle kernel NULL pointer dereference at virtual address 000000d0 ------------[ cut here ]------------ Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable] Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 1420 Comm: taskset Tainted: G W 4.4.8+ #98 task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000 PC is at guarantee_online_cpus+0x2c/0x58 LR is at cpuset_cpus_allowed+0x4c/0x6c <snip> Process taskset (pid: 1420, stack limit = 0xffffffc06e124020) Call trace: [<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58 [<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c [<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac [<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac [<ffffffc000085cb0>] el0_svc_naked+0x24/0x28 The top cpuset's effective_cpus are guaranteed to be identical to cpu_online_mask eventually. Hence fall back to cpu_online_mask when there is no intersection between top cpuset's effective_cpus and cpu_online_mask. Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Tejun Heo <tj@kernel.org> Cc: cgroups@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.17+ Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-13x86/pkeys: Fix pkeys build breakage for some non-x86 archesDave Hansen
Guenter Roeck reported breakage on the h8300 and c6x architectures (among others) caused by the new memory protection keys syscalls. This patch does what Arnd suggested and adds them to kernel/sys_ni.c. Fixes: a60f7b69d92c ("generic syscalls: Wire up memory protection keys syscalls") Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20160912203842.48E7AC50@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-13PM / Hibernate: allow hibernation with PAGE_POISONING_ZEROAnisse Astier
PAGE_POISONING_ZERO disables zeroing new pages on alloc, they are poisoned (zeroed) as they become available. In the hibernate use case, free pages will appear in the system without being cleared, left there by the loading kernel. This patch will make sure free pages are cleared on resume when PAGE_POISONING_ZERO is enabled. We free the pages just after resume because we can't do it later: going through any device resume code might allocate some memory and invalidate the free pages bitmap. Thus we don't need to disable hibernation when PAGE_POISONING_ZERO is enabled. Signed-off-by: Anisse Astier <anisse@astier.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13PM / sleep: enable suspend-to-idle even without registered suspend_opsSudeep Holla
Suspend-to-idle (aka the "freeze" sleep state) is a system sleep state in which all of the processors enter deepest possible idle state and wait for interrupts right after suspending all the devices. There is no hard requirement for a platform to support and register platform specific suspend_ops to enter suspend-to-idle/freeze state. Only deeper system sleep states like PM_SUSPEND_STANDBY and PM_SUSPEND_MEM rely on such low level support/implementation. suspend-to-idle can be entered as along as all the devices can be suspended. This patch enables the support for suspend-to-idle even on systems that don't have any low level support for deeper system sleep states and/or don't register any platform specific suspend_ops. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-13PM / sleep: Increase default DPM watchdog timeout to 120Chen Yu
Recently we have a new report that, the harddisk can not resume on time due to firmware issues, and got a kernel panic because of DPM watchdog timeout. So adjust the default timeout from 60 to 120 to survive on this platform, and make DPM_WATCHDOG depending on EXPERT. Link: https://bugzilla.kernel.org/show_bug.cgi?id=117971 Suggested-by: Pavel Machek <pavel@ucw.cz> Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Reported-by: Higuita <higuita@gmx.net> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-12tracing: Have max_latency be defined for HWLAT_TRACER as wellSteven Rostedt (Red Hat)
The hwlat tracer uses tr->max_latency, and if it's the only tracer enabled that uses it, the build will fail. Add max_latency and its file when the hwlat tracer is enabled. Link: http://lkml.kernel.org/r/d6c3b7eb-ba95-1ffa-0453-464e1e24262a@infradead.org Reported-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-09-10Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "nvdimm fixes for v4.8, two of them are tagged for -stable: - Fix devm_memremap_pages() to use track_pfn_insert(). Otherwise, DAX pmd mappings end up with an uncached pgprot, and unusable performance for the device-dax interface. The device-dax interface appeared in 4.7 so this is tagged for -stable. - Fix a couple VM_BUG_ON() checks in the show_smaps() path to understand DAX pmd entries. This fix is tagged for -stable. - Fix a mis-merge of the nfit machine-check handler to flip the polarity of an if() to match the final version of the patch that Vishal sent for 4.8-rc1. Without this the nfit machine check handler never detects / inserts new 'badblocks' entries which applications use to identify lost portions of files. - For test purposes, fix the nvdimm_clear_poison() path to operate on legacy / simulated nvdimm memory ranges. Without this fix a test can set badblocks, but never clear them on these ranges. - Fix the range checking done by dax_dev_pmd_fault(). This is not tagged for -stable since this problem is mitigated by specifying aligned resources at device-dax setup time. These patches have appeared in a next release over the past week. The recent rebase you can see in the timestamps was to drop an invalid fix as identified by the updated device-dax unit tests [1]. The -mm touches have an ack from Andrew" [1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs" https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm: allow legacy (e820) pmem region to clear bad blocks nfit, mce: Fix SPA matching logic in MCE handler mm: fix cache mode of dax pmd mappings mm: fix show_smap() for zone_device-pmd ranges dax: fix mapping size check
2016-09-10Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10Revert "sched/fair: Make update_min_vruntime() more readable"Peter Zijlstra
There's a bug in this commit: 97a7142f157a ("sched/fair: Make update_min_vruntime() more readable") ... when !rb_leftmost && curr we fail to advance min_vruntime. So revert it. Reported-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10perf/core: Fix aux_mmap_count vs aux_refcount orderAlexander Shishkin
The order of accesses to ring buffer's aux_mmap_count and aux_refcount has to be preserved across the users, namely perf_mmap_close() and perf_aux_output_begin(), otherwise the inversion can result in the latter holding the last reference to the aux buffer and subsequently free'ing it in atomic context, triggering a warning. > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 257 at kernel/events/ring_buffer.c:541 __rb_free_aux+0x11a/0x130 > CPU: 0 PID: 257 Comm: stopbug Not tainted 4.8.0-rc1+ #2596 > Call Trace: > [<ffffffff810f3e0b>] __warn+0xcb/0xf0 > [<ffffffff810f3f3d>] warn_slowpath_null+0x1d/0x20 > [<ffffffff8121182a>] __rb_free_aux+0x11a/0x130 > [<ffffffff812127a8>] rb_free_aux+0x18/0x20 > [<ffffffff81212913>] perf_aux_output_begin+0x163/0x1e0 > [<ffffffff8100c33a>] bts_event_start+0x3a/0xd0 > [<ffffffff8100c42d>] bts_event_add+0x5d/0x80 > [<ffffffff81203646>] event_sched_in.isra.104+0xf6/0x2f0 > [<ffffffff8120652e>] group_sched_in+0x6e/0x190 > [<ffffffff8120694e>] ctx_sched_in+0x2fe/0x5f0 > [<ffffffff81206ca0>] perf_event_sched_in+0x60/0x80 > [<ffffffff81206d1b>] ctx_resched+0x5b/0x90 > [<ffffffff81207281>] __perf_event_enable+0x1e1/0x240 > [<ffffffff81200639>] event_function+0xa9/0x180 > [<ffffffff81202000>] ? perf_cgroup_attach+0x70/0x70 > [<ffffffff8120203f>] remote_function+0x3f/0x50 > [<ffffffff811971f3>] flush_smp_call_function_queue+0x83/0x150 > [<ffffffff81197bd3>] generic_smp_call_function_single_interrupt+0x13/0x60 > [<ffffffff810a6477>] smp_call_function_single_interrupt+0x27/0x40 > [<ffffffff81a26ea9>] call_function_single_interrupt+0x89/0x90 > [<ffffffff81120056>] finish_task_switch+0xa6/0x210 > [<ffffffff81120017>] ? finish_task_switch+0x67/0x210 > [<ffffffff81a1e83d>] __schedule+0x3dd/0xb50 > [<ffffffff81a1efe5>] schedule+0x35/0x80 > [<ffffffff81128031>] sys_sched_yield+0x61/0x70 > [<ffffffff81a25be5>] entry_SYSCALL_64_fastpath+0x18/0xa8 > ---[ end trace 6235f556f5ea83a9 ]--- This patch puts the checks in perf_aux_output_begin() in the same order as that of perf_mmap_close(). Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-10perf/core: Fix a race between mmap_close() and set_output() of AUX eventsAlexander Shishkin
In the mmap_close() path we need to stop all the AUX events that are writing data to the AUX area that we are unmapping, before we can safely free the pages. To determine if an event needs to be stopped, we're comparing its ->rb against the one that's getting unmapped. However, a SET_OUTPUT ioctl may turn up inside an AUX transaction and swizzle event::rb to some other ring buffer, but the transaction will keep writing data to the old ring buffer until the event gets scheduled out. At this point, mmap_close() will skip over such an event and will proceed to free the AUX area, while it's still being used by this event, which will set off a warning in the mmap_close() path and cause a memory corruption. To avoid this, always stop an AUX event before its ->rb is updated; this will release the (potentially) last reference on the AUX area of the buffer. If the event gets restarted, its new ring buffer will be used. If another SET_OUTPUT comes and switches it back to the old ring buffer that's getting unmapped, it's also fine: this ring buffer's aux_mmap_count will be zero and AUX transactions won't start any more. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-09bpf: add BPF_CALL_x macros for declaring helpersDaniel Borkmann
This work adds BPF_CALL_<n>() macros and converts all the eBPF helper functions to use them, in a similar fashion like we do with SYSCALL_DEFINE<n>() macros that are used today. Motivation for this is to hide all the register handling and all necessary casts from the user, so that it is done automatically in the background when adding a BPF_CALL_<n>() call. This makes current helpers easier to review, eases to write future helpers, avoids getting the casting mess wrong, and allows for extending all helpers at once (f.e. build time checks, etc). It also helps detecting more easily in code reviews that unused registers are not instrumented in the code by accident, breaking compatibility with existing programs. BPF_CALL_<n>() internals are quite similar to SYSCALL_DEFINE<n>() ones with some fundamental differences, for example, for generating the actual helper function that carries all u64 regs, we need to fill unused regs, so that we always end up with 5 u64 regs as an argument. I reviewed several 0-5 generated BPF_CALL_<n>() variants of the .i results and they look all as expected. No sparse issue spotted. We let this also sit for a few days with Fengguang's kbuild test robot, and there were no issues seen. On s390, it barked on the "uses dynamic stack allocation" notice, which is an old one from bpf_perf_event_output{,_tp}() reappearing here due to the conversion to the call wrapper, just telling that the perf raw record/frag sits on stack (gcc with s390's -mwarn-dynamicstack), but that's all. Did various runtime tests and they were fine as well. All eBPF helpers are now converted to use these macros, getting rid of a good chunk of all the raw castings. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09bpf: add BPF_SIZEOF and BPF_FIELD_SIZEOF macrosDaniel Borkmann
Add BPF_SIZEOF() and BPF_FIELD_SIZEOF() macros to improve the code a bit which otherwise often result in overly long bytes_to_bpf_size(sizeof()) and bytes_to_bpf_size(FIELD_SIZEOF()) lines. So place them into a macro helper instead. Moreover, we currently have a BUILD_BUG_ON(BPF_FIELD_SIZEOF()) check in convert_bpf_extensions(), but we should rather make that generic as well and add a BUILD_BUG_ON() test in all BPF_SIZEOF()/BPF_FIELD_SIZEOF() users to detect any rewriter size issues at compile time. Note, there are currently none, but we want to assert that it stays this way. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09bpf: minor cleanups in helpersDaniel Borkmann
Some minor misc cleanups, f.e. use sizeof(__u32) instead of hardcoding and in __bpf_skb_max_len(), I missed that we always have skb->dev valid anyway, so we can drop the unneeded test for dev; also few more other misc bits addressed here. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-09mm: fix cache mode of dax pmd mappingsDan Williams
track_pfn_insert() in vmf_insert_pfn_pmd() is marking dax mappings as uncacheable rendering them impractical for application usage. DAX-pte mappings are cached and the goal of establishing DAX-pmd mappings is to attain more performance, not dramatically less (3 orders of magnitude). track_pfn_insert() relies on a previous call to reserve_memtype() to establish the expected page_cache_mode for the range. While memremap() arranges for reserve_memtype() to be called, devm_memremap_pages() does not. So, teach track_pfn_insert() and untrack_pfn() how to handle tracking without a vma, and arrange for devm_memremap_pages() to establish the write-back-cache reservation in the memtype tree. Cc: <stable@vger.kernel.org> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: Kai Zhang <kai.ka.zhang@oracle.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-08bpf: fix range propagation on direct packet accessDaniel Borkmann
LLVM can generate code that tests for direct packet access via skb->data/data_end in a way that currently gets rejected by the verifier, example: [...] 7: (61) r3 = *(u32 *)(r6 +80) 8: (61) r9 = *(u32 *)(r6 +76) 9: (bf) r2 = r9 10: (07) r2 += 54 11: (3d) if r3 >= r2 goto pc+12 R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 12: (18) r4 = 0xffffff7a 14: (05) goto pc+430 [...] from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 24: (7b) *(u64 *)(r10 -40) = r1 25: (b7) r1 = 0 26: (63) *(u32 *)(r6 +56) = r1 27: (b7) r2 = 40 28: (71) r8 = *(u8 *)(r9 +20) invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0) The reason why this gets rejected despite a proper test is that we currently call find_good_pkt_pointers() only in case where we detect tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and derived, for example, from a register of type pkt(id=Y,off=0,r=0) pointing to skb->data. find_good_pkt_pointers() then fills the range in the current branch to pkt(id=Y,off=0,r=Z) on success. For above case, we need to extend that to recognize pkt_end >= rX pattern and mark the other branch that is taken on success with the appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers(). Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the only two practical options to test for from what LLVM could have generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=) that we would need to take into account as well. After the fix: [...] 7: (61) r3 = *(u32 *)(r6 +80) 8: (61) r9 = *(u32 *)(r6 +76) 9: (bf) r2 = r9 10: (07) r2 += 54 11: (3d) if r3 >= r2 goto pc+12 R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp 12: (18) r4 = 0xffffff7a 14: (05) goto pc+430 [...] from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp 24: (7b) *(u64 *)(r10 -40) = r1 25: (b7) r1 = 0 26: (63) *(u32 *)(r6 +56) = r1 27: (b7) r2 = 40 28: (71) r8 = *(u8 *)(r9 +20) 29: (bf) r1 = r8 30: (25) if r8 > 0x3c goto pc+47 R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56 R9=pkt(id=0,off=0,r=54) R10=fp 31: (b7) r1 = 1 [...] Verifier test cases are also added in this work, one that demonstrates the mentioned example here and one that tries a bad packet access for the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0), pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two with both test variants (>, >=). Fixes: 969bf05eb3ce ("bpf: direct packet access") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08Merge branch 'linus' into timers/core, to refresh the branchIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-06perf, bpf: fix conditional call to bpf_overflow_handlerArnd Bergmann
The newly added bpf_overflow_handler function is only built of both CONFIG_EVENT_TRACING and CONFIG_BPF_SYSCALL are enabled, but the caller only checks the latter: kernel/events/core.c: In function 'perf_event_alloc': kernel/events/core.c:9106:27: error: 'bpf_overflow_handler' undeclared (first use in this function) This changes the caller so we also skip this call if CONFIG_EVENT_TRACING is disabled entirely. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: aa6a5f3cb2b2 ("perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs") Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-06kernel/softirq: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160818125731.27256-7-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06slab: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: linux-mm@kvack.org Cc: rt@linutronix.de Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/20160823125319.abeapfjapf2kfezp@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06relayfs: Convert to hotplug state machineRichard Weinberger
Install the callbacks via the state machine. They are installed at run time but relay_prepare_cpu() does not need to be invoked by the boot CPU because relay_open() was not yet invoked and there are no pools that need to be created. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20160818125731.27256-3-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06relay: Use per CPU constructs for the relay channel buffer pointersAkash Goel
relay essentially needs to maintain a per CPU array of channel buffer pointers but it manually creates that array. Instead its better to use the per CPU constructs, provided by the kernel, to allocate & access the array of pointer to channel buffers. Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://lkml.kernel.org/r/1470909140-25919-1-git-send-email-akash.goel@intel.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06cpu/hotplug: Remove CPU_STARTING and CPU_DYING notifierThomas Gleixner
All users are converted to state machine, remove CPU_STARTING and the corresponding CPU_DYING. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20160818125731.27256-2-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06cpu/hotplug: Make state names consistentThomas Gleixner
We should have all names in the scheme "[subsys/]facility:state]". Fix the core to comply. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06genirq: No need to mask non trigger mode flags before __irq_set_trigger()Alexander Kuleshov
Some callers of __irq_set_trigger() masks all flags except trigger mode flags. This is unnecessary, ase __irq_set_trigger() already does this before usage of flags. [ tglx: Moved the flag mask and adjusted comment. Removed the hunk in enable_percpu_irq() as it is required there ] Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Link: http://lkml.kernel.org/r/20160719095408.13778-1-kuleshovmail@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05futex: Add some more function commentryThomas Gleixner
Add some more comments and reformat existing ones to kernel doc style. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Link: http://lkml.kernel.org/r/1464770609-30168-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05genirq: Update stale comment for __irq_domain_addPunit Agrawal
Commit 1bf4ddc46c5d ("irqdomain: Introduce irq_domain_create_{linear, tree}") introduced the use of fwnode_handle to identify the interrupt controller when calling __irq_domain_add but missed updating the kernel doc parameters for the function. Update this comment. While we are touching this code, also consolidate the declaration and assignment of of_node. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Marc Zygnier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/1464699409-23113-1-git-send-email-punit.agrawal@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-05cpu/hotplug: Replace anon unionThomas Gleixner
Some compilers are unhappy with the anon union in the state array. Replace it with a named union. While at it align the state array initializers proper and add the missing name tags. Fixes: cf392d10b69e "cpu/hotplug: Add multi instance support" Reported-by: Ingo Molnar <mingo@kernel.org> Reported-by: Fenguang Wu <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: rt@linutronix.de
2016-09-05PM / QoS: avoid calling cancel_delayed_work_sync() during early bootTejun Heo
of_clk_init() ends up calling into pm_qos_update_request() very early during boot where irq is expected to stay disabled. pm_qos_update_request() uses cancel_delayed_work_sync() which correctly assumes that irq is enabled on invocation and unconditionally disables and re-enables it. Gate cancel_delayed_work_sync() invocation with kevented_up() to avoid enabling irq unexpectedly during early boot. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Qiao Zhou <qiaozhou@asrmicro.com> Link: http://lkml.kernel.org/r/d2501c4c-8e7b-bea3-1b01-000b36b5dfe9@asrmicro.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-05smp: Add function to execute a function synchronously on a CPUJuergen Gross
On some hardware models (e.g. Dell Studio 1555 laptop) some hardware related functions (e.g. SMIs) are to be executed on physical CPU 0 only. Instead of open coding such a functionality multiple times in the kernel add a service function for this purpose. This will enable the possibility to take special measures in virtualized environments like Xen, too. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Douglas_Warzecha@dell.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: chrisw@sous-sol.org Cc: david.vrabel@citrix.com Cc: hpa@zytor.com Cc: jdelvare@suse.com Cc: jeremy@goop.org Cc: linux@roeck-us.net Cc: pali.rohar@gmail.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1472453327-19050-4-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05virt, sched: Add generic vCPU pinning supportJuergen Gross
Add generic virtualization support for pinning the current vCPU to a specified physical CPU. As this operation isn't performance critical (a very limited set of operations like BIOS calls and SMIs is expected to need this) just add a hypervisor specific indirection. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Douglas_Warzecha@dell.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: chrisw@sous-sol.org Cc: david.vrabel@citrix.com Cc: hpa@zytor.com Cc: jdelvare@suse.com Cc: jeremy@goop.org Cc: linux@roeck-us.net Cc: pali.rohar@gmail.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1472453327-19050-3-git-send-email-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/debug: Remove several CONFIG_SCHEDSTATS guardsJosh Poimboeuf
Clean up the sched code by removing several of the CONFIG_SCHEDSTATS guards, using schedstat_*() macros where needed. Code size: !CONFIG_SCHEDSTATS defconfig: text data bss dec hex filename 10209818 4368184 1105920 15683922 ef5152 vmlinux.before.nostats 10209818 4368184 1105920 15683922 ef5152 vmlinux.after.nostats CONFIG_SCHEDSTATS defconfig: text data bss dec hex filename 10214210 4370040 1105920 15690170 ef69ba vmlinux.before.stats 10214210 4370680 1105920 15690810 ef6c3a vmlinux.after.stats Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/e51e0ebe5af95ac295de720dd252e7c0d2142e4a.1466184592.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/debug: Rename 'schedstat_val()' -> 'schedstat_val_or_zero()'Josh Poimboeuf
The schedstat_val() macro's behavior is kind of surprising: when schedstat is runtime disabled, it returns zero. Rename it to schedstat_val_or_zero(). There's also a need for a similar macro which doesn't have the 'if (schedstat_enable())' check, to avoid doing the check twice. Create a new 'schedstat_val()' macro for that. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3bb1d2367d041fee333b0dde17171e709395b675.1466184592.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/debug: Clean up schedstat macrosJosh Poimboeuf
The schedstat_*() macros are inconsistent: most of them take a pointer and a field which the macro combines, whereas schedstat_set() takes the already combined ptr->field. The already combined ptr->field argument is actually more intuitive and easier to use, and there's no reason to require the user to split the variable up, so convert the macros to use the combined argument. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/54953ca25bb579f3a5946432dee409b0e05222c6.1466184592.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/debug: Rename and move enqueue_sleeper()Josh Poimboeuf
enqueue_sleeper() doesn't actually enqueue, it just handles some statistics and tracepoints. Rename it to update_stats_enqueue_sleeper() and call it from update_stats_enqueue(). Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/fb20b7159dc4d028c406c0e8d5f8c439b741615b.1466184592.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/deadline: Fix the intention to re-evalute tick dependency for offline CPUWanpeng Li
The dl task will be replenished after dl task timer fire and start a new period. It will be enqueued and to re-evaluate its dependency on the tick in order to restart it. However, if the CPU is hot-unplugged, irq_work_queue will splash since the target CPU is offline. As a result we get: WARNING: CPU: 2 PID: 0 at kernel/irq_work.c:69 irq_work_queue_on+0xad/0xe0 Call Trace: dump_stack+0x99/0xd0 __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 irq_work_queue_on+0xad/0xe0 tick_nohz_full_kick_cpu+0x44/0x50 tick_nohz_dep_set_cpu+0x74/0xb0 enqueue_task_dl+0x226/0x480 activate_task+0x5c/0xa0 dl_task_timer+0x19b/0x2c0 ? push_dl_task.part.31+0x190/0x190 This can be triggered by hot-unplugging the full dynticks CPU which dl task is running on. We enqueue the dl task on the offline CPU, because we need to do replenish for start_dl_timer(). So, as Juri pointed out, we would need to do is calling replenish_dl_entity() directly, instead of enqueue_task_dl(). pi_se shouldn't be a problem as the task shouldn't be boosted if it was throttled. This patch fixes it by avoiding the whole enqueue+dequeue+enqueue story, by first migrating (set_task_cpu()) and then doing 1 enqueue. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luca Abeni <luca.abeni@unitn.it> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1472639264-3932-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05schedcore: Remove duplicated init_task's preempt_notifiers initseokhoon.yoon
init_task's preempt_notifiers is initialized twice: 1) sched_init() -> INIT_HLIST_HEAD(&init_task.preempt_notifiers) 2) sched_init() -> init_idle(current,) <--- current task is init_task at this time -> __sched_fork(,current) -> INIT_HLIST_HEAD(&p->preempt_notifiers) I think the first one is unnecessary, so remove it. Signed-off-by: seokhoon.yoon <iamyooon@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1471339568-5790-1-git-send-email-iamyooon@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/fair: Fix load_above_capacity fixed point arithmetic widthDietmar Eggemann
Since commit: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels") we now have two different fixed point units for load. load_above_capacity has to have 10 bits fixed point unit like PELT, whereas NICE_0_LOAD has 20 bit fixed point unit on 64-bit kernels. Fix this by scaling down NICE_0_LOAD when multiplying load_above_capacity with it. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yuyang Du <yuyang.du@intel.com> Link: http://lkml.kernel.org/r/1470824847-5316-1-git-send-email-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()Tommaso Cucinotta
These 2 exercise independent code paths and need different arguments. After this change, you call: cpudl_clear(cp, cpu); cpudl_set(cp, cpu, dl); instead of: cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */); cpudl_set(cp, cpu, dl, 1 /* is_valid */); Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Luca Abeni <luca.abeni@unitn.it> Reviewed-by: Juri Lelli <juri.lelli@arm.com> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-dl@retis.sssup.it Link: http://lkml.kernel.org/r/1471184828-12644-4-git-send-email-tommaso.cucinotta@sssup.it Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/deadline: Make CPU heap faster avoiding real swaps on heapifyTommaso Cucinotta
This change goes from heapify() ops done by swapping with parent/child so that the item to fix moves along, to heapify() ops done by just pulling the parent/child chain by 1 pos, then storing the item to fix just at the end. On a non-trivial heapify(), this performs roughly half stores wrt swaps. This has been measured to achieve up to 10% of speed-up for cpudl_set() calls, with a randomly generated workload of 1K,10K,100K random heap insertions and deletions (75% cpudl_set() calls with is_valid=1 and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256 CPUs, as measured on an Intel Core2 Duo. Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Luca Abeni <luca.abeni@unitn.it> Reviewed-by: Juri Lelli <juri.lelli@arm.com> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-dl@retis.sssup.it Link: http://lkml.kernel.org/r/1471184828-12644-3-git-send-email-tommaso.cucinotta@sssup.it Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/deadline: Refactor CPU heap codeTommaso Cucinotta
1. heapify up factored out in new dedicated function heapify_up() (avoids repetition of same code) 2. call to cpudl_change_key() replaced with heapify_up() when cpudl_set actually inserts a new node in the heap 3. cpudl_change_key() replaced with heapify() that heapifies up or down as needed. Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Luca Abeni <luca.abeni@unitn.it> Reviewed-by: Juri Lelli <juri.lelli@arm.com> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-dl@retis.sssup.it Link: http://lkml.kernel.org/r/1471184828-12644-2-git-send-email-tommaso.cucinotta@sssup.it Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-05sched/fair: Make update_min_vruntime() more readableByungchul Park
The update_min_vruntime() control flow can be simplified. Signed-off-by: Byungchul Park <byungchul.park@lge.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: minchan.kim@lge.com Link: http://lkml.kernel.org/r/1436088829-25768-1-git-send-email-byungchul.park@lge.com Signed-off-by: Ingo Molnar <mingo@kernel.org>