Age | Commit message (Collapse) | Author |
|
Support the preempt= boot option and patch the static call sites
accordingly.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-9-frederic@kernel.org
|
|
Provide static calls to control preempt_schedule[_notrace]()
(called in CONFIG_PREEMPT) so that we can override their behaviour when
preempt= is overriden.
Since the default behaviour is full preemption, both their calls are
initialized to the arch provided wrapper, if any.
[fweisbec: only define static calls when PREEMPT_DYNAMIC, make it less
dependent on x86 with __preempt_schedule_func]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-7-frederic@kernel.org
|
|
Provide static calls to control cond_resched() (called in !CONFIG_PREEMPT)
and might_resched() (called in CONFIG_PREEMPT_VOLUNTARY) to that we
can override their behaviour when preempt= is overriden.
Since the default behaviour is full preemption, both their calls are
ignored when preempt= isn't passed.
[fweisbec: branch might_resched() directly to __cond_resched(), only
define static calls when PREEMPT_DYNAMIC]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-6-frederic@kernel.org
|
|
The description of the RT offset and the values for 'normal' tasks needs
update. Moreover there are DL tasks now.
task_prio() has to stay like it is to guarantee compatibility with the
/proc/<pid>/stat priority field:
# cat /proc/<pid>/stat | awk '{ print $18; }'
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-4-dietmar.eggemann@arm.com
|
|
The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.
Commit fe443ef2ac42 ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.
Commit a4ec24b48dde ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.
Commit 3ee237dddcd8 ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.
With:
MAX_USER_PRIO = USER_PRIO(MAX_PRIO)
= MAX_PRIO - MAX_RT_PRIO
MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH
MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO
MAX_USER_PRIO = NICE_WIDTH
MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggemann@arm.com
|
|
Commit d46523ea32a7 ("[PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO")
was introduced due to a a small time period in which the realtime patch
set was using different values for MAX_USER_RT_PRIO and MAX_RT_PRIO.
This is no longer true, i.e. now MAX_RT_PRIO == MAX_USER_RT_PRIO.
Get rid of MAX_USER_RT_PRIO and make everything use MAX_RT_PRIO
instead.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-2-dietmar.eggemann@arm.com
|
|
Commit "sched/topology: Make sched_init_numa() use a set for the
deduplicating sort" allocates 'i + nr_levels (level)' instead of
'i + nr_levels + 1' sched_domain_topology_level.
This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG):
sched_init_domains
build_sched_domains()
__free_domain_allocs()
__sdt_free() {
...
for_each_sd_topology(tl)
...
sd = *per_cpu_ptr(sdd->sd, j); <--
...
}
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a9bf@arm.com
|
|
Reduce rbtree boiler plate by using the new helpers.
Make rb_add_cached() / rb_erase_cached() return a pointer to the
leftmost node to aid in updating additional state.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
|
|
Reduce rbtree boiler plate by using the new helper function.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
|
|
Both select_idle_core() and select_idle_cpu() do a loop over the same
cpumask. Observe that by clearing the already visited CPUs, we can
fold the iteration and iterate a core at a time.
All we need to do is remember any non-idle CPU we encountered while
scanning for an idle core. This way we'll only iterate every CPU once.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210127135203.19633-5-mgorman@techsingularity.net
|
|
In order to make the next patch more readable, and to quantify the
actual effectiveness of this pass, start by removing it.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210125085909.4600-4-mgorman@techsingularity.net
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
addresses to more easily locate bugs. This has a couple of RCU-related commits,
but is mostly MM. Was pulled in with akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks,
which enables better debugging information and smarter
reactions to large numbers of callbacks.
- The first round of changes to allow CPUs to be runtime switched from and to
callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a "torture everything"
script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
Plus does an allmodconfig build.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Safely rescheduling while holding a spin lock is essential for keeping
long running kernel operations running smoothly. Add the facility to
cond_resched rwlocks.
CC: Ingo Molnar <mingo@redhat.com>
CC: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
check and while we are at it, exclude the cost of initialising the CPU
mask from the average scan cost.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210125085909.4600-3-mgorman@techsingularity.net
|
|
SIS_AVG_CPU was introduced as a means of avoiding a search when the
average search cost indicated that the search would likely fail. It was
a blunt instrument and disabled by commit 4c77b18cf8b7 ("sched/fair: Make
select_idle_cpu() more aggressive") and later replaced with a proportional
search depth by commit 1ad3aaf3fcd2 ("sched/core: Implement new approach
to scale select_idle_cpu()").
While there are corner cases where SIS_AVG_CPU is better, it has now been
disabled for almost three years. As the intent of SIS_PROP is to reduce
the time complexity of select_idle_cpu(), lets drop SIS_AVG_CPU and focus
on SIS_PROP as a throttling mechanism.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210125085909.4600-2-mgorman@techsingularity.net
|
|
The deduplicating sort in sched_init_numa() assumes that the first line in
the distance table contains all unique values in the entire table. I've
been trying to pen what this exactly means for the topology, but it's not
straightforward. For instance, topology.c uses this example:
node 0 1 2 3
0: 10 20 20 30
1: 20 10 20 20
2: 20 20 10 20
3: 30 20 20 10
0 ----- 1
| / |
| / |
| / |
2 ----- 3
Which works out just fine. However, if we swap nodes 0 and 1:
1 ----- 0
| / |
| / |
| / |
2 ----- 3
we get this distance table:
node 0 1 2 3
0: 10 20 20 20
1: 20 10 20 30
2: 20 20 10 20
3: 20 30 20 10
Which breaks the deduplicating sort (non-representative first line). In
this case this would just be a renumbering exercise, but it so happens that
we can have a deduplicating sort that goes through the whole table in O(n²)
at the extra cost of a temporary memory allocation (i.e. any form of set).
The ACPI spec (SLIT) mentions distances are encoded on 8 bits. Following
this, implement the set as a 256-bits bitmap. Should this not be
satisfactory (i.e. we want to support 32-bit values), then we'll have to go
for some other sparse set implementation.
This has the added benefit of letting us allocate just the right amount of
memory for sched_domains_numa_distance[], rather than an arbitrary
(nr_node_ids + 1).
Note: DT binding equivalent (distance-map) decodes distances as 32-bit
values.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210122123943.1217-2-valentin.schneider@arm.com
|
|
If the task is pinned to a cpu, setting the misfit status means that
we'll unnecessarily continuously attempt to migrate the task but fail.
This continuous failure will cause the balance_interval to increase to
a high value, and eventually cause unnecessary significant delays in
balancing the system when real imbalance happens.
Caught while testing uclamp where rt-app calibration loop was pinned to
cpu 0, shortly after which we spawn another task with high util_clamp
value. The task was failing to migrate after over 40ms of runtime due to
balance_interval unnecessary expanded to a very high value from the
calibration loop.
Not done here, but it could be useful to extend the check for pinning to
verify that the affinity of the task has a cpu that fits. We could end
up in a similar situation otherwise.
Fixes: 3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Acked-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210119120755.2425264-1-qais.yousef@arm.com
|
|
'kfree_rcu.2021.01.04a', 'mmdumpobj.2021.01.22a', 'nocb.2021.01.06a', 'rt.2021.01.04a', 'stall.2021.01.06a', 'torture.2021.01.12a' and 'tortureall.2021.01.06a' into HEAD
doc.2021.01.06a: Documentation updates.
fixes.2021.01.04b: Miscellaneous fixes.
kfree_rcu.2021.01.04a: kfree_rcu() updates.
mmdumpobj.2021.01.22a: Dump allocation point for memory blocks.
nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths.
rt.2021.01.04a: Real-time updates.
stall.2021.01.06a: RCU CPU stall warning updates.
torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API.
tortureall.2021.01.06a: Torture-test script updates.
|
|
Now that we have KTHREAD_IS_PER_CPU to denote the critical per-cpu
tasks to retain during CPU offline, we can relax the warning in
set_cpus_allowed_ptr(). Any spurious kthread that wants to get on at
the last minute will get pushed off before it can run.
While during CPU online there is no harm, and actual benefit, to
allowing kthreads back on early, it simplifies hotplug code and fixes
a number of outstanding races.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lai jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.240724591@infradead.org
|
|
Prior to commit 1cf12e08bc4d ("sched/hotplug: Consolidate task
migration on CPU unplug") we'd leave any task on the dying CPU and
break affinity and force them off at the very end.
This scheme had to change in order to enable migrate_disable(). One
cannot wait for migrate_disable() to complete while stuck in
stop_machine(). Furthermore, since we need at the very least: idle,
hotplug and stop threads at any point before stop_machine, we can't
break affinity and/or push those away.
Under the assumption that all per-cpu kthreads are sanely handled by
CPU hotplug, the new code no long breaks affinity or migrates any of
them (which then includes the critical ones above).
However, there's an important difference between per-cpu kthreads and
kthreads that happen to have a single CPU affinity which is lost. The
latter class very much relies on the forced affinity breaking and
migration semantics previously provided.
Use the new kthread_is_per_cpu() infrastructure to tighten
is_per_cpu_kthread() and fix the hot-unplug problems stemming from the
change.
Fixes: 1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.102416009@infradead.org
|
|
In preparation of using the balance_push state in ttwu() we need it to
provide a reliable and consistent state.
The immediate problem is that rq->balance_callback gets cleared every
schedule() and then re-set in the balance_push_callback() itself. This
is not a reliable signal, so add a variable that stays set during the
entire time.
Also move setting it before the synchronize_rcu() in
sched_cpu_deactivate(), such that we get guaranteed visibility to
ttwu(), which is a preempt-disable region.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.966069627@infradead.org
|
|
We don't need to push away tasks when we come online, mark the push
complete right before the CPU dies.
XXX hotplug state machine has trouble with rollback here.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.415606087@infradead.org
|
|
Since commit
1cf12e08bc4d ("sched/hotplug: Consolidate task migration on CPU unplug")
tasks are expected to move themselves out of a out-going CPU. For most
tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
will have to cooperate and move themselves away one way or another.
Currently, some percpu kthreads (workqueues being a notable exemple) do not
cooperate nicely and can end up on an out-going CPU at the time
sched_cpu_dying() is invoked.
Print the dying rq's tasks to shed some light on the stragglers.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210113183141.11974-1-valentin.schneider@arm.com
|
|
Use the task_current() function where appropriate.
No functional change.
Signed-off-by: Hui Su <sh_def@163.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201030173223.GA52339@rlk
|
|
Active balance is triggered for a number of voluntary cases like misfit
or pinned tasks cases but also after that a number of load balance
attempts failed to migrate a task. There is no need to use active load
balance when the group is overloaded because an overloaded state means
that there is at least one waiting task. Nevertheless, the waiting task
is not selected and detached until the threshold becomes higher than its
load. This threshold increases with the number of failed lb (see the
condition if ((load >> env->sd->nr_balance_failed) > env->imbalance) in
detach_tasks()) and the waiting task will end up to be selected after a
number of attempts.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210107103325.30851-4-vincent.guittot@linaro.org
|
|
Setting LBF_ALL_PINNED during active load balance is only valid when there
is only 1 running task on the rq otherwise this ends up increasing the
balance interval whereas other tasks could migrate after the next interval
once they become cache-cold as an example.
LBF_ALL_PINNED flag is now always set it by default. It is then cleared
when we find one task that can be pulled when calling detach_tasks() or
during active migration.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210107103325.30851-3-vincent.guittot@linaro.org
|
|
Don't waste time checking whether an idle cfs_rq could be the busiest
queue. Furthermore, this can end up selecting a cfs_rq with a high load
but being idle in case of migrate_load.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lkml.kernel.org/r/20210107103325.30851-2-vincent.guittot@linaro.org
|
|
CPU (root cfs_rq) estimated utilization (util_est) is currently used in
dequeue_task_fair() to drive frequency selection before it is updated.
with:
CPU_util : rq->cfs.avg.util_avg
CPU_util_est : rq->cfs.avg.util_est
CPU_utilization : max(CPU_util, CPU_util_est)
task_util : p->se.avg.util_avg
task_util_est : p->se.avg.util_est
dequeue_task_fair():
/* (1) CPU_util and task_util update + inform schedutil about
CPU_utilization changes */
for_each_sched_entity() /* 2 loops */
(dequeue_entity() ->) update_load_avg() -> cfs_rq_util_change()
-> cpufreq_update_util() ->...-> sugov_update_[shared\|single]
-> sugov_get_util() -> cpu_util_cfs()
/* (2) CPU_util_est and task_util_est update */
util_est_dequeue()
cpu_util_cfs() uses CPU_utilization which could lead to a false (too
high) utilization value for schedutil in task ramp-down or ramp-up
scenarios during task dequeue.
To mitigate the issue split the util_est update (2) into:
(A) CPU_util_est update in util_est_dequeue()
(B) task_util_est update in util_est_update()
Place (A) before (1) and keep (B) where (2) is. The latter is necessary
since (B) relies on task_util update in (1).
Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/1608283672-18240-1-git-send-email-xuewen.yan94@gmail.com
|
|
SCHED_SOFTIRQ is raised to trigger periodic load balancing. When CPU is not
active, CPU should not participate in load balancing.
The scheduler uses nohz.idle_cpus_mask to keep track of the CPUs which can
do idle load balancing. When bringing a CPU up the CPU is added to the mask
when it reaches the active state, but on teardown the CPU stays in the mask
until it goes offline and invokes sched_cpu_dying().
When SCHED_SOFTIRQ is raised on a !active CPU, there might be a pending
softirq when stopping the tick which triggers a warning in NOHZ code. The
SCHED_SOFTIRQ can also be raised by the scheduler tick which has the same
issue.
Therefore remove the CPU from nohz.idle_cpus_mask when it is marked
inactive and also prevent the scheduler_tick() from raising SCHED_SOFTIRQ
after this point.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201215104400.9435-1-anna-maria@linutronix.de
|
|
There is nothing schedutil specific in schedutil_cpu_util(), rename it
to effective_cpu_util(). Also create and expose another wrapper
sched_cpu_util() which can be used by other parts of the kernel, like
thermal core (that will be done in a later commit).
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/db011961fb3bb8bef1c0eda5cd64564637d3ef31.1607400596.git.viresh.kumar@linaro.org
|
|
There is nothing schedutil specific in schedutil_cpu_util(), move it to
core.c and define it only for CONFIG_SMP.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/c921a362c78e1324f8ebc5aaa12f53e309c5a8a2.1607400596.git.viresh.kumar@linaro.org
|
|
The try_invoke_on_locked_down_task() function currently requires
that interrupts be enabled, but it is called with interrupts
disabled from rcu_print_task_stall(), resulting in an "IRQs not
enabled as expected" diagnostic. This commit therefore updates
try_invoke_on_locked_down_task() to use raw_spin_lock_irqsave() instead
of raw_spin_lock_irq(), thus allowing use from either context.
Link: https://lore.kernel.org/lkml/000000000000903d5805ab908fc4@google.com/
Link: https://lore.kernel.org/lkml/20200928075729.GC2611@hirez.programming.kicks-ass.net/
Reported-by: syzbot+cb3b69ae80afd6535b0e@syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
"Fix a context switch performance regression"
* tag 'sched-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Optimize finish_lock_switch()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update the CPPC cpufreq driver and intel_pstate (which involves
updating the cpufreq core and the schedutil governor) and make
janitorial changes in the ACPI code handling processor objects.
Specifics:
- Rework the passive-mode "fast switch" path in the intel_pstate
driver to allow it receive the minimum (required) and target
(desired) performance information from the schedutil governor so as
to avoid running some workloads too fast (Rafael Wysocki).
- Make the intel_pstate driver allow the policy max limit to be
increased after the guaranteed performance value for the given CPU
has increased (Rafael Wysocki).
- Clean up the handling of CPU coordination types in the CPPC cpufreq
driver and make it export frequency domains information to user
space via sysfs (Ionela Voinescu).
- Fix the ACPI code handling processor objects to use a correct
coordination type when it fails to map frequency domains and drop a
redundant CPU map initialization from it (Ionela Voinescu, Punit
Agrawal)"
* tag 'pm-5.11-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Use most recent guaranteed performance values
cpufreq: intel_pstate: Implement the ->adjust_perf() callback
cpufreq: Add special-purpose fast-switching callback for drivers
cpufreq: schedutil: Add util to struct sg_cpu
cppc_cpufreq: replace per-cpu data array with a list
cppc_cpufreq: expose information on frequency domains
cppc_cpufreq: clarify support for coordination types
cppc_cpufreq: use policy->cpu as driver of frequency setting
ACPI: processor: fix NONE coordination for domain mapping failure
|
|
* pm-cpufreq:
cpufreq: intel_pstate: Use most recent guaranteed performance values
cpufreq: intel_pstate: Implement the ->adjust_perf() callback
cpufreq: Add special-purpose fast-switching callback for drivers
cpufreq: schedutil: Add util to struct sg_cpu
cppc_cpufreq: replace per-cpu data array with a list
cppc_cpufreq: expose information on frequency domains
cppc_cpufreq: clarify support for coordination types
cppc_cpufreq: use policy->cpu as driver of frequency setting
ACPI: processor: fix NONE coordination for domain mapping failure
ACPI: processor: Drop duplicate setting of shared_cpu_map
|
|
Pull KVM updates from Paolo Bonzini:
"Much x86 work was pushed out to 5.12, but ARM more than made up for it.
ARM:
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
s390:
- memcg accouting for s390 specific parts of kvm and gmap
- selftest for diag318
- new kvm_stat for when async_pf falls back to sync
x86:
- Tracepoints for the new pagetable code from 5.10
- Catch VFIO and KVM irqfd events before userspace
- Reporting dirty pages to userspace with a ring buffer
- SEV-ES host support
- Nested VMX support for wait-for-SIPI activity state
- New feature flag (AVX512 FP16)
- New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
- Selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: SVM: fix 32-bit compilation
KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
KVM: SVM: Provide support to launch and run an SEV-ES guest
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Create trace events for VMGEXIT processing
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update cpufreq (core and drivers), cpuidle (polling state
implementation and the PSCI driver), the OPP (operating performance
points) framework, devfreq (core and drivers), the power capping RAPL
(Running Average Power Limit) driver, the Energy Model support, the
generic power domains (genpd) framework, the ACPI device power
management, the core system-wide suspend code and power management
utilities.
Specifics:
- Use local_clock() instead of jiffies in the cpufreq statistics to
improve accuracy (Viresh Kumar).
- Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
drivers (Viresh Kumar).
- Clean up the cpufreq core, the intel_pstate driver and the
schedutil cpufreq governor (Rafael Wysocki).
- Fix up error code paths in the sti-cpufreq and mediatek cpufreq
drivers (Yangtao Li, Qinglang Miao).
- Fix cpufreq_online() to return error codes instead of success (0)
in all cases when it fails (Wang ShaoBo).
- Add mt8167 support to the mediatek cpufreq driver and blacklist
mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
- Modify the tegra194 cpufreq driver to always return values from the
frequency table as the current frequency and clean up that driver
(Sumit Gupta, Jon Hunter).
- Modify the arm_scmi cpufreq driver to allow it to discover the
power scale present in the performance protocol and provide this
information to the Energy Model (Lukasz Luba).
- Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
Rohár).
- Clean up the CPPC cpufreq driver (Ionela Voinescu).
- Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
Bergmann).
- Rework the poling interval selection for the polling state in
cpuidle (Mel Gorman).
- Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
(Ulf Hansson).
- Modify the OPP framework to support empty (node-less) OPP tables in
DT for passing dependency information (Nicola Mazzucato).
- Fix potential lockdep issue in the OPP core and clean up the OPP
core (Viresh Kumar).
- Modify dev_pm_opp_put_regulators() to accept a NULL argument and
update its users accordingly (Viresh Kumar).
- Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
- Add support for governor feature flags to devfreq, make devfreq
sysfs file permissions depend on the governor and clean up the
devfreq core (Chanwoo Choi).
- Clean up the tegra20 devfreq driver and deprecate it to allow
another driver based on EMC_STAT to be used instead of it (Dmitry
Osipenko).
- Add interconnect support to the tegra30 devfreq driver, allow it to
take the interconnect and OPP information from DT and clean it up
(Dmitry Osipenko).
- Add interconnect support to the exynos-bus devfreq driver along
with interconnect properties documentation (Sylwester Nawrocki).
- Add suport for AMD Fam17h and Fam19h processors to the RAPL power
capping driver (Victor Ding, Kim Phillips).
- Fix handling of overly long constraint names in the powercap
framework (Lukasz Luba).
- Fix the wakeup configuration handling for bridges in the ACPI
device power management core (Rafael Wysocki).
- Add support for using an abstract scale for power units in the
Energy Model (EM) and document it (Lukasz Luba).
- Add em_cpu_energy() micro-optimization to the EM (Pavankumar
Kondeti).
- Modify the generic power domains (genpd) framwework to support
suspend-to-idle (Ulf Hansson).
- Fix creation of debugfs nodes in genpd (Thierry Strudel).
- Clean up genpd (Lina Iyer).
- Clean up the core system-wide suspend code and make it print driver
flags for devices with debug enabled (Alex Shi, Patrice Chotard,
Chen Yu).
- Modify the ACPI system reboot code to make it prepare for system
power off to avoid confusing the platform firmware (Kai-Heng Feng).
- Update the pm-graph (multiple changes, mostly usability-related)
and cpupower (online and offline CPU information support) PM
utilities (Todd Brandt, Brahadambal Srinivasan)"
* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
cpufreq: Fix cpufreq_online() return value on errors
cpufreq: Fix up several kerneldoc comments
cpufreq: stats: Use local_clock() instead of jiffies
cpufreq: schedutil: Simplify sugov_update_next_freq()
cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
PM: domains: create debugfs nodes when adding power domains
opp: of: Allow empty opp-table with opp-shared
dt-bindings: opp: Allow empty OPP tables
media: venus: dev_pm_opp_put_*() accepts NULL argument
drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
drm/lima: dev_pm_opp_put_*() accepts NULL argument
PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
opp: Reduce the size of critical section in _opp_kref_release()
PM / EM: Micro optimization in em_cpu_energy
cpufreq: arm_scmi: Discover the power scale in performance protocol
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Generic interrupt and irqchips subsystem updates. Unusually, there is
not a single completely new irq chip driver, just new DT bindings and
extensions of existing drivers to accomodate new variants!
Core:
- Consolidation and robustness changes for irq time accounting
- Cleanup and consolidation of irq stats
- Remove the fasteoi IPI flow which has been proved useless
- Provide an interface for converting legacy interrupt mechanism into
irqdomains
Drivers:
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM
optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups"
* tag 'irq-core-2020-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
irqchip/qcom-pdc: Fix phantom irq when changing between rising/falling
driver core: platform: Add devm_platform_get_irqs_affinity()
ACPI: Drop acpi_dev_irqresource_disabled()
resource: Add irqresource_disabled()
genirq/affinity: Add irq_update_affinity_desc()
irqchip/gic-v3-its: Flag device allocation as proxied if behind a PCI bridge
irqchip/gic-v3-its: Tag ITS device as shared if allocating for a proxy device
platform-msi: Track shared domain allocation
irqchip/ti-sci-intr: Fix freeing of irqs
irqchip/ti-sci-inta: Fix printing of inta id on probe success
drivers/irqchip: Remove EZChip NPS interrupt controller
Revert "genirq: Add fasteoi IPI flow"
irqchip/hip04: Make IPIs use handle_percpu_devid_irq()
irqchip/bcm2836: Make IPIs use handle_percpu_devid_irq()
irqchip/armada-370-xp: Make IPIs use handle_percpu_devid_irq()
irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq()
irqchip/ocelot: Add support for Jaguar2 platforms
irqchip/ocelot: Add support for Serval platforms
irqchip/ocelot: Add support for Luton platforms
irqchip/ocelot: prepare to support more SoC
...
|
|
First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework. In particular,
today the driver has to assume that it should not allow the frequency
to fall below the one requested by the governor (or the required
capacity may not be provided) which may not be the case and which may
lead to excessive energy usage in some scenarios.
Second, the hints passed by these drivers to the hardware need not be
in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.
Address the two points above by adding a special-purpose replacement
for the ->fast_switch callback, called ->adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the CPU capacity to compare them to.
Also update the schedutil governor to use the new callback instead
of ->fast_switch if present and if the utilization mertics are
frequency-invariant (that is requisite for the direct mapping
between the utilization and the CPU performance levels to be a
reasonable approximation).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Instead of passing util and max between functions while computing the
utilization and capacity, store the former in struct sg_cpu (along
with the latter and bw_dl).
This will allow the current utilization value to be compared with the
one obtained previously (which is requisite for some code changes to
follow this one), but also it causes the code to look slightly more
consistent and cleaner.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.11
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
|
|
* pm-cpufreq: (31 commits)
cpufreq: Fix cpufreq_online() return value on errors
cpufreq: Fix up several kerneldoc comments
cpufreq: stats: Use local_clock() instead of jiffies
cpufreq: schedutil: Simplify sugov_update_next_freq()
cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
cpufreq: arm_scmi: Discover the power scale in performance protocol
firmware: arm_scmi: Add power_scale_mw_get() interface
cpufreq: tegra194: Rename tegra194_get_speed_common function
cpufreq: tegra194: Remove unnecessary frequency calculation
cpufreq: tegra186: Simplify cluster information lookup
cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning
cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
cpufreq: vexpress-spc: Add missing MODULE_ALIAS
cpufreq: scpi: Add missing MODULE_ALIAS
cpufreq: loongson1: Add missing MODULE_ALIAS
cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
cpufreq: st: Add missing MODULE_DEVICE_TABLE
cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
...
|
|
The kernel test robot measured a -1.6% performance regression on
will-it-scale/sched_yield due to commit:
2558aacff858 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
Even though we were careful to replace a single load with another
single load from the same cacheline.
Restore finish_lock_switch() to the exact state before the offending
patch and solve the problem differently.
Fixes: 2558aacff858 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201210161408.GX3021@hirez.programming.kicks-ass.net
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates for 5.11 from Marc Zyngier:
- Preliminary support for managed interrupts on platform devices
- Correctly identify allocation of MSIs proxyied by another device
- Remove the fasteoi IPI flow which has been proved useless
- Generalise the Ocelot support to new SoCs
- Improve GICv4.1 vcpu entry, matching the corresponding KVM optimisation
- Work around spurious interrupts on Qualcomm PDC
- Random fixes and cleanups
Link: https://lore.kernel.org/r/20201212135626.1479884-1-maz@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kmap updates from Thomas Gleixner:
"The new preemtible kmap_local() implementation:
- Consolidate all kmap_atomic() internals into a generic
implementation which builds the base for the kmap_local() API and
make the kmap_atomic() interface wrappers which handle the
disabling/enabling of preemption and pagefaults.
- Switch the storage from per-CPU to per task and provide scheduler
support for clearing mapping when scheduling out and restoring them
when scheduling back in.
- Merge the migrate_disable/enable() code, which is also part of the
scheduler pull request. This was required to make the kmap_local()
interface available which does not disable preemption when a
mapping is established. It has to disable migration instead to
guarantee that the virtual address of the mapped slot is the same
across preemption.
- Provide better debug facilities: guard pages and enforced
utilization of the mapping mechanics on 64bit systems when the
architecture allows it.
- Provide the new kmap_local() API which can now be used to cleanup
the kmap_atomic() usage sites all over the place. Most of the usage
sites do not require the implicit disabling of preemption and
pagefaults so the penalty on 64bit and 32bit non-highmem systems is
removed and quite some of the code can be simplified. A wholesale
conversion is not possible because some usage depends on the
implicit side effects and some need to be cleaned up because they
work around these side effects.
The migrate disable side effect is only effective on highmem
systems and when enforced debugging is enabled. On 64bit and 32bit
non-highmem systems the overhead is completely avoided"
* tag 'core-mm-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
ARM: highmem: Fix cache_is_vivt() reference
x86/crashdump/32: Simplify copy_oldmem_page()
io-mapping: Provide iomap_local variant
mm/highmem: Provide kmap_local*
sched: highmem: Store local kmaps in task struct
x86: Support kmap_local() forced debugging
mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP
mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL
microblaze/mm/highmem: Add dropped #ifdef back
xtensa/mm/highmem: Make generic kmap_atomic() work correctly
mm/highmem: Take kmap_high_get() properly into account
highmem: High implementation details and document API
Documentation/io-mapping: Remove outdated blurb
io-mapping: Cleanup atomic iomap
mm/highmem: Remove the old kmap_atomic cruft
highmem: Get rid of kmap_types.h
xtensa/mm/highmem: Switch to generic kmap atomic
sparc/mm/highmem: Switch to generic kmap atomic
powerpc/mm/highmem: Switch to generic kmap atomic
nds32/mm/highmem: Switch to generic kmap atomic
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry/exit updates from Thomas Gleixner:
"A set of updates for entry/exit handling:
- More generalization of entry/exit functionality
- The consolidation work to reclaim TIF flags on x86 and also for
non-x86 specific TIF flags which are solely relevant for syscall
related work and have been moved into their own storage space. The
x86 specific part had to be merged in to avoid a major conflict.
- The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
delivery mode of task work and results in an impressive performance
improvement for io_uring. The non-x86 consolidation of this is
going to come seperate via Jens.
- The selective syscall redirection facility which provides a clean
and efficient way to support the non-Linux syscalls of WINE by
catching them at syscall entry and redirecting them to the user
space emulation. This can be utilized for other purposes as well
and has been designed carefully to avoid overhead for the regular
fastpath. This includes the core changes and the x86 support code.
- Simplification of the context tracking entry/exit handling for the
users of the generic entry code which guarantee the proper ordering
and protection.
- Preparatory changes to make the generic entry code accomodate S390
specific requirements which are mostly related to their syscall
restart mechanism"
* tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
entry: Add syscall_exit_to_user_mode_work()
entry: Add exit_to_user_mode() wrapper
entry_Add_enter_from_user_mode_wrapper
entry: Rename exit_to_user_mode()
entry: Rename enter_from_user_mode()
docs: Document Syscall User Dispatch
selftests: Add benchmark for syscall user dispatch
selftests: Add kselftest for syscall user dispatch
entry: Support Syscall User Dispatch on common syscall entry
kernel: Implement selective syscall userspace redirection
signal: Expose SYS_USER_DISPATCH si_code type
x86: vdso: Expose sigreturn address on vdso to the kernel
MAINTAINERS: Add entry for common entry code
entry: Fix boot for !CONFIG_GENERIC_ENTRY
x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs
sched: Detect call to schedule from critical entry code
context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK
x86: Reclaim unused x86 TI flags
...
|
|
Rearrange a conditional to make it more straightforward.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
idle_balance() has been renamed to newidle_balance(). To differentiate
with nohz_idle_balance, it seems refining the comment will be helpful
for the readers of the code.
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20201202220641.22752-1-song.bao.hua@hisilicon.com
|