Age | Commit message (Collapse) | Author |
|
The atomic_long_cmpxchg_acquire() in rwsem_try_read_lock_unqueued() is
replaced by atomic_long_try_cmpxchg_acquire() to simpify the code and
generate slightly better assembly code.
There is no functional change.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20190404174320.22416-5-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
We don't need to expose rwsem internal functions which are not supposed
to be called directly from other kernel code.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20190404174320.22416-4-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Move all the owner setting code closer to the rwsem-xadd fast paths
directly within rwsem.h file as well as in the slowpaths where owner
setting is done after acquring the lock. This will enable us to add
DEBUG_RWSEMS check in a later patch to make sure that read lock is
really acquired when rwsem_down_read_failed() returns, for instance.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20190404174320.22416-3-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The rwsem_down_read_failed*() functions were relocated from above the
optimistic spinning section to below that section. This enables the
reader functions to use optimisitic spinning in future patches. There
is no code change.
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/20190404174320.22416-2-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
While staring at build_sched_domains(), I realized that get_group()
does several duplicate (thus useless) writes.
If you take the Arm Juno r0 (LITTLEs = [0, 3, 4, 5], bigs = [1, 2]), the
sched_group build flow would look like this:
('MC[cpu]->sg' means 'per_cpu_ptr(&tl->data->sg, cpu)' with 'tl == MC')
build_sched_groups(MC[CPU0]->sd, CPU0)
get_group(0) -> MC[CPU0]->sg
get_group(3) -> MC[CPU3]->sg
get_group(4) -> MC[CPU4]->sg
get_group(5) -> MC[CPU5]->sg
build_sched_groups(DIE[CPU0]->sd, CPU0)
get_group(0) -> DIE[CPU0]->sg
get_group(1) -> DIE[CPU1]->sg <=================+
|
build_sched_groups(MC[CPU1]->sd, CPU1) |
get_group(1) -> MC[CPU1]->sg |
get_group(2) -> MC[CPU2]->sg |
|
build_sched_groups(DIE[CPU1]->sd, CPU1) ^
get_group(1) -> DIE[CPU1]->sg } We've set up these two up here!
get_group(3) -> DIE[CPU0]->sg }
From this point on, we will only use sched_groups that have been
previously visited & initialized. The only new operation will
be which group pointer we affect to sd->groups.
On the Juno r0 we get 32 get_group() calls, every single one of them
writing to a sched_group->cpumask. However, all of the data structures
we need are set up after 8 visits (see above).
Return early from get_group() if we've already visited (and thus
initialized) the sched_group we're looking at. Overlapping domains
are not affected as they do not use build_sched_groups().
Tested on a Juno and a 2 * (Xeon E5-2690) system.
( FWIW I initially checked the refs for both sg && sg->sgc, but figured if
they weren't both 0 or > 1 then something must have gone wrong, so I
threw in a WARN_ON(). )
No change in functionality intended.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The comment was introduced (pre 2.6.12) by:
8a7a2318dc07 ("[PATCH] sched: consolidate sched domains")
and referred to sched_group->cpu_power. This was folded into
sched_group->sched_group_power in
commit 9c3f75cbd144 ("sched: Break out cpu_power from the sched_group structure")
The comment was then updated in:
ced549fa5fc1 ("sched: Remove remaining dubious usage of "power"")
but should have replaced "sg->cpu_capacity" with
"sg->sched_group_capacity". Do that now.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: qais.yousef@arm.com
Link: http://lkml.kernel.org/r/20190409173546.4747-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Given we'll be reusing BPF array maps for global data/bss/rodata
sections, we need a way to associate BTF DataSec type as its map
value type. In usual cases we have this ugly BPF_ANNOTATE_KV_PAIR()
macro hack e.g. via 38d5d3b3d5db ("bpf: Introduce BPF_ANNOTATE_KV_PAIR")
to get initial map to type association going. While more use cases
for it are discouraged, this also won't work for global data since
the use of array map is a BPF loader detail and therefore unknown
at compilation time. For array maps with just a single entry we make
an exception in terms of BTF in that key type is declared optional
if value type is of DataSec type. The latter LLVM is guaranteed to
emit and it also aligns with how we regard global data maps as just
a plain buffer area reusing existing map facilities for allowing
things like introspection with existing tools.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This work adds kernel-side verification, logging and seq_show dumping
of BTF Var and DataSec kinds which are emitted with latest LLVM. The
following constraints apply:
BTF Var must have:
- Its kind_flag is 0
- Its vlen is 0
- Must point to a valid type
- Type must not resolve to a forward type
- Size of underlying type must be > 0
- Must have a valid name
- Can only be a source type, not sink or intermediate one
- Name may include dots (e.g. in case of static variables
inside functions)
- Cannot be a member of a struct/union
- Linkage so far can either only be static or global/allocated
BTF DataSec must have:
- Its kind_flag is 0
- Its vlen cannot be 0
- Its size cannot be 0
- Must have a valid name
- Can only be a source type, not sink or intermediate one
- Name may include dots (e.g. to represent .bss, .data, .rodata etc)
- Cannot be a member of a struct/union
- Inner btf_var_secinfo array with {type,offset,size} triple
must be sorted by offset in ascending order
- Type must always point to BTF Var
- BTF resolved size of Var must be <= size provided by triple
- DataSec size must be >= sum of triple sizes (thus holes
are allowed)
btf_var_resolve(), btf_ptr_resolve() and btf_modifier_resolve()
are on a high level quite similar but each come with slight,
subtle differences. They could potentially be a bit refactored
in future which hasn't been done here to ease review.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Trivial addition to allow '.' aside from '_' as "special" characters
in the object name. Used to allow for substrings in maps from loader
side such as ".bss", ".data", ".rodata", but could also be useful for
other purposes.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch adds a new BPF_MAP_FREEZE command which allows to
"freeze" the map globally as read-only / immutable from syscall
side.
Map permission handling has been refactored into map_get_sys_perms()
and drops FMODE_CAN_WRITE in case of locked map. Main use case is
to allow for setting up .rodata sections from the BPF ELF which
are loaded into the kernel, meaning BPF loader first allocates
map, sets up map value by copying .rodata section into it and once
complete, it calls BPF_MAP_FREEZE on the map fd to prevent further
modifications.
Right now BPF_MAP_FREEZE only takes map fd as argument while remaining
bpf_attr members are required to be zero. I didn't add write-only
locking here as counterpart since I don't have a concrete use-case
for it on my side, and I think it makes probably more sense to wait
once there is actually one. In that case bpf_attr can be extended
as usual with a flag field and/or others where flag 0 means that
we lock the map read-only hence this doesn't prevent to add further
extensions to BPF_MAP_FREEZE upon need.
A map creation flag like BPF_F_WRONCE was not considered for couple
of reasons: i) in case of a generic implementation, a map can consist
of more than just one element, thus there could be multiple map
updates needed to set the map into a state where it can then be
made immutable, ii) WRONCE indicates exact one-time write before
it is then set immutable. A generic implementation would set a bit
atomically on map update entry (if unset), indicating that every
subsequent update from then onwards will need to bail out there.
However, map updates can fail, so upon failure that flag would need
to be unset again and the update attempt would need to be repeated
for it to be eventually made immutable. While this can be made
race-free, this approach feels less clean and in combination with
reason i), it's not generic enough. A dedicated BPF_MAP_FREEZE
command directly sets the flag and caller has the guarantee that
map is immutable from syscall side upon successful return for any
future syscall invocations that would alter the map state, which
is also more intuitive from an API point of view. A command name
such as BPF_MAP_LOCK has been avoided as it's too close with BPF
map spin locks (which already has BPF_F_LOCK flag). BPF_MAP_FREEZE
is so far only enabled for privileged users.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This work adds two new map creation flags BPF_F_RDONLY_PROG
and BPF_F_WRONLY_PROG in order to allow for read-only or
write-only BPF maps from a BPF program side.
Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only
applies to system call side, meaning the BPF program has full
read/write access to the map as usual while bpf(2) calls with
map fd can either only read or write into the map depending
on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows
for the exact opposite such that verifier is going to reject
program loads if write into a read-only map or a read into a
write-only map is detected. For read-only map case also some
helpers are forbidden for programs that would alter the map
state such as map deletion, update, etc. As opposed to the two
BPF_F_RDONLY / BPF_F_WRONLY flags, BPF_F_RDONLY_PROG as well
as BPF_F_WRONLY_PROG really do correspond to the map lifetime.
We've enabled this generic map extension to various non-special
maps holding normal user data: array, hash, lru, lpm, local
storage, queue and stack. Further generic map types could be
followed up in future depending on use-case. Main use case
here is to forbid writes into .rodata map values from verifier
side.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Both BPF_F_WRONLY / BPF_F_RDONLY flags are tied to the map file
descriptor, but not to the map object itself! Meaning, at map
creation time BPF_F_RDONLY can be set to make the map read-only
from syscall side, but this holds only for the returned fd, so
any other fd either retrieved via bpf file system or via map id
for the very same underlying map object can have read-write access
instead.
Given that, keeping the two flags around in the map_flags attribute
and exposing them to user space upon map dump is misleading and
may lead to false conclusions. Since these two flags are not
tied to the map object lets also not store them as map property.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value as a single BPF
ldimm64 instruction!
The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
is a special src_reg flag for ldimm64 instruction that indicates
that inside the first part of the double insns's imm field is a
file descriptor which the verifier then replaces as a full 64bit
address of the map into both imm parts. For the newly added
BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
the first part of the double insns's imm field is again a file
descriptor corresponding to the map, and the second part of the
imm field is an offset into the value. The verifier will then
replace both imm parts with an address that points into the BPF
map value at the given value offset for maps that support this
operation. Currently supported is array map with single entry.
It is possible to support more than just single map element by
reusing both 16bit off fields of the insns as a map index, so
full array map lookup could be expressed that way. It hasn't
been implemented here due to lack of concrete use case, but
could easily be done so in future in a compatible way, since
both off fields right now have to be 0 and would correctly
denote a map index 0.
The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with
BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of
map pointer versus load of map's value at offset 0, and changing
BPF_PSEUDO_MAP_FD's encoding into off by one to differ between
regular map pointer and map value pointer would add unnecessary
complexity and increases barrier for debugability thus less
suitable. Using the second part of the imm field as an offset
into the value does /not/ come with limitations since maximum
possible value size is in u32 universe anyway.
This optimization allows for efficiently retrieving an address
to a map value memory area without having to issue a helper call
which needs to prepare registers according to calling convention,
etc, without needing the extra NULL test, and without having to
add the offset in an additional instruction to the value base
pointer. The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg->off from the user passed
offset from the second imm field, and guarantees that this is
within bounds of the map value. Any subsequent operations are
normally treated as typical map value handling without anything
extra needed from verification side.
The two map operations for direct value access have been added to
array map for now. In future other types could be supported as
well depending on the use case. The main use case for this commit
is to allow for BPF loader support for global variables that
reside in .data/.rodata/.bss sections such that we can directly
load the address of them with minimal additional infrastructure
required. Loader support has been added in subsequent commits for
libbpf library.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
To be able to predict the sleep duration for a CPU entering idle, it
is essential to know the expiration time of the next timer. Both the
teo and the menu cpuidle governors already use this information for
CPU idle state selection.
Moving forward, a similar prediction needs to be made for a group of
idle CPUs rather than for a single one and the following changes
implement a new genpd governor for that purpose.
In order to support that feature, add a new function called
tick_nohz_get_next_hrtimer() that will return the next hrtimer
expiration time of a given CPU to be invoked after deciding
whether or not to stop the scheduler tick on that CPU.
Make the cpuidle core call tick_nohz_get_next_hrtimer() right
before invoking the ->enter() callback provided by the cpuidle
driver for the given state and store its return value in the
per-CPU struct cpuidle_device, so as to make it available to code
outside of cpuidle.
Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(),
the governor's ->select() callback has already returned and indicated
whether or not the tick should be stopped, so in fact the value
returned by tick_nohz_get_next_hrtimer() always is the next hrtimer
expiration time for the given CPU, possibly including the tick (if
it hasn't been stopped).
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD
consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups.
doc.2019.03.26b: Documentation updates.
fixes.2019.03.26b: Miscellaneous fixes.
srcu.2019.03.26b: SRCU updates.
stall.2019.03.26b: RCU CPU stall warning updates.
torture.2019.03.26b: Torture-test updates.
|
|
%pF and %pf are functionally equivalent to %pS and %ps conversion
specifiers. The former are deprecated, therefore switch the current users
to use the preferred variant.
The changes have been produced by the following command:
git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done
And verifying the result.
Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: drbd-dev@lists.linbit.com
Cc: linux-block@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-pci@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: linux-mm@kvack.org
Cc: ceph-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: David Sterba <dsterba@suse.com> (for btrfs)
Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c)
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
|
|
The original conditions that led to the use of list_empty() to optimize
list_for_each_entry_rcu() in auditfilter.c and auditsc.c code have been
removed without removing the list_empty() call, but this code example
has been copied several times. Remove the unnecessary list_empty()
calls.
Please see upstream github issue
https://github.com/linux-audit/audit-kernel/issues/112
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
there is a similar helper in net/netfilter/nf_tables_api.c,
this maybe become a common request someday, so move it to
time.c
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
There is no need for sched_rcu. The undocumented reason why sched_rcu
is used is to avoid a few explicit rcu_read_lock()/unlock() pairs by
the fact that sched_rcu reader side critical sections are also protected
by preempt or irq disabled regions.
Replace rcu_read_lock_sched with rcu_read_lock and acquire the RCU lock
where it is not yet explicit acquired. Replace local_irq_disable() with
rcu_read_lock(). Update asserts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: mangle changelog a little]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Now that we removed support for the NULL device argument in the DMA API,
there is no need to cater for that in the x86 code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Most dma_map_ops implementations already had some issues with a NULL
device, or did simply crash if one was fed to them. Now that we have
cleaned up all the obvious offenders we can stop to pretend we
support this mode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When DMA_REMAP is enabled, code in remap.c needs generic allocator.
It currently worked since few architectures uses it (arm64, csky) and
they both select GENERIC_ALLOCATOR. Select it when using DMA_REMAP
to have correct dependencies.
Signed-off-by: Clement Leger <clement.leger@kalray.eu>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When CONFIG_RCU_TRACE is not set, all these tracepoints are defined as
do-nothing macro.
We'd better make those inline functions that take proper arguments.
As RCU_TRACE() is defined as do-nothing marco as well when
CONFIG_RCU_TRACE is not set, so we can clean it up.
Link: http://lkml.kernel.org/r/1553602391-11926-4-git-send-email-laoar.shao@gmail.com
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Add brief blurb about error_log to the 'Important files' section.
Link: http://lkml.kernel.org/r/c81e60f9aded495081231a32d2d1023c4d043a7a.1554072478.git.tom.zanussi@linux.intel.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
As each instance has their own error_log file, it makes more sense that the
instances show the errors of their own instead of all error_logs having the
same data. Make it that the errors show up in the instance error_log file
that the error happens in. If no instance trace_array is available, then
NULL can be passed in which will create the error in the top level instance
(the one at the top of the tracefs directory).
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Have the trace_array that associates the trace instance of the histogram
passed around to functions so that error handling can display the error
message in the proper instance.
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Pass in the trace_array that represents the instance the filter being
changed is in to create_event_filter(). This will allow for error messages
that happen when writing to the filter can be displayed in the proper
instance "error_log" file.
Note, for calls to create_filter() (that was also modified to support
create_event_filter()), that changes filters that do not exist in a instance
(for perf for example), NULL may be passed in, which means that there will
not be any message to log for that filter.
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
gpio tracing was made configurable in 4.4-rc1 (commit ddd70280bf0e
("tracing: gpio: Add Kconfig option for enabling/disabling trace
events")). Since then it is the only event type that can be compiled
conditionally. Given that there is only little overhead I don't
understand the reasoning and I was annoyed more than once that gpio
events were not available without recompiling.
So drop the Kconfig symbol and make gpio events available
unconditionally.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Removing explicit calls to mmiowb() from driver code means that we must
now call into the generic mmiowb_spin_{lock,unlock}() functions from the
core spinlock code. In order to elide barriers following critical
sections without any I/O writes, we also hook into the asm-generic I/O
routines.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
In preparation for removing all explicit mmiowb() calls from driver
code, implement a tracking system in asm-generic based loosely on the
PowerPC implementation. This allows architectures with a non-empty
mmiowb() definition to have the barrier automatically inserted in
spin_unlock() following a critical section containing an I/O write.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
There is not reason for the minimum iowait boost value in the
schedutil cpufreq governor to depend on the available range of CPU
frequencies. In fact, that dependency is generally confusing,
because it causes the iowait boost to behave somewhat differently
on CPUs with the same maximum frequency and different minimum
frequencies, for example.
For this reason, replace the min field in struct sugov_cpu
with a constant and choose its values to be 1/8 of
SCHED_CAPACITY_SCALE (for consistency with the intel_pstate
driver's internal governor).
[Note that policy->cpuinfo.max_freq will not be a constant any more
after a subsequent change, so this change is depended on by it.]
Link: https://lore.kernel.org/lkml/20190305083202.GU32494@hirez.programming.kicks-ass.net/T/#ee20bdc98b7d89f6110c0d00e5c3ee8c2ced93c3d
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Merge misc fixes from Andrew Morton:
"14 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kernel/sysctl.c: fix out-of-bounds access when setting file-max
mm/util.c: fix strndup_user() comment
sh: fix multiple function definition build errors
MAINTAINERS: add maintainer and replacing reviewer ARM/NUVOTON NPCM
MAINTAINERS: fix bad pattern in ARM/NUVOTON NPCM
mm: writeback: use exact memcg dirty counts
psi: clarify the units used in pressure files
mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
hugetlbfs: fix memory leak for resv_map
mm: fix vm_fault_t cast in VM_FAULT_GET_HINDEX()
lib/lzo: fix bugs for very short or empty input
include/linux/bitrev.h: fix constant bitrev
kmemleak: powerpc: skip scanning holes in the .bss section
lib/string.c: implement a basic bcmp
|
|
Commit 32a5ad9c2285 ("sysctl: handle overflow for file-max") hooked up
min/max values for the file-max sysctl parameter via the .extra1 and
.extra2 fields in the corresponding struct ctl_table entry.
Unfortunately, the minimum value points at the global 'zero' variable,
which is an int. This results in a KASAN splat when accessed as a long
by proc_doulongvec_minmax on 64-bit architectures:
| BUG: KASAN: global-out-of-bounds in __do_proc_doulongvec_minmax+0x5d8/0x6a0
| Read of size 8 at addr ffff2000133d1c20 by task systemd/1
|
| CPU: 0 PID: 1 Comm: systemd Not tainted 5.1.0-rc3-00012-g40b114779944 #2
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x228
| show_stack+0x14/0x20
| dump_stack+0xe8/0x124
| print_address_description+0x60/0x258
| kasan_report+0x140/0x1a0
| __asan_report_load8_noabort+0x18/0x20
| __do_proc_doulongvec_minmax+0x5d8/0x6a0
| proc_doulongvec_minmax+0x4c/0x78
| proc_sys_call_handler.isra.19+0x144/0x1d8
| proc_sys_write+0x34/0x58
| __vfs_write+0x54/0xe8
| vfs_write+0x124/0x3c0
| ksys_write+0xbc/0x168
| __arm64_sys_write+0x68/0x98
| el0_svc_common+0x100/0x258
| el0_svc_handler+0x48/0xc0
| el0_svc+0x8/0xc
|
| The buggy address belongs to the variable:
| zero+0x0/0x40
|
| Memory state around the buggy address:
| ffff2000133d1b00: 00 00 00 00 00 00 00 00 fa fa fa fa 04 fa fa fa
| ffff2000133d1b80: fa fa fa fa 04 fa fa fa fa fa fa fa 04 fa fa fa
| >ffff2000133d1c00: fa fa fa fa 04 fa fa fa fa fa fa fa 00 00 00 00
| ^
| ffff2000133d1c80: fa fa fa fa 00 fa fa fa fa fa fa fa 00 00 00 00
| ffff2000133d1d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fix the splat by introducing a unsigned long 'zero_ul' and using that
instead.
Link: http://lkml.kernel.org/r/20190403153409.17307-1-will.deacon@arm.com
Fixes: 32a5ad9c2285 ("sysctl: handle overflow for file-max")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Christian Brauner <christian@brauner.io>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull syscall-get-arguments cleanup and fixes from Steven Rostedt:
"Andy Lutomirski approached me to tell me that the
syscall_get_arguments() implementation in x86 was horrible and gcc
certainly gets it wrong.
He said that since the tracepoints only pass in 0 and 6 for i and n
repectively, it should be optimized for that case. Inspecting the
kernel, I discovered that all users pass in 0 for i and only one file
passing in something other than 6 for the number of arguments. That
code happens to be my own code used for the special syscall tracing.
That can easily be converted to just using 0 and 6 as well, and only
copying what is needed. Which is probably the faster path anyway for
that case.
Along the way, a couple of real fixes came from this as the
syscall_get_arguments() function was incorrect for csky and riscv.
x86 has been optimized to for the new interface that removes the
variable number of arguments, but the other architectures could still
use some loving and take more advantage of the simpler interface"
* tag 'trace-5.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
syscalls: Remove start and number from syscall_set_arguments() args
syscalls: Remove start and number from syscall_get_arguments() args
csky: Fix syscall_get_arguments() and syscall_set_arguments()
riscv: Fix syscall_get_arguments() and syscall_set_arguments()
tracing/syscalls: Pass in hardcoded 6 into syscall_get_arguments()
ptrace: Remove maxargs from task_current_syscall()
|
|
Minor comment merge conflict in mlx5.
Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The previous approach based on the variance was discarding values from
the timings when they were considered as anomalies as stated by the
normal law statistical model.
However in the interrupt life, there can be multiple anomalies due to the
nature of the device generating the interrupts, and most of the time a
repeating pattern can be observed, that is particulary true for network,
console, MMC or SSD devices.
The variance approach missed the patterns and it was only able to deal with
the interrupt coming in regular intervals, thus reducing considerably the
scope of what is predictable.
In order to find out the repeating patterns, the interrupt intervals are
grouped in a ilog2 basis to create a suite of numbers with small
amplitude. Every group contains an exponential moving average of the values
belonging to the group. The array suffix, a data structure used for string
searching, data compression, etc ..., is built from the suite of numbers
and the suffixes are then searched in this suite.
The tests showed the algorithm is able to find all repeating patterns,
as well as regular interval in less than 1us on x86-i7.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: rjw@rjwysocki.net
Cc: ulf.hansson@linaro.org
Cc: linux-pm@vger.kernel.org
Link: https://lkml.kernel.org/r/20190328151336.5316-2-daniel.lezcano@linaro.org
|
|
The variance computation did not provide the expected results and will be
replaced with a different approach to compute the next interrupt based on
the array suffixes derived algorithm.
There is no good way to transform the variance code to the new algorithm,
so for ease of review remove the existing code first.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: rjw@rjwysocki.net
Cc: ulf.hansson@linaro.org
Cc: linux-pm@vger.kernel.org
Link: https://lkml.kernel.org/r/20190328151336.5316-1-daniel.lezcano@linaro.org
|
|
If a child irqchip calls irq_chip_set_wake_parent() but its parent irqchip
has the IRQCHIP_SKIP_SET_WAKE flag set an error is returned.
This is inconsistent behaviour vs. set_irq_wake_real() which returns 0 when
the irqchip has the IRQCHIP_SKIP_SET_WAKE flag set. It doesn't attempt to
walk the chain of parents and set irq wake on any chips that don't have the
flag set either. If the intent is to call the .irq_set_wake() callback of
the parent irqchip, then we expect irqchip implementations to omit the
IRQCHIP_SKIP_SET_WAKE flag and implement an .irq_set_wake() function that
calls irq_chip_set_wake_parent().
The problem has been observed on a Qualcomm sdm845 device where set wake
fails on any GPIO interrupts after applying work in progress wakeup irq
patches to the GPIO driver. The chain of chips looks like this:
QCOM GPIO -> QCOM PDC (SKIP) -> ARM GIC (SKIP)
The GPIO controllers parent is the QCOM PDC irqchip which in turn has ARM
GIC as parent. The QCOM PDC irqchip has the IRQCHIP_SKIP_SET_WAKE flag
set, and so does the grandparent ARM GIC.
The GPIO driver doesn't know if the parent needs to set wake or not, so it
unconditionally calls irq_chip_set_wake_parent() causing this function to
return a failure because the parent irqchip (PDC) doesn't have the
.irq_set_wake() callback set. Returning 0 instead makes everything work and
irqs from the GPIO controller can be configured for wakeup.
Make it consistent by returning 0 (success) from irq_chip_set_wake_parent()
when a parent chip has IRQCHIP_SKIP_SET_WAKE set.
[ tglx: Massaged changelog ]
Fixes: 08b55e2a9208e ("genirq: Add irqchip_set_wake_parent")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-gpio@vger.kernel.org
Cc: Lina Iyer <ilina@codeaurora.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190325181026.247796-1-swboyd@chromium.org
|
|
check_stack_access() that prints verbose log is used in
adjust_ptr_min_max_vals() that prints its own verbose log and now they
stick together, e.g.:
variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16
size=1R2 stack pointer arithmetic goes out of range, prohibited for
!root
Add missing newline so that log is more readable:
variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1
R2 stack pointer arithmetic goes out of range, prohibited for !root
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
As discussed in [1] max value of variable offset has to be checked for
overflow on stack access otherwise verifier would accept code like this:
0: (b7) r2 = 6
1: (b7) r3 = 28
2: (7a) *(u64 *)(r10 -16) = 0
3: (7a) *(u64 *)(r10 -8) = 0
4: (79) r4 = *(u64 *)(r1 +168)
5: (c5) if r4 s< 0x0 goto pc+4
R1=ctx(id=0,off=0,imm=0) R2=inv6 R3=inv28
R4=inv(id=0,umax_value=9223372036854775807,var_off=(0x0;
0x7fffffffffffffff)) R10=fp0,call_-1 fp-8=mmmmmmmm fp-16=mmmmmmmm
6: (17) r4 -= 16
7: (0f) r4 += r10
8: (b7) r5 = 8
9: (85) call bpf_getsockopt#57
10: (b7) r0 = 0
11: (95) exit
, where R4 obviosly has unbounded max value.
Fix it by checking that reg->smax_value is inside (-BPF_MAX_VAR_OFF;
BPF_MAX_VAR_OFF) range.
reg->smax_value is used instead of reg->umax_value because stack
pointers are calculated using negative offset from fp. This is opposite
to e.g. map access where offset must be non-negative and where
umax_value is used.
Also dedicated verbose logs are added for both min and max bound check
failures to have diagnostics consistent with variable offset handling in
check_map_access().
[1] https://marc.info/?l=linux-netdev&m=155433357510597&w=2
Fixes: 2011fccfb61b ("bpf: Support variable offset stack access from helpers")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Proper support of indirect stack access with variable offset in
unprivileged mode (!root) requires corresponding support in Spectre
masking for stack ALU in retrieve_ptr_limit().
There are no use-case for variable offset in unprivileged mode though so
make verifier reject such accesses for simplicity.
Pointer arithmetics is one (and only?) way to cause variable offset and
it's already rejected in unpriv mode so that verifier won't even get to
helper function whose argument contains variable offset, e.g.:
0: (7a) *(u64 *)(r10 -16) = 0
1: (7a) *(u64 *)(r10 -8) = 0
2: (61) r2 = *(u32 *)(r1 +0)
3: (57) r2 &= 4
4: (17) r2 -= 16
5: (0f) r2 += r10
variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1R2
stack pointer arithmetic goes out of range, prohibited for !root
Still it looks like a good idea to reject variable offset indirect stack
access for unprivileged mode in check_stack_boundary() explicitly.
Fixes: 2011fccfb61b ("bpf: Support variable offset stack access from helpers")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
It's hard to guarantee that whole memory is marked as initialized on
helper return if uninitialized stack is accessed with variable offset
since specific bounds are unknown to verifier. This may cause
uninitialized stack leaking.
Reject such an access in check_stack_boundary to prevent possible
leaking.
There are no known use-cases for indirect uninitialized stack access
with variable offset so it shouldn't break anything.
Fixes: 2011fccfb61b ("bpf: Support variable offset stack access from helpers")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
At Linux Plumbers, Andy Lutomirski approached me and pointed out that the
function call syscall_get_arguments() implemented in x86 was horribly
written and not optimized for the standard case of passing in 0 and 6 for
the starting index and the number of system calls to get. When looking at
all the users of this function, I discovered that all instances pass in only
0 and 6 for these arguments. Instead of having this function handle
different cases that are never used, simply rewrite it to return the first 6
arguments of a system call.
This should help out the performance of tracing system calls by ptrace,
ftrace and perf.
Link: http://lkml.kernel.org/r/20161107213233.754809394@goodmis.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dave Martin <dave.martin@arm.com>
Cc: "Dmitry V. Levin" <ldv@altlinux.org>
Cc: x86@kernel.org
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Cc: linux-hexagon@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
Acked-by: Max Filippov <jcmvbkbc@gmail.com> # For xtensa changes
Acked-by: Will Deacon <will.deacon@arm.com> # For the arm64 bits
Reviewed-by: Thomas Gleixner <tglx@linutronix.de> # for x86
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When CONFIG_SPARSE_IRQ is disable, the request_mutex in struct irq_desc
is not initialized which causes malfunction.
Fixes: 9114014cf4e6 ("genirq: Add mutex to irq desc to serialize request/free_irq()")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190404074512.145533-1-wangkefeng.wang@huawei.com
|
|
Pull networking fixes from David Miller:
1) Several hash table refcount fixes in batman-adv, from Sven
Eckelmann.
2) Use after free in bpf_evict_inode(), from Daniel Borkmann.
3) Fix mdio bus registration in ixgbe, from Ivan Vecera.
4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.
5) ila rhashtable corruption fix from Herbert Xu.
6) Don't allow upper-devices to be added to vrf devices, from Sabrina
Dubroca.
7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.
8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
from Alexander Lobakin.
9) Missing IDR checks in mlx5 driver, from Aditya Pakki.
10) Fix false connection termination in ktls, from Jakub Kicinski.
11) Work around some ASPM issues with r8169 by disabling rx interrupt
coalescing on certain chips. From Heiner Kallweit.
12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
Abeni.
13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.
14) Various BPF flow dissector fixes from Stanislav Fomichev.
15) Divide by zero in act_sample, from Davide Caratti.
16) Fix bridging multicast regression introduced by rhashtable
conversion, from Nikolay Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
ibmvnic: Fix completion structure initialization
ipv6: sit: reset ip header pointer in ipip6_rcv
net: bridge: always clear mcast matching struct on reports and leaves
libcxgb: fix incorrect ppmax calculation
vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
sch_cake: Make sure we can write the IP header before changing DSCP bits
sch_cake: Use tc_skb_protocol() helper for getting packet protocol
tcp: Ensure DCTCP reacts to losses
net/sched: act_sample: fix divide by zero in the traffic path
net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
net: hns: Fix sparse: some warnings in HNS drivers
net: hns: Fix WARNING when remove HNS driver with SMMU enabled
net: hns: fix ICMP6 neighbor solicitation messages discard problem
net: hns: Fix probabilistic memory overwrite when HNS driver initialized
net: hns: Use NAPI_POLL_WEIGHT for hns driver
net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
flow_dissector: rst'ify documentation
ipv6: Fix dangling pointer when ipv6 fragment
net-gro: Fix GRO flush when receiving a GSO packet.
flow_dissector: document BPF flow dissector environment
...
|
|
What happens there is that we are replacing file->path.mnt of
a file we'd just opened with a clone and we need the write
count contribution to be transferred from original mount to
new one. That's it. We do *NOT* want any kind of freeze
protection for the duration of switchover.
IOW, we should just use __mnt_{want,drop}_write() for that
switchover; no need to bother with mnt_{want,drop}_write()
there.
Tested-by: Amir Goldstein <amir73il@gmail.com>
Reported-by: syzbot+2a73a6ea9507b7112141@syzkaller.appspotmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
The callers of cgroup_migrate_prepare_dst() correctly call
cgroup_migrate_finish() for success and failure cases both. No need to
call it in cgroup_migrate_prepare_dst() in failure case.
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The only users that calls syscall_get_arguments() with a variable and not a
hard coded '6' is ftrace_syscall_enter(). syscall_get_arguments() can be
optimized by removing a variable input, and always grabbing 6 arguments
regardless of what the system call actually uses.
Change ftrace_syscall_enter() to pass the 6 args into a local stack array
and copy the necessary arguments into the trace event as needed.
This is needed to remove two parameters from syscall_get_arguments().
Link: http://lkml.kernel.org/r/20161107213233.627583542@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|