Age | Commit message (Collapse) | Author |
|
Store the owner CPU number in the lock word so it may be yielded to,
as powerpc's paravirtualised simple spinlocks do.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-7-npiggin@gmail.com
|
|
Give the queue head the ability to stop stealers. After a number of
spins without successfully acquiring the lock, the queue head sets
this, which halts stealing and will assure it is the next owner.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-6-npiggin@gmail.com
|
|
Allow new waiters to "steal" the lock before queueing. That is, to
acquire it while other CPUs have queued.
This particularly helps paravirt performance when physical CPUs are
oversubscribed, by keeping the lock from becoming a strict FIFO and
vCPU preemption causing queue train wrecks.
The new __queued_spin_trylock_steal() function is put in qspinlock.h
to save having to move it, because it will be used there by a later
change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-5-npiggin@gmail.com
|
|
This uses more optimal ll/sc style access patterns (rather than
cmpxchg), and also sets the EH=1 lock hint on those operations
which acquire ownership of the lock.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-4-npiggin@gmail.com
|
|
The first 16 bits of the lock are only modified by the owner, and other
modifications always use atomic operations on the entire 32 bits, so
unlocks can use plain stores on the 16 bits. This is the same kind of
optimisation done by core qspinlock code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-3-npiggin@gmail.com
|
|
This forms the basis of the qspinlock slow path.
Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock
owner does not participate in the queue, only waiters. The first waiter
spins on the lock word, then when the lock is released it takes
ownership and unqueues the next waiter. This is how qspinlocks can be
implemented with the spinlock API -- lock owners don't need a node, only
waiters do.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221126095932.1234527-2-npiggin@gmail.com
|
|
Add a powerpc specific implementation of queued spinlocks. This is the
build framework with a very simple (non-queued) spinlock implementation
to begin with. Later changes add queueing, and other features and
optimisations one-at-a-time. It is done this way to more easily see how
the queued spinlocks are built, and to make performance and correctness
bisects more useful.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop paravirt.h & processor.h changes to fix 32-bit build]
[mpe: Fix 32-bit build of qspinlock.o & disallow GENERIC_LOCKBREAK per Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/CONLLQB6DCJU.2ZPOS7T6S5GRR@bobo
|
|
NUMA hinting no longer uses savedwrite, let's rip it out.
... and while at it, drop __pte_write() and __pmd_write() on ppc64.
Link: https://lkml.kernel.org/r/20221108174652.198904-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Adds a local TLB flush operation that works given an mm_struct, VA to
flush, and page size representation. Most implementations mirror the
surrounding code. The book3s/32/tlbflush.h implementation is left as
a BUILD_BUG because it is more complicated and not required for
anything as yet.
This removes the need to create a vm_area_struct, which the temporary
patching mm work does not need.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-8-bgray@linux.ibm.com
|
|
These functions were introduced for "cxl: Enable global TLBIs for cxl
contexts" [1], which ended up using them for Radix only. They were never
implemented on Hash (and creating an implementation appears to be
difficult), so nothing can actually rely on them.
They behave differently to the existing surrounding functions too, in
that they actually need to do something on Hash. The other functions
are primarily for use in generic code that expects their definitions,
but Hash updates the TLB during PTE updates.
After replacing the only usage with the Radix specific version, there
are no more users of these functions, and given they are not implemented
anyway it is safe to delete them.
[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20170903181513.29635-1-fbarrat@linux.vnet.ibm.com/
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-7-bgray@linux.ibm.com
|
|
The generic implementation of this function isn't really generic (Hash
is not implemented). Unfortunately, the runtime warnings cannot be
replaced with BUILD_BUG's, so it seems safer not to provide a stub in
the first place.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-6-bgray@linux.ibm.com
|
|
The empty hash__* functions are unnecessary. The empty definitions were
introduced when 64-bit Hash support was added, as the functions were
still used in generic code. These empty definitions were prefixed with
hash__ when Radix support was added, and new wrappers with the original
names were added that selected the Radix or Hash version based on
radix_enabled().
But the hash__ prefixed functions were not part of a public interface,
so there is no need to include them for compatibility with anything.
Generic code will use the non-prefixed wrappers, and Hash specific code
will know that there is no point in calling them (or even worse, call
them and expect them to do something).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-5-bgray@linux.ibm.com
|
|
breakpoint state
For the coming temporary mm used for instruction patching, the
breakpoint registers need to be cleared to prevent them from
accidentally being triggered. As soon as the patching is done, the
breakpoints will be restored.
The breakpoint state is stored in the per-cpu variable current_brk[].
Add a suspend_breakpoints() function which will clear the breakpoint
registers without touching the state in current_brk[]. Add a pair
function restore_breakpoints() which will move the state in
current_brk[] back to the registers.
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221109045112.187069-2-bgray@linux.ibm.com
|
|
ps3_system_bus_type is only used inside of system-bus.c, so remove
the external declaration and the very outdated comment next to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221122072225.423432-1-hch@lst.de
|
|
Merge our fixes branch to bring in some changes that are prerequisites
for work in next.
|
|
Merge our KVM topic branch.
|
|
32-bit does not trace_irqs_off() to match the trace_irqs_on() call in
kvmppc_fix_ee_before_entry(). This can lead to irqs being enabled twice
in the trace, and the irqs-off region between guest exit and the host
enabling local irqs again is not properly traced.
64-bit code does call this, but from asm code where volatiles are live
and so incorrectly get clobbered.
Move the irq reconcile into C to fix both problems.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-2-npiggin@gmail.com
|
|
There's no declaration for machine_check_early_boot(), which leads to a
build failure with W=1. Add one.
Fixes: 2f5182cffa43 ("powerpc/64s: early boot machine check handler")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221125132521.2167039-1-mpe@ellerman.id.au
|
|
PAPR Spec defines H_P1 actually as H_PARAMETER and maps H_ABORTED to
a different numerical value.
Fix the error codes as per PAPR Specification.
Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-3-nayna@linux.ibm.com
|
|
cputime_t was a core kernel type, removed by commits
ed5c8c854f2b..b672592f0221. As explained in commit b672592f0221
("sched/cputime: Remove generic asm headers"), the final cleanup
is for the arch to provide cputime_to_nsec[s](). Commit ade7667a981b
("powerpc: Add cputime_to_nsecs()") did that, but justdidn't remove
the then-unused cputime_to_usecs(), cputime_t type, and associated
remnants.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006105653.115829-1-npiggin@gmail.com
|
|
At the time being, with 16k pages __set_pte_at() writes table entries
in reverse order:
294: 91 49 00 0c stw r10,12(r9)
298: 91 49 00 08 stw r10,8(r9)
29c: 91 49 00 04 stw r10,4(r9)
2a0: 91 49 00 00 stw r10,0(r9)
Allthough there should be no impact at all as it stays in a single
cacheline, reverse the writing in a more natural order.
288: 91 49 00 0c stw r10,0(r9)
28c: 91 49 00 08 stw r10,4(r9)
290: 91 49 00 04 stw r10,8(r9)
294: 91 49 00 00 stw r10,12(r9)
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/67c3b5d44edfec054234ea9b4d05fc4b4f7f8a0e.1664346554.git.christophe.leroy@csgroup.eu
|
|
While looking at code generated for code patching, I saw that
pte_clear generated:
2d8: 38 a0 00 00 li r5,0
2dc: 38 e0 10 00 li r7,4096
2e0: 39 00 20 00 li r8,8192
2e4: 39 40 30 00 li r10,12288
2e8: 90 a9 00 00 stw r5,0(r9)
2ec: 90 e9 00 04 stw r7,4(r9)
2f0: 91 09 00 08 stw r8,8(r9)
2f4: 91 49 00 0c stw r10,12(r9)
With 16k pages, only the first entry is used by the kernel, so no need
to adapt the address of other entries. Only duplicate the first entry
for hardware.
Now it is:
2cc: 39 40 00 00 li r10,0
2d0: 91 49 00 00 stw r10,0(r9)
2d4: 91 49 00 04 stw r10,4(r9)
2d8: 91 49 00 08 stw r10,8(r9)
2dc: 91 49 00 0c stw r10,12(r9)
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/65f76300de07091a59a042a3db2d0ce9b939a05c.1664346532.git.christophe.leroy@csgroup.eu
|
|
Commit 57dc0eed73ca ("KVM: PPC: Book3S HV P9: Implement PMU save/restore
in C") removed the PMU save/restore functions from assembly code and
implemented these functions in C, for power9 and later platforms.
After the code refactoring, Performance Monitoring Unit (PMU) registers
became part of "p9_host_os_sprs" structure and now this structure is
used to save/restore pmu host registers, for power9 and later platfroms.
But we still have old unused registers references. Patch removes unused
host_mmcr references for Monitor Mode Control Register 3 (MMCR3)/
Sampled Instruction Event Register 2 (SIER2)/ SIER3 registers from
"struct kvmppc_host_state".
Fixes: 57dc0eed73ca ("KVM: PPC: Book3S HV P9: Implement PMU save/restore in C")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220916105736.268153-3-disgoel@linux.vnet.ibm.com
|
|
ISA v2.06 (POWER7 and up) as well as e6500 support lbarx and lharx.
Add a compile option that allows code to use it, and add support in
cmpxchg and xchg 8 and 16 bit values without shifting and masking.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220909052312.63916-1-npiggin@gmail.com
|
|
Extend commit b39181f7c6907d ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to
avoid adding weak function") to ppc32 and ppc64 -mprofile-kernel by
defining FTRACE_MCOUNT_MAX_OFFSET.
For ppc64 -mprofile-kernel ABI, we can have two instructions at function
entry for TOC setup followed by 'mflr r0' and 'bl _mcount'. So, the
mcount location is at most the 4th instruction in a function. For ppc32,
mcount location is always the 3rd instruction in a function, preceded by
'mflr r0' and 'stw r0,4(r1)'.
With this patch, and with ppc64le_guest_defconfig and some ftrace/bpf
config items enabled:
# grep __ftrace_invalid_address available_filter_functions | wc -l
79
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220809105425.424045-1-naveen.n.rao@linux.vnet.ibm.com
|
|
Last usage of find_current_mm_pte() was removed by
commit 15759cb054ef ("powerpc/perf/callchain: Use
__get_user_pages_fast in read_user_stack_slow")
Remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec79f462a3bfa8365b7df505e574d5d85246bc68.1646818177.git.christophe.leroy@csgroup.eu
|
|
In subsequent patches we'll arrange for architectures to have an
ftrace_regs which is entirely distinct from pt_regs. In preparation for
this, we need to minimize the use of pt_regs to where strictly necessary
in the core ftrace code.
This patch adds new ftrace_regs_{get,set}_*() helpers which can be used
to manipulate ftrace_regs. When CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=y,
these can always be used on any ftrace_regs, and when
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS=n these can be used when regs are
available. A new ftrace_regs_has_args(fregs) helper is added which code
can use to check when these are usable.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
ftrace_regs_set_instruction_pointer()
In subsequent patches we'll add a sew of ftrace_regs_{get,set}_*()
helpers. In preparation, this patch renames
ftrace_instruction_pointer_set() to
ftrace_regs_set_instruction_pointer().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This patch adds [stub] implementations for required functions, inorder
to enable objtool build on powerpc.
[Christophe Leroy: powerpc: Add missing asm/asm.h for objtool,
Use local variables for type and imm in arch_decode_instruction(),
Adapt len for prefixed instructions.]
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-16-sv@linux.ibm.com
|
|
The RNG always mixes in the Linux version extremely early in boot. It
also always includes a cycle counter, not only during early boot, but
each and every time it is invoked prior to being fully initialized.
Together, this means that the use of additional xors inside of the
various stackprotector.h files is superfluous and over-complicated.
Instead, we can get exactly the same thing, but better, by just calling
`get_random_canary()`.
Acked-by: Guo Ren <guoren@kernel.org> # for csky
Acked-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
In a subsequent patch, we would want to annotate powerpc assembly functions
with SYM_FUNC_START_LOCAL macro. This macro depends on __ALIGN macro.
The default expansion of __ALIGN macro is:
#define __ALIGN .align 4,0x90
So, override __ALIGN and __ALIGN_STR macros to use the same alignment as
that of the existing _GLOBAL macro. Also, do not pad with 0x90, because
repeated 0x90s are not a nop or trap on powerpc.
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-3-sv@linux.ibm.com
|
|
Commit 1e688dd2a3d675 ("powerpc/bug: Provide better flexibility to
WARN_ON/__WARN_FLAGS() with asm goto") updated __WARN_FLAGS() to use asm
goto, and added a call to 'unreachable()' after the asm goto for optimal
code generation. With CONFIG_OBJTOOL enabled, 'annotate_unreachable()'
statement in 'unreachable()' tries to note down the location of the
subsequent instruction in a separate elf section to aid code flow
analysis. However, on powerpc, this results in gcc emitting a call to a
symbol of size 0. This results in objtool complaining of "unannotated
intra-function call" since the target symbol is not a valid function
call destination.
Objtool wants this annotation for code flow analysis, which we are not
yet enabling on powerpc. As such, expand the call to 'unreachable()' in
__WARN_FLAGS() without annotate_unreachable():
barrier_before_unreachable();
__builtin_unreachable();
This still results in optimal code generation for __WARN_FLAGS(), while
getting rid of the objtool warning.
We still need barrier_before_unreachable() to work around gcc bugs 82365
and 106751:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106751
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221114175754.1131267-2-sv@linux.ibm.com
|
|
Most architectures (except arm64/x86/sparc) simply return 1 for
kern_addr_valid(), which is only used in read_kcore(), and it calls
copy_from_kernel_nofault() which could check whether the address is a
valid kernel address. So as there is no need for kern_addr_valid(), let's
remove it.
Link: https://lkml.kernel.org/r/20221018074014.185687-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Xuerui Wang <kernel@xen0n.name>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.
Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Fixes: e23750623835 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
|
|
apply_to_page_range on kernel pages does not disable preemption, which
is a requirement for hash's lazy mmu mode, which keeps track of the
TLBs to flush with a per-cpu array.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221013151647.1857994-1-npiggin@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix 32-bit syscall wrappers with 64-bit arguments of unaligned
register-pairs. Notably this broke ftruncate64 & pread/write64, which
can lead to file corruption.
- Fix lost interrupts when returning to soft-masked context on 64-bit.
- Fix build failure when CONFIG_DTL=n.
Thanks to Nicholas Piggin, Jason A. Donenfeld, Guenter Roeck, Arnd
Bergmann, and Sachin Sant.
* tag 'powerpc-6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix CONFIG_DTL=n build
powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
|
|
register-pairs
powerpc 32-bit system call (and function) calling convention for 64-bit
arguments requires the next available odd-pair (two sequential registers
with the first being odd-numbered) from the standard register argument
allocation.
The first argument register is r3, so a 64-bit argument that appears at
an even position in the argument list must skip a register (unless there
were preceding 64-bit arguments, which might throw things off). This
requires non-standard compat definitions to deal with the holes in the
argument register allocation.
With pt_regs syscall wrappers which use a standard mapper to map pt_regs
GPRs to function arguments, 32-bit kernels hit the same basic problem,
the standard definitions don't cope with the unused argument registers.
Fix this by having 32-bit kernels share those syscall definitions with
compat.
Thanks to Jason for spending a lot of time finding and bisecting this
and developing a trivial reproducer. The perfect bug report.
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Fixes: 7e92e01b72452 ("powerpc: Provide syscall wrapper")
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221012035335.866440-1-npiggin@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Remove our now never-true definitions for pgd_huge() and p4d_leaf().
- Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit.
- Add support for syscall wrappers.
- Add support for KFENCE on 64-bit.
- Update 64-bit HV KVM to use the new guest state entry/exit accounting
API.
- Support execute-only memory when using the Radix MMU (P9 or later).
- Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests.
- Updates to our linker script to move more data into read-only
sections.
- Allow the VDSO to be randomised on 32-bit.
- Many other small features and fixes.
Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Christophe Leroy, David Hildenbrand, Disha Goel, Fabiano Rosas,
Gaosheng Cui, Gustavo A. R. Silva, Haren Myneni, Hari Bathini, Jilin
Yuan, Joel Stanley, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent
Dufour, Liang He, Li Huafei, Lukas Bulwahn, Madhavan Srinivasan, Nathan
Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Pali
Rohár, Rohan McLure, Russell Currey, Sachin Sant, Segher Boessenkool,
Shrikanth Hegde, Tyrel Datwyler, Wolfram Sang, ye xingchen, and Zheng
Yongjun.
* tag 'powerpc-6.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
KVM: PPC: Book3S HV: Fix stack frame regs marker
powerpc: Don't add __powerpc_ prefix to syscall entry points
powerpc/64s/interrupt: Fix stack frame regs marker
powerpc/64: Fix msr_check_and_set/clear MSR[EE] race
powerpc/64s/interrupt: Change must-hard-mask interrupt check from BUG to WARN
powerpc/pseries: Add firmware details to the hardware description
powerpc/powernv: Add opal details to the hardware description
powerpc: Add device-tree model to the hardware description
powerpc/64: Add logical PVR to the hardware description
powerpc: Add PVR & CPU name to hardware description
powerpc: Add hardware description string
powerpc/configs: Enable PPC_UV in powernv_defconfig
powerpc/configs: Update config files for removed/renamed symbols
powerpc/mm: Fix UBSAN warning reported on hugetlb
powerpc/mm: Always update max/min_low_pfn in mem_topology_setup()
powerpc/mm/book3s/hash: Rename flush_tlb_pmd_range
powerpc: Drops STABS_DEBUG from linker scripts
powerpc/64s: Remove lost/old comment
powerpc/64s: Remove old STAB comment
powerpc: remove orphan systbl_chk.sh
...
|
|
When using syscall wrappers the __SYSCALL_DEFINEx() and related macros
add a "__powerpc_" prefix to all syscall entry points.
So for example sys_mmap becomes __powerpc_sys_mmap.
This risks breaking workflows and tools that expect the old naming
scheme. At a minimum setting a breakpoint on eg. sys_mmap with gdb no
longer works.
There seems to be no compelling reason to add the "__powerpc_" prefix,
other than that it follows what some other arches do (x86, arm64, s390).
But unlike other arches powerpc doesn't always enable syscall wrappers,
so the syscall entry points can change name depending on CONFIG options.
For those reasons drop the "__powerpc_" prefix, reverting to the
existing naming.
Doing so reveals two prototypes in signal.h that have the incorrect type
when syscall wrappers are enabled. There are already prototypes for both
functions in syscalls.h, so drop the ones from signal.h.
Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221006135940.1223988-1-mpe@ellerman.id.au
|
|
irq soft-masking means that when Linux irqs are disabled, the MSR[EE]
value can change from 1 to 0 asynchronously: if a masked interrupt of
the PACA_IRQ_MUST_HARD_MASK variety fires while irqs are disabled,
the masked handler will return with MSR[EE]=0.
This means a sequence like mtmsr(mfmsr() | MSR_FP) is racy if it can
be called with local irqs disabled, unless a hard_irq_disable has been
done.
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221004051157.308999-2-npiggin@gmail.com
|
|
Create a hardware description string, which we will use to record
various details of the hardware platform we are running on.
Print the accumulated description at boot, and use it to set the generic
description which is printed in oopses.
To begin with add ppc_md.name, aka the "machine description".
Example output at boot with the full series applied:
Linux version 6.0.0-rc2-gcc-11.1.0-00199-g893f9007a5ce-dirty (michael@alpine1-p1) (powerpc64-linux-gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #844 SMP Thu Sep 29 22:29:53 AEST 2022
Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-5b4c5a pSeries
printk: bootconsole [udbg0] enabled
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930082709.55830-1-mpe@ellerman.id.au
|
|
This function does the hash page table update. Hence rename it to
indicate this better to avoid confusion with flush_pmd_tlb_range()
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Drop unnecessary extern]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220907081941.209501-1-aneesh.kumar@linux.ibm.com
|
|
The data storage interrupt (DSI) error will be generated when the
paste operation is issued on the suspended Nest Accelerator (NX)
window due to NX state changes. The hypervisor expects the
partition to ignore this error during page fault handling.
To differentiate DSI caused by an actual HW configuration or by
the NX window, a new “ibm,pi-features” type value is defined.
Byte 0, bit 3 of pi-attribute-specifier-type is now defined to
indicate this DSI error. If this error is not ignored, the user
space can get SIGBUS when the NX request is issued.
This patch adds changes to read ibm,pi-features property and ignore
DSI error during page fault handling if MMU_FTR_NX_DSI is defined.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
[mpe: Mention PAPR version in comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b9cd844b85eb8f70459109ce1b14e44c4cc85fa7.camel@linux.ibm.com
|
|
On little endian the stack frame marker appears reversed when dumping
memory sequentially, as is typical in xmon or gdb, eg:
c000000004733e40 0000000000000000 0000000000000000 |................|
c000000004733e50 0000000000000000 0000000000000000 |................|
c000000004733e60 0000000000000000 0000000000000000 |................|
c000000004733e70 5347455200000000 0000000000000000 |SGER............|
c000000004733e80 a700000000000000 708897f7ff7f0000 |........p.......|
c000000004733e90 0073428fff7f0000 208997f7ff7f0000 |.sB..... .......|
c000000004733ea0 0100000000000000 ffffffffffffffff |................|
c000000004733eb0 0000000000000000 0000000000000000 |................|
To make it easier to recognise, reverse the value on little endian, so
it always appears as "REGS", eg:
c000000004733e70 5245475300000000 0000000000000000 |REGS............|
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220927150419.1503001-2-mpe@ellerman.id.au
|
|
Now that the stack frame regs marker is only 32-bits it is not as
obvious in memory dumps and easier to miss, eg:
c000000004733e40 0000000000000000 0000000000000000 |................|
c000000004733e50 0000000000000000 0000000000000000 |................|
c000000004733e60 0000000000000000 0000000000000000 |................|
c000000004733e70 7367657200000000 0000000000000000 |sger............|
c000000004733e80 a700000000000000 708897f7ff7f0000 |........p.......|
c000000004733e90 0073428fff7f0000 208997f7ff7f0000 |.sB..... .......|
c000000004733ea0 0100000000000000 ffffffffffffffff |................|
c000000004733eb0 0000000000000000 0000000000000000 |................|
So make it upper case to make it stand out a bit more:
c000000004733e70 5347455200000000 0000000000000000 |SGER............|
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220927150419.1503001-1-mpe@ellerman.id.au
|
|
'extern' keyword is pointless and deprecated for function prototypes.
Signed-off-by: Pali Rohár <pali@kernel.org>
Suggested-by: Gabriel Paubert <paubert@iram.es>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220822231751.16973-1-pali@kernel.org
|
|
Use the early boot interrupt fixup in the machine check handler to allow
the machine check handler to run before interrupt endian is set up.
Branch to an early boot handler that just does a basic crash, which
allows it to run before ppc_md is set up. MSR[ME] is enabled on the boot
CPU earlier, and the machine check stack is temporarily set to the
middle of the init task stack.
This allows machine checks (e.g., due to invalid data access in real
mode) to print something useful earlier in boot (as soon as udbg is set
up, if CONFIG_PPC_EARLY_DEBUG=y).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926055620.2676869-3-npiggin@gmail.com
|
|
The interrupt entry code carefully saves a minimal number of registers,
so in some places the TOC is required, it is loaded into a different
register, so provide a macro that can supply an alternate TOC register.
This continues to use got addressing because TOC-relative results in
"got/toc optimization is not supported" messages by the linker. Having
r2 be one of the saved registers and using that for TOC addressing may
be the best way to avoid that and switch this to TOC addressing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-6-npiggin@gmail.com
|
|
A later change stops the kernel using r2 and loads it with a poison
value. Provide a PACATOC loading abstraction which can hide this
detail.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220926034057.2360083-5-npiggin@gmail.com
|