Age | Commit message (Collapse) | Author |
|
The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to
describe this configuration option.
Update both to actually describe what this option does.
Link: https://lkml.kernel.org/r/20230324052233.2654090-10-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rich Felker <dalias@libc.org>
Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
- A fix for NUMA distance handling in the pseries SCM (pmem) driver.
Thanks to Aneesh Kumar K.V.
* tag 'powerpc-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Update the NUMA distance table for the target node
|
|
In preparation for improving objtool's handling of weak noreturn
functions, mark panic_smp_self_stop() __noreturn.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/92d76ab5c8bf660f04fdcd3da1084519212de248.1681342859.git.jpoimboe@kernel.org
|
|
Use the preferred form of branch-and-link for finding the current
address so objtool doesn't think it is an unannotated intra-function
call.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230407040924.231023-1-npiggin@gmail.com
|
|
When building with W=1 after commit 80b6093b55e3 ("kbuild: add -Wundef
to KBUILD_CPPFLAGS for W=1 builds"), the following warning occurs.
In file included from arch/powerpc/kvm/bookehv_interrupts.S:26:
arch/powerpc/kvm/../kernel/head_booke.h:20:6: warning: "THREAD_SHIFT" is not defined, evaluates to 0 [-Wundef]
20 | #if (THREAD_SHIFT < 15)
| ^~~~~~~~~~~~
THREAD_SHIFT is defined in thread_info.h but it is not directly included
in head_booke.h, so it is possible for THREAD_SHIFT to be undefined. Add
the include to ensure that THREAD_SHIFT is always defined.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/202304050954.yskLdczH-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406-wundef-thread_shift_booke-v1-1-8deffa4d84f9@kernel.org
|
|
syscalls do not set the PPR field in their interrupt frame and
return from syscall always sets the default PPR for userspace,
so setting the value in the ret_from_fork frame is not necessary
and mildly inconsistent. Remove it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-9-npiggin@gmail.com
|
|
In the kernel user thread path, don't set _TIF_RESTOREALL because
the thread is required to call kernel_execve() before it returns,
which will set _TIF_RESTOREALL if necessary via start_thread().
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-8-npiggin@gmail.com
|
|
Kernel created user threads start similarly to kernel threads in that
they call a kernel function after first returning from _switch, so
they share ret_from_kernel_thread for this. Kernel threads never return
from that function though, whereas user threads often do (although some
don't, e.g., IO threads).
Split these startup functions in two, and catch kernel threads that
improperly return from their function. This is intended to make the
complicated code a little bit easier to understand.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-7-npiggin@gmail.com
|
|
When copy_thread is given a kernel function to run in arg->fn, this
does not necessarily mean it is a kernel thread. User threads can be
created this way (e.g., kernel_init, see also x86's copy_thread()).
These threads run a kernel function which may call kernel_execve()
and return, which returns like a userspace exec(2) syscall.
Kernel threads are to be differentiated with PF_KTHREAD, will always
have arg->fn set, and should never return from that function, instead
calling kthread_exit() to exit.
Create separate paths for the kthread and user kernel thread creation
logic. The kthread path will never exit and does not require a user
interrupt frame, so it gets a minimal stack frame.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-6-npiggin@gmail.com
|
|
If the system call return path always restores NVGPRs then there is no
need for ret_from_fork to do it. The HANDLER_RESTORE_NVGPRS does the
right thing for this.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-5-npiggin@gmail.com
|
|
The kernel thread path in copy_thread creates a user interrupt frame on
stack and stores the function and arg parameters there, and
ret_from_kernel_thread loads them. This is a slightly confusing way to
overload that frame. Non-volatile registers are loaded from the switch
frame, so the parameters can be stored there. The user interrupt frame
is now only used by user threads when they return to user.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-4-npiggin@gmail.com
|
|
The ret_from_fork code for 64e and 32-bit set r3 for
syscall_exit_prepare the same way that 64s does, so there should
be no need to special-case them in copy_thread.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-3-npiggin@gmail.com
|
|
The pkey registers (AMR, IAMR) do not get loaded from the switch frame
so it is pointless to save anything there. Remove the dead code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230325122904.2375060-2-npiggin@gmail.com
|
|
The amdgpu driver builds some of its code with hard-float enabled,
whereas the rest of the kernel is built with soft-float.
When building with 64-bit long double, if soft-float and hard-float
objects are linked together, the build fails due to incompatible ABI
tags.
In the past there have been build errors in the amdgpu driver caused by
this, some of those were due to bad intermingling of soft & hard-float
code, but those issues have now all been fixed since commit 58ddbecb14c7
("drm/amd/display: move remaining FPU code to dml folder").
However it's still possible for soft & hard-float objects to end up
linked together, if the amdgpu driver is built-in to the kernel along
with the test_emulate_step.c code, which uses soft-float. That happens
in an allyesconfig build.
Currently those build errors are avoided because the amdgpu driver is
gated on 128-bit long double being enabled. But that's not a detail the
amdgpu driver should need to be aware of, and if another driver starts
using hard-float the same problem would occur.
All versions of the 64-bit ABI specify that long-double is 128-bits.
However some compilers, notably the kernel.org ones, are built to use
64-bit long double by default.
Apart from this issue of soft vs hard-float, the kernel doesn't care
what size long double is. In particular the kernel using 128-bit long
double doesn't impact userspace's ability to use 64-bit long double, as
musl does.
So always build the 64-bit kernel with 128-bit long double. That should
avoid any build errors due to the incompatible ABI tags. Excluding the
code that uses soft/hard-float, the vmlinux is identical with/without
the flag.
It does mean any code which is incorrectly intermingling soft &
hard-float code will build without error, so those bugs will need to be
caught by testing rather than at build time.
For more background see:
- commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
- commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc")
- https://lore.kernel.org/r/dab9cbd8-2626-4b99-8098-31fe76397d2d@app.fastmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://msgid.link/20230404102847.3303623-1-mpe@ellerman.id.au
|
|
We need the USB fixes in here for testing, and this resolves two merge
conflicts, one pointed out by linux-next:
drivers/usb/dwc3/dwc3-pci.c
drivers/usb/host/xhci-pci.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
We need it here to apply other char/misc driver changes to.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
As for now all arches have dma_default_coherent reflecting default
DMA coherency for of devices, so there is no need to have a standalone
config option.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Conflicts:
drivers/net/ethernet/google/gve/gve.h
3ce934558097 ("gve: Secure enough bytes in the first TX desc for all TCP pkts")
75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
https://lore.kernel.org/all/20230406104927.45d176f5@canb.auug.org.au/
https://lore.kernel.org/all/c5872985-1a95-0bc8-9dcc-b6f23b439e9d@tessares.net/
Adjacent changes:
net/can/isotp.c
051737439eae ("can: isotp: fix race between isotp_sendsmg() and isotp_release()")
96d1c81e6a04 ("can: isotp: add module parameter for maximum pdu size")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails. Copied from "x86/mm: try
VMA lock-based page fault handling first"
[ldufour@linux.ibm.com: powerpc/mm: fix mmap_lock bad unlock]
Link: https://lkml.kernel.org/r/20230306154244.17560-1-ldufour@linux.ibm.com
Link: https://lore.kernel.org/linux-mm/842502FB-F99C-417C-9648-A37D0ECDC9CE@linux.ibm.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-32-surenb@google.com
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.
This definition is counter-intuitive and lead to number of bugs all over
the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.
[kirill@shutemov.name: fix min() warning]
Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
Port access. In a future patch HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390.
The following architectures do not select HAS_IOPORT:
* ARC
* C-SKY
* Hexagon
* Nios II
* OpenRISC
* s390
* User-Mode Linux
* Xtensa
All other architectures select HAS_IOPORT at least conditionally.
The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Now that the SRCU Kconfig option is unconditionally selected, there is
no longer any point in selecting it. Therefore, remove the "select SRCU"
Kconfig statements from the various KVM Kconfig files.
Acked-by: Sean Christopherson <seanjc@google.com> (x86)
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <kvm@vger.kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org> (arm64)
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Anup Patel <anup@brainfault.org> (riscv)
Acked-by: Heiko Carstens <hca@linux.ibm.com> (s390)
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"PPC:
- Hide KVM_CAP_IRQFD_RESAMPLE if XIVE is enabled
s390:
- Fix handling of external interrupts in protected guests
x86:
- Resample the pending state of IOAPIC interrupts when unmasking them
- Fix usage of Hyper-V "enlightened TLB" on AMD
- Small fixes to real mode exceptions
- Suppress pending MMIO write exits if emulator detects exception
Documentation:
- Fix rST syntax"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
docs: kvm: x86: Fix broken field list
KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent
KVM: s390: pv: fix external interruption loop not always detected
KVM: nVMX: Do not report error code when synthesizing VM-Exit from Real Mode
KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
KVM: x86: Suppress pending MMIO write exits if emulator detects exception
KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
KVM: irqfd: Make resampler_list an RCU list
KVM: SVM: Flush Hyper-V TLB when required
|
|
Instead of open-coding it everywhere introduce a tiny helper that can be
used to iterate over each resource of a PCI device, and convert the most
obvious users into it.
While at it drop doubled empty line before pdev_sort_resources().
No functional changes intended.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230330162434.35055-4-andriy.shevchenko@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
|
|
Remove arch_atomic_try_cmpxchg_lock function as it is no longer used
since commit 9f61521c7a28 ("powerpc/qspinlock: powerpc qspinlock
implementation")
Signed-off-by: Nysal Jan K.A <nysal@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230224103940.1328725-1-nysal@linux.ibm.com
|
|
Walks the stack when copy_{to,from}_user address is in the stack to
ensure that the object being copied is entirely a single stack frame and
does not contain stack metadata.
Substantially similar to the x86 implementation. The back chain is used
to traverse the stack and identify stack frame boundaries.
Signed-off-by: Nicholas Miehlbradt <nicholas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230228054355.300628-1-nicholas@linux.ibm.com
|
|
Replace open coded reading of "reg" or of_get_address()/
of_translate_address() calls with a single call to
of_address_to_resource().
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230329220337.141295-1-robh@kernel.org
|
|
Replace of_get_property()+of_translate_address()+ioremap() with a call
to of_iomap() which does all those steps.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230327223109.820381-1-robh@kernel.org
|
|
Replace of_address_to_resource()+ioremap() with a call to of_iomap()
which does both of those steps.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230327223103.820229-1-robh@kernel.org
|
|
icp_native_init_one_node() only needs the number of entries in "reg".
Replace the open coded "reg" parsing with of_address_count() to get the
number of "reg" entries.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230327223056.820086-1-robh@kernel.org
|
|
"ranges" is a standard property with common parsing functions. Users
shouldn't be implementing their own parsing of it. Reimplement the
ISA brige "ranges" parsing using the common ranges iterator functions.
The common routines are flexible enough to work on PCI and non-PCI to
ISA bridges, so refactor pci_process_ISA_OF_ranges() and
isa_bridge_init_non_pci() into a single implementation.
Signed-off-by: Rob Herring <robh@kernel.org>
[mpe: Unsplit some strings and use pr_xxx()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230327223045.819852-1-robh@kernel.org
|
|
Merge our KVM topic branch to bring some KVM commits into next for wider
testing.
|
|
Platform device helper routines won't update the NUMA distance table
while creating a platform device, even if the device is present on a
NUMA node that doesn't have memory or CPU. This is especially true for
pmem devices. If the target node of the pmem device is not online, we
find the nearest online node to the device and associate the pmem device
with that online node. To find the nearest online node, we should have
the numa distance table updated correctly. Update the distance
information during the device probe.
For a papr scm device on NUMA node 3 distance_lookup_table value for
distance_ref_points_depth = 2 before and after fix is below:
Before fix:
node 3 distance depth 0 - 0
node 3 distance depth 1 - 0
node 4 distance depth 0 - 4
node 4 distance depth 1 - 2
node 5 distance depth 0 - 5
node 5 distance depth 1 - 1
After fix
node 3 distance depth 0 - 3
node 3 distance depth 1 - 1
node 4 distance depth 0 - 4
node 4 distance depth 1 - 2
node 5 distance depth 0 - 5
node 5 distance depth 1 - 1
Without the fix, the nearest numa node to the pmem device (NUMA node 3)
will be picked as 4. After the fix, we get the correct numa node which
is 5.
Fixes: da1115fdbd6e ("powerpc/nvdimm: Pick nearby online node if the device node is not online")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230404041433.1781804-1-aneesh.kumar@linux.ibm.com
|
|
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now that we can read prefixed instructions from a HV KVM guest and
emulate prefixed load/store instructions to emulated MMIO locations,
we can add HFSCR_PREFIXED into the set of bits that are set in the
HFSCR for a HV KVM guest on POWER10, allowing the guest to use
prefixed instructions.
PR KVM has not yet been extended to handle prefixed instructions in
all situations where we might need to emulate them, so prevent the
guest from enabling prefixed instructions in the FSCR for now.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ZAgs25dCmLrVkBdU@cleo
|
|
In order to handle emulation of prefixed instructions in the guest,
this first makes vcpu->arch.last_inst be an unsigned long, i.e. 64
bits on 64-bit platforms. For prefixed instructions, the upper 32
bits are used for the prefix and the lower 32 bits for the suffix, and
both halves are byte-swapped if the guest endianness differs from the
host.
Next, vcpu->arch.emul_inst is now 64 bits wide, to match the HEIR
register on POWER10. Like HEIR, for a prefixed instruction it is
defined to have the prefix is in the top 32 bits and the suffix in the
bottom 32 bits, with both halves in the correct byte order.
kvmppc_get_last_inst is extended on 64-bit machines to put the prefix
and suffix in the right places in the ppc_inst_t being returned.
kvmppc_load_last_inst now returns the instruction in an unsigned long
in the same format as vcpu->arch.last_inst. It makes the decision
about whether to fetch a suffix based on the SRR1_PREFIXED bit in the
MSR image stored in the vcpu struct, which generally comes from SRR1
or HSRR1 on an interrupt. This bit is defined in Power ISA v3.1B to
be set if the interrupt occurred due to a prefixed instruction and
cleared otherwise for all interrupts except for instruction storage
interrupt, which does not come to the hypervisor. It is set to zero
for asynchronous interrupts such as external interrupts. In previous
ISA versions it was always set to 0 for all interrupts except
instruction storage interrupt.
The code in book3s_hv_rmhandlers.S that loads the faulting instruction
on a HDSI is only used on POWER8 and therefore doesn't ever need to
load a suffix.
[npiggin@gmail.com - check that the is-prefixed bit in SRR1 matches the
type of instruction that was fetched.]
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ZAgsq9h1CCzouQuV@cleo
|
|
This changes kvmppc_get_last_inst() so that the instruction it fetches
is returned in a ppc_inst_t variable rather than a u32. This will
allow us to return a 64-bit prefixed instruction on those 64-bit
machines that implement Power ISA v3.1 or later, such as POWER10.
On 32-bit platforms, ppc_inst_t is 32 bits wide, and is turned back
into a u32 by ppc_inst_val, which is an identity operation on those
platforms.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/ZAgsiPlL9O7KnlZZ@cleo
|
|
Pass the hypervisor (H)SRR1[PREFIX] indication through to synchronous
interrupts injected into the guest.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230330103224.3589928-3-npiggin@gmail.com
|
|
The prefix architecture in ISA v3.1 introduces a prefixed bit in SRR1
for many types of synchronous interrupts which is set when the interrupt
is caused by a prefixed instruction.
This requires KVM to be able to set this bit when injecting interrupts
into a guest. Plumb through the SRR1 "flags" argument to the core_queue
APIs where it's missing for this. For now they are set to 0, which is
no change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup kvmppc_core_queue_alignment() in booke.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230330103224.3589928-2-npiggin@gmail.com
|
|
Fix various W=1 warnings in booke.c:
arch/powerpc/kvm/booke.c:1008:5: error: no previous prototype for ‘kvmppc_handle_exit’ [-Werror=missing-prototypes]
1008 | int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
| ^~~~~~~~~~~~~~~~~~
arch/powerpc/kvm/booke.c:1009: warning: Function parameter or member 'vcpu' not described in 'kvmppc_handle_exit'
arch/powerpc/kvm/booke.c:1009: warning: Function parameter or member 'exit_nr' not described in 'kvmppc_handle_exit'
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/202304020827.3LEZ86WB-lkp@intel.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230403045314.3095410-1-mpe@ellerman.id.au
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a false positive warning in __pte_needs_flush() (with DEBUG_VM=y)
- Fix oops when a PF_IO_WORKER thread tries to core dump
- Don't try to reconfigure VAS when it's disabled
Thanks to Benjamin Gray, Haren Myneni, Jens Axboe, Nathan Lynch, and
Russell Currey.
* tag 'powerpc-6.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries/vas: Ignore VAS update for DLPAR if copy/paste is not enabled
powerpc: Don't try to copy PPR for task with NULL pt_regs
powerpc/64s: Fix __pte_needs_flush() false positive warning
|
|
When introduced, IRQFD resampling worked on POWER8 with XICS. However
KVM on POWER9 has never implemented it - the compatibility mode code
("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
XIVE mode does not handle INTx in KVM at all.
This moved the capability support advertising to platforms and stops
advertising it on XIVE, i.e. POWER9 and later.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20220504074807.3616813-1-aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Conflicts:
drivers/net/ethernet/mediatek/mtk_ppe.c
3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting")
924531326e2d ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties.
Convert reading boolean properties to of_property_read_bool().
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230310144659.1541127-1-robh@kernel.org
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property/of_find_property functions for reading properties. As
part of this, convert of_get_property/of_find_property calls to the
recently added of_property_present() helper when we just want to test
for presence of a property and nothing more.
Signed-off-by: Rob Herring <robh@kernel.org>
[mpe: Drop change in ppc4xx_probe_pci_bridge(), formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230310144657.1541039-1-robh@kernel.org
|
|
Add lockdep annotations for the following properties that must hold:
* Any error log retrieval must be atomically coupled with the prior
RTAS call, without a window for another RTAS call to occur before the
error log can be retrieved.
* All users of the core rtas_args parameter block must hold rtas_lock.
Move the definitions of rtas_lock and rtas_args up in the file so that
__do_enter_rtas_trace() can refer to them.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-6-010e4416f13f@linux.ibm.com
|
|
The 'filter' member is a pointer, not a bool; fix the wording
accordingly.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-4-010e4416f13f@linux.ibm.com
|
|
Add documentation for rtas_call_unlocked(), including details on how
it differs from rtas_call().
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-3-010e4416f13f@linux.ibm.com
|
|
Using memcpy() isn't safe when buf is identical to rtas_err_buf, which
can happen during boot before slab is up. Full context which may not
be obvious from the diff:
if (altbuf) {
buf = altbuf;
} else {
buf = rtas_err_buf;
if (slab_is_available())
buf = kmalloc(RTAS_ERROR_LOG_MAX, GFP_ATOMIC);
}
if (buf)
memcpy(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);
This was found by inspection and I'm not aware of it causing problems
in practice. It appears to have been introduced by commit
033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel"); the
old ppc64 version of this code did not have this problem.
Use memmove() instead.
Fixes: 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-2-010e4416f13f@linux.ibm.com
|
|
CHRP and PAPR agree: "In order to make an RTAS call, the operating
system must construct an argument call buffer aligned on an eight byte
boundary in physically contiguous real memory [...]." (7.2.7 Calling
Mechanism and Conventions).
struct rtas_args is the type used for this argument call buffer. The
unarchitected 'rets' member happens to produce 8-byte alignment for
the struct on 64-bit targets in practice. But without an alignment
directive the structure will have only 4-byte alignment on 32-bit
targets:
$ nm b/{before,after}/chrp32/vmlinux | grep rtas_args
c096881c b rtas_args
c0968820 b rtas_args
Add an alignment directive to the struct rtas_args declaration so all
instances have the alignment required by the specs. rtas-types.h no
longer refers to any spinlock types, so drop the spinlock_types.h
inclusion while we're here.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230220-rtas-queue-for-6-4-v1-1-010e4416f13f@linux.ibm.com
|