Age | Commit message (Collapse) | Author |
|
Make pte_swp_exclusive return bool instead of int. This will better
reflect how pte_swp_exclusive is actually used in the code.
This fixes swap/swapoff problems on Alpha due pte_swp_exclusive not
returning correct values when _PAGE_SWP_EXCLUSIVE bit resides in upper
32-bits of PTE (like on alpha).
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Cc: Sam James <sam@gentoo.org>
Link: https://lore.kernel.org/lkml/20250218175735.19882-2-linmag7@gmail.com/
Link: https://lore.kernel.org/lkml/20250602041118.GA2675383@ZenIV/
[ Applied as the 'sed' script Al suggested - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for the FWFT SBI extension, which is part of SBI 3.0 and a
dependency for many new SBI and ISA extensions
- Support for getrandom() in the VDSO
- Support for mseal
- Optimized routines for raid6 syndrome and recovery calculations
- kexec_file() supports loading Image-formatted kernel binaries
- Improvements to the instruction patching framework to allow for
atomic instruction patching, along with rules as to how systems need
to behave in order to function correctly
- Support for a handful of new ISA extensions: Svinval, Zicbop, Zabha,
some SiFive vendor extensions
- Various fixes and cleanups, including: misaligned access handling,
perf symbol mangling, module loading, PUD THPs, and improved uaccess
routines
* tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (69 commits)
riscv: uaccess: Only restore the CSR_STATUS SUM bit
RISC-V: vDSO: Wire up getrandom() vDSO implementation
riscv: enable mseal sysmap for RV64
raid6: Add RISC-V SIMD syndrome and recovery calculations
riscv: mm: Add support for Svinval extension
RISC-V: Documentation: Add enough title underlines to CMODX
riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
MAINTAINERS: Update Atish's email address
riscv: uaccess: do not do misaligned accesses in get/put_user()
riscv: process: use unsigned int instead of unsigned long for put_user()
riscv: make unsafe user copy routines use existing assembly routines
riscv: hwprobe: export Zabha extension
riscv: Make regs_irqs_disabled() more clear
perf symbols: Ignore mapping symbols on riscv
RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
riscv: module: Optimize PLT/GOT entry counting
riscv: Add support for PUD THP
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
...
|
|
Clément Léger <cleger@rivosinc.com> says:
The SBI Firmware Feature extension allows the S-mode to request some
specific features (either hardware or software) to be enabled. This
series uses this extension to request misaligned access exception
delegation to S-mode in order to let the kernel handle it. It also adds
support for the KVM FWFT SBI extension based on the misaligned access
handling infrastructure.
FWFT SBI extension is part of the SBI V3.0 specifications [1]. It can be
tested using the qemu provided at [2] which contains the series from
[3]. Upstream kvm-unit-tests can be used inside kvm to tests the correct
delegation of misaligned exceptions. Upstream OpenSBI can be used.
The tests can be run using the kselftest from series [4].
$ qemu-system-riscv64 \
-cpu rv64,trap-misaligned-access=true,v=true \
-M virt \
-m 1024M \
-bios fw_dynamic.bin \
-kernel Image
...
# ./misaligned
TAP version 13
1..23
# Starting 23 tests from 1 test cases.
# RUN global.gp_load_lh ...
# OK global.gp_load_lh
ok 1 global.gp_load_lh
# RUN global.gp_load_lhu ...
# OK global.gp_load_lhu
ok 2 global.gp_load_lhu
# RUN global.gp_load_lw ...
# OK global.gp_load_lw
ok 3 global.gp_load_lw
# RUN global.gp_load_lwu ...
# OK global.gp_load_lwu
ok 4 global.gp_load_lwu
# RUN global.gp_load_ld ...
# OK global.gp_load_ld
ok 5 global.gp_load_ld
# RUN global.gp_load_c_lw ...
# OK global.gp_load_c_lw
ok 6 global.gp_load_c_lw
# RUN global.gp_load_c_ld ...
# OK global.gp_load_c_ld
ok 7 global.gp_load_c_ld
# RUN global.gp_load_c_ldsp ...
# OK global.gp_load_c_ldsp
ok 8 global.gp_load_c_ldsp
# RUN global.gp_load_sh ...
# OK global.gp_load_sh
ok 9 global.gp_load_sh
# RUN global.gp_load_sw ...
# OK global.gp_load_sw
ok 10 global.gp_load_sw
# RUN global.gp_load_sd ...
# OK global.gp_load_sd
ok 11 global.gp_load_sd
# RUN global.gp_load_c_sw ...
# OK global.gp_load_c_sw
ok 12 global.gp_load_c_sw
# RUN global.gp_load_c_sd ...
# OK global.gp_load_c_sd
ok 13 global.gp_load_c_sd
# RUN global.gp_load_c_sdsp ...
# OK global.gp_load_c_sdsp
ok 14 global.gp_load_c_sdsp
# RUN global.fpu_load_flw ...
# OK global.fpu_load_flw
ok 15 global.fpu_load_flw
# RUN global.fpu_load_fld ...
# OK global.fpu_load_fld
ok 16 global.fpu_load_fld
# RUN global.fpu_load_c_fld ...
# OK global.fpu_load_c_fld
ok 17 global.fpu_load_c_fld
# RUN global.fpu_load_c_fldsp ...
# OK global.fpu_load_c_fldsp
ok 18 global.fpu_load_c_fldsp
# RUN global.fpu_store_fsw ...
# OK global.fpu_store_fsw
ok 19 global.fpu_store_fsw
# RUN global.fpu_store_fsd ...
# OK global.fpu_store_fsd
ok 20 global.fpu_store_fsd
# RUN global.fpu_store_c_fsd ...
# OK global.fpu_store_c_fsd
ok 21 global.fpu_store_c_fsd
# RUN global.fpu_store_c_fsdsp ...
# OK global.fpu_store_c_fsdsp
ok 22 global.fpu_store_c_fsdsp
# RUN global.gen_sigbus ...
[12797.988647] misaligned[618]: unhandled signal 7 code 0x1 at 0x0000000000014dc0 in misaligned[4dc0,10000+76000]
[12797.988990] CPU: 0 UID: 0 PID: 618 Comm: misaligned Not tainted 6.13.0-rc6-00008-g4ec4468967c9-dirty #51
[12797.989169] Hardware name: riscv-virtio,qemu (DT)
[12797.989264] epc : 0000000000014dc0 ra : 0000000000014d00 sp : 00007fffe165d100
[12797.989407] gp : 000000000008f6e8 tp : 0000000000095760 t0 : 0000000000000008
[12797.989544] t1 : 00000000000965d8 t2 : 000000000008e830 s0 : 00007fffe165d160
[12797.989692] s1 : 000000000000001a a0 : 0000000000000000 a1 : 0000000000000002
[12797.989831] a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef
[12797.989964] a5 : 000000000008ef61 a6 : 626769735f6e0000 a7 : fffffffffffff000
[12797.990094] s2 : 0000000000000001 s3 : 00007fffe165d838 s4 : 00007fffe165d848
[12797.990238] s5 : 000000000000001a s6 : 0000000000010442 s7 : 0000000000010200
[12797.990391] s8 : 000000000000003a s9 : 0000000000094508 s10: 0000000000000000
[12797.990526] s11: 0000555567460668 t3 : 00007fffe165d070 t4 : 00000000000965d0
[12797.990656] t5 : fefefefefefefeff t6 : 0000000000000073
[12797.990756] status: 0000000200004020 badaddr: 000000000008ef61 cause: 0000000000000006
[12797.990911] Code: 8793 8791 3423 fcf4 3783 fc84 c737 dead 0713 eef7 (c398) 0001
# OK global.gen_sigbus
ok 23 global.gen_sigbus
# PASSED: 23 / 23 tests passed.
# Totals: pass:23 fail:0 xfail:0 xpass:0 skip:0 error:0
With kvm-tools:
# lkvm run -k sbi.flat -m 128
Info: # lkvm run -k sbi.flat -m 128 -c 1 --name guest-97
Info: Removed ghost socket file "/root/.lkvm//guest-97.sock".
##########################################################################
# kvm-unit-tests
##########################################################################
... [test messages elided]
PASS: sbi: fwft: FWFT extension probing no error
PASS: sbi: fwft: get/set reserved feature 0x6 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x3fffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0x80000000 error == SBI_ERR_DENIED
PASS: sbi: fwft: get/set reserved feature 0xbfffffff error == SBI_ERR_DENIED
PASS: sbi: fwft: misaligned_deleg: Get misaligned deleg feature no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 0
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 1
PASS: sbi: fwft: misaligned_deleg: Verify misaligned load exception trap in supervisor
SUMMARY: 50 tests, 2 unexpected failures, 12 skipped
This series is available at [5].
[Palmer: slighyt commit text modification, as SBI-3.0 is merged now.
Also drop the KVM patches, as they're too late.]
* b4-shazam-merge:
riscv: misaligned: add a function to check misalign trap delegability
riscv: misaligned: move emulated access uniformity check in a function
riscv: misaligned: declare misaligned_access_speed under CONFIG_RISCV_MISALIGNED
riscv: misaligned: use on_each_cpu() for scalar misaligned access probing
riscv: misaligned: request misaligned exception from SBI
riscv: sbi: add SBI FWFT extension calls
riscv: sbi: add FWFT extension interface
riscv: sbi: add new SBI error mappings
riscv: sbi: remove useless parenthesis
riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions
Link: https://lore.kernel.org/r/20250523101932.1594077-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
put/get_user()"
Clément Léger <cleger@rivosinc.com> says:
While debugging a few problems with the misaligned access kselftest,
Alexandre discovered some crash with the current code. Indeed, some
misaligned access was done by the kernel using put_user(). This
was resulting in trap and a kernel crash since. The path was the
following:
user -> kernel -> access to user memory -> misaligned trap -> trap ->
kernel -> misaligned handling -> memcpy -> crash due to failed page fault
while in interrupt disabled section.
Last discussion about kernel misaligned handling and interrupt reenabling
were actually not to reenable interrupt when handling misaligned access
being done by kernel. The best solution being not to do any misaligned
accesses to userspace memory, we considered a few options:
- Remove any call to put/get_user() potentially doing misaligned
accesses
- Do not do any misaligned accesses in put/get_user() itself
The second solution was the one chosen as there are too many callsites to
put/get_user() that could potentially do misaligned accesses. We tried
two approaches for that, either split access in two aligned accesses
(and do RMW for put_user()) or call copy_from/to_user() which does not
do any misaligned accesses. The later one was the simpler to implement
(although the performances are probably lower than split aligned
accesses but still way better than doing misaligned access emulation)
and allows to support what we wanted.
These commits are based on top of Alex dev/alex/get_user_misaligned_v1
branch.
[Palmer: No idea what that branch is, so I'm basing it on the uaccess
optimizations patch series which is the last thing to touch these.]
* b4-shazam-merge
riscv: uaccess: do not do misaligned accesses in get/put_user()
riscv: process: use unsigned int instead of unsigned long for put_user()
riscv: make unsafe user copy routines use existing assembly routines
Link: https://lore.kernel.org/r/20250602193918.868962-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
During switch to csrs will OR the value of the register into the
corresponding csr. In this case we're only interested in restoring the
SUM bit not the entire register.
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
Link: https://lore.kernel.org/r/20250522160954.429333-1-cyrilbur@tenstorrent.com
Co-developed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Fixes: 788aa64c01f1 ("riscv: save the SR_SUM status over switches")
Link: https://lore.kernel.org/r/20250602121543.1544278-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into for-next
riscv patches for 6.16-rc1, part 2
* Performance improvements
- Add support for vdso getrandom
- Implement raid6 calculations using vectors
- Introduce svinval tlb invalidation
* Cleanup
- A bunch of deduplication of the macros we use for manipulating instructions
* Misc
- Introduce a kunit test for kprobes
- Add support for mseal as riscv fits the requirements (thanks to Lorenzo for making sure of that :))
[Palmer: There was a rebase between part 1 and part 2, so I've had to do
some more git surgery here... at least two rounds of surgery...]
* alex-pr-2: (866 commits)
RISC-V: vDSO: Wire up getrandom() vDSO implementation
riscv: enable mseal sysmap for RV64
raid6: Add RISC-V SIMD syndrome and recovery calculations
riscv: mm: Add support for Svinval extension
riscv: Add kprobes KUnit test
riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM
riscv: kprobes: Remove duplication of RV_EXTRACT_UTYPE_IMM
riscv: kprobes: Remove duplication of RV_EXTRACT_RD_REG
riscv: kprobes: Remove duplication of RVC_EXTRACT_BTYPE_IMM
riscv: kprobes: Remove duplication of RVC_EXTRACT_C2_RS1_REG
riscv: kproves: Remove duplication of RVC_EXTRACT_JTYPE_IMM
riscv: kprobes: Remove duplication of RV_EXTRACT_BTYPE_IMM
riscv: kprobes: Remove duplication of RV_EXTRACT_RS1_REG
riscv: kprobes: Remove duplication of RV_EXTRACT_JTYPE_IMM
riscv: kprobes: Move branch_funct3 to insn.h
riscv: kprobes: Move branch_rs2_idx to insn.h
Linux 6.15-rc6
Input: xpad - fix xpad_device sorting
Input: xpad - add support for several more controllers
Input: xpad - fix Share button on Xbox One controllers
...
|
|
Hook up the generic vDSO implementation to the generic vDSO getrandom
implementation by providing the required __arch_chacha20_blocks_nostack
and getrandom_syscall implementations. Also wire up the selftests.
The benchmark result:
vdso: 25000000 times in 2.466341333 seconds
libc: 25000000 times in 41.447720005 seconds
syscall: 25000000 times in 41.043926672 seconds
vdso: 25000000 x 256 times in 162.286219353 seconds
libc: 25000000 x 256 times in 2953.855018685 seconds
syscall: 25000000 x 256 times in 2796.268546000 seconds
[ alex: - Fix dynamic relocation
- Squash Nathan's fix https://lore.kernel.org/all/20250423-riscv-fix-compat_vdso-lld-v2-1-b7bbbc244501@kernel.org/
- Add comment from Loongarch ]
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Link: https://lore.kernel.org/r/20250411024600.16045-1-xry111@xry111.site
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/alexghiti/linux into for-next
riscv patches for 6.16-rc1
* Implement atomic patching support for ftrace which finally allows to
get rid of stop_machine().
* Support for kexec_file_load() syscall
* Improve module loading time by changing the algorithm that counts the
number of plt/got entries in a module.
* Zicbop is now used in the kernel to prefetch instructions
[Palmer: There's been two rounds of surgery on this one, so as a result
it's a bit different than the PR.]
* alex-pr: (734 commits)
riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
MAINTAINERS: Update Atish's email address
riscv: hwprobe: export Zabha extension
riscv: Make regs_irqs_disabled() more clear
perf symbols: Ignore mapping symbols on riscv
RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
riscv: module: Optimize PLT/GOT entry counting
riscv: Add support for PUD THP
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
riscv: Introduce Zicbop instructions
riscv/kexec_file: Fix comment in purgatory relocator
riscv: kexec_file: Support loading Image binary file
riscv: kexec_file: Split the loading of kernel and others
riscv: Documentation: add a description about dynamic ftrace
riscv: ftrace: support direct call using call_ops
riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
riscv: ftrace: support PREEMPT
riscv: add a data fence for CMODX in the kernel mode
...
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Alexandre Ghiti <alexghiti@rivosinc.com> says:
I found this lost series developed by Guo so here is a respin with the
comments on v2 applied.
This patch series adds Zicbop support and then enables the Linux
prefetch features.
* patches from https://lore.kernel.org/r/20250421142441.395849-1-alexghiti@rivosinc.com:
riscv: xchg: Prefetch the destination word for sc.w
riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
riscv: Add support for Zicbop
riscv: Introduce Zicbop instructions
Link: https://lore.kernel.org/r/20250421142441.395849-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Doing misaligned access to userspace memory would make a trap on
platform where it is emulated. Latest fixes removed the kernel
capability to do unaligned accesses to userspace memory safely since
interrupts are kept disabled at all time during that. Thus doing so
would crash the kernel.
Such behavior was detected with GET_UNALIGN_CTL() that was doing
a put_user() with an unsigned long* address that should have been an
unsigned int*. Reenabling kernel misaligned access emulation is a bit
risky and it would also degrade performances. Rather than doing that,
we will try to avoid any misaligned accessed by using copy_from/to_user()
which does not do any misaligned accesses. This can be done only for
!CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and thus allows to only generate
a bit more code for this config.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The current implementation is underperforming and in addition, it
triggers misaligned access traps on platforms which do not handle
misaligned accesses in hardware.
Use the existing assembly routines to solve both problems at once.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250602193918.868962-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Export Zabha through the hwprobe syscall.
Reviewed-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20250421141413.394444-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The return value of regs_irqs_disabled() is true or false, so change
its type to reflect that and also make it always inline.
Suggested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20250422113156.25742-1-yangtiezhu@loongson.cn
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Björn Töpel <bjorn@kernel.org> says:
From: Björn Töpel <bjorn@rivosinc.com>
Hi!
For over a year ago, Daniel and I was testing the V2 of Song's series.
I also promised to take the V2, that had been sitting on the lists for
too long, to rebase it on a new kernel, and re-test it.
One year later, here's the V3! ;-)
There are no changes from V2 other, than some simple checkpatch
cleanups.
Song's original cover:
| This series makes the kexec_file_load() syscall support to load
| Image binary file. At the same time, corresponding support for
| kexec-tools had been pushed to my repo[2].
|
| Now, we can leverage that kexec-tools and this series to use the
| kexec_load() or kexec_file_load() syscall to boot both vmlinux and
| Image file, as seen in these combo tests:
|
| ```
| 1. kexec -l vmlinux
| 2. kexec -l Image
| 3. kexec -s -l vmlinux
| 4. kexec -s -l Image
| ```
Notably, kexec-tools has still not made it upstream. I've prepared a
branch on my GH [3], that I indend to post ASAP. That branch is a
collection of fixes/features, including Song's userland Image loading.
The V2 is here [2], and V1 [1].
I've tested the kexec-file/Image on qemu-rv64, with following
combinations:
* ACPI/UEFI
* DT/UEFI
* DT
both "regular" kexec (-s + -e), and crashkernels (-p).
Note that there are two purgatory patches that has to be present (part
of -rc1, so all good):
commit 28093cfef5dd ("riscv/kexec_file: Handle R_RISCV_64 in purgatory relocator")
commit 3f7023171df4 ("riscv/purgatory: 4B align purgatory_start")
[1] https://lore.kernel.org/linux-riscv/20230914020044.1397356-1-songshuaishuai@tinylab.org/
[2] https://lore.kernel.org/linux-riscv/20231016092006.3347632-1-songshuaishuai@tinylab.org/
[3] https://github.com/bjoto/kexec-tools/tree/rv-on-master
* patches from https://lore.kernel.org/r/20250409193004.643839-1-bjorn@kernel.org:
riscv: kexec_file: Support loading Image binary file
riscv: kexec_file: Split the loading of kernel and others
Link: https://lore.kernel.org/r/20250409193004.643839-1-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Andy Chiu <andybnac@gmail.com> says:
This series makes atomic code patching in ftrace possible and eliminates
the need of the stop_machine dance. The major difference of this version
is that we merge the CALL_OPS support from Puranjay [1] and make direct
calls available for practical uses such as BPF. Thanks for the time
reviewing the series and suggestions, we hope this version gets a step
closer to happening in the upstream.
Please reference the link to v3 below for more introductory view of the
implementation [2]
Added patch: 2, 4, 10, 11, 12
Modified patch: 5, 6
Unchanged patch: 1, 3, 7, 8, 9
(1, 8 has commit msg modified)
Special thanks to Björn for his efforts on testing and guiding the
series!
[1]: https://lore.kernel.org/lkml/20240306165904.108141-1-puranjay12@gmail.com/
[2]: https://lore.kernel.org/linux-riscv/20241127172908.17149-1-andybnac@gmail.com/
* patches from https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com:
riscv: Documentation: add a description about dynamic ftrace
riscv: ftrace: support direct call using call_ops
riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
riscv: ftrace: support PREEMPT
riscv: add a data fence for CMODX in the kernel mode
riscv: vector: Support calling schedule() for preemptible Vector
riscv: ftrace: do not use stop_machine to update code
riscv: ftrace: prepare ftrace for atomic code patching
kernel: ftrace: export ftrace_sync_ipi
riscv: ftrace: align patchable functions to 4 Byte boundary
riscv: ftrace factor out code defined by !WITH_ARG
riscv: ftrace: support fastcc in Clang for WITH_ARGS
Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
|
|
Add the necessary page table functions to deal with PUD THP, this
enables the use of PUD pfnmap.
Link: https://lore.kernel.org/r/20250321123954.225097-1-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch makes use of prefetch.w to prefetch cachelines for write
prior to lr/sc loops when using the xchg_small atomic routine.
This patch is inspired by commit 0ea366f5e1b6 ("arm64: atomics:
prefetch the destination word for write prior to stxr").
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-4-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-5-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Enable Linux prefetch and prefetchw primitives using Zicbop.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20231231082955.16516-3-guoren@kernel.org
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-4-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Zicbop introduces cache blocks prefetching instructions, add the
necessary support for the kernel to use it in the coming commits.
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-3-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The S-type instructions are first introduced and then used to define the
encoding of the Zicbop prefetching instructions.
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20250421142441.395849-2-alexghiti@rivosinc.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This patch creates image_kexec_ops to load Image binary file
for kexec_file_load() syscall.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-3-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This is the preparative patch for kexec_file_load Image support.
It separates the elf_kexec_load() as two parts:
- the first part loads the vmlinux (or Image)
- the second part loads other segments (e.g. initrd,fdt,purgatory)
And the second part is exported as the load_extra_segments() function
which would be used in both kexec-elf.c and kexec-image.c.
No functional change intended.
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250409193004.643839-2-bjorn@kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
jump to FTRACE_ADDR if distance is out of reach
Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/r/20250407180838.42877-11-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Each function entry implies a call to ftrace infrastructure. And it may
call into schedule in some cases. So, it is possible for preemptible
kernel-mode Vector to implicitly call into schedule. Since all V-regs
are caller-saved, it is possible to drop all V context when a thread
voluntarily call schedule(). Besides, we currently don't pass argument
through vector register, so we don't have to save/restore V-regs in
ftrace trampoline.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-7-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
instruction fetch can break down to 4 byte at a time, it is impossible
to update two instructions without a race. In order to mitigate it, we
initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
patching can change NOP4 to JALR to eable/disable ftrcae from a
function. This limits the reach of each ftrace entry to +-2KB displacing
from ftrace_caller.
Starting from the trampoline, we add a level of indirection for it to
reach ftrace caller target. Now, it loads the target address from a
memory location, then perform the jump. This enable the kernel to update
the target atomically.
The new don't-stop-the-world text patching on change only one RISC-V
instruction:
| -8: &ftrace_ops of the associated tracer function.
| <ftrace enable>:
| 0: auipc t0, hi(ftrace_caller)
| 4: jalr t0, lo(ftrace_caller)
|
| -8: &ftrace_nop_ops
| <ftrace disable>:
| 0: auipc t0, hi(ftrace_caller)
| 4: nop
This means that f+0x0 is fixed, and should not be claimed by ftrace,
e.g. kprobe should be able to put a probe in f+0x0. Thus, we adjust the
offset and MCOUNT_INSN_SIZE accordingly.
[ alex: Fix build errors with !CONFIG_DYNAMIC_FTRACE ]
Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20250407180838.42877-5-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Some caller-saved registers which are not defined as function arguments
in the ABI can still be passed as arguments when the kernel is compiled
with Clang. As a result, we must save and restore those registers to
prevent ftrace from clobbering them.
- [1]: https://reviews.llvm.org/D68559
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Closes: https://lore.kernel.org/linux-riscv/7e7c7914-445d-426d-89a0-59a9199c45b1@yadro.com/
Fixes: 7caa9765465f ("ftrace: riscv: move from REGS to ARGS")
Acked-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20250407180838.42877-1-andybnac@gmail.com
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Checking for the delegability of the misaligned access trap is needed
for the KVM FWFT extension implementation. Add a function to get the
delegability of the misaligned trap exception.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-11-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
While misaligned_access_speed was defined in a file compile with
CONFIG_RISCV_MISALIGNED, its definition was under
CONFIG_RISCV_SCALAR_MISALIGNED. This resulted in compilation problems
when using it in a file compiled with CONFIG_RISCV_MISALIGNED.
Move the declaration under CONFIG_RISCV_MISALIGNED so that it can be
used unconditionnally when compiled with that config and remove the check
for that variable in traps_misaligned.c.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-9-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
Now that the kernel can handle misaligned accesses in S-mode, request
misaligned access exception delegation from SBI. This uses the FWFT SBI
extension defined in SBI version 3.0.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-7-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This SBI extensions enables supervisor mode to control feature that are
under M-mode control (For instance, Svadu menvcfg ADUE bit, Ssdbltrp
DTE, etc). Add an interface to set local features for a specific cpu
mask as well as for the online cpu mask.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-5-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
A few new errors have been added with SBI V3.0, maps them as close as
possible to errno values.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
The Firmware Features extension (FWFT) was added as part of the SBI 3.0
specification. Add SBI definitions to use this extension.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Deepak Gupta <debug@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20250523101932.1594077-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
This has been on -next for a bit, but it's broken and there's already a
v2. So I'm reverting it to avoid more rebasing.
This reverts commit 89079520cef65d6da1e864eab4464effe5396e23.
Link: https://lore.kernel.org/r/20250602173315.20228-1-palmer@dabbelt.com
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
creating a pte which addresses the first page in a folio and reduces
the amount of plumbing which architecture must implement to provide
this.
- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
largely unrelated folio infrastructure changes which clean things up
and better prepare us for future work.
- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
Price adds early-init code to prevent x86 from leaving physical
memory unused when physical address regions are not aligned to memory
block size.
- "mm/compaction: allow more aggressive proactive compaction" from
Michal Clapinski provides some tuning of the (sadly, hard-coded (more
sadly, not auto-tuned)) thresholds for our invokation of proactive
compaction. In a simple test case, the reduction of a guest VM's
memory consumption was dramatic.
- "Minor cleanups and improvements to swap freeing code" from Kemeng
Shi provides some code cleaups and a small efficiency improvement to
this part of our swap handling code.
- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
adds the ability for a ptracer to modify syscalls arguments. At this
time we can alter only "system call information that are used by
strace system call tampering, namely, syscall number, syscall
arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
against /proc/pid/pagemap. This permits CRIU to more efficiently get
at the info about guard regions.
- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
implements that fix. No runtime effect is expected because
validate_page_before_insert() happens to fix up this error.
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
Hildenbrand basically brings uprobe text poking into the current
decade. Remove a bunch of hand-rolled implementation in favor of
using more current facilities.
- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
Khandual provides enhancements and generalizations to the pte dumping
code. This might be needed when 128-bit Page Table Descriptors are
enabled for ARM.
- "Always call constructor for kernel page tables" from Kevin Brodsky
ensures that the ctor/dtor is always called for kernel pgtables, as
it already is for user pgtables.
This permits the addition of more functionality such as "insert hooks
to protect page tables". This change does result in various
architectures performing unnecesary work, but this is fixed up where
it is anticipated to occur.
- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
Ryhl adds plumbing to permit Rust access to core MM structures.
- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
Stoakes takes advantage of some VMA merging opportunities which we've
been missing for 15 years.
- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
SeongJae Park optimizes process_madvise()'s TLB flushing.
Instead of flushing each address range in the provided iovec, we
batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- "Track node vacancy to reduce worst case allocation counts" from
Sidhartha Kumar makes the maple tree smarter about its node
preallocation.
stress-ng mmap performance increased by single-digit percentages and
the amount of unnecessarily preallocated memory was dramaticelly
reduced.
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
a few unnecessary things which Baoquan noted when reading the code.
- ""Enhance sysfs handling for memory hotplug in weighted interleave"
from Rakie Kim "enhances the weighted interleave policy in the memory
management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug
support". Fixes things on error paths which we are unlikely to hit.
- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
from SeongJae Park introduces new DAMOS quota goal metrics which
eliminate the manual tuning which is required when utilizing DAMON
for memory tiering.
- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
provides cleanups and small efficiency improvements which Baoquan
found via code inspection.
- "vmscan: enforce mems_effective during demotion" from Gregory Price
changes reclaim to respect cpuset.mems_effective during demotion when
possible. because presently, reclaim explicitly ignores
cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently.
- "Clean up split_huge_pmd_locked() and remove unnecessary folio
pointers" from Gavin Guo provides minor cleanups and efficiency gains
in in the huge page splitting and migrating code.
- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
for `struct mem_cgroup', yielding improved memory utilization.
- "add max arg to swappiness in memory.reclaim and lru_gen" from
Zhongkun He adds a new "max" argument to the "swappiness=" argument
for memory.reclaim MGLRU's lru_gen.
This directs proactive reclaim to reclaim from only anon folios
rather than file-backed folios.
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
first step on the path to permitting the kernel to maintain existing
VMs while replacing the host kernel via file-based kexec. At this
time only memblock's reserve_mem is preserved.
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
and uses a smarter way of looping over a pfn range. By skipping
ranges of invalid pfns.
- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
when a task is pinned a single NUMA mode.
Dramatic performance benefits were seen in some real world cases.
- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
Garg addresses a warning which occurs during memory compaction when
using JFS.
- "move all VMA allocation, freeing and duplication logic to mm" from
Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
appropriate mm/vma.c.
- "mm, swap: clean up swap cache mapping helper" from Kairui Song
provides code consolidation and cleanups related to the folio_index()
function.
- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.
- "memcg: Fix test_memcg_min/low test failures" from Waiman Long
addresses some bogus failures which are being reported by the
test_memcontrol selftest.
- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
Stoakes commences the deprecation of file_operations.mmap() in favor
of the new file_operations.mmap_prepare().
The latter is more restrictive and prevents drivers from messing with
things in ways which, amongst other problems, may defeat VMA merging.
- "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
the per-cpu memcg charge cache from the objcg's one.
This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- "mm/damon: minor fixups and improvements for code, tests, and
documents" from SeongJae Park is yet another batch of miscellaneous
DAMON changes. Fix and improve minor problems in code, tests and
documents.
- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
stats to be irq safe. Another step along the way to making memcg
charging and stats updates NMI-safe, a BPF requirement.
- "Let unmap_hugepage_range() and several related functions take folio
instead of page" from Fan Ni provides folio conversions in the
hugetlb code.
* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
mm: pcp: increase pcp->free_count threshold to trigger free_high
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
mm/hugetlb: pass folio instead of page to unmap_ref_private()
memcg: objcg stock trylock without irq disabling
memcg: no stock lock for cpu hot-unplug
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
memcg: make count_memcg_events re-entrant safe against irqs
memcg: make mod_memcg_state re-entrant safe against irqs
memcg: move preempt disable to callers of memcg_rstat_updated
memcg: memcg_rstat_updated re-entrant safe against irqs
mm: khugepaged: decouple SHMEM and file folios' collapse
selftests/eventfd: correct test name and improve messages
alloc_tag: check mem_profiling_support in alloc_tag_init
Docs/damon: update titles and brief introductions to explain DAMOS
selftests/damon/_damon_sysfs: read tried regions directories in order
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
...
|
|
Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.
Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.
ARM:
- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.
- Enable nested virtualisation support on systems that support it,
though it is disabled by default.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
dealing with the evolution of the architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
LoongArch:
- Don't flush tlb if the host supports hardware page table walks.
- Add KVM selftests support.
RISC-V:
- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl
x86:
- Initial support for TDX in KVM.
This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.
This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c42a ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...
|
|
Add a toggleable VM capability to reset the VCPU from userspace by
setting MP_STATE_INIT_RECEIVED through IOCTL.
Reset through a mp_state to avoid adding a new IOCTL.
Do not reset on a transition from STOPPED to RUNNABLE, because it's
better to avoid side effects that would complicate userspace adoption.
The MP_STATE_INIT_RECEIVED is not a permanent mp_state -- IOCTL resets
the VCPU while preserving the original mp_state -- because we wouldn't
gain much from having a new state it in the rest of KVM, but it's a very
non-standard use of the IOCTL.
Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20250515143723.2450630-5-rkrcmar@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The SBI reset state has only two variables -- pc and a1.
The rest is known, so keep only the necessary information.
The reset structures make sense if we want userspace to control the
reset state (which we do), but I'd still remove them now and reintroduce
with the userspace interface later -- we could probably have just a
single reset state per VM, instead of a reset state for each VCPU.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Link: https://lore.kernel.org/r/20250403112522.1566629-6-rkrcmar@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
The same code is used twice and SBI reset sets only two variables.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Link: https://lore.kernel.org/r/20250403112522.1566629-5-rkrcmar@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Do not depend on the reset structures.
vector.datap is a kernel memory pointer that needs to be preserved as it
is not a part of the guest vector data.
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Radim Krčmář <rkrcmar@ventanamicro.com>
Link: https://lore.kernel.org/r/20250403112522.1566629-4-rkrcmar@ventanamicro.com
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Similar to syscall_set_arguments() that complements
syscall_get_arguments(), introduce syscall_set_nr() that complements
syscall_get_nr().
syscall_set_nr() is going to be needed along with syscall_set_arguments()
on all HAVE_ARCH_TRACEHOOK architectures to implement
PTRACE_SET_SYSCALL_INFO API.
Link: https://lkml.kernel.org/r/20250303112020.GD24170@strace.io
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> # mips
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexey Gladkov (Intel) <legion@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: anton ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davide Berardi <berardi.dav@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Eugene Syromyatnikov <evgsyr@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Renzo Davoi <renzo@cs.unibo.it>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This function is going to be needed on all HAVE_ARCH_TRACEHOOK
architectures to implement PTRACE_SET_SYSCALL_INFO API.
This partially reverts commit 7962c2eddbfe ("arch: remove unused function
syscall_set_arguments()") by reusing some of old syscall_set_arguments()
implementations.
[nathan@kernel.org: fix compile time fortify checks]
Link: https://lkml.kernel.org/r/20250408213131.GA2872426@ax162
Link: https://lkml.kernel.org/r/20250303112009.GC24170@strace.io
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Reviewed-by: Maciej W. Rozycki <macro@orcam.me.uk> [mips]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexey Gladkov (Intel) <legion@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: anton ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davide Berardi <berardi.dav@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Eugene Syromiatnikov <esyr@redhat.com>
Cc: Eugene Syromyatnikov <evgsyr@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Renzo Davoi <renzo@cs.unibo.it>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russel King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them.
Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Most architectures simply call pfn_pte(). Centralise that as the normal
definition and remove the definition of mk_pte() from the architectures
which have either that exact definition or something similar.
Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Cc: Zi Yan <ziy@nvidia.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Cyan Yang <cyan.yang@sifive.com> says:
This patch set adds four vendor-specific ISA extensions from SiFive:
"xsfvqmaccdod", "xsfvqmaccqoq", "xsfvfnrclipxfqf", and "xsfvfwmaccqqq".
Additionally, a new hwprobe key, RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0,
has been added to query which SiFive vendor extensions are supported on
the current platform.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-1-cyan.yang@sifive.com
* b4-shazam-merge:
riscv: hwprobe: Add SiFive xsfvfwmaccqqq vendor extension
riscv: hwprobe: Document SiFive xsfvfwmaccqqq vendor extension
riscv: Add SiFive xsfvfwmaccqqq vendor extension
dt-bindings: riscv: Add xsfvfwmaccqqq ISA extension description
riscv: hwprobe: Add SiFive xsfvfnrclipxfqf vendor extension
riscv: hwprobe: Document SiFive xsfvfnrclipxfqf vendor extension
riscv: Add SiFive xsfvfnrclipxfqf vendor extension
dt-bindings: riscv: Add xsfvfnrclipxfqf ISA extension description
riscv: hwprobe: Add SiFive vendor extension support and probe for xsfqmaccdod and xsfqmaccqoq
riscv: hwprobe: Document SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions
riscv: Add SiFive xsfvqmaccdod and xsfvqmaccqoq vendor extensions
dt-bindings: riscv: Add xsfvqmaccdod and xsfvqmaccqoq ISA extension description
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add hwprobe for SiFive "xsfvfwmaccqqq" vendor extension.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-13-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add SiFive vendor extension "xsfvfwmaccqqq" support to the kernel.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-11-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add hwprobe for SiFive "xsfvfnrclipxfqf" vendor extension.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-9-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Add SiFive vendor extension "xsfvfnrclipxfqf" support to the kernel.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-7-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
xsfqmaccdod and xsfqmaccqoq
Add a new hwprobe key "RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0" which allows
userspace to probe for the new vendor extensions from SiFive. Also, add
new hwprobe for SiFive "xsfvqmaccdod" and "xsfvqmaccqoq" vendor
extensions.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-5-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Document the support for sifive vendor extensions using the key
RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0 and two vendor extensions for SiFive
Int8 Matrix Multiplication Instructions using
RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCDOD and
RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCQOQ.
Signed-off-by: Cyan Yang <cyan.yang@sifive.com>
Link: https://lore.kernel.org/r/20250418053239.4351-4-cyan.yang@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|