summaryrefslogtreecommitdiff
path: root/arch/riscv
AgeCommit message (Collapse)Author
2025-03-07riscv: signal: fix signal frame sizeYong-Xuan Wang
commit aa49bc2ca8524186ceb0811c23a7f00c3dea6987 upstream. The signal context of certain RISC-V extensions will be appended after struct __riscv_extra_ext_header, which already includes an empty context header. Therefore, there is no need to preserve a separate hdr for the END of signal context. Fixes: 8ee0b41898fa ("riscv: signal: Add sigcontext save/restore for vector") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Zong Li <zong.li@sifive.com> Reviewed-by: Andy Chiu <AndybnAC@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241220083926.19453-2-yongxuan.wang@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv/futex: sign extend compare value in atomic cmpxchgAndreas Schwab
commit 599c44cd21f4967774e0acf58f734009be4aea9a upstream. Make sure the compare value in the lr/sc loop is sign extended to match what lr.w does. Fortunately, due to the compiler keeping the register contents sign extended anyway the lack of the explicit extension didn't result in wrong code so far, but this cannot be relied upon. Fixes: b90edb33010b ("RISC-V: Add futex support.") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/mvmfrkv2vhz.fsf@suse.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-07riscv: KVM: Fix SBI TIME error generationAndrew Jones
[ Upstream commit b901484852992cf3d162a5eab72251cc813ca624 ] When an invalid function ID of an SBI extension is used we should return not-supported, not invalid-param. Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-11-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07riscv: KVM: Fix SBI IPI error generationAndrew Jones
[ Upstream commit 0611f78f83c93c000029ab01daa28166d03590ed ] When an invalid function ID of an SBI extension is used we should return not-supported, not invalid-param. Also, when we see that at least one hartid constructed from the base and mask parameters is invalid, then we should return invalid-param. Finally, rather than relying on overflowing a left shift to result in zero and then using that zero in a condition which [correctly] skips sending an IPI (but loops unnecessarily), explicitly check for overflow and exit the loop immediately. Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-10-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07riscv: KVM: Fix hart suspend status checkAndrew Jones
[ Upstream commit c7db342e3b4744688be1e27e31254c1d31a35274 ] "Not stopped" means started or suspended so we need to check for a single state in order to have a chance to check for each state. Also, we need to use target_vcpu when checking for the suspend state. Fixes: 763c8bed8c05 ("RISC-V: KVM: Implement SBI HSM suspend call") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-8-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07RISCV: KVM: Introduce mp_state_lock to avoid lock inversionYong-Xuan Wang
[ Upstream commit 2121cadec45aaf61fa45b3aa3d99723ed4e6683a ] Documentation/virt/kvm/locking.rst advises that kvm->lock should be acquired outside vcpu->mutex and kvm->srcu. However, when KVM/RISC-V handling SBI_EXT_HSM_HART_START, the lock ordering is vcpu->mutex, kvm->srcu then kvm->lock. Although the lockdep checking no longer complains about this after commit f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies"), it's necessary to replace kvm->lock with a new dedicated lock to ensure only one hart can execute the SBI_EXT_HSM_HART_START call for the target hart simultaneously. Additionally, this patch also rename "power_off" to "mp_state" with two possible values. The vcpu->mp_state_lock also protects the access of vcpu->mp_state. Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20240417074528.16506-2-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org> Stable-dep-of: c7db342e3b47 ("riscv: KVM: Fix hart suspend status check") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08RISC-V: Mark riscv_v_init() as __initPalmer Dabbelt
[ Upstream commit 9d87cf525fd2e1a5fcbbb40ee3df216d1d266c88 ] This trips up with Xtheadvector enabled, but as far as I can tell it's just been an issue since the original patchset. Fixes: 7ca7a7b9b635 ("riscv: Add sysctl to set the default vector rule for new processes") Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20250115180251.31444-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17riscv: Fix text patching when IPI are usedAlexandre Ghiti
[ Upstream commit c97bf629963e52b205ed5fbaf151e5bd342f9c63 ] For now, we use stop_machine() to patch the text and when we use IPIs for remote icache flushes (which is emitted in patch_text_nosync()), the system hangs. So instead, make sure every CPU executes the stop_machine() patching function and emit a local icache flush there. Co-developed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240229121056.203419-3-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Stable-dep-of: 13134cc94914 ("riscv: kprobes: Fix incorrect address calculation") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17riscv: kprobes: Fix incorrect address calculationNam Cao
commit 13134cc949148e1dfa540a0fe5dc73569bc62155 upstream. p->ainsn.api.insn is a pointer to u32, therefore arithmetic operations are multiplied by four. This is clearly undesirable for this case. Cast it to (void *) first before any calculation. Below is a sample before/after. The dumped memory is two kprobe slots, the first slot has - c.addiw a0, 0x1c (0x7125) - ebreak (0x00100073) and the second slot has: - c.addiw a0, -4 (0x7135) - ebreak (0x00100073) Before this patch: (gdb) x/16xh 0xff20000000135000 0xff20000000135000: 0x7125 0x0000 0x0000 0x0000 0x7135 0x0010 0x0000 0x0000 0xff20000000135010: 0x0073 0x0010 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 After this patch: (gdb) x/16xh 0xff20000000125000 0xff20000000125000: 0x7125 0x0073 0x0010 0x0000 0x7135 0x0073 0x0010 0x0000 0xff20000000125010: 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 Fixes: b1756750a397 ("riscv: kprobes: Use patch_text_nosync() for insn slots") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241119111056.2554419-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> [rebase to v6.6] Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17riscv: Fix sleeping in invalid context in die()Nam Cao
commit 6a97f4118ac07cfdc316433f385dbdc12af5025e upstream. die() can be called in exception handler, and therefore cannot sleep. However, die() takes spinlock_t which can sleep with PREEMPT_RT enabled. That causes the following warning: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 285, name: mutex preempt_count: 110001, expected: 0 RCU nest depth: 0, expected: 0 CPU: 0 UID: 0 PID: 285 Comm: mutex Not tainted 6.12.0-rc7-00022-ge19049cf7d56-dirty #234 Hardware name: riscv-virtio,qemu (DT) Call Trace: dump_backtrace+0x1c/0x24 show_stack+0x2c/0x38 dump_stack_lvl+0x5a/0x72 dump_stack+0x14/0x1c __might_resched+0x130/0x13a rt_spin_lock+0x2a/0x5c die+0x24/0x112 do_trap_insn_illegal+0xa0/0xea _new_vmalloc_restore_context_a0+0xcc/0xd8 Oops - illegal instruction [#1] Switch to use raw_spinlock_t, which does not sleep even with PREEMPT_RT enabled. Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20241118091333.1185288-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17riscv: mm: Fix the out of bound issue of vmemmap addressXu Lu
[ Upstream commit f754f27e98f88428aaf6be6e00f5cbce97f62d4b ] In sparse vmemmap model, the virtual address of vmemmap is calculated as: ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT)). And the struct page's va can be calculated with an offset: (vmemmap + (pfn)). However, when initializing struct pages, kernel actually starts from the first page from the same section that phys_ram_base belongs to. If the first page's physical address is not (phys_ram_base >> PAGE_SHIFT), then we get an va below VMEMMAP_START when calculating va for it's struct page. For example, if phys_ram_base starts from 0x82000000 with pfn 0x82000, the first page in the same section is actually pfn 0x80000. During init_unavailable_range(), we will initialize struct page for pfn 0x80000 with virtual address ((struct page *)VMEMMAP_START - 0x2000), which is below VMEMMAP_START as well as PCI_IO_END. This commit fixes this bug by introducing a new variable 'vmemmap_start_pfn' which is aligned with memory section size and using it to calculate vmemmap address instead of phys_ram_base. Fixes: a11dd49dcb93 ("riscv: Sparse-Memory/vmemmap out-of-bounds fix") Signed-off-by: Xu Lu <luxu.kernel@bytedance.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20241209122617.53341-1-luxu.kernel@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-17riscv: Fix early ftrace nop patchingAlexandre Ghiti
commit 6ca445d8af0ed5950ebf899415fd6bfcd7d9d7a3 upstream. Commit c97bf629963e ("riscv: Fix text patching when IPI are used") converted ftrace_make_nop() to use patch_insn_write() which does not emit any icache flush relying entirely on __ftrace_modify_code() to do that. But we missed that ftrace_make_nop() was called very early directly when converting mcount calls into nops (actually on riscv it converts 2B nops emitted by the compiler into 4B nops). This caused crashes on multiple HW as reported by Conor and Björn since the booting core could have half-patched instructions in its icache which would trigger an illegal instruction trap: fix this by emitting a local flush icache when early patching nops. Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reported-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240523115134.70380-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-19riscv: Fix IPIs usage in kfence_protect_page()Alexandre Ghiti
commit b3431a8bb336cece8adc452437befa7d4534b2fd upstream. flush_tlb_kernel_range() may use IPIs to flush the TLBs of all the cores, which triggers the following warning when the irqs are disabled: [ 3.455330] WARNING: CPU: 1 PID: 0 at kernel/smp.c:815 smp_call_function_many_cond+0x452/0x520 [ 3.456647] Modules linked in: [ 3.457218] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7-00010-g91d3de7240b8 #1 [ 3.457416] Hardware name: QEMU QEMU Virtual Machine, BIOS [ 3.457633] epc : smp_call_function_many_cond+0x452/0x520 [ 3.457736] ra : on_each_cpu_cond_mask+0x1e/0x30 [ 3.457786] epc : ffffffff800b669a ra : ffffffff800b67c2 sp : ff2000000000bb50 [ 3.457824] gp : ffffffff815212b8 tp : ff6000008014f080 t0 : 000000000000003f [ 3.457859] t1 : ffffffff815221e0 t2 : 000000000000000f s0 : ff2000000000bc10 [ 3.457920] s1 : 0000000000000040 a0 : ffffffff815221e0 a1 : 0000000000000001 [ 3.457953] a2 : 0000000000010000 a3 : 0000000000000003 a4 : 0000000000000000 [ 3.458006] a5 : 0000000000000000 a6 : ffffffffffffffff a7 : 0000000000000000 [ 3.458042] s2 : ffffffff815223be s3 : 00fffffffffff000 s4 : ff600001ffe38fc0 [ 3.458076] s5 : ff600001ff950d00 s6 : 0000000200000120 s7 : 0000000000000001 [ 3.458109] s8 : 0000000000000001 s9 : ff60000080841ef0 s10: 0000000000000001 [ 3.458141] s11: ffffffff81524812 t3 : 0000000000000001 t4 : ff60000080092bc0 [ 3.458172] t5 : 0000000000000000 t6 : ff200000000236d0 [ 3.458203] status: 0000000200000100 badaddr: ffffffff800b669a cause: 0000000000000003 [ 3.458373] [<ffffffff800b669a>] smp_call_function_many_cond+0x452/0x520 [ 3.458593] [<ffffffff800b67c2>] on_each_cpu_cond_mask+0x1e/0x30 [ 3.458625] [<ffffffff8000e4ca>] __flush_tlb_range+0x118/0x1ca [ 3.458656] [<ffffffff8000e6b2>] flush_tlb_kernel_range+0x1e/0x26 [ 3.458683] [<ffffffff801ea56a>] kfence_protect+0xc0/0xce [ 3.458717] [<ffffffff801e9456>] kfence_guarded_free+0xc6/0x1c0 [ 3.458742] [<ffffffff801e9d6c>] __kfence_free+0x62/0xc6 [ 3.458764] [<ffffffff801c57d8>] kfree+0x106/0x32c [ 3.458786] [<ffffffff80588cf2>] detach_buf_split+0x188/0x1a8 [ 3.458816] [<ffffffff8058708c>] virtqueue_get_buf_ctx+0xb6/0x1f6 [ 3.458839] [<ffffffff805871da>] virtqueue_get_buf+0xe/0x16 [ 3.458880] [<ffffffff80613d6a>] virtblk_done+0x5c/0xe2 [ 3.458908] [<ffffffff8058766e>] vring_interrupt+0x6a/0x74 [ 3.458930] [<ffffffff800747d8>] __handle_irq_event_percpu+0x7c/0xe2 [ 3.458956] [<ffffffff800748f0>] handle_irq_event+0x3c/0x86 [ 3.458978] [<ffffffff800786cc>] handle_simple_irq+0x9e/0xbe [ 3.459004] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a [ 3.459027] [<ffffffff804bf87c>] imsic_handle_irq+0xba/0x120 [ 3.459056] [<ffffffff80073934>] generic_handle_domain_irq+0x1c/0x2a [ 3.459080] [<ffffffff804bdb76>] riscv_intc_aia_irq+0x24/0x34 [ 3.459103] [<ffffffff809d0452>] handle_riscv_irq+0x2e/0x4c [ 3.459133] [<ffffffff809d923e>] call_on_irq_stack+0x32/0x40 So only flush the local TLB and let the lazy kfence page fault handling deal with the faults which could happen when a core has an old protected pte version cached in its TLB. That leads to potential inaccuracies which can be tolerated when using kfence. Fixes: 47513f243b45 ("riscv: Enable KFENCE for riscv64") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241209074125.52322-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-19riscv: Fix wrong usage of __pa() on a fixmap addressAlexandre Ghiti
commit c796e187201242992d6d292bfeff41aadfdf3f29 upstream. riscv uses fixmap addresses to map the dtb so we can't use __pa() which is reserved for linear mapping addresses. Fixes: b2473a359763 ("of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20241209074508.53037-1-alexghiti@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-09RISC-V: KVM: Fix APLIC in_clrip and clripnum write emulationYong-Xuan Wang
[ Upstream commit 60821fb4dd7345e5662094accf0a52845306de8c ] In the section "4.7 Precise effects on interrupt-pending bits" of the RISC-V AIA specification defines that: "If the source mode is Level1 or Level0 and the interrupt domain is configured in MSI delivery mode (domaincfg.DM = 1): The pending bit is cleared whenever the rectified input value is low, when the interrupt is forwarded by MSI, or by a relevant write to an in_clrip register or to clripnum." Update the aplic_write_pending() to match the spec. Fixes: d8dd9f113e16 ("RISC-V: KVM: Fix APLIC setipnum_le/be write emulation") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20241029085542.30541-1-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-09of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verifyUsama Arif
[ Upstream commit b2473a359763e27567993e7d8f37de82f57a0829 ] __pa() is only intended to be used for linear map addresses and using it for initial_boot_params which is in fixmap for arm64 will give an incorrect value. Hence save the physical address when it is known at boot time when calling early_init_dt_scan for arm64 and use it at kexec time instead of converting the virtual address using __pa(). Note that arm64 doesn't need the FDT region reserved in the DT as the kernel explicitly reserves the passed in FDT. Therefore, only a debug warning is fixed with this change. Reported-by: Breno Leitao <leitao@debian.org> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Fixes: ac10be5cdbfa ("arm64: Use common of_kexec_alloc_and_setup_fdt()") Link: https://lore.kernel.org/r/20241023171426.452688-1-usamaarif642@gmail.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-17RISCV: KVM: use raw_spinlock for critical section in imsicCyan Yang
[ Upstream commit 3ec4350d4efb5ccb6bd0e11d9cf7f2be4f47297d ] For the external interrupt updating procedure in imsic, there was a spinlock to protect it already. But since it should not be preempted in any cases, we should turn to use raw_spinlock to prevent any preemption in case PREEMPT_RT was enabled. Signed-off-by: Cyan Yang <cyan.yang@sifive.com> Reviewed-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Message-ID: <20240919160126.44487-1-cyan.yang@sifive.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-14riscv/purgatory: align riscv_kernel_entryDaniel Maslowski
commit fb197c5d2fd24b9af3d4697d0cf778645846d6d5 upstream. When alignment handling is delegated to the kernel, everything must be word-aligned in purgatory, since the trap handler is then set to the kexec one. Without the alignment, hitting the exception would ultimately crash. On other occasions, the kernel's handler would take care of exceptions. This has been tested on a JH7110 SoC with oreboot and its SBI delegating unaligned access exceptions and the kernel configured to handle them. Fixes: 736e30af583fb ("RISC-V: Add purgatory") Signed-off-by: Daniel Maslowski <cyrevolt@gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-08riscv: Remove duplicated GET_RMChunyan Zhang
[ Upstream commit 164f66de6bb6ef454893f193c898dc8f1da6d18b ] The macro GET_RM defined twice in this file, one can be removed. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241008094141.549248-3-zhangchunyan@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08riscv: Remove unused GENERATING_ASM_OFFSETSChunyan Zhang
[ Upstream commit 46d4e5ac6f2f801f97bcd0ec82365969197dc9b1 ] The macro is not used in the current version of kernel, it looks like can be removed to avoid a build warning: ../arch/riscv/kernel/asm-offsets.c: At top level: ../arch/riscv/kernel/asm-offsets.c:7: warning: macro "GENERATING_ASM_OFFSETS" is not used [-Wunused-macros] 7 | #define GENERATING_ASM_OFFSETS Fixes: 9639a44394b9 ("RISC-V: Provide a cleaner raw_smp_processor_id()") Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Link: https://lore.kernel.org/r/20241008094141.549248-2-zhangchunyan@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08riscv: Use '%u' to format the output of 'cpu'WangYuli
[ Upstream commit e0872ab72630dada3ae055bfa410bf463ff1d1e0 ] 'cpu' is an unsigned integer, so its conversion specifier should be %u, not %d. Suggested-by: Wentao Guan <guanwentao@uniontech.com> Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://lore.kernel.org/all/alpine.DEB.2.21.2409122309090.40372@angie.orcam.me.uk/ Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Fixes: f1e58583b9c7 ("RISC-V: Support cpu hotplug") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/4C127DEECDA287C8+20241017032010.96772-1-wangyuli@uniontech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08riscv: efi: Set NX compat flag in PE/COFF headerHeinrich Schuchardt
[ Upstream commit d41373a4b910961df5a5e3527d7bde6ad45ca438 ] The IMAGE_DLLCHARACTERISTICS_NX_COMPAT informs the firmware that the EFI binary does not rely on pages that are both executable and writable. The flag is used by some distro versions of GRUB to decide if the EFI binary may be executed. As the Linux kernel neither has RWX sections nor needs RWX pages for relocation we should set the flag. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Fixes: cb7d2dd5612a ("RISC-V: Add PE/COFF header for EFI stub") Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240929140233.211800-1-heinrich.schuchardt@canonical.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08riscv: vdso: Prevent the compiler from inserting calls to memset()Alexandre Ghiti
[ Upstream commit bf40167d54d55d4b54d0103713d86a8638fb9290 ] The compiler is smart enough to insert a call to memset() in riscv_vdso_get_cpus(), which generates a dynamic relocation. So prevent this by using -fno-builtin option. Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Cc: stable@vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20241016083625.136311-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-11-08RISC-V: ACPI: fix early_ioremap to early_memremapYunhui Cui
commit 1966db682f064172891275cb951aa8c98a0a809b upstream. When SVPBMT is enabled, __acpi_map_table() will directly access the data in DDR through the IO attribute, rather than through hardware cache consistency, resulting in incorrect data in the obtained ACPI table. The log: ACPI: [ACPI:0x18] Invalid zero length. We do not assume whether the bootloader flushes or not. We should access in a cacheable way instead of maintaining cache consistency by software. Fixes: 3b426d4b5b14 ("RISC-V: ACPI : Fix for usage of pointers in different address space") Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20241014130141.86426-1-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-01riscv, bpf: Make BPF_CMPXCHG fully orderedAndrea Parri
[ Upstream commit e59db0623f6955986d1be0880b351a1f56e7fd6d ] According to the prototype formal BPF memory consistency model discussed e.g. in [1] and following the ordering properties of the C/in-kernel macro atomic_cmpxchg(), a BPF atomic operation with the BPF_CMPXCHG modifier is fully ordered. However, the current RISC-V JIT lowerings fail to meet such memory ordering property. This is illustrated by the following litmus test: BPF BPF__MP+success_cmpxchg+fence { 0:r1=x; 0:r3=y; 0:r5=1; 1:r2=y; 1:r4=f; 1:r7=x; } P0 | P1 ; *(u64 *)(r1 + 0) = 1 | r1 = *(u64 *)(r2 + 0) ; r2 = cmpxchg_64 (r3 + 0, r4, r5) | r3 = atomic_fetch_add((u64 *)(r4 + 0), r5) ; | r6 = *(u64 *)(r7 + 0) ; exists (1:r1=1 /\ 1:r6=0) whose "exists" clause is not satisfiable according to the BPF memory model. Using the current RISC-V JIT lowerings, the test can be mapped to the following RISC-V litmus test: RISCV RISCV__MP+success_cmpxchg+fence { 0:x1=x; 0:x3=y; 0:x5=1; 1:x2=y; 1:x4=f; 1:x7=x; } P0 | P1 ; sd x5, 0(x1) | ld x1, 0(x2) ; L00: | amoadd.d.aqrl x3, x5, 0(x4) ; lr.d x2, 0(x3) | ld x6, 0(x7) ; bne x2, x4, L01 | ; sc.d x6, x5, 0(x3) | ; bne x6, x4, L00 | ; fence rw, rw | ; L01: | ; exists (1:x1=1 /\ 1:x6=0) where the two stores in P0 can be reordered. Update the RISC-V JIT lowerings/implementation of BPF_CMPXCHG to emit an SC with RELEASE ("rl") annotation in order to meet the expected memory ordering guarantees. The resulting RISC-V JIT lowerings of BPF_CMPXCHG match the RISC-V lowerings of the C atomic_cmpxchg(). Other lowerings were fixed via 20a759df3bba ("riscv, bpf: make some atomic operations fully ordered"). Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lpc.events/event/18/contributions/1949/attachments/1665/3441/bpfmemmodel.2024.09.19p.pdf [1] Link: https://lore.kernel.org/bpf/20241017143628.2673894-1-parri.andrea@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17riscv/kexec_file: Fix relocation type R_RISCV_ADD16 and R_RISCV_SUB16 unknownYing Sun
[ Upstream commit c6ebf2c528470a09be77d0d9df2c6617ea037ac5 ] Runs on the kernel with CONFIG_RISCV_ALTERNATIVE enabled: kexec -sl vmlinux Error: kexec_image: Unknown rela relocation: 34 kexec_image: Error loading purgatory ret=-8 and kexec_image: Unknown rela relocation: 38 kexec_image: Error loading purgatory ret=-8 The purgatory code uses the 16-bit addition and subtraction relocation type, but not handled, resulting in kexec_file_load failure. So add handle to arch_kexec_apply_relocations_add(). Tested on RISC-V64 Qemu-virt, issue fixed. Co-developed-by: Petr Tesarik <petr@tesarici.cz> Signed-off-by: Petr Tesarik <petr@tesarici.cz> Signed-off-by: Ying Sun <sunying@isrc.iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240711083236.2859632-1-sunying@isrc.iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17riscv: avoid Imbalance in RASJisheng Zhang
[ Upstream commit 8f1534e7440382d118c3d655d3a6014128b2086d ] Inspired by[1], modify the code to remove the code of modifying ra to avoid imbalance RAS (return address stack) which may lead to incorret predictions on return. Link: https://lore.kernel.org/linux-riscv/20240607061335.2197383-1-cyrilbur@tenstorrent.com/ [1] Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Cyril Bur <cyrilbur@tenstorrent.com> Link: https://lore.kernel.org/r/20240720170659.1522-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_tPalmer Dabbelt
[ Upstream commit ad380f6a0a5e82e794b45bb2eaec24ed51a56846 ] I recently ended up with a warning on some compilers along the lines of CC kernel/resource.o In file included from include/linux/ioport.h:16, from kernel/resource.c:15: kernel/resource.c: In function 'gfr_start': include/linux/minmax.h:49:37: error: conversion from 'long long unsigned int' to 'resource_size_t' {aka 'unsigned int'} changes value from '17179869183' to '4294967295' [-Werror=overflow] 49 | ({ type ux = (x); type uy = (y); __cmp(op, ux, uy); }) | ^ include/linux/minmax.h:52:9: note: in expansion of macro '__cmp_once_unique' 52 | __cmp_once_unique(op, type, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_)) | ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:161:27: note: in expansion of macro '__cmp_once' 161 | #define min_t(type, x, y) __cmp_once(min, type, x, y) | ^~~~~~~~~~ kernel/resource.c:1829:23: note: in expansion of macro 'min_t' 1829 | end = min_t(resource_size_t, base->end, | ^~~~~ kernel/resource.c: In function 'gfr_continue': include/linux/minmax.h:49:37: error: conversion from 'long long unsigned int' to 'resource_size_t' {aka 'unsigned int'} changes value from '17179869183' to '4294967295' [-Werror=overflow] 49 | ({ type ux = (x); type uy = (y); __cmp(op, ux, uy); }) | ^ include/linux/minmax.h:52:9: note: in expansion of macro '__cmp_once_unique' 52 | __cmp_once_unique(op, type, x, y, __UNIQUE_ID(x_), __UNIQUE_ID(y_)) | ^~~~~~~~~~~~~~~~~ include/linux/minmax.h:161:27: note: in expansion of macro '__cmp_once' 161 | #define min_t(type, x, y) __cmp_once(min, type, x, y) | ^~~~~~~~~~ kernel/resource.c:1847:24: note: in expansion of macro 'min_t' 1847 | addr <= min_t(resource_size_t, base->end, | ^~~~~ cc1: all warnings being treated as errors which looks like a real problem: our phys_addr_t is only 32 bits now, so having 34-bit masks is just going to result in overflows. Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240731162159.9235-2-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17riscv: cpufeature: Fix thead vector hwcap removalCharlie Jenkins
[ Upstream commit e482eab4d1eb31031eff2b6afb71776483101979 ] The riscv_cpuinfo struct that contains mvendorid and marchid is not populated until all harts are booted which happens after the DT parsing. Use the mvendorid/marchid from the boot hart to determine if the DT contains an invalid V. Fixes: d82f32202e0d ("RISC-V: Ignore V from the riscv,isa DT property on older T-Head CPUs") Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/20240502-cpufeature_fixes-v4-1-b3d1a088722d@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-17riscv: Remove SHADOW_OVERFLOW_STACK_SIZE macroSong Shuai
[ Upstream commit a7565f4d068b2e60f95c3223c3167c40b8fe83ae ] The commit be97d0db5f44 ("riscv: VMAP_STACK overflow detection thread-safe") got rid of `shadow_stack`, so SHADOW_OVERFLOW_STACK_SIZE should be removed too. Fixes: be97d0db5f44 ("riscv: VMAP_STACK overflow detection thread-safe") Signed-off-by: Song Shuai <songshuaishuai@tinylab.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20231211110331.359534-1-songshuaishuai@tinylab.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10riscv: Fix kernel stack size when KASAN is enabledAlexandre Ghiti
commit cfb10de18538e383dbc4f3ce7f477ce49287ff3d upstream. We use Kconfig to select the kernel stack size, doubling the default size if KASAN is enabled. But that actually only works if KASAN is selected from the beginning, meaning that if KASAN config is added later (for example using menuconfig), CONFIG_THREAD_SIZE_ORDER won't be updated, keeping the default size, which is not enough for KASAN as reported in [1]. So fix this by moving the logic to compute the right kernel stack into a header. Fixes: a7555f6b62e7 ("riscv: stack: Add config of thread stack size") Reported-by: syzbot+ba9eac24453387a9d502@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000eb301906222aadc2@google.com/ [1] Cc: stable@vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240917150328.59831-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-10riscv: define ILLEGAL_POINTER_VALUE for 64bitJisheng Zhang
commit 5c178472af247c7b50f962495bb7462ba453b9fb upstream. This is used in poison.h for poison pointer offset. Based on current SV39, SV48 and SV57 vm layout, 0xdead000000000000 is a proper value that is not mappable, this can avoid potentially turning an oops to an expolit. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Fixes: fbe934d69eb7 ("RISC-V: Build Infrastructure") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240705170210.3236-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04riscv: Fix fp alignment bug in perf_callchain_user()Jinjie Ruan
[ Upstream commit 22ab08955ea13be04a8efd20cc30890e0afaa49c ] The standard RISC-V calling convention said: "The stack grows downward and the stack pointer is always kept 16-byte aligned". So perf_callchain_user() should check whether 16-byte aligned for fp. Link: https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf Fixes: dbeb90b0c1eb ("riscv: Add perf callchain support") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240708032847.2998158-2-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04RISC-V: KVM: Fix to allow hpmcounter31 from the guestAtish Patra
[ Upstream commit 5aa09297a3dcc798d038bd7436f8c90f664045a6 ] The csr_fun defines a count parameter which defines the total number CSRs emulated in KVM starting from the base. This value should be equal to total number of counters possible for trap/emulation (32). Fixes: a9ac6c37521f ("RISC-V: KVM: Implement trap & emulate for hpmcounters") Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240816-kvm_pmu_fixes-v1-2-cdfce386dd93@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04RISC-V: KVM: Allow legacy PMU access from guestAtish Patra
[ Upstream commit 7d1ffc8b087e97dbe1985912c7a2d00e53cea169 ] Currently, KVM traps & emulates PMU counter access only if SBI PMU is available as the guest can only configure/read PMU counters via SBI only. However, if SBI PMU is not enabled in the host, the guest will fallback to the legacy PMU which will try to access cycle/instret and result in an illegal instruction trap which is not desired. KVM can allow dummy emulation of cycle/instret only for the guest if SBI PMU is not enabled in the host. The dummy emulation will still return zero as we don't to expose the host counter values from a guest using legacy PMU. Fixes: a9ac6c37521f ("RISC-V: KVM: Implement trap & emulate for hpmcounters") Signed-off-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240816-kvm_pmu_fixes-v1-1-cdfce386dd93@rivosinc.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-04RISC-V: KVM: Fix sbiret init before forwarding to userspaceAndrew Jones
[ Upstream commit 6b7b282e6baea06ba65b55ae7d38326ceb79cebf ] When forwarding SBI calls to userspace ensure sbiret.error is initialized to SBI_ERR_NOT_SUPPORTED first, in case userspace neglects to set it to anything. If userspace neglects it then we can't be sure it did anything else either, so we just report it didn't do or try anything. Just init sbiret.value to zero, which is the preferred value to return when nothing special is specified. KVM was already initializing both sbiret.error and sbiret.value, but the values used appear to come from a copy+paste of the __sbi_ecall() implementation, i.e. a0 and a1, which don't apply prior to the call being executed, nor at all when forwarding to userspace. Fixes: dea8ee31a039 ("RISC-V: KVM: Add SBI v0.1 support") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240807154943.150540-2-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-18riscv: dts: starfive: add assigned-clock* to limit frquencyWilliam Qiu
commit af571133f7ae028ec9b5fdab78f483af13bf28d3 upstream. In JH7110 SoC, we need to go by-pass mode, so we need add the assigned-clock* properties to limit clock frquency. Signed-off-by: William Qiu <william.qiu@starfivetech.com> Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12membarrier: riscv: Add full memory barrier in switch_mm()Andrea Parri
commit d6cfd1770f20392d7009ae1fdb04733794514fa9 upstream. The membarrier system call requires a full memory barrier after storing to rq->curr, before going back to user-space. The barrier is only needed when switching between processes: the barrier is implied by mmdrop() when switching from kernel to userspace, and it's not needed when switching from userspace to kernel. Rely on the feature/mechanism ARCH_HAS_MEMBARRIER_CALLBACKS and on the primitive membarrier_arch_switch_mm(), already adopted by the PowerPC architecture, to insert the required barrier. Fixes: fab957c11efe2f ("RISC-V: Atomic and Locking Code") Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/r/20240131144936.29190-2-parri.andrea@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12riscv: Do not restrict memory size because of linear mapping on nommuAlexandre Ghiti
[ Upstream commit 5f771088a2b5edd6f2c5c9f34484ca18dc389f3e ] It makes no sense to restrict physical memory size because of linear mapping size constraints when there is no linear mapping, so only do that when mmu is enabled. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/linux-riscv/CAMuHMdW0bnJt5GMRtOZGkTiM7GK4UaLJCDMF_Ouq++fnDKi3_A@mail.gmail.com/ Fixes: 3b6564427aea ("riscv: Fix linear mapping checks for non-contiguous memory regions") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240827065230.145021-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12riscv: Fix toolchain vector detectionAnton Blanchard
[ Upstream commit 5ba7a75a53dffbf727e842b5847859bb482ac4aa ] A recent change to gcc flags rv64iv as no longer valid: cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension and as a result vector support is disabled. Fix this by adding m to our toolchain vector detection code. Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Fixes: fa8e7cce55da ("riscv: Enable Vector code to be built") Link: https://lore.kernel.org/r/20240819001131.1738806-1-antonb@tenstorrent.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12riscv: Use accessors to page table entries instead of direct dereferenceAlexandre Ghiti
commit edf955647269422e387732870d04fc15933a25ea upstream. As very well explained in commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose page table walker can modify the PTE in parallel must use READ_ONCE()/WRITE_ONCE() macro to avoid any compiler transformation. So apply that to riscv which is such architecture. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Acked-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20231213203001.179237-5-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12riscv: mm: Only compile pgtable.c if MMUAlexandre Ghiti
commit d6508999d1882ddd0db8b3b4bd7967d83e9909fa upstream. All functions defined in there depend on MMU, so no need to compile it for !MMU configs. Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213203001.179237-4-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12riscv: Use WRITE_ONCE() when setting page table entriesAlexandre Ghiti
commit c30fa83b49897e708a52e122dd10616a52a4c82b upstream. To avoid any compiler "weirdness" when accessing page table entries which are concurrently modified by the HW, let's use WRITE_ONCE() macro (commit 20a004e7b017 ("arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables") gives a great explanation with more details). Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20231213203001.179237-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-12riscv: set trap vector earlieryang.zhang
[ Upstream commit 6ad8735994b854b23c824dd6b1dd2126e893a3b4 ] The exception vector of the booting hart is not set before enabling the mmu and then still points to the value of the previous firmware, typically _start. That makes it hard to debug setup_vm() when bad things happen. So fix that by setting the exception vector earlier. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: yang.zhang <yang.zhang@hexintek.com> Link: https://lore.kernel.org/r/20240508022445.6131-1-gaoshanliukou@163.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12riscv: kprobes: Use patch_text_nosync() for insn slotsSamuel Holland
[ Upstream commit b1756750a397f36ddc857989d31887c3f5081fb0 ] These instructions are not yet visible to the rest of the system, so there is no need to do the whole stop_machine() dance. Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20240327160520.791322-4-samuel.holland@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29riscv: entry: always initialize regs->a0 to -ENOSYSCeleste Liu
[ Upstream commit 61119394631f219e23ce98bcc3eb993a64a8ea64 ] Otherwise when the tracer changes syscall number to -1, the kernel fails to initialize a0 with -ENOSYS and subsequently fails to return the error code of the failed syscall to userspace. For example, it will break strace syscall tampering. Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1") Reported-by: "Dmitry V. Levin" <ldv@strace.io> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Cc: stable@vger.kernel.org Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com> Link: https://lore.kernel.org/r/20240627142338.5114-2-CoelacanthusHex@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29riscv: blacklist assembly symbols for kprobeClément Léger
[ Upstream commit 5014396af9bbac0f28d9afee7eae405206d01ee7 ] Adding kprobes on some assembly functions (mainly exception handling) will result in crashes (either recursive trap or panic). To avoid such errors, add ASM_NOKPROBE() macro which allow adding specific symbols into the __kprobe_blacklist section and use to blacklist the following symbols that showed to be problematic: - handle_exception() - ret_from_exception() - handle_kernel_stack_overflow() Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20231004131009.409193-1-cleger@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-29riscv: change XIP's kernel_map.size to be size of the entire kernelNam Cao
commit 57d76bc51fd80824bcc0c84a5b5ec944f1b51edd upstream. With XIP kernel, kernel_map.size is set to be only the size of data part of the kernel. This is inconsistent with "normal" kernel, who sets it to be the size of the entire kernel. More importantly, XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is enabled, because there are checks on virtual addresses with the assumption that kernel_map.size is the size of the entire kernel (these checks are in arch/riscv/mm/physaddr.c). Change XIP's kernel_map.size to be the size of the entire kernel. Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> # v6.1+ Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240508191917.2892064-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-11riscv: Fix linear mapping checks for non-contiguous memory regionsStuart Menefy
[ Upstream commit 3b6564427aea83b7a35a15ca278291d50a1edcfc ] The RISC-V kernel already has checks to ensure that memory which would lie outside of the linear mapping is not used. However those checks use memory_limit, which is used to implement the mem= kernel command line option (to limit the total amount of memory, not its address range). When memory is made up of two or more non-contiguous memory banks this check is incorrect. Two changes are made here: - add a call in setup_bootmem() to memblock_cap_memory_range() which will cause any memory which falls outside the linear mapping to be removed from the memory regions. - remove the check in create_linear_mapping_page_table() which was intended to remove memory which is outside the liner mapping based on memory_limit, as it is no longer needed. Note a check for mapping more memory than memory_limit (to implement mem=) is unnecessary because of the existing call to memblock_enforce_memory_limit(). This issue was seen when booting on a SV39 platform with two memory banks: 0x00,80000000 1GiB 0x20,00000000 32GiB This memory range is 158GiB from top to bottom, but the linear mapping is limited to 128GiB, so the lower block of RAM will be mapped at PAGE_OFFSET, and the upper block straddles the top of the linear mapping. This causes the following Oops: [ 0.000000] Linux version 6.10.0-rc2-gd3b8dd5b51dd-dirty (stuart.menefy@codasip.com) (riscv64-codasip-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41.0.20231213) #20 SMP Sat Jun 22 11:34:22 BST 2024 [ 0.000000] memblock_add: [0x0000000080000000-0x00000000bfffffff] early_init_dt_add_memory_arch+0x4a/0x52 [ 0.000000] memblock_add: [0x0000002000000000-0x00000027ffffffff] early_init_dt_add_memory_arch+0x4a/0x52 ... [ 0.000000] memblock_alloc_try_nid: 23724 bytes align=0x8 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 early_init_dt_alloc_memory_arch+0x1e/0x48 [ 0.000000] memblock_reserve: [0x00000027ffff5350-0x00000027ffffaffb] memblock_alloc_range_nid+0xb8/0x132 [ 0.000000] Unable to handle kernel paging request at virtual address fffffffe7fff5350 [ 0.000000] Oops [#1] [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc2-gd3b8dd5b51dd-dirty #20 [ 0.000000] Hardware name: codasip,a70x (DT) [ 0.000000] epc : __memset+0x8c/0x104 [ 0.000000] ra : memblock_alloc_try_nid+0x74/0x84 [ 0.000000] epc : ffffffff805e88c8 ra : ffffffff806148f6 sp : ffffffff80e03d50 [ 0.000000] gp : ffffffff80ec4158 tp : ffffffff80e0bec0 t0 : fffffffe7fff52f8 [ 0.000000] t1 : 00000027ffffb000 t2 : 5f6b636f6c626d65 s0 : ffffffff80e03d90 [ 0.000000] s1 : 0000000000005cac a0 : fffffffe7fff5350 a1 : 0000000000000000 [ 0.000000] a2 : 0000000000005cac a3 : fffffffe7fffaff8 a4 : 000000000000002c [ 0.000000] a5 : ffffffff805e88c8 a6 : 0000000000005cac a7 : 0000000000000030 [ 0.000000] s2 : fffffffe7fff5350 s3 : ffffffffffffffff s4 : 0000000000000000 [ 0.000000] s5 : ffffffff8062347e s6 : 0000000000000000 s7 : 0000000000000001 [ 0.000000] s8 : 0000000000002000 s9 : 00000000800226d0 s10: 0000000000000000 [ 0.000000] s11: 0000000000000000 t3 : ffffffff8080a928 t4 : ffffffff8080a928 [ 0.000000] t5 : ffffffff8080a928 t6 : ffffffff8080a940 [ 0.000000] status: 0000000200000100 badaddr: fffffffe7fff5350 cause: 000000000000000f [ 0.000000] [<ffffffff805e88c8>] __memset+0x8c/0x104 [ 0.000000] [<ffffffff8062349c>] early_init_dt_alloc_memory_arch+0x1e/0x48 [ 0.000000] [<ffffffff8043e892>] __unflatten_device_tree+0x52/0x114 [ 0.000000] [<ffffffff8062441e>] unflatten_device_tree+0x9e/0xb8 [ 0.000000] [<ffffffff806046fe>] setup_arch+0xd4/0x5bc [ 0.000000] [<ffffffff806007aa>] start_kernel+0x76/0x81a [ 0.000000] Code: b823 02b2 bc23 02b2 b023 04b2 b423 04b2 b823 04b2 (bc23) 04b2 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- The problem is that memblock (unaware that some physical memory cannot be used) has allocated memory from the top of memory but which is outside the linear mapping region. Signed-off-by: Stuart Menefy <stuart.menefy@codasip.com> Fixes: c99127c45248 ("riscv: Make sure the linear mapping does not use the kernel mapping") Reviewed-by: David McKay <david.mckay@codasip.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240622114217.2158495-1-stuart.menefy@codasip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-11riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()Zhe Qiao
[ Upstream commit 0c710050c47d45eb77b28c271cddefc5c785cb40 ] Handle VM_FAULT_SIGSEGV in the page fault path so that we correctly kill the process and we don't BUG() the kernel. Fixes: 07037db5d479 ("RISC-V: Paging and MMU") Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240731084547.85380-1-qiaozhe@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>