Age | Commit message (Collapse) | Author |
|
commit 52c22661c79a7b6af7fad9f77200738fc6c51878 upstream.
When building kernel with LLVM there are occasionally such errors:
In file included from ./include/linux/spinlock.h:59:
In file included from ./include/linux/irqflags.h:17:
arch/loongarch/include/asm/irqflags.h:38:3: error: must not be $r0 or $r1
38 | "csrxchg %[val], %[mask], %[reg]\n\t"
| ^
<inline asm>:1:16: note: instantiated into assembly here
1 | csrxchg $a1, $ra, 0
| ^
To prevent the compiler from allocating $r0 or $r1 for the "mask" of the
csrxchg instruction, the 'q' constraint must be used but Clang < 21 does
not support it. So force to use $t0 in the inline asm, in order to avoid
using $r0/$r1 while keeping the backward compatibility.
Cc: stable@vger.kernel.org
Link: https://github.com/llvm/llvm-project/pull/141037
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Suggested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit e242bbbb6d7ac7556aa1e358294dc7e3c82cc902 upstream.
The syscall wrappers use the "a0" register for two different register
variables, both the first argument and the return value. Here the "ret"
variable is used as both input and output while the argument register is
only used as input. Clang treats the conflicting input parameters as an
undefined behaviour and optimizes away the argument assignment.
The code seems to work by chance for the most part today but that may
change in the future. Specifically clock_gettime_fallback() fails with
clockids from 16 to 23, as implemented by the upcoming auxiliary clocks.
Switch the "ret" register variable to a pure output, similar to the
other architectures' vDSO code. This works in both clang and GCC.
Link: https://lore.kernel.org/lkml/20250602102825-42aa84f0-23f1-4d10-89fc-e8bbaffd291a@linutronix.de/
Link: https://lore.kernel.org/lkml/20250519082042.742926976@linutronix.de/
Fixes: c6b99bed6b8f ("LoongArch: Add VDSO and VSYSCALL support")
Fixes: 18efd0b10e0f ("LoongArch: vDSO: Wire up getrandom() vDSO implementation")
Cc: stable@vger.kernel.org
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 12614f794274f63fbdfe76771b2b332077d63848 upstream.
arch_uprobe_skip_sstep() returns true if instruction was emulated, that
is to say, there is no need to single step for the emulated instructions.
regs->csr_era will point to the destination address directly after the
exception, so the resume_era related code is redundant, just remove them.
Cc: stable@vger.kernel.org
Fixes: 19bc6cb64092 ("LoongArch: Add uprobes support")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90436d234230e9a950ccd87831108b688b27a234 upstream.
Fix MAX_REG_OFFSET calculation, make it point to the last register
in 'struct pt_regs' and not to the marker itself, which could allow
regs_get_register() to return an invalid offset.
Cc: stable@vger.kernel.org
Fixes: 803b0fc5c3f2baa6e5 ("LoongArch: Add process management")
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 2ef174b13344b3b4554d3d28e6f9e2a2c1d3138f upstream.
Like the other relevant symbols, export some fp, lsx, lasx and lbt
assembly symbols and put the function declarations in header files
rather than source files.
While at it, use "asmlinkage" for the other existing C prototypes
of assembly functions and also do not use the "extern" keyword with
function declarations according to the document coding-style.rst.
Cc: stable@vger.kernel.org # 6.6+
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit bb0511d59db9b3e40c8d51f0d151ccd0fd44071d ]
In the current code, the definition of regs_irqs_disabled() is actually
"!(regs->csr_prmd & CSR_CRMD_IE)" because arch_irqs_disabled_flags() is
defined as "!(flags & CSR_CRMD_IE)", it looks a little strange.
Define regs_irqs_disabled() as !(regs->csr_prmd & CSR_PRMD_PIE) directly
to make it more clear, no functional change.
While at it, the return value of regs_irqs_disabled() is true or false,
so change its type to reflect that and also make it always inline.
Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit ec105cadff5d8c0a029a3dc1084cae46cf3f799d upstream.
Begin with Loongson-3C6000, the number of PCI host can be as many as
8 for multi-chip machines, and this number should be the same for I/O
interrupt controllers. To support these machines we also increase the
MAX_IO_PICS up to 8.
Cc: stable@vger.kernel.org
Tested-by: Mingcong Bai <baimingcong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 4103cfe9dcb88010ae4911d3ff417457d1b6a720 upstream.
ARCH_DMA_MINALIGN is 1 by default, but some LoongArch-specific devices
(such as APBDMA) require 16 bytes alignment. When the data buffer length
is too small, the hardware may make an error writing cacheline. Thus, it
is dangerous to allocate a small memory buffer for DMA. It's always safe
to define ARCH_DMA_MINALIGN as L1_CACHE_BYTES but unnecessary (kmalloc()
need small memory objects). Therefore, just increase it to 16.
Cc: stable@vger.kernel.org
Tested-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 7c977393b8277ed319e92e4b598b26598c9d30c0 ]
If 'regs' points to a local stack variable, prepare_frametrace() stores
all registers to the stack. This confuses objtool as it expects them to
be restored from the stack later.
The stores don't affect stack tracing, so use unwind hints to hide them
from objtool.
Fixes the following warnings:
arch/loongarch/kernel/traps.o: warning: objtool: show_stack+0xe0: stack state mismatch: reg1[22]=-1+0 reg2[22]=-2-160
arch/loongarch/kernel/traps.o: warning: objtool: show_stack+0xe0: stack state mismatch: reg1[23]=-1+0 reg2[23]=-2-152
Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support")
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/270cadd8040dda74db2307f23497bb68e65db98d.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202503280703.OARM8SrY-lkp@intel.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 02410ac72ac3707936c07ede66e94360d0d65319 upstream.
In order to fix a bug, arm64 needs to be told the size of the huge page
for which the huge_pte is being cleared in huge_ptep_get_and_clear().
Provide for this by adding an `unsigned long sz` parameter to the
function. This follows the same pattern as huge_pte_clear() and
set_huge_pte_at().
This commit makes the required interface modifications to the core mm as
well as all arches that implement this function (arm64, loongarch, mips,
parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed
in a separate commit.
Cc: stable@vger.kernel.org
Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> # riscv
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Link: https://lore.kernel.org/r/20250226120656.2400136-2-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 624bde3465f660e54a7cd4c1efc3e536349fead5 upstream.
annotate_reachable() is unreliable since the compiler is free to place
random code inbetween two consecutive asm() statements.
This removes the last and only annotate_reachable() user.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20241128094312.133437051@infradead.org
Closes: https://lore.kernel.org/loongarch/20250307214943.372210-1-ojeda@kernel.org/
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 531936dee53e471a3ec668de3c94ca357f54b7e8 upstream.
The maximum number of load/store watchpoints and fetch instruction
watchpoints is 14 each according to LoongArch Reference Manual, so
extend the maximum number of watchpoints from 8 to 14 for ptrace.
By the way, just simply change 8 to 14 for the definition in struct
user_watch_state at the beginning, but it may corrupt uapi, then add
a new struct user_watch_state_v2 directly.
As far as I can tell, the only users for this struct in the userspace
are GDB and LLDB, there are no any problems of software compatibility
between the application and kernel according to the analysis.
The compatibility problem has been considered while developing and
testing. When the applications in the userspace get watchpoint state,
the length will be specified which is no bigger than the sizeof struct
user_watch_state or user_watch_state_v2, the actual length is assigned
as the minimal value of the application and kernel in the generic code
of ptrace:
kernel/ptrace.c: ptrace_regset():
kiov->iov_len = min(kiov->iov_len,
(__kernel_size_t) (regset->n * regset->size));
if (req == PTRACE_GETREGSET)
return copy_regset_to_user(task, view, regset_no, 0,
kiov->iov_len, kiov->iov_base);
else
return copy_regset_from_user(task, view, regset_no, 0,
kiov->iov_len, kiov->iov_base);
For example, there are four kind of combinations, all of them work well.
(1) "older kernel + older gdb", the actual length is 8+(8+8+4+4)*8=200;
(2) "newer kernel + newer gdb", the actual length is 8+(8+8+4+4)*14=344;
(3) "older kernel + newer gdb", the actual length is 8+(8+8+4+4)*8=200;
(4) "newer kernel + older gdb", the actual length is 8+(8+8+4+4)*8=200.
Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
Cc: stable@vger.kernel.org
Fixes: 1a69f7a161a7 ("LoongArch: ptrace: Expose hardware breakpoints to debuggers")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit f502ea618bf16d615d7dc6138c8988d3118fe750 upstream.
The maximum number of load/store watchpoints and fetch instruction
watchpoints is 14 each according to LoongArch Reference Manual, so
change 8 to 14 for the related code.
Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#control-and-status-registers-related-to-watchpoints
Cc: stable@vger.kernel.org
Fixes: edffa33c7bb5 ("LoongArch: Add hardware breakpoints/watchpoints support")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit c1474bb0b7cff4e8481095bd0618b8f6c2f0aeb4 ]
The branch instructions beq, bne, blt, bge, bltu, bgeu and jirl belong
to the format reg2i16, but the sequence of oprand is different for the
instruction jirl. So adjust the parameter order of emit_jirl() to make
it more readable correspond with the Instruction Set Architecture manual.
Here are the instruction formats:
beq rj, rd, offs16
bne rj, rd, offs16
blt rj, rd, offs16
bge rj, rd, offs16
bltu rj, rd, offs16
bgeu rj, rd, offs16
jirl rd, rj, offs16
Link: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#branch-instructions
Suggested-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
commit 7cd1f5f77925ae905a57296932f0f9ef0dc364f8 upstream.
When executing mm selftests run_vmtests.sh, there is such an error:
BUG: Bad page state in process uffd-unit-tests pfn:00000
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x0
flags: 0xffff0000002000(reserved|node=0|zone=0|lastcpupid=0xffff)
raw: 00ffff0000002000 ffffbf0000000008 ffffbf0000000008 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
Modules linked in: snd_seq_dummy snd_seq snd_seq_device rfkill vfat fat
virtio_balloon efi_pstore virtio_net pstore net_failover failover fuse
nfnetlink virtio_scsi virtio_gpu virtio_dma_buf dm_multipath efivarfs
CPU: 2 UID: 0 PID: 1913 Comm: uffd-unit-tests Not tainted 6.12.0 #184
Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
Stack : 900000047c8ac000 0000000000000000 9000000000223a7c 900000047c8ac000
900000047c8af690 900000047c8af698 0000000000000000 900000047c8af7d8
900000047c8af7d0 900000047c8af7d0 900000047c8af5b0 0000000000000001
0000000000000001 900000047c8af698 10b3c7d53da40d26 0000010000000000
0000000000000022 0000000fffffffff fffffffffe000000 ffff800000000000
000000000000002f 0000800000000000 000000017a6d4000 90000000028f8940
0000000000000000 0000000000000000 90000000025aa5e0 9000000002905000
0000000000000000 90000000028f8940 ffff800000000000 0000000000000000
0000000000000000 0000000000000000 9000000000223a94 000000012001839c
00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
...
Call Trace:
[<9000000000223a94>] show_stack+0x5c/0x180
[<9000000001c3fd64>] dump_stack_lvl+0x6c/0xa0
[<900000000056aa08>] bad_page+0x1a0/0x1f0
[<9000000000574978>] free_unref_folios+0xbf0/0xd20
[<90000000004e65cc>] folios_put_refs+0x1a4/0x2b8
[<9000000000599a0c>] free_pages_and_swap_cache+0x164/0x260
[<9000000000547698>] tlb_batch_pages_flush+0xa8/0x1c0
[<9000000000547f30>] tlb_finish_mmu+0xa8/0x218
[<9000000000543cb8>] exit_mmap+0x1a0/0x360
[<9000000000247658>] __mmput+0x78/0x200
[<900000000025583c>] do_exit+0x43c/0xde8
[<9000000000256490>] do_group_exit+0x68/0x110
[<9000000000256554>] sys_exit_group+0x1c/0x20
[<9000000001c413b4>] do_syscall+0x94/0x130
[<90000000002216d8>] handle_syscall+0xb8/0x158
Disabling lock debugging due to kernel taint
BUG: non-zero pgtables_bytes on freeing mm: -16384
On LoongArch system, invalid huge pte entry should be invalid_pte_table
or a single _PAGE_HUGE bit rather than a zero value. And it should be
the same with invalid pmd entry, since pmd_none() is called by function
free_pgd_range() and pmd_none() return 0 by huge_pte_clear(). So single
_PAGE_HUGE bit is also treated as a valid pte table and free_pte_range()
will be called in free_pmd_range().
free_pmd_range()
pmd = pmd_offset(pud, addr);
do {
next = pmd_addr_end(addr, end);
if (pmd_none_or_clear_bad(pmd))
continue;
free_pte_range(tlb, pmd, addr);
} while (pmd++, addr = next, addr != end);
Here invalid_pte_table is used for both invalid huge pte entry and
pmd entry.
Cc: stable@vger.kernel.org
Fixes: 09cfefb7fa70 ("LoongArch: Add memory management")
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, the kernel couldn't boot when ARCH_IOREMAP, ARCH_WRITECOMBINE
and KASAN are enabled together. Because DMW2 is used by kernel now which
is configured as 0xa000000000000000 for WriteCombine, but KASAN has no
segment mapping for it. This patch fix this issue.
Solution: Add the relevant definitions for WriteCombine (DMW2) in KASAN.
Cc: stable@vger.kernel.org
Fixes: 8e02c3b782ec ("LoongArch: Add writecombine support for DMW-based ioremap()")
Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
If PGDIR_SIZE is too large for cpu_vabits, KASAN_SHADOW_END will
overflow UINTPTR_MAX because KASAN_SHADOW_START/KASAN_SHADOW_END are
aligned up by PGDIR_SIZE. And then the overflowed KASAN_SHADOW_END looks
like a user space address.
For example, PGDIR_SIZE of CONFIG_4KB_4LEVEL is 2^39, which is too large
for Loongson-2K series whose cpu_vabits = 39.
Since CONFIG_4KB_4LEVEL is completely legal for CPUs with cpu_vabits <=
39, we just disable KASAN via early return in kasan_init(). Otherwise we
get a boot failure.
Moreover, we change KASAN_SHADOW_END from the first address after KASAN
shadow area to the last address in KASAN shadow area, in order to avoid
the end address exactly overflow to 0 (which is a legal case). We don't
need to worry about alignment because pgd_addr_end() can handle it.
Cc: stable@vger.kernel.org
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
This is a trivial cleanup, commit c62da0c35d58518d ("mm/vma: define a
default value for VM_DATA_DEFAULT_FLAGS") has unified default values of
VM_DATA_DEFAULT_FLAGS across different platforms.
Apply the same consistency to LoongArch.
Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Currently, KASAN on LoongArch assume the CPU VA bits is 48, which is
true for Loongson-3 series, but not for Loongson-2 series (only 40 or
lower), this patch fix that issue and make KASAN usable for variable
cpu_vabits.
Solution is very simple: Just define XRANGE_SHADOW_SHIFT which means
valid address length from VA_BITS to min(cpu_vabits, VA_BITS).
Cc: stable@vger.kernel.org
Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
There are two pages in one TLB entry on LoongArch system. For kernel
space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit
set, otherwise HW treats it as non-global tlb, there will be potential
problems if tlb entry for kernel space is not global. Such as fail to
flush kernel tlb with the function local_flush_tlb_kernel_range() which
supposed only flush tlb with global bit.
Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and
kasan areas. For these areas both two consecutive page table entries
should be enabled with PAGE_GLOBAL bit. So with function set_pte() and
pte_clear(), pte buddy entry is checked and set besides its own pte
entry. However it is not atomic operation to set both two pte entries,
there is problem with test_vmalloc test case.
So function kernel_pte_init() is added to init a pte table when it is
created for kernel address space, and the default initial pte value is
PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry
need update with function set_pte() and pte_clear(), nothing to do with
the pte buddy entry.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
In loongson_sysconf, The "core" of cores_per_node and cores_per_package
stands for a logical core, which means in a SMT system it stands for a
thread indeed. This information is gotten from SMBIOS Type4 Structure,
so in order to get a correct cores_per_package for both SMT and non-SMT
systems in parse_cpu_table() we should use SMBIOS_THREAD_PACKAGE_OFFSET
instead of SMBIOS_CORE_PACKAGE_OFFSET.
Cc: stable@vger.kernel.org
Reported-by: Chao Li <lichao@loongson.cn>
Tested-by: Chao Li <lichao@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The information contained in the comment for LOONGARCH_CSR_ERA is even
less informative than the macro itself, which can cause confusion for
junior developers. Let's use the full English term.
Signed-off-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Fix objtool about do_syscall() and Clang
- Enable generic CPU vulnerabilites support
- Enable ACPI BGRT handling
- Rework CPU feature probe from CPUCFG/IOCSR
- Add ARCH_HAS_SET_MEMORY support
- Add ARCH_HAS_SET_DIRECT_MAP support
- Improve hardware page table walker
- Simplify _percpu_read() and _percpu_write()
- Add advanced extended IRQ model documentions
- Some bug fixes and other small changes
* tag 'loongarch-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
Docs/LoongArch: Add advanced extended IRQ model description
LoongArch: Remove posix_types.h include from sigcontext.h
LoongArch: Fix memleak in pci_acpi_scan_root()
LoongArch: Simplify _percpu_read() and _percpu_write()
LoongArch: Improve hardware page table walker
LoongArch: Add ARCH_HAS_SET_DIRECT_MAP support
LoongArch: Add ARCH_HAS_SET_MEMORY support
LoongArch: Rework CPU feature probe from CPUCFG/IOCSR
LoongArch: Enable ACPI BGRT handling
LoongArch: Enable generic CPU vulnerabilites support
LoongArch: Remove STACK_FRAME_NON_STANDARD(do_syscall)
LoongArch: Set AS_HAS_THIN_ADD_SUB as y if AS_IS_LLVM
LoongArch: Enable objtool for Clang
objtool: Handle frame pointer related instructions
|
|
Nothing in sigcontext.h seems to require anything from
linux/posix_types.h. This include seems a MIPS relic originated from
an error in Linux 2.6.11-rc2 (in 2005).
The unneeded include was found debugging some vDSO self test build
failure (it's not the root cause though).
Link: https://lore.kernel.org/linux-mips/20240828030413.143930-2-xry111@xry111.site/
Link: https://lore.kernel.org/loongarch/0b540679ec8cfccec75aeb3463810924f6ff71e6.camel@xry111.site/
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Now _percpu_read() and _percpu_write() macros call __percpu_read()
and __percpu_write() static inline functions that result in a single
assembly instruction. However, percpu infrastructure expects its leaf
definitions to encode the size of their percpu variable, so the patch
merges all the asm clauses from the static inline function into the
corresponding leaf macros.
The secondary effect of this change is to avoid explicit __percpu
annotations for function arguments. Currently, __percpu macro is defined
in include/linux/compiler_types.h, but with proposed patch [1], __percpu
definition will need macros from include/asm-generic/percpu.h, creating
forward dependency loop.
The proposed solution is the same as x86 architecture uses.
[1] https://lore.kernel.org/lkml/20240812115945.484051-4-ubizjak@gmail.com/
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
LoongArch has similar problems explained in commit 7f0b1bf04511348995d6
("arm64: Fix barriers used for page table modifications"), when hardware
page table walker (PTW) enabled, speculative accesses may cause spurious
page fault in kernel space. Theoretically, in order to completely avoid
spurious page fault we need a "dbar + ibar" pair between the page table
modifications and the subsequent memory accesses using the corresponding
virtual address. But "ibar" is too heavy for performace, so we only use
a "dbar 0b11000" in set_pte(). And let spurious_fault() filter the rest
rare spurious page faults which should be avoided by "ibar".
Besides, we replace the llsc loop with amo in set_pte() which has better
performace, and refactor mmu_context.h to 1) avoid any load/store/branch
instructions between the writing of CSR.ASID & CSR.PGDL, 2) ensure flush
tlb operation is after updating ASID.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add set_direct_map_*() functions for setting the direct map alias for
the page to its default permissions and to an invalid state that cannot
be cached in a TLB. (See d253ca0c3 ("x86/mm/cpa: Add set_direct_map_*()
functions")) Add a similar implementation for LoongArch.
This fixes the KFENCE warnings during hibernation:
==================================================================
BUG: KFENCE: invalid read in swsusp_save+0x368/0x4d8
Invalid read at 0x00000000f7b89a3c:
swsusp_save+0x368/0x4d8
hibernation_snapshot+0x3f0/0x4e0
hibernate+0x20c/0x440
state_store+0x128/0x140
kernfs_fop_write_iter+0x160/0x260
vfs_write+0x2c0/0x520
ksys_write+0x74/0x160
do_syscall+0xb0/0x160
CPU: 0 UID: 0 PID: 812 Comm: bash Tainted: G B 6.11.0-rc1+ #1566
Tainted: [B]=BAD_PAGE
Hardware name: Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0 10/21/2022
==================================================================
Note: We can only set permissions for KVRANGE/XKVRANGE kernel addresses.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add set_memory_ro/rw/x/nx architecture hooks to change the page
attribution.
Use own set_memory.h rather than generic set_memory.h (i.e.
include/asm-generic/set_memory.h), because we want to add other function
prototypes here.
Note: We can only set attributes for KVRANGE/XKVRANGE kernel addresses.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Probe ISA level, TLB, IOCSR information from CPUCFG to improve kernel
resilience to different core implementations.
BTW, IOCSR register definition appears to be a platform-specific spec
instead of an architecture spec, even for the Loongson CPUs there is no
guarantee that IOCSR will always present.
Thus it's dangerous to perform IOCSR probing without checking CPU type
and instruction availability.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
mode code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
partivularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalizaion and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature,
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have notied
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
simgle-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operqtion is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making ot cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memry.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"Originally I'd planned on sending each of the vDSO getrandom()
architecture ports to their respective arch trees. But as we started
to work on this, we found lots of interesting issues in the shared
code and infrastructure, the fixes for which the various archs needed
to base their work.
So in the end, this turned into a nice collaborative effort fixing up
issues and porting to 5 new architectures -- arm64, powerpc64,
powerpc32, s390x, and loongarch64 -- with everybody pitching in and
commenting on each other's code. It was a fun development cycle.
This contains:
- Numerous fixups to the vDSO selftest infrastructure, getting it
running successfully on more platforms, and fixing bugs in it.
- Additions to the vDSO getrandom & chacha selftests. Basically every
time manual review unearthed a bug in a revision of an arch patch,
or an ambiguity, the tests were augmented.
By the time the last arch was submitted for review, s390x, v1 of
the series was essentially fine right out of the gate.
- Fixes to the the generic C implementation of vDSO getrandom, to
build and run successfully on all archs, decoupling it from
assumptions we had (unintentionally) made on x86_64 that didn't
carry through to the other architectures.
- Port of vDSO getrandom to LoongArch64, from Xi Ruoyao and acked by
Huacai Chen.
- Port of vDSO getrandom to ARM64, from Adhemerval Zanella and acked
by Will Deacon.
- Port of vDSO getrandom to PowerPC, in both 32-bit and 64-bit
varieties, from Christophe Leroy and acked by Michael Ellerman.
- Port of vDSO getrandom to S390X from Heiko Carstens, the arch
maintainer.
While it'd be natural for there to be things to fix up over the course
of the development cycle, these patches got a decent amount of review
from a fairly diverse crew of folks on the mailing lists, and, for the
most part, they've been cooking in linux-next, which has been helpful
for ironing out build issues.
In terms of architectures, I think that mostly takes care of the
important 64-bit archs with hardware still being produced and running
production loads in settings where vDSO getrandom is likely to help.
Arguably there's still RISC-V left, and we'll see for 6.13 whether
they find it useful and submit a port"
* tag 'random-6.12-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (47 commits)
selftests: vDSO: check cpu caps before running chacha test
s390/vdso: Wire up getrandom() vdso implementation
s390/vdso: Move vdso symbol handling to separate header file
s390/vdso: Allow alternatives in vdso code
s390/module: Provide find_section() helper
s390/facility: Let test_facility() generate static branch if possible
s390/alternatives: Remove ALT_FACILITY_EARLY
s390/facility: Disable compile time optimization for decompressor code
selftests: vDSO: fix vdso_config for s390
selftests: vDSO: fix ELF hash table entry size for s390x
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO64
powerpc/vdso: Wire up getrandom() vDSO implementation on VDSO32
powerpc/vdso: Refactor CFLAGS for CVDSO build
powerpc/vdso32: Add crtsavres
mm: Define VM_DROPPABLE for powerpc/32
powerpc/vdso: Fix VDSO data access when running in a non-root time namespace
selftests: vDSO: don't include generated headers for chacha test
arm64: vDSO: Wire up getrandom() vDSO implementation
arm64: alternative: make alternative_has_cap_likely() VDSO compatible
selftests: vDSO: also test counter in vdso_test_chacha
...
|
|
LoongArch architecture changes for 6.12 depend on the irq core
changes about AVEC irqchip to avoid confliction, so merge them
to create a base.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Core:
- Remove a global lock in the affinity setting code
The lock protects a cpumask for intermediate results and the lock
causes a bottleneck on simultaneous start of multiple virtual
machines. Replace the lock and the static cpumask with a per CPU
cpumask which is nicely serialized by raw spinlock held when
executing this code.
- Provide support for giving a suffix to interrupt domain names.
That's required to support devices with subfunctions so that the
domain names are distinct even if they originate from the same
device node.
- The usual set of cleanups and enhancements all over the place
Drivers:
- Support for longarch AVEC interrupt chip
- Refurbishment of the Armada driver so it can be extended for new
variants.
- The usual set of cleanups and enhancements all over the place"
* tag 'irq-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
genirq: Use cpumask_intersects()
genirq/cpuhotplug: Use cpumask_intersects()
irqchip/apple-aic: Only access system registers on SoCs which provide them
irqchip/apple-aic: Add a new "Global fast IPIs only" feature level
irqchip/apple-aic: Skip unnecessary enabling of use_fast_ipi
dt-bindings: apple,aic: Document A7-A11 compatibles
irqdomain: Use IS_ERR_OR_NULL() in irq_domain_trim_hierarchy()
genirq/msi: Use kmemdup_array() instead of kmemdup()
genirq/proc: Change the return value for set affinity permission error
genirq/proc: Use irq_move_pending() in show_irq_affinity()
genirq/proc: Correctly set file permissions for affinity control files
genirq: Get rid of global lock in irq_do_set_affinity()
genirq: Fix typo in struct comment
irqchip/loongarch-avec: Add AVEC irqchip support
irqchip/loongson-pch-msi: Prepare get_pch_msi_handle() for AVECINTC
irqchip/loongson-eiointc: Rename CPUHP_AP_IRQ_LOONGARCH_STARTING
LoongArch: Architectural preparation for AVEC irqchip
LoongArch: Move irqchip function prototypes to irq-loongson.h
irqchip/loongson-pch-msi: Switch to MSI parent domains
softirq: Remove unused 'action' parameter from action callback
...
|
|
Hook up the generic vDSO implementation to the LoongArch vDSO data page
by providing the required __arch_chacha20_blocks_nostack,
__arch_get_k_vdso_rng_data, and getrandom_syscall implementations. Also
wire up the selftests.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Acked-by: Huacai Chen <chenhuacai@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Implement function kvm_para_has_feature() to detect supported paravirt
features. It can be used by device driver to detect and enable paravirt
features, such as the EIOINTC irqchip driver is able to detect feature
KVM_FEATURE_VIRT_EXTIOI and do some optimization.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Export kernel paravirt features to user space, so that VMM can control
each single paravirt feature. By default paravirt features will be the
same with kvm supported features if VMM does not set it.
Also a new feature KVM_FEATURE_VIRT_EXTIOI is added which can be set
from user space. This feature indicates that the virt EIOINTC can route
interrupts to 256 vCPUs, rather than 4 vCPUs like with real HW.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
On LoongArch, the host and guest have their own PMU CSRs registers and
they share PMU hardware resources. A set of PMU CSRs consists of a CTRL
register and a CNTR register. We can set which PMU CSRs are used by the
guest by writing to the GCFG register [24:26] bits.
On KVM side:
- Save the host PMU CSRs into structure kvm_context.
- If the host supports the PMU feature.
- When entering guest mode, save the host PMU CSRs and restore the guest PMU CSRs.
- When exiting guest mode, save the guest PMU CSRs and restore the host PMU CSRs.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Every vcpu has separate LBT registers. And there are four scr registers,
one flags and ftop register for LBT extension. When VM migrates, VMM
needs to get LBT registers for every vcpu.
Here macro KVM_REG_LOONGARCH_LBT is added for new vcpu lbt register type,
the following macro is added to get/put LBT registers.
KVM_REG_LOONGARCH_LBT_SCR0
KVM_REG_LOONGARCH_LBT_SCR1
KVM_REG_LOONGARCH_LBT_SCR2
KVM_REG_LOONGARCH_LBT_SCR3
KVM_REG_LOONGARCH_LBT_EFLAGS
KVM_REG_LOONGARCH_LBT_FTOP
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Loongson Binary Translation (LBT) is used to accelerate binary translation,
which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags)
and x87 fpu stack pointer (ftop).
Like FPU extension, here a lazy enabling method is used for LBT. the LBT
context is saved/restored on the vcpu context switch path.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Loongson SIMD Extension (LSX), Loongson Advanced SIMD Extension (LASX)
and Loongson Binary Translation (LBT) features are defined in register
CPUCFG2. Two kinds of LSX/LASX/LBT feature detection are added here, one
is VCPU feature, and the other is VM feature. VCPU feature dection can
only work with VCPU thread itself, and requires VCPU thread is created
already. So LSX/LASX/LBT feature detection for VM is added also, it can
be done even if VM is not created, and also can be done by any threads
besides VCPU threads.
Here ioctl command KVM_HAS_DEVICE_ATTR is added for VM, and macro
KVM_LOONGARCH_VM_FEAT_CTRL is added to check supported feature. And
five sub-features relative with LSX/LASX/LBT are added as following:
KVM_LOONGARCH_VM_FEAT_LSX
KVM_LOONGARCH_VM_FEAT_LASX
KVM_LOONGARCH_VM_FEAT_X86BT
KVM_LOONGARCH_VM_FEAT_ARMBT
KVM_LOONGARCH_VM_FEAT_MIPSBT
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Similar with x86, when VM is detected, revert to a simple test-and-set
lock to avoid the horrors of queue preemption.
Tested on 3C5000 Dual-way machine with 32 cores and 2 numa nodes,
test case is kcbench on kernel mainline 6.10, the detailed command is
"kcbench --src /root/src/linux"
Performance on host machine
kernel compile time performance impact
Original 150.29 seconds
With patch 150.19 seconds almost no impact
Performance on virtual machine:
1. 1 VM with 32 vCPUs and 2 numa node, numa node pinned
kernel compile time performance impact
Original 170.87 seconds
With patch 171.73 seconds almost no impact
2. 2 VMs, each VM with 32 vCPUs and 2 numa node, numa node pinned
kernel compile time performance impact
Original 2362.04 seconds
With patch 354.73 seconds +565%
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Every architecture that supports NUMA defines node_data in the same way:
struct pglist_data *node_data[MAX_NUMNODES];
No reason to keep multiple copies of this definition and its forward
declarations, especially when such forward declaration is the only thing
in include/asm/mmzone.h for many architectures.
Add definition and declaration of node_data to generic code and drop
architecture-specific versions.
Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If ParaVirt steal time feature is enabled, there is a percpu gpa address
passed from guest vCPU and host modifies guest memory space with this gpa
address. When vCPU is reset normally, it will notify host and invalidate
gpa address.
However if VM is crashed and VMM reboots VM forcely, the vCPU reboot
notification callback will not be called in VM. Host needs invalidate
the gpa address, else host will modify guest memory during VM reboots.
Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl
interface.
Also funciton kvm_reset_timer() is removed at vCPU reset stage, since SW
emulated timer is only used in vCPU block state. When a vCPU is removed
from the block waiting queue, kvm_restore_timer() is called and SW timer
is cancelled. And the timer register is also cleared at VMM when a vCPU
is reset.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed
it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only
legacy interrupts have been allocated.
Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h
and the core will automatically set the flag for all interrupts.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
|
|
dma-direct.h is introduced in commit d4b6f1562a3c3284 ("LoongArch: Add
Non-Uniform Memory Access (NUMA) support"). In commit c78c43fe7d42524c
("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"),
ARCH_HAS_PHYS_TO_DMA was deselected and the coresponding phys_to_dma()/
dma_to_phys() functions were removed. However, the unused dma-direct.h
was left behind, which is removed by this patch.
Cc: <stable@vger.kernel.org>
Fixes: c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA")
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Introduce the advanced extended interrupt controllers (AVECINTC). This
feature will allow each core to have 256 independent interrupt vectors
and MSI interrupts can be independently routed to any vector on any CPU.
The whole topology of irqchips in LoongArch machines looks like this if
AVECINTC is supported:
+-----+ +-----------------------+ +-------+
| IPI | --> | CPUINTC | <-- | Timer |
+-----+ +-----------------------+ +-------+
^ ^ ^
| | |
+---------+ +----------+ +---------+ +-------+
| EIOINTC | | AVECINTC | | LIOINTC | <-- | UARTs |
+---------+ +----------+ +---------+ +-------+
^ ^
| |
+---------+ +---------+
| PCH-PIC | | PCH-MSI |
+---------+ +---------+
^ ^ ^
| | |
+---------+ +---------+ +---------+
| Devices | | PCH-LPC | | Devices |
+---------+ +---------+ +---------+
^
|
+---------+
| Devices |
+---------+
Co-developed-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Co-developed-by: Liupu Wang <wangliupu@loongson.cn>
Signed-off-by: Liupu Wang <wangliupu@loongson.cn>
Co-developed-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240823104337.25577-2-zhangtianyang@loongson.cn
|
|
Add architectural preparation for AVEC irqchip, including:
1. CPUCFG feature bits definition for AVEC;
2. Detection of AVEC irqchip in cpu_probe();
3. New IPI type definition (IPI_CLEAR_VECTOR) for AVEC;
4. Provide arch_probe_nr_irqs() for large NR_IRQS;
5. Other related changes about the number of interrupts.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240823103936.25092-2-zhangtianyang@loongson.cn
|
|
Some irqchip functions are only for internal use by irqchip drivers, so
move their prototypes from asm/irq.h to drivers/irqchip/irq-loongson.h.
All related driver files include the new irq-loongson.h.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240823103936.25092-1-zhangtianyang@loongson.cn
|
|
The kvm_hypercall() set for LoongArch is limited to a1-a5. So the
mention of a6 in the comment is undefined that needs to be rectified.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Dandan Zhang <zhangdandan@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
1. "KVM_PRIVATE_MEM_SLOTS" is renamed as "KVM_INTERNAL_MEM_SLOTS".
2. "KVM_INTERNAL_MEM_SLOTS" defaults to zero, so it is not necessary to
define it in LoongArch's asm/kvm_host.h.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bdd1c37a315bc50ab14066c4852bc8dcf070451e
Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b075450868dbc0950f0942617f222eeb989cad10
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|