2024-02-16  arm64: mm: Wire up TCR.DS bit to PTE shareability fields  (Ard Biesheuvel)
When LPA2 is enabled, bits 8 and 9 of page and block descriptors become part of the output address instead of carrying shareability attributes for the region in question. So avoid setting these bits if TCR.DS == 1, which means LPA2 is enabled. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-74-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
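For illustration, the decision described above boils down to a single conditional; a standalone sketch with invented names, not the kernel's actual macros:

    /* Sketch only: when TCR_EL1.DS == 1 (LPA2), descriptor bits [9:8] carry
     * OA[51:50] instead of the SH[1:0] shareability field, so they must be
     * left clear. */
    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_SH_INNER    (UINT64_C(3) << 8)      /* SH[1:0] = 0b11, inner shareable */

    static uint64_t pte_shareability_bits(bool lpa2_enabled)
    {
            return lpa2_enabled ? 0 : PTE_SH_INNER;
    }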
2024-02-16  arm64: Add ESR decoding for exceptions involving translation level -1  (Ard Biesheuvel)
The LPA2 feature introduces new FSC values to report abort exceptions related to translation level -1. Define these and wire them up. Reuse the new ESR FSC classification helpers that arrived via the KVM arm64 tree, and update the one for translation faults to check specifically for a translation fault at level -1. (Access flag or permission faults cannot occur at level -1 because they always involve a descriptor at the superior level, so changing those helpers is not needed). Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-73-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
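As a sketch of what the decoding change amounts to, with the level -1 FSC encoding quoted from memory (verify the constants against the Arm ARM and asm/esr.h rather than trusting the values below):

    #include <stdbool.h>
    #include <stdint.h>

    #define ESR_FSC_MASK            UINT64_C(0x3f)
    #define FSC_TRANS_FAULT_L0      0x04    /* translation faults, levels 0..3 are contiguous */
    #define FSC_TRANS_FAULT_LM1     0x2b    /* translation fault at level -1 (LPA2 only) */

    static bool esr_is_translation_fault(uint64_t esr)
    {
            uint64_t fsc = esr & ESR_FSC_MASK;

            return (fsc >= FSC_TRANS_FAULT_L0 && fsc <= FSC_TRANS_FAULT_L0 + 3) ||
                   fsc == FSC_TRANS_FAULT_LM1;
    }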
2024-02-16  arm64: Avoid #define'ing PTE_MAYBE_NG to 0x0 for asm use  (Ard Biesheuvel)
The PROT_* macros resolve to expressions that are only valid in C and not in assembler, and so they are only usable from C code. Currently, we make an exception for the permission indirection init code in proc.S, which doesn't care about the bits that are conditionally set, and so we just #define PTE_MAYBE_NG to 0x0 for any assembler file that includes these definitions. This is dodgy because it means that PROT_NORMAL and friends are generally available in asm code, but defined in a way that deviates from the definition that C code will observe, which might lead to hard-to-diagnose issues down the road. So instead, #define PTE_MAYBE_NG only in the place where the PIE constants are evaluated, and #undef it again right after. This allows us to drop the #define from pgtable-prot.h, and avoid the risk of deviating definitions between asm and C. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-72-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
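The shape of the change is roughly the following; PIE_E0/PIE_E1 are quoted from memory as the permission-indirection constants being computed, so treat this as a sketch rather than the literal hunk:

    /* Only around the spot where the PIE constants are evaluated from asm: */
    #define PTE_MAYBE_NG    0       /* nG is irrelevant for the PIE index computation */

    /* ... PIE_E0 / PIE_E1 (which expand PROT_* and thus PTE_MAYBE_NG) used here ... */

    #undef PTE_MAYBE_NG             /* keep the asm-only stub from leaking elsewhere */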
2024-02-16  arm64: mm: Add feature override support for LVA  (Ard Biesheuvel)
Add support for overriding the VARange field of the MMFR2 CPU ID register. This permits the associated LVA feature to be overridden early enough for the boot code that creates the kernel mapping to take it into account. Given that LPA2 implies LVA, disabling the latter should disable the former as well. So override the ID_AA64MMFR0.TGran field of the current page size as well if it advertises support for 52-bit addressing. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-71-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: mm: Handle LVA support as a CPU feature  (Ard Biesheuvel)
Currently, we detect CPU support for 52-bit virtual addressing (LVA) extremely early, before creating the kernel page tables or enabling the MMU. We cannot override the feature this early, and so large virtual addressing is always enabled on CPUs that implement support for it if the software support for it was enabled at build time. It also means we rely on non-trivial code in asm to deal with this feature. Given that both the ID map and the TTBR1 mapping of the kernel image are guaranteed to be 48-bit addressable, it is not actually necessary to enable support this early, and instead, we can model it as a CPU feature. That way, we can rely on code patching to get the correct TCR.T1SZ values programmed on secondary boot and resume from suspend. On the primary boot path, we simply enable the MMU with 48-bit virtual addressing initially, and update TCR.T1SZ if LVA is supported from C code, right before creating the kernel mapping. Given that TTBR1 still points to reserved_pg_dir at this point, updating TCR.T1SZ should be safe without the need for explicit TLB maintenance. Since this gets rid of all accesses to the vabits_actual variable from asm code that occurred before TCR.T1SZ had been programmed, we no longer have a need for this variable, and we can replace it with a C expression that produces the correct value directly, based on the value of TCR.T1SZ. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
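The replacement for vabits_actual is conceptually just "64 - TCR_EL1.T1SZ"; a standalone sketch (T1SZ field position recalled as bits [21:16], double-check before relying on it):

    #include <stdint.h>

    static inline unsigned int vabits_from_tcr(uint64_t tcr_el1)
    {
            return 64 - ((tcr_el1 >> 16) & 0x3f);   /* 64 - T1SZ */
    }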
2024-02-16  arm64: Revert "mm: provide idmap pointer to cpu_replace_ttbr1()"  (Ard Biesheuvel)
This reverts commit 1682c45b920643c, which is no longer needed now that we create the permanent kernel mapping directly during early boot. This is a RINO (revert in name only) given that some of the code has moved around, but the changes are straightforward. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-69-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: mm: omit redundant remap of kernel image  (Ard Biesheuvel)
Now that the early kernel mapping is created with all the right attributes and segment boundaries, there is no longer a need to recreate it and switch to it. This also means we no longer have to copy the kasan shadow or some parts of the fixmap from one set of page tables to the other. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-68-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: mm: avoid fixmap for early swapper_pg_dir updates  (Ard Biesheuvel)
Early in the boot, when .rodata is still writable, we can poke swapper_pg_dir entries directly, and there is no need to go through the fixmap. After a future patch, we will enter the kernel with swapper_pg_dir already active, and early swapper_pg_dir updates for creating the fixmap page table hierarchy itself cannot go through the fixmap for obvious reasons. So let's keep track of whether rodata is writable, and update the descriptor directly in that case. As the same reasoning applies to early KASAN init, make the function noinstr as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-67-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
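Schematically the update path looks like this (a hedged, standalone sketch: 'pgdir_is_rw' stands in for whatever state the kernel actually tracks, barriers are only noted, and the fixmap fallback is elided):

    #include <stdbool.h>
    #include <stdint.h>

    /* A pgd entry is treated as a bare 64-bit descriptor here. */
    static void set_swapper_pgd_sketch(uint64_t *pgdp, uint64_t pgd, bool pgdir_is_rw)
    {
            if (pgdir_is_rw) {
                    /* .rodata (and thus swapper_pg_dir) is still writable: write in place */
                    *(volatile uint64_t *)pgdp = pgd;
                    /* in kernel context, dsb(ishst) and isb() would follow here */
                    return;
            }
            /* otherwise: map swapper_pg_dir through the fixmap and update it there */
    }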
2024-02-16  arm64: kernel: Create initial ID map from C code  (Ard Biesheuvel)
The asm code that creates the initial ID map is rather intricate and hard to follow. This is problematic because it makes adding support for things like LPA2 or WXN more difficult than necessary. Also, it is parameterized like the rest of the MM code to run with a configurable number of levels, which is rather pointless, given that all AArch64 CPUs implement support for 48-bit virtual addressing, and that many systems exist with DRAM located outside of the 39-bit addressable range, which is the only smaller VA size that is widely used, and we need additional tricks to make things work in that combination. So let's bite the bullet, and rip out all the asm macros, and fiddly code, and replace it with a C implementation based on the newly added routines for creating the early kernel VA mappings. And while at it, create the initial ID map based on 48-bit virtual addressing as well, regardless of the number of configured levels for the kernel proper. Note that this code may execute with the MMU and caches disabled, and is therefore not permitted to make unaligned accesses. This shouldn't generally happen in any case for the algorithm as implemented, but to be sure, let's pass -mstrict-align to the compiler just in case. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-66-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: pgtable: Decouple PGDIR size macros from PGD/PUD/PMD levels  (Ard Biesheuvel)
The mapping from PGD/PUD/PMD to levels and shifts is very confusing, given that, due to folding, the shifts may be equal for different levels, if the macros are even #define'd to begin with. In a subsequent patch, we will modify the ID mapping code to decouple the number of levels from the kernel's view of how these types are folded, so prepare for this by reformulating the macros without the use of these types. Instead, use SWAPPER_BLOCK_SHIFT as the base quantity, and derive it from either PAGE_SHIFT or PMD_SHIFT, which, if defined at all, are defined unambiguously for a given page size, regardless of the number of configured levels. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-65-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
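The reformulated macros end up looking roughly like this (a sketch; the exact names and companion macros in the tree may differ):

    /* Base the early-boot block size on the page size alone, independent of how
     * PMD/PUD happen to be folded for the configured number of levels. */
    #ifdef CONFIG_ARM64_4K_PAGES
    #define SWAPPER_BLOCK_SHIFT     PMD_SHIFT       /* 2 MiB block mappings */
    #else
    #define SWAPPER_BLOCK_SHIFT     PAGE_SHIFT      /* page mappings on 16k/64k */
    #endif
    #define SWAPPER_BLOCK_SIZE      (UL(1) << SWAPPER_BLOCK_SHIFT)

The early-boot page table sizing can then be expressed purely in terms of SWAPPER_BLOCK_SHIFT, without caring how many levels the kernel proper was configured with.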
2024-02-16  arm64: mm: Use 48-bit virtual addressing for the permanent ID map  (Ard Biesheuvel)
Even though we support loading kernels anywhere in 48-bit addressable physical memory, we create the ID maps based on the number of levels that we happened to configure for the kernel VA and user VA spaces. The reason for this is that the PGD/PUD/PMD based classification of translation levels, along with the associated folding when the number of levels is less than 5, does not permit creating a page table hierarchy of a set number of levels. This means that, for instance, on 39-bit VA kernels we need to configure an additional level above PGD level on the fly, and 36-bit VA kernels still only support 47-bit virtual addressing with this trick applied. Now that we have a separate helper to populate page table hierarchies that does not define the levels in terms of PUDS/PMDS/etc at all, let's reuse it to create the permanent ID map with a fixed VA size of 48 bits. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-64-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: head: Move early kernel mapping routines into C code  (Ard Biesheuvel)
The asm version of the kernel mapping code works fine for creating a coarse grained identity map, but for mapping the kernel down to its exact boundaries with the right attributes, it is not suitable. This is why we create a preliminary RWX kernel mapping first, and then rebuild it from scratch later on. So let's reimplement this in C, in a way that will make it unnecessary to create the kernel page tables yet another time in paging_init(). Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-63-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: mmu: Make __cpu_replace_ttbr1() out of line  (Ard Biesheuvel)
__cpu_replace_ttbr1() is a static inline, and so it gets instantiated wherever it is used. This is not really necessary, as it is never called on a hot path. It also has the unfortunate side effect that the symbol idmap_cpu_replace_ttbr1 may never be referenced from kCFI-enabled C code, and this means the type id symbol may not exist either. This will result in a build error once we start referring to this symbol from asm code as well. (Note that this problem only occurs when CnP, KASAN and suspend/resume are all disabled in the Kconfig, but that is a valid config, if unusual). So let's just move it out of line so all callers will share the same implementation, which will reference idmap_cpu_replace_ttbr1 unconditionally. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-62-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: mm: Make kaslr_requires_kpti() a static inline  (Ard Biesheuvel)
In preparation for moving the first assignment of arm64_use_ng_mappings to an earlier stage in the boot, ensure that kaslr_requires_kpti() is accessible without relying on the core kernel's view on whether or not KASLR is enabled. So make it a static inline, and move the kaslr_enabled() check out of it and into the callers, one of which will disappear in a subsequent patch. Once/when support for the obsolete ThunderX 1 platform is dropped, this check reduces to a E0PD feature check on the local CPU. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-61-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: head: move memstart_offset_seed handling to C code  (Ard Biesheuvel)
Now that we can set BSS variables from the early code running from the ID map, we can set memstart_offset_seed directly from the C code that derives the value instead of passing it back and forth between C and asm code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-60-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: head: allocate more pages for the kernel mapping  (Ard Biesheuvel)
In preparation for switching to an early kernel mapping routine that maps each segment according to its precise boundaries, and with the correct attributes, let's allocate some extra pages for page tables for the 4k page size configuration. This is necessary because the start and end of each segment may not be aligned to the block size, and so we'll need an extra page table at each segment boundary. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-59-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: Add helpers to probe local CPU for PAC and BTI support  (Ard Biesheuvel)
Add some helpers that will be used by the early kernel mapping code to check feature support on the local CPU. This permits the early kernel mapping to be created with the right attributes, removing the need for tearing it down and recreating it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-58-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
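As an example of what probing the local CPU means here, a standalone sketch for the BTI case (the ID_AA64PFR1_EL1.BT field position is quoted from memory; the PAC check is analogous but spans several ID register fields):

    #include <stdbool.h>
    #include <stdint.h>

    /* ID_AA64PFR1_EL1.BT is a 4-bit field at bits [3:0]; nonzero means BTI is implemented. */
    static bool cpu_has_bti(uint64_t id_aa64pfr1)
    {
            return (id_aa64pfr1 & 0xf) != 0;
    }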
2024-02-16  arm64: idreg-override: Create a pseudo feature for rodata=off  (Ard Biesheuvel)
Add rodata=off to the set of kernel command line options that is parsed early using the CPU feature override detection code, so we can easily refer to it when creating the kernel mapping. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-57-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: kaslr: Use feature override instead of parsing the cmdline again  (Ard Biesheuvel)
The early kaslr code open codes the detection of 'nokaslr' on the kernel command line, and this is no longer necessary now that the feature detection code, which also looks for the same string, executes before this code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-56-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: cpufeature: Add helper to test for CPU feature overrides  (Ard Biesheuvel)
Add some helpers to extract and apply feature overrides to the bare idreg values. This involves inspecting the value and mask of the specific field that we are interested in, given that an override value/mask pair might be invalid for one field but valid for another. Then, wire up the new helper for the hVHE test - note that we can drop the sysreg test here, as the override will be invalid when trying to enable hVHE on non-VHE capable hardware. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-55-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
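The essence of such a helper is to extract one field from both the override value and the override mask, and only honour the value when the mask covers the whole field; a sketch with invented names:

    #include <stdbool.h>
    #include <stdint.h>

    /* Returns true and reports the overridden field value only if the 4-bit
     * field at 'shift' is fully covered by the override mask. */
    static bool get_field_override(uint64_t ovr_val, uint64_t ovr_mask,
                                   unsigned int shift, uint64_t *field)
    {
            if (((ovr_mask >> shift) & 0xf) != 0xf)
                    return false;   /* no (complete) override for this field */

            *field = (ovr_val >> shift) & 0xf;
            return true;
    }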
2024-02-16  arm64: head: move dynamic shadow call stack patching into early C runtime  (Ard Biesheuvel)
Once we update the early kernel mapping code to only map the kernel once with the right permissions, we can no longer perform code patching via this mapping. So move this code to an earlier stage of the boot, right after applying the relocations. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-54-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: head: Run feature override detection before mapping the kernel  (Ard Biesheuvel)
To permit the feature overrides to be taken into account before the KASLR init code runs and the kernel mapping is created, move the detection code to an earlier stage in the boot. In a subsequent patch, this will be taken advantage of by merging the preliminary and permanent mappings of the kernel text and data into a single one that gets created and relocated before start_kernel() is called. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-53-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: Move feature overrides into the BSS section  (Ard Biesheuvel)
In order to allow the CPU feature override detection code to run even earlier, move the feature override global variables into BSS, which is the only part of the static kernel image that is mapped read-write in the initial ID map. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-52-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: head: Clear BSS and the kernel page tables in one go  (Ard Biesheuvel)
We will move the CPU feature overrides into BSS in a subsequent patch, and this requires that BSS is zeroed before the feature override detection code runs. So let's map BSS read-write in the ID map, and zero it via this mapping. Since the kernel page tables are right next to it, and also zeroed via the ID map, let's drop the separate clear_page_tables() function, and just zero everything in one go. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-51-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: kernel: Remove early fdt remap code  (Ard Biesheuvel)
The early FDT remap code is no longer used so let's drop it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-50-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: idreg-override: Move to early mini C runtime  (Ard Biesheuvel)
We will want to parse the ID register overrides even earlier, so that we can take them into account before creating the kernel mapping. So migrate the code and make it work in the context of the early C runtime. We will move the invocation to an earlier stage in a subsequent patch. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-49-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: head: move relocation handling to C code  (Ard Biesheuvel)
Now that we have a mini C runtime before the kernel mapping is up, we can move the non-trivial relocation processing code out of head.S and reimplement it in C. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-48-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
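For reference, the core of RELA processing is compact once expressed in C; a minimal sketch handling only R_AARCH64_RELATIVE entries, with names assumed rather than taken from the tree:

    #include <stdint.h>

    typedef struct {
            uint64_t r_offset;
            uint64_t r_info;
            int64_t  r_addend;
    } elf64_rela;

    #define R_AARCH64_RELATIVE      1027

    /* Apply the runtime load offset to every RELATIVE relocation in [rela, end). */
    static void relocate_image_sketch(elf64_rela *rela, elf64_rela *end, uint64_t offset)
    {
            for (; rela < end; rela++) {
                    if ((uint32_t)rela->r_info != R_AARCH64_RELATIVE)
                            continue;
                    *(uint64_t *)(rela->r_offset + offset) = (uint64_t)rela->r_addend + offset;
            }
    }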
2024-02-16  arm64: kernel: Don't rely on objcopy to make code under pi/ __init  (Ard Biesheuvel)
We will add some code under pi/ that contains global variables that should not end up in __initdata, as they will not be writable via the initial ID map. So only rely on objcopy for making the libfdt code __init, and use explicit annotations for the rest. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-47-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-16  arm64: kernel: Manage absolute relocations in code built under pi/  (Ard Biesheuvel)
The mini C runtime runs before relocations are processed, and so it cannot rely on statically initialized pointer variables. Add a check to ensure that such code does not get introduced by accident, by going over the relocations in each object, identifying the ones that operate on data sections that are part of the executable image, and raising an error if any relocations of type R_AARCH64_ABS64 exist. Note that such relocations are permitted in other places (e.g., debug sections) and will never occur in compiler generated code sections when using the small code model, so only check sections that have SHF_ALLOC set and SHF_EXECINSTR cleared. To accommodate cases where statically initialized symbol references are unavoidable, introduce a special case for ELF input data sections that have ".rodata.prel64" in their names, and in these cases, instead of rejecting any encountered ABS64 relocations, convert them into PREL64 relocations, which don't require any runtime fixups. Note that the code in question must still be modified to deal with this, as it needs to convert the 64-bit signed offsets into absolute addresses before use. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20240214122845.2033971-46-ardb+git@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
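On the consuming side, a PREL64 slot holds a signed offset from its own address, and converting it back to a pointer looks something like this (sketch; the kernel's actual helper name and type may differ):

    #include <stdint.h>

    typedef struct {
            int64_t offset;         /* signed distance from this slot to the target */
    } prel64_t;

    static inline void *prel64_to_pointer(const prel64_t *p)
    {
            return (char *)p + p->offset;
    }

Because the stored quantity is position-relative, it needs no runtime fixup, which is exactly why ".rodata.prel64" references are safe to use before relocation processing has run.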
2024-02-09  arm64: kaslr: Adjust randomization range dynamically  (Ard Biesheuvel)
Currently, we base the KASLR randomization range on a rough estimate of the available space in the upper VA region: the lower 1/4th has the module region and the upper 1/4th has the fixmap, vmemmap and PCI I/O ranges, and so we pick a random location in the remaining space in the middle. Once we enable support for 5-level paging with 4k pages, this no longer works: the vmemmap region, being dimensioned to cover a 52-bit linear region, takes up so much space in the upper VA region (the size of which is based on a 48-bit VA space for compatibility with non-LVA hardware) that the region above the vmalloc region takes up more than a quarter of the available space. So instead of a heuristic, let's derive the randomization range from the actual boundaries of the vmalloc region. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231213084024.2367360-16-ardb@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
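In other words, the window is now computed from the vmalloc region's actual boundaries rather than from a fixed fraction of the VA space; a sketch of the arithmetic with assumed parameter names:

    #include <stdint.h>

    /* Pick a 2 MiB aligned random offset for the image inside the vmalloc
     * region, leaving room for the image itself at the top of the window. */
    static uint64_t kaslr_offset_sketch(uint64_t vmalloc_start, uint64_t vmalloc_end,
                                        uint64_t image_size, uint64_t seed)
    {
            uint64_t range = (vmalloc_end - vmalloc_start) - image_size;

            return (seed % range) & ~(UINT64_C(0x200000) - 1);
    }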
2024-02-09  arm64: mm: Reclaim unused vmemmap region for vmalloc use  (Ard Biesheuvel)
The vmemmap array is statically sized based on the maximum supported size of the virtual address space, but it is located inside the upper VA region, which is statically sized based on the *minimum* supported size of the VA space. This doesn't matter much when using 64k pages, which is the only configuration that currently supports 52-bit virtual addressing. However, upcoming LPA2 support will change this picture somewhat, as in that case, the vmemmap array will take up more than 25% of the upper VA region when using 4k pages. Given that most of this space is never used when running on a system that does not support 52-bit virtual addressing, let's reclaim the unused vmemmap area in that case. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231213084024.2367360-15-ardb@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09  arm64: vmemmap: Avoid base2 order of struct page size to dimension region  (Ard Biesheuvel)
The placement and size of the vmemmap region in the kernel virtual address space is currently derived from the base2 order of the size of a struct page. This makes for nicely aligned constants with lots of leading 0xf and trailing 0x0 digits, but given that the actual struct pages are indexed as an ordinary array, this resulting region is severely overdimensioned when the size of a struct page is just over a power of 2. This doesn't matter today, but once we enable 52-bit virtual addressing for 4k pages configurations, the vmemmap region may take up almost half of the upper VA region with the current struct page upper bound at 64 bytes. And once we enable KMSAN or other features that push the size of a struct page over 64 bytes, we will run out of VMALLOC space entirely. So instead, let's derive the region size from the actual size of a struct page, and place the entire region 1 GB from the top of the VA space, where it still doesn't share any lower level translation table entries with the fixmap. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231213084024.2367360-14-ardb@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
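The sizing change itself is simple arithmetic: the array needs one struct page per page of linear-map VA space, at the structure's real size rather than its base-2 round-up; a standalone sketch:

    #include <stddef.h>
    #include <stdint.h>

    /* Bytes needed for a struct page array covering 'linear_span' bytes of VA,
     * using the actual element size instead of rounding it up to a power of two. */
    static uint64_t vmemmap_size_sketch(uint64_t linear_span, unsigned int page_shift,
                                        size_t struct_page_size)
    {
            return (linear_span >> page_shift) * (uint64_t)struct_page_size;
    }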
2024-02-09  arm64: ptdump: Discover start of vmemmap region at runtime  (Ard Biesheuvel)
We will soon reclaim the part of the vmemmap region that covers VA space that is not addressable by the hardware. To avoid confusion, ensure that the 'vmemmap start' marker points at the start of the region that is actually being used for the struct page array, rather than the start of the region we set aside for it at build time. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231213084024.2367360-13-ardb@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09  arm64: ptdump: Allow all region boundaries to be defined at boot time  (Ard Biesheuvel)
Rework the way the address_markers array is populated so that we can tolerate values that are not compile time constants generally, rather than keeping track manually of the array indexes in question, and poking new values into them manually. This will be needed for VMALLOC_END, which will cease to be a compile time constant after a subsequent patch. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231213084024.2367360-12-ardb@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09  arm64: mm: Move fixmap region above vmemmap region  (Ard Biesheuvel)
Move the fixmap region above the vmemmap region, so that the start of the vmemmap delineates the end of the region available for vmalloc and vmap allocations and the randomized placement of the kernel and modules. In a subsequent patch, we will take advantage of this to reclaim most of the vmemmap area when running a 52-bit VA capable build with 52-bit virtual addressing disabled at runtime. Note that the existing guard region of 256 MiB covers the fixmap and PCI I/O regions as well, so we can reduce it to 8 MiB, which is what we use in other places too. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231213084024.2367360-11-ardb@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-09  arm64: mm: Move PCI I/O emulation region above the vmemmap region  (Ard Biesheuvel)
Move the PCI I/O region above the vmemmap region in the kernel's VA space. This will permit us to reclaim the lower part of the vmemmap region for vmalloc/vmap allocations when running a 52-bit VA capable build on a 48-bit VA capable system. Also, given that PCI_IO_START is derived from VMEMMAP_END, use that symbolic constant directly in ptdump rather than deriving it from VMEMMAP_START and VMEMMAP_SIZE, as those definitions will change in subsequent patches. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20231213084024.2367360-10-ardb@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
2024-02-04  Linux 6.8-rc3  [tag: v6.8-rc3]  (Linus Torvalds)
2024-02-04  Merge tag 'for-linus-6.8-rc3' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator and extent handling code" * tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type ext4: make ext4_map_blocks() distinguish delalloc only extent ext4: add a hole extent entry in cache after punch ext4: correct the hole length returned by ext4_map_blocks() ext4: convert to exclusive lock while inserting delalloc extents ext4: refactor ext4_da_map_blocks() ext4: remove 'needed' in trace_ext4_discard_preallocations ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations ext4: remove unused return value of ext4_mb_release_group_pa ext4: remove unused return value of ext4_mb_release_inode_pa ext4: remove unused return value of ext4_mb_release ext4: remove unused ext4_allocation_context::ac_groups_considered ext4: remove unneeded return value of ext4_mb_release_context ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*() ext4: remove unused return value of __mb_check_buddy ext4: mark the group block bitmap as corrupted before reporting an error ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal() ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found() ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks() ...
2024-02-04  Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6  (Linus Torvalds)
Pull smb client fixes from Steve French: "Five smb3 client fixes, mostly multichannel related: - four multichannel fixes including fix for channel allocation when multiple inactive channels, fix for unneeded race in channel deallocation, correct redundant channel scaling, and redundant multichannel disabling scenarios - add warning if max compound requests reached" * tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: smb: client: increase number of PDUs allowed in a compound request cifs: failure to add channel on iface should bump up weight cifs: do not search for channel if server is terminating cifs: avoid redundant calls to disable multichannel cifs: make sure that channel scaling is done only once
2024-02-04  Merge tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  (Linus Torvalds)
Pull xfs fixes from Chandan Babu: - Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format attribute fork - Remove conditional compilation of realtime geometry validator functions to prevent confusing error messages from being printed on the console during the mount operation * tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove conditional building of rt geometry validator functions xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
2024-02-04  Merge tag 'char-misc-6.8-rc3' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are three tiny driver fixes for 6.8-rc3. They include: - Android binder long-term bug with epoll finally being fixed - fastrpc driver shutdown bugfix - open-dice lockdep fix All of these have been in linux-next this week with no reported issues" * tag 'char-misc-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: binder: signal epoll threads of self-work misc: open-dice: Fix spurious lockdep warning misc: fastrpc: Mark all sessions as invalid in cb_remove
2024-02-04  Merge tag 'tty-6.8-rc3' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty and serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for 6.8-rc3 that resolve a number of reported issues. Included in here are: - rs485 flag definition fix that affected the user/kernel abi in -rc1 - max310x driver fixes - 8250_pci1xxxx driver off-by-one fix - uart_tiocmget locking race fix All of these have been in linux-next for over a week with no reported issues" * tag 'tty-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: max310x: prevent infinite while() loop in port startup serial: max310x: fail probe if clock crystal is unstable serial: max310x: improve crystal stable clock detection serial: max310x: set default value when reading clock ready bit serial: core: Fix atomicity violation in uart_tiocmget serial: 8250_pci1xxxx: fix off by one in pci1xxxx_process_read_data() tty: serial: Fix bit order in RS485 flag definitions
2024-02-04  Merge tag 'usb-6.8-rc3' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are a bunch of small USB driver fixes for 6.8-rc3. Included in here are: - new usb-serial driver ids - new dwc3 driver id added - typec driver change revert - ncm gadget driver endian bugfix - xhci bugfixes for a number of reported issues - usb hub bugfix for alternate settings - ulpi driver debugfs memory leak fix - chipidea driver bugfix - usb gadget driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'usb-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits) USB: serial: option: add Fibocom FM101-GL variant USB: serial: qcserial: add new usb-id for Dell Wireless DW5826e USB: serial: cp210x: add ID for IMST iM871A-USB usb: typec: tcpm: fix the PD disabled case usb: ucsi_acpi: Quirk to ack a connector change ack cmd usb: ucsi_acpi: Fix command completion handling usb: ucsi: Add missing ppm_lock usb: ulpi: Fix debugfs directory leak Revert "usb: typec: tcpm: fix cc role at port reset" usb: gadget: pch_udc: fix an Excess kernel-doc warning usb: f_mass_storage: forbid async queue when shutdown happen USB: hub: check for alternate port before enabling A_ALT_HNP_SUPPORT usb: chipidea: core: handle power lost in workqueue usb: dwc3: gadget: Fix NULL pointer dereference in dwc3_gadget_suspend usb: dwc3: pci: add support for the Intel Arrow Lake-H usb: core: Prevent null pointer dereference in update_port_device_state xhci: handle isoc Babble and Buffer Overrun events properly xhci: process isoc TD properly when there was a transaction error mid TD. xhci: fix off by one check when adding a secondary interrupter. xhci: fix possible null pointer dereference at secondary interrupter removal ...
2024-02-04  Merge tag 'i2c-for-6.8-rc3' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixlet from Wolfram Sang: "MAINTAINERS update to point people to the new tree for i2c host driver changes" * tag 'i2c-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: MAINTAINERS: Update i2c host drivers repository
2024-02-04  Merge tag 'dmaengine-fix-6.8' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "Core: - fix return value of is_slave_direction() for D2D dma Driver fixes for: - Documentation fixes to resolve warnings for at_hdmac driver - bunch of fsl driver fixes for memory leaks, and useless kfree - TI edma and k3 fixes for packet error and null pointer checks" * tag 'dmaengine-fix-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: at_hdmac: add missing kernel-doc style description dmaengine: fix is_slave_direction() return false when DMA_DEV_TO_DEV dmaengine: fsl-qdma: Remove a useless devm_kfree() dmaengine: fsl-qdma: Fix a memory leak related to the queue command DMA dmaengine: fsl-qdma: Fix a memory leak related to the status queue DMA dmaengine: ti: k3-udma: Report short packet errors dmaengine: ti: edma: Add some null pointer checks to the edma_probe dmaengine: fsl-dpaa2-qdma: Fix the size of dma pools dmaengine: at_hdmac: fix some kernel-doc warnings
2024-02-04  Merge tag 'phy-fixes-6.8' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy driver fixes from Vinod Koul: - TI null pointer dereference - missing serdes mux entry in lan966x driver - Return of error code in renesas driver - Serdes init sequence and register offsets for IPQ drivers * tag 'phy-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: ti: phy-omap-usb2: Fix NULL pointer dereference for SRP phy: lan966x: Add missing serdes mux entry phy: renesas: rcar-gen3-usb2: Fix returning wrong error code phy: qcom-qmp-usb: fix serdes init sequence for IPQ6018 phy: qcom-qmp-usb: fix register offsets for ipq8074/ipq6018
2024-02-03  Merge tag 'i2c-host-fixes-6.8-rc3' of ↵  (Wolfram Sang)
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current Just a maintenance patch that updates the repository where the i2c host and muxes related patches will be collected.
2024-02-03  Merge tag 'perf-tools-fixes-for-v6.8-1-2024-02-01' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: "Vendor events: - Intel Alderlake/Sapphire Rapids metric fixes, the CPU type ("cpu_atom", "cpu_core") needs to be used as a prefix to be considered on a metric formula, detected via one of the 'perf test' entries. 'perf test' fixes: - Fix the creation of event selector lists on 'perf test' entries, by initializing the sample ID flag, which is done by 'perf record', so this fix affects only the tests, the common case isn't affected - Make 'perf list' respect debug settings (-v) to fix its 'perf test' entry - Fix 'perf script' test when python support isn't enabled - Special case 'perf script' tests on s390, where only DWARF call graphs are supported and only on software events - Make 'perf daemon' signal test less racy Compiler warnings/errors: - Remove needless malloc(0) call in 'perf top' that triggers -Walloc-size - Fix calloc() argument order to address error introduced in gcc-14 Build: - Make minimal shellcheck version to v0.6.0, avoiding the build to fail with older versions Sync kernel header copies: - stat.h to pick STATX_MNT_ID_UNIQUE - msr-index.h to pick IA32_MKTME_KEYID_PARTITIONING - drm.h to pick DRM_IOCTL_MODE_CLOSEFB - unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers - x86 cpufeatures to pick TDX, Zen, APIC MSR fence changes - x86's mem{cpy,set}_64.S used in 'perf bench' - Also, without tooling effects: asm-generic/unaligned.h, mount.h, fcntl.h, kvm headers" * tag 'perf-tools-fixes-for-v6.8-1-2024-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (21 commits) perf tools headers: update the asm-generic/unaligned.h copy with the kernel sources tools include UAPI: Sync linux/mount.h copy with the kernel sources perf evlist: Fix evlist__new_default() for > 1 core PMU tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench' tools headers x86 cpufeatures: Sync with the kernel sources to pick TDX, Zen, APIC MSR fence changes tools headers UAPI: Sync unistd.h to pick {list,stat}mount, lsm_{[gs]et_self_attr,list_modules} syscall numbers perf vendor events intel: Alderlake/sapphirerapids metric fixes tools headers UAPI: Sync kvm headers with the kernel sources perf tools: Fix calloc() arguments to address error introduced in gcc-14 perf top: Remove needless malloc(0) call that triggers -Walloc-size perf build: Make minimal shellcheck version to v0.6.0 tools headers UAPI: Update tools's copy of drm.h headers to pick DRM_IOCTL_MODE_CLOSEFB perf test shell daemon: Make signal test less racy perf test shell script: Fix test for python being disabled perf test: Workaround debug output in list test perf list: Add output file option perf list: Switch error message to pr_err() to respect debug settings (-v) perf test: Fix 'perf script' tests on s390 tools headers UAPI: Sync linux/fcntl.h with the kernel sources tools arch x86: Sync the msr-index.h copy with the kernel sources to pick IA32_MKTME_KEYID_PARTITIONING ...
2024-02-02  Merge tag 'trace-v6.8-rc2' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing and eventfs fixes from Steven Rostedt: - Fix the return code for ring_buffer_poll_wait() It was returning a -EINVAL instead of EPOLLERR. - Zero out the tracefs_inode so that all fields are initialized. The ti->private could have had stale data, but instead of just initializing it to NULL, clear out the entire structure when it is allocated. - Fix a crash in timerlat The hrtimer was initialized at read and not open, but is canceled at close. If the file was opened and never read, the close will pass a NULL pointer to hrtimer_cancel(). - Rewrite of eventfs. Linus wrote a patch series to remove the dentry references in the eventfs_inode and to use ref counting and more of proper VFS interfaces to make it work. - Add warning to put_ei() if ei is not set to free. That means something is about to free it when it shouldn't. - Restructure the eventfs_inode to make it more compact, and remove the unused llist field. - Remove the fsnotify*() functions for when the inodes were being created in the lookup code. It doesn't make sense to notify about creation just because something is being looked up. - The inode hard link count was not accurate. It was being updated when a file was looked up. The inodes of directories were updating their parent inode hard link count every time the inode was created. That means if memory reclaim cleaned a stale directory inode and the inode was looked up again, it would increment the parent inode again as well. Al Viro said to just have all eventfs directories have a hard link count of 1. That tells user space not to trust it. * tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: eventfs: Keep all directory links at 1 eventfs: Remove fsnotify*() functions from lookup() eventfs: Restructure eventfs_inode structure to be more condensed eventfs: Warn if an eventfs_inode is freed without is_freed being set tracing/timerlat: Move hrtimer_init to timerlat_fd open() eventfs: Get rid of dentry pointers without refcounts eventfs: Clean up dentry ops and add revalidate function eventfs: Remove unused d_parent pointer field tracefs: dentry lookup crapectomy tracefs: Avoid using the ei->dentry pointer unnecessarily eventfs: Initialize the tracefs inode properly tracefs: Zero out the tracefs_inode when allocating it ring-buffer: Clean ring_buffer_poll_wait() error return
2024-02-02  Merge tag 'gfs2-v6.8-rc2-revert' of ↵  (Linus Torvalds)
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 revert from Andreas Gruenbacher: "It turns out that the commit to use GL_NOBLOCK flag for non-blocking lookups has several issues, and not all of them have a simple fix" * tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"