summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-12-15rtc: isl12026: drop obsolete dependency on COMPILE_TESTJean Delvare
Since commit 0166dc11be91 ("of: make CONFIG_OF user selectable"), it is possible to test-build any driver which depends on OF on any architecture by explicitly selecting OF. Therefore depending on COMPILE_TEST as an alternative is no longer needed. It is actually better to always build such drivers with OF enabled, so that the test builds are closer to how each driver will actually be built on its intended target. Building them without OF may not test much as the compiler will optimize out potentially large parts of the code. In the worst case, this could even pop false positive warnings. Dropping COMPILE_TEST here improves the quality of our testing and avoids wasting time on non-existent issues. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20221124154359.039be06c@endymion.delvare Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2022-12-15Merge tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Replace deprecated git://github.com link in MAINTAINERS (Palmer Dabbelt) - Simplify vfio/mlx5 with module_pci_driver() helper (Shang XiaoJing) - Drop unnecessary buffer from ACPI call (Rafael Mendonca) - Correct latent missing include issue in iova-bitmap and fix support for unaligned bitmaps. Follow-up with better fix through refactor (Joao Martins) - Rework ccw mdev driver to split private data from parent structure, better aligning with the mdev lifecycle and allowing us to remove a temporary workaround (Eric Farman) - Add an interface to get an estimated migration data size for a device, allowing userspace to make informed decisions, ex. more accurately predicting VM downtime (Yishai Hadas) - Fix minor typo in vfio/mlx5 array declaration (Yishai Hadas) - Simplify module and Kconfig through consolidating SPAPR/EEH code and config options and folding virqfd module into main vfio module (Jason Gunthorpe) - Fix error path from device_register() across all vfio mdev and sample drivers (Alex Williamson) - Define migration pre-copy interface and implement for vfio/mlx5 devices, allowing portions of the device state to be saved while the device continues operation, towards reducing the stop-copy state size (Jason Gunthorpe, Yishai Hadas, Shay Drory) - Implement pre-copy for hisi_acc devices (Shameer Kolothum) - Fixes to mdpy mdev driver remove path and error path on probe (Shang XiaoJing) - vfio/mlx5 fixes for incorrect return after copy_to_user() fault and incorrect buffer freeing (Dan Carpenter) * tag 'vfio-v6.2-rc1' of https://github.com/awilliam/linux-vfio: (42 commits) vfio/mlx5: error pointer dereference in error handling vfio/mlx5: fix error code in mlx5vf_precopy_ioctl() samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe() hisi_acc_vfio_pci: Enable PRE_COPY flag hisi_acc_vfio_pci: Move the dev compatibility tests for early check hisi_acc_vfio_pci: Introduce support for PRE_COPY state transitions hisi_acc_vfio_pci: Add support for precopy IOCTL vfio/mlx5: Enable MIGRATION_PRE_COPY flag vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error vfio/mlx5: Introduce multiple loads vfio/mlx5: Consider temporary end of stream as part of PRE_COPY vfio/mlx5: Introduce vfio precopy ioctl implementation vfio/mlx5: Introduce SW headers for migration states vfio/mlx5: Introduce device transitions of PRE_COPY vfio/mlx5: Refactor to use queue based data chunks vfio/mlx5: Refactor migration file state vfio/mlx5: Refactor MKEY usage vfio/mlx5: Refactor PD usage vfio/mlx5: Enforce a single SAVE command at a time vfio: Extend the device migration protocol with PRE_COPY ...
2022-12-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM64: - Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. - Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. - Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d: "Fix a number of issues with MTE, such as races on the tags being initialised vs the PG_mte_tagged flag as well as the lack of support for VM_SHARED when KVM is involved. Patches from Catalin Marinas and Peter Collingbourne"). - Merge the pKVM shadow vcpu state tracking that allows the hypervisor to have its own view of a vcpu, keeping that state private. - Add support for the PMUv3p5 architecture revision, bringing support for 64bit counters on systems that support it, and fix the no-quite-compliant CHAIN-ed counter support for the machines that actually exist out there. - Fix a handful of minor issues around 52bit VA/PA support (64kB pages only) as a prefix of the oncoming support for 4kB and 16kB pages. - Pick a small set of documentation and spelling fixes, because no good merge window would be complete without those. s390: - Second batch of the lazy destroy patches - First batch of KVM changes for kernel virtual != physical address support - Removal of a unused function x86: - Allow compiling out SMM support - Cleanup and documentation of SMM state save area format - Preserve interrupt shadow in SMM state save area - Respond to generic signals during slow page faults - Fixes and optimizations for the non-executable huge page errata fix. - Reprogram all performance counters on PMU filter change - Cleanups to Hyper-V emulation and tests - Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest running on top of a L1 Hyper-V hypervisor) - Advertise several new Intel features - x86 Xen-for-KVM: - Allow the Xen runstate information to cross a page boundary - Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured - Add support for 32-bit guests in SCHEDOP_poll - Notable x86 fixes and cleanups: - One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0). - Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few years back when eliminating unnecessary barriers when switching between vmcs01 and vmcs02. - Clean up vmread_error_trampoline() to make it more obvious that params must be passed on the stack, even for x86-64. - Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective of the current guest CPUID. - Fudge around a race with TSC refinement that results in KVM incorrectly thinking a guest needs TSC scaling when running on a CPU with a constant TSC, but no hardware-enumerated TSC frequency. - Advertise (on AMD) that the SMM_CTL MSR is not supported - Remove unnecessary exports Generic: - Support for responding to signals during page faults; introduces new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks Selftests: - Fix an inverted check in the access tracking perf test, and restore support for asserting that there aren't too many idle pages when running on bare metal. - Fix build errors that occur in certain setups (unsure exactly what is unique about the problematic setup) due to glibc overriding static_assert() to a variant that requires a custom message. - Introduce actual atomics for clear/set_bit() in selftests - Add support for pinning vCPUs in dirty_log_perf_test. - Rename the so called "perf_util" framework to "memstress". - Add a lightweight psuedo RNG for guest use, and use it to randomize the access pattern and write vs. read percentage in the memstress tests. - Add a common ucall implementation; code dedup and pre-work for running SEV (and beyond) guests in selftests. - Provide a common constructor and arch hook, which will eventually be used by x86 to automatically select the right hypercall (AMD vs. Intel). - A bunch of added/enabled/fixed selftests for ARM64, covering memslots, breakpoints, stage-2 faults and access tracking. - x86-specific selftest changes: - Clean up x86's page table management. - Clean up and enhance the "smaller maxphyaddr" test, and add a related test to cover generic emulation failure. - Clean up the nEPT support checks. - Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values. - Fix an ordering issue in the AMX test introduced by recent conversions to use kvm_cpu_has(), and harden the code to guard against similar bugs in the future. Anything that tiggers caching of KVM's supported CPUID, kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if the caching occurs before the test opts in via prctl(). Documentation: - Remove deleted ioctls from documentation - Clean up the docs for the x86 MSR filter. - Various fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits) KVM: x86: Add proper ReST tables for userspace MSR exits/flags KVM: selftests: Allocate ucall pool from MEM_REGION_DATA KVM: arm64: selftests: Align VA space allocator with TTBR0 KVM: arm64: Fix benign bug with incorrect use of VA_BITS KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow KVM: x86: Advertise that the SMM_CTL MSR is not supported KVM: x86: remove unnecessary exports KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic" tools: KVM: selftests: Convert clear/set_bit() to actual atomics tools: Drop "atomic_" prefix from atomic test_and_set_bit() tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests perf tools: Use dedicated non-atomic clear/set bit helpers tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers KVM: arm64: selftests: Enable single-step without a "full" ucall() KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself KVM: Remove stale comment about KVM_REQ_UNHALT KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR KVM: Reference to kvm_userspace_memory_region in doc and comments KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl ...
2022-12-15x86/mm: Ensure forced page table splittingDave Hansen
There are a few kernel users like kfence that require 4k pages to work correctly and do not support large mappings. They use set_memory_4k() to break down those large mappings. That, in turn relies on cpa_data->force_split option to indicate to set_memory code that it should split page tables regardless of whether the need to be. But, a recent change added an optimization which would return early if a set_memory request came in that did not change permissions. It did not consult ->force_split and would mistakenly optimize away the splitting that set_memory_4k() needs. This broke kfence. Skip the same-permission optimization when ->force_split is set. Fixes: 127960a05548 ("x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Marco Elver <elver@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/CA+G9fYuFxZTxkeS35VTZMXwQvohu73W3xbZ5NtjebsVvH6hCuA@mail.gmail.com/
2022-12-15x86/kasan: Populate shadow for shared chunk of the CPU entry areaSean Christopherson
Popuplate the shadow for the shared portion of the CPU entry area, i.e. the read-only IDT mapping, during KASAN initialization. A recent change modified KASAN to map the per-CPU areas on-demand, but forgot to keep a shadow for the common area that is shared amongst all CPUs. Map the common area in KASAN init instead of letting idt_map_in_cea() do the dirty work so that it Just Works in the unlikely event more shared data is shoved into the CPU entry area. The bug manifests as a not-present #PF when software attempts to lookup an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs direct CALL to the IRQ handler to avoid the overhead of INTn): BUG: unable to handle page fault for address: fffffbc0000001d8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 16c03a067 P4D 16c03a067 PUD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 5 PID: 901 Comm: repro Tainted: G W 6.1.0-rc3+ #410 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kasan_check_range+0xdf/0x190 vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel] vcpu_run+0x1d89/0x2bd0 [kvm] kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm] kvm_vcpu_ioctl+0x349/0x900 [kvm] __x64_sys_ioctl+0xb8/0xf0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand") Reported-by: syzbot+8cdd16fd5a6c0565e227@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221110203504.1985010-6-seanjc@google.com
2022-12-15x86/kasan: Add helpers to align shadow addresses up and downSean Christopherson
Add helpers to dedup code for aligning shadow address up/down to page boundaries when translating an address to its shadow. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Link: https://lkml.kernel.org/r/20221110203504.1985010-5-seanjc@google.com
2022-12-15x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten namesSean Christopherson
Rename the CPU entry area variables in kasan_init() to shorten their names, a future fix will reference the beginning of the per-CPU portion of the CPU entry area, and shadow_cpu_entry_per_cpu_begin is a bit much. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Link: https://lkml.kernel.org/r/20221110203504.1985010-4-seanjc@google.com
2022-12-15x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry areaSean Christopherson
Populate a KASAN shadow for the entire possible per-CPU range of the CPU entry area instead of requiring that each individual chunk map a shadow. Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping was left behind, which can lead to not-present page faults during KASAN validation if the kernel performs a software lookup into the GDT. The DS buffer is also likely affected. The motivation for mapping the per-CPU areas on-demand was to avoid mapping the entire 512GiB range that's reserved for the CPU entry area, shaving a few bytes by not creating shadows for potentially unused memory was not a goal. The bug is most easily reproduced by doing a sigreturn with a garbage CS in the sigcontext, e.g. int main(void) { struct sigcontext regs; syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul); syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul); syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul); memset(&regs, 0, sizeof(regs)); regs.cs = 0x1d0; syscall(__NR_rt_sigreturn); return 0; } to coerce the kernel into doing a GDT lookup to compute CS.base when reading the instruction bytes on the subsequent #GP to determine whether or not the #GP is something the kernel should handle, e.g. to fixup UMIP violations or to emulate CLI/STI for IOPL=3 applications. BUG: unable to handle page fault for address: fffffbc8379ace00 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kasan_check_range+0xdf/0x190 Call Trace: <TASK> get_desc+0xb0/0x1d0 insn_get_seg_base+0x104/0x270 insn_fetch_from_user+0x66/0x80 fixup_umip_exception+0xb1/0x530 exc_general_protection+0x181/0x210 asm_exc_general_protection+0x22/0x30 RIP: 0003:0x0 Code: Unable to access opcode bytes at 0xffffffffffffffd6. RSP: 0003:0000000000000000 EFLAGS: 00000202 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand") Reported-by: syzbot+ffb4f000dc2872c93f62@syzkaller.appspotmail.com Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Link: https://lkml.kernel.org/r/20221110203504.1985010-3-seanjc@google.com
2022-12-15x86/mm: Recompute physical address for every page of per-CPU CEA mappingSean Christopherson
Recompute the physical address for each per-CPU page in the CPU entry area, a recent commit inadvertantly modified cea_map_percpu_pages() such that every PTE is mapped to the physical address of the first page. Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand") Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Link: https://lkml.kernel.org/r/20221110203504.1985010-2-seanjc@google.com
2022-12-15x86/mm: Rename __change_page_attr_set_clr(.checkalias)Peter Zijlstra
Now that the checkalias functionality is taken by CPA_NO_CHECK_ALIAS rename the argument to better match is remaining purpose: primary, matching __change_page_attr(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221110125544.661001508%40infradead.org
2022-12-15x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()Peter Zijlstra
There is a cludge in change_page_attr_set_clr() that inhibits propagating NX changes to the aliases (directmap and highmap) -- this is a cludge twofold: - it also inhibits the primary checks in __change_page_attr(); - it hard depends on single bit changes. The introduction of set_memory_rox() triggered this last issue for clearing both _PAGE_RW and _PAGE_NX. Explicitly ignore _PAGE_NX in cpa_process_alias() instead. Fixes: b38994948567 ("x86/mm: Implement native set_memory_rox()") Reported-by: kernel test robot <oliver.sang@intel.com> Debugged-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221110125544.594991716%40infradead.org
2022-12-15x86/mm: Untangle __change_page_attr_set_clr(.checkalias)Peter Zijlstra
The .checkalias argument to __change_page_attr_set_clr() is overloaded and serves two different purposes: - it inhibits the call to cpa_process_alias() -- as suggested by the name; however, - it also serves as 'primary' indicator for __change_page_attr() ( which in turn also serves as a recursion terminator for cpa_process_alias() ). Untangle these by extending the use of CPA_NO_CHECK_ALIAS to all callsites that currently use .checkalias=0 for this purpose. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221110125544.527267183%40infradead.org
2022-12-15x86/mm: Add a few commentsPeter Zijlstra
It's a shame to hide useful comments in Changelogs, add some to the code. Shamelessly stolen from commit: c40a56a7818c ("x86/mm/init: Remove freed kernel image areas from alias mapping") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221110125544.460677011%40infradead.org
2022-12-15x86/mm: Fix CR3_ADDR_MASKKirill A. Shutemov
The mask must not include bits above physical address mask. These bits are reserved and can be used for other things. Bits 61 and 62 are used for Linear Address Masking. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20221109165140.9137-2-kirill.shutemov%40linux.intel.com
2022-12-15x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macrosPasha Tatashin
Other architectures and the common mm/ use P*D_MASK, and P*D_SIZE. Remove the duplicated P*D_PAGE_MASK and P*D_PAGE_SIZE which are only used in x86/*. Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/r/20220516185202.604654-1-tatashin@google.com
2022-12-15mm: Convert __HAVE_ARCH_P..P_GET to the new stylePeter Zijlstra
Since __HAVE_ARCH_* style guards have been depricated in favour of defining the function name onto itself, convert pxxp_get(). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Y2EUEBlQXNgaJgoI@hirez.programming.kicks-ass.net
2022-12-15mm: Remove pointless barrier() after pmdp_get_lockless()Peter Zijlstra
pmdp_get_lockless() should itself imply any ordering required. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114425.298833095%40infradead.org
2022-12-15x86/mm/pae: Get rid of set_64bit()Peter Zijlstra
Recognise that set_64bit() is a special case of our previously introduced pxx_xchg64(), so use that and get rid of set_64bit(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114425.233481884%40infradead.org
2022-12-15x86_64: Remove pointless set_64bit() usagePeter Zijlstra
The use of set_64bit() in X86_64 only code is pretty pointless, seeing how it's a direct assignment. Remove all this nonsense. [nathanchance: unbreak irte] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114425.168036718%40infradead.org
2022-12-15x86/mm/pae: Be consistent with pXXp_get_and_clear()Peter Zijlstra
Given that ptep_get_and_clear() uses cmpxchg8b, and that should be by far the most common case, there's no point in having an optimized variant for pmd/pud. Introduce the pxx_xchg64() helper to implement the common logic once. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114425.103392961%40infradead.org
2022-12-15x86/mm/pae: Use WRITE_ONCE()Peter Zijlstra
Disallow write-tearing, that would be really unfortunate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114425.038102604%40infradead.org
2022-12-15x86/mm/pae: Don't (ab)use atomic64Peter Zijlstra
PAE implies CX8, write readable code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.971450128%40infradead.org
2022-12-15mm/gup: Fix the lockless PMD accessPeter Zijlstra
On architectures where the PTE/PMD is larger than the native word size (i386-PAE for example), READ_ONCE() can do the wrong thing. Use pmdp_get_lockless() just like we use ptep_get_lockless(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.906110403%40infradead.org
2022-12-15mm: Rename pmd_read_atomic()Peter Zijlstra
There's no point in having the identical routines for PTE/PMD have different names. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.841277397%40infradead.org
2022-12-15mm: Rename GUP_GET_PTE_LOW_HIGHPeter Zijlstra
Since it no longer applies to only PTEs, rename it to PXX. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.776404066%40infradead.org
2022-12-15mm: Fix pmd_read_atomic()Peter Zijlstra
AFAICT there's no reason to do anything different than what we do for PTEs. Make it so (also affects SH). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.711181252%40infradead.org
2022-12-15sh/mm: Make pmd_t similar to pte_tPeter Zijlstra
Just like 64bit pte_t, have a low/high split in pmd_t. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.645657294%40infradead.org
2022-12-15x86/mm/pae: Make pmd_t similar to pte_tPeter Zijlstra
Instead of mucking about with at least 2 different ways of fudging it, do the same thing we do for pte_t. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.580310787%40infradead.org
2022-12-15mm: Update ptep_get_lockless()'s commentPeter Zijlstra
Improve the comment. Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221022114424.515572025%40infradead.org
2022-12-15x86/mm: Implement native set_memory_rox()Peter Zijlstra
Provide a native implementation of set_memory_rox(), avoiding the double set_memory_ro();set_memory_x(); calls. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-12-15mm: Introduce set_memory_rox()Peter Zijlstra
Because endlessly repeating: set_memory_ro() set_memory_x() is getting tedious. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
2022-12-15x86/mm: Do verify W^X at boot upPeter Zijlstra
Straight up revert of commit: a970174d7a10 ("x86/mm: Do not verify W^X at boot up") now that the root cause has been fixed. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201058.011279208@infradead.org
2022-12-15x86/ftrace: Remove SYSTEM_BOOTING exceptionsPeter Zijlstra
Now that text_poke is available before ftrace, remove the SYSTEM_BOOTING exceptions. Specifically, this cures a W+X case during boot. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
2022-12-15x86/mm: Initialize text poking earlierPeter Zijlstra
Move poking_init() up a bunch; specifically move it right after mm_init() which is right before ftrace_init(). This will allow simplifying ftrace text poking which currently has a bunch of exceptions for early boot. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.881703081@infradead.org
2022-12-15x86/mm: Use mm_alloc() in poking_init()Peter Zijlstra
Instead of duplicating init_mm, allocate a fresh mm. The advantage is that mm_alloc() has much simpler dependencies. Additionally it makes more conceptual sense, init_mm has no (and must not have) user state to duplicate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
2022-12-15mm: Move mm_cachep initialization to mm_init()Peter Zijlstra
In order to allow using mm_alloc() much earlier, move initializing mm_cachep into mm_init(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
2022-12-15x86/mm: Randomize per-cpu entry areaPeter Zijlstra
Seth found that the CPU-entry-area; the piece of per-cpu data that is mapped into the userspace page-tables for kPTI is not subject to any randomization -- irrespective of kASLR settings. On x86_64 a whole P4D (512 GB) of virtual address space is reserved for this structure, which is plenty large enough to randomize things a little. As such, use a straight forward randomization scheme that avoids duplicates to spread the existing CPUs over the available space. [ bp: Fix le build. ] Reported-by: Seth Jenkins <sethjenkins@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-12-15x86/kasan: Map shadow for percpu pages on demandAndrey Ryabinin
KASAN maps shadow for the entire CPU-entry-area: [CPU_ENTRY_AREA_BASE, CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE] This will explode once the per-cpu entry areas are randomized since it will increase CPU_ENTRY_AREA_MAP_SIZE to 512 GB and KASAN fails to allocate shadow for such big area. Fix this by allocating KASAN shadow only for really used cpu entry area addresses mapped by cea_map_percpu_pages() Thanks to the 0day folks for finding and reporting this to be an issue. [ dhansen: tweak changelog since this will get committed before peterz's actual cpu-entry-area randomization ] Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Yujie Liu <yujie.liu@intel.com> Cc: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
2022-12-15Merge tag 'acpi-6.2-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These fix an AML byte code execution issue in ACPICA and two issues in the ACPI EC driver which requires rearranging ACPICA code. Specifics: - Avoid trying to resolve operands in AML when there are none (Amadeusz Sławiński) - Fix indentation in include/acpi/acpixf.h to help applying patches from the upstream ACPICA git (Hans de Goede) - Make it possible to install an address space handler without evaluating _REG for Operation Regions in the given address space (Hans de Goede) - Defer the evaluation of _REG for ECDT described ECs till the matching EC device in the DSDT gets parsed and acpi_ec_add() gets called for it (Hans de Goede) - Fix EC address space handler unregistration (Hans de Goede)" * tag 'acpi-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: EC: Fix ECDT probe ordering issues ACPI: EC: Fix EC address space handler unregistration ACPICA: Allow address_space_handler Install and _REG execution as 2 separate steps ACPICA: include/acpi/acpixf.h: Fix indentation ACPICA: Fix operand resolution
2022-12-15Merge tag 'thermal-6.2-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more thermal control updates from Rafael Wysocki: "These are updates of assorted thermal drivers, mostly for ARM platforms, generally isolated and fairly straightforward, and the recent Intel HFI driver fix for systems without HFI support. Specifics: - Avoid clearing the HFI status bit on systems without HFI support which triggers unchecked MSR access errors (Srinivas Pandruvada) - Add sm8450 and sm8550 QCom compatible string to DT bindings (Luca Weiss, Neil Armstrong) - Use devm_platform_get_and_ioremap_resource on the ST platform to group two calls into a single one (Minghao Chi) - Use GENMASK instead of bitmaps and validate the temperature after reading it in the imx8mm_thermal driver (Marcus Folkesson) - Convert generic-adc-thermal to DT schema (Rob Herring) - Fix debug print message with inverted logic in the k3_j72xx_bandgap driver (Keerthy) - Fix memory leak on thermal_of_zone_register() failure (Ido Schimmel) - Add support for IPQ8074 in the tsens thermal driver along with the DT bindings (Robert Marko) - Fix and rework the debugfs code in the tsens driver (Christian Marangi) - Add calibration and DT documentation for the imx8mm driver (Marek Vasut) - Add DT bindings and compatible for the Mediatek SoCs mt7981 and mt7983 (Daniel Golle) - Don't show an error message if it happens at probe time while it will be deferred on the QCom SPMI ADC driver (Johan Hovold) - Add HWMon support for the imx8mm board (Alexander Stein) - Remove pointless include from the power allocator governor (Christophe JAILLET) - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450 (Krzysztof Kozlowski) - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss) - Demote error log of thermal zone register to debug in the tsens QCom driver (Manivannan Sadhasivam) - Consolidate the the efuse values and the errata handling in the TI Bandgap driver (Bryan Brattlof) - Document Renesas RZ/Five as compatible with RZ/G2UL in the DT bindings (Lad Prabhakar) - Fix the irq handler return value in the LMh driver (Bjorn Andersson) - Delete empty platform remove callback from imx_sc_thermal (Uwe Kleine-König)" * tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits) thermal/drivers/imx_sc_thermal: Drop empty platform remove function thermal/drivers/qcom/lmh: Fix irq handler return value dt-bindings: thermal: qcom-tsens: Add compatible for sm8550 thermal/drivers/st: Use devm_platform_get_and_ioremap_resource() dt-bindings: thermal: rzg2l-thermal: Document RZ/Five SoC dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range dt-bindings: thermal: k3-j72xx: elaborate on binding description thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function thermal/drivers/qcom: Demote error log of thermal zone register to debug thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2 dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450 thermal/core/power allocator: Remove a useless include thermal/drivers/imx8mm: Add hwmon support thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC thermal: ti-soc-thermal: Drop comma after SoC match table sentinel thermal/drivers/imx: Add support for loading calibration data from OCOTP ...
2022-12-15Revert "arm64: errata: Workaround possible Cortex-A715 [ESR|FAR]_ELx corruption"Will Deacon
This reverts commit 44ecda71fd8a70185c270f5914ac563827fe1d4c. All versions of this patch on the mailing list, including the version that ended up getting merged, have portions of code guarded by the non-existent CONFIG_ARM64_WORKAROUND_2645198 option. Although Anshuman says he tested the code with some additional debug changes [1], I'm hesitant to fix the CONFIG option and light up a bunch of code right before I (and others) disappear for the end of year holidays, during which time we won't be around to deal with any fallout. So revert the change for now. We can bring back a fixed, tested version for a later -rc when folks are thinking about things other than trees and turkeys. [1] https://lore.kernel.org/r/b6f61241-e436-5db1-1053-3b441080b8d6@arm.com Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20221215094811.23188-1-lukas.bulwahn@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2022-12-15Merge tag 'gpio-updates-for-v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "We have a new GPIO multiplexer driver, bunch of driver updates and refactoring in the core GPIO library. GPIO core: - teach gpiolib to work with software nodes for HW description - remove ARCH_NR_GPIOS treewide as we no longer impose any limit on the number of GPIOS since the allocation became entirely dynamic - add support for HW quirks for Cirrus CS42L56 codec, Marvell NFC controller, Freescale PCIe and Ethernet controller, Himax LCDs and Mediatek mt2701 - refactor OF quirk code - some general refactoring of the OF and ACPI code, adding new helpers, minor tweaks and fixes, making fwnode usage consistent etc. GPIO uAPI: - fix an issue where the user-space can trigger a NULL-pointer dereference in the kernel by opening a device file, forcing a driver unbind and then calling one of the syscalls on the associated file descriptor New drivers: - add gpio-latch: a new GPIO multiplexer based on latches connected to other GPIOs Driver updates: - convert i2c GPIO expanders to using .probe_new() - drop the gpio-sta2x11 driver - factor out common code for the ACCES IDIO-16 family of controllers and use this new library wherever applicable in drivers - add DT support to gpio-hisi - allow building gpio-davinci as a module and increase its maxItems property - add support for a new model to gpio-pca9570 - other minor changes to various drivers" * tag 'gpio-updates-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (66 commits) gpio: sim: set a limit on the number of GPIOs gpiolib: protect the GPIO device against being dropped while in use by user-space gpiolib: cdev: fix NULL-pointer dereferences gpiolib: Provide to_gpio_device() helper gpiolib: Unify access to the device properties gpio: Do not include <linux/kernel.h> when not really needed. gpio: pcf857x: Convert to i2c's .probe_new() gpio: pca953x: Convert to i2c's .probe_new() gpio: max732x: Convert to i2c's .probe_new() dt-bindings: gpio: gpio-davinci: Increase maxItems in gpio-line-names gpiolib: ensure that fwnode is properly set gpio: sl28cpld: Replace irqchip mask_invert with unmask_base gpiolib: of: Use correct fwnode for DT-probed chips gpiolib: of: Drop redundant check in of_mm_gpiochip_remove() gpiolib: of: Prepare of_mm_gpiochip_add_data() for fwnode gpiolib: add support for software nodes gpiolib: consolidate GPIO lookups gpiolib: acpi: avoid leaking ACPI details into upper gpiolib layers gpiolib: acpi: teach acpi_find_gpio() to handle data-only nodes gpiolib: acpi: change acpi_find_gpio() to accept firmware node ...
2022-12-15Merge branch 'acpi-ec'Rafael J. Wysocki
Merge additional ACPI EC driver fixes for 6.2-rc1: - Fix EC address space handler unregistration (Hans de Goede). - Defer the evaluation of _REG for ECDT described ECs till the matching EC device in the DSDT gets parsed and acpi_ec_add() gets called for it (Hans de Goede). * acpi-ec: ACPI: EC: Fix ECDT probe ordering issues ACPI: EC: Fix EC address space handler unregistration
2022-12-15Merge tag 'fbdev-for-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev Pull fbdev updates from Helge Deller: "The most relevant change are the patches from Dmitry Torokhov to switch omapfb to the gpiod API. The other patches are small and fix e.g. UML build issues, improve the error paths and cleanup code" * tag 'fbdev-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev: (32 commits) fbdev: fbcon: release buffer when fbcon_do_set_font() failed fbdev: sh_mobile_lcdcfb: use sysfs_emit() to instead of scnprintf() fbdev: uvesafb: use sysfs_emit() to instead of scnprintf() fbdev: uvesafb: Simplify uvesafb_remove() fbdev: uvesafb: Fixes an error handling path in uvesafb_probe() fbdev: uvesafb: don't build on UML fbdev: geode: don't build on UML fbdev: ep93xx-fb: Add missing clk_disable_unprepare in ep93xxfb_probe() fbdev: matroxfb: Convert to i2c's .probe_new() fbdev: da8xx-fb: add missing regulator_disable() in fb_probe fbdev: controlfb: fix spelling mistake "paramaters"->"parameters" fbdev: vermilion: decrease reference count in error path fbdev: smscufx: fix error handling code in ufx_usb_probe fbdev: via: Fix error in via_core_init() fbdev: pm2fb: fix missing pci_disable_device() fbdev: pxafb: Remove unnecessary print function dev_err() fbdev: omapfb: panel-sharp-ls037v7dw01: fix included headers fbdev: omapfb: panel-tpo-td028ttec1: stop including gpio.h fbdev: omapfb: panel-lgphilips-lb035q02: remove backlight GPIO handling fbdev: omapfb: encoder-opa362: fix included headers ...
2022-12-15Merge branch 'acpica'Rafael J. Wysocki
Merge additional ACPICA changes for 6.2-rc1: - Avoid trying to resolve operands in AML when there are none (Amadeusz Sławiński). - Fix indentation in include/acpi/acpixf.h to help applying patches from the upstream ACPICA git (Hans de Goede). - Make it possible to install an address space handler without evaluating _REG for Operation Regions in the given address space (Hans de Goede). * acpica: ACPICA: Allow address_space_handler Install and _REG execution as 2 separate steps ACPICA: include/acpi/acpixf.h: Fix indentation ACPICA: Fix operand resolution
2022-12-15Merge tag '6.2-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull ksmbd updates from Steve French: "Six ksmbd server fixes" * tag '6.2-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Convert to use sysfs_emit()/sysfs_emit_at() APIs ksmbd: Fix resource leak in smb2_lock() ksmbd: Fix resource leak in ksmbd_session_rpc_open() ksmbd: replace one-element arrays with flexible-array members ksmbd: use F_SETLK when unlocking a file ksmbd: set SMB2_SESSION_FLAG_ENCRYPT_DATA when enforcing data encryption for this share
2022-12-15RDMA/rxe: Fix compile warnings on 32-bitJason Gunthorpe
Move the conditional code into a function, with two varients so it is harder to make these kinds of mistakes. drivers/infiniband/sw/rxe/rxe_resp.c: In function 'atomic_write_reply': drivers/infiniband/sw/rxe/rxe_resp.c:794:13: error: unused variable 'payload' [-Werror=unused-variable] 794 | int payload = payload_size(pkt); | ^~~~~~~ drivers/infiniband/sw/rxe/rxe_resp.c:793:24: error: unused variable 'mr' [-Werror=unused-variable] 793 | struct rxe_mr *mr = qp->resp.mr; | ^~ drivers/infiniband/sw/rxe/rxe_resp.c:791:19: error: unused variable 'dst' [-Werror=unused-variable] 791 | u64 src, *dst; | ^~~ drivers/infiniband/sw/rxe/rxe_resp.c:791:13: error: unused variable 'src' [-Werror=unused-variable] 791 | u64 src, *dst; Fixes: 034e285f8b99 ("RDMA/rxe: Make responder support atomic write on RC service") Link: https://lore.kernel.org/linux-rdma/Y5s+EVE7eLWQqOwv@nvidia.com/ Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-15gfs2: Remove support for glock holder auto-demotion (2)Andreas Gruenbacher
As a follow-up to the previous commit, move the recovery related code in __gfs2_glock_dq() to gfs2_glock_dq() where it better fits. No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-12-15gfs2: Remove support for glock holder auto-demotionAndreas Gruenbacher
Remove the support for glock holder auto-demotion (commit dc732906c245 and folow-ups) as we are not planning to use this feature, and the additional code therefore only adds unnecessary complexity. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-12-14Merge tag 'f2fs-for-6.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added two features: F2FS_IOC_START_ATOMIC_REPLACE and a per-block age-based extent cache. F2FS_IOC_START_ATOMIC_REPLACE is a variant of the previous atomic write feature which guarantees a per-file atomicity. It would be more efficient than AtomicFile implementation in Android framework. The per-block age-based extent cache implements another type of extent cache in memory which keeps the per-block age in a file, so that block allocator could split the hot and cold data blocks more accurately. Enhancements: - introduce F2FS_IOC_START_ATOMIC_REPLACE - refactor extent_cache to add a new per-block-age-based extent cache support - introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs - add proc entry to show discard_plist info - optimize iteration over sparse directories - add barrier mount option Bug fixes: - avoid victim selection from previous victim section - fix to enable compress for newly created file if extension matches - set zstd compress level correctly - initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot - correct i_size change for atomic writes - allow to read node block after shutdown - allow to set compression for inlined file - fix gc mode when gc_urgent_high_remaining is 1 - should put a page when checking the summary info Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and doc" * tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits) f2fs: reset wait_ms to default if any of the victims have been selected f2fs: fix some format WARNING in debug.c and sysfs.c f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super() f2fs: fix iostat parameter for discard f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache f2fs: add block_age-based extent cache f2fs: allocate the extent_cache by default f2fs: refactor extent_cache to support for read and more f2fs: remove unnecessary __init_extent_tree f2fs: move internal functions into extent_cache.c f2fs: specify extent cache for read explicitly f2fs: introduce f2fs_is_readonly() for readability f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro f2fs: do some cleanup for f2fs module init MAINTAINERS: Add f2fs bug tracker link f2fs: remove the unused flush argument to change_curseg f2fs: open code allocate_segment_by_default f2fs: remove struct segment_allocation default_salloc_ops f2fs: introduce discard_urgent_util sysfs node f2fs: define MIN_DISCARD_GRANULARITY macro ...