summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2022-12-14Merge tag 'x86_core_for_v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...
2022-12-14Merge tag 'pci-v6.2-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Squash portdrv_{core,pci}.c into portdrv.c to ease maintenance and make more things static. - Make portdrv bind to Switch Ports that have AER. Previously, if these Ports lacked MSI/MSI-X, portdrv failed to bind, which meant the Ports couldn't be suspended to low-power states. AER on these Ports doesn't use interrupts, and the AER driver doesn't need to claim them. - Assign PCI domain IDs using ida_alloc(), which makes host bridge add/remove work better. Resource management: - To work better with recent BIOSes that use EfiMemoryMappedIO for PCI host bridge apertures, remove those regions from the E820 map (E820 entries normally prevent us from allocating BARs). In v5.19, we added some quirks to disable E820 checking, but that's not very maintainable. EfiMemoryMappedIO means the OS needs to map the region for use by EFI runtime services; it shouldn't prevent OS from using it. PCIe native device hotplug: - Build pciehp by default if USB4 is enabled, since Thunderbolt/USB4 PCIe tunneling depends on native PCIe hotplug. - Enable Command Completed Interrupt only if supported to avoid user confusion from lspci output that says this is enabled but not supported. - Prevent pciehp from binding to Switch Upstream Ports; this happened because of interaction with acpiphp and caused devices below the Upstream Port to disappear. Power management: - Convert AGP drivers to generic power management. We hope to remove legacy power management from the PCI core eventually. Virtualization: - Fix pci_device_is_present(), which previously always returned "false" for VFs, causing virtio hangs when unbinding the driver. Miscellaneous: - Convert drivers to gpiod API to prepare for dropping some legacy code. - Fix DOE fencepost error for the maximum data object length. Baikal-T1 PCIe controller driver: - Add driver and DT bindings. Broadcom STB PCIe controller driver: - Enable Multi-MSI. - Delay 100ms after PERST# deassert to allow power and clocks to stabilize. - Configure Read Completion Boundary to 64 bytes. Freescale i.MX6 PCIe controller driver: - Initialize PHY before deasserting core reset to fix a regression in v6.0 on boards where the PHY provides the reference. - Fix imx6sx and imx8mq clock names in DT schema. Intel VMD host bridge driver: - Fix Secondary Bus Reset on VMD bridges, which allows reset of NVMe SSDs in VT-d pass-through scenarios. - Disable MSI remapping, which gets re-enabled by firmware during suspend/resume. MediaTek PCIe Gen3 controller driver: - Add MT7986 and MT8195 support. Qualcomm PCIe controller driver: - Add SC8280XP/SA8540P basic interconnect support. Rockchip DesignWare PCIe controller driver: - Base DT schema on common Synopsys schema. Synopsys DesignWare PCIe core: - Collect DT items shared between Root Port and Endpoint (PERST GPIO, PHY info, clocks, resets, link speed, number of lanes, number of iATU windows, interrupt info, etc) to snps,dw-pcie-common.yaml. - Add dma-ranges support for Root Ports and Endpoints. - Consolidate DT resource retrieval for "dbi", "dbi2", "atu", etc. to reduce code duplication. - Add generic names for clocks and resets to encourage more consistent naming across drivers using DesignWare IP. - Stop advertising PTM Responder role for Endpoints, which aren't allowed to be responders. TI J721E PCIe driver: - Add j721s2 host mode ID to DT schema. - Add interrupt properties to DT schema. Toshiba Visconti PCIe controller driver: - Fix interrupts array max constraints in DT schema" * tag 'pci-v6.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (95 commits) x86/PCI: Use pr_info() when possible x86/PCI: Fix log message typo x86/PCI: Tidy E820 removal messages PCI: Skip allocate_resource() if too little space available efi/x86: Remove EfiMemoryMappedIO from E820 map PCI/portdrv: Allow AER service only for Root Ports & RCECs PCI: xilinx-nwl: Fix coding style violations PCI: mvebu: Switch to using gpiod API PCI: pciehp: Enable Command Completed Interrupt only if supported PCI: aardvark: Switch to using devm_gpiod_get_optional() dt-bindings: PCI: mediatek-gen3: add support for mt7986 dt-bindings: PCI: mediatek-gen3: add SoC based clock config dt-bindings: PCI: qcom: Allow 'dma-coherent' property PCI: mt7621: Add sentinel to quirks table PCI: vmd: Fix secondary bus reset for Intel bridges PCI: endpoint: pci-epf-vntb: Fix sparse ntb->reg build warning PCI: endpoint: pci-epf-vntb: Fix sparse build warning for epf_db PCI: endpoint: pci-epf-vntb: Replace hardcoded 4 with sizeof(u32) PCI: endpoint: pci-epf-vntb: Remove unused epf_db_phy struct member PCI: endpoint: pci-epf-vntb: Fix call pci_epc_mem_free_addr() in error path ...
2022-12-13Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-13Merge tag 'x86_microcode_for_v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode and IFS updates from Borislav Petkov: "The IFS (In-Field Scan) stuff goes through tip because the IFS driver uses the same structures and similar functionality as the microcode loader and it made sense to route it all through this branch so that there are no conflicts. - Add support for multiple testing sequences to the Intel In-Field Scan driver in order to be able to run multiple different test patterns. Rework things and remove the BROKEN dependency so that the driver can be enabled (Jithu Joseph) - Remove the subsys interface usage in the microcode loader because it is not really needed - A couple of smaller fixes and cleanups" * tag 'x86_microcode_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/microcode/intel: Do not retry microcode reloading on the APs x86/microcode/intel: Do not print microcode revision and processor flags platform/x86/intel/ifs: Add missing kernel-doc entry Revert "platform/x86/intel/ifs: Mark as BROKEN" Documentation/ABI: Update IFS ABI doc platform/x86/intel/ifs: Add current_batch sysfs entry platform/x86/intel/ifs: Remove reload sysfs entry platform/x86/intel/ifs: Add metadata validation platform/x86/intel/ifs: Use generic microcode headers and functions platform/x86/intel/ifs: Add metadata support x86/microcode/intel: Use a reserved field for metasize x86/microcode/intel: Add hdr_type to intel_microcode_sanity_check() x86/microcode/intel: Reuse microcode_sanity_check() x86/microcode/intel: Use appropriate type in microcode_sanity_check() x86/microcode/intel: Reuse find_matching_signature() platform/x86/intel/ifs: Remove memory allocation from load path platform/x86/intel/ifs: Remove image loading during init platform/x86/intel/ifs: Return a more appropriate error code platform/x86/intel/ifs: Remove unused selection x86/microcode: Drop struct ucode_cpu_info.valid ...
2022-12-13Merge tag 'x86_cpu_for_v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Borislav Petkov: - Split MTRR and PAT init code to accomodate at least Xen PV and TDX guests which do not get MTRRs exposed but only PAT. (TDX guests do not support the cache disabling dance when setting up MTRRs so they fall under the same category) This is a cleanup work to remove all the ugly workarounds for such guests and init things separately (Juergen Gross) - Add two new Intel CPUs to the list of CPUs with "normal" Energy Performance Bias, leading to power savings - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern Centaur CPUs * tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) x86/mtrr: Make message for disabled MTRRs more descriptive x86/pat: Handle TDX guest PAT initialization x86/cpuid: Carve out all CPUID functionality x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area() x86/cpu: Drop 32-bit Xen PV guest code in update_task_stack() x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode() x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h x86/acpi/cstate: Optimize ARB_DISABLE on Centaur CPUs x86/mtrr: Simplify mtrr_ops initialization x86/cacheinfo: Switch cache_ap_init() to hotplug callback x86: Decouple PAT and MTRR handling x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init() x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init x86/mtrr: Get rid of __mtrr_enabled bool x86/mtrr: Simplify mtrr_bp_init() x86/mtrr: Remove set_all callback from struct mtrr_ops x86/mtrr: Disentangle MTRR init from PAT init x86/mtrr: Move cache control code to cacheinfo.c x86/mtrr: Split MTRR-specific handling from cache dis/enabling ...
2022-12-13Merge tag 'x86_boot_for_v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Borislav Petkov: "A of early boot cleanups and fixes. - Do some spring cleaning to the compressed boot code by moving the EFI mixed-mode code to a separate compilation unit, the AMD memory encryption early code where it belongs and fixing up build dependencies. Make the deprecated EFI handover protocol optional with the goal of removing it at some point (Ard Biesheuvel) - Skip realmode init code on Xen PV guests as it is not needed there - Remove an old 32-bit PIC code compiler workaround" * tag 'x86_boot_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Remove x86_32 PIC using %ebx workaround x86/boot: Skip realmode init code when running as Xen PV guest x86/efi: Make the deprecated EFI handover protocol optional x86/boot/compressed: Only build mem_encrypt.S if AMD_MEM_ENCRYPT=y x86/boot/compressed: Adhere to calling convention in get_sev_encryption_bit() x86/boot/compressed: Move startup32_check_sev_cbit() out of head_64.S x86/boot/compressed: Move startup32_check_sev_cbit() into .text x86/boot/compressed: Move startup32_load_idt() out of head_64.S x86/boot/compressed: Move startup32_load_idt() into .text section x86/boot/compressed: Pull global variable reference into startup32_load_idt() x86/boot/compressed: Avoid touching ECX in startup32_set_idt_entry() x86/boot/compressed: Simplify IDT/GDT preserve/restore in the EFI thunk x86/boot/compressed, efi: Merge multiple definitions of image_offset into one x86/boot/compressed: Move efi32_pe_entry() out of head_64.S x86/boot/compressed: Move efi32_entry out of head_64.S x86/boot/compressed: Move efi32_pe_entry into .text section x86/boot/compressed: Move bootargs parsing out of 32-bit startup code x86/boot/compressed: Move 32-bit entrypoint code into .text section x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S
2022-12-13Merge tag 'efi-next-for-v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: "Another fairly sizable pull request, by EFI subsystem standards. Most of the work was done by me, some of it in collaboration with the distro and bootloader folks (GRUB, systemd-boot), where the main focus has been on removing pointless per-arch differences in the way EFI boots a Linux kernel. - Refactor the zboot code so that it incorporates all the EFI stub logic, rather than calling the decompressed kernel as a EFI app. - Add support for initrd= command line option to x86 mixed mode. - Allow initrd= to be used with arbitrary EFI accessible file systems instead of just the one the kernel itself was loaded from. - Move some x86-only handling and manipulation of the EFI memory map into arch/x86, as it is not used anywhere else. - More flexible handling of any random seeds provided by the boot environment (i.e., systemd-boot) so that it becomes available much earlier during the boot. - Allow improved arch-agnostic EFI support in loaders, by setting a uniform baseline of supported features, and adding a generic magic number to the DOS/PE header. This should allow loaders such as GRUB or systemd-boot to reduce the amount of arch-specific handling substantially. - (arm64) Run EFI runtime services from a dedicated stack, and use it to recover from synchronous exceptions that might occur in the firmware code. - (arm64) Ensure that we don't allocate memory outside of the 48-bit addressable physical range. - Make EFI pstore record size configurable - Add support for decoding CXL specific CPER records" * tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits) arm64: efi: Recover from synchronous exceptions occurring in firmware arm64: efi: Execute runtime services from a dedicated stack arm64: efi: Limit allocations to 48-bit addressable physical region efi: Put Linux specific magic number in the DOS header efi: libstub: Always enable initrd command line loader and bump version efi: stub: use random seed from EFI variable efi: vars: prohibit reading random seed variables efi: random: combine bootloader provided RNG seed with RNG protocol output efi/cper, cxl: Decode CXL Error Log efi/cper, cxl: Decode CXL Protocol Error Section efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment efi: x86: Move EFI runtime map sysfs code to arch/x86 efi: runtime-maps: Clarify purpose and enable by default for kexec efi: pstore: Add module parameter for setting the record size efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures efi: memmap: Move manipulation routines into x86 arch tree efi: memmap: Move EFI fake memmap support into x86 arch tree efi: libstub: Undeprecate the command line initrd loader efi: libstub: Add mixed mode support to command line initrd loader efi: libstub: Permit mixed mode return types other than efi_status_t ...
2022-12-12Merge tag 'pull-iov_iter' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull iov_iter updates from Al Viro: "iov_iter work; most of that is about getting rid of direction misannotations and (hopefully) preventing more of the same for the future" * tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: use less confusing names for iov_iter direction initializers iov_iter: saner checks for attempt to copy to/from iterator [xen] fix "direction" argument of iov_iter_kvec() [vhost] fix 'direction' argument of iov_iter_{init,bvec}() [target] fix iov_iter_bvec() "direction" argument [s390] memcpy_real(): WRITE is "data source", not destination... [s390] zcore: WRITE is "data source", not destination... [infiniband] READ is "data destination", not source... [fsi] WRITE is "data source", not destination... [s390] copy_oldmem_kernel() - WRITE is "data source", not destination csum_and_copy_to_iter(): handle ITER_DISCARD get rid of unlikely() on page_copy_sane() calls
2022-12-12Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-12Merge tag 'x86_alternatives_for_v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 alternative update from Borislav Petkov: "A single alternatives patching fix for modules: - Have alternatives patch the same sections in modules as in vmlinux" * tag 'x86_alternatives_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternative: Consistently patch SMP locks in vmlinux and modules
2022-12-12Merge tag 'ras_core_for_v6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Fix confusing output from /sys/kernel/debug/ras/daemon_active - Add another MCE severity error case to the Intel error severity table to promote UC and AR errors to panic severity and remove the corresponding code condition doing that. - Make sure the thresholding and deferred error interrupts on AMD SMCA systems clear the all registers reporting an error so that there are no multiple errors logged for the same event * tag 'ras_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: RAS: Fix return value from show_trace() x86/mce: Use severity table to handle uncorrected errors in kernel x86/MCE/AMD: Clear DFR errors found in THR handler
2022-12-12Merge tag 'x86_fpu_for_6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Dave Hansen: "There are two little fixes in here, one to give better XSAVE warnings and another to address some undefined behavior in offsetof(). There is also a collection of patches to fix some issues with ptrace and the protection keys register (PKRU). PKRU is a real oddity because it is exposed in the XSAVE-related ABIs, but it is generally managed without using XSAVE in the kernel. This fix thankfully came with a selftest to ward off future regressions. Summary: - Clarify XSAVE consistency warnings - Fix up ptrace interface to protection keys register (PKRU) - Avoid undefined compiler behavior with TYPE_ALIGN" * tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not set x86/fpu: Allow PKRU to be (once again) written by ptrace. x86/fpu: Add a pkru argument to copy_uabi_to_xstate() x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate(). x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate() x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnostics
2022-12-12Merge tag 'x86_splitlock_for_6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 splitlock updates from Dave Hansen: "Add a sysctl to control the split lock misery mode. This enables users to reduce the penalty inflicted on split lock users. There are some proprietary, binary-only games which became entirely unplayable with the old penalty. Anyone opting into the new mode is, of course, more exposed to the DoS nasitness inherent with split locks, but they can play their games again" * tag 'x86_splitlock_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Add sysctl to control the misery mode
2022-12-12Merge tag 'x86_cache_for_6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control updates from Dave Hansen: "These declare the resource control (rectrl) MSRs a bit more normally and clean up an unnecessary structure member: - Remove unnecessary arch_has_empty_bitmaps structure memory - Move rescrtl MSR defines into msr-index.h, like normal MSRs" * tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Move MSR defines into msr-index.h x86/resctrl: Remove arch_has_empty_bitmaps
2022-12-12Merge tag 'x86_sgx_for_6.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 sgx updates from Dave Hansen: "The biggest deal in this series is support for a new hardware feature that allows enclaves to detect and mitigate single-stepping attacks. There's also a minor performance tweak and a little piece of the kmap_atomic() -> kmap_local() transition. Summary: - Introduce a new SGX feature (Asynchrounous Exit Notification) for bare-metal enclaves and KVM guests to mitigate single-step attacks - Increase batching to speed up enclave release - Replace kmap/kunmap_atomic() calls" * tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Replace kmap/kunmap_atomic() calls KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest x86/sgx: Allow enclaves to use Asynchrounous Exit Notification x86/sgx: Reduce delay and interference of enclave release
2022-12-12Merge tag 'x86-misc-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Thomas Gleixner: "Updates for miscellaneous x86 areas: - Reserve a new boot loader type for barebox which is usally used on ARM and MIPS, but can also be utilized as EFI payload on x86 to provide watchdog-supervised boot up. - Consolidate the native and compat 32bit signal handling code and split the 64bit version out into a separate source file - Switch the ESPFIX random usage to get_random_long()" * tag 'x86-misc-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/espfix: Use get_random_long() rather than archrandom x86/signal/64: Move 64-bit signal code to its own file x86/signal/32: Merge native and compat 32-bit signal code x86/signal: Add ABI prefixes to frame setup functions x86/signal: Merge get_sigframe() x86: Remove __USER32_DS signal/compat: Remove compat_sigset_t override x86/signal: Remove sigset_t parameter from frame setup functions x86/signal: Remove sig parameter from frame setup functions Documentation/x86/boot: Reserve type_of_loader=13 for barebox
2022-12-12Merge tag 'x86-cleanups-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Thomas Gleixner: "A set of x86 cleanups: - Rework the handling of x86_regset for 32 and 64 bit. The original implementation tried to minimize the allocation size with quite some hard to understand and fragile tricks. Make it robust and straight forward by separating the register enumerations for 32 and 64 bit completely. - Add a few missing static annotations - Remove the stale unused setup_once() assembly function - Address a few minor static analysis and kernel-doc warnings" * tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/32: Remove setup_once() x86/kaslr: Fix process_mem_region()'s return value x86: Fix misc small issues x86/boot: Repair kernel-doc for boot_kstrtoul() x86: Improve formatting of user_regset arrays x86: Separate out x86_regset for 32 and 64 bit x86/i8259: Make default_legacy_pic static x86/tsc: Make art_related_clocksource static
2022-12-12Merge tag 'x86-apic-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic update from Thomas Gleixner: "A set of changes for the x86 APIC code: - Handle the case where x2APIC is enabled and locked by the BIOS on a kernel with CONFIG_X86_X2APIC=n gracefully. Instead of a panic which does not make it to the graphical console during very early boot, simply disable the local APIC completely and boot with the PIC and very limited functionality, which allows to diagnose the issue - Convert x86 APIC device tree bindings to YAML - Extend x86 APIC device tree bindings to configure interrupt delivery mode and handle this in during init. This allows to boot with device tree on platforms which lack a legacy PIC" * tag 'x86-apic-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/of: Add support for boot time interrupt delivery mode configuration x86/of: Replace printk(KERN_LVL) with pr_lvl() dt-bindings: x86: apic: Introduce new optional bool property for lapic dt-bindings: x86: apic: Convert Intel's APIC bindings to YAML schema x86/of: Remove unused early_init_dt_add_memory_arch() x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS
2022-12-12Merge tag 'irq-core-2022-12-10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt core and driver subsystem: The bulk is the rework of the MSI subsystem to support per device MSI interrupt domains. This solves conceptual problems of the current PCI/MSI design which are in the way of providing support for PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device. IMS (Interrupt Message Store] is a new specification which allows device manufactures to provide implementation defined storage for MSI messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified message store which is uniform accross all devices). The PCI/MSI[-X] uniformity allowed us to get away with "global" PCI/MSI domains. IMS not only allows to overcome the size limitations of the MSI-X table, but also gives the device manufacturer the freedom to store the message in arbitrary places, even in host memory which is shared with the device. There have been several attempts to glue this into the current MSI code, but after lengthy discussions it turned out that there is a fundamental design problem in the current PCI/MSI-X implementation. This needs some historical background. When PCI/MSI[-X] support was added around 2003, interrupt management was completely different from what we have today in the actively developed architectures. Interrupt management was completely architecture specific and while there were attempts to create common infrastructure the commonalities were rudimentary and just providing shared data structures and interfaces so that drivers could be written in an architecture agnostic way. The initial PCI/MSI[-X] support obviously plugged into this model which resulted in some basic shared infrastructure in the PCI core code for setting up MSI descriptors, which are a pure software construct for holding data relevant for a particular MSI interrupt, but the actual association to Linux interrupts was completely architecture specific. This model is still supported today to keep museum architectures and notorious stragglers alive. In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel, which was creating yet another architecture specific mechanism and resulted in an unholy mess on top of the existing horrors of x86 interrupt handling. The x86 interrupt management code was already an incomprehensible maze of indirections between the CPU vector management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X] implementation. At roughly the same time ARM struggled with the ever growing SoC specific extensions which were glued on top of the architected GIC interrupt controller. This resulted in a fundamental redesign of interrupt management and provided the today prevailing concept of hierarchical interrupt domains. This allowed to disentangle the interactions between x86 vector domain and interrupt remapping and also allowed ARM to handle the zoo of SoC specific interrupt components in a sane way. The concept of hierarchical interrupt domains aims to encapsulate the functionality of particular IP blocks which are involved in interrupt delivery so that they become extensible and pluggable. The X86 encapsulation looks like this: |--- device 1 [Vector]---[Remapping]---[PCI/MSI]--|... |--- device N where the remapping domain is an optional component and in case that it is not available the PCI/MSI[-X] domains have the vector domain as their parent. This reduced the required interaction between the domains pretty much to the initialization phase where it is obviously required to establish the proper parent relation ship in the components of the hierarchy. While in most cases the model is strictly representing the chain of IP blocks and abstracting them so they can be plugged together to form a hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware it's clear that the actual PCI/MSI[-X] interrupt controller is not a global entity, but strict a per PCI device entity. Here we took a short cut on the hierarchical model and went for the easy solution of providing "global" PCI/MSI domains which was possible because the PCI/MSI[-X] handling is uniform across the devices. This also allowed to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in turn made it simple to keep the existing architecture specific management alive. A similar problem was created in the ARM world with support for IP block specific message storage. Instead of going all the way to stack a IP block specific domain on top of the generic MSI domain this ended in a construct which provides a "global" platform MSI domain which allows overriding the irq_write_msi_msg() callback per allocation. In course of the lengthy discussions we identified other abuse of the MSI infrastructure in wireless drivers, NTB etc. where support for implementation specific message storage was just mindlessly glued into the existing infrastructure. Some of this just works by chance on particular platforms but will fail in hard to diagnose ways when the driver is used on platforms where the underlying MSI interrupt management code does not expect the creative abuse. Another shortcoming of today's PCI/MSI-X support is the inability to allocate or free individual vectors after the initial enablement of MSI-X. This results in an works by chance implementation of VFIO (PCI pass-through) where interrupts on the host side are not set up upfront to avoid resource exhaustion. They are expanded at run-time when the guest actually tries to use them. The way how this is implemented is that the host disables MSI-X and then re-enables it with a larger number of vectors again. That works by chance because most device drivers set up all interrupts before the device actually will utilize them. But that's not universally true because some drivers allocate a large enough number of vectors but do not utilize them until it's actually required, e.g. for acceleration support. But at that point other interrupts of the device might be in active use and the MSI-X disable/enable dance can just result in losing interrupts and therefore hard to diagnose subtle problems. Last but not least the "global" PCI/MSI-X domain approach prevents to utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS is not longer providing a uniform storage and configuration model. The solution to this is to implement the missing step and switch from global PCI/MSI domains to per device PCI/MSI domains. The resulting hierarchy then looks like this: |--- [PCI/MSI] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N which in turn allows to provide support for multiple domains per device: |--- [PCI/MSI] device 1 |--- [PCI/IMS] device 1 [Vector]---[Remapping]---|... |--- [PCI/MSI] device N |--- [PCI/IMS] device N This work converts the MSI and PCI/MSI core and the x86 interrupt domains to the new model, provides new interfaces for post-enable allocation/free of MSI-X interrupts and the base framework for PCI/IMS. PCI/IMS has been verified with the work in progress IDXD driver. There is work in progress to convert ARM over which will replace the platform MSI train-wreck. The cleanup of VFIO, NTB and other creative "solutions" are in the works as well. Drivers: - Updates for the LoongArch interrupt chip drivers - Support for MTK CIRQv2 - The usual small fixes and updates all over the place" * tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits) irqchip/ti-sci-inta: Fix kernel doc irqchip/gic-v2m: Mark a few functions __init irqchip/gic-v2m: Include arm-gic-common.h irqchip/irq-mvebu-icu: Fix works by chance pointer assignment iommu/amd: Enable PCI/IMS iommu/vt-d: Enable PCI/IMS x86/apic/msi: Enable PCI/IMS PCI/MSI: Provide pci_ims_alloc/free_irq() PCI/MSI: Provide IMS (Interrupt Message Store) support genirq/msi: Provide constants for PCI/IMS support x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X PCI/MSI: Provide prepare_desc() MSI domain op PCI/MSI: Split MSI-X descriptor setup genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN genirq/msi: Provide msi_domain_alloc_irq_at() genirq/msi: Provide msi_domain_ops:: Prepare_desc() genirq/msi: Provide msi_desc:: Msi_data genirq/msi: Provide struct msi_map x86/apic/msi: Remove arch_create_remap_msi_irq_domain() ...
2022-12-12Merge tag 'x86-urgent-2022-12-12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Three small x86 fixes which did not make it into 6.1: - Remove a superfluous noinline which prevents GCC-7.3 to optimize a stub function away - Allow uprobes on REP NOP and do not treat them like word-sized branch instructions - Make the VDSO symbol export of __vdso_sgx_enter_enclave() depend on CONFIG_X86_SGX to prevent build failures with newer LLVM versions which rightfully detect that there is no function behind the symbol" * tag 'x86-urgent-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Conditionally export __vdso_sgx_enter_enclave() uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubs
2022-12-12Merge tag 'hyperv-next-signed-20221208' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Drop unregister syscore from hyperv_cleanup to avoid hang (Gaurav Kohli) - Clean up panic path for Hyper-V framebuffer (Guilherme G. Piccoli) - Allow IRQ remapping to work without x2apic (Nuno Das Neves) - Fix comments (Olaf Hering) - Expand hv_vp_assist_page definition (Saurabh Sengar) - Improvement to page reporting (Shradha Gupta) - Make sure TSC clocksource works when Linux runs as the root partition (Stanislav Kinsburskiy) * tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Remove unregister syscore call from Hyper-V cleanup iommu/hyper-v: Allow hyperv irq remapping without x2apic clocksource: hyper-v: Add TSC page support for root partition clocksource: hyper-v: Use TSC PFN getter to map vvar page clocksource: hyper-v: Introduce TSC PFN getter clocksource: hyper-v: Introduce a pointer to TSC page x86/hyperv: Expand definition of struct hv_vp_assist_page PCI: hv: update comment in x86 specific hv_arch_irq_unmask hv: fix comment typo in vmbus_channel/low_latency drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers video: hyperv_fb: Avoid taking busy spinlock on panic path hv_balloon: Add support for configurable order free page reporting mm/page_reporting: Add checks for page_reporting_order param
2022-12-10x86/PCI: Tidy E820 removal messagesBjorn Helgaas
These messages: clipped [mem size 0x00000000 64bit] to [mem size 0xfffffffffffa0000 64bit] for e820 entry [mem 0x0009f000-0x000fffff] aren't as useful as they could be because (a) the resource is often IORESOURCE_UNSET, so we print the size instead of the start/end and (b) we print the available resource even if it is empty after removing the E820 entry. Print the available space by hand to avoid the IORESOURCE_UNSET problem and only if it's non-empty. No functional change intended. Link: https://lore.kernel.org/r/20221208190341.1560157-4-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Hans de Goede <hdegoede@redhat.com>
2022-12-09ftrace/x86: Add back ftrace_expected for ftrace bug reportsSteven Rostedt (Google)
After someone reported a bug report with a failed modification due to the expected value not matching what was found, it came to my attention that the ftrace_expected is no longer set when that happens. This makes for debugging the issue a bit more difficult. Set ftrace_expected to the expected code before calling ftrace_bug, so that it shows what was expected and why it failed. Link: https://lore.kernel.org/all/CA+wXwBQ-VhK+hpBtYtyZP-NiX4g8fqRRWithFOHQW-0coQ3vLg@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20221209105247.01d4e51d@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-12-07Merge tag 'v6.1-rc8' into efi/nextArd Biesheuvel
Linux 6.1-rc8
2022-12-05x86/apic/msi: Enable PCI/IMSThomas Gleixner
Enable IMS in the domain init and allocation mapping code, but do not enable it on the vector domain as discussed in various threads on LKML. The interrupt remap domains can expand this setting like they do with PCI multi MSI. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232327.022658817@linutronix.de
2022-12-05x86/apic/msi: Remove arch_create_remap_msi_irq_domain()Thomas Gleixner
and related code which is not longer required now that the interrupt remap code has been converted to MSI parent domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.267353814@linutronix.de
2022-12-05iommu/amd: Switch to MSI base domainsThomas Gleixner
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.209212272@linutronix.de
2022-12-05iommu/vt-d: Switch to MSI parent domainsThomas Gleixner
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de
2022-12-05x86/apic/vector: Provide MSI parent domainThomas Gleixner
Enable MSI parent domain support in the x86 vector domain and fixup the checks in the iommu implementations to check whether device::msi::domain is the default MSI parent domain. That keeps the existing logic to protect e.g. devices behind VMD working. The interrupt remap PCI/MSI code still works because the underlying vector domain still provides the same functionality. None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are affected either. They still work the same way both at the low level and the PCI/MSI implementations they provide. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de
2022-12-05x86/microcode/intel: Do not retry microcode reloading on the APsAshok Raj
The retries in load_ucode_intel_ap() were in place to support systems with mixed steppings. Mixed steppings are no longer supported and there is only one microcode image at a time. Any retries will simply reattempt to apply the same image over and over without making progress. [ bp: Zap the circumstantial reasoning from the commit message. ] Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading") Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221129210832.107850-3-ashok.raj@intel.com
2022-12-05genirq/msi: Move IRQ_DOMAIN_MSI_NOMASK_QUIRK to MSI flagsThomas Gleixner
It's truly a MSI only flag and for the upcoming per device MSI domains this must be in the MSI flags so it can be set during domain setup without exposing this quirk outside of x86. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124230313.454246167@linutronix.de
2022-12-05uprobes/x86: Allow to probe a NOP instruction with 0x66 prefixOleg Nesterov
Intel ICC -hotpatch inserts 2-byte "0x66 0x90" NOP at the start of each function to reserve extra space for hot-patching, and currently it is not possible to probe these functions because branch_setup_xol_ops() wrongly rejects NOP with REP prefix as it treats them like word-sized branch instructions. Fixes: 250bbd12c2fe ("uprobes/x86: Refuse to attach uprobe to "word-sized" branch insns") Reported-by: Seiji Nishikawa <snishika@redhat.com> Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20221204173933.GA31544@redhat.com
2022-12-05x86/mtrr: Make message for disabled MTRRs more descriptiveJuergen Gross
Instead of just saying "Disabled" when MTRRs are disabled for any reason, tell what is disabled and why. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20221205080433.16643-3-jgross@suse.com
2022-12-03x86/microcode/intel: Do not print microcode revision and processor flagsAshok Raj
collect_cpu_info() is used to collect the current microcode revision and processor flags on every CPU. It had a weird mechanism to try to mimick a "once" functionality in the sense that, that information should be issued only when it is differing from the previous CPU. However (1): the new calling sequence started doing that in parallel: microcode_init() |-> schedule_on_each_cpu(setup_online_cpu) |-> collect_cpu_info() resulting in multiple redundant prints: microcode: sig=0x50654, pf=0x80, revision=0x2006e05 microcode: sig=0x50654, pf=0x80, revision=0x2006e05 microcode: sig=0x50654, pf=0x80, revision=0x2006e05 However (2): dumping this here is not that important because the kernel does not support mixed silicon steppings microcode. Finally! Besides, there is already a pr_info() in microcode_reload_late() that shows both the old and new revisions. What is more, the CPU signature (sig=0x50654) and Processor Flags (pf=0x80) above aren't that useful to the end user, they are available via /proc/cpuinfo and they don't change anyway. Remove the redundant pr_info(). [ bp: Heavily massage. ] Fixes: b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk") Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20221103175901.164783-2-ashok.raj@intel.com
2022-12-02x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3Pawan Gupta
The "force" argument to write_spec_ctrl_current() is currently ambiguous as it does not guarantee the MSR write. This is due to the optimization that writes to the MSR happen only when the new value differs from the cached value. This is fine in most cases, but breaks for S3 resume when the cached MSR value gets out of sync with the hardware MSR value due to S3 resetting it. When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write is skipped. Which results in SPEC_CTRL mitigations not getting restored. Move the MSR write from write_spec_ctrl_current() to a new function that unconditionally writes to the MSR. Update the callers accordingly and rename functions. [ bp: Rework a bit. ] Fixes: caa0ff24d5d0 ("x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value") Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/806d39b0bfec2fe8f50dc5446dff20f5bb24a959.1669821572.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-02x86/sgx: Replace kmap/kunmap_atomic() callsKristen Carlson Accardi
kmap_local_page() is the preferred way to create temporary mappings when it is feasible, because the mappings are thread-local and CPU-local. kmap_local_page() uses per-task maps rather than per-CPU maps. This in effect removes the need to disable preemption on the local CPU while the mapping is active, and thus vastly reduces overall system latency. It is also valid to take pagefaults within the mapped region. The use of kmap_atomic() in the SGX code was not an explicit design choice to disable page faults or preemption, and there is no compelling design reason to using kmap_atomic() vs. kmap_local_page(). Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/linux-sgx/Y0biN3%2FJsZMa0yUr@kernel.org/ Link: https://lore.kernel.org/r/20221115161627.4169428-1-kristen@linux.intel.com
2022-12-02x86/of: Add support for boot time interrupt delivery mode configurationRahul Tanwar
Presently, init/boot time interrupt delivery mode is enumerated only for ACPI enabled systems by parsing MADT table or for older systems by parsing MP table. But for OF based x86 systems, it is assumed & hardcoded to be legacy PIC mode. This causes a boot time crash for platforms which do not provide a 8259 compliant legacy PIC. Add support for configuration of init time interrupt delivery mode for x86 OF based systems by introducing a new optional boolean property 'intel,virtual-wire-mode' for the local APIC interrupt-controller node. This property emulates IMCRP Bit 7 of MP feature info byte 2 of MP floating pointer structure. Defaults to legacy PIC mode if absent. Configures it to virtual wire compatibility mode if present. Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221124084143.21841-5-rtanwar@maxlinear.com
2022-12-02x86/of: Replace printk(KERN_LVL) with pr_lvl()Rahul Tanwar
Use pr_lvl() instead of the deprecated printk(KERN_LVL). Just a upgrade of print utilities usage. no functional changes. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221124084143.21841-4-rtanwar@maxlinear.com
2022-12-02x86/of: Remove unused early_init_dt_add_memory_arch()Andy Shevchenko
Recently objtool started complaining about dead code in the object files, in particular vmlinux.o: warning: objtool: early_init_dt_scan_memory+0x191: unreachable instruction when CONFIG_OF=y. Indeed, early_init_dt_scan() is not used on x86 and making it compile (with help of CONFIG_OF) will abrupt the code flow since in the middle of it there is a BUG() instruction. Remove the pointless function. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221124184824.9548-1-andriy.shevchenko@linux.intel.com
2022-12-02x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOSMateusz Jończyk
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on platforms that have x2APIC already enabled in the BIOS before starting the kernel. The kernel was supposed to panic with an approprite error message in validate_x2apic() due to the missing X2APIC support. However, validate_x2apic() was run too late in the boot cycle, and the kernel tried to initialize the APIC nonetheless. This resulted in an earlier panic in setup_local_APIC() because the APIC was not registered. In my experiments, a panic message in setup_local_APIC() was not visible in the graphical console, which resulted in a hang with no indication what has gone wrong. Instead of calling panic(), disable the APIC, which results in a somewhat working system with the PIC only (and no SMP). This way the user is able to diagnose the problem more easily. Disabling X2APIC mode is not an option because it's impossible on systems with locked x2APIC. The proper place to disable the APIC in this case is in check_x2apic(), which is called early from setup_arch(). Doing this in __apic_intr_mode_select() is too late. Make check_x2apic() unconditionally available and remove the empty stub. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Robert Elliott (Servers) <elliott@hpe.com> Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/d573ba1c-0dc4-3016-712a-cc23a8a33d42@molgen.mpg.de Link: https://lore.kernel.org/lkml/20220911084711.13694-3-mat.jonczyk@o2.pl Link: https://lore.kernel.org/all/20221129215008.7247-1-mat.jonczyk@o2.pl
2022-12-02x86/asm/32: Remove setup_once()Brian Gerst
After the removal of the stack canary segment setup code, this function does nothing. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221115184328.70874-1-brgerst@gmail.com
2022-12-02x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubsMiaohe Lin
Due to the explicit 'noinline' GCC-7.3 is not able to optimize away the argument setup of: apply_ibt_endbr(__ibt_endbr_seal, __ibt_enbr_seal_end); even when X86_KERNEL_IBT=n and the function is an empty stub, which leads to link errors due to missing __ibt_endbr_seal* symbols: ld: arch/x86/kernel/alternative.o: in function `alternative_instructions': alternative.c:(.init.text+0x15d): undefined reference to `__ibt_endbr_seal_end' ld: alternative.c:(.init.text+0x164): undefined reference to `__ibt_endbr_seal' Remove the explicit 'noinline' to help gcc optimize them away. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221011113803.956808-1-linmiaohe@huawei.com
2022-11-30Merge branch 'mm-hotfixes-stable' into mm-stableAndrew Morton
2022-11-28iommu/hyper-v: Allow hyperv irq remapping without x2apicNuno Das Neves
If x2apic is not available, hyperv-iommu skips remapping irqs. This breaks root partition which always needs irqs remapped. Fix this by allowing irq remapping regardless of x2apic, and change hyperv_enable_irq_remapping() to return IRQ_REMAP_XAPIC_MODE in case x2apic is missing. Tested with root and non-root hyperv partitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1668715899-8971-1-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2022-11-27x86/resctrl: Move MSR defines into msr-index.hBorislav Petkov
msr-index.h should contain all MSRs for easier grepping for MSR numbers when dealing with unchecked MSR access warnings, for example. Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221106212923.20699-1-bp@alien8.de
2022-11-25use less confusing names for iov_iter direction initializersAl Viro
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-11-25x86/boot: Skip realmode init code when running as Xen PV guestJuergen Gross
When running as a Xen PV guest there is no need for setting up the realmode trampoline, as realmode isn't supported in this environment. Trying to setup the trampoline has been proven to be problematic in some cases, especially when trying to debug early boot problems with Xen requiring to keep the EFI boot-services memory mapped (some firmware variants seem to claim basically all memory below 1Mb for boot services). Introduce new x86_platform_ops operations for that purpose, which can be set to a NOP by the Xen PV specific kernel boot code. [ bp: s/call_init_real_mode/do_init_real_mode/ ] Fixes: 084ee1c641a0 ("x86, realmode: Relocator for realmode code") Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221123114523.3467-1-jgross@suse.com
2022-11-24driver core: make struct class.devnode() take a const *Greg Kroah-Hartman
The devnode() in struct class should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Justin Sanders <justin@coraid.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Benjamin Gaignard <benjamin.gaignard@collabora.com> Cc: Liam Mark <lmark@codeaurora.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Brian Starkey <Brian.Starkey@arm.com> Cc: John Stultz <jstultz@google.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Sean Young <sean@mess.org> Cc: Frank Haverkamp <haver@linux.ibm.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.com> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Xie Yongji <xieyongji@bytedance.com> Cc: Gautam Dawar <gautam.dawar@xilinx.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Eli Cohen <elic@nvidia.com> Cc: Parav Pandit <parav@nvidia.com> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: alsa-devel@alsa-project.org Cc: dri-devel@lists.freedesktop.org Cc: kvm@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: linux-block@vger.kernel.org Cc: linux-input@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20221123122523.1332370-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-24x86/paravirt: Use common macro for creating simple asm paravirt functionsJuergen Gross
There are some paravirt assembler functions which are sharing a common pattern. Introduce a macro DEFINE_PARAVIRT_ASM() for creating them. Note that this macro is including explicit alignment of the generated functions, leading to __raw_callee_save___kvm_vcpu_is_preempted(), _paravirt_nop() and paravirt_ret0() to be aligned at 4 byte boundaries now. The explicit _paravirt_nop() prototype in paravirt.c isn't needed, as it is included in paravirt_types.h already. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Link: https://lkml.kernel.org/r/20221109134418.6516-1-jgross@suse.com
2022-11-22x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGNYingChi Long
WG14 N2350 specifies that it is an undefined behavior to have type definitions within offsetof", see https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm This specification is also part of C23. Therefore, replace the TYPE_ALIGN macro with the _Alignof builtin to avoid undefined behavior. (_Alignof itself is C11 and the kernel is built with -gnu11). ISO C11 _Alignof is subtly different from the GNU C extension __alignof__. Latter is the preferred alignment and _Alignof the minimal alignment. For long long on x86 these are 8 and 4 respectively. The macro TYPE_ALIGN's behavior matches _Alignof rather than __alignof__. [ bp: Massage commit message. ] Signed-off-by: YingChi Long <me@inclyc.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20220925153151.2467884-1-me@inclyc.cn