summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/cpu
AgeCommit message (Collapse)Author
2022-11-10x86/mtrr: Rename prepare_set() and post_set()Juergen Gross
Rename the currently MTRR-specific functions prepare_set() and post_set() in preparation to move them. Make them non-static and put their prototypes into cacheinfo.h, where they will end after moving them to their final position anyway. Expand the comment before the functions with an introductory line and rename two related static variables, too. [ bp: Massage commit message. ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221102074713.21493-5-jgross@suse.com Signed-off-by: Borislav Petkov <bp@suse.de>
2022-11-10x86/mtrr: Replace use_intel() with a local flagJuergen Gross
In MTRR code use_intel() is only used in one source file, and the relevant use_intel_if member of struct mtrr_ops is set only in generic_mtrr_ops. Replace use_intel() with a single flag in cacheinfo.c which can be set when assigning generic_mtrr_ops to mtrr_if. This allows to drop use_intel_if from mtrr_ops, while preparing to decouple PAT from MTRR. As another preparation for the PAT/MTRR decoupling use a bit for MTRR control and one for PAT control. For now set both bits together, this can be changed later. As the new flag will be set only if mtrr_enabled is set, the test for mtrr_enabled can be dropped at some places. [ bp: Massage commit message. ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221102074713.21493-4-jgross@suse.com Signed-off-by: Borislav Petkov <bp@suse.de>
2022-11-09x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callersPaolo Bonzini
x86_virt_spec_ctrl only deals with the paravirtualized MSR_IA32_VIRT_SPEC_CTRL now and does not handle MSR_IA32_SPEC_CTRL anymore; remove the corresponding, unused argument. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09KVM: SVM: move MSR_IA32_SPEC_CTRL save/restore to assemblyPaolo Bonzini
Restoration of the host IA32_SPEC_CTRL value is probably too late with respect to the return thunk training sequence. With respect to the user/kernel boundary, AMD says, "If software chooses to toggle STIBP (e.g., set STIBP on kernel entry, and clear it on kernel exit), software should set STIBP to 1 before executing the return thunk training sequence." I assume the same requirements apply to the guest/host boundary. The return thunk training sequence is in vmenter.S, quite close to the VM-exit. On hosts without V_SPEC_CTRL, however, the host's IA32_SPEC_CTRL value is not restored until much later. To avoid this, move the restoration of host SPEC_CTRL to assembly and, for consistency, move the restoration of the guest SPEC_CTRL as well. This is not particularly difficult, apart from some care to cover both 32- and 64-bit, and to share code between SEV-ES and normal vmentry. Cc: stable@vger.kernel.org Fixes: a149180fbcf3 ("x86: Add magic AMD return-thunk") Suggested-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-08x86/sgx: use VM_ACCESS_FLAGSKefeng Wang
Simplify VM_READ|VM_WRITE|VM_EXEC with VM_ACCESS_FLAGS. Link: https://lkml.kernel.org/r/20221019034945.93081-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08x86/sgx: Add overflow check in sgx_validate_offset_length()Borys Popławski
sgx_validate_offset_length() function verifies "offset" and "length" arguments provided by userspace, but was missing an overflow check on their addition. Add it. Fixes: c6d26d370767 ("x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES") Signed-off-by: Borys Popławski <borysp@invisiblethingslab.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: stable@vger.kernel.org # v5.11+ Link: https://lore.kernel.org/r/0d91ac79-6d84-abed-5821-4dbe59fa1a38@invisiblethingslab.com
2022-11-04KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guestKai Huang
The new Asynchronous Exit (AEX) notification mechanism (AEX-notify) allows one enclave to receive a notification in the ERESUME after the enclave exit due to an AEX. EDECCSSA is a new SGX user leaf function (ENCLU[EDECCSSA]) to facilitate the AEX notification handling. The new EDECCSSA is enumerated via CPUID(EAX=0x12,ECX=0x0):EAX[11]. Besides Allowing reporting the new AEX-notify attribute to KVM guests, also allow reporting the new EDECCSSA user leaf function to KVM guests so the guest can fully utilize the AEX-notify mechanism. Similar to existing X86_FEATURE_SGX1 and X86_FEATURE_SGX2, introduce a new scattered X86_FEATURE_SGX_EDECCSSA bit for the new EDECCSSA, and report it in KVM's supported CPUIDs. Note, no additional KVM enabling is required to allow the guest to use EDECCSSA. It's impossible to trap ENCLU (without completely preventing the guest from using SGX). Advertise EDECCSSA as supported purely so that userspace doesn't need to special case EDECCSSA, i.e. doesn't need to manually check host CPUID. The inability to trap ENCLU also means that KVM can't prevent the guest from using EDECCSSA, but that virtualization hole is benign as far as KVM is concerned. EDECCSSA is simply a fancy way to modify internal enclave state. More background about how do AEX-notify and EDECCSSA work: SGX maintains a Current State Save Area Frame (CSSA) for each enclave thread. When AEX happens, the enclave thread context is saved to the CSSA and the CSSA is increased by 1. For a normal ERESUME which doesn't deliver AEX notification, it restores the saved thread context from the previously saved SSA and decreases the CSSA. If AEX-notify is enabled for one enclave, the ERESUME acts differently. Instead of restoring the saved thread context and decreasing the CSSA, it acts like EENTER which doesn't decrease the CSSA but establishes a clean slate thread context using the CSSA for the enclave to handle the notification. After some handling, the enclave must discard the "new-established" SSA and switch back to the previously saved SSA (upon AEX). Otherwise, the enclave will run out of SSA space upon further AEXs and eventually fail to run. To solve this problem, the new EDECCSSA essentially decreases the CSSA. It can be used by the enclave notification handler to switch back to the previous saved SSA when needed, i.e. after it handles the notification. Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/all/20221101022422.858944-1-kai.huang%40intel.com
2022-11-04x86/sgx: Allow enclaves to use Asynchrounous Exit NotificationDave Hansen
Short Version: Allow enclaves to use the new Asynchronous EXit (AEX) notification mechanism. This mechanism lets enclaves run a handler after an AEX event. These handlers can run mitigations for things like SGX-Step[1]. AEX Notify will be made available both on upcoming processors and on some older processors through microcode updates. Long Version: == SGX Attribute Background == The SGX architecture includes a list of SGX "attributes". These attributes ensure consistency and transparency around specific enclave features. As a simple example, the "DEBUG" attribute allows an enclave to be debugged, but also destroys virtually all of SGX security. Using attributes, enclaves can know that they are being debugged. Attributes also affect enclave attestation so an enclave can, for instance, be denied access to secrets while it is being debugged. The kernel keeps a list of known attributes and will only initialize enclaves that use a known set of attributes. This kernel policy eliminates the chance that a new SGX attribute could cause undesired effects. For example, imagine a new attribute was added called "PROVISIONKEY2" that provided similar functionality to "PROVISIIONKEY". A kernel policy that allowed indiscriminate use of unknown attributes and thus PROVISIONKEY2 would undermine the existing kernel policy which limits use of PROVISIONKEY enclaves. == AEX Notify Background == "Intel Architecture Instruction Set Extensions and Future Features - Version 45" is out[2]. There is a new chapter: Asynchronous Enclave Exit Notify and the EDECCSSA User Leaf Function. Enclaves exit can be either synchronous and consensual (EEXIT for instance) or asynchronous (on an interrupt or fault). The asynchronous ones can evidently be exploited to single step enclaves[1], on top of which other naughty things can be built. AEX Notify will be made available both on upcoming processors and on some older processors through microcode updates. == The Problem == These attacks are currently entirely opaque to the enclave since the hardware does the save/restore under the covers. The Asynchronous Enclave Exit Notify (AEX Notify) mechanism provides enclaves an ability to detect and mitigate potential exposure to these kinds of attacks. == The Solution == Define the new attribute value for AEX Notification. Ensure the attribute is cleared from the list reserved attributes. Instead of adding to the open-coded lists of individual attributes, add named lists of privileged (disallowed by default) and unprivileged (allowed by default) attributes. Add the AEX notify attribute as an unprivileged attribute, which will keep the kernel from rejecting enclaves with it set. 1. https://github.com/jovanbulck/sgx-step 2. https://cdrdv2.intel.com/v1/dl/getContent/671368?explicitVersion=true Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/all/20220720191347.1343986-1-dave.hansen%40linux.intel.com
2022-11-03x86/intel_epb: Set Alder Lake N and Raptor Lake P normal EPBSrinivas Pandruvada
Intel processors support additional software hint called EPB ("Energy Performance Bias") to guide the hardware heuristic of power management features to favor increasing dynamic performance or conserve energy consumption. Since this EPB hint is processor specific, the same value of hint can result in different behavior across generations of processors. commit 4ecc933b7d1f ("x86: intel_epb: Allow model specific normal EPB value")' introduced capability to update the default power up EPB based on the CPU model and updated the default EPB to 7 for Alder Lake mobile CPUs. The same change is required for other Alder Lake-N and Raptor Lake-P mobile CPUs as the current default of 6 results in higher uncore power consumption. This increase in power is related to memory clock frequency setting based on the EPB value. Depending on the EPB the minimum memory frequency is set by the firmware. At EPB = 7, the minimum memory frequency is 1/4th compared to EPB = 6. This results in significant power saving for idle and semi-idle workload on a Chrome platform. For example Change in power and performance from EPB change from 6 to 7 on Alder Lake-N: Workload Performance diff (%) power diff ---------------------------------------------------- VP9 FHD30 0 (FPS) -218 mw Google meet 0 (FPS) -385 mw This 200+ mw power saving is very significant for mobile platform for battery life and thermal reasons. But as the workload demands more memory bandwidth, the memory frequency will be increased very fast. There is no power savings for such busy workloads. For example: Workload Performance diff (%) from EPB 6 to 7 ------------------------------------------------------- Speedometer 2.0 -0.8 WebGL Aquarium 10K Fish -0.5 Unity 3D 2018 0.2 WebXPRT3 -0.5 There are run to run variations for performance scores for such busy workloads. So the difference is not significant. Add a new define ENERGY_PERF_BIAS_NORMAL_POWERSAVE for EPB 7 and use it for Alder Lake-N and Raptor Lake-P mobile CPUs. This modification is done originally by Jeremy Compostella <jeremy.compostella@intel.com>. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/all/20221027220056.1534264-1-srinivas.pandruvada%40linux.intel.com
2022-11-02x86/microcode: Drop struct ucode_cpu_info.validBorislav Petkov
It is not needed anymore. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20221028142638.28498-6-bp@alien8.de
2022-11-02x86/microcode: Do some minor fixupsBorislav Petkov
Improve debugging printks and fixup formatting. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20221028142638.28498-5-bp@alien8.de
2022-11-02x86/microcode: Kill refresh_fwBorislav Petkov
request_microcode_fw() can always request firmware now so drop this superfluous argument. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20221028142638.28498-4-bp@alien8.de
2022-11-02x86/microcode: Simplify init path even moreBorislav Petkov
Get rid of all the IPI-sending functions and their wrappers and use those which are supposed to be called on each CPU. Thus: - microcode_init_cpu() gets called on each CPU on init, applying any new microcode that the driver might've found on the filesystem. - mc_cpu_starting() simply tries to apply cached microcode as this is the cpuhp starting callback which gets called on CPU resume too. Even if the driver init function is a late initcall, there is no filesystem by then (not even a hdd driver has been loaded yet) so a new firmware load attempt cannot simply be done. It is pointless anyway - for that there's late loading if one really needs it. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20221028142638.28498-3-bp@alien8.de
2022-11-02x86/microcode: Rip out the subsys interface gunkBorislav Petkov
This is a left-over from the old days when CPU hotplug wasn't as robust as it is now. Currently, microcode gets loaded early on the CPU init path and there's no need to attempt to load it again, which that subsys interface callback is doing. The only other thing that the subsys interface init path was doing is adding the /sys/devices/system/cpu/cpu*/microcode/ hierarchy. So add a function which gets called on each CPU after all the necessary driver setup has happened. Use schedule_on_each_cpu() which can block because the sysfs creating code does kmem_cache_zalloc() which can block too and the initial version of this where it did that setup in an IPI handler of on_each_cpu() can cause a deadlock of the sort: lock(fs_reclaim); <Interrupt> lock(fs_reclaim); as the IPI handler runs in IRQ context. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Link: https://lore.kernel.org/r/20221028142638.28498-2-bp@alien8.de
2022-11-01x86/ibt: Implement FineIBTPeter Zijlstra
Implement an alternative CFI scheme that merges both the fine-grained nature of kCFI but also takes full advantage of the coarse grained hardware CFI as provided by IBT. To contrast: kCFI is a pure software CFI scheme and relies on being able to read text -- specifically the instruction *before* the target symbol, and does the hash validation *before* doing the call (otherwise control flow is compromised already). FineIBT is a software and hardware hybrid scheme; by ensuring every branch target starts with a hash validation it is possible to place the hash validation after the branch. This has several advantages: o the (hash) load is avoided; no memop; no RX requirement. o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing the hash validation in the immediate instruction after the branch target there is a minimal speculation window and the whole is a viable defence against SpectreBHB. o Kees feels obliged to mention it is slightly more vulnerable when the attacker can write code. Obviously this patch relies on kCFI, but additionally it also relies on the padding from the call-depth-tracking patches. It uses this padding to place the hash-validation while the call-sites are re-written to modify the indirect target to be 16 bytes in front of the original target, thus hitting this new preamble. Notably, there is no hardware that needs call-depth-tracking (Skylake) and supports IBT (Tigerlake and onwards). Suggested-by: Joao Moreira (Intel) <joao@overdrivepizza.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221027092842.634714496@infradead.org
2022-10-31x86/sgx: Reduce delay and interference of enclave releaseReinette Chatre
commit 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") introduced a cond_resched() during enclave release where the EREMOVE instruction is applied to every 4k enclave page. Giving other tasks an opportunity to run while tearing down a large enclave placates the soft lockup detector but Iqbal found that the fix causes a 25% performance degradation of a workload run using Gramine. Gramine maintains a 1:1 mapping between processes and SGX enclaves. That means if a workload in an enclave creates a subprocess then Gramine creates a duplicate enclave for that subprocess to run in. The consequence is that the release of the enclave used to run the subprocess can impact the performance of the workload that is run in the original enclave, especially in large enclaves when SGX2 is not in use. The workload run by Iqbal behaves as follows: Create enclave (enclave "A") /* Initialize workload in enclave "A" */ Create enclave (enclave "B") /* Run subprocess in enclave "B" and send result to enclave "A" */ Release enclave (enclave "B") /* Run workload in enclave "A" */ Release enclave (enclave "A") The performance impact of releasing enclave "B" in the above scenario is amplified when there is a lot of SGX memory and the enclave size matches the SGX memory. When there is 128GB SGX memory and an enclave size of 128GB, from the time enclave "B" starts the 128GB SGX memory is oversubscribed with a combined demand for 256GB from the two enclaves. Before commit 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") enclave release was done in a tight loop without giving other tasks a chance to run. Even though the system experienced soft lockups the workload (run in enclave "A") obtained good performance numbers because when the workload started running there was no interference. Commit 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") gave other tasks opportunity to run while an enclave is released. The impact of this in this scenario is that while enclave "B" is released and needing to access each page that belongs to it in order to run the SGX EREMOVE instruction on it, enclave "A" is attempting to run the workload needing to access the enclave pages that belong to it. This causes a lot of swapping due to the demand for the oversubscribed SGX memory. Longer latencies are experienced by the workload in enclave "A" while enclave "B" is released. Improve the performance of enclave release while still avoiding the soft lockup detector with two enhancements: - Only call cond_resched() after XA_CHECK_SCHED iterations. - Use the xarray advanced API to keep the xarray locked for XA_CHECK_SCHED iterations instead of locking and unlocking at every iteration. This batching solution is copied from sgx_encl_may_map() that also iterates through all enclave pages using this technique. With this enhancement the workload experiences a 5% performance degradation when compared to a kernel without commit 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves"), an improvement to the reported 25% degradation, while still placating the soft lockup detector. Scenarios with poor performance are still possible even with these enhancements. For example, short workloads creating sub processes while running in large enclaves. Further performance improvements are pursued in user space through avoiding to create duplicate enclaves for certain sub processes, and using SGX2 that will do lazy allocation of pages as needed so enclaves created for sub processes start quickly and release quickly. Fixes: 8795359e35bc ("x86/sgx: Silence softlockup detection when releasing large enclaves") Reported-by: Md Iqbal Hossain <md.iqbal.hossain@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Md Iqbal Hossain <md.iqbal.hossain@intel.com> Link: https://lore.kernel.org/all/00efa80dd9e35dc85753e1c5edb0344ac07bb1f0.1667236485.git.reinette.chatre%40intel.com
2022-10-31x86/mce: Use severity table to handle uncorrected errors in kernelTony Luck
mce_severity_intel() has a special case to promote UC and AR errors in kernel context to PANIC severity. The "AR" case is already handled with separate entries in the severity table for all instruction fetch errors, and those data fetch errors that are not in a recoverable area of the kernel (i.e. have an extable fixup entry). Add an entry to the severity table for UC errors in kernel context that reports severity = PANIC. Delete the special case code from mce_severity_intel(). Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220922195136.54575-2-tony.luck@intel.com
2022-10-27x86/MCE/AMD: Clear DFR errors found in THR handlerYazen Ghannam
AMD's MCA Thresholding feature counts errors of all severity levels, not just correctable errors. If a deferred error causes the threshold limit to be reached (it was the error that caused the overflow), then both a deferred error interrupt and a thresholding interrupt will be triggered. The order of the interrupts is not guaranteed. If the threshold interrupt handler is executed first, then it will clear MCA_STATUS for the error. It will not check or clear MCA_DESTAT which also holds a copy of the deferred error. When the deferred error interrupt handler runs it will not find an error in MCA_STATUS, but it will find the error in MCA_DESTAT. This will cause two errors to be logged. Check for deferred errors when handling a threshold interrupt. If a bank contains a deferred error, then clear the bank's MCA_DESTAT register. Define a new helper function to do the deferred error check and clearing of MCA_DESTAT. [ bp: Simplify, convert comment to passive voice. ] Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220621155943.33623-1-yazen.ghannam@amd.com
2022-10-24x86/resctrl: Remove arch_has_empty_bitmapsBabu Moger
The field arch_has_empty_bitmaps is not required anymore. The field min_cbm_bits is enough to validate the CBM (capacity bit mask) if the architecture can support the zero CBM or not. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/166430979654.372014.615622285687642644.stgit@bmoger-ubuntu
2022-10-22Merge branch 'x86/urgent' into x86/core, to resolve conflictIngo Molnar
There's a conflict between the call-depth tracking commits in x86/core: ee3e2469b346 ("x86/ftrace: Make it call depth tracking aware") 36b64f101219 ("x86/ftrace: Rebalance RSB") eac828eaef29 ("x86/ftrace: Remove ftrace_epilogue()") And these fixes in x86/urgent: 883bbbffa5a4 ("ftrace,kcfi: Separate ftrace_stub() and ftrace_stub_graph()") b5f1fc318440 ("x86/ftrace: Remove ftrace_epilogue()") It's non-trivial overlapping modifications - resolve them. Conflicts: arch/x86/kernel/ftrace_64.S Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-10-20x86/mtrr: Remove unused cyrix_set_all() functionJuergen Gross
The Cyrix CPU specific MTRR function cyrix_set_all() will never be called as the mtrr_ops->set_all() callback will only be called in the use_intel() case, which would require the use_intel_if member of struct mtrr_ops to be set, which isn't the case for Cyrix. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221004081023.32402-3-jgross@suse.com
2022-10-19x86/mtrr: Add comment for set_mtrr_state() serializationJuergen Gross
Add a comment about set_mtrr_state() needing serialization. [ bp: Touchups. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220820092533.29420-2-jgross@suse.com
2022-10-18x86/resctrl: Fix min_cbm_bits for AMDBabu Moger
AMD systems support zero CBM (capacity bit mask) for cache allocation. That is reflected in rdt_init_res_defs_amd() by: r->cache.arch_has_empty_bitmaps = true; However given the unified code in cbm_validate(), checking for: val == 0 && !arch_has_empty_bitmaps is not enough because of another check in cbm_validate(): if ((zero_bit - first_bit) < r->cache.min_cbm_bits) The default value of r->cache.min_cbm_bits = 1. Leading to: $ cd /sys/fs/resctrl $ mkdir foo $ cd foo $ echo L3:0=0 > schemata -bash: echo: write error: Invalid argument $ cat /sys/fs/resctrl/info/last_cmd_status Need at least 1 bits in the mask Initialize the min_cbm_bits to 0 for AMD. Also, remove the default setting of min_cbm_bits and initialize it separately. After the fix: $ cd /sys/fs/resctrl $ mkdir foo $ cd foo $ echo L3:0=0 > schemata $ cat /sys/fs/resctrl/info/last_cmd_status ok Fixes: 316e7f901f5a ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps") Co-developed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com
2022-10-18x86/microcode/AMD: Apply the patch early on every logical threadBorislav Petkov
Currently, the patch application logic checks whether the revision needs to be applied on each logical CPU (SMT thread). Therefore, on SMT designs where the microcode engine is shared between the two threads, the application happens only on one of them as that is enough to update the shared microcode engine. However, there are microcode patches which do per-thread modification, see Link tag below. Therefore, drop the revision check and try applying on each thread. This is what the BIOS does too so this method is very much tested. Btw, change only the early paths. On the late loading paths, there's no point in doing per-thread modification because if is it some case like in the bugzilla below - removing a CPUID flag - the kernel cannot go and un-use features it has detected are there early. For that, one should use early loading anyway. [ bp: Fixes does not contain the oldest commit which did check for equality but that is good enough. ] Fixes: 8801b3fcb574 ("x86/microcode/AMD: Rework container parsing") Reported-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com> Cc: <stable@vger.kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
2022-10-17x86/topology: Fix duplicated core ID within a packageZhang Rui
Today, core ID is assumed to be unique within each package. But an AlderLake-N platform adds a Module level between core and package, Linux excludes the unknown modules bits from the core ID, resulting in duplicate core ID's. To keep core ID unique within a package, Linux must include all APIC-ID bits for known or unknown levels above the core and below the package in the core ID. It is important to understand that core ID's have always come directly from the APIC-ID encoding, which comes from the BIOS. Thus there is no guarantee that they start at 0, or that they are contiguous. As such, naively using them for array indexes can be problematic. [ dhansen: un-known -> unknown ] Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support") Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221014090147.1836-5-rui.zhang@intel.com
2022-10-17x86/topology: Fix multiple packages shown on a single-package systemZhang Rui
CPUID.1F/B does not enumerate Package level explicitly, instead, all the APIC-ID bits above the enumerated levels are assumed to be package ID bits. Current code gets package ID by shifting out all the APIC-ID bits that Linux supports, rather than shifting out all the APIC-ID bits that CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a level that Linux does not support. For example, on a single package AlderLake-N, there are 2 Ecore Modules with 4 atom cores in each module. Linux does not support the Module level and interprets the Module ID bits as package ID and erroneously reports a multi module system as a multi-package system. Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels as package ID. [ dhansen: spelling fix ] Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support") Suggested-by: Len Brown <len.brown@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Len Brown <len.brown@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221014090147.1836-4-rui.zhang@intel.com
2022-10-17x86/bugs: Add retbleed=forcePeter Zijlstra (Intel)
Debug aid, allows running retbleed=force,stuff on non-affected uarchs Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-10-17x86/retbleed: Add call depth tracking mitigationThomas Gleixner
The fully secure mitigation for RSB underflow on Intel SKL CPUs is IBRS, which inflicts up to 30% penalty for pathological syscall heavy work loads. Software based call depth tracking and RSB refill is not perfect, but reduces the attack surface massively. The penalty for the pathological case is about 8% which is still annoying but definitely more palatable than IBRS. Add a retbleed=stuff command line option to enable the call depth tracking and software refill of the RSB. This gives admins a choice. IBeeRS are safe and cause headaches, call depth tracking is considered to be s(t)ufficiently safe. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111149.029587352@infradead.org
2022-10-17x86/percpu: Move irq_stack variables next to current_taskThomas Gleixner
Further extend struct pcpu_hot with the hard and soft irq stack pointers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.599170752@infradead.org
2022-10-17x86/percpu: Move current_top_of_stack next to current_taskThomas Gleixner
Extend the struct pcpu_hot cacheline with current_top_of_stack; another very frequently used value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.493038635@infradead.org
2022-10-17x86/percpu: Move preempt_count next to current_taskThomas Gleixner
Add preempt_count to pcpu_hot, since it is once of the most used per-cpu variables. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.284170644@infradead.org
2022-10-17x86: Put hot per CPU variables into a structThomas Gleixner
The layout of per-cpu variables is at the mercy of the compiler. This can lead to random performance fluctuations from build to build. Create a structure to hold some of the hottest per-cpu variables, starting with current_task. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.179707194@infradead.org
2022-10-17x86/cpu: Re-enable stackprotectorThomas Gleixner
Commit 5416c2663517 ("x86: make sure load_percpu_segment has no stackprotector") disabled the stackprotector for cpu/common.c because of load_percpu_segment(). Back then the boot stack canary was initialized very early in start_kernel(). Switching the per CPU area by loading the GDT caused the stackprotector to fail with paravirt enabled kernels as the GSBASE was not updated yet. In hindsight a wrong change because it would have been sufficient to ensure that the canary is the same in both per CPU areas. Commit d55535232c3d ("random: move rand_initialize() earlier") moved the stack canary initialization to a later point in the init sequence. As a consequence the per CPU stack canary is 0 when switching the per CPU areas, so there is no requirement anymore to exclude this file. Add a comment to load_percpu_segment(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111143.303010511@infradead.org
2022-10-17x86/cpu: Get rid of redundant switch_to_new_gdt() invocationsThomas Gleixner
The only place where switch_to_new_gdt() is required is early boot to switch from the early GDT to the direct GDT. Any other invocation is completely redundant because it does not change anything. Secondary CPUs come out of the ASM code with GDT and GSBASE correctly set up. The same is true for XEN_PV. Remove all the voodoo invocations which are left overs from the ancient past, rename the function to switch_gdt_and_percpu_base() and mark it init. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111143.198076128@infradead.org
2022-10-17x86/cpu: Remove segment load from switch_to_new_gdt()Thomas Gleixner
On 32bit FS and on 64bit GS segments are already set up correctly, but load_percpu_segment() still sets [FG]S after switching from the early GDT to the direct GDT. For 32bit the segment load has no side effects, but on 64bit it causes GSBASE to become 0, which means that any per CPU access before GSBASE is set to the new value is going to fault. That's the reason why the whole file containing this code has stackprotector removed. But that's a pointless exercise for both 32 and 64 bit as the relevant segment selector is already correct. Loading the new GDT does not change that. Remove the segment loads and add comments. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111143.097052006@infradead.org
2022-10-17x86/bugs: Use sysfs_emit()Borislav Petkov
Those mitigations are very talkative; use the printing helper which pays attention to the buffer size. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220809153419.10182-1-bp@alien8.de
2022-10-11treewide: use get_random_u32() when possibleJason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-10-10Merge tag 'mm-stable-2022-10-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ...
2022-10-10Merge tag 'perf-core-2022-10-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: "PMU driver updates: - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature support for Zen 4 processors. - Extend the perf ABI to provide branch speculation information, if available, and use this on CPUs that have it (eg. LbrExtV2). - Improve Intel PEBS TSC timestamp handling & integration. - Add Intel Raptor Lake S CPU support. - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs by utilizing IBS tagged load/store samples. - Clean up & optimize various x86 PMU details. HW breakpoints: - Big rework to optimize the code for systems with hundreds of CPUs and thousands of breakpoints: - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem per-CPU rwsem that is read-locked during most of the key operations. - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and fetch_bp_busy_slots(). - Apply micro-optimizations & cleanups. - Misc cleanups & enhancements" * tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex perf: Fix pmu_filter_match() perf: Fix lockdep_assert_event_ctx() perf/x86/amd/lbr: Adjust LBR regardless of filtering perf/x86/utils: Fix uninitialized var in get_branch_type() perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR perf/x86/amd: Support PERF_SAMPLE_ADDR perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} perf/x86/amd: Support PERF_SAMPLE_DATA_SRC perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} perf/x86/uncore: Add new Raptor Lake S support perf/x86/cstate: Add new Raptor Lake S support perf/x86/msr: Add new Raptor Lake S support perf/x86: Add new Raptor Lake S support bpf: Check flags for branch stack in bpf_read_branch_records helper perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails perf: Use sample_flags for raw_data perf: Use sample_flags for addr ...
2022-10-06Merge tag 'pull-file_inode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file_inode() updates from Al Vrio: "whack-a-mole: cropped up open-coded file_inode() uses..." * tag 'pull-file_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: orangefs: use ->f_mapping _nfs42_proc_copy(): use ->f_mapping instead of file_inode()->i_mapping dma_buf: no need to bother with file_inode()->i_mapping nfs_finish_open(): don't open-code file_inode() bprm_fill_uid(): don't open-code file_inode() sgx: use ->f_mapping... exfat_iterate(): don't open-code file_inode(file) ibmvmc: don't open-code file_inode()
2022-10-04Merge tag 'x86_cleanups_for_v6.1_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - The usual round of smaller fixes and cleanups all over the tree * tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype x86/uaccess: Improve __try_cmpxchg64_user_asm() for x86_32 x86: Fix various duplicate-word comment typos x86/boot: Remove superfluous type casting from arch/x86/boot/bitops.h
2022-10-04Merge tag 'x86_cache_for_v6.1_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control updates from Borislav Petkov: - More work by James Morse to disentangle the resctrl filesystem generic code from the architectural one with the endgoal of plugging ARM's MPAM implementation into it too so that the user interface remains the same - Properly restore the MSR_MISC_FEATURE_CONTROL value instead of blindly overwriting it to 0 * tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data x86/resctrl: Rename and change the units of resctrl_cqm_threshold x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read() x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read() x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() x86/resctrl: Abstract __rmid_read() x86/resctrl: Allow per-rmid arch private storage to be reset x86/resctrl: Add per-rmid arch private storage for overflow and chunks x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks x86/resctrl: Allow update_mba_bw() to update controls directly x86/resctrl: Remove architecture copy of mbps_val x86/resctrl: Switch over to the resctrl mbps_val list x86/resctrl: Create mba_sc configuration in the rdt_domain x86/resctrl: Abstract and use supports_mba_mbps() x86/resctrl: Remove set_mba_sc()s control array re-initialisation x86/resctrl: Add domain offline callback for resctrl work x86/resctrl: Group struct rdt_hw_domain cleanup x86/resctrl: Add domain online callback for resctrl work x86/resctrl: Merge mon_capable and mon_enabled ...
2022-10-04Merge tag 'x86_microcode_for_v6.1_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x75 microcode loader updates from Borislav Petkov: - Get rid of a single ksize() usage - By popular demand, print the previous microcode revision an update was done over - Remove more code related to the now gone MICROCODE_OLD_INTERFACE - Document the problems stemming from microcode late loading * tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Track patch allocation size explicitly x86/microcode: Print previous version of microcode after reload x86/microcode: Remove ->request_microcode_user() x86/microcode: Document the whole late loading problem
2022-10-04Merge tag 'ras_core_for_v6.1_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Fix the APEI MCE callback handler to consult the hardware about the granularity of the memory error instead of hard-coding it - Offline memory pages on Intel machines after 2 errors reported per page * tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Retrieve poison range from hardware RAS/CEC: Reduce offline page threshold for Intel systems
2022-10-04Merge tag 'x86_sgx_for_v6.1_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SGX update from Borislav Petkov: - Improve the documentation of a couple of SGX functions handling backing storage * tag 'x86_sgx_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Improve comments for sgx_encl_lookup/alloc_backing()
2022-10-04Merge tag 'x86_platform_for_v6.1_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform update from Borislav Petkov: "A single x86/platform improvement when the kernel is running as an ACRN guest: - Get TSC and CPU frequency from CPUID leaf 0x40000010 when the kernel is running as a guest on the ACRN hypervisor" * tag 'x86_platform_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/acrn: Set up timekeeping
2022-10-03x86: kmsan: disable instrumentation of unsupported codeAlexander Potapenko
Instrumenting some files with KMSAN will result in kernel being unable to link, boot or crashing at runtime for various reasons (e.g. infinite recursion caused by instrumentation hooks calling instrumented code again). Completely omit KMSAN instrumentation in the following places: - arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386; - arch/x86/entry/vdso, which isn't linked with KMSAN runtime; - three files in arch/x86/kernel - boot problems; - arch/x86/mm/cpu_entry_area.c - recursion. Link: https://lkml.kernel.org/r/20220915150417.722975-33-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-29Merge branch 'v6.0-rc7'Peter Zijlstra
Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-09-26x86/cpu: Include the header of init_ia32_feat_ctl()'s prototypeLuciano Leão
Include the header containing the prototype of init_ia32_feat_ctl(), solving the following warning: $ make W=1 arch/x86/kernel/cpu/feat_ctl.o arch/x86/kernel/cpu/feat_ctl.c:112:6: warning: no previous prototype for ‘init_ia32_feat_ctl’ [-Wmissing-prototypes] 112 | void init_ia32_feat_ctl(struct cpuinfo_x86 *c) This warning appeared after commit 5d5103595e9e5 ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") had moved the function init_ia32_feat_ctl()'s prototype from arch/x86/kernel/cpu/cpu.h to arch/x86/include/asm/cpu.h. Note that, before the commit mentioned above, the header include "cpu.h" (arch/x86/kernel/cpu/cpu.h) was added by commit 0e79ad863df43 ("x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()") solely to fix init_ia32_feat_ctl()'s missing prototype. So, the header include "cpu.h" is no longer necessary. [ bp: Massage commit message. ] Fixes: 5d5103595e9e5 ("x86/cpu: Reinitialize IA32_FEAT_CTL MSR on BSP during wakeup") Signed-off-by: Luciano Leão <lucianorsleao@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nícolas F. R. A. Prado <n@nfraprado.net> Link: https://lore.kernel.org/r/20220922200053.1357470-1-lucianorsleao@gmail.com
2022-09-23x86/resctrl: Make resctrl_arch_rmid_read() return values in bytesJames Morse
resctrl_arch_rmid_read() returns a value in chunks, as read from the hardware. This needs scaling to bytes by mon_scale, as provided by the architecture code. Now that resctrl_arch_rmid_read() performs the overflow and corrections itself, it may as well return a value in bytes directly. This allows the accesses to the architecture specific 'hw' structure to be removed. Move the mon_scale conversion into resctrl_arch_rmid_read(). mbm_bw_count() is updated to calculate bandwidth from bytes. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jamie Iles <quic_jiles@quicinc.com> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Xin Hao <xhao@linux.alibaba.com> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com> Tested-by: Cristian Marussi <cristian.marussi@arm.com> Link: https://lore.kernel.org/r/20220902154829.30399-22-james.morse@arm.com