summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-08Input: zinitix - make sure the IRQ is allocated before it gets enabledNikita Travkin
Since irq request is the last thing in the driver probe, it happens later than the input device registration. This means that there is a small time window where if the open method is called the driver will attempt to enable not yet available irq. Fix that by moving the irq request before the input device registration. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Fixes: 26822652c85e ("Input: add zinitix touchscreen driver") Signed-off-by: Nikita Travkin <nikita@trvn.ru> Link: https://lore.kernel.org/r/20220106072840.36851-2-nikita@trvn.ru Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-01-08ARM: dts: gpio-ranges property is now requiredPhil Elwell
Since [1], added in 5.7, the absence of a gpio-ranges property has prevented GPIOs from being restored to inputs when released. Add those properties for BCM283x and BCM2711 devices. [1] commit 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") Link: https://lore.kernel.org/r/20220104170247.956760-1-linus.walleij@linaro.org Fixes: 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") Fixes: 266423e60ea1 ("pinctrl: bcm2835: Change init order for gpio hogs") Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Reported-by: Florian Fainelli <f.fainelli@gmail.com> Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Phil Elwell <phil@raspberrypi.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20211206092237.4105895-3-phil@raspberrypi.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2022-01-08Merge tag 'soc-fixes-5.16-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Olof Johansson: "A few more fixes have come in, nothing overly severe but would be good to get in by final release: - More specific compatible fields on the qspi controller for socfpga, to enable quirks in the driver - A runtime PM fix for Renesas to fix mismatched reference counts on errors" * tag 'soc-fixes-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: ARM: dts: socfpga: change qspi to "intel,socfpga-qspi" dt-bindings: spi: cadence-quadspi: document "intel,socfpga-qspi" reset: renesas: Fix Runtime PM usage
2022-01-08Merge branch 'i2c/for-current' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Fix the regression with AMD GPU suspend by reverting the handling of bus regulators in the I2C core. Also, there is a fix for the MPC driver to prevent an out-of-bound-access" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: core: support bus regulator controlling in adapter" i2c: mpc: Avoid out of bounds memory access
2022-01-08Merge tag 'for-v5.16-rc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: "Three fixes for the 5.16 cycle: - Avoid going beyond last capacity in the power-supply core - Replace 1E6L with NSEC_PER_MSEC to avoid floating point calculation in LLVM resulting in a build failure - Fix ADC measurements in bq25890 charger driver" * tag 'for-v5.16-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: reset: ltc2952: Fix use of floating point literals power: bq25890: Enable continuous conversion for ADC at charging power: supply: core: Break capacity loop
2022-01-08Merge tag 'xfs-5.16-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fix from Darrick Wong: - Make the old ALLOCSP ioctl behave in a consistent manner with newer syscalls like fallocate. * tag 'xfs-5.16-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate
2022-01-07Merge branch 'for-5.16-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "This contains the cgroup.procs permission check fixes so that they use the credentials at the time of open rather than write, which also fixes the cgroup namespace lifetime bug" * 'for-5.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: selftests: cgroup: Test open-time cgroup namespace usage for migration checks selftests: cgroup: Test open-time credential usage for migration checks selftests: cgroup: Make cg_create() use 0755 for permission instead of 0644 cgroup: Use open-time cgroup namespace for process migration perm checks cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv cgroup: Use open-time credentials for process migraton perm checks
2022-01-07Merge tag 'block-5.16-2022-01-07' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fix from Jens Axboe: "Just the md bitmap regression this time" * tag 'block-5.16-2022-01-07' of git://git.kernel.dk/linux-block: md/raid1: fix missing bitmap update w/o WriteMostly devices
2022-01-07Merge tag 'edac_urgent_for_v5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Tony Luck: "Fix 10nm EDAC driver to release and unmap resources on systems without HBM" * tag 'edac_urgent_for_v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/i10nm: Release mdev/mbase when failing to detect HBM
2022-01-07Revert "i2c: core: support bus regulator controlling in adapter"Wolfram Sang
This largely reverts commit 5a7b95fb993ec399c8a685552aa6a8fc995c40bd. It breaks suspend with AMD GPUs, and we couldn't incrementally fix it. So, let's remove the code and go back to the drawing board. We keep the header extension to not break drivers already populating the regulator. We expect to re-add the code handling it soon. Fixes: 5a7b95fb993e ("i2c: core: support bus regulator controlling in adapter") Reported-by: "Tareque Md.Hanif" <tarequemd.hanif@yahoo.com> Link: https://lore.kernel.org/r/1295184560.182511.1639075777725@mail.yahoo.com Reported-by: Konstantin Kharlamov <hi-angel@yandex.ru> Link: https://lore.kernel.org/r/7143a7147978f4104171072d9f5225d2ce355ec1.camel@yandex.ru BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1850 Tested-by: "Tareque Md.Hanif" <tarequemd.hanif@yahoo.com> Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru> Signed-off-by: Wolfram Sang <wsa@kernel.org> Cc: <stable@vger.kernel.org> # 5.14+
2022-01-07Revert "libtraceevent: Increase libtraceevent logging when verbose"Arnaldo Carvalho de Melo
This reverts commit 08efcb4a638d260ef7fcbae64ecf7ceceb3f1841. This breaks the build as it will prefer using libbpf-devel header files, even when not using LIBBPF_DYNAMIC=1, breaking the build. This was detected on OpenSuSE Tumbleweed with libtraceevent-devel 1.3.0, as described by Jiri Slaby: ======================================================================= It breaks build with LIBTRACEEVENT_DYNAMIC and version 1.3.0: > util/debug.c: In function ‘perf_debug_option’: > util/debug.c:243:17: error: implicit declaration of function ‘tep_set_loglevel’ [-Werror=implicit-function-declaration] > 243 | tep_set_loglevel(TEP_LOG_INFO); > | ^~~~~~~~~~~~~~~~ > util/debug.c:243:34: error: ‘TEP_LOG_INFO’ undeclared (first use in this function); did you mean ‘TEP_PRINT_INFO’? > 243 | tep_set_loglevel(TEP_LOG_INFO); > | ^~~~~~~~~~~~ > | TEP_PRINT_INFO > util/debug.c:243:34: note: each undeclared identifier is reported only once for each function it appears in > util/debug.c:245:34: error: ‘TEP_LOG_DEBUG’ undeclared (first use in this function) > 245 | tep_set_loglevel(TEP_LOG_DEBUG); > | ^~~~~~~~~~~~~ > util/debug.c:247:34: error: ‘TEP_LOG_ALL’ undeclared (first use in this function) > 247 | tep_set_loglevel(TEP_LOG_ALL); > | ^~~~~~~~~~~ It is because the gcc's command line looks like: gcc ... -I/home/abuild/rpmbuild/BUILD/tools/lib/ ... -DLIBTRACEEVENT_VERSION=65790 ... ======================================================================= The proper way to fix this is more involved and so not suitable for this late in the 5.16-rc stage. Reported-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/lkml/bc2b0786-8965-1bcd-2316-9d9bb37b9c31@kernel.org Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/lkml/YddGjjmlMZzxUZbN@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-07perf trace: Avoid early exit due to running SIGCHLD handler before it makes ↵Jiri Olsa
sense to When running 'perf trace' with an BPF object like: # perf trace -e openat,tools/perf/examples/bpf/hello.c the event parsing eventually calls llvm__get_kbuild_opts() that runs a script and that ends up with SIGCHLD delivered to the 'perf trace' handler, which assumes the workload process is done and quits 'perf trace'. Move the SIGCHLD handler setup directly to trace__run(), where the event is parsed and the object is already compiled. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Christy Lee <christyc.y.lee@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220106222030.227499-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-07kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUIDJing Liu
KVM_GET_SUPPORTED_CPUID should not include any dynamic xstates in CPUID[0xD] if they have not been requested with prctl. Otherwise a process which directly passes KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2 would now fail even if it doesn't intend to use a dynamically enabled feature. Userspace must know that prctl is required and allocate >4K xstate buffer before setting any dynamic bit. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-5-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07kvm: x86: Fix xstate_required_size() to follow XSTATE alignment ruleJing Liu
CPUID.0xD.1.EBX enumerates the size of the XSAVE area (in compacted format) required by XSAVES. If CPUID.0xD.i.ECX[1] is set for a state component (i), this state component should be located on the next 64-bytes boundary following the preceding state component in the compacted layout. Fix xstate_required_size() to follow the alignment rule. AMX is the first state component with 64-bytes alignment to catch this bug. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-4-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07x86/fpu: Prepare guest FPU for dynamically enabled FPU featuresThomas Gleixner
To support dynamically enabled FPU features for guests prepare the guest pseudo FPU container to keep track of the currently enabled xfeatures and the guest permissions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-3-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07x86/fpu: Extend fpu_xstate_prctl() with guest permissionsThomas Gleixner
KVM requires a clear separation of host user space and guest permissions for dynamic XSTATE components. Add a guest permissions member to struct fpu and a separate set of prctl() arguments: ARCH_GET_XCOMP_GUEST_PERM and ARCH_REQ_XCOMP_GUEST_PERM. The semantics are equivalent to the host user space permission control except for the following constraints: 1) Permissions have to be requested before the first vCPU is created 2) Permissions are frozen when the first vCPU is created to ensure consistency. Any attempt to expand permissions via the prctl() after that point is rejected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-2-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07kvm: selftests: move ucall declarations into ucall_common.hMichael Roth
Now that core kvm_util declarations have special home in kvm_util_base.h, move ucall-related declarations out into a separate header. Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20211210164620.11636-3-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07kvm: selftests: move base kvm_util.h declarations to kvm_util_base.hMichael Roth
Between helper macros and interfaces that will be introduced in subsequent patches, much of kvm_util.h would end up being declarations specific to ucall. Ideally these could be separated out into a separate header since they are not strictly required for writing guest tests and are mostly self-contained interfaces other than a reliance on a few core declarations like struct kvm_vm. This doesn't make a big difference as far as how tests will be compiled/written since all these interfaces will still be packaged up into a single/common libkvm.a used by all tests, but it is still nice to be able to compartmentalize to improve readabilty and reduce merge conflicts in the future for common tasks like adding new interfaces to kvm_util.h. Furthermore, some of the ucall declarations will be arch-specific, requiring various #ifdef'ery in kvm_util.h. Ideally these declarations could live in separate arch-specific headers, e.g. include/<arch>/ucall.h, which would handle arch-specific declarations as well as pulling in common ucall-related declarations shared by all archs. One simple way to do this would be to #include ucall.h at the bottom of kvm_util.h, after declarations it relies upon like struct kvm_vm. This is brittle however, and doesn't scale easily to other sets of interfaces that may be added in the future. Instead, move all declarations currently in kvm_util.h into kvm_util_base.h, then have kvm_util.h #include it. With this change, non-base declarations can be selectively moved/introduced into separate headers, which can then be included in kvm_util.h so that individual tests don't need to be touched. Subsequent patches will then move ucall-related declarations into a separate header to meet the above goals. Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20211210164620.11636-2-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Two small fixes for x86: - lockdep WARN due to missing lock nesting annotation - NULL pointer dereference when accessing debugfs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Check for rmaps allocation KVM: SEV: Mark nested locking of kvm->lock
2022-01-07Merge tag 'drm-fixes-2022-01-07' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "There is only the amdgpu runtime pm regression fix in here: amdgpu: - suspend/resume fix - fix runtime PM regression" * tag 'drm-fixes-2022-01-07' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: disable runpm if we are the primary adapter fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb drm/amd/pm: keep the BACO feature enabled for suspend
2022-01-07KVM: x86: Check for rmaps allocationNikunj A Dadhania
With TDP MMU being the default now, access to mmu_rmaps_stat debugfs file causes following oops: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 7 PID: 3185 Comm: cat Not tainted 5.16.0-rc4+ #204 RIP: 0010:pte_list_count+0x6/0x40 Call Trace: <TASK> ? kvm_mmu_rmaps_stat_show+0x15e/0x320 seq_read_iter+0x126/0x4b0 ? aa_file_perm+0x124/0x490 seq_read+0xf5/0x140 full_proxy_read+0x5c/0x80 vfs_read+0x9f/0x1a0 ksys_read+0x67/0xe0 __x64_sys_read+0x19/0x20 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fca6fc13912 Return early when rmaps are not present. Reported-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220105040337.4234-1-nikunj@amd.com> Cc: stable@vger.kernel.org Fixes: 3bcd0662d66f ("KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: SEV: Mark nested locking of kvm->lockWanpeng Li
Both source and dest vms' kvm->locks are held in sev_lock_two_vms. Mark one with a different subtype to avoid false positives from lockdep. Fixes: c9d61dcb0bc26 (KVM: SEV: accept signals in sev_lock_two_vms) Reported-by: Yiru Xu <xyru1999@gmail.com> Tested-by: Jinrong Liang <cloudliang@tencent.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1641364863-26331-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: SVM: include CR3 in initial VMSA state for SEV-ES guestsMichael Roth
Normally guests will set up CR3 themselves, but some guests, such as kselftests, and potentially CONFIG_PVH guests, rely on being booted with paging enabled and CR3 initialized to a pre-allocated page table. Currently CR3 updates via KVM_SET_SREGS* are not loaded into the guest VMCB until just prior to entering the guest. For SEV-ES/SEV-SNP, this is too late, since it will have switched over to using the VMSA page prior to that point, with the VMSA CR3 copied from the VMCB initial CR3 value: 0. Address this by sync'ing the CR3 value into the VMCB save area immediately when KVM_SET_SREGS* is issued so it will find it's way into the initial VMSA. Suggested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-Id: <20211216171358.61140-10-michael.roth@amd.com> [Remove vmx_post_set_cr3; add a remark about kvm_set_cr3 not calling the new hook. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: VMX: Provide vmread version using asm-goto-with-outputsPeter Zijlstra
Use asm-goto-output for smaller fast path code. Message-Id: <YbcbbGW2GcMx6KpD@hirez.programming.kicks-ass.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86: Fix wall clock writes in Xen shared_info not to mark page dirtyDavid Woodhouse
When dirty ring logging is enabled, any dirty logging without an active vCPU context will cause a kernel oops. But we've already declared that the shared_info page doesn't get dirty tracking anyway, since it would be kind of insane to mark it dirty every time we deliver an event channel interrupt. Userspace is supposed to just assume it's always dirty any time a vCPU can run or event channels are routed. So stop using the generic kvm_write_wall_clock() and just write directly through the gfn_to_pfn_cache that we already have set up. We can make kvm_write_wall_clock() static in x86.c again now, but let's not remove the 'sec_hi_ofs' argument even though it's not used yet. At some point we *will* want to use that for KVM guests too. Fixes: 629b5348841a ("KVM: x86/xen: update wallclock region") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-6-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel deliveryDavid Woodhouse
This adds basic support for delivering 2 level event channels to a guest. Initially, it only supports delivery via the IRQ routing table, triggered by an eventfd. In order to do so, it has a kvm_xen_set_evtchn_fast() function which will use the pre-mapped shared_info page if it already exists and is still valid, while the slow path through the irqfd_inject workqueue will remap the shared_info page if necessary. It sets the bits in the shared_info page but not the vcpu_info; that is deferred to __kvm_xen_has_interrupt() which raises the vector to the appropriate vCPU. Add a 'verbose' mode to xen_shinfo_test while adding test cases for this. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-5-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/xen: Maintain valid mapping of Xen shared_info pageDavid Woodhouse
Use the newly reinstated gfn_to_pfn_cache to maintain a kernel mapping of the Xen shared_info page so that it can be accessed in atomic context. Note that we do not participate in dirty tracking for the shared info page and we do not explicitly mark it dirty every single tim we deliver an event channel interrupts. We wouldn't want to do that even if we *did* have a valid vCPU context with which to do so. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-4-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: Reinstate gfn_to_pfn_cache with invalidation supportDavid Woodhouse
This can be used in two modes. There is an atomic mode where the cached mapping is accessed while holding the rwlock, and a mode where the physical address is used by a vCPU in guest mode. For the latter case, an invalidation will wake the vCPU with the new KVM_REQ_GPC_INVALIDATE, and the architecture will need to refresh any caches it still needs to access before entering guest mode again. Only one vCPU can be targeted by the wake requests; it's simple enough to make it wake all vCPUs or even a mask but I don't see a use case for that additional complexity right now. Invalidation happens from the invalidate_range_start MMU notifier, which needs to be able to sleep in order to wake the vCPU and wait for it. This means that revalidation potentially needs to "wait" for the MMU operation to complete and the invalidate_range_end notifier to be invoked. Like the vCPU when it takes a page fault in that period, we just spin — fixing that in a future patch by implementing an actual *wait* may be another part of shaving this particularly hirsute yak. As noted in the comments in the function itself, the only case where the invalidate_range_start notifier is expected to be called *without* being able to sleep is when the OOM reaper is killing the process. In that case, we expect the vCPU threads already to have exited, and thus there will be nothing to wake, and no reason to wait. So we clear the KVM_REQUEST_WAIT bit and send the request anyway, then complain loudly if there actually *was* anything to wake up. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: Warn if mark_page_dirty() is called without an active vCPUDavid Woodhouse
The various kvm_write_guest() and mark_page_dirty() functions must only ever be called in the context of an active vCPU, because if dirty ring tracking is enabled it may simply oops when kvm_get_running_vcpu() returns NULL for the vcpu and then kvm_dirty_ring_get() dereferences it. This oops was reported by "butt3rflyh4ck" <butterflyhuangxx@gmail.com> in https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/ That actual bug will be fixed under separate cover but this warning should help to prevent new ones from being added. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211210163625.2886-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07x86/kvm: Silence per-cpu pr_info noise about KVM clocks and steal timeDavid Woodhouse
I made the actual CPU bringup go nice and fast... and then Linux spends half a minute printing stupid nonsense about clocks and steal time for each of 256 vCPUs. Don't do that. Nobody cares. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <20211209150938.3518-12-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86: Update vPMCs when retiring branch instructionsEric Hankland
When KVM retires a guest branch instruction through emulation, increment any vPMCs that are configured to monitor "branch instructions retired," and update the sample period of those counters so that they will overflow at the right time. Signed-off-by: Eric Hankland <ehankland@google.com> [jmattson: - Split the code to increment "branch instructions retired" into a separate commit. - Moved/consolidated the calls to kvm_pmu_trigger_event() in the emulation of VMLAUNCH/VMRESUME to accommodate the evolution of that code. ] Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") Signed-off-by: Jim Mattson <jmattson@google.com> Message-Id: <20211130074221.93635-7-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86: Update vPMCs when retiring instructionsEric Hankland
When KVM retires a guest instruction through emulation, increment any vPMCs that are configured to monitor "instructions retired," and update the sample period of those counters so that they will overflow at the right time. Signed-off-by: Eric Hankland <ehankland@google.com> [jmattson: - Split the code to increment "branch instructions retired" into a separate commit. - Added 'static' to kvm_pmu_incr_counter() definition. - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state == PERF_EVENT_STATE_ACTIVE. ] Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests") Signed-off-by: Jim Mattson <jmattson@google.com> [likexu: - Drop checks for pmc->perf_event or event state or event type - Increase a counter once its umask bits and the first 8 select bits are matched - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf; - Rename kvm_pmu_record_event to kvm_pmu_trigger_event; - Add counter enable and CPL check for kvm_pmu_trigger_event(); ] Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-6-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}()Like Xu
Depending on whether intr should be triggered or not, KVM registers two different event overflow callbacks in the perf_event context. The code skeleton of these two functions is very similar, so the pmc->intr can be stored into pmc from pmc_reprogram_counter() which provides smaller instructions footprint against the u-architecture branch predictor. The __kvm_perf_overflow() can be called in non-nmi contexts and a flag is needed to distinguish the caller context and thus avoid a check on kvm_is_in_guest(), otherwise we might get warnings from suspicious RCU or check_preemption_disabled(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-5-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event()Like Xu
Since we set the same semantic event value for the fixed counter in pmc->eventsel, returning the perf_hw_id for the fixed counter via find_fixed_event() can be painlessly replaced by pmc_perf_hw_id() with the help of pmc_is_fixed() check. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-4-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id()Like Xu
The find_arch_event() returns a "unsigned int" value, which is used by the pmc_reprogram_counter() to program a PERF_TYPE_HARDWARE type perf_event. The returned value is actually the kernel defined generic perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler incoming parameters for better self-explanation. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/pmu: Setup pmc->eventsel for fixed PMCsLike Xu
The current pmc->eventsel for fixed counter is underutilised. The pmc->eventsel can be setup for all known available fixed counters since we have mapping between fixed pmc index and the intel_arch_events array. Either gp or fixed counter, it will simplify the later checks for consistency between eventsel and perf_hw_id. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20211130074221.93635-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86: avoid out of bounds indices for fixed performance countersPaolo Bonzini
Because IceLake has 4 fixed performance counters but KVM only supports 3, it is possible for reprogram_fixed_counters to pass to reprogram_fixed_counter an index that is out of bounds for the fixed_pmc_events array. Ultimately intel_find_fixed_event, which is the only place that uses fixed_pmc_events, handles this correctly because it checks against the size of fixed_pmc_events anyway. Every other place operates on the fixed_counters[] array which is sized according to INTEL_PMC_MAX_FIXED. However, it is cleaner if the unsupported performance counters are culled early on in reprogram_fixed_counters. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT + !URGLai Jiangshan
When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but GUEST_CR3 is still vmx->ept_identity_map_addr if EPT + !URG. So VCPU_EXREG_CR3 is considered to be dirty and GUEST_CR3 needs to be updated in this case. Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211216021938.11752-4-jiangshanlai@gmail.com> Fixes: c62c7bd4f95b ("KVM: VMX: Update vmcs.GUEST_CR3 only when the guest CR3 is dirty") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: x86/mmu: Reconstruct shadow page root if the guest PDPTEs is changedLai Jiangshan
For shadow paging, the page table needs to be reconstructed before the coming VMENTER if the guest PDPTEs is changed. But not all paths that call load_pdptrs() will cause the page tables to be reconstructed. Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots() are used to launch later reconstruction. The commit d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs() when changing CR0.CD and CR0.NW. The commit 21823fbda552("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after load_pdptrs() when rewriting the CR3 with the same value. The commit a91a7c709600("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after load_pdptrs() when changing CR4.PGE. Guests like linux would keep the PDPTEs unchanged for every instance of pagetable, so this missing reconstruction has no problem for linux guests. Fixes: d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed") Fixes: 21823fbda552("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush") Fixes: a91a7c709600("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE") Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211216021938.11752-3-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()Lai Jiangshan
The host CR3 in the vcpu thread can only be changed when scheduling, so commit 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()") changed vmx.c to only save it in vmx_prepare_switch_to_guest(). However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS. vmx_set_host_fs_gs() is called in both places, so rename it to vmx_set_vmcs_host_state() and make it update HOST_CR3. Fixes: 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()") Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211216021938.11752-2-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07Revert "KVM: X86: Update mmu->pdptrs only when it is changed"Paolo Bonzini
This reverts commit 24cd19a28cb7174df502162641d6e1e12e7ffbd9. Sean Christopherson reports: "Commit 24cd19a28cb7 ('KVM: X86: Update mmu->pdptrs only when it is changed') breaks nested VMs with EPT in L0 and PAE shadow paging in L2. Reproducing is trivial, just disable EPT in L1 and run a VM. I haven't investigating how it breaks things." Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07selftests: KVM: sev_migrate_tests: Add mirror command testsPeter Gonda
Add tests to confirm mirror vms can only run correct subset of commands. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Marc Orr <marcorr@google.com> Signed-off-by: Peter Gonda <pgonda@google.com> Message-Id: <20211208191642.3792819-4-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07selftests: KVM: sev_migrate_tests: Fix sev_ioctl()Peter Gonda
TEST_ASSERT in SEV ioctl was allowing errors because it checked return value was good OR the FW error code was OK. This TEST_ASSERT should require both (aka. AND) values are OK. Removes the LAUNCH_START from the mirror VM because this call correctly fails because mirror VMs cannot call this command. Currently issues with the PSP driver functions mean the firmware error is not always reset to SEV_RET_SUCCESS when a call is successful. Mainly sev_platform_init() doesn't correctly set the fw error if the platform has already been initialized. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Marc Orr <marcorr@google.com> Signed-off-by: Peter Gonda <pgonda@google.com> Message-Id: <20211208191642.3792819-3-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07selftests: KVM: sev_migrate_tests: Fix test_sev_mirror()Peter Gonda
Mirrors should not be able to call LAUNCH_START. Remove the call on the mirror to correct the test before fixing sev_ioctl() to correctly assert on this failed ioctl. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Marc Orr <marcorr@google.com> Signed-off-by: Peter Gonda <pgonda@google.com> Message-Id: <20211208191642.3792819-2-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-07Merge tag 'kvm-riscv-5.17-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 5.17, take #1 - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish
2022-01-07Merge tag 'kvmarm-5.17' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.16 - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update
2022-01-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Last pull for 5.16, the reversion has been known for a while now but didn't get a proper fix in time. Looks like we will have several info-leak bugs to take care of going foward. - Revert the patch fixing the DM related crash causing a widespread regression for kernel ULPs. A proper fix just didn't appear this cycle due to the holidays - Missing NULL check on alloc in uverbs - Double free in rxe error paths - Fix a new kernel-infoleak report when forming ah_attr's without GRH's in ucma" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Don't infoleak GRH fields RDMA/uverbs: Check for null return of kmalloc_array Revert "RDMA/mlx5: Fix releasing unallocated memory in dereg MR flow" RDMA/rxe: Prevent double freeing rxe_map_set()
2022-01-06Merge tag 'trace-v5.16-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Three minor tracing fixes: - Fix missing prototypes in sample module for direct functions - Fix check of valid buffer in get_trace_buf() - Fix annotations of percpu pointers" * tag 'trace-v5.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Tag trace_percpu_buffer as a percpu pointer tracing: Fix check for trace_percpu_buffer validity in get_trace_buf() ftrace/samples: Add missing prototypes direct functions
2022-01-06selftests: cgroup: Test open-time cgroup namespace usage for migration checksTejun Heo
When a task is writing to an fd opened by a different task, the perm check should use the cgroup namespace of the latter task. Add a test for it. Tested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-01-06selftests: cgroup: Test open-time credential usage for migration checksTejun Heo
When a task is writing to an fd opened by a different task, the perm check should use the credentials of the latter task. Add a test for it. Tested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Tejun Heo <tj@kernel.org>