Age | Commit message (Collapse) | Author |
|
Since GICv4.1, it has become legal for an implementation to advertise
GICR_{INVLPIR,INVALLR,SYNCR} while having an ITS, allowing for a more
efficient invalidation scheme (no guest command queue contention when
multiple CPUs are generating invalidations).
Provide the invalidation registers as a primitive to their ITS
counterpart. Note that we don't advertise them to the guest yet
(the architecture allows an implementation to do this).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oupton@google.com>
Link: https://lore.kernel.org/r/20220405182327.205520-4-maz@kernel.org
|
|
When disabling LPIs, a guest needs to poll GICR_CTLR.RWP in order
to be sure that the write has taken effect. We so far reported it
as 0, as we didn't advertise that LPIs could be turned off the
first place.
Start tracking this state during which LPIs are being disabled,
and expose the 'in progress' state via the RWP bit.
We also take this opportunity to disallow enabling LPIs and programming
GICR_{PEND,PROP}BASER while LPI disabling is in progress, as allowed by
the architecture (UNPRED behaviour).
We don't advertise the feature to the guest yet (which is allowed by
the architecture).
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220405182327.205520-3-maz@kernel.org
|
|
For TDX guests, the maximum number of vcpus needs to be specified when the
TDX guest VM is initialized (creating the TDX data corresponding to TDX
guest) before creating vcpu. It needs to record the maximum number of
vcpus on VM creation (KVM_CREATE_VM) and return error if the number of
vcpus exceeds it
Because there is already max_vcpu member in arm64 struct kvm_arch, move it
to common struct kvm and initialize it to KVM_MAX_VCPUS before
kvm_arch_init_vm() instead of adding it to x86 struct kvm_arch.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-Id: <e53234cdee6a92357d06c80c03d77c19cdefb804.1646422845.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove unnecessary casts.
Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220329102059.268983-1-yuzhe@nfschina.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 5.18
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
|
|
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220318103729.157574-24-Julia.Lawall@inria.fr
|
|
It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always
result in the pending interrupts being accurately reported if they are
mapped to a HW interrupt. This is particularily visible when acking
the timer interrupt and reading the GICR_ISPENDR1 register immediately
after, for example (the interrupt appears as not-pending while it really
is...).
This is because a HW interrupt has its 'active and pending state' kept
in the *physical* distributor, and not in the virtual one, as mandated
by the spec (this is what allows the direct deactivation). The virtual
distributor only caries the pending and active *states* (note the
plural, as these are two independent and non-overlapping states).
Fix it by reading the HW state back, either from the timer itself or
from the distributor if necessary.
Reported-by: Ricardo Koller <ricarkol@google.com>
Tested-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.17, take #1
- Correctly update the shadow register on exception injection when
running in nVHE mode
- Correctly use the mm_ops indirection when performing cache invalidation
from the page-table walker
- Restrict the vgic-v3 workaround for SEIS to the two known broken
implementations
|
|
Contrary to what df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3
locally generated SErrors") was asserting, there is at least one other
system out there (Cavium ThunderX2) implementing SEIS, and not in
an obviously broken way.
So instead of imposing the M1 workaround on an innocent bystander,
let's limit it to the two known broken Apple implementations.
Fixes: df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors")
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220122103912.795026-1-maz@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.16
- Simplification of the 'vcpu first run' by integrating it into
KVM's 'pid change' flow
- Refactoring of the FP and SVE state tracking, also leading to
a simpler state and less shared data between EL1 and EL2 in
the nVHE case
- Tidy up the header file usage for the nvhe hyp object
- New HYP unsharing mechanism, finally allowing pages to be
unmapped from the Stage-1 EL2 page-tables
- Various pKVM cleanups around refcounting and sharing
- A couple of vgic fixes for bugs that would trigger once
the vcpu xarray rework is merged, but not sooner
- Add minimal support for ARMv8.7's PMU extension
- Rework kvm_pgtable initialisation ahead of the NV work
- New selftest for IRQ injection
- Teach selftests about the lack of default IPA space and
page sizes
- Expand sysreg selftest to deal with Pointer Authentication
- The usual bunch of cleanups and doc update
|
|
* kvm-arm64/vgic-fixes-5.17:
: .
: A few vgic fixes:
: - Harden vgic-v3 error handling paths against signed vs unsigned
: comparison that will happen once the xarray-based vcpus are in
: - Demote userspace-triggered console output to kvm_debug()
: .
KVM: arm64: vgic: Demote userspace-triggered console prints to kvm_debug()
KVM: arm64: vgic-v3: Fix vcpu index comparison
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Running the KVM selftests results in these messages being dumped
in the kernel console:
[ 188.051073] kvm [469]: VGIC redist and dist frames overlap
[ 188.056820] kvm [469]: VGIC redist and dist frames overlap
[ 188.076199] kvm [469]: VGIC redist and dist frames overlap
Being amle to trigger this from userspace is definitely not on,
so demote these warnings to kvm_debug().
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211216104507.1482017-1-maz@kernel.org
|
|
When handling an error at the point where we try and register
all the redistributors, we unregister all the previously
registered frames by counting down from the failing index.
However, the way the code is written relies on that index
being a signed value. Which won't be true once we switch to
an xarray-based vcpu set.
Since this code is pretty awkward the first place, and that the
failure mode is hard to spot, rewrite this loop to iterate
over the vcpus upwards rather than downwards.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211216104526.1482124-1-maz@kernel.org
|
|
* kvm-arm64/pkvm-cleanups-5.17:
: .
: pKVM cleanups from Quentin Perret:
:
: This series is a collection of various fixes and cleanups for KVM/arm64
: when running in nVHE protected mode. The first two patches are real
: fixes/improvements, the following two are minor cleanups, and the last
: two help satisfy my paranoia so they're certainly optional.
: .
KVM: arm64: pkvm: Make kvm_host_owns_hyp_mappings() robust to VHE
KVM: arm64: pkvm: Stub io map functions
KVM: arm64: Make __io_map_base static
KVM: arm64: Make the hyp memory pool static
KVM: arm64: pkvm: Disable GICv2 support
KVM: arm64: pkvm: Fix hyp_pool max order
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
GICv2 requires having device mappings in guests and the hypervisor,
which is incompatible with the current pKVM EL2 page ownership model
which only covers memory. While it would be desirable to support pKVM
with GICv2, this will require a lot more work, so let's make the
current assumption clear until then.
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211208152300.2478542-3-qperret@google.com
|
|
Everywhere we use kvm_for_each_vpcu(), we use an int as the vcpu
index. Unfortunately, we're about to move rework the iterator,
which requires this to be upgrade to an unsigned long.
Let's bite the bullet and repaint all of it in one go.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20211116160403.4074052-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
* kvm-arm64/misc-5.17:
: .
: - Add minimal support for ARMv8.7's PMU extension
: - Constify kvm_io_gic_ops
: - Drop kvm_is_transparent_hugepage() prototype
: .
KVM: Drop stale kvm_is_transparent_hugepage() declaration
KVM: arm64: Constify kvm_io_gic_ops
KVM: arm64: Add minimal handling for the ARMv8.7 PMU
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The only usage of kvm_io_gic_ops is to make a comparison with its
address and to pass its address to kvm_iodevice_init() which takes a
pointer to const kvm_io_device_ops as input. Make it const to allow the
compiler to put it in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211204213518.83642-1-rikard.falkeborn@gmail.com
|
|
With the transition to kvm_arch_vcpu_run_pid_change() to handle
the "run once" activities, it becomes obvious that has_run_once
is now an exact shadow of vcpu->pid.
Replace vcpu->arch.has_run_once with a new vcpu_has_run_once()
helper that directly checks for vcpu->pid, and get rid of the
now unused field.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/memory-accounting:
: .
: Sprinkle a bunch of GFP_KERNEL_ACCOUNT all over the code base
: to better track memory allocation made on behalf of a VM.
: .
KVM: arm64: Add memcg accounting to KVM allocations
KVM: arm64: vgic: Add memcg accounting to vgic allocations
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM
allocations"), it would be better to make arm64 vgic consistent with
common kvm codes.
The memory allocations of VM scope should be charged into VM process
cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT.
There remain a few cases since these allocations are global, not in VM
scope.
Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210907123112.10232-2-justin.he@arm.com
|
|
* kvm-arm64/vgic-fixes-5.16:
: .
: Multiple updates to the GICv3 emulation in order to better support
: the dreadful Apple M1 that only implements half of it, and in a
: broken way...
: .
KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode
KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS
KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible
KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors
KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg
accesses from the guest. From a performance perspective, this is OK
as long as the guest doesn't hammer the GICv3 CPU interface.
In most cases, this is fine, unless the guest actively uses
priorities and switches PMR_EL1 very often. Which is exactly what
happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1.
In these condition, the performance plumets as we hit PMR each time
we mask/unmask interrupts. Not good.
There is however an opportunity for improvement. Careful reading
of the architecture specification indicates that the only GICv3
sysreg belonging to the common group (which contains the SGI
registers, PMR, DIR, CTLR and RPR) that is allowed to generate
a SError is DIR. Everything else is safe.
It is thus possible to substitute the trapping of all the common
group with just that of DIR if it supported by the implementation.
Yes, that's yet another optional bit of the architecture.
So let's just do that, as it leads to some impressive result on
the M1:
Without this change:
bash-5.1# /host/home/maz/hackbench 100 process 1000
Running with 100*40 (== 4000) tasks.
Time: 56.596
With this change:
bash-5.1# /host/home/maz/hackbench 100 process 1000
Running with 100*40 (== 4000) tasks.
Time: 8.649
which is a pretty convincing result.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
|
|
The infamous M1 has a feature nobody else ever implemented,
in the form of the "GIC locally generated SError interrupts",
also known as SEIS for short.
These SErrors are generated when a guest does something that violates
the GIC state machine. It would have been simpler to just *ignore*
the damned thing, but that's not what this HW does. Oh well.
This part of of the architecture is also amazingly under-specified.
There is a whole 10 lines that describe the feature in a spec that
is 930 pages long, and some of these lines are factually wrong.
Oh, and it is deprecated, so the insentive to clarify it is low.
Now, the spec says that this should be a *virtual* SError when
HCR_EL2.AMO is set. As it turns out, that's not always the case
on this CPU, and the SError sometimes fires on the host as a
physical SError. Goodbye, cruel world. This clearly is a HW bug,
and it means that a guest can easily take the host down, on demand.
Thankfully, we have seen systems that were just as broken in the
past, and we have the perfect vaccine for it.
Apple M1, please meet the Cavium ThunderX workaround. All your
GIC accesses will be trapped, sanitised, and emulated. Only the
signalling aspect of the HW will be used. It won't be super speedy,
but it will at least be safe. You're most welcome.
Given that this has only ever been seen on this single implementation,
that the spec is unclear at best and that we cannot trust it to ever
be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
being set.
Tested-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
|
|
There are no more users of vgic_check_ioaddr(). Move its checks to
vgic_check_iorange() and then remove it.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-6-ricarkol@google.com
|
|
Verify that the ITS region does not extend beyond the VM-specified IPA
range (phys_size).
base + size > phys_size AND base < phys_size
Add the missing check into vgic_its_set_attr() which is called when
setting the region.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-5-ricarkol@google.com
|
|
Verify that the GICv2 CPU interface does not extend beyond the
VM-specified IPA range (phys_size).
base + size > phys_size AND base < phys_size
Add the missing check into kvm_vgic_addr() which is called when setting
the region. This patch also enables some superfluous checks for the
distributor (vgic_check_ioaddr was enough as alignment == size for the
distributors).
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-4-ricarkol@google.com
|
|
Verify that the redistributor regions do not extend beyond the
VM-specified IPA range (phys_size). This can happen when using
KVM_VGIC_V3_ADDR_TYPE_REDIST or KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS
with:
base + size > phys_size AND base < phys_size
Add the missing check into vgic_v3_alloc_redist_region() which is called
when setting the regions, and into vgic_v3_check_base() which is called
when attempting the first vcpu-run. The vcpu-run check does not apply to
KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONS because the regions size is known
before the first vcpu-run. Note that using the REDIST_REGIONS API
results in a different check, which already exists, at first vcpu run:
that the number of redist regions is enough for all vcpus.
Finally, this patch also enables some extra tests in
vgic_v3_alloc_redist_region() by calculating "size" early for the legacy
redist api: like checking that the REDIST region can fit all the already
created vcpus.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-3-ricarkol@google.com
|
|
Add the new vgic_check_iorange helper that checks that an iorange is
sane: the start address and size have valid alignments, the range is
within the addressable PA range, start+size doesn't overflow, and the
start wasn't already defined.
No functional change.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211005011921.437353-2-ricarkol@google.com
|
|
When a mapped level interrupt (a timer, for example) is deactivated
by the guest, the corresponding host interrupt is equally deactivated.
However, the fate of the pending state still needs to be dealt
with in SW.
This is specially true when the interrupt was in the active+pending
state in the virtual distributor at the point where the guest
was entered. On exit, the pending state is potentially stale
(the guest may have put the interrupt in a non-pending state).
If we don't do anything, the interrupt will be spuriously injected
in the guest. Although this shouldn't have any ill effect (spurious
interrupts are always possible), we can improve the emulation by
detecting the deactivation-while-pending case and resample the
interrupt.
While we're at it, move the logic into a common helper that can
be shared between the two GIC implementations.
Fixes: e40cc57bac79 ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts")
Reported-by: Raghavendra Rao Ananta <rananta@google.com>
Tested-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org
|
|
vgic_get_irq(intid) is used all over the vgic code in order to get a
reference to a struct irq. It warns whenever intid is not a valid number
(like when it's a reserved IRQ number). The issue is that this warning
can be triggered from userspace (e.g., KVM_IRQ_LINE for intid 1020).
Drop the WARN call from vgic_get_irq.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210818213205.598471-1-ricarkol@google.com
|
|
Remove the repeated word 'the' from two comments.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210728130623.12017-1-wangborong@cdjrlc.com
|
|
In order to deal with these systems that do not offer HW-based
deactivation of interrupts, let implement a SW-based approach:
- When the irq is queued into a LR, treat it as a pure virtual
interrupt and set the EOI flag in the LR.
- When the interrupt state is read back from the LR, force a
deactivation when the state is invalid (neither active nor
pending)
Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We already have the option to attach a callback to an interrupt
to retrieve its pending state. As we are planning to expand this
facility, move this callback into its own data structure.
This will limit the size of individual interrupts as the ops
structures can be shared across multiple interrupts.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The vGIC, as architected by ARM, allows a virtual interrupt to
trigger the deactivation of a physical interrupt. This allows
the following interrupt to be delivered without requiring an exit.
However, some implementations have choosen not to implement this,
meaning that we will need some unsavoury workarounds to deal with this.
On detecting such a case, taint the kernel and spit a nastygram.
We'll deal with this in later patches.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As it turns out, not all the interrupt controllers are able to
expose a vGIC maintenance interrupt that can be independently
enabled/disabled.
And to be fair, it doesn't really matter as all we require is
for the interrupt to kick us out of guest mode out way or another.
To that effect, add gic_kvm_info.no_maint_irq_mask for an interrupt
controller to advertise the lack of masking.
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The vGIC advertising code is unsurprisingly very much tied to
the GIC implementations. However, we are about to extend the
support to lesser implementations.
Let's dissociate the vgic registration from the GIC code and
move it into KVM, where it makes a bit more sense. This also
allows us to mark the gic_kvm_info structures as __initdata.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.
ARM:
- CoreSight: Add support for ETE and TRBE
- Stage-2 isolation for the host kernel when running in protected
mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- AMD PSP driver changes
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disably dirty page logging under read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code
- a handful of "Get rid of oprofile leftovers" patches
- Some selftests improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull CFI on arm64 support from Kees Cook:
"This builds on last cycle's LTO work, and allows the arm64 kernels to
be built with Clang's Control Flow Integrity feature. This feature has
happily lived in Android kernels for almost 3 years[1], so I'm excited
to have it ready for upstream.
The wide diffstat is mainly due to the treewide fixing of mismatched
list_sort prototypes. Other things in core kernel are to address
various CFI corner cases. The largest code portion is the CFI runtime
implementation itself (which will be shared by all architectures
implementing support for CFI). The arm64 pieces are Acked by arm64
maintainers rather than coming through the arm64 tree since carrying
this tree over there was going to be awkward.
CFI support for x86 is still under development, but is pretty close.
There are a handful of corner cases on x86 that need some improvements
to Clang and objtool, but otherwise works well.
Summary:
- Clean up list_sort prototypes (Sami Tolvanen)
- Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)"
* tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm64: allow CONFIG_CFI_CLANG to be selected
KVM: arm64: Disable CFI for nVHE
arm64: ftrace: use function_nocfi for ftrace_call
arm64: add __nocfi to __apply_alternatives
arm64: add __nocfi to functions that jump to a physical address
arm64: use function_nocfi with __pa_symbol
arm64: implement function_nocfi
psci: use function_nocfi for cpu_resume
lkdtm: use function_nocfi
treewide: Change list_sort to use const pointers
bpf: disable CFI in dispatcher functions
kallsyms: strip ThinLTO hashes from static functions
kthread: use WARN_ON_FUNCTION_MISMATCH
workqueue: use WARN_ON_FUNCTION_MISMATCH
module: ensure __cfi_check alignment
mm: add generic function_nocfi macro
cfi: add __cficanonical
add support for Clang CFI
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The usual updates from the irq departement:
Core changes:
- Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can
be cleaned up which either use a seperate mechanism to prevent
auto-enable at request time or have a racy mechanism which disables
the interrupt right after request.
- Get rid of the last usage of irq_create_identity_mapping() and
remove the interface.
- An overhaul of tasklet_disable().
Most usage sites of tasklet_disable() are in task context and
usually in cleanup, teardown code pathes. tasklet_disable()
spinwaits for a tasklet which is currently executed. That's not
only a problem for PREEMPT_RT where this can lead to a live lock
when the disabling task preempts the softirq thread. It's also
problematic in context of virtualization when the vCPU which runs
the tasklet is scheduled out and the disabling code has to spin
wait until it's scheduled back in.
There are a few code pathes which invoke tasklet_disable() from
non-sleepable context. For these a new disable variant which still
spinwaits is provided which allows to switch tasklet_disable() to a
sleep wait mechanism. For the atomic use cases this does not solve
the live lock issue on PREEMPT_RT. That is mitigated by blocking on
the RT specific softirq lock.
- The PREEMPT_RT specific implementation of softirq processing and
local_bh_disable/enable().
On RT enabled kernels soft interrupt processing happens always in
task context and all interrupt handlers, which are not explicitly
marked to be invoked in hard interrupt context are forced into task
context as well. This allows to protect against softirq processing
with a per CPU lock, which in turn allows to make BH disabled
regions preemptible.
Most of the softirq handling code is still shared. The RT/non-RT
specific differences are addressed with a set of inline functions
which provide the context specific functionality. The
local_bh_disable() / local_bh_enable() mechanism are obviously
seperate.
- The usual set of small improvements and cleanups
Driver changes:
- New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt
controllers
- Extended functionality for MStar, STM32 and SC7280 irq chips
- Enhanced robustness for ARM GICv3/4.1 drivers
- The usual set of cleanups and improvements all over the place"
* tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP
irqchip/gic-v3: Do not enable irqs when handling spurious interrups
dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller
irqchip: Add support for IDT 79rc3243x interrupt controller
irqdomain: Drop references to recusive irqdomain setup
irqdomain: Get rid of irq_create_strict_mappings()
irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
ARM: PXA: Kill use of irq_create_strict_mappings()
irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection
irqchip/tb10x: Use 'fallthrough' to eliminate a warning
genirq: Reduce irqdebug cacheline bouncing
kernel: Initialize cpumask before parsing
irqchip/wpcm450: Drop COMPILE_TEST
irqchip/irq-mst: Support polarity configuration
irqchip: Add driver for WPCM450 interrupt controller
dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic
dt-bindings: qcom,pdc: Add compatible for sc7280
irqchip/stm32: Add usart instances exti direct event support
irqchip/gic-v3: Fix OF_BAD_ADDR error handling
irqchip/sifive-plic: Mark two global variables __ro_after_init
...
|
|
GIC CPU interfaces versions predating GIC v4.1 were not built to
accommodate vINTID within the vSGI range; as reported in the GIC
specifications (8.2 "Changes to the CPU interface"), it is
CONSTRAINED UNPREDICTABLE to deliver a vSGI to a PE with
ID_AA64PFR0_EL1.GIC < b0011.
Check the GIC CPUIF version by reading the SYS_ID_AA64_PFR0_EL1.
Disable vSGIs if a CPUIF version < 4.1 is detected to prevent using
vSGIs on systems where they may misbehave.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210317100719.3331-2-lorenzo.pieralisi@arm.com
|
|
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When reading the base address of the a REDIST region
through KVM_VGIC_V3_ADDR_TYPE_REDIST we expect the
redistributor region list to be populated with a single
element.
However list_first_entry() expects the list to be non empty.
Instead we should use list_first_entry_or_null which effectively
returns NULL if the list is empty.
Fixes: dbd9733ab674 ("KVM: arm/arm64: Replace the single rdist region by a list")
Cc: <Stable@vger.kernel.org> # v4.18+
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210412150034.29185-1-eric.auger@redhat.com
|
|
list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.
Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
|
|
Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the
reporting of GICR_TYPER.Last for userspace") temporarily fixed
a bug identified when attempting to access the GICR_TYPER
register before the redistributor region setting, but dropped
the support of the LAST bit.
Emulating the GICR_TYPER.Last bit still makes sense for
architecture compliance though. This patch restores its support
(if the redistributor region was set) while keeping the code safe.
We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
computes whether a redistributor is the highest one of a series
of redistributor contributor pages.
With this new implementation we do not need to have a uaccess
read accessor anymore.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-9-eric.auger@redhat.com
|
|
To improve the readability, we introduce the new
vgic_v3_free_redist_region helper and also rename
vgic_v3_insert_redist_region into vgic_v3_alloc_redist_region
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-8-eric.auger@redhat.com
|
|
vgic_uaccess() takes a struct vgic_io_device argument, converts it
to a struct kvm_io_device and passes it to the read/write accessor
functions, which convert it back to a struct vgic_io_device.
Avoid the indirection by passing the struct vgic_io_device argument
directly to vgic_uaccess_{read,write}.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-7-eric.auger@redhat.com
|
|
On vgic_dist_destroy(), the addresses are not reset. However for
kvm selftest purpose this would allow to continue the test execution
even after a failure when running KVM_RUN. So let's reset the
base addresses.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-5-eric.auger@redhat.com
|
|
vgic_v3_insert_redist_region() may succeed while
vgic_register_all_redist_iodevs fails. For example this happens
while adding a redistributor region overlapping a dist region. The
failure only is detected on vgic_register_all_redist_iodevs when
vgic_v3_check_base() gets called in vgic_register_redist_iodev().
In such a case, remove the newly added redistributor region and free
it.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-4-eric.auger@redhat.com
|
|
The doc says:
"The characteristics of a specific redistributor region can
be read by presetting the index field in the attr data.
Only valid for KVM_DEV_TYPE_ARM_VGIC_V3"
Unfortunately the existing code fails to read the input attr data.
Fixes: 04c110932225 ("KVM: arm/arm64: Implement KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION")
Cc: stable@vger.kernel.org#v4.17+
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-3-eric.auger@redhat.com
|