summaryrefslogtreecommitdiff
path: root/arch/s390/pci
AgeCommit message (Collapse)Author
2025-07-10s390/pci: Do not try re-enabling load/store if device is disabledNiklas Schnelle
commit b97a7972b1f4f81417840b9a2ab0c19722b577d5 upstream. If a device is disabled unblocking load/store on its own is not useful as a full re-enable of the function is necessary anyway. Note that SCLP Write Event Data Action Qualifier 0 (Reset) leaves the device disabled and triggers this case unless the driver already requests a reset. Cc: stable@vger.kernel.org Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-10s390/pci: Fix stale function handles in error handlingNiklas Schnelle
commit 45537926dd2aaa9190ac0fac5a0fbeefcadfea95 upstream. The error event information for PCI error events contains a function handle for the respective function. This handle is generally captured at the time the error event was recorded. Due to delays in processing or cascading issues, it may happen that during firmware recovery multiple events are generated. When processing these events in order Linux may already have recovered an affected function making the event information stale. Fix this by doing an unconditional CLP List PCI function retrieving the current function handle with the zdev->state_lock held and ignoring the event if its function handle is stale. Cc: stable@vger.kernel.org Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Fix __pcilg_mio_inuser() inline assemblyHeiko Carstens
commit c4abe6234246c75cdc43326415d9cff88b7cf06c upstream. Use "a" constraint for the shift operand of the __pcilg_mio_inuser() inline assembly. The used "d" constraint allows the compiler to use any general purpose register for the shift operand, including register zero. If register zero is used this my result in incorrect code generation: 8f6: a7 0a ff f8 ahi %r0,-8 8fa: eb 32 00 00 00 0c srlg %r3,%r2,0 <---- If register zero is selected to contain the shift value, the srlg instruction ignores the contents of the register and always shifts zero bits. Therefore use the "a" constraint which does not permit to select register zero. Fixes: f058599e22d5 ("s390/pci: Fix s390_mmio_read/write with MIO") Cc: stable@vger.kernel.org Reported-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Serialize device addition and removalNiklas Schnelle
commit 774a1fa880bc949d88b5ddec9494a13be733dfa8 upstream. Prior changes ensured that when zpci_release_device() is called and it removed the zdev from the zpci_list this instance can not be found via the zpci_list anymore even while allowing re-add of reserved devices. This only accounts for the overall lifetime and zpci_list addition and removal, it does not yet prevent concurrent add of a new instance for the same underlying device. Such concurrent add would subsequently cause issues such as attempted re-use of the same IOMMU sysfs directory and is generally undesired. Introduce a new zpci_add_remove_lock mutex to serialize adding a new device with removal. Together this ensures that if a struct zpci_dev is not found in the zpci_list it was either already removed and torn down, or its removal and tear down is in progress with the zpci_add_remove_lock held. Cc: stable@vger.kernel.org Fixes: a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Allow re-add of a reserved but not yet removed deviceNiklas Schnelle
commit 4b1815a52d7eb03b3e0e6742c6728bc16a4b2d1d upstream. The architecture assumes that PCI functions can be removed synchronously as PCI events are processed. This however clashes with the reference counting of struct pci_dev which allows device drivers to hold on to a struct pci_dev reference even as the underlying device is removed. To bridge this gap commit 2a671f77ee49 ("s390/pci: fix use after free of zpci_dev") keeps the struct zpci_dev in ZPCI_FN_STATE_RESERVED state until common code releases the struct pci_dev. Only when all references are dropped, the struct zpci_dev can be removed and freed. Later commit a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") moved the deletion of the struct zpci_dev from the zpci_list in zpci_release_device() to the point where the device is reserved. This was done to prevent handling events for a device that is already being removed, e.g. when the platform generates both PCI event codes 0x304 and 0x308. In retrospect, deletion from the zpci_list in the release function without holding the zpci_list_lock was also racy. A side effect of this handling is that if the underlying device re-appears while the struct zpci_dev is in the ZPCI_FN_STATE_RESERVED state, the new and old instances of the struct zpci_dev and/or struct pci_dev may clash. For example when trying to create the IOMMU sysfs files for the new instance. In this case, re-adding the new instance is aborted. The old instance is removed, and the device will remain absent until the platform issues another event. Fix this by allowing the struct zpci_dev to be brought back up right until it is finally removed. To this end also keep the struct zpci_dev in the zpci_list until it is finally released when all references have been dropped. Deletion from the zpci_list from within the release function is made safe by using kref_put_lock() with the zpci_list_lock. This ensures that the releasing code holds the last reference. Cc: stable@vger.kernel.org Fixes: a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27s390/pci: Remove redundant bus removal and disable from zpci_release_device()Niklas Schnelle
commit d76f9633296785343d45f85199f4138cb724b6d2 upstream. Remove zpci_bus_remove_device() and zpci_disable_device() calls from zpci_release_device(). These calls were done when the device transitioned into the ZPCI_FN_STATE_STANDBY state which is guaranteed to happen before it enters the ZPCI_FN_STATE_RESERVED state. When zpci_release_device() is called the device is known to be in the ZPCI_FN_STATE_RESERVED state which is also checked by a WARN_ON(). Cc: stable@vger.kernel.org Fixes: a46044a92add ("s390/pci: fix zpci_zdev_put() on reserve") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-18s390/pci: Fix missing check for zpci_create_device() error returnNiklas Schnelle
commit 42420c50c68f3e95e90de2479464f420602229fc upstream. The zpci_create_device() function returns an error pointer that needs to be checked before dereferencing it as a struct zpci_dev pointer. Add the missing check in __clp_add() where it was missed when adding the scan_list in the fixed commit. Simply not adding the device to the scan list results in the previous behavior. Cc: stable@vger.kernel.org Fixes: 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFsNiklas Schnelle
commit 8691abd3afaadd816a298503ec1a759df1305d2e upstream. For non-VFs, zpci_bus_is_isolated_vf() should return false because they aren't VFs. While zpci_iov_find_parent_pf() specifically checks if a function is a VF, it then simply returns that there is no parent. The simplistic check for a parent then leads to these functions being confused with isolated VFs and isolating them on their own domain even if sibling PFs should share the domain. Fix this by explicitly checking if a function is not a VF. Note also that at this point the case where RIDs are ignored is already handled and in this case all PCI functions get isolated by being detected in zpci_bus_is_multifunction_root(). Cc: stable@vger.kernel.org Fixes: 2844ddbd540f ("s390/pci: Fix handling of isolated VFs") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-20s390/pci: Fix s390_mmio_read/write syscall page fault handlingNiklas Schnelle
[ Upstream commit 41a0926e82f4963046876ed9a1b5f681be8087a8 ] The s390 MMIO syscalls when using the classic PCI instructions do not cause a page fault when follow_pfnmap_start() fails due to the page not being present. Besides being a general deficiency this breaks vfio-pci's mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on first access. Fix this by following a failed follow_pfnmap_start() with fixup_user_page() and retrying the follow_pfnmap_start(). Also fix a VM_READ vs VM_WRITE mixup in the read syscall. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-1-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-21s390/pci: Fix handling of isolated VFsNiklas Schnelle
commit 2844ddbd540fc84d7571cca65d6c43088e4d6952 upstream. In contrast to the commit message of the fixed commit VFs whose parent PF is not configured are not always isolated, that is put on their own PCI domain. This is because for VFs to be added to an existing PCI domain it is enough for that PCI domain to share the same topology ID or PCHID. Such a matching PCI domain without a parent PF may exist when a PF from the same PCI card created the domain with the VF being a child of a different, non accessible, PF. While not causing technical issues it makes the rules which VFs are isolated inconsistent. Fix this by explicitly checking that the parent PF exists on the PCI domain determined by the topology ID or PCHID before registering the VF. This works because a parent PF which is under control of this Linux instance must be enabled and configured at the point where its child VFs appear because otherwise SR-IOV could not have been enabled on the parent. Fixes: 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") Cc: stable@vger.kernel.org Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-21s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn()Niklas Schnelle
commit 05793884a1f30509e477de9da233ab73584b1c8c upstream. This creates a new zpci_iov_find_parent_pf() function which a future commit can use to find if a VF has a configured parent PF. Use zdev->rid instead of zdev->devfn such that the new function can be used before it has been decided if the RID will be exposed and zdev->devfn is set. Also handle the hypotheical case that the RID is not available but there is an otherwise matching zbus. Fixes: 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") Cc: stable@vger.kernel.org Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17s390/pci: Fix SR-IOV for PFs initially in standbyNiklas Schnelle
commit dc287e4c9149ab54a5003b4d4da007818b5fda3d upstream. Since commit 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") PFs which are not initially configured but in standby are considered isolated. That is they create only a single function PCI domain. Due to the PCI domains being created on discovery, this means that even if they are configured later on, sibling PFs and their child VFs will not be added to their PCI domain breaking SR-IOV expectations. The reason the referenced commit ignored standby PFs for the creation of multi-function PCI subhierarchies, was to work around a PCI domain renumbering scenario on reboot. The renumbering would occur after removing a previously in standby PF, whose domain number is used for its configured sibling PFs and their child VFs, but which itself remained in standby. When this is followed by a reboot, the sibling PF is used instead to determine the PCI domain number of it and its child VFs. In principle it is not possible to know which standby PFs will be configured later and which may be removed. The PCI domain and root bus are pre-requisites for hotplug slots so the decision of which functions belong to which domain can not be postponed. With the renumbering occurring only in rare circumstances and being generally benign, accept it as an oddity and fix SR-IOV for initially standby PFs simply by allowing them to create PCI domains. Cc: stable@vger.kernel.org Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Fixes: 25f39d3dcb48 ("s390/pci: Ignore RID for isolated VFs") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14s390/pci: Fix leak of struct zpci_dev when zpci_add_device() failsNiklas Schnelle
commit 48796104c864cf4dafa80bd8c2ce88f9c92a65ea upstream. Prior to commit 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") the IOMMU was initialized and the device was registered as part of zpci_create_device() with the struct zpci_dev freed if either resulted in an error. With that commit this was moved into a separate function called zpci_add_device(). While this new function logs when adding failed, it expects the caller not to use and to free the struct zpci_dev on error. This difference between it and zpci_create_device() was missed while changing the callers and the incompletely initialized struct zpci_dev may get used in zpci_scan_configured_device in the error path. This then leads to a crash due to the device not being registered with the zbus. It was also not freed in this case. Fix this by handling the error return of zpci_add_device(). Since in this case the zdev was not added to the zpci_list it can simply be discarded and freed. Also make this more explicit by moving the kref_init() into zpci_add_device() and document that zpci_zdev_get()/zpci_zdev_put() must be used after adding. Cc: stable@vger.kernel.org Fixes: 0467cdde8c43 ("s390/pci: Sort PCI functions prior to creating virtual busses") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-14s390/pci: Ignore RID for isolated VFsNiklas Schnelle
[ Upstream commit 25f39d3dcb48bbc824a77d16b3d977f0f3713cfe ] Ensure that VFs used in isolation, that is with their parent PF not visible to the configuration but with their RID exposed, are treated compatibly with existing isolated VF use cases without exposed RID including RoCE Express VFs. This allows creating configurations where one LPAR manages PFs while their child VFs are used by other LPARs. This gives the LPAR managing the PFs a role analogous to that of the hypervisor in a typical use case of passing child VFs to guests. Instead of creating a multifunction struct zpci_bus whenever a PCI function with RID exposed is discovered only create such a bus for configured physical functions and only consider multifunction busses when searching for an existing bus. Additionally only set zdev->devfn to the devfn part of the RID once the function is added to a multifunction bus. This also fixes probing of more than 7 such isolated VFs from the same physical bus. This is because common PCI code in pci_scan_slot() only looks for more functions when pdev->multifunction is set which somewhat counter intutively is not the case for VFs. Note that PFs are looked at before their child VFs is guaranteed because we sort the zpci_list by RID ascending. Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14s390/pci: Use topology ID for multi-function devicesNiklas Schnelle
[ Upstream commit 126034faaac5f356822c4a9bebfa75664da11056 ] The newly introduced topology ID (TID) field in the CLP Query PCI Function explicitly identifies groups of PCI functions whose RIDs belong to the same (sub-)topology. When available use the TID instead of the PCHID to match zPCI busses/domains for multi-function devices. Note that currently only a single PCI bus per TID is supported. This change is required because in future machines the PCHID will not identify a PCI card but a specific port in the case of some multi-port NICs while from a PCI point of view the entire card is a subtopology. Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-14s390/pci: Sort PCI functions prior to creating virtual bussesNiklas Schnelle
[ Upstream commit 0467cdde8c4320bbfdb31a8cff1277b202f677fc ] Instead of relying on the observed but not architected firmware behavior that PCI functions from the same card are listed in ascending RID order in clp_list_pci() ensure this by sorting. To allow for sorting separate the initial clp_list_pci() and creation of the virtual PCI busses. Note that fundamentally in our per-PCI function hotplug design non RID order of discovery is still possible. For example when the two PFs of a two port NIC are hotplugged after initial boot and in descending RID order. In this case the virtual PCI bus would be created by the second PF using that PF's UID as domain number instead of that of the first PF. Thus the domain number would then change from the UID of the second PF to that of the first PF on reboot but there is really nothing we can do about that since changing domain numbers at runtime seems even worse. This only impacts the domain number as the RIDs are consistent and thus even with just the second PF visible it will show up in the correct position on the virtual bus. Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05s390/pci: Fix potential double remove of hotplug slotNiklas Schnelle
[ Upstream commit c4a585e952ca403a370586d3f16e8331a7564901 ] In commit 6ee600bfbe0f ("s390/pci: remove hotplug slot when releasing the device") the zpci_exit_slot() was moved from zpci_device_reserved() to zpci_release_device() with the intention of keeping the hotplug slot around until the device is actually removed. Now zpci_release_device() is only called once all references are dropped. Since the zPCI subsystem only drops its reference once the device is in the reserved state it follows that zpci_release_device() must only deal with devices in the reserved state. Despite that it contains code to tear down from both configured and standby state. For the standby case this already includes the removal of the hotplug slot so would cause a double removal if a device was ever removed in either configured or standby state. Instead of causing a potential double removal in a case that should never happen explicitly WARN_ON() if a device in non-reserved state is released and get rid of the dead code cases. Fixes: 6ee600bfbe0f ("s390/pci: remove hotplug slot when releasing the device") Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Tested-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-12-05iommu/s390: Implement blocking domainMatthew Rosato
[ Upstream commit ecda483339a5151e3ca30d6b82691ef6f1d17912 ] This fixes a crash when surprise hot-unplugging a PCI device. This crash happens because during hot-unplug __iommu_group_set_domain_nofail() attaching the default domain fails when the platform no longer recognizes the device as it has already been removed and we end up with a NULL domain pointer and UAF. This is exactly the case referred to in the second comment in __iommu_device_set_domain() and just as stated there if we can instead attach the blocking domain the UAF is prevented as this can handle the already removed device. Implement the blocking domain to use this handling. With this change, the crash is fixed but we still hit a warning attempting to change DMA ownership on a blocked device. Fixes: c76c067e488c ("s390/pci: Use dma-iommu layer") Co-developed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240910211516.137933-1-mjrosato@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-10-10s390/pci: Handle PCI error codes other than 0x3aNiklas Schnelle
The Linux implementation of PCI error recovery for s390 was based on the understanding that firmware error recovery is a two step process with an optional initial error event to indicate the cause of the error if known followed by either error event 0x3A (Success) or 0x3B (Failure) to indicate whether firmware was able to recover. While this has been the case in testing and the error cases seen in the wild it turns out this is not correct. Instead firmware only generates 0x3A for some error and service scenarios and expects the OS to perform recovery for all PCI events codes except for those indicating permanent error (0x3B, 0x40) and those indicating errors on the function measurement block (0x2A, 0x2B, 0x2C). Align Linux behavior with these expectations. Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery") Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-09-27[tree-wide] finally take no_llseek outAl Viro
no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-23Merge tag 'pci-v6.12-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Wait for device readiness after reset by polling Vendor ID and looking for Configuration RRS instead of polling the Command register and looking for non-error completions, to avoid hardware retries done for RRS on non-Vendor ID reads (Bjorn Helgaas) - Rename CRS Completion Status to RRS ('Request Retry Status') to match PCIe r6.0 spec usage (Bjorn Helgaas) - Clear LBMS bit after a manual link retrain so we don't try to retrain a link when there's no downstream device anymore (Maciej W. Rozycki) - Revert to the original link speed after retraining fails instead of leaving it restricted to 2.5GT/s, so a future device has a chance to use higher speeds (Maciej W. Rozycki) - Wait for each level of downstream bus, not just the first, to become accessible before restoring devices on that bus (Ilpo Järvinen) - Add ARCH_PCI_DEV_GROUPS so s390 can add its own attribute_groups without having to stomp on the core's pdev->dev.groups (Lukas Wunner) Driver binding: - Export pcim_request_region(), a managed counterpart of pci_request_region(), for use by drivers (Philipp Stanner) - Export pcim_iomap_region() and deprecate pcim_iomap_regions() (Philipp Stanner) - Request the PCI BAR used by xboxvideo (Philipp Stanner) - Request and map drm/ast BARs with pcim_iomap_region() (Philipp Stanner) MSI: - Add MSI_FLAG_NO_AFFINITY flag for devices that mux MSIs onto a single IRQ line and cannot set the affinity of each MSI to a specific CPU core (Marek Vasut) - Use MSI_FLAG_NO_AFFINITY and remove unnecessary .irq_set_affinity() implementations in aardvark, altera, brcmstb, dwc, mediatek-gen3, mediatek, mobiveil, plda, rcar, tegra, vmd, xilinx-nwl, xilinx-xdma, and xilinx drivers to avoid 'IRQ: set affinity failed' warnings (Marek Vasut) Power management: - Add pwrctl support for ATH11K inside the WCN6855 package (Konrad Dybcio) PCI device hotplug: - Remove unnecessary hpc_ops struct from shpchp (ngn) - Check for PCI_POSSIBLE_ERROR(), not 0xffffffff, in cpqphp (weiyufeng) Virtualization: - Mark Creative Labs EMU20k2 INTx masking as broken (Alex Williamson) - Add an ACS quirk for Qualcomm SA8775P, which doesn't advertise ACS but does provide ACS-like features (Subramanian Ananthanarayanan) IOMMU: - Add function 0 DMA alias quirk for Glenfly Arise audio function, which uses the function 0 Requester ID (WangYuli) NPEM: - Add Native PCIe Enclosure Management (NPEM) support for sysfs control of NVMe RAID storage indicators (ok/fail/locate/ rebuild/etc) (Mariusz Tkaczyk) - Add support for the ACPI _DSM PCIe SSD status LED management, which is functionally similar to NPEM but mediated by platform firmware (Mariusz Tkaczyk) Device trees: - Drop minItems and maxItems from ranges in PCI generic host binding since host bridges may have several MMIO and I/O port apertures (Frank Li) - Add kirin, rcar-gen2, uniphier DT binding top-level constraints for clocks (Krzysztof Kozlowski) Altera PCIe controller driver: - Convert altera DT bindings from text to YAML (Matthew Gerlach) - Replace TLP_REQ_ID() with macro PCI_DEVID(), which does the same thing and is what other drivers use (Jinjie Ruan) Broadcom STB PCIe controller driver: - Add DT binding maxItems for reset controllers (Jim Quinlan) - Use the 'bridge' reset method if described in the DT (Jim Quinlan) - Use the 'swinit' reset method if described in the DT (Jim Quinlan) - Add 'has_phy' so the existence of a 'rescal' reset controller doesn't imply software control of it (Jim Quinlan) - Add support for many inbound DMA windows (Jim Quinlan) - Rename SoC 'type' to 'soc_base' express the fact that SoCs come in families of multiple similar devices (Jim Quinlan) - Add Broadcom 7712 DT description and driver support (Jim Quinlan) - Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings for maintainability (Bjorn Helgaas) Freescale i.MX6 PCIe controller driver: - Add imx6q-pcie 'dbi2' and 'atu' reg-names for i.MX8M Endpoints (Richard Zhu) - Fix a code restructuring error that caused i.MX8MM and i.MX8MP Endpoints to fail to establish link (Richard Zhu) - Fix i.MX8MP Endpoint occasional failure to trigger MSI by enforcing outbound alignment requirement (Richard Zhu) - Call phy_power_off() in the .probe() error path (Frank Li) - Rename internal names from imx6_* to imx_* since i.MX7/8/9 are also supported (Frank Li) - Manage Refclk by using SoC-specific callbacks instead of switch statements (Frank Li) - Manage core reset by using SoC-specific callbacks instead of switch statements (Frank Li) - Expand comments for erratum ERR010728 workaround (Frank Li) - Use generic PHY APIs to configure mode, speed, and submode, which is harmless for devices that implement their own internal PHY management and don't set the generic imx_pcie->phy (Frank Li) - Add i.MX8Q (i.MX8QM, i.MX8QXP, and i.MX8DXL) DT binding and driver Root Complex support (Richard Zhu) Freescale Layerscape PCIe controller driver: - Replace layerscape-pcie DT binding compatible fsl,lx2160a-pcie with fsl,lx2160ar2-pcie (Frank Li) - Add layerscape-pcie DT binding deprecated 'num-viewport' property to address a DT checker warning (Frank Li) - Change layerscape-pcie DT binding 'fsl,pcie-scfg' to phandle-array (Frank Li) Loongson PCIe controller driver: - Increase max PCI hosts to 8 for Loongson-3C6000 and newer chipsets (Huacai Chen) Marvell Aardvark PCIe controller driver: - Fix issue with emulating Configuration RRS for two-byte reads of Vendor ID; previously it only worked for four-byte reads (Bjorn Helgaas) MediaTek PCIe Gen3 controller driver: - Add per-SoC struct mtk_gen3_pcie_pdata to support multiple SoC types (Lorenzo Bianconi) - Use reset_bulk APIs to manage PHY reset lines (Lorenzo Bianconi) - Add DT and driver support for Airoha EN7581 PCIe controller (Lorenzo Bianconi) Qualcomm PCIe controller driver: - Update qcom,pcie-sc7280 DT binding with eight interrupts (Rayyan Ansari) - Add back DT 'vddpe-3v3-supply', which was incorrectly removed earlier (Johan Hovold) - Drop endpoint redundant masking of global IRQ events (Manivannan Sadhasivam) - Clarify unknown global IRQ message and only log it once to avoid a flood (Manivannan Sadhasivam) - Add 'linux,pci-domain' property to endpoint DT binding (Manivannan Sadhasivam) - Assign PCI domain number for endpoint controllers (Manivannan Sadhasivam) - Add 'qcom_pcie_ep' and the PCI domain number to IRQ names for endpoint controller (Manivannan Sadhasivam) - Add global SPI interrupt for PCIe link events to DT binding (Manivannan Sadhasivam) - Add global RC interrupt handler to handle 'Link up' events and automatically enumerate hot-added devices (Manivannan Sadhasivam) - Avoid mirroring of DBI and iATU register space so it doesn't overlap BAR MMIO space (Prudhvi Yarlagadda) - Enable controller resources like PHY only after PERST# is deasserted to partially avoid the problem that the endpoint SoC crashes when accessing things when Refclk is absent (Manivannan Sadhasivam) - Add 16.0 GT/s equalization and RX lane margining settings (Shashank Babu Chinta Venkata) - Pass domain number to pci_bus_release_domain_nr() explicitly to avoid a NULL pointer dereference (Manivannan Sadhasivam) Renesas R-Car PCIe controller driver: - Make the read-only const array 'check_addr' static (Colin Ian King) - Add R-Car V4M (R8A779H0) PCIe host and endpoint to DT binding (Yoshihiro Shimoda) TI DRA7xx PCIe controller driver: - Request IRQF_ONESHOT for 'dra7xx-pcie-main' IRQ since the primary handler is NULL (Siddharth Vadapalli) - Handle IRQ request errors during root port and endpoint probe (Siddharth Vadapalli) TI J721E PCIe driver: - Add DT 'ti,syscon-acspcie-proxy-ctrl' and driver support to enable the ACSPCIE module to drive Refclk for the Endpoint (Siddharth Vadapalli) - Extract the cadence link setup from cdns_pcie_host_setup() so link setup can be done separately during resume (Thomas Richard) - Add T_PERST_CLK_US definition for the mandatory delay between Refclk becoming stable and PERST# being deasserted (Thomas Richard) - Add j721e suspend and resume support (Théo Lebrun) TI Keystone PCIe controller driver: - Fix NULL pointer checking when applying MRRS limitation quirk for AM65x SR 1.0 Errata #i2037 (Dan Carpenter) Xilinx NWL PCIe controller driver: - Fix off-by-one error in INTx IRQ handler that caused INTx interrupts to be lost or delivered as the wrong interrupt (Sean Anderson) - Rate-limit misc interrupt messages (Sean Anderson) - Turn off the clock on probe failure and device removal (Sean Anderson) - Add DT binding and driver support for enabling/disabling PHYs (Sean Anderson) - Add PCIe phy bindings for the ZCU102 (Sean Anderson) Xilinx XDMA PCIe controller driver: - Add support for Xilinx QDMA Soft IP PCIe Root Port Bridge to DT binding and xilinx-dma-pl driver (Thippeswamy Havalige) Miscellaneous: - Fix buffer overflow in kirin_pcie_parse_port() (Alexandra Diupina) - Fix minor kerneldoc issues and typos (Bjorn Helgaas) - Use PCI_DEVID() macro in aer_inject() instead of open-coding it (Jinjie Ruan) - Check pcie_find_root_port() return in x86 fixups to avoid NULL pointer dereferences (Samasth Norway Ananda) - Make pci_bus_type constant (Kunwu Chan) - Remove unused declarations of __pci_pme_wakeup() and pci_vpd_release() (Yue Haibing) - Remove any leftover .*.cmd files with make clean (zhang jiao) - Remove unused BILLION macro (zhang jiao)" * tag 'pci-v6.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (132 commits) PCI: Fix typos dt-bindings: PCI: qcom: Allow 'vddpe-3v3-supply' again tools: PCI: Remove unused BILLION macro tools: PCI: Remove .*.cmd files with make clean PCI: Pass domain number to pci_bus_release_domain_nr() explicitly PCI: dra7xx: Fix error handling when IRQ request fails in probe PCI: dra7xx: Fix threaded IRQ request for "dra7xx-pcie-main" IRQ PCI: qcom: Add RX lane margining settings for 16.0 GT/s PCI: qcom: Add equalization settings for 16.0 GT/s PCI: dwc: Always cache the maximum link speed value in dw_pcie::max_link_speed PCI: dwc: Rename 'dw_pcie::link_gen' to 'dw_pcie::max_link_speed' PCI: qcom-ep: Enable controller resources like PHY only after refclk is available PCI: Mark Creative Labs EMU20k2 INTx masking as broken dt-bindings: PCI: imx6q-pcie: Add reg-name "dbi2" and "atu" for i.MX8M PCIe Endpoint dt-bindings: PCI: altera: msi: Convert to YAML PCI: imx6: Add i.MX8Q PCIe Root Complex (RC) support PCI: Rename CRS Completion Status to RRS PCI: aardvark: Correct Configuration RRS checking PCI: Wait for device readiness with Configuration RRS PCI: brcmstb: Sort enums, pcie_offsets[], pcie_cfg_data, .compatible strings ...
2024-09-17s390/pci_mmio: use follow_pfnmap APIPeter Xu
Use the new API that can understand huge pfn mappings. Link: https://lkml.kernel.org/r/20240826204353.2228736-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-09s390/pci: Stop usurping pdev->dev.groupsLukas Wunner
Bjorn suggests using pdev->dev.groups for attribute_groups constructed on PCI device enumeration: "Is it feasible to build an attribute group in pci_doe_init() and add it to dev->groups so device_add() will automatically add them?" https://lore.kernel.org/r/20231019165829.GA1381099@bhelgaas Unfortunately on s390, pcibios_device_add() usurps pdev->dev.groups for arch-specific attribute_groups, preventing its use for anything else. Introduce an ARCH_PCI_DEV_GROUPS macro which arches can define in <asm/pci.h>. The macro is visible in drivers/pci/pci-sysfs.c through the inclusion of <linux/pci.h>, which in turn includes <asm/pci.h>. On s390, define the macro to the three attribute_groups previously assigned to pdev->dev.groups. Thereby pdev->dev.groups is made available for use by the PCI core. As a side effect, arch/s390/pci/pci_sysfs.c no longer needs to be compiled into the kernel if CONFIG_SYSFS=n. Link: https://lore.kernel.org/r/7b970f7923e373d1b23784721208f93418720485.1722870934.git.lukas@wunner.de Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
2024-07-23s390/pci: Allow allocation of more than 1 MSI interruptGerd Bayer
On a PCI adapter that provides up to 8 MSI interrupt sources the s390 implementation of PCI interrupts rejected to accommodate them, although the underlying hardware is able to support that. For MSI-X it is sufficient to allocate a single irq_desc per msi_desc, but for MSI multiple irq descriptors are attached to and controlled by a single msi descriptor. Add the appropriate loops to maintain multiple irq descriptors and tie/untie them to/from the appropriate AIBV bit, if a device driver allocates more than 1 MSI interrupt. Common PCI code passes on requests to allocate a number of interrupt vectors based on the device drivers' demand and the PCI functions' capabilities. However, the root-complex of s390 systems support just a limited number of interrupt vectors per PCI function. Produce a kernel log message to inform about any architecture-specific capping that might be done. With this change, we had a PCI adapter successfully raising interrupts to its device driver via all 8 sources. Fixes: a384c8924a8b ("s390/PCI: Fix single MSI only check") Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-23s390/pci: Refactor arch_setup_msi_irqs()Gerd Bayer
Factor out adapter interrupt allocation from arch_setup_msi_irqs() in preparation for enabling registration of multiple MSIs. Code movement only, no change of functionality intended. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-18Merge tag 's390-6.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Remove restrictions on PAI NNPA and crypto counters, enabling concurrent per-task and system-wide sampling and counting events - Switch to GENERIC_CPU_DEVICES by setting up the CPU present mask in the architecture code and letting the generic code handle CPU bring-up - Add support for the diag204 busy indication facility to prevent undesirable blocking during hypervisor logical CPU utilization queries. Implement results caching - Improve the handling of Store Data SCLP events by suppressing unnecessary warning, preventing buffer release in I/O during failures, and adding timeout handling for Store Data requests to address potential firmware issues - Provide optimized __arch_hweight*() implementations - Remove the unnecessary CPU KOBJ_CHANGE uevents generated during topology updates, as they are unused and also not present on other architectures - Cleanup atomic_ops, optimize __atomic_set() for small values and __atomic_cmpxchg_bool() for compilers supporting flag output constraint - Couple of cleanups for KVM: - Move and improve KVM struct definitions for DAT tables from gaccess.c to a new header - Pass the asce as parameter to sie64a() - Make the crdte() and cspg() page table handling wrappers return a boolean to indicate success, like the other existing "compare and swap" wrappers - Add documentation for HWCAP flags - Switch to obtaining total RAM pages from memblock instead of totalram_pages() during mm init, to ensure correct calculation of zero page size, when defer_init is enabled - Refactor lowcore access and switch to using the get_lowcore() function instead of the S390_lowcore macro - Cleanups for PG_arch_1 and folio handling in UV and hugetlb code - Add missing MODULE_DESCRIPTION() macros - Fix VM_FAULT_HWPOISON handling in do_exception() * tag 's390-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits) s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception() s390/kvm: Move bitfields for dat tables s390/entry: Pass the asce as parameter to sie64a() s390/sthyi: Use cached data when diag is busy s390/sthyi: Move diag operations s390/hypfs_diag: Diag204 busy loop s390/diag: Add busy-indication-facility requirements s390/diag: Diag204 add busy return errno s390/diag: Return errno's from diag204 s390/sclp: Diag204 busy indication facility detection s390/atomic_ops: Make use of flag output constraint s390/atomic_ops: Improve __atomic_set() for small values s390/atomic_ops: Use symbolic names s390/smp: Switch to GENERIC_CPU_DEVICES s390/hwcaps: Add documentation for HWCAP flags s390/pgtable: Make crdte() and cspg() return a value s390/topology: Remove CPU KOBJ_CHANGE uevents s390/sclp: Add timeout to Store Data requests s390/sclp: Prevent release of buffer in I/O s390/sclp: Suppress unnecessary Store Data warning ...
2024-06-18s390: Replace S390_lowcore by get_lowcore()Sven Schnelle
Replace all S390_lowcore usages in arch/s390/ by get_lowcore(). Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-06-17s390/pci: Add missing virt_to_phys() for directed DIBVNiklas Schnelle
In commit 4e4dc65ab578 ("s390/pci: use phys_to_virt() for AIBVs/DIBVs") the setting of dibv_addr was missed when adding virt_to_phys(). This only affects systems with directed interrupt delivery enabled which are not generally available. Fixes: 4e4dc65ab578 ("s390/pci: use phys_to_virt() for AIBVs/DIBVs") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-05-19Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-05-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Aside from the usual things this has an arch update for __iowrite64_copy() used by the RDMA drivers. This API was intended to generate large 64 byte MemWr TLPs on PCI. These days most processors had done this by just repeating writel() in a loop. S390 and some new ARM64 designs require a special helper to get this to generate. - Small improvements and fixes for erdma, efa, hfi1, bnxt_re - Fix a UAF crash after module unload on leaking restrack entry - Continue adding full RDMA support in mana with support for EQs, GID's and CQs - Improvements to the mkey cache in mlx5 - DSCP traffic class support in hns and several bug fixes - Cap the maximum number of MADs in the receive queue to avoid OOM - Another batch of rxe bug fixes from large scale testing - __iowrite64_copy() optimizations for write combining MMIO memory - Remove NULL checks before dev_put/hold() - EFA support for receive with immediate - Fix a recent memleaking regression in a cma error path" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (70 commits) RDMA/cma: Fix kmemleak in rdma_core observed during blktests nvme/rdma use siw RDMA/IPoIB: Fix format truncation compilation errors bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq RDMA/efa: Support QP with unsolicited write w/ imm. receive IB/hfi1: Remove generic .ndo_get_stats64 IB/hfi1: Do not use custom stat allocator RDMA/hfi1: Use RMW accessors for changing LNKCTL2 RDMA/mana_ib: implement uapi for creation of rnic cq RDMA/mana_ib: boundary check before installing cq callbacks RDMA/mana_ib: introduce a helper to remove cq callbacks RDMA/mana_ib: create and destroy RNIC cqs RDMA/mana_ib: create EQs for RNIC CQs RDMA/core: Remove NULL check before dev_{put, hold} RDMA/ipoib: Remove NULL check before dev_{put, hold} RDMA/mlx5: Remove NULL check before dev_{put, hold} RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources. RDMA/core: Add an option to display driver-specific QPs in the rdmatool RDMA/efa: Add shutdown notifier RDMA/mana_ib: Fix missing ret value IB/mlx5: Use __iowrite64_copy() for write combining stores ...
2024-05-05mm: pass VMA instead of MM to follow_pte()David Hildenbrand
... and centralize the VM_IO/VM_PFNMAP sanity check in there. We'll now also perform these sanity checks for direct follow_pte() invocations. For generic_access_phys(), we might now check multiple times: nothing to worry about, really. Link: https://lkml.kernel.org/r/20240410155527.474777-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Sean Christopherson <seanjc@google.com> [KVM] Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Fei Li <fei1.li@intel.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Yonghua Huang <yonghua.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-29s390/pci: Drop unneeded reference to CONFIG_DMIJean Delvare
The S/390 architecture doesn't support SMBIOS, so CONFIG_DMI will never be defined there. So we can simply omit these preprocessing directives and speed up the build a bit. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Niklas Schnelle <schnelle@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20240423162724.3966265a@endymion.delvare Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-04-22s390: Stop using weak symbols for __iowrite64_copy()Jason Gunthorpe
Complete switching the __iowriteXX_copy() routines over to use #define and arch provided inline/macro functions instead of weak symbols. S390 has an implementation that simply calls another memcpy function. Inline this so the callers don't have to do two jumps. Link: https://lore.kernel.org/r/3-v3-1893cd8b9369+1925-mlx5_arm_wc_jgg@nvidia.com Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-03-12Merge tag 's390-6.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Various virtual vs physical address usage fixes - Fix error handling in Processor Activity Instrumentation device driver, and export number of counters with a sysfs file - Allow for multiple events when Processor Activity Instrumentation counters are monitored in system wide sampling - Change multiplier and shift values of the Time-of-Day clock source to improve steering precision - Remove a couple of unneeded GFP_DMA flags from allocations - Disable mmap alignment if randomize_va_space is also disabled, to avoid a too small heap - Various changes to allow s390 to be compiled with LLVM=1, since ld.lld and llvm-objcopy will have proper s390 support witch clang 19 - Add __uninitialized macro to Compiler Attributes. This is helpful with s390's FPU code where some users have up to 520 byte stack frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the intention (performance improvement) of such code sections. - Convert switch_to() to an out-of-line function, and use the generic switch_to header file - Replace the usage of s390's debug feature with pr_debug() calls within the zcrypt device driver - Improve hotplug support of the Adjunct Processor device driver - Improve retry handling in the zcrypt device driver - Various changes to the in-kernel FPU code: - Make in-kernel FPU sections preemptible - Convert various larger inline assemblies and assembler files to C, mainly by using singe instruction inline assemblies. This increases readability, but also allows makes it easier to add proper instrumentation hooks - Cleanup of the header files - Provide fast variants of csum_partial() and csum_partial_copy_nocheck() based on vector instructions - Introduce and use a lock to synchronize accesses to zpci device data structures to avoid inconsistent states caused by concurrent accesses - Compile the kernel without -fPIE. This addresses the following problems if the kernel is compiled with -fPIE: - It uses dynamic symbols (.dynsym), for which the linker refuses to allow more than 64k sections. This can break features which use '-ffunction-sections' and '-fdata-sections', including kpatch-build and function granular KASLR - It unnecessarily uses GOT relocations, adding an extra layer of indirection for many memory accesses - Fix shared_cpu_list for CPU private L2 caches, which incorrectly were reported as globally shared * tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits) s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64 s390/cache: prevent rebuild of shared_cpu_list s390/crypto: remove retry loop with sleep from PAES pkey invocation s390/pkey: improve pkey retry behavior s390/zcrypt: improve zcrypt retry behavior s390/zcrypt: introduce retries on in-kernel send CPRB functions s390/ap: introduce mutex to lock the AP bus scan s390/ap: rework ap_scan_bus() to return true on config change s390/ap: clarify AP scan bus related functions and variables s390/ap: rearm APQNs bindings complete completion s390/configs: increase number of LOCKDEP_BITS s390/vfio-ap: handle hardware checkstop state on queue reset operation s390/pai: change sampling event assignment for PMU device driver s390/boot: fix minor comment style damages s390/boot: do not check for zero-termination relocation entry s390/boot: make type of __vmlinux_relocs_64_start|end consistent s390/boot: sanitize kaslr_adjust_relocs() function prototype s390/boot: simplify GOT handling s390: vmlinux.lds.S: fix .got.plt assertion s390/boot: workaround current 'llvm-objdump -t -j ...' behavior ...
2024-02-21s390: use the correct count for __iowrite64_copy()Jason Gunthorpe
The signature for __iowrite64_copy() requires the number of 64 bit quantities, not bytes. Multiple by 8 to get to a byte length before invoking zpci_memcpy_toio() Fixes: 87bc359b9822 ("s390/pci: speed up __iowrite64_copy by using pci store block insn") Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/0-v1-9223d11a7662+1d7785-s390_iowrite64_jgg@nvidia.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20s390/pci: fix three typos in commentsGerd Bayer
Found and fixed these while working on synchronizing the state handling of zpci_dev's. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20s390/pci: remove hotplug slot when releasing the deviceGerd Bayer
Centralize the removal so all paths are covered and the hotplug slot will remain active until the device is really destroyed. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20s390/pci: introduce lock to synchronize state of zpci_dev'sGerd Bayer
There's a number of tasks that need the state of a zpci device to be stable. Other tasks need to be synchronized as they change the state. State changes could be generated by the system as availability or error events, or be requested by the user through manipulations in sysfs. Some other actions accessible through sysfs - like device resets - need the state to be stable. Unsynchronized state handling could lead to unusable devices. This has been observed in cases of concurrent state changes through systemd udev rules and DPM boot control. Some breakage can be provoked by artificial tests, e.g. through repetitively injecting "recover" on a PCI function through sysfs while running a "hotplug remove/add" in a loop through a PCI slot's "power" attribute in sysfs. After a few iterations this could result in a kernel oops. So introduce a new mutex "state_lock" to guard the state property of the struct zpci_dev. Acquire this lock in all task that modify the state: - hotplug add and remove, through the PCI hotplug slot entry, - avaiability events, as reported by the platform, - error events, as reported by the platform, - during device resets, explicit through sysfs requests or implict through the common PCI layer. Break out an inner _do_recover() routine out of recover_store() to separte the necessary synchronizations from the actual manipulations of the zpci_dev required for the reset. With the following changes I was able to run the inject loops for hours without hitting an error. Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20s390/pci: rename lock member in struct zpci_devGerd Bayer
Since this guards only the Function Measurement Block, rename from generic lock to fmb_lock in preparation to introduce another lock that guards the state member Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-01-11s390/pci: fix max size calculation in zpci_memcpy_toio()Niklas Schnelle
The zpci_get_max_write_size() helper is used to determine the maximum size a PCI store or load can use at a given __iomem address. For the PCI block store the following restrictions apply: 1. The dst + len must not cross a 4K boundary in the (pseudo-)MMIO space 2. len must not exceed ZPCI_MAX_WRITE_SIZE 3. len must be a multiple of 8 bytes 4. The src address must be double word (8 byte) aligned 5. The dst address must be double word (8 byte) aligned Otherwise only a normal PCI store which takes its src value from a register can be used. For these PCI store restriction 1 still applies. Similarly 1 also applies to PCI loads. It turns out zpci_max_write_size() instead implements stricter conditions which prevents PCI block stores from being used where they can and should be used. In particular instead of conditions 4 and 5 it wrongly enforces both dst and src to be size aligned. This indirectly covers condition 1 but also prevents many legal PCI block stores. On top of the functional shortcomings the zpci_get_max_write_size() is misnamed as it is used for both read and write size calculations. Rename it to zpci_get_max_io_size() and implement the listed conditions explicitly. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Fixes: cd24834130ac ("s390/pci: base support") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> [agordeev@linux.ibm.com replaced spaces with tabs] Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-11-09Merge tag 'iommu-updates-v6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Make default-domains mandatory for all IOMMU drivers - Remove group refcounting - Add generic_single_device_group() helper and consolidate drivers - Cleanup map/unmap ops - Scaling improvements for the IOVA rcache depot - Convert dart & iommufd to the new domain_alloc_paging() ARM-SMMU: - Device-tree binding update: - Add qcom,sm7150-smmu-v2 for Adreno on SM7150 SoC - SMMUv2: - Support for Qualcomm SDM670 (MDSS) and SM7150 SoCs - SMMUv3: - Large refactoring of the context descriptor code to move the CD table into the master, paving the way for '->set_dev_pasid()' support on non-SVA domains - Minor cleanups to the SVA code Intel VT-d: - Enable debugfs to dump domain attached to a pasid - Remove an unnecessary inline function AMD IOMMU: - Initial patches for SVA support (not complete yet) S390 IOMMU: - DMA-API conversion and optimized IOTLB flushing And some smaller fixes and improvements" * tag 'iommu-updates-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (102 commits) iommu/dart: Remove the force_bypass variable iommu/dart: Call apple_dart_finalize_domain() as part of alloc_paging() iommu/dart: Convert to domain_alloc_paging() iommu/dart: Move the blocked domain support to a global static iommu/dart: Use static global identity domains iommufd: Convert to alloc_domain_paging() iommu/vt-d: Use ops->blocked_domain iommu/vt-d: Update the definition of the blocking domain iommu: Move IOMMU_DOMAIN_BLOCKED global statics to ops->blocked_domain Revert "iommu/vt-d: Remove unused function" iommu/amd: Remove DMA_FQ type from domain allocation path iommu: change iommu_map_sgtable to return signed values iommu/virtio: Add __counted_by for struct viommu_request and use struct_size() iommu/vt-d: debugfs: Support dumping a specified page table iommu/vt-d: debugfs: Create/remove debugfs file per {device, pasid} iommu/vt-d: debugfs: Dump entry pointing to huge page iommu/vt-d: Remove unused function iommu/arm-smmu-v3-sva: Remove bond refcount iommu/arm-smmu-v3-sva: Remove unused iommu_sva handle iommu/arm-smmu-v3: Rename cdcfg to cd_table ...
2023-11-03Merge tag 's390-6.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Get rid of private VM_FAULT flags - Add word-at-a-time implementation - Add DCACHE_WORD_ACCESS support - Cleanup control register handling - Disallow CPU hotplug of CPU 0 to simplify its handling complexity, following a similar restriction in x86 - Optimize pai crypto map allocation - Update the list of crypto express EP11 coprocessor operation modes - Fixes and improvements for secure guests AP pass-through - Several fixes to address incorrect page marking for address translation with the "cmma no-dat" feature, preventing potential incorrect guest TLB flushes - Fix early IPI handling - Several virtual vs physical address confusion fixes - Various small fixes and improvements all over the code * tag 's390-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (74 commits) s390/cio: replace deprecated strncpy with strscpy s390/sclp: replace deprecated strncpy with strtomem s390/cio: fix virtual vs physical address confusion s390/cio: export CMG value as decimal s390: delete the unused store_prefix() function s390/cmma: fix handling of swapper_pg_dir and invalid_pg_dir s390/cmma: fix detection of DAT pages s390/sclp: handle default case in sclp memory notifier s390/pai_crypto: remove per-cpu variable assignement in event initialization s390/pai: initialize event count once at initialization s390/pai_crypto: use PERF_ATTACH_TASK define for per task detection s390/mm: add missing arch_set_page_dat() call to gmap allocations s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc() s390/cmma: fix initial kernel address space page table walk s390/diag: add missing virt_to_phys() translation to diag224() s390/mm,fault: move VM_FAULT_ERROR handling to do_exception() s390/mm,fault: remove VM_FAULT_BADMAP and VM_FAULT_BADACCESS s390/mm,fault: remove VM_FAULT_SIGNAL s390/mm,fault: remove VM_FAULT_BADCONTEXT s390/mm,fault: simplify kfence fault handling ...
2023-10-19s390/pci: fix iommu bitmap allocationNiklas Schnelle
Since the fixed commits both zdev->iommu_bitmap and zdev->lazy_bitmap are allocated as vzalloc(zdev->iommu_pages / 8). The problem is that zdev->iommu_bitmap is a pointer to unsigned long but the above only yields an allocation that is a multiple of sizeof(unsigned long) which is 8 on s390x if the number of IOMMU pages is a multiple of 64. This in turn is the case only if the effective IOMMU aperture is a multiple of 64 * 4K = 256K. This is usually the case and so didn't cause visible issues since both the virt_to_phys(high_memory) reduced limit and hardware limits use nice numbers. Under KVM, and in particular with QEMU limiting the IOMMU aperture to the vfio DMA limit (default 65535), it is possible for the reported aperture not to be a multiple of 256K however. In this case we end up with an iommu_bitmap whose allocation is not a multiple of 8 causing bitmap operations to access it out of bounds. Sadly we can't just fix this in the obvious way and use bitmap_zalloc() because for large RAM systems (tested on 8 TiB) the zdev->iommu_bitmap grows too large for kmalloc(). So add our own bitmap_vzalloc() wrapper. This might be a candidate for common code, but this area of code will be replaced by the upcoming conversion to use the common code DMA API on s390 so just add a local routine. Fixes: 224593215525 ("s390/pci: use virtual memory for iommu bitmap") Fixes: 13954fd6913a ("s390/pci_dma: improve lazy flush for unmap") Cc: stable@vger.kernel.org Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-10-05s390/pci: Fix reset of IOMMU software countersNiklas Schnelle
Together with enabling the Function Measurement Block zpci_fmb_enable_device() also resets the software counters. This allows to use "echo 0 > /sys/kernel/debug/pci/<dev>/statistics" followed by echo "1 > /../statistics" to reset all counters. In commit c76c067e488c ("s390/pci: Use dma-iommu layer") this use of the now obsolete counters in struct zpci_device was missed as was their removal. Fix this by resetting the new counters and removing the old ones. Fixes: c76c067e488c ("s390/pci: Use dma-iommu layer") Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Link: https://lore.kernel.org/r/20231004-dma_iommu_fix-v1-1-129777cd8232@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-02s390/pci: Use dma-iommu layerNiklas Schnelle
While s390 already has a standard IOMMU driver and previous changes have added I/O TLB flushing operations this driver is currently only used for user-space PCI access such as vfio-pci. For the DMA API s390 instead utilizes its own implementation in arch/s390/pci/pci_dma.c which drives the same hardware and shares some code but requires a complex and fragile hand over between DMA API and IOMMU API use of a device and despite code sharing still leads to significant duplication and maintenance effort. Let's utilize the common code DMAP API implementation from drivers/iommu/dma-iommu.c instead allowing us to get rid of arch/s390/pci/pci_dma.c. Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-3-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-10-02s390/pci: prepare is_passed_through() for dma-iommuNiklas Schnelle
With the IOMMU always controlled through the IOMMU driver testing for zdev->s390_domain is not a valid indication of the device being passed-through. Instead test if zdev->kzdev is set. Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230928-dma_iommu-v13-2-9e5fc4dacc36@linux.ibm.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-09-19s390: use control register bit definesHeiko Carstens
Use control register bit defines instead of plain numbers where possible. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-09-19s390/ctlreg: add local and system prefix to some functionsHeiko Carstens
Add local and system prefix to some functions to clarify they change control register contents on either the local CPU or the on all CPUs. This results in the following API: Two defines which load and save multiple control registers. The defines correlate with the following C prototypes: void __local_ctl_load(unsigned long *, unsigned int cr_low, unsigned int cr_high); void __local_ctl_store(unsigned long *, unsigned int cr_low, unsigned int cr_high); Two functions which locally set or clear one bit for a specified control register: void local_ctl_set_bit(unsigned int cr, unsigned int bit); void local_ctl_clear_bit(unsigned int cr, unsigned int bit); Two functions which set or clear one bit for a specified control register on all CPUs: void system_ctl_set_bit(unsigned int cr, unsigned int bit); void system_ctl_clear_bit(unsigend int cr, unsigned int bit); Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2023-08-29Merge tag 'mm-stable-2023-08-28-18-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") - Peter Xu has a series (mm/gup: Unify hugetlb, speed up thp") which reduces the special-case code for handling hugetlb pages in GUP. It also speeds up GUP handling of transparent hugepages. - Peng Zhang provides some maple tree speedups ("Optimize the fast path of mas_store()"). - Sergey Senozhatsky has improved te performance of zsmalloc during compaction (zsmalloc: small compaction improvements"). - Domenico Cerasuolo has developed additional selftest code for zswap ("selftests: cgroup: add zswap test program"). - xu xin has doe some work on KSM's handling of zero pages. These changes are mainly to enable the user to better understand the effectiveness of KSM's treatment of zero pages ("ksm: support tracking KSM-placed zero-pages"). - Jeff Xu has fixes the behaviour of memfd's MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED sysctl ("mm/memfd: fix sysctl MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED"). - David Howells has fixed an fscache optimization ("mm, netfs, fscache: Stop read optimisation when folio removed from pagecache"). - Axel Rasmussen has given userfaultfd the ability to simulate memory poisoning ("add UFFDIO_POISON to simulate memory poisoning with UFFD"). - Miaohe Lin has contributed some routine maintenance work on the memory-failure code ("mm: memory-failure: remove unneeded PageHuge() check"). - Peng Zhang has contributed some maintenance work on the maple tree code ("Improve the validation for maple tree and some cleanup"). - Hugh Dickins has optimized the collapsing of shmem or file pages into THPs ("mm: free retracted page table by RCU"). - Jiaqi Yan has a patch series which permits us to use the healthy subpages within a hardware poisoned huge page for general purposes ("Improve hugetlbfs read on HWPOISON hugepages"). - Kemeng Shi has done some maintenance work on the pagetable-check code ("Remove unused parameters in page_table_check"). - More folioification work from Matthew Wilcox ("More filesystem folio conversions for 6.6"), ("Followup folio conversions for zswap"). And from ZhangPeng ("Convert several functions in page_io.c to use a folio"). - page_ext cleanups from Kemeng Shi ("minor cleanups for page_ext"). - Baoquan He has converted some architectures to use the GENERIC_IOREMAP ioremap()/iounmap() code ("mm: ioremap: Convert architectures to take GENERIC_IOREMAP way"). - Anshuman Khandual has optimized arm64 tlb shootdown ("arm64: support batched/deferred tlb shootdown during page reclamation/migration"). - Better maple tree lockdep checking from Liam Howlett ("More strict maple tree lockdep"). Liam also developed some efficiency improvements ("Reduce preallocations for maple tree"). - Cleanup and optimization to the secondary IOMMU TLB invalidation, from Alistair Popple ("Invalidate secondary IOMMU TLB on permission upgrade"). - Ryan Roberts fixes some arm64 MM selftest issues ("selftests/mm fixes for arm64"). - Kemeng Shi provides some maintenance work on the compaction code ("Two minor cleanups for compaction"). - Some reduction in mmap_lock pressure from Matthew Wilcox ("Handle most file-backed faults under the VMA lock"). - Aneesh Kumar contributes code to use the vmemmap optimization for DAX on ppc64, under some circumstances ("Add support for DAX vmemmap optimization for ppc64"). - page-ext cleanups from Kemeng Shi ("add page_ext_data to get client data in page_ext"), ("minor cleanups to page_ext header"). - Some zswap cleanups from Johannes Weiner ("mm: zswap: three cleanups"). - kmsan cleanups from ZhangPeng ("minor cleanups for kmsan"). - VMA handling cleanups from Kefeng Wang ("mm: convert to vma_is_initial_heap/stack()"). - DAMON feature work from SeongJae Park ("mm/damon/sysfs-schemes: implement DAMOS tried total bytes file"), ("Extend DAMOS filters for address ranges and DAMON monitoring targets"). - Compaction work from Kemeng Shi ("Fixes and cleanups to compaction"). - Liam Howlett has improved the maple tree node replacement code ("maple_tree: Change replacement strategy"). - ZhangPeng has a general code cleanup - use the K() macro more widely ("cleanup with helper macro K()"). - Aneesh Kumar brings memmap-on-memory to ppc64 ("Add support for memmap on memory feature on ppc64"). - pagealloc cleanups from Kemeng Shi ("Two minor cleanups for pcp list in page_alloc"), ("Two minor cleanups for get pageblock migratetype"). - Vishal Moola introduces a memory descriptor for page table tracking, "struct ptdesc" ("Split ptdesc from struct page"). - memfd selftest maintenance work from Aleksa Sarai ("memfd: cleanups for vm.memfd_noexec"). - MM include file rationalization from Hugh Dickins ("arch: include asm/cacheflush.h in asm/hugetlb.h"). - THP debug output fixes from Hugh Dickins ("mm,thp: fix sloppy text output"). - kmemleak improvements from Xiaolei Wang ("mm/kmemleak: use object_cache instead of kmemleak_initialized"). - More folio-related cleanups from Matthew Wilcox ("Remove _folio_dtor and _folio_order"). - A VMA locking scalability improvement from Suren Baghdasaryan ("Per-VMA lock support for swap and userfaults"). - pagetable handling cleanups from Matthew Wilcox ("New page table range API"). - A batch of swap/thp cleanups from David Hildenbrand ("mm/swap: stop using page->private on tail pages for THP_SWAP + cleanups"). - Cleanups and speedups to the hugetlb fault handling from Matthew Wilcox ("Change calling convention for ->huge_fault"). - Matthew Wilcox has also done some maintenance work on the MM subsystem documentation ("Improve mm documentation"). * tag 'mm-stable-2023-08-28-18-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (489 commits) maple_tree: shrink struct maple_tree maple_tree: clean up mas_wr_append() secretmem: convert page_is_secretmem() to folio_is_secretmem() nios2: fix flush_dcache_page() for usage from irq context hugetlb: add documentation for vma_kernel_pagesize() mm: add orphaned kernel-doc to the rst files. mm: fix clean_record_shared_mapping_range kernel-doc mm: fix get_mctgt_type() kernel-doc mm: fix kernel-doc warning from tlb_flush_rmaps() mm: remove enum page_entry_size mm: allow ->huge_fault() to be called without the mmap_lock held mm: move PMD_ORDER to pgtable.h mm: remove checks for pte_index memcg: remove duplication detection for mem_cgroup_uncharge_swap mm/huge_memory: work on folio->swap instead of page->private when splitting folio mm/swap: inline folio_set_swap_entry() and folio_swap_entry() mm/swap: use dedicated entry for swap in folio mm/swap: stop using page->private on tail pages for THP_SWAP selftests/mm: fix WARNING comparing pointer to 0 selftests: cgroup: fix test_kmem_memcg_deletion kernel mem check ...
2023-08-23s390/pci: use builtin_misc_device macro to simplify the codeLi Zetao
Use the builtin_misc_device macro to simplify the code, which is the same as declaring with device_initcall(). Signed-off-by: Li Zetao <lizetao1@huawei.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://lore.kernel.org/r/20230815080833.1103609-1-lizetao1@huawei.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>