Age | Commit message (Collapse) | Author |
|
The s390x ISM device data sheet clearly states that only one
request-response sequence is allowable per ISM function at any point in
time. Unfortunately as of today the s390/ism driver in Linux does not
honor that requirement. This patch aims to rectify that.
This problem was discovered based on Aliaksei's bug report which states
that for certain workloads the ISM functions end up entering error state
(with PEC 2 as seen from the logs) after a while and as a consequence
connections handled by the respective function break, and for future
connection requests the ISM device is not considered -- given it is in a
dysfunctional state. During further debugging PEC 3A was observed as
well.
A kernel message like
[ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a
is a reliable indicator of the stated function entering error state
with PEC 2. Let me also point out that a kernel message like
[ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery
is a reliable indicator that the ISM function won't be auto-recovered
because the ISM driver currently lacks support for it.
On a technical level, without this synchronization, commands (inputs to
the FW) may be partially or fully overwritten (corrupted) by another CPU
trying to issue commands on the same function. There is hard evidence that
this can lead to DMB token values being used as DMB IOVAs, leading to
PEC 2 PCI events indicating invalid DMA. But this is only one of the
failure modes imaginable. In theory even completely losing one command
and executing another one twice and then trying to interpret the outputs
as if the command we intended to execute was actually executed and not
the other one is also possible. Frankly, I don't feel confident about
providing an exhaustive list of possible consequences.
Fixes: 684b89bc39ce ("s390/ism: add device driver for internal shared memory")
Reported-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Tested-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722161817.1298473-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
GCC 14.2.0 reports that passing a non-string literal as the
format argument of dev_set_name() is potentially insecure.
drivers/s390/net/ism_drv.c: In function 'ism_probe':
drivers/s390/net/ism_drv.c:615:2: warning: format not a string literal and no format arguments [-Wformat-security]
615 | dev_set_name(&ism->dev, dev_name(&pdev->dev));
| ^~~~~~~~~~~~
It seems to me that as pdev is a PCIE device then the dev_name
call above should always return the device's BDF, e.g. 00:12.0.
That this should not contain format escape sequences. And thus
the current usage is safe.
But, it seems better to be safe than sorry. And, as a bonus, compiler
output becomes less verbose by addressing this issue.
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250417-ism-str-fmt-v1-1-9818b029874d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Enable Configuration RRS SV, which makes device readiness visible,
early instead of during child bus scanning (Bjorn Helgaas)
- Log debug messages about reset methods being used (Bjorn Helgaas)
- Avoid reset when it has been disabled via sysfs (Nishanth
Aravamudan)
- Add common pci-ep-bus.yaml schema for exporting several peripherals
of a single PCI function via devicetree (Andrea della Porta)
- Create DT nodes for PCI host bridges to enable loading device tree
overlays to create platform devices for PCI devices that have
several features that require multiple drivers (Herve Codina)
Resource management:
- Enlarge devres table[] to accommodate bridge windows, ROM, IOV
BARs, etc., and validate BAR index in devres interfaces (Philipp
Stanner)
- Fix typo that repeatedly distributed resources to a bridge instead
of iterating over subordinate bridges, which resulted in too little
space to assign some BARs (Kai-Heng Feng)
- Relax bridge window tail sizing for optional resources, e.g., IOV
BARs, to avoid failures when removing and re-adding devices (Ilpo
Järvinen)
- Allow drivers to enable devices even if we haven't assigned
optional IOV resources to them (Ilpo Järvinen)
- Rework handling of optional resources (IOV BARs, ROMs) to reduce
failures if we can't allocate them (Ilpo Järvinen)
- Fix a NULL dereference in the SR-IOV VF creation error path (Shay
Drory)
- Fix s390 mmio_read/write syscalls, which didn't cause page faults
in some cases, which broke vfio-pci lazy mapping on first access
(Niklas Schnelle)
- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
was disabled only for s390 (Niklas Schnelle)
- Support mmap of PCI resources on s390 except for ISM devices
(Niklas Schnelle)
ASPM:
- Delay pcie_link_state deallocation to avoid dangling pointers that
cause invalid references during hot-unplug (Daniel Stodden)
Power management:
- Allow PCI bridges to go to D3Hot when suspending on all non-x86
systems (Manivannan Sadhasivam)
Power control:
- Create pwrctrl devices in pci_scan_device() to make it more
symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
for PCI bridges possible (Manivannan Sadhasivam)
- Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
can still access devices after pci_stop_dev() (Manivannan
Sadhasivam)
- If there's a pwrctrl device for a PCI device, skip scanning it
because the pwrctrl core will rescan the bus after the device is
powered on (Manivannan Sadhasivam)
- Add a pwrctrl driver for PCI slots based on voltage regulators
described via devicetree (Manivannan Sadhasivam)
Bandwidth control:
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
set_pcie_cooling_state.sh test case (Yi Lai)
- Avoid a NULL pointer dereference when we run out of bus numbers to
assign for a bridge secondary bus (Lukas Wunner)
Hotplug:
- Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
NULL pointer checks (Lukas Wunner)
- Drop shpchp module init/exit logging, replace shpchp dbg() with
ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
(Ilpo Järvinen)
- Drop 'shpchp_debug' module parameter in favor of standard dynamic
debugging (Ilpo Järvinen)
- Drop unused cpcihp .get_power(), .set_power() function pointers
(Guilherme Giacomo Simoes)
- Disable hotplug interrupts in portdrv only when pciehp is not
enabled to avoid issuing two hotplug commands too close together
(Feng Tang)
- Skip pciehp 'device replaced' check if the device has been removed
to address a deadlock when resuming after a device was removed
during system sleep (Lukas Wunner)
- Don't enable pciehp hotplug interupt when resuming in poll mode
(Ilpo Järvinen)
Virtualization:
- Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
Dave)
DOE:
- Expose supported DOE features via sysfs (Alistair Francis)
- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
Francis)
Endpoint framework:
- Convert PCI device data so pci-epf-test works correctly on
big-endian endpoint systems (Niklas Cassel)
- Add BAR_RESIZABLE type to endpoint framework and add DWC core
support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
Cassel)
- Fix pci-epf-test double free that causes an oops if the host
reboots and PERST# deassertion restarts endpoint BAR allocation
(Christian Bruel)
- Fix endpoint BAR testing so tests can skip disabled BARs instead of
reporting them as failures (Niklas Cassel)
- Widen endpoint test BAR size variable to accommodate BARs larger
than INT_MAX (Niklas Cassel)
- Remove unused tools 'pci' build target left over after moving tests
to tools/testing/selftests/pci_endpoint (Jianfeng Liu)
Altera PCIe controller driver:
- Add DT binding and driver support for Agilex family (P-Tile,
F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)
AMD MDB PCIe controller driver:
- Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
(Thippeswamy Havalige)
Broadcom STB PCIe controller driver:
- Add BCM2712 MSI-X DT binding and interrupt controller drivers and
add softdep on irq_bcm2712_mip driver to ensure that it is loaded
first (Stanimir Varbanov)
- Expand inbound window map to 64GB so it can accommodate BCM2712
(Stanimir Varbanov)
- Add BCM2712 support and DT updates (Stanimir Varbanov)
- Apply link speed restriction before bringing link up, not after
(Jim Quinlan)
- Update Max Link Speed in Link Capabilities via the internal
writable register, not the read-only config register (Jim Quinlan)
- Handle regulator_bulk_get() error to avoid panic when we call
regulator_bulk_free() later (Jim Quinlan)
- Disable regulators only when removing the bus immediately below a
Root Port because we don't support regulators deeper in the
hierarchy (Jim Quinlan)
- Make const read-only arrays static (Colin Ian King)
Cadence PCIe endpoint driver:
- Correct MSG TLP generation so endpoints can generate INTx messages
(Hans Zhang)
Freescale i.MX6 PCIe controller driver:
- Identify the second controller on i.MX8MQ based on devicetree
'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)
- Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
ATU input address (using parent_bus_offset) from devicetree (Frank
Li)
Freescale Layerscape PCIe controller driver:
- Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
unnecessary 'status' from example (Krzysztof Kozlowski)
- Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
arg_count to fix probe failure on LS1043A (Ioana Ciornei)
HiSilicon STB PCIe controller driver:
- Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
JAILLET)
Intel Gateway PCIe controller driver:
- Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
input address (using parent_bus_offset) from devicetree (Frank Li)
Intel VMD host bridge driver:
- Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
pci_ops.read() will never sleep, even on PREEMPT_RT where
spinlock_t becomes a sleepable lock, to avoid calling a sleeping
function from invalid context (Ryo Takakura)
MediaTek PCIe Gen3 controller driver:
- Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
Bianconi)
- Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
program host bridge memory aperture to this syscon node (Lorenzo
Bianconi)
Qualcomm PCIe controller driver:
- Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)
- Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
Stein)
- Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
Baryshkov)
- Make DT iommu property required for SA8775P and prohibited for
SDX55 (Dmitry Baryshkov)
- Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
Baryshkov)
- Add endpoint DT properties for SAR2130P and enable endpoint mode in
driver (Dmitry Baryshkov)
- Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
RESERVED (Manivannan Sadhasivam)
Rockchip DesignWare PCIe controller driver:
- Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
Cassel)
Synopsys DesignWare PCIe controller driver:
- Add debugfs-based Silicon Debug, Error Injection, Statistical
Counter support for DWC (Shradha Todi)
- Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
Zhang)
- Add Rockchip support for DWC debugfs features (Niklas Cassel)
- Add dw_pcie_parent_bus_offset() to look up the parent bus address
of a specified 'reg' property and return the offset from the CPU
physical address (Frank Li)
- Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
via 'reg[config]' for host controllers and 'reg[addr_space]' for
endpoint controllers (Frank Li)
- Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
of .cpu_addr_fixup() when programming ATU (Frank Li)
TI J721E PCIe driver:
- Correct the 'link down' interrupt bit for J784S4 (Siddharth
Vadapalli)
TI Keystone PCIe controller driver:
- Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
alignment requirement from 1MB to 64KB (Niklas Cassel)
Xilinx Versal CPM PCIe controller driver:
- Free IRQ domain in probe error path to avoid leaking it
(Thippeswamy Havalige)
- Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)
- Add driver support for CPM5_HOST1 (Thippeswamy Havalige)
Miscellaneous:
- Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)
- Use for_each_available_child_of_node_scoped() to simplify apple,
kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"
* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
PCI: endpoint: Add intx_capable to epc_features struct
dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
PCI: intel-gw: Remove intel_pcie_cpu_addr()
PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
PCI: dwc: ep: Ensure proper iteration over outbound map windows
PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
PCI: dwc: Add dw_pcie_parent_bus_offset()
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
PCI: brcmstb: Make const read-only arrays static
...
|
|
So far s390 does not allow mmap() of PCI resources to user-space via the
usual mechanisms, though it does use it for RDMA. For the PCI sysfs
resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor
ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on
disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars
for all devices.
This is partly because access to mapped PCI resources from user-space
requires special PCI load/store memory-I/O (MIO) instructions, or the
special MMIO syscalls when these are not available. Still, such access is
possible and useful not just for RDMA, in fact not being able to mmap() PCI
resources has previously caused extra work when testing devices.
One thing that doesn't work with PCI resources mapped to user-space though
is the s390 specific virtual ISM device. Not only because the BAR size of
256 TiB prevents mapping the whole BAR but also because access requires use
of the legacy PCI instructions which are not accessible to user-space on
systems with the newer MIO PCI instructions.
Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping
its resources while making this functionality available for all other PCI
devices. To this end introduce a minimal implementation of PCI_QUIRKS and
use that to set pdev->non_mappable_bars for ISM devices only. Then also set
ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic
implementation of pci_mmap_resource_range() enabling only the newer sysfs
mmap() interface. This follows the recommendation in
Documentation/PCI/sysfs-pci.rst.
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-3-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
According to device_release() in /drivers/base/core.c,
a device without a release function is a broken device
and must be fixed.
The current code directly frees the device after calling device_add()
without waiting for other kernel parts to release their references.
Thus, a reference could still be held to a struct device,
e.g., by sysfs, leading to potential use-after-free
issues if a proper release function is not set.
Fixes: 8c81ba20349d ("net/smc: De-tangle ism and smc device initialization")
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250214120137.563409-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The struct 'ism_client' is specialized for s390 platform firmware ISM.
So replace it with 'void' to make SMCD DMB registration helper generic
for both Emulated-ISM and existing ISM.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since [1], dma_alloc_coherent() does not accept requests for GFP_COMP
anymore, even on archs that may be able to fulfill this. Functionality that
relied on the receive buffer being a compound page broke at that point:
The SMC-D protocol, that utilizes the ism device driver, passes receive
buffers to the splice processor in a struct splice_pipe_desc with a
single entry list of struct pages. As the buffer is no longer a compound
page, the splice processor now rejects requests to handle more than a
page worth of data.
Replace dma_alloc_coherent() and allocate a buffer with folio_alloc and
create a DMA map for it with dma_map_page(). Since only receive buffers
on ISM devices use DMA, qualify the mapping as FROM_DEVICE.
Since ISM devices are available on arch s390, only, and on that arch all
DMA is coherent, there is no need to introduce and export some kind of
dma_sync_to_cpu() method to be called by the SMC-D protocol layer.
Analogously, replace dma_free_coherent by a two step dma_unmap_page,
then folio_put to free the receive buffer.
[1] https://lore.kernel.org/all/20221113163535.884299-1-hch@lst.de/
Fixes: c08004eede4b ("s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The System EID (SEID) is an internal EID that is used by the SMCv2
software stack that has a predefined and constant value representing
the s390 physical machine that the OS is executing on. So it should
be managed by SMC stack instead of ISM driver and be consistent for
all ISMv2 device (including virtual ISM devices) on s390 architecture.
Suggested-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-and-tested-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to virtual ISM support feature defined by SMCv2.1, GIDs of
virtual ISM device are UUIDs defined by RFC4122, which are 128-bits
long. So some adaptation work is required. And note that the GIDs of
existing platform firmware ISM devices still remain 64-bits long.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit a72178cfe855 ("net/smc: Fix dependency of SMC on ISM")
you can build the ism code without selecting the SMC network protocol.
That leaves some ism functions be reported as unused. Move these
functions under the conditional compile with CONFIG_SMC.
Also codify the suggestion to also configure the SMC protocol in ism's
Kconfig - but with an "imply" rather than a "select" as SMC depends on
other config options and allow for a deliberate decision not to build
SMC. Also, mention that in ISM's help.
Fixes: a72178cfe855 ("net/smc: Fix dependency of SMC on ISM")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/netdev/afd142a2-1fa0-46b9-8b2d-7652d41d3ab8@infradead.org/
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ism_unregister_client() is called but the client still has DMBs
registered it returns -EBUSY and prints an error. This only happens
after the client has already been unregistered however. This is
unexpected as the unregister claims to have failed. Furthermore as this
implies a client bug a WARN() is more appropriate. Thus move the
deregistration after the check and use WARN().
Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously the clients_lock was protecting the clients array against
concurrent addition/removal of clients but was also accessed from IRQ
context. This meant that it had to be a spinlock and that the add() and
remove() callbacks in which clients need to do allocation and take
mutexes can't be called under the clients_lock. To work around this these
callbacks were moved to workqueues. This not only introduced significant
complexity but is also subtly broken in at least one way.
In ism_dev_init() and ism_dev_exit() clients[i]->tgt_ism is used to
communicate the added/removed ISM device to the work function. While
write access to client[i]->tgt_ism is protected by the clients_lock and
the code waits that there is no pending add/remove work before and after
setting clients[i]->tgt_ism this is not enough. The problem is that the
wait happens based on per ISM device counters. Thus a concurrent
ism_dev_init()/ism_dev_exit() for a different ISM device may overwrite
a clients[i]->tgt_ism between unlocking the clients_lock and the
subsequent wait for the work to finnish.
Thankfully with the clients_lock no longer held in IRQ context it can be
turned into a mutex which can be held during the calls to add()/remove()
completely removing the need for the workqueues and the associated
broken housekeeping including the per ISM device counters and the
clients[i]->tgt_ism.
Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The clients array references all registered clients and is protected by
the clients_lock. Besides its use as general list of clients the clients
array is accessed in ism_handle_irq() to forward ISM device events to
clients.
While the clients_lock is taken in the IRQ handler when calling
handle_event() it is however incorrectly not held during the
client->handle_irq() call and for the preceding clients[] access leaving
it unprotected against concurrent client (un-)registration.
Furthermore the accesses to ism->sba_client_arr[] in ism_register_dmb()
and ism_unregister_dmb() are not protected by any lock. This is
especially problematic as the client ID from the ism->sba_client_arr[]
is not checked against NO_CLIENT and neither is the client pointer
checked.
Instead of expanding the use of the clients_lock further add a separate
array in struct ism_dev which references clients subscribed to the
device's events and IRQs. This array is protected by ism->lock which is
already taken in ism_handle_irq() and can be taken outside the IRQ
handler when adding/removing subscribers or the accessing
ism->sba_client_arr[]. This also means that the clients_lock is no
longer taken in IRQ context.
Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
include/linux/mlx5/driver.h
617f5db1a626 ("RDMA/mlx5: Fix affinity assignment")
dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
https://lore.kernel.org/all/20230613125939.595e50b8@canb.auug.org.au/
tools/testing/selftests/net/mptcp/mptcp_join.sh
47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported")
425ba803124b ("selftests: mptcp: join: support RM_ADDR for used endpoints or not")
45b1a1227a7a ("mptcp: introduces more address related mibs")
0639fa230a21 ("selftests: mptcp: add explicit check for new mibs")
https://lore.kernel.org/netdev/20230609-upstream-net-20230610-mptcp-selftests-support-old-kernels-part-3-v1-0-2896fe2ee8a3@tessares.net/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch prevents the system from crashing when unloading the ISM module.
How to reproduce: Attach an ISM device and execute 'rmmod ism'.
Error-Log:
- Trying to free already-free IRQ 0
- WARNING: CPU: 1 PID: 966 at kernel/irq/manage.c:1890 free_irq+0x140/0x540
After calling ism_dev_exit() for each ISM device in the exit routine,
pci_unregister_driver() will execute ism_remove() for each ISM device.
Because ism_remove() also calls ism_dev_exit(),
free_irq(pci_irq_vector(pdev, 0), ism) is called twice for each ISM
device. This results in a crash with the error
'Trying to free already-free IRQ'.
In the exit routine, it is enough to call pci_unregister_driver()
because it ensures that ism_dev_exit() is called once per
ISM device.
Cc: <stable@vger.kernel.org> # 6.3+
Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration")
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A future change will convert the DMA API implementation from the
architecture specific arch/s390/pci/pci_dma.c to using the common code
drivers/iommu/dma-iommu.c which the utilizes the same IOMMU hardware
through the s390-iommu driver. Unlike the s390 specific DMA API this
requires devices to correctly set the coherent mask to be allowed to use
IOVAs >2^32 in dma_alloc_coherent(). This was however not done for ISM
devices. ISM requires such addresses since currently the DMA aperture
for PCI devices starts at 2^32 and all calls to dma_alloc_coherent()
would thus fail.
Link: https://lore.kernel.org/all/20230310-dma_iommu-v9-1-65bb8edd2beb@linux.ibm.com/
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20230524075411.3734141-1-schnelle@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;
pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}
pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, v2 support was derived from a very specific format of the SEID
as part of the SMC-D codebase. Make this part of the SMC-D device API, so
implementers do not need to adhere to a specific SEID format.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct device for ISM devices was part of struct smcd_dev. Move to
struct ism_dev, provide a new API call in struct smcd_ops, and convert
existing SMCD code accordingly.
Furthermore, remove struct smcd_dev from struct ism_dev.
This is the final part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ism module had SMC-D-specific code sprinkled across the entire module.
We are now consolidating the SMC-D-specific parts into the latter parts
of the module, so it becomes more clear what code is intended for use with
ISM, and which parts are glue code for usage in the context of SMC-D.
This is the fourth part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We separate the code implementing the struct smcd_ops API in the ISM
device driver from the functions that may be used by other exploiters of
ISM devices.
Note: We start out small, and don't offer the whole breadth of the ISM
device for public use, as many functions are specific to or likely only
ever used in the context of SMC-D.
This is the third part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register the smc module with the new ism device driver API.
This is the second part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new API that allows other drivers to concurrently access ISM devices.
To do so, we introduce a new API that allows other modules to register for
ISM device usage. Furthermore, we move the GID to struct ism, where it
belongs conceptually, and rename and relocate struct smcd_event to struct
ism_event.
This is the first part of a bigger overhaul of the interfaces between SMC
and ISM.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conceptually, a DMB is a structure that belongs to ISM devices. However,
SMC currently 'owns' this structure. So future exploiters of ISM devices
would be forced to include SMC headers to work - which is just weird.
Therefore, we switch ISM to struct ism_dmb, introduce a new public header
with the definition (will be populated with further API calls later on),
and, add a thin wrapper to please SMC. Since structs smcd_dmb and ism_dmb
are identical, we can simply convert between the two for now.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Jan Karcher <jaka@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dma_alloc_coherent is an opaque allocator that only uses the GFP_ flags
for allocation context control. Don't pass __GFP_COMP which makes no
sense for an allocation that can't in any way be converted to a page
pointer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Wenjia Zhang <wenjia@linux.ibm.com>
|
|
Make the DMBE bits, which are passed on individually in ism_move() as
parameter idx, available to the receiver.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Reworked signature of the function to retrieve the system EID: No plausible
reason to use a double pointer. And neither to pass in the device as an
argument, as this identifier is by definition per system, not per device.
Plus some minor consistency edits.
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below.
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The system EID that is defined by the ISM driver is not correct. Using
an incorrect system EID allows to communicate with remote Linux systems
that use the same incorrect system EID, but when it comes to
interoperability with other operating systems then the system EIDs do
never match which prevents SMC-Dv2 communication.
Using the correct system EID fixes this problem.
Fixes: 201091ebb2a1 ("net/smc: introduce System Enterprise ID (SEID)")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With SMCD version 2 the CHIDs of ISM devices are needed for the
CLC handshake.
This patch provides the new callback to retrieve the CHID of an
ISM device.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SMCD version 2 defines a System Enterprise ID (short SEID).
This patch contains the SEID creation and adds the callback to
retrieve the created SEID.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the ism driver allocates a new dmb in ism_alloc_dmb() it must
first check for and reserve a slot in the sba bitmap. When
find_next_zero_bit() finds no free slot then the return code is -ENOMEM.
This code conflicts with the error when the alloc() fails later in the
code. As a result of that the caller can not differentiate
between out-of-memory conditions and sba-bitmap-full conditions.
Fix that by using the return code -ENOSPC when the sba slot
reservation failed.
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix to return negative error code -ENOMEM from the smcd_alloc_dev()
error handling case instead of 0, as done elsewhere in this function.
Fixes: 684b89bc39ce ("s390/ism: add device driver for internal shared memory")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As s390 no longer supports ARCH_HIBERNATION_POSSIBLE, drop the unused
pm ops from the ism driver.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
ISM devices are special in how they access PCI memory space. Provide
wrappers for handling commands to the device. No functional change.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
Prior to dma unmap/free operations the ism driver tries to ensure
that the memory is no longer accessed by the HW. When errors
during deregistration of memory regions from the HW occur the ism
driver will not unmap/free this memory.
When we receive notification from the hypervisor that a PCI function
has been detached we can no longer access the device and would never
unmap/free these memory regions which led to complaints by the DMA
debug API.
Treat this kind of errors during the deregistration of memory regions
from the HW as success since it is already ensured that the memory
is no longer accessed by HW.
Reported-by: Karsten Graul <kgraul@linux.ibm.com>
Reported-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.
This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
SMC-D stress workload showed connection stalls. Since the firmware
decides to skip raising an interrupt if the SBA DMBE mask bit is
still set, this SBA DMBE mask bit should be cleared before the
IRQ handling in the SMC code runs. Otherwise there are small windows
possible with missing interrupts for incoming data.
SMC-D currently does not care about the old value of the SBA DMBE
mask.
Acked-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The few callers can just use dma_set_max_seg_size ()directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The two callers can just use dma_set_seg_boundary() directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Add support for the Internal Shared Memory vPCI Adapter.
This driver implements the interfaces of the SMC-D protocol.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|