Age | Commit message | Author |
|
SPI rx data buffer can contain one or more receive data chunks. A receive
data chunk consists of a 64-byte receive data chunk payload followed by a
4-byte data footer at the end. The data footer contains the information
needed to determine the validity and location of the receive frame data
within the receive data chunk payload, and the host can use this
information to generate the ethernet frame. Initially the number of receive
chunks available is taken from the buffer status register, and it is then
updated from the footer received on each SPI data transfer. Tx data
chunks, valid or empty, equal in number to the receive chunks available
are transmitted on MOSI to receive all the rx chunks.
Additionally the receive data footer contains the below information as
well. The received footer will be examined for receive errors, if any.
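The chunk layout described above can be sketched in plain C as follows. This is an illustrative userspace model, not the driver code: the helper and macro names are hypothetical, and the big-endian byte order of the footer is an assumption that should be checked against the OPEN Alliance specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_CHUNK_PAYLOAD_SIZE 64u /* receive data chunk payload */
#define RX_CHUNK_FOOTER_SIZE   4u /* data footer at the end of the chunk */
#define RX_CHUNK_SIZE (RX_CHUNK_PAYLOAD_SIZE + RX_CHUNK_FOOTER_SIZE)

/* Number of complete receive data chunks in an rx buffer of rx_len bytes. */
static size_t rx_chunk_count(size_t rx_len)
{
	return rx_len / RX_CHUNK_SIZE;
}

/* Start of the 64-byte payload of chunk idx. */
static const uint8_t *rx_chunk_payload(const uint8_t *rx, size_t idx)
{
	assert(rx != NULL);
	return rx + idx * RX_CHUNK_SIZE;
}

/* Footer of chunk idx as a host-order word (big-endian on the wire: assumed). */
static uint32_t rx_chunk_footer(const uint8_t *rx, size_t idx)
{
	const uint8_t *f;

	assert(rx != NULL);
	f = rx_chunk_payload(rx, idx) + RX_CHUNK_PAYLOAD_SIZE;
	return ((uint32_t)f[0] << 24) | ((uint32_t)f[1] << 16) |
	       ((uint32_t)f[2] << 8) | (uint32_t)f[3];
}
```

A host would walk the buffer chunk by chunk, read each footer first, and only then decide which payload bytes belong to a frame.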
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-11-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The transmit ethernet frame will be converted into multiple transmit data
chunks. Each transmit data chunk consists of a 4-byte header followed by
a 64-byte transmit data chunk payload. The 4-byte data header occurs at
the beginning of each transmit data chunk on MOSI. The data header
contains the information needed to determine the validity and location of
the transmit frame data within the data chunk payload. The number of
transmit data chunks transmitted to the MAC-PHY is limited to the number of
transmit credits available in the MAC-PHY. Initially the transmit credits
are taken from the buffer status register, and they are then
updated from the footer received on each SPI data transfer. The received
footer will be examined for transmit errors, if any.
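As a rough illustration of the chunking and credit limit described above, the arithmetic can be sketched as below. The function names are hypothetical and the code only models the sizes stated in this message, not the actual driver logic.

```c
#include <assert.h>
#include <stddef.h>

#define TX_CHUNK_HEADER_SIZE   4u /* data header at the start of each chunk */
#define TX_CHUNK_PAYLOAD_SIZE 64u /* transmit data chunk payload */

/* Chunks needed to carry a frame of frame_len bytes (rounded up). */
static size_t tx_chunks_for_frame(size_t frame_len)
{
	return (frame_len + TX_CHUNK_PAYLOAD_SIZE - 1) / TX_CHUNK_PAYLOAD_SIZE;
}

/* Chunks that may go out in one transfer, limited by the transmit credits. */
static size_t tx_chunks_to_send(size_t pending_chunks, size_t tx_credits)
{
	return pending_chunks < tx_credits ? pending_chunks : tx_credits;
}

/* Bytes clocked out on MOSI for the given number of chunks. */
static size_t tx_spi_len(size_t chunks)
{
	return chunks * (TX_CHUNK_HEADER_SIZE + TX_CHUNK_PAYLOAD_SIZE);
}

/* Worked example: a 1514-byte frame with 10 transmit credits available. */
static void tx_chunk_example(void)
{
	assert(tx_chunks_for_frame(1514) == 24);
	assert(tx_chunks_to_send(24, 10) == 10);
	assert(tx_spi_len(10) == 680);
}
```

The remaining 14 chunks in the worked example would wait until a later footer reports new credits.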
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-10-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enabling Configuration Synchronization bit (SYNC) in the Configuration
Register #0 enables data communication in the MAC-PHY. The state of this
bit is reflected in the data footer SYNC bit.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-9-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds C45 register direct access support to Microchip's
LAN865x internal PHY.
OPEN Alliance 10BASE-T1x compliant MAC-PHYs have both C22 and C45
register spaces. If the PHY is discovered via the C22 bus protocol, it is
assumed to use the C22 protocol, and C45 registers are always accessed
through C22 indirect access. This is because we don't have a clean
separation between C22/C45 register space and C22/C45 MDIO bus protocols.
As a result, direct access to PHY C45 registers, which can save multiple
SPI bus accesses, can't be used. To support this feature, set
.read_mmd/.write_mmd in the PHY driver to call .read_c45/.write_c45 in
the OPEN Alliance framework drivers/net/ethernet/oa_tc6.c
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-8-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Internal PHY is initialized as per the PHY register capability supported
by the MAC-PHY. Direct PHY Register Access Capability indicates if PHY
registers are directly accessible within the SPI register memory space.
Indirect PHY Register Access Capability indicates if PHY registers are
indirectly accessible through the MDIO/MDC registers MDIOACCn defined in
OPEN Alliance specification. Currently only direct register access is
supported.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-7-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This will unmask the following error interrupts from the MAC-PHY:
- tx protocol error
- rx buffer overflow error
- loss of framing error
- header error
The MAC-PHY will signal an error by setting the EXST bit in the receive
data footer which will then allow the host to read the STATUS0 register
to find the source of the error.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-6-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The reset complete bit is set when the MAC-PHY reset completes and the
device is ready for configuration. Additionally, the reset complete bit in
the STS0 register has to be written with one upon reset completion to
clear the interrupt.
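The "write one to clear" behaviour implied above can be modelled in a few lines. This is only a sketch of the register semantics: the bit position used for the reset complete bit is a placeholder chosen for illustration, and w1c_write() is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit position for the reset complete bit (assumption). */
#define STS0_RESETC_MODEL (1u << 6)

/* Model of a write-1-to-clear register: each bit written as 1 is cleared. */
static uint32_t w1c_write(uint32_t reg, uint32_t val)
{
	return reg & ~val;
}

/* Worked example: clear reset complete while another status bit stays set. */
static void w1c_example(void)
{
	uint32_t sts0 = STS0_RESETC_MODEL | 0x1u;

	sts0 = w1c_write(sts0, STS0_RESETC_MODEL);
	assert(sts0 == 0x1u);
}
```

Writing zero to such a register leaves all pending bits untouched, which is why the host writes the bit itself rather than a cleared value.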
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-5-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement register read operation according to the control communication
specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface
document. Control read commands are used by the SPI host to read
registers within the MAC-PHY. Each control read command is composed of
a 32-bit control command header.
The MAC-PHY ignores all data from the SPI host following the control
header for the remainder of the control read command. Control read
commands can read either a single register or multiple consecutive
registers. When multiple consecutive registers are read, the address is
automatically post-incremented by the MAC-PHY. Reading any unimplemented
or undefined registers shall return zero.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-4-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement register write operation according to the control communication
specified in the OPEN Alliance 10BASE-T1x MACPHY Serial Interface
document. Control write commands are used by the SPI host to write
registers within the MAC-PHY. Each control write command is composed of
a 32-bit control command header followed by register write data.
The MAC-PHY ignores the final 32 bits of data from the SPI host at the
end of the control write command. The write command and data are also
echoed from the MAC-PHY back to the SPI host to enable the SPI host to
identify which register write failed in the case of any bus errors.
Control write commands can write either a single register or multiple
consecutive registers. When multiple consecutive registers are written,
the address is automatically post-incremented by the MAC-PHY. Writing to
any unimplemented or undefined registers shall be ignored and yield no
effect.
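The transfer length and echo check described above might look like the following sketch. It assumes 32-bit registers and assumes the echo arrives delayed by the 4 ignored bytes; both points, like the helper names, are assumptions to verify against the specification rather than statements about the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTRL_HEADER_SIZE  4u /* 32-bit control command header */
#define CTRL_REG_SIZE     4u /* assumed register width: 32 bits */
#define CTRL_IGNORED_SIZE 4u /* final 32 bits ignored by the MAC-PHY */

/* Total SPI transfer length for writing nregs consecutive registers. */
static size_t ctrl_write_len(size_t nregs)
{
	return CTRL_HEADER_SIZE + nregs * CTRL_REG_SIZE + CTRL_IGNORED_SIZE;
}

/*
 * Compare the echoed command and data with what was sent.
 * Assumption: rx[4..len) mirrors tx[0..len-4).
 */
static int ctrl_write_echo_ok(const uint8_t *tx, const uint8_t *rx, size_t len)
{
	assert(len >= CTRL_HEADER_SIZE + CTRL_IGNORED_SIZE);
	return memcmp(tx, rx + CTRL_IGNORED_SIZE, len - CTRL_IGNORED_SIZE) == 0;
}
```

A mismatch in the echo tells the host that the write may not have reached the intended register, so it can retry or report a bus error.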
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-3-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The IEEE 802.3cg project defines two 10 Mbit/s PHYs operating over a
single pair of conductors. The 10BASE-T1L (Clause 146) is a long reach
PHY supporting full duplex point-to-point operation over 1 km of single
balanced pair of conductors. The 10BASE-T1S (Clause 147) is a short reach
PHY supporting full / half duplex point-to-point operation over 15 m of
single balanced pair of conductors, or half duplex multidrop bus
operation over 25 m of single balanced pair of conductors.
Furthermore, the IEEE 802.3cg project defines the new Physical Layer
Collision Avoidance (PLCA) Reconciliation Sublayer (Clause 148) meant to
provide improved determinism to the CSMA/CD media access method. PLCA
works in conjunction with the 10BASE-T1S PHY operating in multidrop mode.
The aforementioned PHYs are intended to cover the low-speed / low-cost
applications in industrial and automotive environments. The large number
of pins (16) required by the MII interface, which is specified in
IEEE 802.3 Clause 22, is one of the major cost factors that need to be
addressed to fulfil this objective.
The MAC-PHY solution integrates an IEEE Clause 4 MAC and a 10BASE-T1x PHY
exposing a low pin count Serial Peripheral Interface (SPI) to the host
microcontroller. This also enables the addition of Ethernet functionality
to existing low-end microcontrollers which do not integrate a MAC
controller.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Link: https://patch.msgid.link/20240909082514.262942-2-Parthiban.Veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Mina Almasry says:
====================
Device Memory TCP
Device memory TCP (devmem TCP) is a proposal for transferring data
to and/or from device memory efficiently, without bouncing the data
to a host memory buffer.
* Problem:
A large number of data transfers have device memory as the source
and/or destination. Accelerators drastically increased the volume
of such transfers. Some examples include:
- ML accelerators transferring large amounts of training data from storage
into GPU/TPU memory. In some cases ML training setup time can be as long
as 50% of TPU compute time; improving data transfer throughput &
efficiency can help improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different
hosts, exchange data among them.
- Distributed raw block storage applications transfer large amounts of
data with remote SSDs, much of this data does not require host
processing.
Today, the majority of the Device-to-Device data transfers over the network
are implemented as the following low level operations: Device-to-Host
copy, Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers,
and can put significant strains on system resources, such as host memory
bandwidth, PCIe bandwidth, etc. One important reason behind the current
state is the kernel’s lack of semantics to express device to network
transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive
and from device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP
stack normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through
the root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs,
drivers, and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** Part 3: page pool support
We piggyback on the page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** Part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS
support, i.e. the NIC's ability to steer flows into certain rx queues.
This allows the sysadmin to enable devmem TCP on a subset of the rx
queues, and steer devmem TCP traffic onto these queues and non devmem
TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would
suffice. I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
====================
Link: https://patch.msgid.link/20240910171458.219195-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add dmabuf information to page_pool stats:
$ ./cli.py --spec ../netlink/specs/netdev.yaml --dump page-pool-get
...
{'dmabuf': 10,
'id': 456,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
{'dmabuf': 10,
'id': 455,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
{'dmabuf': 10,
'id': 454,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
{'dmabuf': 10,
'id': 453,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
{'dmabuf': 10,
'id': 452,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
{'dmabuf': 10,
'id': 451,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
{'dmabuf': 10,
'id': 450,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
{'dmabuf': 10,
'id': 449,
'ifindex': 3,
'inflight': 1023,
'inflight-mem': 4190208},
And queue stats:
$ ./cli.py --spec ../netlink/specs/netdev.yaml --dump queue-get
...
{'dmabuf': 10, 'id': 8, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 9, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 10, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 11, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 12, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 13, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 14, 'ifindex': 3, 'type': 'rx'},
{'dmabuf': 10, 'id': 15, 'ifindex': 3, 'type': 'rx'},
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-14-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ncdevmem is a devmem TCP netcat. It works similarly to netcat, but it
sends and receives data using the devmem TCP APIs. It uses udmabuf as
the dmabuf provider. It is compatible with a regular netcat running on
a peer, or a ncdevmem running on a peer.
In addition to normal netcat support, ncdevmem has a validation mode,
where it sends a specific pattern and validates this pattern on the
receiver side to ensure data integrity.
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240910171458.219195-13-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add documentation outlining the usage and details of devmem TCP.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20240910171458.219195-12-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add an interface for the user to notify the kernel that it is done
reading the devmem dmabuf frags returned as cmsg. The kernel will
drop the reference on the frags to make them available for reuse.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-11-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In tcp_recvmsg_locked(), detect if the skb being received by the user
is a devmem skb. In this case - if the user provided the MSG_SOCK_DEVMEM
flag - pass it to tcp_recvmsg_devmem() for custom handling.
tcp_recvmsg_devmem() copies any data in the skb header to the linear
buffer, and returns a cmsg to the user indicating the number of bytes
returned in the linear buffer.
tcp_recvmsg_devmem() then loops over the inaccessible devmem skb frags,
and returns to the user a cmsg_devmem indicating the location of the
data in the dmabuf device memory. cmsg_devmem contains this information:
1. the offset into the dmabuf where the payload starts. 'frag_offset'.
2. the size of the frag. 'frag_size'.
3. an opaque token 'frag_token' to return to the kernel when the buffer
is to be released.
The pages awaiting freeing are stored in the newly added
sk->sk_user_frags, and each page passed to userspace is get_page()'d.
This reference is dropped once the userspace indicates that it is
done reading this page. All pages are released when the socket is
destroyed.
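From the user's point of view, handling the returned frags could be sketched roughly as below. The struct is a hypothetical mirror of the three fields listed above, not the real uapi definition, and the helpers are illustrative only: they check that each frag lies inside the dmabuf and remember the tokens that must be handed back later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three fields named in the text. */
struct devmem_frag_model {
	uint64_t frag_offset; /* offset into the dmabuf where the payload starts */
	uint32_t frag_size;   /* size of the frag in bytes */
	uint32_t frag_token;  /* opaque token to hand back to the kernel */
};

/* True if the frag lies entirely inside a dmabuf of dmabuf_size bytes. */
static int frag_in_bounds(const struct devmem_frag_model *f, uint64_t dmabuf_size)
{
	return f->frag_offset <= dmabuf_size &&
	       f->frag_size <= dmabuf_size - f->frag_offset;
}

/*
 * Walk n frags: add up the payload bytes and save each token so that
 * the buffers can be released later. Returns -1 on an out-of-bounds frag.
 */
static int64_t collect_frags(const struct devmem_frag_model *frags, size_t n,
			     uint64_t dmabuf_size, uint32_t *tokens)
{
	int64_t total = 0;
	size_t i;

	assert(frags != NULL && tokens != NULL);
	for (i = 0; i < n; i++) {
		if (!frag_in_bounds(&frags[i], dmabuf_size))
			return -1;
		tokens[i] = frags[i].frag_token;
		total += frags[i].frag_size;
	}
	return total;
}
```

Keeping the tokens separate from the payload bookkeeping makes it easy to return them in one batch once the data has been consumed.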
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910171458.219195-10-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For device memory TCP, we expect the skb headers to be available in host
memory for access, and we expect the skb frags to be in device memory
and inaccessible to the host. We expect there to be no mixing and
matching of device memory frags (inaccessible) with host memory frags
(accessible) in the same skb.
Add a skb->devmem flag which indicates whether the frags in this skb
are device memory frags or not.
__skb_fill_netmem_desc() now checks frags added to skbs for net_iov,
and marks the skb as skb->devmem accordingly.
Add checks through the network stack to avoid accessing the frags of
devmem skbs and avoid coalescing devmem skbs with non devmem skbs.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-9-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Make skb_frag_page() fail in the case where the frag is not backed
by a page, and fix its relevant callers to handle this case.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-8-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement a memory provider that allocates dmabuf devmem in the form of
net_iov.
The provider receives a reference to the struct netdev_dmabuf_binding
via the pool->mp_priv pointer. The driver needs to set this pointer for
the provider in the net_iov.
The provider obtains a reference on the netdev_dmabuf_binding which
guarantees the binding and the underlying mapping remains alive until
the provider is destroyed.
Usage of PP_FLAG_DMA_MAP is required for this memory provider such that
the page_pool can provide the driver with the dma-addrs of the devmem.
Support for PP_FLAG_DMA_SYNC_DEV and for p.order != 0 is omitted for
simplicity.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-7-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Convert netmem to be a union of struct page and struct net_iov. Overload
the LSB of the netmem pointer to indicate that it's a net_iov, otherwise
it's a page.
Currently these entries in struct page are rented by the page_pool and
used exclusively by the net stack:
struct {
unsigned long pp_magic;
struct page_pool *pp;
unsigned long _pp_mapping_pad;
unsigned long dma_addr;
atomic_long_t pp_ref_count;
};
Mirror these (and only these) entries into struct net_iov and implement
netmem helpers that can access these common fields regardless of
whether the underlying type is page or net_iov.
Implement checks for net_iov in netmem helpers which delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.
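The LSB-tagging idea can be illustrated with a small userspace model. All type and function names below are stand-ins invented for this sketch (they are not the kernel's netmem helpers), and each struct carries only one of the mirrored fields to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct page_model    { unsigned long pp_magic; }; /* stand-in for struct page */
struct net_iov_model { unsigned long pp_magic; }; /* stand-in for struct net_iov */

typedef uintptr_t netmem_model_t;       /* pointer value with a type tag in bit 0 */
#define NET_IOV_TAG ((uintptr_t)1)

static netmem_model_t page_to_netmem_model(struct page_model *p)
{
	assert(((uintptr_t)p & NET_IOV_TAG) == 0); /* alignment keeps bit 0 free */
	return (uintptr_t)p;
}

static netmem_model_t net_iov_to_netmem_model(struct net_iov_model *n)
{
	assert(((uintptr_t)n & NET_IOV_TAG) == 0);
	return (uintptr_t)n | NET_IOV_TAG;
}

static bool netmem_model_is_net_iov(netmem_model_t nm)
{
	return (nm & NET_IOV_TAG) != 0;
}

/* Returns NULL for a net_iov, so it can never leak into page-only code. */
static struct page_model *netmem_model_to_page(netmem_model_t nm)
{
	return netmem_model_is_net_iov(nm) ? NULL : (struct page_model *)nm;
}

static struct net_iov_model *netmem_model_to_net_iov(netmem_model_t nm)
{
	if (!netmem_model_is_net_iov(nm))
		return NULL;
	return (struct net_iov_model *)(nm & ~NET_IOV_TAG);
}

/* Access a mirrored field regardless of the underlying type. */
static unsigned long netmem_model_pp_magic(netmem_model_t nm)
{
	if (netmem_model_is_net_iov(nm))
		return netmem_model_to_net_iov(nm)->pp_magic;
	return netmem_model_to_page(nm)->pp_magic;
}
```

The tag costs no extra storage because both structures are at least word aligned, which leaves bit 0 of any valid pointer clear.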
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement netdev devmem allocator. The allocator takes a given struct
netdev_dmabuf_binding as input and allocates net_iov from that
binding.
The allocation simply delegates to the binding's genpool for the
allocation logic and wraps the returned memory region in a net_iov
struct.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-5-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a netdev_dmabuf_binding struct which represents the
dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to
rx queues on the netdevice. On the binding, the dma_buf_attach
& dma_buf_map_attachment will occur. The entries in the sg_table from
mapping will be inserted into a genpool to make it ready
for allocation.
The chunks in the genpool are owned by a dmabuf_chunk_owner struct which
holds the dma-buf offset of the base of the chunk and the dma_addr of
the chunk. Both are needed to use allocations that come from this chunk.
We create a new type that represents an allocation from the genpool:
net_iov. We set up the net_iov allocation size in the
genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally
allocated by the page pool and given to the drivers.
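To show why both values held by the chunk owner are needed, here is a minimal model of how the dma-buf offset and the DMA address of a PAGE_SIZE allocation could be derived from its index within a chunk. The struct layout, the field names and the 4096-byte page size are assumptions made for this sketch, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed PAGE_SIZE for the example */

/* Hypothetical model of the per-chunk owner data described above. */
struct chunk_owner_model {
	uint64_t base_virtual;  /* dma-buf offset of the base of the chunk */
	uint64_t base_dma_addr; /* dma_addr of the base of the chunk */
	size_t   num_iovs;      /* PAGE_SIZE allocations in the chunk */
};

/* Offset into the dma-buf of allocation idx; reported to the user. */
static uint64_t iov_dmabuf_offset(const struct chunk_owner_model *o, size_t idx)
{
	assert(idx < o->num_iovs);
	return o->base_virtual + (uint64_t)idx * MODEL_PAGE_SIZE;
}

/* DMA address of allocation idx; handed to the driver. */
static uint64_t iov_dma_addr(const struct chunk_owner_model *o, size_t idx)
{
	assert(idx < o->num_iovs);
	return o->base_dma_addr + (uint64_t)idx * MODEL_PAGE_SIZE;
}
```

The driver only ever needs the DMA address, while userspace only ever sees the dma-buf offset, so the two bases are tracked side by side.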
The user can unbind the dmabuf from the netdevice by closing the netlink
socket that established the binding. We do this so that the binding is
automatically unbound even if the userspace process crashes.
The binding and unbinding leave an indicator in struct netdev_rx_queue
that the given queue is bound, and the binding is actuated by resetting
the rx queue using the queue API.
The netdev_dmabuf_binding struct is refcounted, and releases its
resources only when all the refs are released.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # excluding netlink
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-4-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
API takes the dma-buf fd as input, and binds it to the netdevice. The
user can specify the rx queues to bind the dma-buf to.
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add netdev_rx_queue_restart(), which resets an rx queue using the
queue API recently merged[1].
The queue API was merged to enable the core net stack to reset individual
rx queues to actuate changes in the rx queue's configuration. In later
patches in this series, we will use netdev_rx_queue_restart() to reset
rx queues after binding or unbinding dmabuf configuration, which will
cause reallocation of the page_pool to repopulate its memory using the
new configuration.
[1] https://lore.kernel.org/netdev/20240430231420.699177-1-shailend@google.com/T/
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
idpf: XDP chapter II: convert Tx completion to libeth
Alexander Lobakin says:
XDP for idpf is currently 5 chapters:
* convert Rx to libeth;
* convert Tx completion to libeth (this);
* generic XDP and XSk code changes;
* actual XDP for idpf via libeth_xdp;
* XSk for idpf (^).
Part II does the following:
* adds generic libeth Tx completion routines;
* converts idpf to use generic libeth Tx comp routines;
* fixes Tx queue timeouts and robustifies Tx completion in general;
* fixes Tx event/descriptor flushes (writebacks).
Most idpf patches again remove more lines than they add.
Generic Tx completion helpers and structs are needed as libeth_xdp
(Ch. III) makes use of them. WB_ON_ITR is needed since XDPSQs don't
want to work without it at all. Tx queue timeouts fixes are needed
since without them, it's way easier to catch a Tx timeout event when
WB_ON_ITR is enabled.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: enable WB_ON_ITR
idpf: fix netdev Tx queue stop/wake
idpf: refactor Tx completion routines
netdevice: add netdev_tx_reset_subqueue() shorthand
idpf: convert to libeth Tx buffer completion
libeth: add Tx buffer completion helpers
====================
Link: https://patch.msgid.link/20240909205323.3110312-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for cable diagnostics in lan887x PHY.
Using this we can diagnose connected/open/short wires and
also the length at which a cable fault occurred.
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240909114339.3446-1-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When processing the netlink GET requests to get PHY info, the req_info.pdn
pointer is NULL when no PHY matches the requested parameters, such as when
the phy_index is invalid, or there's simply no PHY attached to the
interface.
Therefore, check the req_info.pdn pointer for NULL instead of
dereferencing it.
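The pattern of the fix can be sketched as follows. The struct and function names are simplified stand-ins for this illustration, and the error code returned here is an arbitrary choice for the example, not necessarily the one used by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the PHY node and the request info. */
struct phy_node_model { int phyindex; };
struct phy_req_model  { struct phy_node_model *pdn; };

/* Fill *out with the PHY index, or fail cleanly when no PHY matched. */
static int phy_reply_index(const struct phy_req_model *req, int *out)
{
	assert(req != NULL && out != NULL);
	if (!req->pdn)
		return -ENODEV; /* no matching PHY: do not dereference */
	*out = req->pdn->phyindex;
	return 0;
}
```

Checking the pointer once at the top keeps every later access in the handler safe without further guards.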
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iKRW0WpGAh1tKqY345D8WkYCPm3Y9ym--Si42JZrQAu1g@mail.gmail.com/T/#mfced87d607d18ea32b3b4934dfa18d7b36669285
Fixes: 17194be4c8e1 ("net: ethtool: Introduce a command to list PHYs on an interface")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910174636.857352-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If nvmem loads after the ethernet driver, MAC address assignments will
not take effect. of_get_ethdev_address returns EPROBE_DEFER in such a
case, so we need to handle that to avoid falling back to
eth_hw_addr_random.
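The decision the driver has to make can be modelled as below. This is a userspace sketch with hypothetical names; EPROBE_DEFER is a kernel-internal error code, so its value is defined locally here for illustration.

```c
#include <assert.h>

/* Kernel-internal errno; defined locally because userspace headers lack it. */
#define EPROBE_DEFER 517

enum mac_action_model {
	MAC_USE_PROVIDED, /* address was read successfully */
	MAC_DEFER_PROBE,  /* provider not ready yet: retry the probe later */
	MAC_USE_RANDOM,   /* no address available: fall back to a random one */
};

/* Map the return value of the address lookup (0 or -errno) to an action. */
static enum mac_action_model mac_action_for(int ret)
{
	assert(ret <= 0);
	if (ret == 0)
		return MAC_USE_PROVIDED;
	if (ret == -EPROBE_DEFER)
		return MAC_DEFER_PROBE;
	return MAC_USE_RANDOM;
}
```

Treating the deferral separately from other errors is the whole point: only a genuine failure should end in a random address.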
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240910220913.14101-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add X4 series. Most functionality is the same as previous
EF10 NICs, but enough is different to warrant a new NIC type struct
and revision; for example legacy interrupts and SRIOV are
not supported.
Most removed features will be re-added later as new implementations.
Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240910153014.12803-1-jonathan.s.cooper@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Don't populate the const read-only array key on the stack at
run time, instead make it static.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240910120635.115266-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: fallback to TCP after 3 MPC drop + cache
The SYN + MPTCP_CAPABLE packets could be explicitly dropped by firewalls
somewhere in the network, e.g. if they decide to drop packets based on
the TCP options, instead of stripping them off.
The idea of this series is to fall back to TCP after 3 SYN+MPC drops
(patch 2). If the connection succeeds after the fallback, it very likely
means a blackhole has been detected. In this case (patch 3), MPTCP can
be disabled for a certain period of time, 1h by default. If after this
period, MPTCP is still blocked, the period is doubled. This technique is
inspired by the one used by TCP FastOpen.
This should help applications which want to use MPTCP by default on the
client side if available.
====================
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-0-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An MPTCP firewall blackhole can be detected if the following SYN
retransmission after a fallback to "plain" TCP is accepted.
In case of blackhole, a similar technique to the one in place with TFO
is now used: MPTCP can be disabled for a certain period of time, 1h by
default. This time period will grow exponentially when more blackhole
issues get detected right after MPTCP is re-enabled and will reset to
the initial value when the blackhole issue goes away.
The blackhole period can be modified thanks to a new sysctl knob:
blackhole_timeout. Two new MIB counters help understanding what's
happening:
- 'Blackhole', incremented when a blackhole is detected.
- 'MPCapableSYNTXDisabled', incremented when an MPTCP connection
directly falls back to TCP during the blackhole period.
Because the technique is inspired by the one used by TFO, an important
part of the new code is similar to what can be found in tcp_fastopen.c,
with some adaptations to the MPTCP case.
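The exponential growth of the blackhole period can be illustrated with the sketch below. The function name and the cap on the number of doublings are assumptions made for this example; only the 1h default and the doubling behaviour come from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define BLACKHOLE_TIMEOUT_DEFAULT_S 3600u /* 1h by default, per the text */
#define BLACKHOLE_MAX_SHIFT_MODEL      6u /* hypothetical cap on doublings */

/*
 * Period in seconds during which MPTCP stays disabled after the given
 * number of consecutive blackhole detections. Zero detections means
 * that MPTCP is not disabled at all.
 */
static uint64_t blackhole_period_s(uint32_t base_timeout_s, uint32_t detections)
{
	uint32_t shift;

	if (detections == 0)
		return 0;
	shift = detections - 1;
	if (shift > BLACKHOLE_MAX_SHIFT_MODEL)
		shift = BLACKHOLE_MAX_SHIFT_MODEL;
	return (uint64_t)base_timeout_s << shift;
}

/* Worked example with the default timeout: 1h, then 2h, then 4h. */
static void blackhole_example(void)
{
	assert(blackhole_period_s(BLACKHOLE_TIMEOUT_DEFAULT_S, 1) == 3600);
	assert(blackhole_period_s(BLACKHOLE_TIMEOUT_DEFAULT_S, 2) == 7200);
	assert(blackhole_period_s(BLACKHOLE_TIMEOUT_DEFAULT_S, 3) == 14400);
}
```

Resetting the detection count to zero once the blackhole goes away brings the period back to its initial value on the next detection.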
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some middleboxes might be nasty with MPTCP, and decide to drop packets
with MPTCP options, instead of just dropping the MPTCP options (or
letting them pass...).
In this case, it sounds better to fall back to "plain" TCP after 2
retransmissions, and try again.
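A minimal sketch of that decision, assuming the initial SYN and its first 2 retransmissions carry the MP_CAPABLE option and later retransmissions are sent as plain TCP. The constant and function names are hypothetical, and the exact boundary is one reading of the text above rather than the kernel's logic.

```python
MPC_SYN_MAX_RETRANS = 2  # assumption: taken from "after 2 retransmissions"


def syn_carries_mpc(attempt: int, limit: int = MPC_SYN_MAX_RETRANS) -> bool:
    """Return True if this SYN should still carry MP_CAPABLE.

    attempt 0 is the initial SYN, attempt 1 the first retransmission,
    and so on. Once more than `limit` retransmissions are needed, the
    SYN is sent without MPTCP options (fallback to "plain" TCP).
    """
    return attempt <= limit
```

Under this reading, three SYN + MP_CAPABLE packets are sent before the fallback: `[syn_carries_mpc(a) for a in range(5)]` gives `[True, True, True, False, False]`.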
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/477
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-2-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This helper will be used outside protocol.h in the following commit.
While at it, also add a 'pr_fallback()' debug print, to help identify
fallbacks.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-1-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sebastian Andrzej Siewior says:
====================
net: hsr: Use the seqnr lock for frames received via interlink port.
This is a follow-up to the thread at
https://lore.kernel.org/all/20240904133725.1073963-1-edumazet@google.com/
====================
Link: https://patch.msgid.link/20240906132816.657485-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove interlink_sequence_nr which is unused.
[ bigeasy: split out from Eric's patch ].
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240906132816.657485-3-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported that the seqnr_lock is not acquired for frames received
over the interlink port. In the interlink case a new seqnr is generated
and assigned to the frame.
Frames received over a slave port already have a sequence number
assigned, so the lock is not required for them.
Acquire the hsr_priv::seqnr_lock during the invocation of
hsr_forward_skb() if a packet has been received from the interlink port.
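The locking rule can be illustrated with a small user-space model. This is a sketch only: the class and function below are hypothetical stand-ins, a dict plays the role of the frame, and a Python lock replaces the kernel spinlock.

```python
import threading


class HsrPriv:
    """Hypothetical stand-in for the private state holding the seqnr."""

    def __init__(self) -> None:
        self.seqnr_lock = threading.Lock()
        self.sequence_nr = 0


def forward_frame(hsr: HsrPriv, frame: dict, from_interlink: bool) -> dict:
    """Forward a frame, taking seqnr_lock only for the interlink port."""
    if from_interlink:
        # A new sequence number is generated here, so the counter must
        # be protected against concurrent senders.
        with hsr.seqnr_lock:
            frame["seqnr"] = hsr.sequence_nr
            hsr.sequence_nr += 1
    # Frames from a slave port already carry a sequence number and do
    # not touch the shared counter, so no lock is taken for them.
    return frame
```

A frame arriving from a slave port keeps its sequence number, while two frames from the interlink port get consecutive numbers from the shared counter.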
Reported-by: syzbot+3d602af7549af539274e@syzkaller.appspotmail.com
Closes: https://groups.google.com/g/syzkaller-bugs/c/KppVvGviGg4/m/EItSdCZdBAAJ
Fixes: 5055cccfc2d1c ("net: hsr: Provide RedBox support (HSR-SAN)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Tested-by: Lukasz Majewski <lukma@denx.de>
Link: https://patch.msgid.link/20240906132816.657485-2-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.12
The last -next "new features" pull request for v6.12. The stack now
supports DFS on MLO but otherwise nothing really standing out.
Major changes:
cfg80211/mac80211
* EHT rate support in AQL airtime
* DFS support for MLO
rtw89
* complete BT-coexistence code for RTL8852BT
* RTL8922A WoWLAN net-detect support
* tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits)
wifi: brcmfmac: cfg80211: Convert comma to semicolon
wifi: rsi: Remove an unused field in struct rsi_debugfs
wifi: libertas: Cleanup unused declarations
wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_bus_probe()
wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_sdio_probe()
wifi: wilc1000: fix potential RCU dereference issue in wilc_parse_join_bss_param
wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext()
wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop()
wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors
wifi: cfg80211: fix kernel-doc for per-link data
wifi: mt76: mt7925: replace chan config with extend txpower config for clc
wifi: mt76: mt7925: fix a potential array-index-out-of-bounds issue for clc
wifi: mt76: mt7615: check devm_kasprintf() returned value
wifi: mt76: mt7925: convert comma to semicolon
wifi: mt76: mt7925: fix a potential association failure upon resuming
wifi: mt76: Avoid multiple -Wflex-array-member-not-at-end warnings
wifi: mt76: mt7921: Check devm_kasprintf() returned value
wifi: mt76: mt7915: check devm_kasprintf() returned value
wifi: mt76: mt7915: avoid long MCU command timeouts during SER
wifi: mt76: mt7996: fix uninitialized TLV data
...
====================
Link: https://patch.msgid.link/20240911084147.A205DC4AF0F@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Raju Lakkaraju says:
====================
Add support to PHYLINK for LAN743x/PCI11x1x chips
This is the follow-up patch series of
https://lkml.iu.edu/hypermail/linux/kernel/2310.2/02078.html
Divide the PHYLINK adaptation and SFP modifications into two separate patch
series.
The current patch series focuses on transitioning the LAN743x driver's PHY
support from phylib to phylink.
Tested on PCI11010 Rev-1 Evaluation board
Change List:
============
V5 -> V6:
- Remove the lan743x_find_max_speed( ) function. Not required
- Add EEE enable check before calling lan743x_mac_eee_enable( ) function
V4 -> V5:
- Remove the fixed_phy_unregister( ) function. Not required
- Remove the "phydev->eee_enabled" check to update the MAC EEE
enable/disable
- Call lan743x_mac_eee_enable() with true after update tx_lpi_timer.
- Add phy_support_eee() to initialize the EEE flags
V3 -> V4:
- Add fixed-link patch along with this series.
Note: This code was developed by Mr. Russell King
Ref:
https://lore.kernel.org/netdev/LV8PR11MB8700C786F5F1C274C73036CC9F8E2@LV8PR11MB8700.namprd11.prod.outlook.com/T/#me943adf54f1ea082edf294aba448fa003a116815
- Change phylink fixed-link function header's string from "Returns" to
"Returns:"
- Remove the EEE private variable from LAN743x adapter structure and fix the
EEE's set/get functions
- Replace setting the individual caps (i.e. _RGMII, _RGMII_ID, _RGMII_RXID and
_RGMII_TXID) with the phy_interface_set_rgmii( ) function
- Change lan743x_set_eee( ) to lan743x_mac_eee_enable( )
V2 -> V3:
- Remove the unwanted parens in each of these if() sub-blocks
- Replace "to_net_dev(config->dev)" with "netdev".
- Add GMII_ID/RGMII_TXID/RGMII_RXID in supported_interfaces
- Fix the lan743x_phy_handle_exists( ) return type
V1 -> V2:
- Address Russell King's comments i.e. remove the speed, duplex update in
lan743x_phylink_mac_config( )
- pre-March 2020 legacy support has been removed
V0 -> V1:
- Integrate with Synopsys DesignWare XPCS drivers
- Based on external review comments,
- Changes made to SGMII interface support only 1G/100M/10M bps speed
- Changes made to 2500Base-X interface support only 2.5Gbps speed
- Add check for not is_sgmii_en with is_sfp_support_en support
- Change the "pci11x1x_strap_get_status" function return type from void to
int
- Add ethtool phylink wol, eee, pause get/set functions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support to ethtool phylink functions:
- get/set settings like speed, duplex etc
- get/set the wake-on-lan (WOL)
- get/set the energy-efficient ethernet (EEE)
- get/set the pause
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Migrate phy support from phylib to phylink.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split a separate Link Speed Duplex (LSD) update state function out of
lan743x_sgmii_config() for use as a subroutine.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Split a separate PCS power reset function out of lan743x_sgmii_config() for
use as a subroutine.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
phylink
The function allows for the configuration of a fixed link state for a given
phylink instance. This addition is particularly useful for network devices that
operate with a fixed link configuration, where the link parameters do not change
dynamically. By using `phylink_set_fixed_link()`, drivers can easily set up
the fixed link state during initialization or configuration changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: support devlink subfunction
Michal Swiatkowski says:
Currently the ice driver does not allow creating more than one networking
device per physical function. The only way to have more hardware-backed
netdevs is to use SR-IOV.
The following patchset adds support for the devlink port API. For each new
pcisf type port, the driver allocates a new VSI, configures all needed
resources, including dynamically allocated MSIX vectors, programs rules
and registers a new netdev.
This series supports only one Tx/Rx queue pair per subfunction.
Example commands:
devlink port add pci/0000:31:00.1 flavour pcisf pfnum 1 sfnum 1000
devlink port function set pci/0000:31:00.1/1 hw_addr 00:00:00:00:03:14
devlink port function set pci/0000:31:00.1/1 state active
devlink port function del pci/0000:31:00.1/1
Make the port representor and eswitch code generic to support
subfunction representor type.
VSI configuration is slightly different between VF and SF. It needs to
be reflected in the code.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: subfunction activation and base devlink ops
ice: basic support for VLAN in subfunctions
ice: support subfunction devlink Tx topology
ice: implement netdevice ops for SF representor
ice: check if SF is ready in ethtool ops
ice: don't set target VSI for subfunction
ice: create port representor for SF
ice: make representor code generic
ice: implement netdev for subfunction
ice: base subfunction aux driver
ice: allocate devlink for subfunction
ice: treat subfunction VSI the same as PF VSI
ice: add basic devlink subfunctions support
ice: export ice ndo_ops functions
ice: add new VSI type for subfunctions
====================
Link: https://patch.msgid.link/20240906223010.2194591-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-08-29
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred to as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred to in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver writes directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDMA-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the groundwork for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
=======================
* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: HWS, added API and enabled HWS support
net/mlx5: HWS, added send engine and context handling
net/mlx5: HWS, added debug dump and internal headers
net/mlx5: HWS, added backward-compatible API handling
net/mlx5: HWS, added memory management handling
net/mlx5: HWS, added vport handling
net/mlx5: HWS, added modify header pattern and args handling
net/mlx5: HWS, added FW commands handling
net/mlx5: HWS, added matchers functionality
net/mlx5: HWS, added definers handling
net/mlx5: HWS, added rules handling
net/mlx5: HWS, added tables handling
net/mlx5: HWS, added actions handling
net/mlx5: Added missing definitions in preparation for HW Steering
net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================
Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-09-10
1) Remove an unneeded WARN_ON on packet offload.
From Patrisious Haddad.
2) Add a copy from skb_seq_state to buffer function.
This is needed for the upcoming IPTFS patchset.
From Christian Hopps.
3) Spelling fix in xfrm.h.
From Simon Horman.
4) Speed up xfrm policy insertions.
From Florian Westphal.
5) Add and revert a patch to support xfrm interfaces
for packet offload. This patch was just half cooked.
6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
From Florian Westphal.
7) Update comments on sdb and xfrm_policy.
From Florian Westphal.
8) Fix a null pointer dereference in the new policy insertion
code. From Florian Westphal.
9) Fix an uninitialized variable in the new policy insertion
code. From Nathan Chancellor.
* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
xfrm: policy: fix null dereference
Revert "xfrm: add SA information to the offloaded packet"
xfrm: minor update to sdb and xfrm_policy comments
xfrm: policy: use recently added helper in more places
xfrm: add SA information to the offloaded packet
xfrm: policy: remove remaining use of inexact list
xfrm: switch migrate to xfrm_policy_lookup_bytype
xfrm: policy: don't iterate inexact policies twice at insert time
selftests: add xfrm policy insertion speed test script
xfrm: Correct spelling in xfrm.h
net: add copy from skb_seq_state to buffer function
xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================
Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michael Chan says:
====================
bnxt_en: MSIX improvements
This patchset makes some improvements related to MSIX. The first
patch adjusts the default MSIX vectors assigned for RoCE. On the
PF, the number of MSIX is increased to 64 from the current 9. The
second patch allocates additional MSIX vectors ahead of time when
changing ethtool channels if dynamic MSIX is supported. The 3rd
patch makes sure that the IRQ name is not truncated.
====================
Link: https://patch.msgid.link/20240909202737.93852-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The name field of struct bnxt_irq is written using snprintf in
bnxt_setup_msix(). Make the field large enough to fit the maximal
formatted string to prevent truncation. Truncated IRQ names are
less meaningful to the user. For example, "enp4s0f0np0-TxRx-0"
gets truncated to "enp4s0f0np0-TxRx-" with the existing code.
Make sure we have space for the extra characters added to the IRQ
names:
- the characters introduced by the static format string: hyphens
- the maximal static substituted ring type string: "TxRx"
- the maximum length of an integer formatted as a string, even
though reasonable ring numbers would never be as long as this.
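The sizing argument can be checked with a short sketch. It assumes the usual Linux IFNAMSIZ of 16 bytes (including the terminating NUL) and infers an 18-byte old buffer from the truncation example above; the helper names are hypothetical and the result is not claimed to match the constant used in the driver.

```python
IFNAMSIZ = 16  # Linux interface name buffer size, including the NUL


def c_snprintf(size: int, text: str) -> str:
    """Mimic snprintf() truncation: at most size - 1 characters are kept."""
    return text[: max(size - 1, 0)]


def irq_name_size() -> int:
    """Buffer size that fits the longest possible formatted IRQ name."""
    hyphens = 2                      # from the static format string
    ring_type = len("TxRx")          # longest substituted ring type
    max_int = len(str(-2 ** 31))     # "-2147483648": 11 characters
    return IFNAMSIZ + hyphens + ring_type + max_int


name = "enp4s0f0np0-TxRx-0"
old = c_snprintf(18, name)               # assumed old size: last char lost
new = c_snprintf(irq_name_size(), name)  # fits without truncation
```

Here `old` is "enp4s0f0np0-TxRx-" and `new` is the full name. A 15-character interface name followed by "-TxRx-" and an 11-character integer needs 32 characters plus the NUL, which is what `irq_name_size()` returns.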
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
bnxt_check_rings() is called to ensure that we have the hardware ring
resources before committing to reinitialize with the new number of
rings. MSIX vectors were never checked at this point because, until
recently, MSIX had to be disabled before the new set of MSIX vectors
could be allocated.
Now that we support dynamic MSIX allocation, check to make sure we
can dynamically allocate the new MSIX vectors as the last step in
bnxt_check_rings() if dynamic MSIX is supported.
For example, the IOMMU group may limit the number of MSIX vectors
for the device. With this patch, the ring change will fail more
gracefully when there are not enough MSIX vectors.
It is also better to move bnxt_check_rings() to be called as the last
step when changing ethtool rings.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|