author     David S. Miller <davem@davemloft.net>    2021-06-08 14:41:10 -0700
committer  David S. Miller <davem@davemloft.net>    2021-06-08 14:41:10 -0700
commit     e0eb625a7da28410507140787bda4a63a5ba5d3e
tree       339f2f36be7ba58dbc53e0c4c8275104ab926b61
parent     fa6d61e9c7d65e25d8b79e9e85dbfb15545d138c
parent     a01f2cd0ccf473f7af32afc9b74ac5f2caff3c18
Merge branch 'ena-updates'
Shay Agroskin says:
====================
Use build_skb and reorganize some code in ENA

This patchset introduces several changes:
- Use build_skb() on the RX side. This ensures that the packet headers
  end up in the SKB's linear part (a sketch of the buffer layout this
  requires follows the diffstat below).
- Factor some code into helper functions and remove duplicated code to
  make the driver more readable and less error prone.
- Fix the RST formatting and an outdated description in the ENA
  documentation.
- Improve cache alignment in the code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
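
For context on the first bullet above, here is a minimal sketch of the Rx
buffer layout that build_skb() requires. This is illustrative only; the
helper name rx_buf_alloc() is an assumption, though the fields mirror the
ena_alloc_rx_buffer() hunk further down: the page is DMA-mapped once,
headroom is reserved at the front, and room for struct skb_shared_info is
kept free at the tail so the buffer can later be handed to build_skb()
without copying.

/* Hedged sketch, not the driver's exact code: size an Rx buffer so it
 * can later be wrapped by build_skb(). Headroom (NET_SKB_PAD, or
 * XDP_PACKET_HEADROOM when an XDP program is loaded) precedes the data,
 * and SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) stays free at the
 * end of the page for the shared info that build_skb() places there.
 */
static int rx_buf_alloc(struct ena_ring *rx_ring, struct ena_rx_buffer *rx_info)
{
	int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	int headroom = rx_ring->rx_headroom;
	struct page *page;
	dma_addr_t dma;

	page = dev_alloc_page();	/* allocated on the local NUMA node */
	if (unlikely(!page))
		return -ENOMEM;

	/* Map bidirectionally so the NIC can also read the buffer
	 * (needed for port mirroring, as the driver comments note).
	 */
	dma = dma_map_page(rx_ring->dev, page, 0, ENA_PAGE_SIZE,
			   DMA_BIDIRECTIONAL);
	if (unlikely(dma_mapping_error(rx_ring->dev, dma))) {
		__free_page(page);
		return -EIO;
	}

	rx_info->page = page;
	rx_info->page_offset = headroom;
	rx_info->ena_buf.paddr = dma + headroom;
	/* The device must not write into the shared-info tailroom. */
	rx_info->ena_buf.len = ENA_PAGE_SIZE - headroom - tailroom;
	return 0;
}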
-rw-r--r--  Documentation/networking/device_drivers/ethernet/amazon/ena.rst | 164
-rw-r--r--  drivers/net/ethernet/amazon/ena/ena_admin_defs.h                |   2
-rw-r--r--  drivers/net/ethernet/amazon/ena/ena_com.c                       |   3
-rw-r--r--  drivers/net/ethernet/amazon/ena/ena_eth_com.c                   |  30
-rw-r--r--  drivers/net/ethernet/amazon/ena/ena_ethtool.c                   |  18
-rw-r--r--  drivers/net/ethernet/amazon/ena/ena_netdev.c                    | 216
-rw-r--r--  drivers/net/ethernet/amazon/ena/ena_netdev.h                    |  23
7 files changed, 249 insertions(+), 207 deletions(-)
diff --git a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst index f8c6469f2bd2..01b2a69b0cb0 100644 --- a/Documentation/networking/device_drivers/ethernet/amazon/ena.rst +++ b/Documentation/networking/device_drivers/ethernet/amazon/ena.rst @@ -11,12 +11,12 @@ ENA is a networking interface designed to make good use of modern CPU features and system architectures. The ENA device exposes a lightweight management interface with a -minimal set of memory mapped registers and extendable command set +minimal set of memory mapped registers and extendible command set through an Admin Queue. The driver supports a range of ENA devices, is link-speed independent -(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has -a negotiated and extendable feature set. +(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc), and has +a negotiated and extendible feature set. Some ENA devices support SR-IOV. This driver is used for both the SR-IOV Physical Function (PF) and Virtual Function (VF) devices. @@ -27,9 +27,9 @@ is advertised by the device via the Admin Queue), a dedicated MSI-X interrupt vector per Tx/Rx queue pair, adaptive interrupt moderation, and CPU cacheline optimized data placement. -The ENA driver supports industry standard TCP/IP offload features such -as checksum offload and TCP transmit segmentation offload (TSO). -Receive-side scaling (RSS) is supported for multi-core scaling. +The ENA driver supports industry standard TCP/IP offload features such as +checksum offload. Receive-side scaling (RSS) is supported for multi-core +scaling. The ENA driver and its corresponding devices implement health monitoring mechanisms such as watchdog, enabling the device and driver @@ -38,22 +38,20 @@ debug logs. Some of the ENA devices support a working mode called Low-latency Queue (LLQ), which saves several more microseconds. - ENA Source Code Directory Structure =================================== ================= ====================================================== ena_com.[ch] Management communication layer. This layer is - responsible for the handling all the management - (admin) communication between the device and the - driver. + responsible for the handling all the management + (admin) communication between the device and the + driver. ena_eth_com.[ch] Tx/Rx data path. ena_admin_defs.h Definition of ENA management interface. ena_eth_io_defs.h Definition of ENA data path interface. ena_common_defs.h Common definitions for ena_com layer. ena_regs_defs.h Definition of ENA PCI memory-mapped (MMIO) registers. ena_netdev.[ch] Main Linux kernel driver. -ena_syfsfs.[ch] Sysfs files. ena_ethtool.c ethtool callbacks. ena_pci_id_tbl.h Supported device IDs. ================= ====================================================== @@ -69,7 +67,7 @@ ENA management interface is exposed by means of: - Asynchronous Event Notification Queue (AENQ) ENA device MMIO Registers are accessed only during driver -initialization and are not involved in further normal device +initialization and are not used during further normal device operation. 
AQ is used for submitting management commands, and the @@ -100,28 +98,27 @@ group may have multiple syndromes, as shown below The events are: - ==================== =============== - Group Syndrome - ==================== =============== - Link state change **X** - Fatal error **X** - Notification Suspend traffic - Notification Resume traffic - Keep-Alive **X** - ==================== =============== +==================== =============== +Group Syndrome +==================== =============== +Link state change **X** +Fatal error **X** +Notification Suspend traffic +Notification Resume traffic +Keep-Alive **X** +==================== =============== ACQ and AENQ share the same MSI-X vector. -Keep-Alive is a special mechanism that allows monitoring of the -device's health. The driver maintains a watchdog (WD) handler which, -if fired, logs the current state and statistics then resets and -restarts the ENA device and driver. A Keep-Alive event is delivered by -the device every second. The driver re-arms the WD upon reception of a -Keep-Alive event. A missed Keep-Alive event causes the WD handler to -fire. +Keep-Alive is a special mechanism that allows monitoring the device's health. +A Keep-Alive event is delivered by the device every second. +The driver maintains a watchdog (WD) handler which logs the current state and +statistics. If the keep-alive events aren't delivered as expected the WD resets +the device and the driver. Data Path Interface =================== + I/O operations are based on Tx and Rx Submission Queues (Tx SQ and Rx SQ correspondingly). Each SQ has a completion queue (CQ) associated with it. @@ -131,26 +128,24 @@ physical memory. The ENA driver supports two Queue Operation modes for Tx SQs: -- Regular mode +- **Regular mode:** + In this mode the Tx SQs reside in the host's memory. The ENA + device fetches the ENA Tx descriptors and packet data from host + memory. - * In this mode the Tx SQs reside in the host's memory. The ENA - device fetches the ENA Tx descriptors and packet data from host - memory. +- **Low Latency Queue (LLQ) mode or "push-mode":** + In this mode the driver pushes the transmit descriptors and the + first 128 bytes of the packet directly to the ENA device memory + space. The rest of the packet payload is fetched by the + device. For this operation mode, the driver uses a dedicated PCI + device memory BAR, which is mapped with write-combine capability. -- Low Latency Queue (LLQ) mode or "push-mode". - - * In this mode the driver pushes the transmit descriptors and the - first 128 bytes of the packet directly to the ENA device memory - space. The rest of the packet payload is fetched by the - device. For this operation mode, the driver uses a dedicated PCI - device memory BAR, which is mapped with write-combine capability. + **Note that** not all ENA devices support LLQ, and this feature is negotiated + with the device upon initialization. If the ENA device does not + support LLQ mode, the driver falls back to the regular mode. The Rx SQs support only the regular mode. -Note: Not all ENA devices support LLQ, and this feature is negotiated - with the device upon initialization. If the ENA device does not - support LLQ mode, the driver falls back to the regular mode. - The driver supports multi-queue for both Tx and Rx. This has various benefits: @@ -165,6 +160,7 @@ benefits: Interrupt Modes =============== + The driver assigns a single MSI-X vector per queue pair (for both Tx and Rx directions). 
The driver assigns an additional dedicated MSI-X vector for management (for ACQ and AENQ). @@ -190,20 +186,21 @@ unmasked by the driver after NAPI processing is complete. Interrupt Moderation ==================== + ENA driver and device can operate in conventional or adaptive interrupt moderation mode. -In conventional mode the driver instructs device to postpone interrupt +**In conventional mode** the driver instructs device to postpone interrupt posting according to static interrupt delay value. The interrupt delay -value can be configured through ethtool(8). The following ethtool -parameters are supported by the driver: tx-usecs, rx-usecs +value can be configured through `ethtool(8)`. The following `ethtool` +parameters are supported by the driver: ``tx-usecs``, ``rx-usecs`` -In adaptive interrupt moderation mode the interrupt delay value is +**In adaptive interrupt** moderation mode the interrupt delay value is updated by the driver dynamically and adjusted every NAPI cycle according to the traffic nature. -Adaptive coalescing can be switched on/off through ethtool(8) -adaptive_rx on|off parameter. +Adaptive coalescing can be switched on/off through `ethtool(8)`'s +:code:`adaptive_rx on|off` parameter. More information about Adaptive Interrupt Moderation (DIM) can be found in Documentation/networking/net_dim.rst @@ -214,17 +211,10 @@ The rx_copybreak is initialized by default to ENA_DEFAULT_RX_COPYBREAK and can be configured by the ETHTOOL_STUNABLE command of the SIOCETHTOOL ioctl. -SKB -=== -The driver-allocated SKB for frames received from Rx handling using -NAPI context. The allocation method depends on the size of the packet. -If the frame length is larger than rx_copybreak, napi_get_frags() -is used, otherwise netdev_alloc_skb_ip_align() is used, the buffer -content is copied (by CPU) to the SKB, and the buffer is recycled. - Statistics ========== -The user can obtain ENA device and driver statistics using ethtool. + +The user can obtain ENA device and driver statistics using `ethtool`. The driver can collect regular or extended statistics (including per-queue stats) from the device. @@ -232,22 +222,23 @@ In addition the driver logs the stats to syslog upon device reset. MTU === + The driver supports an arbitrarily large MTU with a maximum that is negotiated with the device. The driver configures MTU using the SetFeature command (ENA_ADMIN_MTU property). The user can change MTU -via ip(8) and similar legacy tools. +via `ip(8)` and similar legacy tools. Stateless Offloads ================== + The ENA driver supports: -- TSO over IPv4/IPv6 -- TSO with ECN - IPv4 header checksum offload - TCP/UDP over IPv4/IPv6 checksum offloads RSS === + - The ENA device supports RSS that allows flexible Rx traffic steering. - Toeplitz and CRC32 hash functions are supported. @@ -260,41 +251,42 @@ RSS function delivered in the Rx CQ descriptor is set in the received SKB. - The user can provide a hash key, hash function, and configure the - indirection table through ethtool(8). + indirection table through `ethtool(8)`. DATA PATH ========= + Tx -- -ena_start_xmit() is called by the stack. This function does the following: +:code:`ena_start_xmit()` is called by the stack. This function does the following: -- Maps data buffers (skb->data and frags). -- Populates ena_buf for the push buffer (if the driver and device are - in push mode.) +- Maps data buffers (``skb->data`` and frags). +- Populates ``ena_buf`` for the push buffer (if the driver and device are + in push mode). 
- Prepares ENA bufs for the remaining frags. -- Allocates a new request ID from the empty req_id ring. The request +- Allocates a new request ID from the empty ``req_id`` ring. The request ID is the index of the packet in the Tx info. This is used for - out-of-order TX completions. + out-of-order Tx completions. - Adds the packet to the proper place in the Tx ring. -- Calls ena_com_prepare_tx(), an ENA communication layer that converts - the ena_bufs to ENA descriptors (and adds meta ENA descriptors as - needed.) +- Calls :code:`ena_com_prepare_tx()`, an ENA communication layer that converts + the ``ena_bufs`` to ENA descriptors (and adds meta ENA descriptors as + needed). * This function also copies the ENA descriptors and the push buffer - to the Device memory space (if in push mode.) + to the Device memory space (if in push mode). -- Writes doorbell to the ENA device. +- Writes a doorbell to the ENA device. - When the ENA device finishes sending the packet, a completion interrupt is raised. - The interrupt handler schedules NAPI. -- The ena_clean_tx_irq() function is called. This function handles the +- The :code:`ena_clean_tx_irq()` function is called. This function handles the completion descriptors generated by the ENA, with a single completion descriptor per completed packet. - * req_id is retrieved from the completion descriptor. The tx_info of - the packet is retrieved via the req_id. The data buffers are - unmapped and req_id is returned to the empty req_id ring. + * ``req_id`` is retrieved from the completion descriptor. The ``tx_info`` of + the packet is retrieved via the ``req_id``. The data buffers are + unmapped and ``req_id`` is returned to the empty ``req_id`` ring. * The function stops when the completion descriptors are completed or the budget is reached. @@ -303,12 +295,11 @@ Rx - When a packet is received from the ENA device. - The interrupt handler schedules NAPI. -- The ena_clean_rx_irq() function is called. This function calls - ena_rx_pkt(), an ENA communication layer function, which returns the - number of descriptors used for a new unhandled packet, and zero if +- The :code:`ena_clean_rx_irq()` function is called. This function calls + :code:`ena_com_rx_pkt()`, an ENA communication layer function, which returns the + number of descriptors used for a new packet, and zero if no new packet is found. -- Then it calls the ena_clean_rx_irq() function. -- ena_eth_rx_skb() checks packet length: +- :code:`ena_rx_skb()` checks packet length: * If the packet is small (len < rx_copybreak), the driver allocates a SKB for the new packet, and copies the packet payload into the @@ -317,9 +308,10 @@ Rx - In this way the original data buffer is not passed to the stack and is reused for future Rx packets. - * Otherwise the function unmaps the Rx buffer, then allocates the - new SKB structure and hooks the Rx buffer to the SKB frags. + * Otherwise the function unmaps the Rx buffer, sets the first + descriptor as `skb`'s linear part and the other descriptors as the + `skb`'s frags. - The new SKB is updated with the necessary information (protocol, - checksum hw verify result, etc.), and then passed to the network - stack, using the NAPI interface function napi_gro_receive(). + checksum hw verify result, etc), and then passed to the network + stack, using the NAPI interface function :code:`napi_gro_receive()`. 
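
To make the Rx path just documented concrete, the following condensed
sketch shows the rx_copybreak decision: copy small frames into a fresh
SKB so the Rx page can be recycled, and hand larger frames to build_skb()
so the first descriptor becomes the SKB's linear part. This is a
simplified illustration (single-descriptor packet, DMA sync/unmap and
stats elided; the real logic lives in ena_rx_skb()), not the driver's
exact code.

/* Hedged sketch of the rx_copybreak branch described above. */
static struct sk_buff *rx_to_skb(struct ena_ring *rx_ring,
				 struct ena_rx_buffer *rx_info,
				 unsigned int len)
{
	void *page_addr = page_address(rx_info->page);
	void *data_addr = page_addr + rx_info->page_offset;
	struct sk_buff *skb;

	if (len <= rx_ring->rx_copybreak) {
		/* Small frame: copy the payload and recycle the original
		 * buffer for future Rx packets.
		 */
		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
						rx_ring->rx_copybreak);
		if (unlikely(!skb))
			return NULL;
		skb_copy_to_linear_data(skb, data_addr, len);
		skb_put(skb, len);
		return skb;
	}

	/* Large frame: give the page itself to the stack; the headers
	 * land in the linear part. Any remaining descriptors would be
	 * attached as frags with skb_add_rx_frag().
	 */
	skb = build_skb(page_addr, ENA_PAGE_SIZE);
	if (unlikely(!skb))
		return NULL;
	skb_reserve(skb, rx_info->page_offset);
	skb_put(skb, len);
	return skb;
}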
diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h index 4164eacc5c28..f5ec35fa4c63 100644 --- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h +++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h @@ -1042,8 +1042,6 @@ enum ena_admin_aenq_group { }; enum ena_admin_aenq_notification_syndrome { - ENA_ADMIN_SUSPEND = 0, - ENA_ADMIN_RESUME = 1, ENA_ADMIN_UPDATE_HINTS = 2, }; diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index 764852ead1d6..ab413fc1f68e 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -1979,7 +1979,8 @@ int ena_com_get_dev_attr_feat(struct ena_com_dev *ena_dev, if (rc) return rc; - if (get_resp.u.max_queue_ext.version != ENA_FEATURE_MAX_QUEUE_EXT_VER) + if (get_resp.u.max_queue_ext.version != + ENA_FEATURE_MAX_QUEUE_EXT_VER) return -EINVAL; memcpy(&get_feat_ctx->max_queue_ext, &get_resp.u.max_queue_ext, diff --git a/drivers/net/ethernet/amazon/ena/ena_eth_com.c b/drivers/net/ethernet/amazon/ena/ena_eth_com.c index c3be751e7379..3d6f0a466a9e 100644 --- a/drivers/net/ethernet/amazon/ena/ena_eth_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_eth_com.c @@ -151,11 +151,14 @@ static int ena_com_close_bounce_buffer(struct ena_com_io_sq *io_sq) return 0; /* bounce buffer was used, so write it and get a new one */ - if (pkt_ctrl->idx) { + if (likely(pkt_ctrl->idx)) { rc = ena_com_write_bounce_buffer_to_dev(io_sq, pkt_ctrl->curr_bounce_buf); - if (unlikely(rc)) + if (unlikely(rc)) { + netdev_err(ena_com_io_sq_to_ena_dev(io_sq)->net_device, + "Failed to write bounce buffer to device\n"); return rc; + } pkt_ctrl->curr_bounce_buf = ena_com_get_next_bounce_buffer(&io_sq->bounce_buf_ctrl); @@ -185,8 +188,11 @@ static int ena_com_sq_update_llq_tail(struct ena_com_io_sq *io_sq) if (!pkt_ctrl->descs_left_in_line) { rc = ena_com_write_bounce_buffer_to_dev(io_sq, pkt_ctrl->curr_bounce_buf); - if (unlikely(rc)) + if (unlikely(rc)) { + netdev_err(ena_com_io_sq_to_ena_dev(io_sq)->net_device, + "Failed to write bounce buffer to device\n"); return rc; + } pkt_ctrl->curr_bounce_buf = ena_com_get_next_bounce_buffer(&io_sq->bounce_buf_ctrl); @@ -406,8 +412,11 @@ int ena_com_prepare_tx(struct ena_com_io_sq *io_sq, } if (unlikely(io_sq->mem_queue_type == ENA_ADMIN_PLACEMENT_POLICY_DEV && - !buffer_to_push)) + !buffer_to_push)) { + netdev_err(ena_com_io_sq_to_ena_dev(io_sq)->net_device, + "Push header wasn't provided in LLQ mode\n"); return -EINVAL; + } rc = ena_com_write_header_to_bounce(io_sq, buffer_to_push, header_len); if (unlikely(rc)) @@ -423,6 +432,9 @@ int ena_com_prepare_tx(struct ena_com_io_sq *io_sq, /* If the caller doesn't want to send packets */ if (unlikely(!num_bufs && !header_len)) { rc = ena_com_close_bounce_buffer(io_sq); + if (rc) + netdev_err(ena_com_io_sq_to_ena_dev(io_sq)->net_device, + "Failed to write buffers to LLQ\n"); *nb_hw_desc = io_sq->tail - start_tail; return rc; } @@ -482,8 +494,11 @@ int ena_com_prepare_tx(struct ena_com_io_sq *io_sq, /* The first desc share the same desc as the header */ if (likely(i != 0)) { rc = ena_com_sq_update_tail(io_sq); - if (unlikely(rc)) + if (unlikely(rc)) { + netdev_err(ena_com_io_sq_to_ena_dev(io_sq)->net_device, + "Failed to update sq tail\n"); return rc; + } desc = get_sq_desc(io_sq); if (unlikely(!desc)) @@ -512,8 +527,11 @@ int ena_com_prepare_tx(struct ena_com_io_sq *io_sq, desc->len_ctrl |= ENA_ETH_IO_TX_DESC_LAST_MASK; rc = ena_com_sq_update_tail(io_sq); - if 
(unlikely(rc)) + if (unlikely(rc)) { + netdev_err(ena_com_io_sq_to_ena_dev(io_sq)->net_device, + "Failed to update sq tail of the last descriptor\n"); return rc; + } rc = ena_com_close_bounce_buffer(io_sq); diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index 2fe7ccee55b2..27dae632efcb 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -233,10 +233,13 @@ int ena_get_sset_count(struct net_device *netdev, int sset) { struct ena_adapter *adapter = netdev_priv(netdev); - if (sset != ETH_SS_STATS) - return -EOPNOTSUPP; + switch (sset) { + case ETH_SS_STATS: + return ena_get_sw_stats_count(adapter) + + ena_get_hw_stats_count(adapter); + } - return ena_get_sw_stats_count(adapter) + ena_get_hw_stats_count(adapter); + return -EOPNOTSUPP; } static void ena_queue_strings(struct ena_adapter *adapter, u8 **data) @@ -314,10 +317,11 @@ static void ena_get_ethtool_strings(struct net_device *netdev, { struct ena_adapter *adapter = netdev_priv(netdev); - if (sset != ETH_SS_STATS) - return; - - ena_get_strings(adapter, data, adapter->eni_stats_supported); + switch (sset) { + case ETH_SS_STATS: + ena_get_strings(adapter, data, adapter->eni_stats_supported); + break; + } } static int ena_get_link_ksettings(struct net_device *netdev, diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 881f88754bf6..cd6ea59c543c 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -35,9 +35,6 @@ MODULE_LICENSE("GPL"); #define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_PROBE | NETIF_MSG_IFUP | \ NETIF_MSG_TX_DONE | NETIF_MSG_TX_ERR | NETIF_MSG_RX_ERR) -static int debug = -1; -module_param(debug, int, 0); -MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)"); static struct ena_aenq_handlers aenq_handlers; @@ -89,6 +86,12 @@ static void ena_increase_stat(u64 *statp, u64 cnt, u64_stats_update_end(syncp); } +static void ena_ring_tx_doorbell(struct ena_ring *tx_ring) +{ + ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq); + ena_increase_stat(&tx_ring->tx_stats.doorbells, 1, &tx_ring->syncp); +} + static void ena_tx_timeout(struct net_device *dev, unsigned int txqueue) { struct ena_adapter *adapter = netdev_priv(dev); @@ -147,7 +150,7 @@ static int ena_xmit_common(struct net_device *dev, netif_dbg(adapter, tx_queued, dev, "llq tx max burst size of queue %d achieved, writing doorbell to send burst\n", ring->qid); - ena_com_write_sq_doorbell(ring->ena_com_io_sq); + ena_ring_tx_doorbell(ring); } /* prepare the packet's descriptors to dma engine */ @@ -197,7 +200,6 @@ static int ena_xdp_io_poll(struct napi_struct *napi, int budget) int ret; xdp_ring = ena_napi->xdp_ring; - xdp_ring->first_interrupt = ena_napi->first_interrupt; xdp_budget = budget; @@ -229,6 +231,7 @@ static int ena_xdp_io_poll(struct napi_struct *napi, int budget) xdp_ring->tx_stats.napi_comp += napi_comp_call; xdp_ring->tx_stats.tx_poll++; u64_stats_update_end(&xdp_ring->syncp); + xdp_ring->tx_stats.last_napi_jiffies = jiffies; return ret; } @@ -316,14 +319,12 @@ static int ena_xdp_xmit_frame(struct ena_ring *xdp_ring, xdpf->len); if (rc) goto error_unmap_dma; - /* trigger the dma engine. ena_com_write_sq_doorbell() - * has a mb + + /* trigger the dma engine. ena_ring_tx_doorbell() + * calls a memory barrier inside it. 
*/ - if (flags & XDP_XMIT_FLUSH) { - ena_com_write_sq_doorbell(xdp_ring->ena_com_io_sq); - ena_increase_stat(&xdp_ring->tx_stats.doorbells, 1, - &xdp_ring->syncp); - } + if (flags & XDP_XMIT_FLUSH) + ena_ring_tx_doorbell(xdp_ring); return rc; @@ -364,11 +365,8 @@ static int ena_xdp_xmit(struct net_device *dev, int n, } /* Ring doorbell to make device aware of the packets */ - if (flags & XDP_XMIT_FLUSH) { - ena_com_write_sq_doorbell(xdp_ring->ena_com_io_sq); - ena_increase_stat(&xdp_ring->tx_stats.doorbells, 1, - &xdp_ring->syncp); - } + if (flags & XDP_XMIT_FLUSH) + ena_ring_tx_doorbell(xdp_ring); spin_unlock(&xdp_ring->xdp_tx_lock); @@ -383,7 +381,6 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) u32 verdict = XDP_PASS; struct xdp_frame *xdpf; u64 *xdp_stat; - int qid; rcu_read_lock(); xdp_prog = READ_ONCE(rx_ring->xdp_bpf_prog); @@ -404,8 +401,7 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) } /* Find xmit queue */ - qid = rx_ring->qid + rx_ring->adapter->num_io_queues; - xdp_ring = &rx_ring->adapter->tx_ring[qid]; + xdp_ring = rx_ring->xdp_ring; /* The XDP queues are shared between XDP_TX and XDP_REDIRECT */ spin_lock(&xdp_ring->xdp_tx_lock); @@ -532,7 +528,7 @@ static void ena_xdp_exchange_program_rx_in_range(struct ena_adapter *adapter, rx_ring->rx_headroom = XDP_PACKET_HEADROOM; } else { ena_xdp_unregister_rxq_info(rx_ring); - rx_ring->rx_headroom = 0; + rx_ring->rx_headroom = NET_SKB_PAD; } } } @@ -681,7 +677,6 @@ static void ena_init_io_rings_common(struct ena_adapter *adapter, ring->ena_dev = adapter->ena_dev; ring->per_napi_packets = 0; ring->cpu = 0; - ring->first_interrupt = false; ring->no_interrupt_event_cnt = 0; u64_stats_init(&ring->syncp); } @@ -724,7 +719,9 @@ static void ena_init_io_rings(struct ena_adapter *adapter, rxr->smoothed_interval = ena_com_get_nonadaptive_moderation_interval_rx(ena_dev); rxr->empty_rx_queue = 0; + rxr->rx_headroom = NET_SKB_PAD; adapter->ena_napi[i].dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE; + rxr->xdp_ring = &adapter->tx_ring[i + adapter->num_io_queues]; } } } @@ -978,47 +975,65 @@ static void ena_free_all_io_rx_resources(struct ena_adapter *adapter) ena_free_rx_resources(adapter, i); } -static int ena_alloc_rx_page(struct ena_ring *rx_ring, - struct ena_rx_buffer *rx_info, gfp_t gfp) +struct page *ena_alloc_map_page(struct ena_ring *rx_ring, dma_addr_t *dma) { - int headroom = rx_ring->rx_headroom; - struct ena_com_buf *ena_buf; struct page *page; - dma_addr_t dma; - /* restore page offset value in case it has been changed by device */ - rx_info->page_offset = headroom; - - /* if previous allocated page is not used */ - if (unlikely(rx_info->page)) - return 0; - - page = alloc_page(gfp); - if (unlikely(!page)) { + /* This would allocate the page on the same NUMA node the executing code + * is running on. 
+ */ + page = dev_alloc_page(); + if (!page) { ena_increase_stat(&rx_ring->rx_stats.page_alloc_fail, 1, &rx_ring->syncp); - return -ENOMEM; + return ERR_PTR(-ENOSPC); } /* To enable NIC-side port-mirroring, AKA SPAN port, * we make the buffer readable from the nic as well */ - dma = dma_map_page(rx_ring->dev, page, 0, ENA_PAGE_SIZE, - DMA_BIDIRECTIONAL); - if (unlikely(dma_mapping_error(rx_ring->dev, dma))) { + *dma = dma_map_page(rx_ring->dev, page, 0, ENA_PAGE_SIZE, + DMA_BIDIRECTIONAL); + if (unlikely(dma_mapping_error(rx_ring->dev, *dma))) { ena_increase_stat(&rx_ring->rx_stats.dma_mapping_err, 1, &rx_ring->syncp); - __free_page(page); - return -EIO; + return ERR_PTR(-EIO); } + + return page; +} + +static int ena_alloc_rx_buffer(struct ena_ring *rx_ring, + struct ena_rx_buffer *rx_info) +{ + int headroom = rx_ring->rx_headroom; + struct ena_com_buf *ena_buf; + struct page *page; + dma_addr_t dma; + int tailroom; + + /* restore page offset value in case it has been changed by device */ + rx_info->page_offset = headroom; + + /* if previous allocated page is not used */ + if (unlikely(rx_info->page)) + return 0; + + /* We handle DMA here */ + page = ena_alloc_map_page(rx_ring, &dma); + if (unlikely(IS_ERR(page))) + return PTR_ERR(page); + netif_dbg(rx_ring->adapter, rx_status, rx_ring->netdev, "Allocate page %p, rx_info %p\n", page, rx_info); + tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + rx_info->page = page; ena_buf = &rx_info->ena_buf; ena_buf->paddr = dma + headroom; - ena_buf->len = ENA_PAGE_SIZE - headroom; + ena_buf->len = ENA_PAGE_SIZE - headroom - tailroom; return 0; } @@ -1065,8 +1080,7 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num) rx_info = &rx_ring->rx_buffer_info[req_id]; - rc = ena_alloc_rx_page(rx_ring, rx_info, - GFP_ATOMIC | __GFP_COMP); + rc = ena_alloc_rx_buffer(rx_ring, rx_info); if (unlikely(rc < 0)) { netif_warn(rx_ring->adapter, rx_err, rx_ring->netdev, "Failed to allocate buffer for rx queue %d\n", @@ -1384,21 +1398,23 @@ static int ena_clean_tx_irq(struct ena_ring *tx_ring, u32 budget) return tx_pkts; } -static struct sk_buff *ena_alloc_skb(struct ena_ring *rx_ring, bool frags) +static struct sk_buff *ena_alloc_skb(struct ena_ring *rx_ring, void *first_frag) { struct sk_buff *skb; - if (frags) - skb = napi_get_frags(rx_ring->napi); - else + if (!first_frag) skb = netdev_alloc_skb_ip_align(rx_ring->netdev, rx_ring->rx_copybreak); + else + skb = build_skb(first_frag, ENA_PAGE_SIZE); if (unlikely(!skb)) { ena_increase_stat(&rx_ring->rx_stats.skb_alloc_fail, 1, &rx_ring->syncp); + netif_dbg(rx_ring->adapter, rx_err, rx_ring->netdev, - "Failed to allocate skb. frags: %d\n", frags); + "Failed to allocate skb. first_frag %s\n", + first_frag ? 
"provided" : "not provided"); return NULL; } @@ -1410,10 +1426,12 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring, u32 descs, u16 *next_to_clean) { - struct sk_buff *skb; struct ena_rx_buffer *rx_info; u16 len, req_id, buf = 0; - void *va; + struct sk_buff *skb; + void *page_addr; + u32 page_offset; + void *data_addr; len = ena_bufs[buf].len; req_id = ena_bufs[buf].req_id; @@ -1431,12 +1449,14 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring, rx_info, rx_info->page); /* save virt address of first buffer */ - va = page_address(rx_info->page) + rx_info->page_offset; + page_addr = page_address(rx_info->page); + page_offset = rx_info->page_offset; + data_addr = page_addr + page_offset; - prefetch(va); + prefetch(data_addr); if (len <= rx_ring->rx_copybreak) { - skb = ena_alloc_skb(rx_ring, false); + skb = ena_alloc_skb(rx_ring, NULL); if (unlikely(!skb)) return NULL; @@ -1449,7 +1469,7 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring, dma_unmap_addr(&rx_info->ena_buf, paddr), len, DMA_FROM_DEVICE); - skb_copy_to_linear_data(skb, va, len); + skb_copy_to_linear_data(skb, data_addr, len); dma_sync_single_for_device(rx_ring->dev, dma_unmap_addr(&rx_info->ena_buf, paddr), len, @@ -1463,16 +1483,18 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring, return skb; } - skb = ena_alloc_skb(rx_ring, true); + ena_unmap_rx_buff(rx_ring, rx_info); + + skb = ena_alloc_skb(rx_ring, page_addr); if (unlikely(!skb)) return NULL; - do { - ena_unmap_rx_buff(rx_ring, rx_info); - - skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_info->page, - rx_info->page_offset, len, ENA_PAGE_SIZE); + /* Populate skb's linear part */ + skb_reserve(skb, page_offset); + skb_put(skb, len); + skb->protocol = eth_type_trans(skb, rx_ring->netdev); + do { netif_dbg(rx_ring->adapter, rx_status, rx_ring->netdev, "RX skb updated. len %d. 
data_len %d\n", skb->len, skb->data_len); @@ -1491,6 +1513,12 @@ static struct sk_buff *ena_rx_skb(struct ena_ring *rx_ring, req_id = ena_bufs[buf].req_id; rx_info = &rx_ring->rx_buffer_info[req_id]; + + ena_unmap_rx_buff(rx_ring, rx_info); + + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_info->page, + rx_info->page_offset, len, ENA_PAGE_SIZE); + } while (1); return skb; @@ -1703,14 +1731,12 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, skb_record_rx_queue(skb, rx_ring->qid); - if (rx_ring->ena_bufs[0].len <= rx_ring->rx_copybreak) { - total_len += rx_ring->ena_bufs[0].len; + if (rx_ring->ena_bufs[0].len <= rx_ring->rx_copybreak) rx_copybreak_pkt++; - napi_gro_receive(napi, skb); - } else { - total_len += skb->len; - napi_gro_frags(napi); - } + + total_len += skb->len; + + napi_gro_receive(napi, skb); res_budget--; } while (likely(res_budget)); @@ -1922,9 +1948,6 @@ static int ena_io_poll(struct napi_struct *napi, int budget) tx_ring = ena_napi->tx_ring; rx_ring = ena_napi->rx_ring; - tx_ring->first_interrupt = ena_napi->first_interrupt; - rx_ring->first_interrupt = ena_napi->first_interrupt; - tx_budget = tx_ring->ring_size / ENA_TX_POLL_BUDGET_DIVIDER; if (!test_bit(ENA_FLAG_DEV_UP, &tx_ring->adapter->flags) || @@ -1979,6 +2002,8 @@ static int ena_io_poll(struct napi_struct *napi, int budget) tx_ring->tx_stats.tx_poll++; u64_stats_update_end(&tx_ring->syncp); + tx_ring->tx_stats.last_napi_jiffies = jiffies; + return ret; } @@ -2003,7 +2028,8 @@ static irqreturn_t ena_intr_msix_io(int irq, void *data) { struct ena_napi *ena_napi = data; - ena_napi->first_interrupt = true; + /* Used to check HW health */ + WRITE_ONCE(ena_napi->first_interrupt, true); WRITE_ONCE(ena_napi->interrupts_masked, true); smp_wmb(); /* write interrupts_masked before calling napi */ @@ -3089,14 +3115,11 @@ static netdev_tx_t ena_start_xmit(struct sk_buff *skb, struct net_device *dev) } } - if (netif_xmit_stopped(txq) || !netdev_xmit_more()) { - /* trigger the dma engine. ena_com_write_sq_doorbell() - * has a mb + if (netif_xmit_stopped(txq) || !netdev_xmit_more()) + /* trigger the dma engine. ena_ring_tx_doorbell() + * calls a memory barrier inside it. 
*/ - ena_com_write_sq_doorbell(tx_ring->ena_com_io_sq); - ena_increase_stat(&tx_ring->tx_stats.doorbells, 1, - &tx_ring->syncp); - } + ena_ring_tx_doorbell(tx_ring); return NETDEV_TX_OK; @@ -3346,7 +3369,7 @@ static int ena_set_queues_placement_policy(struct pci_dev *pdev, llq_feature_mask = 1 << ENA_ADMIN_LLQ; if (!(ena_dev->supported_features & llq_feature_mask)) { - dev_err(&pdev->dev, + dev_warn(&pdev->dev, "LLQ is not supported Fallback to host mode policy.\n"); ena_dev->tx_mem_queue_type = ENA_ADMIN_PLACEMENT_POLICY_HOST; return 0; @@ -3657,7 +3680,9 @@ static void ena_fw_reset_device(struct work_struct *work) static int check_for_rx_interrupt_queue(struct ena_adapter *adapter, struct ena_ring *rx_ring) { - if (likely(rx_ring->first_interrupt)) + struct ena_napi *ena_napi = container_of(rx_ring->napi, struct ena_napi, napi); + + if (likely(READ_ONCE(ena_napi->first_interrupt))) return 0; if (ena_com_cq_empty(rx_ring->ena_com_io_cq)) @@ -3681,6 +3706,10 @@ static int check_for_rx_interrupt_queue(struct ena_adapter *adapter, static int check_missing_comp_in_tx_queue(struct ena_adapter *adapter, struct ena_ring *tx_ring) { + struct ena_napi *ena_napi = container_of(tx_ring->napi, struct ena_napi, napi); + unsigned int time_since_last_napi; + unsigned int missing_tx_comp_to; + bool is_tx_comp_time_expired; struct ena_tx_buffer *tx_buf; unsigned long last_jiffies; u32 missed_tx = 0; @@ -3694,8 +3723,10 @@ static int check_missing_comp_in_tx_queue(struct ena_adapter *adapter, /* no pending Tx at this location */ continue; - if (unlikely(!tx_ring->first_interrupt && time_is_before_jiffies(last_jiffies + - 2 * adapter->missing_tx_completion_to))) { + is_tx_comp_time_expired = time_is_before_jiffies(last_jiffies + + 2 * adapter->missing_tx_completion_to); + + if (unlikely(!READ_ONCE(ena_napi->first_interrupt) && is_tx_comp_time_expired)) { /* If after graceful period interrupt is still not * received, we schedule a reset */ @@ -3708,12 +3739,17 @@ static int check_missing_comp_in_tx_queue(struct ena_adapter *adapter, return -EIO; } - if (unlikely(time_is_before_jiffies(last_jiffies + - adapter->missing_tx_completion_to))) { - if (!tx_buf->print_once) + is_tx_comp_time_expired = time_is_before_jiffies(last_jiffies + + adapter->missing_tx_completion_to); + + if (unlikely(is_tx_comp_time_expired)) { + if (!tx_buf->print_once) { + time_since_last_napi = jiffies_to_usecs(jiffies - tx_ring->tx_stats.last_napi_jiffies); + missing_tx_comp_to = jiffies_to_msecs(adapter->missing_tx_completion_to); netif_notice(adapter, tx_err, adapter->netdev, - "Found a Tx that wasn't completed on time, qid %d, index %d.\n", - tx_ring->qid, i); + "Found a Tx that wasn't completed on time, qid %d, index %d. %u usecs have passed since last napi execution. 
Missing Tx timeout value %u msecs\n", + tx_ring->qid, i, time_since_last_napi, missing_tx_comp_to); + } tx_buf->print_once = 1; missed_tx++; @@ -4244,7 +4280,7 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent) adapter->ena_dev = ena_dev; adapter->netdev = netdev; adapter->pdev = pdev; - adapter->msg_enable = netif_msg_init(debug, DEFAULT_MSG_ENABLE); + adapter->msg_enable = DEFAULT_MSG_ENABLE; ena_dev->net_device = netdev; diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index 74af15d62ee1..0c39fc2fa345 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -55,12 +55,6 @@ #define ENA_TX_WAKEUP_THRESH (MAX_SKB_FRAGS + 2) #define ENA_DEFAULT_RX_COPYBREAK (256 - NET_IP_ALIGN) -/* limit the buffer size to 600 bytes to handle MTU changes from very - * small to very large, in which case the number of buffers per packet - * could exceed ENA_PKT_MAX_BUFS - */ -#define ENA_DEFAULT_MIN_RX_BUFF_ALLOC_SIZE 600 - #define ENA_MIN_MTU 128 #define ENA_NAME_MAX_LEN 20 @@ -135,12 +129,12 @@ struct ena_irq { }; struct ena_napi { - struct napi_struct napi ____cacheline_aligned; + u8 first_interrupt ____cacheline_aligned; + u8 interrupts_masked; + struct napi_struct napi; struct ena_ring *tx_ring; struct ena_ring *rx_ring; struct ena_ring *xdp_ring; - bool first_interrupt; - bool interrupts_masked; u32 qid; struct dim dim; }; @@ -212,6 +206,7 @@ struct ena_stats_tx { u64 llq_buffer_copy; u64 missed_tx; u64 unmask_interrupt; + u64 last_napi_jiffies; }; struct ena_stats_rx { @@ -259,6 +254,10 @@ struct ena_ring { struct bpf_prog *xdp_bpf_prog; struct xdp_rxq_info xdp_rxq; spinlock_t xdp_tx_lock; /* synchronize XDP TX/Redirect traffic */ + /* Used for rx queues only to point to the xdp tx ring, to + * which traffic should be redirected from this rx ring. + */ + struct ena_ring *xdp_ring; u16 next_to_use; u16 next_to_clean; @@ -271,7 +270,6 @@ struct ena_ring { /* The maximum header length the device can handle */ u8 tx_max_header_size; - bool first_interrupt; bool disable_meta_caching; u16 no_interrupt_event_cnt; @@ -414,11 +412,6 @@ enum ena_xdp_errors_t { ENA_XDP_NO_ENOUGH_QUEUES, }; -static inline bool ena_xdp_queues_present(struct ena_adapter *adapter) -{ - return adapter->xdp_first_ring != 0; -} - static inline bool ena_xdp_present(struct ena_adapter *adapter) { return !!adapter->xdp_bpf_prog; |
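
The ena_netdev.h hunk above carries the "improve cache alignment" part of
the series: the flags written from hard-IRQ context move to the front of
struct ena_napi on their own cache line, and, as the ena_netdev.c hunks
show, they are accessed through WRITE_ONCE()/READ_ONCE() because the
health watchdog reads them without a lock. A minimal illustration of the
pattern follows (abbreviated struct and a simplified handler, assumed
names; not the complete driver code):

struct ena_napi {
	/* Hot flags on their own cache line: written from hard IRQ,
	 * read locklessly by the NAPI poll and the health watchdog,
	 * so they must not false-share with the fields below.
	 */
	u8 first_interrupt ____cacheline_aligned;
	u8 interrupts_masked;
	struct napi_struct napi;
	/* ... remaining fields as in the hunk above ... */
};

static irqreturn_t intr_msix_io(int irq, void *data)
{
	struct ena_napi *ena_napi = data;

	/* Paired with READ_ONCE() in the watchdog's health checks. */
	WRITE_ONCE(ena_napi->first_interrupt, true);
	WRITE_ONCE(ena_napi->interrupts_masked, true);
	smp_wmb(); /* write the flags before scheduling NAPI */

	napi_schedule_irqoff(&ena_napi->napi);
	return IRQ_HANDLED;
}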