path: root/drivers/net/ethernet/broadcom/bnxt/bnxt.c
Age · Commit message · Author
2024-06-05 · bnxt_en: fix atomic counter for ptp packets · Vadim Fedorenko
atomic_dec_if_positive() returns the decremented value (the old value minus one) regardless of whether the counter was actually updated. The commit referenced in Fixes: changed the condition so that it no longer matches the original code. Restore the original condition to properly maintain the atomic counter. Fixes: 165f87691a89 ("bnxt_en: add timestamping statistics support") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240604091939.785535-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
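The return-value semantics can be shown in userspace. This sketch emulates atomic_dec_if_positive() with C11 atomics. That is an assumption: the kernel has its own implementation, and try_take_ts_slot() and its buggy twin are hypothetical helpers, not driver code. The sketch shows why a `!ret` test misreads the result. A return of 0 means the last slot was taken successfully, and only a negative return means nothing was taken.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Emulates the kernel's atomic_dec_if_positive(): decrement *v only if the
 * result stays >= 0, and return old - 1 whether or not *v was updated. */
int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	for (;;) {
		int dec = old - 1;

		if (dec < 0)
			return dec;	/* no update, but still returns old - 1 */
		if (atomic_compare_exchange_weak(v, &old, dec))
			return dec;	/* updated: return the new value */
		/* CAS failed and reloaded 'old'; try again */
	}
}

/* Hypothetical: reserve one TX timestamp slot. Only a negative result
 * means "no slot available". */
bool try_take_ts_slot(atomic_int *tx_avail)
{
	return dec_if_positive(tx_avail) >= 0;
}

/* Hypothetical buggy variant: treating 0 as failure leaks the last slot,
 * and treating -1 as success claims a slot that does not exist. */
bool try_take_ts_slot_buggy(atomic_int *tx_avail)
{
	return dec_if_positive(tx_avail) != 0;
}
```

Starting from one free slot, the correct helper succeeds once and then fails without touching the counter. The buggy one reports failure while consuming the slot, then reports success with none left.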
2024-06-01 · bnxt_en: add timestamping statistics support · Vadim Fedorenko
The ethtool_ts_stats structure was introduced earlier this year. Now it's time to support this group of counters in more drivers. This patch adds support to bnxt driver. Signed-off-by: Vadim Fedorenko <vadfed@meta.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240530204751.99636-1-vadfed@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07 · net: annotate writes on dev->mtu from ndo_change_mtu() · Eric Dumazet
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
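The annotation pattern can be sketched in userspace. The macros below are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(); the real ones add compile-time checks. struct fake_netdev and both helpers are hypothetical. The writer in an ndo_change_mtu()-style handler and the lockless readers each use a marked access, so the compiler emits exactly one store or load and does not cache or re-read the value.

```c
/* Simplified stand-ins for the kernel macros: a volatile access makes the
 * compiler emit exactly one load or store and not cache or re-read it. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical device with only the field of interest. */
struct fake_netdev {
	unsigned int mtu;
};

/* Sketch of an ndo_change_mtu()-style handler. In the kernel this runs
 * under RTNL, but many readers do not hold RTNL, so the store is
 * annotated to pair with their READ_ONCE(). */
int fake_change_mtu(struct fake_netdev *dev, int new_mtu)
{
	/* ...driver-specific reconfiguration would happen here... */
	WRITE_ONCE(dev->mtu, (unsigned int)new_mtu);
	return 0;
}

/* A lockless reader, like the READ_ONCE(dev->mtu) call sites. */
unsigned int fake_read_mtu(const struct fake_netdev *dev)
{
	return READ_ONCE(dev->mtu);
}
```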
2024-05-03 · bnxt: fix bnxt_get_avail_msix() returning negative values · David Wei
Current net-next/main does not boot for older chipsets (e.g. Stratus). Sample dmesg:
[ 11.368315] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Able to reserve only 0 out of 9 requested RX rings
[ 11.390181] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Unable to reserve tx rings
[ 11.438780] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): 2nd rings reservation failed.
[ 11.487559] bnxt_en 0000:02:00.0 (unnamed net_device) (uninitialized): Not enough rings available.
[ 11.506012] bnxt_en 0000:02:00.0: probe with driver bnxt_en failed with error -12
This is caused by bnxt_get_avail_msix() returning a negative value for these chipsets, which do not use the new resource manager (i.e. !BNXT_NEW_RM). This in turn causes hwr.cp in __bnxt_reserve_rings() to be set to 0. In the current call stack, __bnxt_reserve_rings() is called from bnxt_set_dflt_rings() before bnxt_init_int_mode(). Therefore, bp->total_irqs is always 0, and for !BNXT_NEW_RM bnxt_get_avail_msix() always returns a negative number. Historically, MSIX vectors were requested by the RoCE driver during run-time and bnxt_get_avail_msix() was used for this purpose. Today, RoCE MSIX vectors are statically allocated. bnxt_get_avail_msix() should only be called for the BNXT_NEW_RM() case to reserve the MSIX ahead of time for RoCE use. bnxt_get_avail_msix() is also simplified to handle the BNXT_NEW_RM() case only. Fixes: d630624ebd70 ("bnxt_en: Utilize ulp client resources if RoCE is not registered") Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240502203757.3761827-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
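Here is a hypothetical, heavily simplified model of the arithmetic. The struct and both functions are illustrative only and are not the driver's code. The first function reproduces the failure mode described above: total_irqs is still 0 when rings are first reserved, so the subtraction goes negative. The second follows the shape of the fix. It returns 0 outside the new-resource-manager case and never returns a negative count.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the inputs involved. */
struct msix_state {
	bool new_rm;        /* stands in for BNXT_NEW_RM(bp) */
	int  total_irqs;    /* 0 until interrupt mode is initialized */
	int  max_func_irqs; /* vectors the function may use */
	int  cp_nr_rings;   /* completion rings already needed by L2 */
};

/* Pre-fix shape: subtracting from total_irqs, which is still 0 when
 * rings are first reserved, yields a negative "available" count. */
int avail_msix_old(const struct msix_state *s)
{
	return s->total_irqs - s->cp_nr_rings;
}

/* Post-fix shape: only meaningful with the new resource manager, and
 * never negative: hand out at most 'want' spare vectors, else 0. */
int avail_msix_new(const struct msix_state *s, int want)
{
	int spare;

	if (!s->new_rm)
		return 0;
	spare = s->max_func_irqs - s->cp_nr_rings;
	if (spare <= 0)
		return 0;
	return spare < want ? spare : want;
}
```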
2024-05-02 · bnxt_en: Add VF PCI ID for 5760X (P7) chips · Ajit Khaparde
No driver logic changes are required to support the VFs, so just add the VF PCI ID. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02 · bnxt_en: Optimize recovery path ULP locking in the driver · Kalesh AP
In the error recovery path (AER, firmware recovery, etc), the driver notifies the RoCE driver via ULP_STOP before the reset and via ULP_START after the reset, all under RTNL_LOCK. The RoCE driver can take a long time if there are a lot of QPs to destroy, so it is not ideal to hold the global RTNL lock. Rely on the new en_dev_lock mutex instead for ULP_STOP and ULP_START. For the most part, we move the ULP_STOP call before we take the RTNL lock and move the ULP_START after RTNL unlock. Note that SRIOV re-enablement must be done after ULP_START or RoCE on the VFs will not resume properly after reset. The one scenario in bnxt_hwrm_if_change() where the RTNL lock is already taken in the .ndo_open() context requires the ULP restart to be deferred to the bnxt_sp_task() workqueue. Reviewed-by: Selvin Thyparampil Xavier <selvin.xavier@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02 · bnxt_en: Don't call ULP_STOP/ULP_START during L2 reset · Michael Chan
There is no need to call ULP_STOP and ULP_START before and after the L2 reset in bnxt_reset_task(). This L2 reset is done after detecting TX timeout, RX ring errors, or VF config changes. The L2 reset does not affect RoCE since the firmware is not reset and the backing store is left alone. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-02 · bnxt_en: share NQ ring sw_stats memory with subrings · Edwin Peer
On P5_PLUS chips and later, the NQ rings have subrings for RX and TX completions respectively. These subrings are passed to the poll function instead of the base NQ, but each ring carries its own copy of the software ring statistics. For stats to be conveniently accessible in __bnxt_poll_work(), the statistics memory should either be shared between the NQ and its subrings or the subrings need to be included in the ethtool stats aggregation logic. This patch opts for the former, because it's more efficient and less confusing having the software statistics for a ring exist in a single place. Before this patch, the counter will not be displayed if the "wrong" cpr->sw_stats was used to increment a counter. Link: https://lore.kernel.org/netdev/CACKFLikEhVAJA+osD7UjQNotdGte+fth7zOy7yDdLkTyFk9Pyw@mail.gmail.com/ Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240501003056.100607-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
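Here is a minimal sketch of the design choice, using hypothetical types. Each ring carries a pointer to its software statistics, and the NQ points its base ring and both subrings at one shared instance. An increment made through whichever subring the poll function was handed is then visible when stats are read from the base ring. This avoids the "wrong cpr->sw_stats" problem described above.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down software ring counters. */
struct sw_stats {
	uint64_t rx_oom_discards;
};

/* A completion ring refers to its counters through a pointer. */
struct cp_ring {
	struct sw_stats *sw_stats;
};

/* Hypothetical NQ with an RX and a TX completion subring. */
struct nq {
	struct cp_ring base;
	struct cp_ring sub[2];
	struct sw_stats stats;	/* the single shared copy */
};

void nq_init(struct nq *nq)
{
	nq->stats = (struct sw_stats){ 0 };
	nq->base.sw_stats = &nq->stats;
	for (int i = 0; i < 2; i++)
		nq->sub[i].sw_stats = &nq->stats;	/* share, don't copy */
}

/* The poll path only sees the subring it was handed. */
void poll_subring_oom(struct cp_ring *cpr)
{
	cpr->sw_stats->rx_oom_discards++;
}

/* Reporting reads through the base NQ ring. */
uint64_t report_oom_discards(const struct nq *nq)
{
	return nq->base.sw_stats->rx_oom_discards;
}
```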
2024-04-25 · Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · Jakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24 · eth: bnxt: fix counting packets discarded due to OOM and netpoll · Jakub Kicinski
I added OOM and netpoll discard counters, naively assuming that the cpr pointer is pointing to a common completion ring. Turns out that is usually *a* completion ring but not *the* completion ring which bnapi->cp_ring points to. bnapi->cp_ring is where the stats are read from, so we end up reporting 0 thru ethtool -S and qstat even though the drop events have happened. Make 100% sure we're recording statistics in the correct structure. Fixes: 907fd4a294db ("bnxt: count discards due to memory allocation errors") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240424002148.3937059-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-22 · bnxt_en: Fix error recovery for 5760X (P7) chips · Michael Chan
During error recovery, such as AER fatal error slot reset, we call bnxt_try_map_fw_health_reg() to try to get access to the health register to determine the firmware state. Fix bnxt_try_map_fw_health_reg() to recognize the P7 chip correctly and set up the health register. This fixes this type of AER slot reset failure: bnxt_en 0000:04:00.0: AER: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Inaccessible, (Unregistered Agent ID) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI I/O error detected bnxt_en 0000:04:00.0 bnxt_re0: Handle device suspend call bnxt_en 0000:04:00.1 enp4s0f1np1: PCI I/O error detected bnxt_en 0000:04:00.1 bnxt_re1: Handle device suspend call pcieport 0000:00:02.0: AER: Root Port link has been reset (0) bnxt_en 0000:04:00.0 enp4s0f0np0: PCI Slot Reset bnxt_en 0000:04:00.0: enabling device (0000 -> 0002) bnxt_en 0000:04:00.0: Firmware not ready bnxt_en 0000:04:00.1 enp4s0f1np1: PCI Slot Reset bnxt_en 0000:04:00.1: enabling device (0000 -> 0002) bnxt_en 0000:04:00.1: Firmware not ready pcieport 0000:00:02.0: AER: device recovery failed Fixes: a432a45bdba4 ("bnxt_en: Define basic P7 macros") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-22 · bnxt_en: Fix the PCI-AER routines · Vikas Gupta
We do not support two simultaneous recoveries, so check for the reset flag, BNXT_STATE_IN_FW_RESET, and do not proceed with AER further. When the PCI channel state is pci_channel_io_frozen, the PCIe link cannot be trusted, so we disable the traffic immediately and stop BAR access by calling bnxt_fw_fatal_close(). BAR access after an AER fatal error can cause an NMI. Fixes: f75d9a0aa967 ("bnxt_en: Re-write PCI BARs after PCI fatal error.") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
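The control flow described above can be sketched as a small decision function. The enum values mirror, but are not, the kernel's pci_channel_state constants. The action names and the handling of permanent failure are assumptions for illustration. Only the in-reset check and the frozen-state fatal close come from this commit message.

```c
#include <stdbool.h>

/* Local mirrors of the kernel's pci_channel_state values. */
enum chan_state { CH_IO_NORMAL, CH_IO_FROZEN, CH_IO_PERM_FAILURE };

/* Hypothetical actions an .error_detected() handler could choose. */
enum aer_action {
	AER_ABORT,        /* recovery already running: do not start another */
	AER_DISCONNECT,   /* device is gone for good */
	AER_FATAL_CLOSE,  /* link untrusted: stop traffic, no BAR access */
	AER_NORMAL_CLOSE, /* link usable: regular close before slot reset */
};

enum aer_action aer_decide(enum chan_state st, bool in_fw_reset)
{
	/* Two simultaneous recoveries are not supported. */
	if (in_fw_reset)
		return AER_ABORT;
	if (st == CH_IO_PERM_FAILURE)
		return AER_DISCONNECT;
	/* BAR access after a fatal AER error can cause an NMI. */
	if (st == CH_IO_FROZEN)
		return AER_FATAL_CLOSE;
	return AER_NORMAL_CLOSE;
}
```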
2024-04-22 · bnxt_en: refactor reset close code · Vikas Gupta
Introduce bnxt_fw_fatal_close() API which can be used to stop data path and disable device when firmware is in fatal state. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-11 · Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · Jakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/unix/garbage.c 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.c faa12ca24558 ("bnxt_en: Reset PTP tx_avail after possible firmware reset") b3d0083caf9a ("bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh()") drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c 7ac10c7d728d ("bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()") 194fad5b2781 ("bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions") drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 958f56e48385 ("net/mlx5e: Un-expose functions in en.h") 49e6c9387051 ("net/mlx5e: RSS, Block XOR hash with over 128 channels") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 · bnxt_en: Update MODULE_DESCRIPTION · Michael Chan
Update MODULE_DESCRIPTION to the more generic adapter family name. The old name only includes the first generation of supported adapters. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 · bnxt_en: Utilize ulp client resources if RoCE is not registered · Vikas Gupta
If the RoCE driver is not registered for a RoCE capable device, add flexibility to use the RoCE resources (MSIX/NQs) for L2 purposes, such as additional rings configured by the user or for XDP. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 · bnxt_en: Change MSIX/NQs allocation policy · Vikas Gupta
The existing scheme sets aside a number of MSIX/NQs for the RoCE driver whether the RoCE driver is registered or not. This scheme is not flexible and limits the resources available for the L2 rings if RoCE is never used. Modify the scheme so that the RoCE MSIX/NQs can be used by the L2 driver if they are not used for RoCE. The MSIX/NQs are now represented by 3 fields. bp->ulp_num_msix_want contains the desired default value, edev->ulp_num_msix_vec contains the available value (but not necessarily in use), and ulp_tbl->msix_requested contains the actual value in use by RoCE. The L2 driver can dip into edev->ulp_num_msix_vec if necessary. We need to add rtnl_lock() back in bnxt_register_dev() and bnxt_unregister_dev() to synchronize the MSIX usage between L2 and RoCE. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 · bnxt_en: Refactor bnxt_rdma_aux_device_init/uninit functions · Vikas Gupta
In its current form, bnxt_rdma_aux_device_init() not only initializes the necessary data structures of the newly created aux device but also adds the aux device to the aux bus subsystem. Refactor the logic into separate functions: the first initializes the aux device along with the required resources, and the second actually adds the device to the aux bus subsystem. This separation helps to create bnxt_en_dev much earlier and save its resources separately. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-10 · bnxt_en: Remove unneeded MSIX base structure fields and code · Vikas Gupta
Ever since commit 303432211324 ("bnxt_en: Remove runtime interrupt vector allocation"), the MSIX base vector is effectively always 0. Remove all unneeded structure fields and code referencing the MSIX base. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240409215431.41424-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-08 · bnxt_en: Reset PTP tx_avail after possible firmware reset · Pavan Chebbi
It is possible that during error recovery and firmware reset, there is a pending TX PTP packet waiting for the timestamp. We need to reset this condition so that after recovery, the tx_avail count for PTP is reset back to the initial value. Otherwise, we may not accept any PTP TX timestamps after recovery. Fixes: 118612d519d8 ("bnxt_en: Add PTP clock APIs, ioctls, and ethtool methods") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-04 · bnxt_en: Add warning message about disallowed speed change · Sreekanth Reddy
Some chips may not allow changing the default speed when dual-rate transceiver modules are used. Firmware on those chips indicates this to the driver. Add a warning message when a speed change is not supported because the NIC has detected a dual-rate transceiver. Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-8-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04 · bnxt_en: Add XDP Metadata support · Somnath Kotur
- Change the last arg (meta_valid) of xdp_prepare_buff() from false to true.
- Ensure that when XDP_PASS is returned, the xdp->data_meta area is copied to the skb->data area. Account for the metadata size on skb allocation and do a pull afterwards to move it to the "reserved" zone.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-6-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
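The buffer handling in the second bullet can be modeled in userspace. In this hypothetical sketch, struct xdp_view and struct mini_skb are stand-ins, not kernel types. Metadata and payload are copied together, and the data offset is then advanced past the metadata, as a pull would do. The metadata therefore ends up directly in front of the packet data, which is where the kernel's skb_metadata_set() expects it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of an XDP buffer: [headroom][meta][payload]. */
struct xdp_view {
	const unsigned char *data_meta;	/* start of metadata */
	const unsigned char *data;	/* start of packet payload */
	const unsigned char *data_end;	/* one past the payload */
};

/* Hypothetical skb-like destination buffer. */
struct mini_skb {
	unsigned char buf[256];
	size_t data_off;	/* packet data starts at buf + data_off */
	size_t len;		/* payload length */
	size_t meta_len;	/* metadata occupies the meta_len bytes before data_off */
};

/* Copy metadata and payload together, then advance the data offset past
 * the metadata (the effect of a pull) so the metadata lands in the
 * reserved area directly in front of the packet data. */
int copy_with_meta(struct mini_skb *skb, const struct xdp_view *x,
		   size_t headroom)
{
	size_t meta = (size_t)(x->data - x->data_meta);
	size_t len = (size_t)(x->data_end - x->data);

	/* the allocation must account for the metadata size */
	if (headroom + meta + len > sizeof(skb->buf))
		return -1;
	memcpy(skb->buf + headroom, x->data_meta, meta + len);
	skb->data_off = headroom + meta;
	skb->len = len;
	skb->meta_len = meta;
	return 0;
}
```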
2024-04-04 · bnxt_en: Change bnxt_rx_xdp function prototype · Somnath Kotur
Change bnxt_rx_xdp() to take a pointer to xdp instead of a stack variable. This is in preparation for the XDP metadata patch, where the BPF program can change the value of xdp.data_meta. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-5-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04 · bnxt_en: Allocate page pool per numa node · Somnath Kotur
The driver's page pool allocation code looks at the node local to the PCIe device to determine where to allocate memory. In scenarios where the core count per NUMA node is low (< default rings), it makes sense to exhaust page pool allocations on Node 0 first and then move on to allocating page pools for the remaining rings from Node 1. With this patch, the following configuration on the NIC: $ ethtool -L ens1f0np0 combined 16 (core count/node = 12, first 12 rings on node#0, last 4 rings on node#1), and traffic redirected to a ring on node#1, we see a performance improvement of ~20%. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-4-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
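Here is a simplified model of the placement for the configuration quoted above (16 combined rings, 12 cores per node). It assumes, hypothetically, that each ring's CPU is chosen in a cpumask_local_spread()-like order: the device-local node first, then the remaining nodes. It also assumes the ring's page pool is allocated on that CPU's node. Finally, it assumes equal-sized nodes and a simple numeric order over remote nodes, which real topologies need not follow.

```c
/* Hypothetical model: ring i is served by the i-th CPU in "local node
 * first" order, wrapping over all CPUs; every node has cores_per_node
 * CPUs and remote nodes are visited in numeric order after the local one.
 * Returns the NUMA node on which the ring's page pool would be allocated. */
int ring_page_pool_node(int ring, int local_node, int cores_per_node,
			int nr_nodes)
{
	int idx = ring % (cores_per_node * nr_nodes); /* wrap over all CPUs */
	int hop = idx / cores_per_node;               /* 0 = local node */

	return (local_node + hop) % nr_nodes;
}
```

With local_node = 0, cores_per_node = 12 and nr_nodes = 2, rings 0-11 map to node 0 and rings 12-15 to node 1. This matches the split described in the commit message.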
2024-04-04 · bnxt_en: Enable XPS by default on driver load · Somnath Kotur
Enable XPS by default during NIC open. The choice of Tx queue is based on the CPU executing the thread that submits the Tx request. The pool of Tx queues will be spread evenly across both device-attached (local) NUMA nodes and remote NUMA nodes. Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-3-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04 · bnxt_en: Add delay to handle Downstream Port Containment (DPC) AER · Vikas Gupta
In case of DPC, after issuing the hot reset, the kernel waits 100ms for the device to complete the reset. However, on some older chips the firmware may take up to 1 second to complete the reset, and only after that can the driver restart the card. Introduce a delay of 900ms to handle this scenario on the older chipsets. Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240402093753.331120-2-pavan.chebbi@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-29 · netlink: introduce type-checking attribute iteration · Johannes Berg
There are, especially with multi-attr arrays, many cases of needing to iterate all attributes of a specific type in a netlink message or a nested attribute. Add specific macros to support that case.

Also convert many instances using this spatch:

@@
iterator nla_for_each_attr;
iterator name nla_for_each_attr_type;
identifier nla;
expression head, len, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_attr(nla, head, len, rem)
+nla_for_each_attr_type(nla, ATTR, head, len, rem)
 {
 <... T x; ...>
-if (nla_type(nla) == ATTR) {
 ...
-}
 }

@@
identifier nla;
iterator nla_for_each_nested;
iterator name nla_for_each_nested_type;
expression attr, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_nested(nla, attr, rem)
+nla_for_each_nested_type(nla, ATTR, attr, rem)
 {
 <... T x; ...>
-if (nla_type(nla) == ATTR) {
 ...
-}
 }

@@
iterator nla_for_each_attr;
iterator name nla_for_each_attr_type;
identifier nla;
expression head, len, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_attr(nla, head, len, rem)
+nla_for_each_attr_type(nla, ATTR, head, len, rem)
 {
 <... T x; ...>
-if (nla_type(nla) != ATTR) continue;
 ...
 }

@@
identifier nla;
iterator nla_for_each_nested;
iterator name nla_for_each_nested_type;
expression attr, rem;
expression ATTR;
type T;
identifier x;
@@
-nla_for_each_nested(nla, attr, rem)
+nla_for_each_nested_type(nla, ATTR, attr, rem)
 {
 <... T x; ...>
-if (nla_type(nla) != ATTR) continue;
 ...
 }

Although I had to undo one bad change this made, and I also adjusted some other code for whitespace and to use direct variable initialization now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20240328203144.b5a6c895fb80.I1869b44767379f204998ff44dd239803f39c23e0@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
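The new iterators can be illustrated outside the kernel. This sketch re-creates a tiny netlink-like TLV format. struct attr_hdr, ATTR_ALIGN and the helpers are hypothetical stand-ins for struct nlattr, NLA_ALIGN, nla_ok() and nla_next(). The type-filtered loop is built the way the conversion above implies: a plain attribute loop whose body is guarded by a type check. Headers are read with memcpy() so the sketch does not depend on buffer alignment. Because the filtering macro ends in a bare if, an else written after the loop body would bind to that if. Keep the body a braced block and avoid a trailing else.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct nlattr: total length (header included)
 * and type; each attribute is padded to a 4-byte boundary. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n)	(((n) + 3u) & ~3u)
#define ATTR_HDRLEN	ATTR_ALIGN(sizeof(struct attr_hdr))

/* Read a header with memcpy so no aligned or typed access is assumed. */
static struct attr_hdr attr_hdr_at(const unsigned char *p)
{
	struct attr_hdr h;

	memcpy(&h, p, sizeof(h));
	return h;
}

/* Like nla_ok(): room for a header and for the whole attribute. */
static int attr_ok(const unsigned char *p, int rem)
{
	struct attr_hdr h;

	if (rem < (int)sizeof(h))
		return 0;
	h = attr_hdr_at(p);
	return h.len >= sizeof(h) && h.len <= rem;
}

/* Like nla_next(): step over the aligned attribute, shrinking rem. */
static const unsigned char *attr_next(const unsigned char *p, int *rem)
{
	int total = (int)ATTR_ALIGN(attr_hdr_at(p).len);

	*rem -= total;
	return p + total;
}

static uint16_t attr_type(const unsigned char *p)
{
	return attr_hdr_at(p).type;
}

/* Plain iteration over every attribute in [head, head + len). */
#define attr_for_each(pos, head, len, rem)				\
	for ((pos) = (head), (rem) = (len); attr_ok((pos), (rem));	\
	     (pos) = attr_next((pos), &(rem)))

/* Type-filtered iteration: the body only runs for matching types. */
#define attr_for_each_type(pos, t, head, len, rem)			\
	attr_for_each(pos, head, len, rem)				\
		if (attr_type(pos) == (t))

/* Append a u32 attribute at *off; returns -1 if it does not fit. */
int put_u32(unsigned char *buf, size_t cap, size_t *off, uint16_t type,
	    uint32_t v)
{
	struct attr_hdr h = { (uint16_t)(ATTR_HDRLEN + sizeof(v)), type };

	if (*off + ATTR_ALIGN(h.len) > cap)
		return -1;
	memcpy(buf + *off, &h, sizeof(h));
	memcpy(buf + *off + ATTR_HDRLEN, &v, sizeof(v));
	*off += ATTR_ALIGN(h.len);
	return 0;
}

/* Sum the u32 payloads of all attributes of one type. */
uint32_t sum_type(const unsigned char *buf, int len, uint16_t type)
{
	const unsigned char *pos;
	int rem;
	uint32_t v, sum = 0;

	attr_for_each_type(pos, type, buf, len, rem) {
		memcpy(&v, pos + ATTR_HDRLEN, sizeof(v));
		sum += v;
	}
	return sum;
}
```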
2024-03-28 · bnxt_en: Support adding ntuple rules on RSS contexts · Pavan Chebbi
When the user wants to add an ntuple filter to an RSS context, select the appropriate VNIC belonging to the selected RSS context and add the VNIC destination rule. Make the necessary changes to bnxt_add_ntuple_cls_rule(). Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-13-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 · bnxt_en: Refactor bnxt_cfg_rfs_ring_tbl_idx() · Pavan Chebbi
Refactor bnxt_cfg_rfs_ring_tbl_idx() to pass in the filter structure pointer instead of the RX ring number. This will allow an ntuple filter to be set up for the non-default RSS contexts in the next patch. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-12-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 · bnxt_en: Support RSS contexts in ethtool .{get|set}_rxfh() · Pavan Chebbi
Support up to 32 RSS contexts per device if supported by the device. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-11-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 · bnxt_en: Add a new_rss_ctx parameter to bnxt_rfs_capable() · Pavan Chebbi
Modify bnxt_rfs_capable() to check that there are enough resources to support aRFS/ntuple filters for a new RSS context requested by the user. Existing use cases in the driver will always set the new parameter to false. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-9-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 · bnxt_en: Simplify bnxt_rfs_capable() · Michael Chan
bnxt_rfs_capable() determines the number of VNICs and RSS_CTXs required to support aRFS and then reserves the resources. We already have functions bnxt_get_total_vnics() and bnxt_get_total_rss_ctxs() to do that. Simplify the code by calling these functions. It is also more correct to do the resource reservation after bnxt_can_reserve_rings() returns true. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-8-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 · bnxt_en: Refactor RSS indir alloc/set functions · Pavan Chebbi
We will need to dynamically allocate and change indirection tables for additional RSS contexts. Add the rss_ctx pointer parameter to bnxt_alloc_rss_indir_tbl() and bnxt_set_dflt_rss_indir_tbl(). Existing usage will always pass rss_ctx as NULL which means the default RSS context. When supporting additional RSS contexts in subsequent patches, we'll pass the valid rss_ctx to these 2 functions. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-7-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
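Here is a sketch of the NULL-means-default convention described above. The names are hypothetical, and a fixed table size of 8 is used only for illustration. The fill rule is the round-robin formula of the kernel's ethtool_rxfh_indir_default(): slot i goes to ring i % n. Whether the driver applies exactly that rule to every additional context is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

#define RSS_TBL_SIZE 8	/* hypothetical; real sizes are chip-dependent */

/* Hypothetical per-context state for an additional RSS context. */
struct rss_ctx_sketch {
	uint16_t indir_tbl[RSS_TBL_SIZE];
};

/* Table of the default RSS context. */
static uint16_t dflt_indir_tbl[RSS_TBL_SIZE];

/* NULL selects the default context, otherwise the given context's own
 * table is filled. Slot i maps to ring i % nr_rings (nr_rings > 0). */
void set_dflt_rss_indir_tbl(struct rss_ctx_sketch *ctx, uint32_t nr_rings)
{
	uint16_t *tbl = ctx ? ctx->indir_tbl : dflt_indir_tbl;

	for (size_t i = 0; i < RSS_TBL_SIZE; i++)
		tbl[i] = (uint16_t)(i % nr_rings);
}
```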
2024-03-28 · bnxt_en: Introduce rss ctx structure, alloc/free functions · Pavan Chebbi
Add struct bnxt_rss_ctx, related storage lists, required defines, and its alloc/free functions. Later patches will use them in order to support multiple RSS contexts. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-6-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 · bnxt_en: Refactor VNIC alloc and cfg functions · Pavan Chebbi
The current VNIC structures are stored in an array bp->vnic_info[]. The index of the array (vnic_id) is passed to all the functions that need to reference the VNIC. This patch changes the scheme to pass the VNIC pointer instead of the vnic index. Subsequent patches will create additional VNICs that will not be stored in the bp->vnic_info[] array. Using the VNIC pointer will work for all the VNICs. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 · bnxt_en: Add helper function bnxt_hwrm_vnic_rss_cfg_p5() · Pavan Chebbi
This is a pure refactoring patch. The new function bnxt_hwrm_vnic_set_rss_p5() will set up the P5_PLUS specific RSS ring table and then call bnxt_hwrm_vnic_cfg() to setup the vnic for proper RSS operations. This new function will be used later for additional RSS contexts. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240325222902.220712-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07 · eth: bnxt: support per-queue statistics · Jakub Kicinski
Support per-queue statistics API in bnxt.

$ ethtool -S eth0
NIC statistics:
     [0]: rx_ucast_packets: 1418
     [0]: rx_mcast_packets: 178
     [0]: rx_bcast_packets: 0
     [0]: rx_discards: 0
     [0]: rx_errors: 0
     [0]: rx_ucast_bytes: 1141815
     [0]: rx_mcast_bytes: 16766
     [0]: rx_bcast_bytes: 0
     [0]: tx_ucast_packets: 1734
...

$ ./cli.py --spec netlink/specs/netdev.yaml \
   --dump qstats-get --json '{"scope": "queue"}'
[{'ifindex': 2,
  'queue-id': 0,
  'queue-type': 'rx',
  'rx-alloc-fail': 0,
  'rx-bytes': 1164931,
  'rx-packets': 1641},
...
 {'ifindex': 2,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 631494,
  'tx-packets': 1771},
...

Reset the per queue counters:
$ ethtool -L eth0 combined 4

Inspect again:

$ ./cli.py --spec netlink/specs/netdev.yaml \
   --dump qstats-get --json '{"scope": "queue"}'
[{'ifindex': 2,
  'queue-id': 0,
  'queue-type': 'rx',
  'rx-alloc-fail': 0,
  'rx-bytes': 32397,
  'rx-packets': 145},
...
 {'ifindex': 2,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 37481,
  'tx-packets': 196},
...

$ ethtool -S eth0 | head
NIC statistics:
     [0]: rx_ucast_packets: 174
     [0]: rx_mcast_packets: 3
     [0]: rx_bcast_packets: 0
     [0]: rx_discards: 0
     [0]: rx_errors: 0
     [0]: rx_ucast_bytes: 37151
     [0]: rx_mcast_bytes: 267
     [0]: rx_bcast_bytes: 0
     [0]: tx_ucast_packets: 267
...

Totals are still correct:

$ ./cli.py --spec netlink/specs/netdev.yaml --dump qstats-get
[{'ifindex': 2,
  'rx-alloc-fail': 0,
  'rx-bytes': 281949995,
  'rx-packets': 216524,
  'tx-bytes': 52694905,
  'tx-packets': 75546}]

$ ip -s link show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 14:23:f2:61:05:40 brd ff:ff:ff:ff:ff:ff
    RX:  bytes packets errors dropped  missed   mcast
     282519546  218100      0       0       0     516
    TX:  bytes packets errors dropped carrier collsns
      53323054   77674      0       0       0       0

Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27 · bnxt_en: fix accessing vnic_info before allocating it · Alexander Lobakin
bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration block, but allocates it much later. As a result, the following crash happens on my setup:

BUG: kernel NULL pointer dereference, address: 0000000000000090
fbcon: Taking over console
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 12f382067 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ #49
Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023
RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en]
Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 0c 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 0d 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74
RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000
RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540
FS:  000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000090 CR3: 0000000109efc002 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die_body+0x68/0xb0
 ? page_fault_oops+0x3a6/0x400
 ? exc_page_fault+0x7a/0x1b0
 ? asm_exc_page_fault+0x26/0x30
 ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en]
 ? bnxt_alloc_mem+0x1389/0x1910 [bnxt_en]
 __bnxt_open_nic+0x198/0xa50 [bnxt_en]
 ? bnxt_hwrm_if_change+0x287/0x3d0 [bnxt_en]
 bnxt_open+0xeb/0x1b0 [bnxt_en]
 __dev_open+0x12e/0x1f0
 __dev_change_flags+0xb0/0x200
 dev_change_flags+0x25/0x60
 do_setlink+0x463/0x1260
 ? sock_def_readable+0x14/0xc0
 ? rtnl_getlink+0x4b9/0x590
 ? __nla_validate_parse+0x91/0xfa0
 rtnl_newlink+0xbac/0xe40
 <...>

Don't create a variable; dereference the first array member directly, since it's used only once in the code. Fixes: ef4ee64e9990 ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index") Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240226144911.1297336-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
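The failure mode above (a pointer into an array taken in the declaration block, before the array itself is allocated) can be illustrated with a small user-space sketch. Every name here (`struct demo_bp`, `demo_alloc_mem()`, `DEMO_VNIC_DEFAULT`, the flag value) is hypothetical and stands in for the driver's structures; this is not the bnxt code, only the "allocate first, then index directly" shape the fix describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures; not the real bnxt layout. */
#define DEMO_VNIC_DEFAULT  0
#define DEMO_VNIC_RSS_FLAG 0x1u

struct demo_vnic {
	unsigned int flags;
};

struct demo_bp {
	struct demo_vnic *vnic_info; /* NULL until demo_alloc_vnics() runs */
	int nr_vnics;
};

static int demo_alloc_vnics(struct demo_bp *bp)
{
	bp->vnic_info = calloc((size_t)bp->nr_vnics, sizeof(*bp->vnic_info));
	return bp->vnic_info ? 0 : -1;
}

/*
 * Buggy shape (not used here): taking &bp->vnic_info[DEMO_VNIC_DEFAULT] in
 * the declaration block yields a pointer derived from NULL, and a later
 * write through it faults, as in the oops above.
 *
 * Fixed shape: allocate first, then index the array directly at the single
 * place it is needed.
 */
static int demo_alloc_mem(struct demo_bp *bp)
{
	assert(bp->nr_vnics > DEMO_VNIC_DEFAULT);

	if (demo_alloc_vnics(bp))
		return -1;

	bp->vnic_info[DEMO_VNIC_DEFAULT].flags |= DEMO_VNIC_RSS_FLAG;
	return 0;
}
```

Dropping the local variable removes the temptation to compute the element address early; the single use site reads the array only after the allocation has succeeded.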
2024-02-22 | bnxt_en: Use the new VNIC to create ntuple filters | Pavan Chebbi
The newly created vnic (BNXT_VNIC_NTUPLE) is ready to be used to create ntuple filters when supported by firmware. All RX rings can be used regardless of the RSS indirection setting on the default VNIC. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 | bnxt_en: Create and setup the additional VNIC for adding ntuple filters | Pavan Chebbi
Allocate and set up the additional VNIC for ntuple filters if this new method is supported by the firmware. Even though this VNIC is only used for ntuple filters with direct ring destinations, we still set up the RSS hash to be identical to the default VNIC so that each RX packet will have the correct hash in the RX completion. This VNIC is always at VNIC index BNXT_VNIC_NTUPLE. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 | bnxt_en: Provision for an additional VNIC for ntuple filters | Pavan Chebbi
On newer chips that support the ring table index method for ntuple filters, the current scheme of using the same VNIC for both RSS and ntuple filters will not work in all cases. An ntuple filter can only be directed to a destination ring if that destination ring is also in the RSS indirection table. To support ntuple filters with any arbitrary RSS indirection table that may only include a subset of the rings, we need to use a separate VNIC for ntuple filters. This patch provisions the additional VNIC. The next patch will allocate the additional VNIC from firmware and set it up. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
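The constraint described above can be sketched as a destination check. This is a hypothetical model, not driver code: `demo_ntuple_dest_ok()` and its parameters are invented for illustration. With a shared VNIC (`shared_vnic == true`), a destination ring must appear in the RSS indirection table; with a dedicated ntuple VNIC, the sketch assumes any valid RX ring is acceptable, matching the later statement that all RX rings can be used regardless of the default VNIC's indirection setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if @ring appears in the RSS indirection table @tbl. */
static bool demo_ring_in_indir_tbl(const unsigned short *tbl, size_t tbl_size,
				   unsigned short ring)
{
	assert(tbl != NULL || tbl_size == 0);

	for (size_t i = 0; i < tbl_size; i++)
		if (tbl[i] == ring)
			return true;
	return false;
}

/*
 * Hypothetical destination check for an ntuple filter.
 * shared_vnic == true models ntuple filters living on the RSS VNIC;
 * shared_vnic == false models a separate ntuple VNIC.
 */
static bool demo_ntuple_dest_ok(bool shared_vnic, const unsigned short *tbl,
				size_t tbl_size, unsigned short ring,
				unsigned short rx_nr_rings)
{
	if (ring >= rx_nr_rings)
		return false; /* not a valid RX ring at all */
	if (shared_vnic)
		return demo_ring_in_indir_tbl(tbl, tbl_size, ring);
	return true; /* dedicated VNIC: independent of the RSS table */
}
```

For example, with four RX rings and an indirection table that spreads traffic only over rings 0 and 1, a filter targeting ring 3 is rejected in the shared-VNIC model but accepted with a separate VNIC.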
2024-02-22 | bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index | Pavan Chebbi
Replace the hard-coded 0 index with the more meaningful BNXT_VNIC_DEFAULT. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 | bnxt_en: Refactor bnxt_set_features() | Pavan Chebbi
Refactor the bnxt_set_features() function so that its re-initialization logic lives in a common function. We'll need this to reinitialize when the ntuple configuration changes. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 | bnxt_en: Add bnxt_get_total_vnics() to calculate number of VNICs | Venkat Duvvuru
Refactor the code by adding a new function to calculate the number of required VNICs. This is used in multiple places when reserving or checking resources. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 | bnxt_en: Check additional resources in bnxt_check_rings() | Michael Chan
bnxt_check_rings() is called to check if we have enough resource assets to satisfy the new number of ethtool channels. If the asset test fails, the ethtool operation will fail gracefully. Otherwise we will proceed and commit to using the new number of channels. If it fails to allocate any resources, the chip will fail to come up. For completeness, check all possible resources before committing to the new settings. Add the missing ring group and RSS context asset tests in bnxt_check_rings(). Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
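The check-before-commit pattern described above can be sketched as follows. The structure and function names (`struct demo_resc`, `demo_check_rings()`, `demo_set_channels()`) are hypothetical and the fields are illustrative; the point is only that every resource, including the ring group and RSS context counts, is tested before the new configuration is committed, so a shortfall fails the request instead of leaving the device unable to come up.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical resource counts; field names are illustrative only. */
struct demo_resc {
	int tx_rings;
	int rx_rings;
	int ring_grps;
	int rss_ctxs;
};

/* Verify every requested resource fits before anything is changed. */
static int demo_check_rings(const struct demo_resc *avail,
			    const struct demo_resc *req)
{
	assert(avail && req);

	if (req->tx_rings > avail->tx_rings || req->rx_rings > avail->rx_rings)
		return -ENOSPC;
	if (req->ring_grps > avail->ring_grps) /* ring group test */
		return -ENOSPC;
	if (req->rss_ctxs > avail->rss_ctxs)   /* RSS context test */
		return -ENOSPC;
	return 0;
}

/* Commit the new channel configuration only after the full check passes. */
static int demo_set_channels(struct demo_resc *cur,
			     const struct demo_resc *avail,
			     const struct demo_resc *req)
{
	int rc = demo_check_rings(avail, req);

	if (rc)
		return rc; /* fail gracefully; *cur is left untouched */
	*cur = *req;
	return 0;
}
```

Because the commit step is reached only after all tests pass, a request that exceeds any single resource (here, the RSS context count) is rejected with the old configuration intact.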
2024-02-22 | bnxt_en: Improve RSS context reservation infrastructure | Pavan Chebbi
Add RSS context fields to struct bnxt_hw_rings and struct bnxt_hw_resc. With these, we can now specify the exact number of RSS contexts to reserve and store the reserved value. The original code relies on other resources to infer the number of RSS contexts to reserve, and the reserved value is not stored. This improved infrastructure will make the RSS context accounting more complete and is needed by later patches. Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 | bnxt_en: Explicitly specify P5 completion rings to reserve | Michael Chan
The current code assumes that every RX ring group and every TX ring requires a completion ring on P5_PLUS chips. Now that we have the bnxt_hw_rings structure, add the cp_p5 field so that it can be explicitly specified. This makes the logic clearer. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22 | bnxt_en: Refactor ring reservation functions | Michael Chan
The current functions to reserve hardware rings pass in 6 different ring or resource types as parameters. Add a structure bnxt_hw_rings to consolidate all these parameters and pass the structure pointer instead to these functions. Also add 2 related helper functions. This makes the code cleaner and makes it easier to add new resources to be reserved. Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
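The parameter consolidation described above can be sketched in isolation. `struct demo_hw_rings`, its fields, and both helpers are hypothetical: they mirror the idea of passing one structure pointer instead of six counts, not the actual layout of bnxt_hw_rings or the two helpers the commit adds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical consolidation of six ring/resource counts that would
 * otherwise be passed as separate arguments. Field names are illustrative.
 */
struct demo_hw_rings {
	int tx;
	int rx;
	int grp;
	int cp;
	int stat;
	int vnic;
};

struct demo_state {
	struct demo_hw_rings reserved; /* last reserved values */
};

/* Hypothetical helper: does the request ask for anything at all? */
static bool demo_rings_needed(const struct demo_hw_rings *hwr)
{
	assert(hwr);
	return hwr->tx || hwr->rx || hwr->grp || hwr->cp || hwr->stat ||
	       hwr->vnic;
}

/*
 * Before: demo_reserve_rings(st, tx, rx, grp, cp, stat, vnic).
 * After: one pointer, so adding a resource only touches the struct.
 */
static int demo_reserve_rings(struct demo_state *st,
			      const struct demo_hw_rings *hwr)
{
	if (!demo_rings_needed(hwr))
		return 0; /* nothing to reserve */
	/* A driver would send a firmware request here; the sketch records it. */
	st->reserved = *hwr;
	return 0;
}
```

With this shape, adding a new resource type (for instance, an RSS context count) means adding one field rather than changing every function signature in the call chain.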
2024-02-09 | bnxt_en: Add RSS support for IPSEC headers | Ajit Khaparde
IPsec uses two distinct protocols, Authentication Header (AH) and Encapsulating Security Payload (ESP). Add support to configure RSS based on AH and ESP headers. This functionality will be enabled based on the capabilities indicated by the firmware in HWRM_VNIC_QCAPS. Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240205223202.25341-14-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-09 | bnxt_en: Invalidate user filters when needed | Pavan Chebbi
The cached user filters slated to be reapplied need to be cleared if the configured MAC address changes, the RSS key changes, the number of rings changes, or ntuple is disabled. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20240205223202.25341-13-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
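The invalidation rule above can be sketched as an event handler over a filter cache. The enum values, `struct demo_filter_cache`, and `demo_on_cfg_event()` are hypothetical names for illustration. The four invalidating events come from the commit message; `DEMO_EV_OTHER` is a placeholder for any other configuration change, which this sketch assumes leaves the cache alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration events. */
enum demo_cfg_event {
	DEMO_EV_MAC_CHANGE,
	DEMO_EV_RSS_KEY_CHANGE,
	DEMO_EV_RING_COUNT_CHANGE,
	DEMO_EV_NTUPLE_DISABLED,
	DEMO_EV_OTHER, /* placeholder: assumed not to invalidate */
};

struct demo_filter_cache {
	size_t nr_filters; /* cached user filters waiting to be reapplied */
};

/* The four events listed in the commit message make cached filters stale. */
static bool demo_event_invalidates(enum demo_cfg_event ev)
{
	switch (ev) {
	case DEMO_EV_MAC_CHANGE:
	case DEMO_EV_RSS_KEY_CHANGE:
	case DEMO_EV_RING_COUNT_CHANGE:
	case DEMO_EV_NTUPLE_DISABLED:
		return true;
	default:
		return false;
	}
}

static void demo_on_cfg_event(struct demo_filter_cache *cache,
			      enum demo_cfg_event ev)
{
	assert(cache);
	if (demo_event_invalidates(ev))
		cache->nr_filters = 0; /* drop stale filters, don't reapply */
}
```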