summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel
AgeCommit message (Collapse)Author
2021-10-19ice: use devm_kcalloc() instead of devm_kzalloc()Gustavo A. R. Silva
Use 2-factor multiplication argument form devm_kcalloc() instead of devm_kzalloc(). Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: Make use of the helper function devm_add_action_or_reset()Cai Huoqing
The helper function devm_add_action_or_reset() will internally call devm_add_action(), and if devm_add_action() fails then it will execute the action mentioned and return the error code. So use devm_add_action_or_reset() instead of devm_add_action() to simplify the error handling, reduce the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: Refactor PR ethtool opsWojciech Drewek
This patch improves a few things: - it fixes issue where ethtool -i reports that PR supports priv-flags and tests when in fact it does not support them - instead of using the same functions for both PF and PR ethtool ops, this patch introduces separate ops for both cases and internal functions with core logic. - prevent accessing VF VSI while VF is not ready by calling ice_check_vf_ready_for_cfg - all PR specific functions in ethtool.c were moved to one place in file - instead overwriting n_priv_flags in ice_repr_get_drvinfo, priv-flags code was moved from __ice_get_drvinfo to ice_get_drvinfo Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: Manage act flags for switchdev offloadsWojciech Drewek
Currently it is not possible to set/unset lb_en and lan_en flags for advanced rules during their creation. Both flags are enabled by default. In case of switchdev offloads for egress traffic we need lb_en to be disabled. Because of that, we work around it by updating the rule immediately after its creation. This change allows us to set/unset those flags right away and it gets rid of old workaround as well. Using ice_adv_rule_flags_info structure we can pass info about flags we want to be set for a given advanced rule. Flags are stored in flags_info.act. Values from act would be used only if act_valid was set to true, otherwise default values would be used. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: Forbid trusted VFs in switchdev modeWojciech Drewek
Merge issues caused the check for switchdev mode has been inserted in wrong place. It should be in ice_set_vf_trust not in ice_set_vf_mac. Trusted VFs are forbidden in switchdev mode because they should be configured only from the host side. Fixes: 1c54c839935b ("ice: enable/disable switchdev when managing VFs") Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: fix software generating extra interruptsJesse Brandeburg
The driver tried to work around missing completion events that occurred while interrupts are disabled, by triggering a software interrupt whenever we exit polling (but we had to have polled at least once). This was causing a *lot* of extra interrupts for some workloads like NVMe over TCP, which resulted in regressions in performance. It was also visible when polling didn't prevent interrupts when busy_poll was enabled. Fix the extra interrupts by utilizing our previously unused 3rd ITR (interrupt throttle) index and set it to 20K interrupts per second, and then trigger a software interrupt within that rate limit. While here, slightly refactor the code to avoid an overwrite of a local variable in the case of wb_en = true. Fixes: b7306b42beaf ("ice: manage interrupts during poll exit") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: fix rate limit update after coalesce changeJesse Brandeburg
If the adaptive settings are changed with ethtool -C ethx adaptive-rx off adaptive-tx off then the interrupt rate limit should be maintained as a user set value, but only if BOTH adaptive settings are off. Fix a bug where the rate limit that was being used in adaptive mode was staying set in the register but was not reported correctly by ethtool -c ethx. Due to long lines include a small refactor of q_vector variable. Fixes: b8b4772377dd ("ice: refactor interrupt moderation writes") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: update dim usage and moderationJesse Brandeburg
The driver was having trouble with unreliable latency when doing single threaded ping-pong tests. This was root caused to the DIM algorithm landing on a too slow interrupt value, which caused high latency, and it was especially present when queues were being switched frequently by the scheduler as happens on default setups today. In attempting to improve this, we allow the upper rate limit for interrupts to move to rate limit of 4 microseconds as a max, which means that no vector can generate more than 250,000 interrupts per second. The old config was up to 100,000. The driver previously tried to program the rate limit too frequently and if the receive and transmit side were both active on the same vector, the INTRL would be set incorrectly, and this change fixes that issue as a side effect of the redesign. This driver will operate from now on with a slightly changed DIM table with more emphasis towards latency sensitivity by having more table entries with lower latency than with high latency (high being >= 64 microseconds). The driver also resets the DIM algorithm state with a new stats set when there is no work done and the data becomes stale (older than 1 second), for the respective receive or transmit portion of the interrupt. Add a new helper for setting rate limit, which will be used more in a followup patch. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: Add support for VF rate limitingBrett Creeley
Implement ndo_set_vf_rate to support setting of min_tx_rate and max_tx_rate; set the appropriate bandwidth in the scheduler for the node representing the specified VF VSI. Co-developed-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19e1000e: Remove redundant statementluo penghao
This assignment statement is meaningless, because the statement will execute to the tag "set_itr_now". The clang_analyzer complains as follows: drivers/net/ethernet/intel/e1000e/netdev.c:2552:3 warning: Value stored to 'current_itr' is never read. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: luo penghao <luo.penghao@zte.com.cn> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18iavf: Combine init and watchdog state machinesMateusz Palczewski
Use single state machine for driver initialization and for service initialized driver. The init state machine implemented in init_task() is merged into the watchdog_task(). The init_task() function is removed. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-18iavf: Add __IAVF_INIT_FAILED stateMateusz Palczewski
This commit adds a new state, __IAVF_INIT_FAILED to the state machine. From now on initialization functions report errors not by returning an error value, but by changing the state to indicate that something went wrong. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-18iavf: Refactor iavf state machine trackingMateusz Palczewski
Replace state changes of iavf state machine with a method that also tracks the previous state the machine was on. This change is required for further work with refactoring init and watchdog state machines. Tracking of previous state would help us recover iavf after failure has occurred. Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-16ethernet: ixgb: use eth_hw_addr_set()Jakub Kicinski
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Read the address into an array on the stack, then call eth_hw_addr_set(). ixgb_get_ee_mac_addr() is used with a non-nevdev->dev_addr pointer so we can't deal with the problem inside it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15ice: make use of ice_for_each_* macrosMaciej Fijalkowski
Go through the code base and use ice_for_each_* macros. While at it, introduce ice_for_each_xdp_txq() macro that can be used for looping over xdp_rings array. Commit is not introducing any new functionality. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: introduce XDP_TX fallback pathMaciej Fijalkowski
Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by spinlock. These accesses happen to be in the hot path, so let's introduce the static branch that will be triggered from the control plane when driver could not provide Tx queue dedicated for XDP on each CPU. Currently, the design that has been picked is to allow any number of XDP Tx queues that is at least half of a count of CPUs that platform has. For lower number driver will bail out with a response to user that there were not enough Tx resources that would allow configuring XDP. The sharing of rings is signalled via static branch enablement which in turn indicates that lock for xdp_ring accesses needs to be taken in hot path. Approach based on static branch has no impact on performance of a non-fallback path. One thing that is needed to be mentioned is a fact that the static branch will act as a global driver switch, meaning that if one PF got out of Tx resources, then other PFs that ice driver is servicing will suffer. However, given the fact that HW that ice driver is handling has 1024 Tx queues per each PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: optimize XDP_TX workloadsMaciej Fijalkowski
Optimize Tx descriptor cleaning for XDP. Current approach doesn't really scale and chokes when multiple flows are handled. Introduce two ring fields, @next_dd and @next_rs that will keep track of descriptor that should be looked at when the need for cleaning arise and the descriptor that should have the RS bit set, respectively. Note that at this point the threshold is a constant (32), but it is something that we could make configurable. First thing is to get away from setting RS bit on each descriptor. Let's do this only once NTU is higher than the currently @next_rs value. In such case, grab the tx_desc[next_rs], set the RS bit in descriptor and advance the @next_rs by a 32. Second thing is to clean the Tx ring only when there are less than 32 free entries. For that case, look up the tx_desc[next_dd] for a DD bit. This bit is written back by HW to let the driver know that xmit was successful. It will happen only for those descriptors that had RS bit set. Clean only 32 descriptors and advance the DD bit. Actual cleaning routine is moved from ice_napi_poll() down to the ice_xmit_xdp_ring(). It is safe to do so as XDP ring will not get any SKBs in there that would rely on interrupts for the cleaning. Nice side effect is that for rare case of Tx fallback path (that next patch is going to introduce) we don't have to trigger the SW irq to clean the ring. With those two concepts, ring is kept at being almost full, but it is guaranteed that driver will be able to produce Tx descriptors. This approach seems to work out well even though the Tx descriptors are produced in one-by-one manner. Test was conducted with the ice HW bombarded with packets from HW generator, configured to generate 30 flows. Xdp2 sample yields the following results: <snip> proto 17: 79973066 pkt/s proto 17: 80018911 pkt/s proto 17: 80004654 pkt/s proto 17: 79992395 pkt/s proto 17: 79975162 pkt/s proto 17: 79955054 pkt/s proto 17: 79869168 pkt/s proto 17: 79823947 pkt/s proto 17: 79636971 pkt/s </snip> As that sample reports the Rx'ed frames, let's look at sar output. It says that what we Rx'ed we do actually Tx, no noticeable drops. Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00 0.00 0.00 38.32 with tx_busy staying calm. When compared to a state before: Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil Average: ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00 0.00 0.00 43.64 it can be observed that the amount of txpck/s is almost doubled, meaning that the performance is improved by around 90%. All of this due to the drops in the driver, previously the tx_busy stat was bumped at a 7mpps rate. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: propagate xdp_ring onto rx_ringMaciej Fijalkowski
With rings being split, it is now convenient to introduce a pointer to XDP ring within the Rx ring. For XDP_TX workloads this means that xdp_rings array access will be skipped, which was executed per each processed frame. Also, read the XDP prog once per NAPI and if prog is present, set up the local xdp_ring pointer. Reading prog a single time was discussed in [1] with some concern raised by Toke around dispatcher handling and having the need for going through the RCU grace period in the ndo_bpf driver callback, but ice currently is torning down NAPI instances regardless of the prog presence on VSI. Although the pointer to XDP ring introduced to Rx ring makes things a lot slimmer/simpler, I still feel that single prog read per NAPI lifetime is beneficial. Further patch that will introduce the fallback path will also get a profit from that as xdp_ring pointer will be set during the XDP rings setup. [1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/ Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: do not create xdp_frame on XDP_TXMaciej Fijalkowski
xdp_frame is not needed for XDP_TX data path in ice driver case. For this data path cleaning of sent descriptor will not happen anywhere outside of the driver, which means that carrying the information about the underlying memory model via xdp_frame will not be used. Therefore, this conversion can be simply dropped, which would relieve CPU a bit. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: unify xdp_rings accessesMaciej Fijalkowski
There has been a long lasting issue of improper xdp_rings indexing for XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index is mixed with smp_processor_id(), there could be a situation where Tx descriptors are produced onto XDP Tx ring, but tail is never bumped - for example pin a particular queue id to non-matching IRQ line. Address this problem by ignoring the user ring count setting and always initialize the xdp_rings array to be of num_possible_cpus() size. Then, always use the smp_processor_id() as an index to xdp_rings array. This provides serialization as at given time only a single softirq can run on a particular CPU. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: split ice_ring onto Tx/Rx separate structsMaciej Fijalkowski
While it was convenient to have a generic ring structure that served both Tx and Rx sides, next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs. Rx ring could be handled by the old ice_ring which would reduce the code churn within this patch, but this would make things asymmetric. Make the union out of the ring container within ice_q_vector so that it is possible to iterate over newly introduced ice_tx_ring. Remove the @size as it's only accessed from control path and it can be calculated pretty easily. Change definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In Rx ring xdp_rxq_info occupies its own cacheline, so it's the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: move ice_container_type onto ice_ring_containerMaciej Fijalkowski
Currently ice_container_type is scoped only for ice_ethtool.c. Next commit that will split the ice_ring struct onto Rx/Tx specific ring structs is going to also modify the type of linked list of rings that is within ice_ring_container. Therefore, the functions that are taking the ice_ring_container as an input argument will need to be aware of a ring type that will be looked up. Embed ice_container_type within ice_ring_container and initialize it properly when allocating the q_vectors. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: remove ring_active from ice_ringMaciej Fijalkowski
This field is dead and driver is not making any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14net: intel: igc_ptp: fix build for UMLRandy Dunlap
On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp conversion. The function that is used is not available on UML builds, so have the function use the default system_counterval_t timestamp instead for UML builds. Prevents this build error on UML: ../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’: ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration] return convert_art_ns_to_tsc(tstamp); ../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected return convert_art_ns_to_tsc(tstamp); Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: linux-um@lists.infradead.org Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ice: Print the api_patch as part of the fw.mgmt.apiBrett Creeley
Currently when a user uses "devlink dev info", the fw.mgmt.api will be the major.minor numbers as shown below: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7 <--- No patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 There are many features in the driver that depend on the major, minor, and patch version of the FW. Without the patch number in the output for fw.mgmt.api debugging issues related to the FW API version is difficult. Also, using major.minor.patch aligns with the existing firmware version which uses a 3 digit value. Fix this by making the fw.mgmt.api print the major.minor.patch versions. Shown below is the result: devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver ice serial_number 00-01-00-ff-ff-00-00-00 versions: fixed: board.id K91258-000 running: fw.mgmt 6.1.2 fw.mgmt.api 1.7.9 <--- patch number included fw.mgmt.build 0xd75e7d06 fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.app.name ICE OS Default Package fw.app 1.3.27.0 fw.app.bundle_id 0xc0000001 fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 stored: fw.mgmt.srev 5 fw.undi 1.2992.0 fw.undi.srev 5 fw.psid.api 3.10 fw.bundle_id 0x800085cc fw.netlist 3.10.2000-3.1e.0 fw.netlist.build 0x2a76e110 Fixes: ff2e5c700e08 ("ice: add basic handler for devlink .info_get") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: fix getting UDP tunnel entryMichal Swiatkowski
Correct parameters order in call to ice_tunnel_idx_to_entry function. Entry in sparse port table is correct when the idx is 0. For idx 1 one correct entry should be skipped, for idx 2 two of them should be skipped etc. Change if condition to be true when idx is 0, which means that previous valid entry of this tunnel type were skipped. Fixes: b20e6c17c468 ("ice: convert to new udp_tunnel infrastructure") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Avoid crash from unnecessary IDA freeDave Ertman
In the remove path, there is an attempt to free the aux_idx IDA whether it was allocated or not. This can potentially cause a crash when unloading the driver on systems that do not initialize support for RDMA. But, this free cannot be gated by the status bit for RDMA, since it is allocated if the driver detects support for RDMA at probe time, but the driver can enter into a state where RDMA is not supported after the IDA has been allocated at probe time and this would lead to a memory leak. Initialize aux_idx to an invalid value and check for a valid value when unloading to determine if an IDA free is necessary. Fixes: d25a0fc41c1f9 ("ice: Initialize RDMA support") Reported-by: Jun Miao <jun.miao@windriver.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Fix failure to re-add LAN/RDMA Tx queuesBrett Creeley
Currently if the VSI is rebuilt/removed and the RDMA PF driver is active the RDMA Tx queue scheduler node configuration will not be cleaned up. This will cause the rebuild/re-add of the VSI to fail due to the software structures not being correctly cleaned up for the VSI index. Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSI. If there are no RDMA scheduler nodes created, then there is no harm in calling ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if RDMA support is added for other VSI types they will also get this change. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ethernet: constify references to netdev->dev_addr in driversJakub Kicinski
This big patch sprinkles const on local variables and function arguments which may refer to netdev->dev_addr. Commit 406f42fa0d3c ("net-next: When a bond have a massive amount of VLANs...") introduced a rbtree for faster Ethernet address look up. To maintain netdev->dev_addr in this tree we need to make all the writes to it got through appropriate helpers. Some of the changes here are not strictly required - const is sometimes cast off but pointer is not used for writing. It seems like it's still better to add the const in case the code changes later or relevant -W flags get enabled for the build. No functional changes. Link: https://lore.kernel.org/r/20211014142432.449314-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-14ice: Implement support for SMA and U.FL on E810-TMaciej Machnikowski
Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and allow controlling them. E810-T adapters are equipped with: - 2 external bidirectional SMA connectors - 1 internal TX U.FL - 1 internal RX U.FL U.FL connectors share signal lines with the SMA connectors. The TX U.FL1 share the line with the SMA1 and the RX U.FL2 share line with the SMA2. This dependence is controlled by the ice_verify_pin_e810t. Additionally add support for the E810-T-based devices which don't use the SMA/U.FL controller. If the IO expander is not detected don't expose pins and use 2 predefined 1PPS input and output pins. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Add support for SMA control multiplexerMaciej Machnikowski
E810-T adapters have two external bidirectional SMA connectors and two internal unidirectional U.FL connectors. Multiplexing between U.FL and SMA and SMA direction is controlled using the PCA9575 expander. Add support for the PCA9575 detection and control of the respective pins of the SMA/U.FL multiplexer using the GPIO AQ API. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Implement functions for reading and setting GPIO pinsMaciej Machnikowski
Implement ice_aq_get_gpio and ice_aq_set_gpio for reading and changing the state of GPIO pins described in the topology. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Refactor ice_aqc_link_topo_addrMaciej Machnikowski
Separate link topo parameters in struct ice_aqc_link_topo_addr into new struct ice_aqc_link_topo_params. This keeps input parameters for the get_link_topo command in a separate structure and is required by future commands that operate only on link topo params without the node handle. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-12ice: fix locking for Tx timestamp tracking flushJacob Keller
Commit 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") added a lock around the Tx timestamp tracker flow which is used to cleanup any left over SKBs and prepare for device removal. This lock is problematic because it is being held around a call to ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY write command to firmware. This could lead to a deadlock if the mutex actually sleeps, and causes the following warning on a kernel with preemption debugging enabled: [ 715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573 [ 715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod [ 715.435652] INFO: lockdep is turned off. [ 715.439591] Preemption disabled at: [ 715.439594] [<0000000000000000>] 0x0 [ 715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G W OE 5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c [ 715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020 [ 715.468483] Call Trace: [ 715.470940] dump_stack_lvl+0x6a/0x9a [ 715.474613] ___might_sleep.cold+0x224/0x26a [ 715.478895] __mutex_lock+0xb3/0x1440 [ 715.482569] ? stack_depot_save+0x378/0x500 [ 715.486763] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.494979] ? kfree+0xc1/0x520 [ 715.498128] ? mutex_lock_io_nested+0x12a0/0x12a0 [ 715.502837] ? kasan_set_free_info+0x20/0x30 [ 715.507110] ? __kasan_slab_free+0x10b/0x140 [ 715.511385] ? slab_free_freelist_hook+0xc7/0x220 [ 715.516092] ? kfree+0xc1/0x520 [ 715.519235] ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.527359] ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.535133] ? pci_device_remove+0xab/0x1d0 [ 715.539318] ? __device_release_driver+0x35b/0x690 [ 715.544110] ? driver_detach+0x214/0x2f0 [ 715.548035] ? bus_remove_driver+0x11d/0x2f0 [ 715.552309] ? pci_unregister_driver+0x26/0x250 [ 715.556840] ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.564799] ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0 [ 715.570554] ? do_syscall_64+0x3b/0x90 [ 715.574303] ? entry_SYSCALL_64_after_hwframe+0x44/0xae [ 715.579529] ? start_flush_work+0x542/0x8f0 [ 715.583719] ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.591923] ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.599960] ? wait_for_completion_io+0x250/0x250 [ 715.604662] ? lock_acquire+0x196/0x200 [ 715.608504] ? do_raw_spin_trylock+0xa5/0x160 [ 715.612864] ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.620813] ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.628497] ? __debug_check_no_obj_freed+0x1e8/0x3c0 [ 715.633550] ? trace_hardirqs_on+0x1c/0x130 [ 715.637748] ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.646220] ? do_raw_spin_trylock+0xa5/0x160 [ 715.650581] ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.658797] ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.667013] ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.675403] ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.683440] ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.691037] ? _raw_spin_unlock_irqrestore+0x46/0x73 [ 715.696005] pci_device_remove+0xab/0x1d0 [ 715.700018] __device_release_driver+0x35b/0x690 [ 715.704637] driver_detach+0x214/0x2f0 [ 715.708389] bus_remove_driver+0x11d/0x2f0 [ 715.712489] pci_unregister_driver+0x26/0x250 [ 715.716857] ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d] [ 715.724637] __do_sys_delete_module.constprop.0+0x2d8/0x4e0 [ 715.730210] ? free_module+0x6d0/0x6d0 [ 715.733963] ? task_work_run+0xe1/0x170 [ 715.737803] ? exit_to_user_mode_loop+0x17f/0x1d0 [ 715.742509] ? rcu_read_lock_sched_held+0x12/0x80 [ 715.747215] ? trace_hardirqs_on+0x1c/0x130 [ 715.751401] do_syscall_64+0x3b/0x90 [ 715.754981] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 715.760033] RIP: 0033:0x7f4dfe59000b [ 715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48 [ 715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b [ 715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918 [ 715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940 [ 715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0 Notice that this is the only case where we use the lock in this way. In the cleanup kthread and work kthread the lock is only taken around the bit accesses. This was done intentionally to avoid this kind of issue. The way the lock is used, we only protect ordering of bit sets vs bit clears. The Tx writers in the hot path don't need to be protected against the entire kthread loop. The Tx queues threads only need to ensure that they do not re-use an index that is currently in use. The cleanup loop does not need to block all new set bits, since it will re-queue itself if new timestamps are present. Fix the tracker flow so that it uses the same flow as the standard cleanup thread. In addition, ensure the in_use bitmap actually gets cleared properly. This fixes the warning and also avoids the potential deadlock that might have occurred otherwise. Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-11ice: ndo_setup_tc implementation for PRMichal Swiatkowski
Add tc-flower support for VF port representor devices. Implement ndo_setup_tc callback for TC HW offload on VF port representors devices. Implemented both methods: add and delete tc-flower flows. Mark NETIF_F_HW_TC bit in net device's feature set to enable offload TC infrastructure for port representor. Implement TC filters replay function required to restore filters settings while switchdev configuration is rebuilt. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: ndo_setup_tc implementation for PFKiran Patil
Implement ndo_setup_tc net device callback for TC HW offload on PF device. ndo_setup_tc provides support for HW offloading various TC filters. Add support for configuring the following filter with tc-flower: - default L2 filters (src/dst mac addresses, ethertype, VLAN) - variations of L3, L3+L4, L2+L3+L4 filters using advanced filters (including ipv4 and ipv6 addresses). Allow for adding/removing TC flows when PF device is configured in eswitch switchdev mode. Two types of actions are supported at the moment: FLOW_ACTION_DROP and FLOW_ACTION_REDIRECT. Co-developed-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com> Signed-off-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: Allow changing lan_en and lb_en on all kinds of filtersMichal Swiatkowski
There is no way to change default lan_en and lb_en flags while adding new rule. Add function that allows changing these flags on rule determined by rule id and recipe id. Function checks if the rule is presented on regular rules list or advance rules list and call the appropriate function to update rule entry. As rules with ICE_SW_LKUP_DFLT recipe aren't tracked in a list, implement function which updates flags without searching for rules based only on rule id. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: cleanup rules infoVictor Raj
Change ICE_SW_LKUP_LAST to ICE_MAX_NUM_RECIPES as for now there also can be recipes other than the default. Free all structures created for advanced recipes in cleanup function. Write a function to clean allocated structures on advanced rule info. Signed-off-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: allow deleting advanced rulesShivanshu Shukla
To remove advanced rule the same protocols list like in adding should be send to function. Based on this information list of advanced rules is searched to find the correct rule id. Remove advanced rule if it forwards to only one VSI. If it forwards to list of VSI remove only input VSI from this list. Introduce function to remove rule by id. It is used in case rule needs to be removed even if it forwards to the list of VSI. Allow removing all advanced rules from a particular VSI. It is useful in rebuilding VSI path. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Shivanshu Shukla <shivanshu.shukla@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: allow adding advanced rulesGrishma Kotecha
Define dummy packet headers to allow adding advanced rules in HW. This header is used as admin queue command parameter for adding a rule. The firmware will extract correct fields and will use them in look ups. Define each supported packets header and offsets to words used in recipe. Supported headers: - MAC + IPv4 + UDP - MAC + VLAN + IPv4 + UDP - MAC + IPv4 + TCP - MAC + VLAN + IPv4 + TCP - MAC + IPv6 + UDP - MAC + VLAN + IPv6 + UDP - MAC + IPv6 + TCP - MAC + VLAN + IPv6 + TCP Add code for creating an advanced rule. Rule needs to match defined dummy packet, if not return error, which means that this type of rule isn't currently supported. The first step in adding advanced rule is searching for an advanced recipe matching this kind of rule. If it doesn't exist new recipe is created. Dummy packet has to be filled with the correct header field value from the rule definition. It will be used to do look up in HW. Support searching for existing advance rule entry. It is used in case of adding the same rule on different VSI. In this case, instead of creating new rule, the existing one should be updated with refreshed VSI list. Add initialization for prof_res_bm_init flag to zero so that the possible resource for fv in the files can be initialized. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: create advanced switch recipeDan Nowlin
These changes introduce code for creating advanced recipes for the switch in hardware. There are a couple of recipes already defined in the HW. They apply to matching on basic protocol headers, like MAC, VLAN, MACVLAN, ethertype or direction (promiscuous), etc.. If the user wants to match on other protocol headers (eg. ip address, src/dst port etc.) or different variation of already supported protocols, there is a need to create new, more complex recipe. That new recipe is referred as 'advanced recipe', and the filtering rule created on top of that recipe is called 'advanced rule'. One recipe can have up to 5 words, but the first word is always reserved for match on switch id, so the driver can define up to 4 words for one recipe. To support recipes with more words up to 5 recipes can be chained, so 20 words can be programmed for look up. Input for adding recipe function is a list of protocols to support. Based on this list correct profile is being chosen. Correct profile means that it contains all protocol types from a list. Each profile have up to 48 field vector words and each of this word have protocol id and offset. These two fields need to match with input data for adding recipe function. If the correct profile can't be found the function returns an error. The next step after finding the correct profile is grouping words into groups. One group can have up to 4 words. This is done to simplify sending recipes to HW (because recipe also can have up to 4 words). In case of chaining (so when look up consists of more than 4 words) last recipe will always have results from the previous recipes used as words. A recipe to profile map is used to store information about which profile is associate with this recipe. This map is an array of 64 elements (max number of recipes) and each element is a 256 bits bitmap (max number of profiles) Profile to recipe map is used to store information about which recipe is associate with this profile. This map is an array of 256 elements (max number of profiles) and each element is a 64 bits bitmap (max number of recipes) Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: manage profiles and field vectorsDan Nowlin
Implement functions to manage profiles and field vectors in hardware. In hardware, there are up to 256 profiles and each of these profiles can have 48 field vector words. Each field vector word is described by protocol id and offset in the packet. To add a new recipe all used profiles need to be searched. If the profile contains all required protocol ids and offsets from the recipe it can be used. The driver has to add this profile to recipe association to tell hardware that newly added recipe is going to be associated with this profile. The amount of used profiles depend on the package. To avoid searching across not used profile, max profile id value is calculated at init flow. The profile is considered as unused when all field vector words in the profile are invalid (protocol id 0xff and offset 0x1ff). Profiles are read from the package section ICE_SID_FLD_VEC_SW. Empty field vector words can be used for recipe results. Store all unused field vector words in prof_res_bm. It is a 256 elements array (max number of profiles) each element is a 48 bit bitmap (max number of field vector words). For now, support only non-tunnel profiles type. Co-developed-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-11ice: implement low level recipes functionsGrishma Kotecha
Add code to manage recipes and profiles on admin queue layer. Allow the driver to add a new recipe and update an existing one. Get a recipe and get a recipe to profile association is mostly used in update existing recipes code. Only default recipes can be updated. An update is done by reading recipes from HW, changing their params and calling add recipe command. Support following admin queue commands: - ice_aqc_opc_add_recipe (0x0290) - create a recipe with protocol header information and other details that determine how this recipe filter works - ice_aqc_opc_recipe_to_profile (0x0291) - associate a switch recipe to a profile - ice_aqc_opc_get_recipe (0x0292) - get details of an existing recipe - ice_aqc_opc_get_recipe_to_profile (0x0293) - get a recipe associated with profile ID Define ICE_AQC_RES_TYPE_RECIPE resource type to hold a switch recipe. It is needed when a new switch recipe needs to be created. Co-developed-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Dan Nowlin <dan.nowlin@intel.com> Signed-off-by: Grishma Kotecha <grishma.kotecha@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-08Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-07 Michal Swiatkowski says: The following patch series introduces basic switchdev model support in ice driver. Implement the following blocks of switchdev framework: - VF port representors creation - control plane VSI definition - exception path (a. k. a. "slow-path") - to allow a virtual switch or linux bridge to receive any packet that doesn't match any hw filter - link state management of virtual ports - query virtual port statistics Hardware offload support in switchdev mode is out of scope of this patchset. Devlink interface is used to toggle between switchdev and legacy (the default) modes of the driver. --- Note: This series includes the use enum ice_status, however, we have patches in our queue to remove it from the driver [1]. We are working through the patches that precede the removal series. [1] https://patchwork.ozlabs.org/project/intel-wired-lan/list/?series=265957 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-07ice: add port representor ethtool ops and statsWojciech Drewek
Introduce the following ethtool operations for VF's representor: -get_drvinfo -get_strings -get_ethtool_stats -get_sset_count -get_link In all cases, existing operations were used with minor changes which allow us to detect if ethtool op was called for representor. Only VF VSI stats will be available for representor. Implement ndo_get_stats64 for port representor. This will update VF VSI stats and read them. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07ice: switchdev slow pathGrzegorz Nitka
Slow path means allowing packet to go from uplink to representor and from representor to correct VF on Rx site and from VF to representor and to uplink on Tx site. To accomplish this driver, has to set correct Tx descriptor. When packet is sent from representor to VF, destination should be set to VF VSI. When packet is sent from uplink port destination should be uplink to bypass switch infrastructure and send packet outside. On Rx site driver should check source VSI field from Rx descriptor and based on that forward packed to correct netdev. To allow this there is a target netdevs table in control plane VSI struct. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07ice: rebuild switchdev when resetting all VFsGrzegorz Nitka
As resetting all VFs behaves mostly like creating new VFs also eswitch infrastructure has to be recreated. The easiest way to do that is to rebuild eswitch after resetting VFs. Implement helper functions to start and stop all representors queues. This is used to disable traffic on port representors. In rebuild path: - NAPI has to be disabled - eswitch environment has to be set up - new port representors have to be created, because the old one had pointer to not existing VFs - new control plane VSI ring should be remapped - NAPI hast to be enabled - rxdid has to be set to FLEX_NIC_2, because this descriptor id support source_vsi, which is needed on control plane VSI queues - port representors queues have to be started Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07ice: enable/disable switchdev when managing VFsGrzegorz Nitka
Only way to enable switchdev is to create VFs when the eswitch mode is set to switchdev. Check if correct mode is set and enable switchdev in function which creating VFs. Disable switchdev when user change number of VFs to 0. Changing eswitch mode back to legacy when VFs are created in switchdev mode isn't allowed. As switchdev takes care of managing filter rules, adding new rules on VF is blocked. In case of resetting VF driver has to update pointer in ice_repr struct, because after reset VSI related things can change. Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>