summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)Author
2023-02-01ice: Add support for XDP multi-buffer on Rx sideMaciej Fijalkowski
Ice driver needs to be a bit reworked on Rx data path in order to support multi-buffer XDP. For skb path, it currently works in a way that Rx ring carries pointer to skb so if driver didn't manage to combine fragmented frame at current NAPI instance, it can restore the state on next instance and keep looking for last fragment (so descriptor with EOP bit set). What needs to be achieved is that xdp_buff needs to be combined in such way (linear + frags part) in the first place. Then skb will be ready to go in case of XDP_PASS or BPF program being not present on interface. If BPF program is there, it would work on multi-buffer XDP. At this point xdp_buff resides directly on Rx ring, so given the fact that skb will be built straight from xdp_buff, there will be no further need to carry skb on Rx ring. Besides removing skb pointer from Rx ring, lots of members have been moved around within ice_rx_ring. First and foremost reason was to place rx_buf with xdp_buff on the same cacheline. This means that once we touch rx_buf (which is a preceding step before touching xdp_buff), xdp_buff will already be hot in cache. Second thing was that xdp_rxq is used rather rarely and it occupies a separate cacheline, so maybe it is better to have it at the end of ice_rx_ring. Other change that affects ice_rx_ring is the introduction of ice_rx_ring::first_desc. Its purpose is twofold - first is to propagate rx_buf->act to all the parts of current xdp_buff after running XDP program, so that ice_put_rx_buf() that got moved out of the main Rx processing loop will be able to tak an appriopriate action on each buffer. Second is for ice_construct_skb(). ice_construct_skb() has a copybreak mechanism which had an explicit impact on xdp_buff->skb conversion in the new approach when legacy Rx flag is toggled. It works in a way that linear part is 256 bytes long, if frame is bigger than that, remaining bytes are going as a frag to skb_shared_info. This means while memcpying frags from xdp_buff to newly allocated skb, care needs to be taken when picking the destination frag array entry. Upon the time ice_construct_skb() is called, when dealing with fragmented frame, current rx_buf points to the *last* fragment, but copybreak needs to be done against the first one. That's where ice_rx_ring::first_desc helps. When frame building spans across NAPI polls (DD bit is not set on current descriptor and xdp->data is not NULL) with current Rx buffer handling state there might be some problems. Since calls to ice_put_rx_buf() were pulled out of the main Rx processing loop and were scoped from cached_ntc to current ntc, remember that now mentioned function relies on rx_buf->act, which is set within ice_run_xdp(). ice_run_xdp() is called when EOP bit was found, so currently we could put Rx buffer with rx_buf->act being *uninitialized*. To address this, change scoping to rely on first_desc on both boundaries instead. This also implies that cleaned_count which is used as an input to ice_alloc_rx_buffers() and tells how many new buffers should be refilled has to be adjusted. If it stayed as is, what could happen is a case where ntc would go over ntu. Therefore, remove cleaned_count altogether and use against allocing routine newly introduced ICE_RX_DESC_UNUSED() macro which is an equivalent of ICE_DESC_UNUSED() dedicated for Rx side and based on struct ice_rx_ring::first_desc instead of next_to_clean. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-11-maciej.fijalkowski@intel.com
2023-02-01ice: Use xdp->frame_sz instead of recalculating truesizeMaciej Fijalkowski
SKB path calculates truesize on three different functions, which could be avoided as xdp_buff carries the already calculated truesize under xdp_buff::frame_sz. If ice_add_rx_frag() is adjusted to take the xdp_buff as an input just like functions responsible for creating sk_buff initially, codebase could be simplified by removing these redundant recalculations and rely on xdp_buff::frame_sz instead. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-10-maciej.fijalkowski@intel.com
2023-02-01ice: Do not call ice_finalize_xdp_rx() unnecessarilyMaciej Fijalkowski
Currently ice_finalize_xdp_rx() is called only when xdp_prog is present on VSI, which is a good thing. However, this optimization can be enhanced and check only if any of the XDP_TX/XDP_REDIRECT took place in current Rx processing. Non-zero value of @xdp_xmit indicates that xdp_prog is present on VSI. This way XDP_DROP-based workloads will not suffer from unnecessary calls to external function. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-9-maciej.fijalkowski@intel.com
2023-02-01ice: Use ice_max_xdp_frame_size() in ice_xdp_setup_prog()Maciej Fijalkowski
This should have been used in there from day 1, let us address that before introducing XDP multi-buffer support for Rx side. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-8-maciej.fijalkowski@intel.com
2023-02-01ice: Centrallize Rx buffer recyclingMaciej Fijalkowski
Currently calls to ice_put_rx_buf() are sprinkled through ice_clean_rx_irq() - first place is for explicit flow director's descriptor handling, second is after running XDP prog and the last one is after taking care of skb. 1st callsite was actually only for ntc bump purpose, as Rx buffer to be recycled is not even passed to a function. It is possible to walk through Rx buffers processed in particular NAPI cycle by caching ntc from beginning of the ice_clean_rx_irq(). To do so, let us store XDP verdict inside ice_rx_buf, so action we need to take on will be known. For XDP prog absence, just store ICE_XDP_PASS as a verdict. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-7-maciej.fijalkowski@intel.com
2023-02-01ice: Inline eop checkMaciej Fijalkowski
This might be in future used by ZC driver and might potentially yield a minor performance boost. While at it, constify arguments that ice_is_non_eop() takes, since they are pointers and this will help compiler while generating asm. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-6-maciej.fijalkowski@intel.com
2023-02-01ice: Pull out next_to_clean bump out of ice_put_rx_buf()Maciej Fijalkowski
Plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq() so in order to keep the ability of walking through HW Rx descriptors, pull out next_to_clean handling out of ice_put_rx_buf(). Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-5-maciej.fijalkowski@intel.com
2023-02-01ice: Store page count inside ice_rx_bufMaciej Fijalkowski
This will allow us to avoid carrying additional auxiliary array of page counts when dealing with XDP multi buffer support. Previously combining fragmented frame to skb was not affected in the same way as XDP would be as whole frame is needed to be in place before executing XDP prog. Therefore, when going through HW Rx descriptors one-by-one, calls to ice_put_rx_buf() need to be taken *after* running XDP prog on a potentially multi buffered frame, so some additional storage of page count is needed. By adding page count to rx buf, it will make it easier to walk through processed entries at the end of rx cleaning routine and decide whether or not buffers should be recycled. While at it, bump ice_rx_buf::pagecnt_bias from u16 up to u32. It was proven many times that calculations on variables smaller than standard register size are harmful. This was also the case during experiments with embedding page count to ice_rx_buf - when this was added as u16 it had a performance impact. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-4-maciej.fijalkowski@intel.com
2023-02-01ice: Add xdp_buff to ice_rx_ring structMaciej Fijalkowski
In preparation for XDP multi-buffer support, let's store xdp_buff on Rx ring struct. This will allow us to combine fragmented frames across separate NAPI cycles in the same way as currently skb fragments are handled. This means that skb pointer on Rx ring will become redundant and will be removed. For now it is kept and layout of Rx ring struct was not inspected, some member movement will be needed later on so that will be the time to take care of it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-3-maciej.fijalkowski@intel.com
2023-02-01ice: Prepare legacy-rx for upcoming XDP multi-buffer supportMaciej Fijalkowski
Rx path is going to be modified in a way that fragmented frame will be gathered within xdp_buff in the first place. This approach implies that underlying buffer has to provide tailroom for skb_shared_info. This is currently the case when ring uses build_skb but not when legacy-rx knob is turned on. This case configures 2k Rx buffers and has no way to provide either headroom or tailroom - FWIW it currently has XDP_PACKET_HEADROOM which is broken and in here it is removed. 2k Rx buffers were used so driver in this setting was able to support 9k MTU as it can chain up to 5 Rx buffers. With offset configuring HW writing 2k of a data was passing the half of the page which broke the assumption of our internal page recycling tricks. Now if above got fixed and legacy-rx path would be left as is, when referring to skb_shared_info via xdp_get_shared_info_from_buff(), packet's content would be corrupted again. Hence size of Rx buffer needs to be lowered and therefore supported MTU. This operation will allow us to keep the unified data path and with 8k MTU users (if any of legacy-rx) would still be good to go. However, tendency is to drop the support for this code path at some point. Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320) as vsi::max_frame for legacy-rx. For bigger page sizes configure 3k Rx buffers, not 2k. Since headroom support is removed, disable data_meta support on legacy-rx. When preparing XDP buff, rely on ice_rx_ring::rx_offset setting when deciding whether to support data_meta or not. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Link: https://lore.kernel.org/bpf/20230131204506.219292-2-maciej.fijalkowski@intel.com
2023-01-30ice: Remove redundant pci_enable_pcie_error_reporting()Bjorn Helgaas
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration. Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path. Note that this doesn't control interrupt generation by the Root Port; that is controlled by the AER Root Error Command register, which is managed by the AER service driver. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Tony Nguyen <anthony.l.nguyen@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-30devlink: remove devlink featuresJiri Pirko
Devlink features were introduced to disallow devlink reload calls of userspace before the devlink was fully initialized. The reason for this workaround was the fact that devlink reload was originally called without devlink instance lock held. However, with recent changes that converted devlink reload to be performed under devlink instance lock, this is redundant so remove devlink features entirely. Note that mlx5 used this to enable devlink reload conditionally only when device didn't act as multi port slave. Move the multi port check into mlx5_devlink_reload_down() callback alongside with the other checks preventing the device from reload in certain states. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/ethernet/intel/ice/ice_main.c 418e53401e47 ("ice: move devlink port creation/deletion") 643ef23bd9dd ("ice: Introduce local var for readability") https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/ https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/ drivers/net/ethernet/engleder/tsnep_main.c 3d53aaef4332 ("tsnep: Fix TX queue stop/wake for multiple queues") 25faa6a4c5ca ("tsnep: Replace TX spin_lock with __netif_tx_lock") https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/ net/netfilter/nf_conntrack_proto_sctp.c 13bd9b31a969 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"") a44b7651489f ("netfilter: conntrack: unify established states for SCTP paths") f71cb8f45d09 ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets") https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/ https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27ice: Prevent set_channel from changing queues while RDMA activeDave Ertman
The PF controls the set of queues that the RDMA auxiliary_driver requests resources from. The set_channel command will alter that pool and trigger a reconfiguration of the VSI, which breaks RDMA functionality. Prevent set_channel from executing when RDMA driver bound to auxiliary device. Adding a locked variable to pass down the call chain to avoid double locking the device_lock. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-27ice: remove pointless calls to devlink_param_driverinit_value_set()Jiri Pirko
devlink_param_driverinit_value_set() call makes sense only for "driverinit" params. However here, both params are "runtime". devlink_param_driverinit_value_set() returns -EOPNOTSUPP in such case and does not do anything. So remove the pointless calls to devlink_param_driverinit_value_set() entirely. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-24ice: move devlink port creation/deletionPaul M Stillwell Jr
Commit a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") moved the code to create and destroy the devlink PF port. This was fine, but created a corner case issue in the case of ice_register_netdev() failing. In that case, the driver would end up calling ice_devlink_destroy_pf_port() twice. Additionally, it makes no sense to tie creation of the devlink PF port to the creation of the netdev so separate out the code to create/destroy the devlink PF port from the netdev code. This makes it a cleaner interface. Fixes: a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20ice: use GNSS subsystem instead of TTYArkadiusz Kubalewski
Previously support for GNSS was implemented as a TTY driver, it allowed to access GNSS receiver on /dev/ttyGNSS_<bus><func>. Use generic GNSS subsystem API instead of implementing own TTY driver. The receiver is accessible on /dev/gnss<id>. In case of multiple receivers in the OS, correct device can be found by enumerating either: - /sys/class/net/<eth port>/device/gnss/ - /sys/class/gnss/gnss<id>/device/ Using GNSS subsystem is superior to implementing own TTY driver, as the GNSS subsystem was designed solely for this purpose. It also implements TTY driver but in a common and defined way. From user perspective, there is no difference in communicating with a device, except new path to the device shall be used. The device will provide same information to the userspace as the old one, and can be used in the same way, i.e.: old # gpsmon /dev/ttyGNSS_2100_0 new # gpsmon /dev/gnss0 There is no other impact on userspace tools. User expecting onboard GNSS receiver support is required to enable CONFIG_GNSS=y/m in kernel config. Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Michal Michalik <michal.michalik@intel.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-19ice: Remove excess spaceTony Nguyen
smatch reports inconsistent indenting due to an extra space; remove it to resolve the issue. smatch warnings: drivers/net/ethernet/intel/ice/ice_lib.c:1673 ice_vsi_alloc_ring_stats() warn: inconsistent indenting Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: Introduce local var for readabilityTony Nguyen
Based on previous feedback[1], introduce a local var to make things more readable. [1] https://lore.kernel.org/netdev/20220315203218.607f612b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
2023-01-19ice: Match parameter name for ice_cfg_phy_fc()Tony Nguyen
The parameter name in the function declaration and definition do not match; adjust the naming for consistency and to avoid confusion. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: Explicitly return 0Tony Nguyen
Previous checks, and goto, will catch all errors meaning these returns will only return 0; explicitly return 0 for these cases. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
2023-01-19ice: Reduce scope of variablesTony Nguyen
There are some places where the scope of a variable can be reduced, so do that. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
2023-01-19ice: Move support DDP code out of ice_flex_pipe.cSergey Temerkhanov
Currently, ice_flex_pipe.c includes the DDP loading functions and has grown large. Although flexible processing support code is related to DDP loading, these parts are distinct. Move the DDP loading functionality from ice_flex_pipe.c to a separate file. Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: Remove cppcheck suppressionsTony Nguyen
The use of suppressions for cppcheck in the kernel does not look to be standard as the ice driver is the only one doing it. Remove the comments/suppressions. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: combine cases in ice_ksettings_find_adv_link_speed()Przemek Kitszel
Combine if statements setting the same link speed together. Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: Add support for 100G KR2/CR2/SR2 link reportingAnirudh Venkataramanan
Commit 2736d94f351b ("ethtool: Added support for 50Gbps per lane link modes") in v5.1 added (among other things) support for 100G CR2/KR2/SR2 link modes. Advertise these link modes if the firmware reports the corresponding PHY types. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: add missing checks for PF vsi typeJesse Brandeburg
There were a few places we had missed checking the VSI type to make sure it was definitely a PF VSI, before calling setup functions intended only for the PF VSI. This doesn't fix any explicit bugs but cleans up the code in a few places and removes one explicit != vsi->type check that can be superseded by this code (it's a super set) Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: remove redundant non-null check in ice_setup_pf_sw()Anirudh Venkataramanan
Remove a redundant null check, as vsi could not be null at this point. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: restrict PTP HW clock freq adjustments to 100, 000, 000 PPBSiddaraju DH
The PHY provides only 39b timestamp. With current timing implementation, we discard lower 7b, leaving 32b timestamp. The driver reconstructs the full 64b timestamp by correlating the 32b timestamp with cached_time for performance. The reconstruction algorithm does both forward & backward interpolation. The 32b timeval has overflow duration of 2^32 counts ~= 4.23 second. Due to interpolation in both direction, its now ~= 2.125 second IIRC, going with at least half a duration, the cached_time is updated with periodic thread of 1 second (worst-case) periodicity. But the 1 second periodicity is based on System-timer. With PPB adjustments, if the 1588 timers increments at say double the rate, (2s in-place of 1s), the Nyquist rate/half duration sampling/update of cached_time with 1 second periodic thread will lead to incorrect interpolations. Hence we should restrict the PPB adjustments to at least half duration of cached_time update which translates to 500,000,000 PPB. Since the periodicity of the cached-time system thread can vary, it is good to have some buffer time and considering practicality of PPB adjustments, limiting the max_adj to 100,000,000. Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: Support drop actionAmritha Nambiar
Currently the drop action is supported only in switchdev mode. Add support for offloading receive filters with action drop in ADQ/non-ADQ modes. This is in addition to other actions such as forwarding to a VSI (ADQ) or a queue (ADQ/non-ADQ). Also renamed 'ch_vsi' to 'dest_vsi' as it is valid for multiple actions such as forward to vsi/queue which may/may not create a channel vsi. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: Handle LLDP MIB Pending changeAnatolii Gerasymenko
If the number of Traffic Classes (TC) is decreased, the FW will no longer remove TC nodes, but will send a pending change notification. This will allow RDMA to destroy corresponding Control QP markers. After RDMA finishes outstanding operations, the ice driver will send an execute MIB Pending change admin queue command to FW to finish DCB configuration change. The FW will buffer all incoming Pending changes, so there can be only one active Pending change. RDMA driver guarantees to remove Control QP markers within 5000 ms. Hence, LLDP response timeout txTTL (default 30 sec) will be met. In the case of a Pending change, LLDP MIB Change Event (opcode 0x0A01) will contain the whole new MIB. But Get LLDP MIB (opcode 0x0A00) AQ call would still return an old MIB, as the Pending change hasn't been applied yet. Add ice_get_dcb_cfg_from_mib_change() function to retrieve DCBX config from LLDP MIB Change Event's buffer for Pending changes. Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-19ice: Add 'Execute Pending LLDP MIB' Admin Queue commandTsotne Chakhvadze
In DCB Willing Mode (FW managed LLDP), when the link partner changes configuration which requires fewer TCs, the TCs that are no longer needed are suspended by EMP FW, removed, and never resumed. This occurs before a MIB change event is indicated to SW. The permanent suspension and removal of these TC nodes in the scheduler prevents RDMA from being able to destroy QPs associated with this TC, requiring a CORE reset to recover. A new DCBX configuration change flow is defined to allow SW driver and other SW components (RDMA) to properly adjust to the configuration changes before they are taking effect in HW. This flow includes a two-way handshake between EMP FW<->LAN SW<->RDMA SW. List of changes: - Add 'Execute Pending LLDP MIB' AQC. - Add 'Pending Event Enable' bit. - Add additional logic to ignore Pending Event Enable' request while 'LLDP MIB Chnage' event is disabled. - Add 'Execute Pending LLDP MIB' AQC sending function to FW, which is needed to take place MIB Event change. Signed-off-by: Tsotne Chakhvadze <tsotne.chakhvadze@intel.com> Co-developed-by: Karen Sornek <karen.sornek@intel.com> Signed-off-by: Karen Sornek <karen.sornek@intel.com> Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-09ice: Add check for kzallocJiasheng Jiang
Add the check for the return value of kzalloc in order to avoid NULL pointer dereference. Moreover, use the goto-label to share the clean code. Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-09ice: Fix potential memory leak in ice_gnss_tty_write()Yuan Can
The ice_gnss_tty_write() return directly if the write_buf alloc failed, leaking the cmd_buf. Fix by free cmd_buf if write_buf alloc failed. Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY") Signed-off-by: Yuan Can <yuancan@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-21ice: xsk: do not use xdp_return_frame() on tx_buf->raw_bufMaciej Fijalkowski
Previously ice XDP xmit routine was changed in a way that it avoids xdp_buff->xdp_frame conversion as it is simply not needed for handling XDP_TX action and what is more it saves us CPU cycles. This routine is re-used on ZC driver to handle XDP_TX action. Although for XDP_TX on Rx ZC xdp_buff that comes from xsk_buff_pool is converted to xdp_frame, xdp_frame itself is not stored inside ice_tx_buf, we only store raw data pointer. Casting this pointer to xdp_frame and calling against it xdp_return_frame in ice_clean_xdp_tx_buf() results in undefined behavior. To fix this, simply call page_frag_free() on tx_buf->raw_buf. Later intention is to remove the buff->frame conversion in order to simplify the codebase and improve XDP_TX performance on ZC. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Reported-and-tested-by: Robin Cowley <robin.cowley@thehutgroup.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Piotr Raczynski <piotr.raczynski@.intel.com> Link: https://lore.kernel.org/r/20221220175448.693999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-08ice: reschedule ice_ptp_wait_for_offset_valid during resetJacob Keller
If the ice_ptp_wait_for_offest_valid function is scheduled to run while the driver is resetting, it will exit without completing calibration. The work function gets scheduled by ice_ptp_port_phy_restart which will be called as part of the reset recovery process. It is possible for the first execution to occur before the driver has completely cleared its resetting flags. Ensure calibration completes by rescheduling the task until reset is fully completed. Reported-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: make Tx and Rx vernier offset calibration independentSiddaraju DH
The Tx and Rx calibration and timestamp generation blocks are independent. However, the ice driver waits until both blocks are ready before configuring either block. This can result in delay of configuring one block because we have not yet received a packet in the other block. There is no reason to wait to finish programming Tx just because we haven't received a packet. Similarly there is no reason to wait to program Rx just because we haven't transmitted a packet. Instead of checking both offset status before programming either block, refactor the ice_phy_cfg_tx_offset_e822 and ice_phy_cfg_rx_offset_e822 functions so that they perform their own offset status checks. Additionally, make them also check the offset ready bit to determine if the offset values have already been programmed. Call the individual configure functions directly in ice_ptp_wait_for_offset_valid. The functions will now correctly check status, and program the offsets if ready. Once the offset is programmed, the functions will exit quickly after just checking the offset ready register. Remove the ice_phy_calc_vernier_e822 in ice_ptp_hw.c, as well as the offset valid check functions in ice_ptp.c entirely as they are no longer necessary. With this change, the Tx and Rx blocks will each be enabled as soon as possible without waiting for the other block to complete calibration. This can enable timestamps faster in setups which have a low rate of transmitted or received packets. In particular, it can stop a situation where one port never receives traffic, and thus never finishes calibration of the Tx block, resulting in continuous faults reported by the ptp4l daemon application. Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: only check set bits in ice_ptp_flush_tx_trackerJacob Keller
The ice_ptp_flush_tx_tracker function is called to clear all outstanding Tx timestamp requests when the port is being brought down. This function iterates over the entire list, but this is unnecessary. We only need to check the bits which are actually set in the ready bitmap. Replace this logic with for_each_set_bit, and follow a similar flow as in ice_ptp_tx_tstamp_cleanup. Note that it is safe to call dev_kfree_skb_any on a NULL pointer as it will perform a no-op so we do not need to verify that the skb is actually NULL. The new implementation also avoids clearing (and thus reading!) the PHY timestamp unless the index is marked as having a valid timestamp in the timestamp status bitmap. This ensures that we properly clear the status registers as appropriate. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstampJacob Keller
In the event of a PTP clock time change due to .adjtime or .settime, the ice driver needs to update the cached copy of the PHC time and also discard any outstanding Tx timestamps. This is required because otherwise the wrong copy of the PHC time will be used when extending the Tx timestamp. This could result in reporting incorrect timestamps to the stack. The current approach taken to handle this is to call ice_ptp_flush_tx_tracker, which will discard any timestamps which are not yet complete. This is problematic for two reasons: 1) it could lead to a potential race condition where the wrong timestamp is associated with a future packet. This can occur with the following flow: 1. Thread A gets request to transmit a timestamped packet, and picks an index and transmits the packet 2. Thread B calls ice_ptp_flush_tx_tracker and sees the index in use, marking is as disarded. No timestamp read occurs because the status bit is not set, but the index is released for re-use 3. Thread A gets a new request to transmit another timestamped packet, picks the same (now unused) index and transmits that packet. 4. The PHY transmits the first packet and updates the timestamp slot and generates an interrupt. 5. The ice_ptp_tx_tstamp thread executes and sees the interrupt and a valid timestamp but associates it with the new Tx SKB and not the one that actual timestamp for the packet as expected. This could result in the previous timestamp being assigned to a new packet producing incorrect timestamps and leading to incorrect behavior in PTP applications. This is most likely to occur when the packet rate for Tx timestamp requests is very high. 2) on E822 hardware, we must avoid reading a timestamp index more than once each time its status bit is set and an interrupt is generated by hardware. We do have some extensive checks for the unread flag to ensure that only one of either the ice_ptp_flush_tx_tracker or ice_ptp_tx_tstamp threads read the timestamp. However, even with this we can still have cases where we "flush" a timestamp that was actually completed in hardware. This can lead to cases where we don't read the timestamp index as appropriate. To fix both of these issues, we must avoid calling ice_ptp_flush_tx_tracker outside of the teardown path. Rather than using ice_ptp_flush_tx_tracker, introduce a new state bitmap, the stale bitmap. Start this as cleared when we begin a new timestamp request. When we're about to extend a timestamp and send it up to the stack, first check to see if that stale bit was set. If so, drop the timestamp without sending it to the stack. When we need to update the cached PHC timestamp out of band, just mark all currently outstanding timestamps as stale. This will ensure that once hardware completes the timestamp we'll ignore it correctly and avoid reporting bogus timestamps to userspace. With this change, we fix potential issues caused by calling ice_ptp_flush_tx_tracker during normal operation. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: cleanup allocations in ice_ptp_alloc_tx_trackerJacob Keller
The ice_ptp_alloc_tx_tracker function must allocate the timestamp array and the bitmap for tracking the currently in use indexes. A future change is going to add yet another allocation to this function. If these allocations fail we need to ensure that we properly cleanup and ensure that the pointers in the ice_ptp_tx structure are NULL. Simplify this logic by allocating to local variables first. If any allocation fails, then free everything and exit. Only update the ice_ptp_tx structure if all allocations succeed. This ensures that we have no side effects on the Tx structure unless all allocations have succeeded. Thus, no code will see an invalid pointer and we don't need to re-assign NULL on cleanup. This is safe because kernel "free" functions are designed to be NULL safe and perform no action if passed a NULL pointer. Thus its safe to simply always call kfree or bitmap_free even if one of those pointers was NULL. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: protect init and calibrating check in ice_ptp_request_tsJacob Keller
When requesting a new timestamp, the ice_ptp_request_ts function does not hold the Tx tracker lock while checking init and calibrating. This means that we might issue a new timestamp request just after the Tx timestamp tracker starts being deinitialized. This could lead to incorrect access of the timestamp structures. Correct this by moving the init and calibrating checks under the lock, and updating the flows which modify these fields to use the lock. Note that we do not need to hold the lock while checking for tx->init in ice_ptp_tx_tstamp. This is because the teardown function will use synchronize_irq after clearing the flag to ensure that the threaded interrupt completes. Either a) the tx->init flag will be cleared before the ice_ptp_tx_tstamp function starts, thus it will exit immediately, or b) the threaded interrupt will be executing and the synchronize_irq will wait until the threaded interrupt has completed at which point we know the init field has definitely been set and new interrupts will not execute the Tx timestamp thread function. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: synchronize the misc IRQ when tearing down Tx trackerJacob Keller
Since commit 1229b33973c7 ("ice: Add low latency Tx timestamp read") the ice driver has used a threaded IRQ for handling Tx timestamps. This change did not add a call to synchronize_irq during ice_ptp_release_tx_tracker. Thus it is possible that an interrupt could occur just as the tracker is being removed. This could lead to a use-after-free of the Tx tracker structure data. Fix this by calling sychronize_irq in ice_ptp_release_tx_tracker after we've cleared the init flag. In addition, make sure that we re-check the init flag at the end of ice_ptp_tx_tstamp before we exit ensuring that we will stop polling for new timestamps once the tracker de-initialization has begun. Refactor the ts_handled variable into "more_timestamps" so that we can simply directly assign this boolean instead of relying on an initialized value of true. This makes the new combined check easier to read. With this change, the ice_ptp_release_tx_tracker function will now wait for the threaded interrupt to complete if it was executing while the init flag was cleared. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: check Tx timestamp memory register for ready timestampsJacob Keller
The PHY for E822 based hardware has a register which indicates which timestamps are valid in the PHY timestamp memory block. Each bit in the register indicates whether the associated index in the timestamp memory is valid. Hardware sets this bit when the timestamp is captured, and clears the bit when the timestamp is read. Use of this register is important as reading timestamp registers can impact the way that hardware generates timestamp interrupts. This occurs because the PHY has an internal value which is incremented when hardware captures a timestamp and decremented when software reads a timestamp. Reading timestamps which are not marked as valid still decrement the internal value and can result in the Tx timestamp interrupt not triggering in the future. To prevent this, use the timestamp memory value to determine which timestamps are ready to be read. The ice_get_phy_tx_tstamp_ready function reads this value. For E810 devices, this just always returns with all bits set. Skip any timestamp which is not set in this bitmap, avoiding reading extra timestamps on E822 devices. The stale check against a cached timestamp value is no longer necessary for PHYs which support the timestamp ready bitmap properly. E810 devices still need this. Introduce a new verify_cached flag to the ice_ptp_tx structure. Use this to determine if we need to perform the verification against the cached timestamp value. Set this to 1 for the E810 Tx tracker init function. Notice that many of the fields in ice_ptp_tx are simple 1 bit flags. Save some structure space by using bitfields of length 1 for these values. Modify the ICE_PTP_TS_VALID check to simply drop the timestamp immediately so that in an event of getting such an invalid timestamp the driver does not attempt to re-read the timestamp again in a future poll of the register. With these changes, the driver now reads each timestamp register exactly once, and does not attempt any re-reads. This ensures the interrupt tracking logic in the PHY will not get stuck. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: handle discarding old Tx requests in ice_ptp_tx_tstampJacob Keller
Currently the driver uses the PTP kthread to process handling and discarding of stale Tx timestamp requests. The function ice_ptp_tx_tstamp_cleanup is used for this. A separate thread creates complications for the driver as we now have both the main Tx timestamp processing IRQ checking timestamps as well as the kthread. Rather than using the kthread to handle this, simply check for stale timestamps within the ice_ptp_tx_tstamp function. This function must already process the timestamps anyways. If a Tx timestamp has been waiting for 2 seconds we simply clear the bit and discard the SKB. This avoids the complication of having separate threads polling, reducing overall CPU work. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: always call ice_ptp_link_change and make it voidJacob Keller
The ice_ptp_link_change function is currently only called for E822 based hardware. Future changes are going to extend this function to perform additional tasks on link change. Always call this function, moving the E810 check from the callers down to just before we call the E822-specific function required to restart the PHY. This function also returns an error value, but none of the callers actually check it. In general, the errors it produces are more likely systemic problems such as invalid or corrupt port numbers. No caller checks these, and so no warning is logged. Re-order the flag checks so that ICE_FLAG_PTP is checked first. Drop the unnecessary check for ICE_FLAG_PTP_SUPPORTED, as ICE_FLAG_PTP will not be set except when ICE_FLAG_PTP_SUPPORTED is set. Convert the port checks to WARN_ON_ONCE, in order to generate a kernel stack trace when they are hit. Convert the function to void since no caller actually checks these return values. Co-developed-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: fix misuse of "link err" with "link status"Jacob Keller
The ice_ptp_link_change function has a comment which mentions "link err" when referring to the current link status. We are storing the status of whether link is up or down, which is not an error. It is appears that this use of err accidentally got included due to an overzealous search and replace when removing the ice_status enum and local status variable. Fix the wording to use the correct term. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: Reset TS memory for all quadsKarol Kolacinski
In E822 products, the owner PF should reset memory for all quads, not only for the one where assigned lport is. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: Remove the E822 vernier "bypass" logicMilena Olech
The E822 devices support an extended "vernier" calibration which enables higher precision timestamps by accounting for delays in the PHY, and compensating for them. These delays are measured by hardware as part of its vernier calibration logic. The driver currently starts the PHY in "bypass" mode which skips the compensation. Then it later attempts to switch from bypass to vernier. This unfortunately does not work as expected. Instead of properly compensating for the delays, the hardware continues operating in bypass without the improved precision expected. Because we cannot dynamically switch between bypass and vernier mode, refactor the driver to always operate in vernier mode. This has a slight downside: Tx timestamp and Rx timestamp requests that occur as the very first packet set after link up will not complete properly and may be reported to applications as missing timestamps. This occurs frequently in test environments where traffic is light or targeted specifically at testing PTP. However, in practice most environments will have transmitted or received some data over the network before such initial requests are made. Signed-off-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-08ice: Use more generic names for ice_ptp_tx fieldsSergey Temerkhanov
Some supported devices have per-port timestamp memory blocks while others have shared ones within quads. Rename the struct ice_ptp_tx fields to reflect the block entities it works with Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-11-30net: devlink: let the core report the driver name instead of the driversVincent Mailhol
The driver name is available in device_driver::name. Right now, drivers still have to report this piece of information themselves in their devlink_ops::info_get callback function. In order to factorize code, make devlink_nl_info_fill() add the driver name attribute. Now that the core sets the driver name attribute, drivers are not supposed to call devlink_info_driver_name_put() anymore. Remove devlink_info_driver_name_put() and clean-up all the drivers using this function in their callback. Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Tested-by: Ido Schimmel <idosch@nvidia.com> # mlxsw Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>