summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)Author
2022-07-15ice: Add EXTTS feature to the feature bitmapAnirudh Venkataramanan
External time stamp sources are supported only on certain devices. Enforce the right support matrix by adding the ICE_F_PTP_EXTTS bit to the feature bitmap set. Co-developed-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/ net/ipv4/fib_semantics.c 747c14307214 ("ip: fix dflt addr selection for connected nexthop") d62607c3fe45 ("net: rename reference+tracking helpers") net/tls/tls.h include/net/tls.h 3d8c51b25a23 ("net/tls: Check for errors in tls_device_init") 587903142308 ("tls: create an internal header") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-12ice: change devlink code to read NVM in blocksPaul M Stillwell Jr
When creating a snapshot of the NVM the driver needs to read the entire contents from the NVM and store it. The NVM reads are protected by a lock that is shared between the driver and the firmware. If the driver takes too long to read the entire NVM (which can happen on some systems) then the firmware could reclaim the lock and cause subsequent reads from the driver to fail. We could fix this by increasing the timeout that we pass to the firmware, but we could end up in the same situation again if the system is slow. Instead have the driver break the reading of the NVM into blocks that are small enough that we have confidence that the read will complete within the timeout time, but large enough not to cause significant AQ overhead. Fixes: dce730f17825 ("ice: add a devlink region for dumping NVM contents") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-12ice: handle E822 generic device ID in PLDM headerPaul M Stillwell Jr
The driver currently presumes that the record data in the PLDM header of the firmware image will match the device ID of the running device. This is true for E810 devices. It appears that for E822 devices that this is not guaranteed to be true. Fix this by adding a check for the generic E822 device. Fixes: d69ea414c9b4 ("ice: implement device flash update via devlink") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-01Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nexDavid S. Miller
t-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-06-30 This series contains updates to ice driver only. Martyna adds support for VLAN related TC switchdev filters and reworks dummy packet implementation of VLANs to enable dynamic header insertion to allow for more rule types. Lu Wei utilizes eth_broadcast_addr() helper over an open coded version. Ziyang Xuan removes unneeded NULL checks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-30net/ice: fix initializing the bitmap in the switch codeAlexander Lobakin
Kbuild spotted the following bug during the testing of one of the optimizations: In file included from include/linux/cpumask.h:12, [...] from drivers/net/ethernet/intel/ice/ice_switch.c:4: drivers/net/ethernet/intel/ice/ice_switch.c: In function 'ice_find_free_recp_res_idx.constprop': include/linux/bitmap.h:447:22: warning: 'possible_idx[0]' is used uninitialized [-Wuninitialized] 447 | *map |= GENMASK(start + nbits - 1, start); | ^~ In file included from drivers/net/ethernet/intel/ice/ice.h:7, from drivers/net/ethernet/intel/ice/ice_lib.h:7, from drivers/net/ethernet/intel/ice/ice_switch.c:4: drivers/net/ethernet/intel/ice/ice_switch.c:4929:24: note: 'possible_idx[0]' was declared here 4929 | DECLARE_BITMAP(possible_idx, ICE_MAX_FV_WORDS); | ^~~~~~~~~~~~ include/linux/types.h:11:23: note: in definition of macro 'DECLARE_BITMAP' 11 | unsigned long name[BITS_TO_LONGS(bits)] | ^~~~ %ICE_MAX_FV_WORDS is 48, so bitmap_set() here was initializing only 48 bits, leaving a junk in the rest 16. It was previously hidden due to that filling 48 bits makes bitmap_set() call external __bitmap_set(), but after making it use plain bit arithmetics on small bitmaps, compilers started seeing the issue. It was still working because those 16 weren't used anywhere anyhow. bitmap_{clear,set}() are not really intended to initialize bitmaps, rather to modify already initialized ones, as they don't do anything past the passed number of bits. The correct function to do this in that particular case is bitmap_fill(), so use it here. It will do `*possible_idx = ~0UL` instead of `*possible_idx |= GENMASK(47, 0)`, not leaving anything in an undefined state. Fixes: fd2a6b71e300 ("ice: create advanced switch recipe") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-30intel/ice:fix repeated words in commentsJilin Yuan
Delete the redundant word 'a'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-30ice: Remove unnecessary NULL check before dev_putZiyang Xuan
Since commit b37a46683739 ("netdevice: add the case if dev is NULL"), dev_put(NULL) is safe, check NULL before dev_put() is not needed. Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-30ice: use eth_broadcast_addr() to set broadcast addressLu Wei
Use eth_broadcast_addr() to set broadcast address instead of memset(). Signed-off-by: Lu Wei <luwei32@huawei.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-30ice: switch: dynamically add VLAN headers to dummy packetsMartyna Szapar-Mudlaw
Enable the support of creating all kinds of declared dummy packets with the VLAN tags by inserting VLAN headers (single VLAN and QinQ cases) if needed. Decrease the number of declared dummy packets and increase in the possible packet's combinations for adding switch rules. This change enables support of creating filters that match both on VLAN + tunnels properties in switchdev. Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-30ice: Add support for VLAN TPID filters in switchdevMartyna Szapar-Mudlaw
Enable support for adding TC rules that filter on the VLAN tag type in switchdev mode. Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-30ice: Add support for double VLAN in switchdevMartyna Szapar-Mudlaw
Enable support for adding TC rules with both C-tag and S-tag that can filter on the inner and outer VLAN in QinQ for basic packets (not tunneled cases). Signed-off-by: Wiktor Pilarczyk <wiktor.pilarczyk@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-21ice: ethtool: Prohibit improper channel config for DCBAnatolii Gerasymenko
Do not allow setting less channels, than Traffic Classes there are via ethtool. There must be at least one channel per Traffic Class. If you set less channels, than Traffic Classes there are, then during ice_vsi_rebuild there would be allocated only the requested amount of tx/rx rings in ice_vsi_alloc_arrays. But later in ice_vsi_setup_q_map there would be requested at least one channel per Traffic Class. This results in setting num_rxq > alloc_rxq and num_txq > alloc_txq. Later, there would be a NULL pointer dereference in ice_vsi_map_rings_to_vectors, because we go beyond of rx_rings or tx_rings arrays. Change ice_set_channels() to return error if you try to allocate less channels, than Traffic Classes there are. Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return status code instead of void. Add error handling for ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc(). [53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi. [53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [53763.992915] PGD 14b45f5067 P4D 0 [53763.996444] Oops: 0002 [#1] SMP NOPTI [53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE --------- - - 4.18.0-240.el8.x86_64 #1 [53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020 [53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice] [53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4 c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44 [53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206 [53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018 [53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018 [53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004 [53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000 [53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018 [53764.090976] FS: 000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000 [53764.099362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0 [53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [53764.127747] PKRU: 55555554 [53764.130781] Call Trace: [53764.133564] ice_vsi_rebuild+0x611/0x870 [ice] [53764.138341] ice_vsi_recfg_qs+0x94/0x100 [ice] [53764.143116] ice_set_channels+0x1a8/0x3e0 [ice] [53764.147975] ethtool_set_channels+0x14e/0x240 [53764.152667] dev_ethtool+0xd74/0x2a10 [53764.156665] ? __mod_lruvec_state+0x44/0x110 [53764.161280] ? __mod_lruvec_state+0x44/0x110 [53764.165893] ? page_add_file_rmap+0x15/0x170 [53764.170518] ? inet_ioctl+0xd1/0x220 [53764.174445] ? netdev_run_todo+0x5e/0x290 [53764.178808] dev_ioctl+0xb5/0x550 [53764.182485] sock_do_ioctl+0xa0/0x140 [53764.186512] sock_ioctl+0x1a8/0x300 [53764.190367] ? selinux_file_ioctl+0x161/0x200 [53764.195090] do_vfs_ioctl+0xa4/0x640 [53764.199035] ksys_ioctl+0x60/0x90 [53764.202722] __x64_sys_ioctl+0x16/0x20 [53764.206845] do_syscall_64+0x5b/0x1a0 [53764.210887] entry_SYSCALL_64_after_hwframe+0x65/0xca Fixes: 87324e747fde ("ice: Implement ethtool ops for channels") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-21ice: ethtool: advertise 1000M speeds properlyAnatolii Gerasymenko
In current implementation ice_update_phy_type enables all link modes for selected speed. This approach doesn't work for 1000M speeds, because both copper (1000baseT) and optical (1000baseX) standards cannot be enabled at once. Fix this, by adding the function `ice_set_phy_type_from_speed()` for 1000M speeds. Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-21ice: Fix switchdev rules book keepingWojciech Drewek
Adding two filters with same matching criteria ends up with one rule in hardware with act = ICE_FWD_TO_VSI_LIST. In order to remove them properly we have to keep the information about vsi handle which is used in VSI bitmap (ice_adv_fltr_mgmt_list_entry::vsi_list_info::vsi_map). Fixes: 0d08a441fb1a ("ice: ndo_setup_tc implementation for PF") Reported-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-21ice: ignore protocol field in GTP offloadMarcin Szycik
Commit 34a897758efe ("ice: Add support for inner etype in switchdev") added the ability to match on inner ethertype. A side effect of that change is that it is now impossible to add some filters for protocols which do not contain inner ethtype field. tc requires the protocol field to be specified when providing certain other options, e.g. src_ip. This is a problem in case of GTP - when user wants to specify e.g. src_ip, they also need to specify protocol in tc command (otherwise tc fails with: Illegal "src_ip"). Because GTP is a tunnel, the protocol field is treated as inner protocol. GTP does not contain inner ethtype field and the filter cannot be added. To fix this, ignore the ethertype field in case of GTP filters. Fixes: 9a225f81f540 ("ice: Support GTP-U and GTP-C offload in switchdev") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-14ice: Fix memory corruption in VF driverPrzemyslaw Patynowski
Disable VF's RX/TX queues, when it's disabled. VF can have queues enabled, when it requests a reset. If PF driver assumes that VF is disabled, while VF still has queues configured, VF may unmap DMA resources. In such scenario device still can map packets to memory, which ends up silently corrupting it. Previously, VF driver could experience memory corruption, which lead to crash: [ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237 [ 5119.170166] PGD 0 P4D 0 [ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI [ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G W I --------- - - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1 [ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019 [ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf] [ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30 [ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74 [ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282 [ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002 [ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203 [ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009 [ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000 [ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000 [ 5119.170269] FS: 0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000 [ 5119.170274] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0 [ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 5119.170290] PKRU: 55555554 [ 5119.170292] Call Trace: [ 5119.170298] iavf_clean_rx_ring+0xad/0x110 [iavf] [ 5119.170324] iavf_free_rx_resources+0xe/0x50 [iavf] [ 5119.170342] iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf] [ 5119.170358] iavf_virtchnl_completion+0xd8a/0x15b0 [iavf] [ 5119.170377] ? iavf_clean_arq_element+0x210/0x280 [iavf] [ 5119.170397] iavf_adminq_task+0x126/0x2e0 [iavf] [ 5119.170416] process_one_work+0x18f/0x420 [ 5119.170429] worker_thread+0x30/0x370 [ 5119.170437] ? process_one_work+0x420/0x420 [ 5119.170445] kthread+0x151/0x170 [ 5119.170452] ? set_kthread_struct+0x40/0x40 [ 5119.170460] ret_from_fork+0x35/0x40 [ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas [ 5119.170613] i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf] [ 5119.170627] CR2: 00001b9780003237 Fixes: ec4f5a436bdf ("ice: Check if VF is disabled for Opcode and other operations") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Co-developed-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-14ice: Fix queue config fail handlingPrzemyslaw Patynowski
Disable VF's RX/TX queues, when VIRTCHNL_OP_CONFIG_VSI_QUEUES fail. Not disabling them might lead to scenario, where PF driver leaves VF queues enabled, when VF's VSI failed queue config. In this scenario VF should not have RX/TX queues enabled. If PF failed to set up VF's queues, VF will reset due to TX timeouts in VF driver. Initialize iterator 'i' to -1, so if error happens prior to configuring queues then error path code will not disable queue 0. Loop that configures queues will is using same iterator, so error path code will only disable queues that were configured. Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap") Suggested-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-14ice: Sync VLAN filtering features for DVMRoman Storozhenko
VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be both enabled or disabled. In case of turning off/on only one of the features, another feature must be turned off/on automatically with issuing an appropriate message to the kernel log. Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev") Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com> Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-14ice: Fix PTP TX timestamp offset calculationMichal Michalik
The offset was being incorrectly calculated for E822 - that led to collisions in choosing TX timestamp register location when more than one port was trying to use timestamping mechanism. In E822 one quad is being logically split between ports, so quad 0 is having trackers for ports 0-3, quad 1 ports 4-7 etc. Each port should have separate memory location for tracking timestamps. Due to error for example ports 1 and 2 had been assigned to quad 0 with same offset (0), while port 1 should have offset 0 and 1 offset 16. Fix it by correctly calculating quad offset. Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-10ethernet: Remove vf rate limit check for driversBin Chen
The commit a14857c27a50 ("rtnetlink: verify rate parameters for calls to ndo_set_vf_rate") has been merged to master, so we can to remove the now-duplicate checks in drivers. Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220609084717.155154-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10Merge branch '10GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 10GbE Intel Wired LAN Driver Updates 2022-06-09 Maximilian Heyne adds reporting of VF statistics on ixgbe via iproute2 interface. Kai-Heng Feng removes duplicate defines from igb. Jiaqing Zhao fixes typos in e1000, ixgb, and ixgbe drivers. Julia Lawall fixes typos for fm10k, ixgbe, and ice drivers. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: drivers/net/ethernet/intel: fix typos in comments ixgbe: Fix typos in comments ixgb: Fix typos in comments e1000: Fix typos in comments igb: Remove duplicate defines drivers, ixgbe: export vf statistics ==================== Link: https://lore.kernel.org/r/20220609171257.2727150-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09drivers/net/ethernet/intel: fix typos in commentsJulia Lawall
Spelling mistakes (triple letters) in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-08ice: Use correct order for the parameters of devm_kcalloc()Christophe JAILLET
We should have 'n', then 'size', not the opposite. This is harmless because the 2 values are just multiplied, but having the correct order silence a (unpublished yet) smatch warning. While at it use '*tun_seg' instead '*seg'. The both variable have the same type, so the result is the same, but it lokks more logical. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-08ice: remove u16 arithmetic in ice_gnssKarol Kolacinski
Change u16 to unsigned int where arithmetic occurs. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-08ice: remove VLAN representor specific opsMichal Swiatkowski
In switchdev mode VF VLAN caps will not be set there is no need to have specific VLAN ops for representor that only returns not supported error. As VLAN configuration commands will be blocked, the VF driver can't disable VLAN stripping at initialization. It leads to the situation when VLAN stripping on VF VSI is on, but in kernel it is off. To prevent this, disable VLAN stripping in VSI initialization. It doesn't break other usecases, because it is set according to kernel settings. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-08ice: don't set VF VLAN caps in switchdevMichal Swiatkowski
In switchdev mode any VLAN manipulation from VF side isn't allowed. In order to prevent parsing VLAN commands don't set VF VLAN caps. This will result in removing VLAN specific opcodes from allowlist. If VF send any VLAN specific opcode PF driver will answer with not supported error. With this approach VF driver know that VLAN caps aren't supported so it shouldn't send VLAN specific opcodes. Thanks to that, some ugly errors will not show up in dmesg (ex. on creating VFs in switchdev mode there are errors about not supported VLAN insertion and stripping) Move setting VLAN caps to separate function, including switchdev mode specific code. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-06-02ice: fix access-beyond-end in the switch codeAlexander Lobakin
Global `-Warray-bounds` enablement revealed some problems, one of which is the way we define and use AQC rules messages. In fact, they have a shared header, followed by the actual message, which can be of one of several different formats. So it is straightforward enough to define that header as a separate struct and then embed it into message structures as needed, but currently all the formats reside in one union coupled with the header. Then, the code allocates only the memory needed for a particular message format, leaving the union potentially incomplete. There are no actual reads or writes beyond the end of an allocated chunk, but at the same time, the whole implementation is fragile and backed by an equilibrium rather than strong type and memory checks. Define the structures the other way around: one for the common header and the rest for the actual formats with the header embedded. There are no places where several union members would be used at the same time anyway. This allows to use proper struct_size() and let the compiler know what is going to be done. Finally, unsilence `-Warray-bounds` back for ice_switch.c. Other little things worth mentioning: * &ice_sw_rule_vsi_list_query is not used anywhere, remove it. It's weird anyway to talk to hardware with purely kernel types (bitmaps); * expand the ICE_SW_RULE_*_SIZE() macros to pass a structure variable name to struct_size() to let it do strict typechecking; * rename ice_sw_rule_lkup_rx_tx::hdr to ::hdr_data to keep ::hdr for the header structure to have the same name for it constistenly everywhere; * drop the duplicate of %ICE_SW_RULE_RX_TX_NO_HDR_SIZE residing in ice_switch.h. Fixes: 9daf8208dd4d ("ice: Add support for switch filter programming") Fixes: 66486d8943ba ("ice: replace single-element array used for C struct hack") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220601105924.2841410-1-alexandr.lobakin@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-22eth: ice: silence the GCC 12 array-bounds warningJakub Kicinski
GCC 12 gets upset because driver allocates partial struct ice_aqc_sw_rules_elem buffers. The writes are within bounds. Silence these warnings for now, our build bot runs GCC 12 so we won't allow any new instances. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/mellanox/mlx5/core/main.c b33886971dbc ("net/mlx5: Initialize flow steering during driver probe") 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support") f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/20220519114119.060ce014@canb.auug.org.au/ tools/testing/selftests/net/mptcp/mptcp_join.sh e274f7154008 ("selftests: mptcp: add subflow limits test-cases") b6e074e171bc ("selftests: mptcp: add infinite map testcase") 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/20220516111918.366d747f@canb.auug.org.au/ net/mptcp/options.c ba2c89e0ea74 ("mptcp: fix checksum byte order") 1e39e5a32ad7 ("mptcp: infinite mapping sending") ea66758c1795 ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/20220519115146.751c3a37@canb.auug.org.au/ net/mptcp/pm.c 95d686517884 ("mptcp: fix subflow accounting on close") 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/20220516111435.72f35dca@canb.auug.org.au/ net/mptcp/subflow.c ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure") 0348c690ed37 ("mptcp: add the fallback check") f8d4bcacff3b ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/20220519115837.380bb8d4@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-17ice: Fix interrupt moderation settings getting clearedMichal Wilczynski
Adaptive-rx and Adaptive-tx are interrupt moderation settings that can be enabled/disabled using ethtool: ethtool -C ethX adaptive-rx on/off adaptive-tx on/off Unfortunately those settings are getting cleared after changing number of queues, or in ethtool world 'channels': ethtool -L ethX rx 1 tx 1 Clearing was happening due to introduction of bit fields in ice_ring_container struct. This way only itr_setting bits were rebuilt during ice_vsi_rebuild_set_coalesce(). Introduce an anonymous struct of bitfields and create a union to refer to them as a single variable. This way variable can be easily saved and restored. Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild") Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17ice: fix possible under reporting of ethtool Tx and Rx statisticsPaul Greenwalt
The hardware statistics counters are not cleared during resets so the drivers first access is to initialize the baseline and then subsequent reads are for reporting the counters. The statistics counters are read during the watchdog subtask when the interface is up. If the baseline is not initialized before the interface is up, then there can be a brief window in which some traffic can be transmitted/received before the initial baseline reading takes place. Directly initialize ethtool statistics in driver open so the baseline will be initialized when the interface is up, and any dropped packets incremented before the interface is up won't be reported. Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-17ice: fix crash when writing timestamp on RX ringsArkadiusz Kubalewski
Do not allow to write timestamps on RX rings if PF is being configured. When PF is being configured RX rings can be freed or rebuilt. If at the same time timestamps are updated, the kernel will crash by dereferencing null RX ring pointer. PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51" #0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be #1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d #2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd #3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54 #4 [ff1966a94a713d08] no_context at ffffffff9d06bda4 #5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c #6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4 #7 [ff1966a94a713de0] page_fault at ffffffff9da0107e [exception RIP: ice_ptp_update_cached_phctime+91] RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246 RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648 R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice] #9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b #10 [ff1966a94a713f10] kthread at ffffffff9d101b4d #11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Dave Cain <dcain@redhat.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-13ice: Expose RSS indirection tables for queue groups via ethtoolSridhar Samudrala
When ADQ queue groups (TCs) are created via tc mqprio command, RSS contexts and associated RSS indirection tables are configured automatically per TC based on the queue ranges specified for each traffic class. For ex: tc qdisc add dev enp175s0f0 root mqprio num_tc 3 map 0 1 2 \ queues 2@0 8@2 4@10 hw 1 mode channel will create 3 queue groups (TC 0-2) with queue ranges 2, 8 and 4 in 3 queue groups. Each queue group is associated with its own RSS context and RSS indirection table. Add support to expose RSS indirection tables for all ADQ queue groups using ethtool RSS contexts interface. ethtool -x enp175s0f0 context <tc-num> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220512213249.3747424-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-05-06 Marcin Szycik says: This patchset adds support for systemd defined naming scheme for port representors, as well as re-enables displaying PCI bus-info in ethtool. bus-info information has previously been removed from ethtool for port representors, as a workaround for a bug in lshw tool, where the tool would sometimes display wrong descriptions for port representors/PF. Now the bug has been fixed in lshw tool [1]. Removing the workaround can be considered a regression (user might be running an older, unpatched version of lshw) (see [2] for discussion). However, calling SET_NETDEV_DEV also produces the same effect as removing the workaround, i.e. lshw is able to access PCI bus-info (this time not via ethtool, but in some other way) and the bug can occur. Adding SET_NETDEV_DEV is important, as it greatly improves netdev naming - - port representors are named based on PF name. Currently port representors are named "ethX", which might be confusing, especially when spawning VFs on multiple PFs. Furthermore, it's currently harder to determine to which PF does a particular port representor belong, as bus-info is not shown in ethtool. Consider the following three cases: Case 1: current code - driver workaround in place, no SET_NETDEV_DEV, lshw with or without fix. Port representors are not displayed because they don't have bus-info (the workaround), PFs are labelled correctly: $ sudo ./lshw -c net -businfo Bus info Device Class Description ======================================================== pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP <-- PF pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function <-- VF pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... Case 2: driver workaround in place, SET_NETDEV_DEV, no lshw fix. Port representors have predictable names. lshw is able to get bus-info because of SET_NETDEV_DEV and netdevs CAN be mislabelled: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP <-- mislabeled port representor pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf26 network Ethernet interface pci@0000:02:00.0 ens6f0 network Ethernet interface <-- mislabeled PF pci@0000:02:00.0 ens6f0npf0vf81 network Ethernet interface ... $ sudo ethtool -i ens6f0npf0vf60 driver: ice ... bus-info: ... Output of lshw would be the same with workaround removed; it does not change the fact that lshw labels netdevs incorrectly, while at the same time it prevents ethtool from displaying potentially useful data (bus-info). Case 3: workaround removed, SET_NETDEV_DEV, lshw fix: $ sudo ./lshw -c net -businfo Bus info Device Class Description ============================================================= pci@0000:02:00.0 ens6f0npf0vf73 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.1 ens6f1 network Ethernet Controller E810-XXV for SFP pci@0000:02:01.0 ens6f0v0 network Ethernet Adaptive Virtual Function pci@0000:02:01.1 ens6f0v1 network Ethernet Adaptive Virtual Function ... pci@0000:02:00.0 ens6f0npf0vf5 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0 network Ethernet Controller E810-XXV for SFP pci@0000:02:00.0 ens6f0npf0vf60 network Ethernet Controller E810-XXV for SFP ... $ sudo ethtool -i ens6f0npf0vf73 driver: ice ... bus-info: 0000:02:00.0 ... In this case poort representors have predictable names, netdevs are not mislabelled in lshw, and bus-info is shown in ethtool. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 [2] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20220321144731.3935-1-marcin.szycik@linux.intel.com * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode" ice: link representors to PCI device ==================== Link: https://lore.kernel.org/r/20220506180052.5256-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-09rtnetlink: add extack support in fdb del handlersAlaa Mohamed
Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-06ice: fix PTP stale Tx timestamps cleanupMichal Michalik
Read stale PTP Tx timestamps from PHY on cleanup. After running out of Tx timestamps request handlers, hardware (HW) stops reporting finished requests. Function ice_ptp_tx_tstamp_cleanup() used to only clean up stale handlers in driver and was leaving the hardware registers not read. Not reading stale PTP Tx timestamps prevents next interrupts from arriving and makes timestamping unusable. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Michal Michalik <michal.michalik@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: clear stale Tx queue settings before configuringAnatolii Gerasymenko
The iAVF driver uses 3 virtchnl op codes to communicate with the PF regarding the VF Tx queues: * VIRTCHNL_OP_CONFIG_VSI_QUEUES configures the hardware and firmware logic for the Tx queues * VIRTCHNL_OP_ENABLE_QUEUES configures the queue interrupts * VIRTCHNL_OP_DISABLE_QUEUES disables the queue interrupts and Tx rings. There is a bug in the iAVF driver due to the race condition between VF reset request and shutdown being executed in parallel. This leads to a break in logic and VIRTCHNL_OP_DISABLE_QUEUES is not being sent. If this occurs, the PF driver never cleans up the Tx queues. This results in leaving behind stale Tx queue settings in the hardware and firmware. The most obvious outcome is that upon the next VIRTCHNL_OP_CONFIG_VSI_QUEUES, the PF will fail to program the Tx scheduler node due to a lack of space. We need to protect ICE driver against such situation. To fix this, make sure we clear existing stale settings out when handling VIRTCHNL_OP_CONFIG_VSI_QUEUES. This ensures we remove the previous settings. Calling ice_vf_vsi_dis_single_txq should be safe as it will do nothing if the queue is not configured. The function already handles the case when the Tx queue is not currently configured and exits with a 0 return in that case. Fixes: 7ad15440acf8 ("ice: Refactor VIRTCHNL_OP_CONFIG_VSI_QUEUES handling") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: Fix race during aux device (un)pluggingIvan Vecera
Function ice_plug_aux_dev() assigns pf->adev field too early prior aux device initialization and on other side ice_unplug_aux_dev() starts aux device deinit and at the end assigns NULL to pf->adev. This is wrong because pf->adev should always be non-NULL only when aux device is fully initialized and ready. This wrong order causes a crash when ice_send_event_to_aux() call occurs because that function depends on non-NULL value of pf->adev and does not assume that aux device is half-initialized or half-destroyed. After order correction the race window is tiny but it is still there, as Leon mentioned and manipulation with pf->adev needs to be protected by mutex. Fix (un-)plugging functions so pf->adev field is set after aux device init and prior aux device destroy and protect pf->adev assignment by new mutex. This mutex is also held during ice_send_event_to_aux() call to ensure that aux device is valid during that call. Note that device lock used ice_send_event_to_aux() needs to be kept to avoid race with aux drv unload. Reproducer: cycle=1 while :;do echo "#### Cycle: $cycle" ip link set ens7f0 mtu 9000 ip link add bond0 type bond mode 1 miimon 100 ip link set bond0 up ifenslave bond0 ens7f0 ip link set bond0 mtu 9000 ethtool -L ens7f0 combined 1 ip link del bond0 ip link set ens7f0 mtu 1500 sleep 1 let cycle++ done In short when the device is added/removed to/from bond the aux device is unplugged/plugged. When MTU of the device is changed an event is sent to aux device asynchronously. This can race with (un)plugging operation and because pf->adev is set too early (plug) or too late (unplug) the function ice_send_event_to_aux() can touch uninitialized or destroyed fields. In the case of crash below pf->adev->dev.mutex. Crash: [ 53.372066] bond0: (slave ens7f0): making interface the new active one [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval idating tc mappings. Priority traffic classification disabled! [ 54.248204] bond0: (slave ens7f0): Releasing backup interface [ 54.253955] bond0: (slave ens7f1): making interface the new active one [ 54.274875] bond0: (slave ens7f1): Releasing backup interface [ 54.289153] bond0 (unregistering): Released all slaves [ 55.383179] MII link monitoring set to 100 ms [ 55.398696] bond0: (slave ens7f0): making interface the new active one [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080 [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u p link [ 55.412198] #PF: supervisor write access in kernel mode [ 55.412200] #PF: error_code(0x0002) - not-present page [ 55.412201] PGD 25d2ad067 P4D 0 [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1 [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/ 2021 [ 55.430226] Workqueue: ice ice_service_task [ice] [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20 [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48 [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246 [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001 [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080 [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041 [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0 [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000 [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000 [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0 [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 55.567305] PKRU: 55555554 [ 55.570018] Call Trace: [ 55.572474] <TASK> [ 55.574579] ice_service_task+0xaab/0xef0 [ice] [ 55.579130] process_one_work+0x1c5/0x390 [ 55.583141] ? process_one_work+0x390/0x390 [ 55.587326] worker_thread+0x30/0x360 [ 55.590994] ? process_one_work+0x390/0x390 [ 55.595180] kthread+0xe6/0x110 [ 55.598325] ? kthread_complete_and_exit+0x20/0x20 [ 55.603116] ret_from_fork+0x1f/0x30 [ 55.606698] </TASK> Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA") Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06Revert "ice: Hide bus-info in ethtool for PRs in switchdev mode"Marcin Szycik
This reverts commit bfaaba99e680bf82bf2cbf69866c3f37434ff766. Commit bfaaba99e680 ("ice: Hide bus-info in ethtool for PRs in switchdev mode") was a workaround for lshw tool displaying incorrect descriptions for port representors and PF in switchdev mode. Now the issue has been fixed in the lshw tool itself [1]. Removing the workaround can be considered a regression, as the user might be running older, unpatched lshw version. However, another important change (ice: link representors to PCI device, which improves port representor netdev naming with SET_NETDEV_DEV) also causes the same "regression" as removing the workaround, i.e. unpatched lshw is able to access bus-info information (this time not via ethtool) and the bug can occur. Therefore, the workaround no longer prevents the bug and can be removed. [1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1 Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-06ice: link representors to PCI deviceMichal Swiatkowski
Link port representors to parent PCI device to benefit from systemd defined naming scheme. Example from ip tool: - without linking: eth0 ... - with linking: eth0 ... altname enp24s0f0npf0vf0 The port representor name is being shown in altname, because the name is longer than IFNAMSIZ (16) limit. Altname can be used in ip tool. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: remove period on argument description in ice_for_each_vfJacob Keller
The ice_for_each_vf macros have comments describing the implementation. One of the arguments has a period on the end, which is not our typical style. Remove the unnecessary period. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: add a function comment for ice_cfg_mac_antispoofJacob Keller
This function definition was missing a comment describing its implementation. Add one. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: fix wording in comment for ice_reset_vfJacob Keller
The comment explaining ice_reset_vf has an extraneous "the" with the "if the resets are disabled". Remove it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: remove return value comment for ice_reset_all_vfsJacob Keller
Since commit fe99d1c06c16 ("ice: make ice_reset_all_vfs void"), the ice_reset_all_vfs function has not returned anything. The function comment still indicated it did. Fix this. While here, also add a line to clarify the function resets all VFs at once in response to hardware resets such as a PF reset. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: always check VF VSI pointer valuesJacob Keller
The ice_get_vf_vsi function can return NULL in some cases, such as if handling messages during a reset where the VSI is being removed and recreated. Several places throughout the driver do not bother to check whether this VSI pointer is valid. Static analysis tools maybe report issues because they detect paths where a potentially NULL pointer could be dereferenced. Fix this by checking the return value of ice_get_vf_vsi everywhere. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-05ice: add newline to dev_dbg in ice_vf_fdir_dump_infoJacob Keller
The debug print in ice_vf_fdir_dump_info does not end in newlines. This can look confusing when reading the kernel log, as the next print will immediately continue on the same line. Fix this by adding the forgotten newline. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>