2019-02-25ice: check for a leaf node presenceVictor Raj
Check for a leaf node presence for a given VSI. This check is required before removing a VSI since VSIs can't be removed with enabled queues (with leaf nodes) from the FW scheduler tree unless it's a reset. Signed-off-by: Victor Raj <victor.raj@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-25ice: flush Tx pipe on disable queue timeoutVictor Raj
Set the flush Tx pipe flag instead of getting an EAGAIN error when FW times out in processing the disable Tx queue command. Signed-off-by: Victor Raj <victor.raj@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-25ice: clear VF ARQLEN register on resetMitch Williams
On older devices like X710 and X722, the VF's ARQLEN register is cleared on reset, so the VF driver uses that register to detect an unannounced reset. Unfortunately, on devices controlled by ice, this register is NOT cleared on reset. This causes the VF to miss resets, and even on properly-announced resets, the VF driver complains that it didn't see the reset. To fix this, we'll do it in software. When we handle a VF reset (whether triggered by software or VFLR), clear this register after the HW reset is complete. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-25ice: don't spam VFs with link messagesMitch Williams
Don't send a link message to the VFs unless link actually changes state. This avoids a small timing hole in some VF drivers that can cause an apparent TX hang if they receive a link status message at the wrong time. Although we have fixed the timing hole in the current VF driver, there are still lots of drivers in the field that have this timing hole. Let's not fall into it if we can avoid it. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
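The "notify only on change" idea can be sketched in plain C as follows. This is a userspace illustration, not the driver's code: `struct vf_state` and `vf_notify_link()` are hypothetical names, and the counter stands in for actually sending a mailbox message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VF state; not the ice driver's real structure. */
struct vf_state {
    bool link_up;       /* last link state reported to this VF */
    unsigned int msgs;  /* number of link messages sent so far */
};

/* Notify the VF only when the link state differs from the last report.
 * Returns true if a message was (notionally) sent. */
bool vf_notify_link(struct vf_state *vf, bool link_up)
{
    assert(vf != NULL);
    if (vf->link_up == link_up)
        return false;   /* no change: stay quiet */
    vf->link_up = link_up;
    vf->msgs++;         /* stand-in for sending the link message */
    return true;
}
```

Repeated calls with the same state return false and send nothing, which is the behaviour the commit message asks for.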
2019-02-25ice: only use the VF for ICE_VSI_VF in ice_vsi_releaseBrett Creeley
In ice_vsi_release we are always assigning a value to the local VF variable. Change this to only be assigned if the VSI is a VF VSI. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-25ice: fix numeric overflow warningBruce Allan
When compiling and analyzing the driver on newer kernels, a static analyzer warns about the following "numeric overflow" issues: "The result of expression: 'budget-1' generates 4-byte type while casting to a bigger size of 8-byte". "The result of expression: '*words-words_read' generates 4-byte type while casting to a bigger size of 8-byte". Fix them both. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
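The class of warning quoted above can be illustrated with a small sketch. The function names are hypothetical and the fix shown (widening one operand before the arithmetic) is one common remedy, not necessarily the one the patch applies.

```c
#include <assert.h>
#include <stdint.h>

/* The subtraction is done in 32 bits; only the (possibly wrapped)
 * result is then widened to 64 bits. */
uint64_t remaining_narrow(uint32_t total, uint32_t done)
{
    return total - done;
}

/* Widen one operand first so the subtraction itself is 64-bit. */
int64_t remaining_wide(uint32_t total, uint32_t done)
{
    return (int64_t)total - done;
}

/* Self-check showing where the two versions diverge. */
void remaining_selfcheck(void)
{
    assert(remaining_narrow(0, 1) == UINT32_MAX);  /* wrapped in 32 bits */
    assert(remaining_wide(0, 1) == -1);            /* computed in 64 bits */
}
```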
2019-02-25ice: fix issue where host reboots on unload when iommu=onBrett Creeley
Currently if the kernel has the intel_iommu=on parameter set, on some platforms removing the driver causes a system reboot. In initialization we associate the control queue interrupts with the pf->hw_oicr_idx and enable the interrupts by setting the CAUSE_ENA bit. The problem comes on teardown because we are not clearing the CAUSE_ENA bit for the control queues, but the vector at pf->hw_oicr_idx (miscellaneous interrupt vector) gets disabled. Fix this by clearing the CAUSE_ENA bit in the appropriate control queue registers when freeing the miscellaneous interrupt vector. Also, move the call to ice_free_irq_msix_misc() to after ice_deinit_sw() in ice_remove() because ice_deinit_sw() makes an AQ call, but ice_free_irq_msix_misc() disables the miscellaneous vector and its associated interrupts. Also, create two small helper functions to enable and disable the control queue interrupts respectively. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-25ice: fix ice_remove_rule_internal vsi_list handlingJacob Keller
When adding multiple VLANs to the same VSI, the ice_add_vlan code will share the VSI list, so as not to create multiple unnecessary VSI lists. Consider the following flow: ice_add_vlan(hw, <VSI 0 VID 7, VSI 0 VID 8, VSI 0 VID 9>) Here we add three VLAN filters for VIDs 7, 8, and 9, all for VSI 0. ice_add_vlan will create a single vsi_list and share it among all the filters. Later, if we try to remove a VLAN with ice_remove_vlan(hw, <VSI 0 VID 7>), the removal code will update the vsi_list and remove VSI 0 from it. But, since the vsi_list is shared, this breaks the list for the other users who reference it. We even free the VSI list memory, which may result in segmentation faults. This is due to the way that VLAN rules share VSI lists with reference counts, and is caused because we call ice_rem_update_vsi_list even when the ref_cnt is greater than one. To fix this, handle the case where ref_cnt is greater than one separately. In this case, we need to remove the associated rule without modifying the vsi_list, since it is currently being referenced by another rule. Instead, we just need to decrement the VSI list ref_cnt. The case for handling sharing of VSI lists with multiple VSIs is not currently supported by this code. No such rules will be created today, and this code will require changes if/when such code is added. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
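The reference-count handling described above can be sketched roughly as below. The structures and `rule_remove()` are simplified, hypothetical stand-ins written for userspace; the real driver does considerably more work when the last reference goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct vsi_list {
    unsigned int ref_cnt;   /* number of rules sharing this list */
};

struct vlan_rule {
    struct vsi_list *list;  /* shared, reference-counted list */
};

/* Detach one rule from its list. The list is left untouched while other
 * rules still reference it and is freed only by the last user.
 * Returns 1 if the list was freed, 0 otherwise. */
int rule_remove(struct vlan_rule *rule)
{
    struct vsi_list *list = rule->list;

    assert(list != NULL && list->ref_cnt > 0);
    rule->list = NULL;
    if (list->ref_cnt > 1) {
        list->ref_cnt--;    /* still shared: only drop our reference */
        return 0;
    }
    free(list);             /* last user: safe to tear the list down */
    return 1;
}
```

With three rules sharing one list (as in the VID 7, 8, 9 example), the first two removals only decrement the count and the third frees the list.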
2019-02-25ice: fix stack hogs from struct ice_vsi_ctx structuresBruce Allan
struct ice_vsi_ctx has gotten large enough that function local declarations of it on the stack are causing stack hogs. Fix that by allocating the structs on heap. Cleanup some formatting issues in the code around these changes and fix incorrect data type uses of returned functions in a couple places. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
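A minimal userspace sketch of moving a large structure off the stack follows. `struct big_ctx` and `configure_vsi()` are hypothetical, and `calloc()`/`free()` stand in for the kernel allocators the driver would use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical large context (stand-in for something like struct ice_vsi_ctx). */
struct big_ctx {
    unsigned char info[4096];
    int vsi_num;
};

/* Allocate the context on the heap instead of declaring it as a local,
 * so the function's stack frame stays small. Returns vsi_num on success
 * or -1 if the allocation fails. */
int configure_vsi(int vsi_num)
{
    struct big_ctx *ctx;
    int ret;

    assert(vsi_num >= 0);
    ctx = calloc(1, sizeof(*ctx));  /* size taken from the variable, not the type */
    if (!ctx)
        return -1;

    ctx->vsi_num = vsi_num;
    ret = ctx->vsi_num;             /* stand-in for the real work using ctx */

    free(ctx);
    return ret;
}
```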
2019-02-25ice: sizeof(<type>) should be avoidedBruce Allan
With sizeof(), it is preferable to use the variable of type <type> instead of sizeof(<type>). There are multiple places where a temporary variable is used to hold a 'size' value which is then used for a subsequent alloc/memset. Get rid of the temporary variable by calculating size as part of the alloc/memset statement. Also remove unnecessary type-cast. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-25ice: Fix added in VSI supported nodes calcVictor Raj
VSI supported nodes are calculated in order to add the VSI parent or intermediate nodes to the scheduler tree. If one of the nodes in the layers below the VSI layer has space to add the new VSI or intermediate node above that layer, then it is not necessary to continue the calculation for the layers below it. Signed-off-by: Victor Raj <victor.raj@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-25ice: Fix the calculation of ICE_MAX_MTUMaciej Fijalkowski
Currently ICE_MAX_MTU subtracts only ETH_HLEN from max frame size and adds ETH_FCS_LEN and VLAN_HLEN, which is not what was intended. The ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN expression should be surrounded with parentheses. Wrap mentioned expression and take into account VLAN double tagging. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
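The precedence problem can be reproduced with a few macros. The header lengths are the standard Ethernet values; `MAX_FRAME_SIZE` (9728) and the macro names other than ETH_HLEN, ETH_FCS_LEN and VLAN_HLEN are assumptions made for this sketch only.

```c
#include <assert.h>

/* Standard Ethernet sizes in bytes. */
#define ETH_HLEN     14  /* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN   4  /* frame check sequence */
#define VLAN_HLEN     4  /* one 802.1Q tag */

/* Assumed maximum frame size, for illustration only. */
#define MAX_FRAME_SIZE 9728

/* Without parentheses only ETH_HLEN is subtracted; the rest is added. */
#define MAX_MTU_BROKEN (MAX_FRAME_SIZE - ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN)

/* Parenthesized header overhead, with room for two VLAN tags. */
#define PKT_HDR_PAD   (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
#define MAX_MTU_FIXED (MAX_FRAME_SIZE - PKT_HDR_PAD)

/* Sanity check: the broken form overstates the usable MTU. */
void check_mtu(void)
{
    assert(MAX_MTU_FIXED < MAX_MTU_BROKEN);
}
```

Under these assumed numbers the unparenthesized form yields 9722 while the corrected form yields 9702, a difference of 20 bytes.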
2019-02-25ice: Mark extack argument as __always_unusedBruce Allan
Commit 87b0984ebfab ("net: Add extack argument to ndo_fdb_add()") in net-next added an extended parameter to the .ndo_fdb_add op and changed ice_fdb_add() accordingly. Update the function header and add the __always_unused attribute to the new parameter to avoid -Wunused-parameter warnings. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-02-24net: fix double-free in bpf_lwt_xmit_reroutePeter Oskolkov
dst_output() frees skb when it fails (see, for example, ip_finish_output2), so it must not be freed in this case. Fixes: 3bd0b15281af ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c") Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
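The ownership rule behind this fix can be sketched in userspace C. All names here are hypothetical; `output()` only mimics a routine that consumes its buffer on every path, including failure, and the `frees` counter exists purely so the rule can be checked.

```c
#include <assert.h>
#include <stdlib.h>

struct buf { int len; };

static int frees;  /* counts releases so a double free would be visible */

void buf_free(struct buf *b)
{
    free(b);
    frees++;
}

/* Hypothetical output routine that takes ownership of the buffer:
 * it releases the buffer whether or not sending succeeds. */
int output(struct buf *b, int fail)
{
    assert(b != NULL);
    buf_free(b);
    return fail ? -1 : 0;
}

/* Caller: must NOT free b on error, because output() already consumed it. */
int xmit(struct buf *b, int fail)
{
    return output(b, fail);
}
```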
2019-02-24ip_tunnel: Add ip tunnel tun_info type dst_cache in ip_tunnel_xmitwenxu
A non-tunnel-dst ip tunnel device (for example, one created with "ip l add dev tun type gretap key 1000") can send packets through lwtunnel. This patch provides tun_info dst cache support for this mode. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge branch 'dsa-mv88e6xxx-lockdep'David S. Miller
Andrew Lunn says: ==================== mv88e6xxx: Avoid false positive Lockdep splats When acquiring the GPIO interrupt line for the switch, it is possible to trigger lockdep splats. These are false positives, the mutex is in a different IRQ descriptor. But fix it anyway, since it could mask real locking issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: dsa: mv88e6xxx: Release lock while requesting IRQAndrew Lunn
There is no need to hold the register lock while requesting the GPIO interrupt. By not holding it we can also avoid a false positive lockdep splat. Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: dsa: mv88e6xxx: Add lockdep classes to fix false positive splatAndrew Lunn
The following false positive lockdep splat has been observed.

======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #302 Not tainted
------------------------------------------------------
systemd-udevd/160 is trying to acquire lock:
edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704

but task is already holding lock:
edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&desc->request_mutex){+.+.}:
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0xa0/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbedf2ae8

-> #0 (&chip->reg_lock){+.+.}:
       __mutex_lock+0x50/0x8b8
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0x640/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
       mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbedf2ae8

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&desc->request_mutex);
                               lock(&chip->reg_lock);
                               lock(&desc->request_mutex);
  lock(&chip->reg_lock);

&desc->request_mutex refers to two different mutexes. #1 is the GPIO for the chip interrupt. #2 is the chained interrupt between global 1 and global 2. Add lockdep classes to the GPIO interrupt to avoid this. Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24ip_tunnel: Add dst_cache support in lwtunnel_state of ip tunnelwenxu
The lwtunnel_state does not initialize the dst_cache, so ip_md_tunnel_xmit can't use the dst_cache and has to look up the route table for every packet. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24tls: Return type of non-data records retrieved using MSG_PEEK in recvmsgVakul Garg
The patch enables returning 'type' in msghdr for records that are retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked from socket from getting clubbed with any other record of different type when records are subsequently dequeued from strparser. For each record, we now retain its type in sk_buff's control buffer cb[]. Inside control buffer, record's full length and offset are already stored by strparser in 'struct strp_msg'. We store record type after 'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is stored just after record dequeue. For tls1.3, the type is stored after record has been decrypted. Inside process_rx_list(), before processing a non-data record, we check that we are able to return the record type to the user application. If not, the decrypted records in the tls context's rx_list are left there without consuming any data. Fixes: 692d7b5d1f912 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge branch 'ipv4-v6-icmp-small-cleanup-and-update'David S. Miller
Kefeng Wang says: ==================== ipv4/v6: icmp: small cleanup and update v2: - Add cover letter and use proper patch subject-prefix, as suggested by Eric Dumazet. This patch series contains some small cleanups and updates: 1) use icmp/v6_sk_exit when icmp_sk_init fails instead of open-coding it 2) use the new percpu allocation interface for the ipv6.icmp_sk ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24ipv6: icmp: use percpu allocationKefeng Wang
Use percpu allocation for the ipv6.icmp_sk. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24ipv6: icmp: use icmpv6_sk_exit()Kefeng Wang
Simply use icmpv6_sk_exit() when inet_ctl_sock_create() fails in icmpv6_sk_init(). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24ipv4: icmp: use icmp_sk_exit()Kefeng Wang
Simply use icmp_sk_exit() when inet_ctl_sock_create() fails in icmp_sk_init(). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
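The pattern in the two icmp patches above (reuse the exit routine as the init error path instead of open-coding the unwind) can be sketched as follows. Everything here is a hypothetical userspace stand-in: `ctl_sock`, `sk_init()`, `sk_exit()` and the `fail_at` parameter are invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_SLOTS 4  /* stand-in for the number of per-CPU sockets */

static int *ctl_sock[NR_SLOTS];

/* Release every slot. Safe on a partially initialized table because
 * free(NULL) is a no-op. */
void sk_exit(void)
{
    int i;

    for (i = 0; i < NR_SLOTS; i++) {
        free(ctl_sock[i]);
        ctl_sock[i] = NULL;
    }
}

/* Create one object per slot. fail_at simulates a creation failure at
 * that index (pass -1 for no failure). On failure, the exit routine is
 * reused to unwind whatever was already created. */
int sk_init(int fail_at)
{
    int i;

    assert(fail_at < NR_SLOTS);
    for (i = 0; i < NR_SLOTS; i++) {
        ctl_sock[i] = (i == fail_at) ? NULL : malloc(sizeof(*ctl_sock[i]));
        if (!ctl_sock[i]) {
            sk_exit();
            return -1;
        }
    }
    return 0;
}
```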
2019-02-24ila: Fix uninitialised return value in ila_xlat_nl_cmd_flushHerbert Xu
This patch fixes an uninitialised return value error in ila_xlat_nl_cmd_flush. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6c4128f65857 ("rhashtable: Remove obsolete...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
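One common way an uninitialised return value arises is a loop that may run zero times. The sketch below shows that general shape only; it is an assumption about the class of bug, not a reconstruction of ila_xlat_nl_cmd_flush, and `flush_all()` is a hypothetical function.

```c
#include <assert.h>
#include <stddef.h>

/* Walk the per-entry results and return the first error, or 0 if every
 * entry succeeded. */
int flush_all(const int *results, size_t n)
{
    int ret = 0;   /* initialised: an empty table must report success */
    size_t i;

    assert(results != NULL || n == 0);
    for (i = 0; i < n; i++) {
        ret = results[i];
        if (ret)
            break;
    }
    return ret;    /* without the initialiser this is garbage when n == 0 */
}
```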
2019-02-24net/sched: act_tunnel_key: Add dst_cache supportwenxu
The metadata_dst does not initialize the dst_cache, so ip_md_tunnel_xmit can't use the dst_cache and has to look up the route table for every packet. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge branch 'code-optimizations-and-bugfixes-for-HNS3-driver'David S. Miller
Huazhong Tan says: ==================== code optimizations & bugfixes for HNS3 driver This patchset includes bugfixes and code optimizations for the HNS3 ethernet controller driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: fix improper error handling for hns3_client_startHuazhong Tan
If hns3_client_start() fails in hns3_client_init(), register_dev() should be undone in its error handling. Fixes: a6d818e31d08 ("net: hns3: Add vport alive state checking support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: fix setting of the hns reset_type for rdma hw errorsShiju Jose
Presently the hns reset_type for the roce errors is set in the hclge_log_and_clear_rocee_ras_error function. This function is also called to detect and clear roce errors while enabling the rdma error interrupts. However, no hns reset is requested in that case. This can cause the wrong reset_type to be used with a subsequent hns reset, because the reset_type set in the above case was not cleared. This patch moves setting of hns reset_type for the roce errors from hclge_log_and_clear_rocee_ras_error function to hclge_handle_rocee_ras_error. Fixes: 630ba007f475 ("net: hns3: add handling of RDMA RAS errors") Reported-by: Huazhong Tan <tanhuazhong@huawei.com> Reported-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: fix get VF RSS issueJian Shen
For revision 0x20, the VF shares the same RSS config with the PF. The original code always returns 0 when querying the RSS hash key for a VF. This patch fixes it by returning the hash key obtained from the PF. Fixes: 374ad291762a ("net: hns3: net: hns3: Add RSS general configuration support for VF") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: enable VF VLAN filter for each VF when initializingJian Shen
For revision 0x21, the switch of VF VLAN filter is per function. It's necessary to enable VF VLAN filter for each VF when initializing. Otherwise, VF will be able to receive broadcast packets with unknown VLAN when PF enters promisc mode. Fixes: 64d114f0a750 ("net: hns3: Add egress/ingress vlan filter for revision 0x21") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: add support to config depth for tx|rx ring separatelyPeng Li
This patch adds support to config depth for tx|rx ring separately by ethtool command "-G". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: remove hnae3_get_bit in data pathYunsheng Lin
The hnae3_get_bit uses hnae3_get_field, and hnae3_get_field masks the data, which is unnecessary in data path. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: replace hnae3_set_bit and hnae3_set_field in data pathYunsheng Lin
hnae3_set_bit and hnae3_set_field mask the data before setting the field or bit, which is unnecessary because the data is already zero initialized. Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
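Why the mask step is redundant on zero-initialized data can be shown with a generic bit-field sketch. The field layout and both helper names are hypothetical and do not mirror the hnae3 macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit field at bits 7:4. */
#define FIELD_SHIFT 4
#define FIELD_MASK  (0xFu << FIELD_SHIFT)

/* General form: clear the field first, then set it. Correct for any word. */
uint32_t set_field_masked(uint32_t word, uint32_t val)
{
    word &= ~FIELD_MASK;
    word |= (val << FIELD_SHIFT) & FIELD_MASK;
    return word;
}

/* Fast form: skips the clear. Only valid when the field bits are already
 * zero and val fits in the field, which the assert documents. */
uint32_t set_field_fresh(uint32_t word, uint32_t val)
{
    assert((word & FIELD_MASK) == 0);
    assert(val <= 0xFu);
    return word | (val << FIELD_SHIFT);
}
```

On a freshly zeroed word both helpers produce the same result, so the cheaper one suffices; on a word with stale field bits only the masked form is correct.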
2019-02-24net: hns3: add unlikely for error handling in data pathYunsheng Lin
This patch adds unlikely hint for error handling in critical data path. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
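A branch hint of this kind can be sketched outside the kernel as below. The macro has the same shape as the kernel's `unlikely()` and relies on the GCC/Clang builtin `__builtin_expect`; `sum_lengths()` is a hypothetical hot-path helper.

```c
#include <assert.h>
#include <stddef.h>

/* Tell the compiler the condition is expected to be false.
 * Requires GCC or Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum packet lengths, bailing out with -1 on a negative length. */
long sum_lengths(const int *len, size_t n)
{
    long total = 0;
    size_t i;

    assert(len != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (unlikely(len[i] < 0))   /* error path, expected to be rare */
            return -1;
        total += len[i];
    }
    return total;
}
```

The hint changes only code layout and branch prediction assumptions, never the result.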
2019-02-24net: hns3: remove some ops in struct hns3_nic_opsYunsheng Lin
The fill_desc ops has only one implementation, and get_rxd_bnum has not been used, so this patch removes them. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: limit some variable scope in critical data pathYunsheng Lin
This patch limits some variables' scope as much as possible in hns3_fill_desc. Also, only set l3_type and l4_type when necessary. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net: hns3: avoid mult + div op in critical data pathYunsheng Lin
This patch uses shift offset to avoid doing mult and div operation. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
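The general trick can be sketched as follows, assuming the size involved is a power of two so that a shift computed once at setup can replace the multiply and divide in the hot path. The helper names are hypothetical.

```c
#include <assert.h>

/* Compute log2(size) once, at setup time. size must be a power of two. */
unsigned int size_to_shift(unsigned int size)
{
    unsigned int shift = 0;

    assert(size != 0 && (size & (size - 1)) == 0);
    while ((1u << shift) < size)
        shift++;
    return shift;
}

/* Hot path: idx * size becomes a left shift. */
unsigned int index_to_offset(unsigned int idx, unsigned int shift)
{
    return idx << shift;
}

/* Hot path: off / size becomes a right shift. */
unsigned int offset_to_index(unsigned int off, unsigned int shift)
{
    return off >> shift;
}
```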
2019-02-24net: hns3: add xps setting support for hns3 driverYunsheng Lin
This patch adds xps setting support for hns3 driver based on the interrupt affinity info. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge branch 'mlxsw-spectrum_acl-Don-t-take-rtnl-mutex-for-region-rehash'David S. Miller
Ido Schimmel says: ==================== mlxsw: spectrum_acl: Don't take rtnl mutex for region rehash Jiri says: During region rehash, a new region is created with a more optimized set of masks (ERPs). When transitioning to the new region, all the rules from the old region are copied one-by-one to the new region. This transition can be time consuming and currently done under RTNL lock. In order to remove RTNL lock dependency during region rehash, introduce multiple smaller locks guarding dedicated structures or parts of them. That is the vast majority of this patchset. Only patch #1 is simple cleanup and patches 12-15 are improving or introducing new selftests. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24selftests: mlxsw: spectrum-2: Add massive delta rehash testJiri Pirko
Do insertions and removal of filters during rehash in higher volumes. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24selftests: mlxsw: spectrum-2: Check migrate end traceJiri Pirko
Add checking of newly added trace. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24mlxsw: spectrum_acl: Add vregion migration end tracepointJiri Pirko
Hit the new tracepoint once the vregion migration ends. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24selftests: mlxsw: spectrum-2: Add IPv6 variant of simple delta rehash testJiri Pirko
Track the basic codepaths of delta rehash handling, using mlxsw tracepoints. Use IPv6 addresses. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24mlxsw: spectrum_acl: Don't take mutex in mlxsw_sp_acl_tcam_vregion_rehash_work()Jiri Pirko
Other mutexes now take care of proper locking for this, so it is no longer necessary to take the RTNL mutex here. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24mlxsw: spectrum_acl: Remove RTNL lock assertions from ERP codeJiri Pirko
No longer require RTNL lock in this code. Newly introduced mutexes take care of guarding objagg and bloom filter. There is no need to guard gen_pool_alloc()/gen_pool_free() as they are fine to be called lockless. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24mlxsw: spectrum_acl: Don't take rtnl lock during vregion_rehash_intrvl_set()Jiri Pirko
Relax dependency on rtnl mutex during vregion_rehash_intrvl_set(). The vregion list is protected with newly introduced mutex. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24mlxsw: spectrum_acl: Introduce a mutex to guard objagg instance manipulationJiri Pirko
Protect objagg structures by adding a mutex to ERP code and take it during the structure manipulation. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24mlxsw: spectrum_acl: Enable vregion rehash per-profileJiri Pirko
For the MR ACL profile it does not make sense to do periodical rehashes, as there is only one mask in use during the whole vregion lifetime. Therefore periodical work is scheduled but the rehash never happens. So allow enabling/disabling rehash for the whole group, which is added per-profile. Disable rehashing for the MR profile. Addition to the vregion list is done only in case rehash is enabled on the particular vregion. Also, the addition is moved after delayed work init to avoid scheduling uninitialized work from vregion_rehash_intrvl_set(). Symmetrically, deletion from the list is done before canceling the delayed work so it is not scheduled by vregion_rehash_intrvl_set() again. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24mlxsw: spectrum_acl: Introduce mutex to guard Bloom Filter updatesJiri Pirko
Bloom filter is shared within multiple regions. For updates, it needs to be guarded by a separate mutex. Do that in order to not rely on RTNL mutex. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>