summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/mellanox
AgeCommit message (Collapse)Author
2017-02-03mlxsw: reg: Add Policy-Engine TCAM Allocation RegisterJiri Pirko
The PTAR register is used for allocation of regions in the TCAM. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03mlxsw: reg: Add Policy-Engine ACL Group Table registerJiri Pirko
The PAGT register is used for configuration of the ACL Group Table. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03mlxsw: reg: Add Policy-Engine ACL RegisterJiri Pirko
The PACL register is used for configuration of the ACL. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03mlxsw: item: Add helpers for getting pointer into payload for char buffer itemJiri Pirko
Sometimes it is handy to get a pointer to a char buffer item and use it direcly to write/read data. So add these helpers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03mlxsw: item: Add 8bit item helpersJiri Pirko
Item heplers for 8bit values are needed, let's add them. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G'Martin KaFai Lau
After calling mlx4_en_try_alloc_resources (e.g. by changing the number of rx-queues with ethtool -L), the existing xdp_prog becomes inactive. The bug is that the xdp_prog ptr has not been carried over from the old rx-queues to the new rx-queues Fixes: 47a38e155037 ("net/mlx4_en: add support for fast rx drop bpf program") Cc: Brenden Blanco <bblanco@plumgrid.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02mlx4: Fix memory leak after mlx4_en_update_priv()Martin KaFai Lau
In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t] are over-written by src->tx_ring[t] and src->tx_cq[t] without first calling kfree. One of the reproducible code paths is by doing 'ethtool -L'. The fix is to do the kfree in mlx4_en_free_resources(). Here is the kmemleak report: unreferenced object 0xffff880841211800 (size 2048): comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81930718>] kmemleak_alloc+0x28/0x50 [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260 [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0 [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210 [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190 [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0 [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50 [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0 [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0 [<ffffffff812480f9>] SyS_ioctl+0x79/0x90 [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad [<ffffffffffffffff>] 0xffffffffffffffff unreferenced object 0xffff880841213000 (size 2048): comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81930718>] kmemleak_alloc+0x28/0x50 [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260 [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0 [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210 [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190 [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0 [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50 [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0 [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0 [<ffffffff812480f9>] SyS_ioctl+0x79/0x90 [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad [<ffffffffffffffff>] 0xffffffffffffffff (gdb) list *mlx4_en_try_alloc_resources+0x118 0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145). 2140 if (!dst->tx_ring_num[t]) 2141 continue; 2142 2143 dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) * 2144 MAX_TX_RINGS, GFP_KERNEL); 2145 if (!dst->tx_ring[t]) 2146 goto err_free_tx; 2147 2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) * 2149 MAX_TX_RINGS, GFP_KERNEL); (gdb) list *mlx4_en_try_alloc_resources+0x13b 0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150). 2145 if (!dst->tx_ring[t]) 2146 goto err_free_tx; 2147 2148 dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) * 2149 MAX_TX_RINGS, GFP_KERNEL); 2150 if (!dst->tx_cq[t]) { 2151 kfree(dst->tx_ring[t]); 2152 goto err_free_tx; 2153 } 2154 } Fixes: ec25bc04ed8e ("net/mlx4_en: Add resilience in low memory systems") Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
All merge conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_core: Avoid command timeouts during VF driver device shutdownJack Morgenstein
Some Hypervisors detach VFs from VMs by instantly causing an FLR event to be generated for a VF. In the mlx4 case, this will cause that VF's comm channel to be disabled before the VM has an opportunity to invoke the VF device's "shutdown" method. The result is that the VF driver on the VM will experience a command timeout during the shutdown process when the Hypervisor does not deliver a command-completion event to the VM. To avoid FW command timeouts on the VM when the driver's shutdown method is invoked, we detect the absence of the VF's comm channel at the very start of the shutdown process. If the comm-channel has already been disabled, we cause all FW commands during the device shutdown process to immediately return success (and thus avoid all command timeouts). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_en: Check the enabling pptx/pprx flags in SET_PORT wrapper flowShaker Daibes
Make sure pptx/pprx mask flag is set using new fields upon set port request. In addition, move this code into a helper function for better code readability. Signed-off-by: Shaker Daibes <shakerd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_en: Check the enabling mtu flag in SET_PORT wrapper flowShaker Daibes
Make sure MTU mask flag is set using new field upon set port request. In addition, move this code into a helper function for better code readability. Signed-off-by: Shaker Daibes <shakerd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_en: Pass user MTU value to Firmware at set port commandShaker Daibes
When starting the port, driver will inform Firmware about the actual MTU which does not include implicit headers, such as FCS or VLAN tags. Signed-off-by: Shaker Daibes <shakerd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_en: Adding support of turning off link autonegotiation via ethtoolAriel Levkovich
This feature will allow the user to disable auto negotiation on the port for mlx4 devices while setting the speed is limited to 1GbE speeds. Other speeds will not be accepted in autoneg off mode. This functionality is permitted providing that the firmware is compatible with this feature. The above is determined by querying a new dedicated capability bit in the device. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_core: Get num_tc using netdev_get_num_tcAlaa Hleihel
Avoid reading num_tc directly from struct net_device, but use the helper function netdev_get_num_tc. Fixes: bc6a4744b827 ("net/mlx4_en: num cores tx rings for every UP") Fixes: f5b6345ba8da ("net/mlx4_en: User prio mapping gets corrupted when changing number of channels") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_core: Add resource alloc/dealloc debuggingMatan Barak
In order to aid debugging of functions that take a resource but don't put it, add the last function name that successfully grabbed this resource. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4_core: Device revision supportYishai Hadas
The device revision field returned by the NodeInfo MAD is incorrect on ConnectX3 devices. This patch is driver side handling to complete a FW fix added at 2.11.1172. INIT_HCA - bit at offset 0x0C.12 is set to 1 so that FW will report correct device revision. Older FW versions won't be affected from turning on that bit, no capability bit is needed. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30net/mlx4: Replace ENOSYS with better fitting error codesTariq Toukan
Conform the following warning: WARNING: ENOSYS means 'invalid syscall nr' and nothing else. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29net/mlx5e: Check ets capability before ets query FW commandMoshe Shemesh
On dcbnl callback getpgtccfgtx, the driver should check the ets capability before ets query command is sent to firmware. It is valid to return from this void function without changing in/out parameters, as these parameters are initialized to DCB_ATTR_VALUE_UNDEFINED. Fixes: 3a6a931dfb8e ("net/mlx5e: Support DCBX CEE API") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29net/mlx5e: Fix update of hash function/key via ethtoolGal Pressman
Modifying TIR hash should change selected fields bitmask in addition to the function and key. Formerly, Only on ethool mlx5e_set_rxfh "ethtoo -X" we would not set this field resulting in zeroing of its value, which means no packet fields are used for RX RSS hash calculation thus causing all traffic to arrive in RQ[0]. On driver load out of the box we don't have this issue, since the TIR hash is fully created from scratch. Tested: ethtool -X ethX hkey <new key> ethtool -X ethX hfunc <new func> ethtool -X ethX equal <new indirection table> All cases are verified with TCP Multi-Stream traffic over IPv4 & IPv6. Fixes: bdfc028de1b3 ("net/mlx5e: Fix ethtool RX hash func configuration change") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29net/mlx5e: Modify TIRs hash only when it's neededGal Pressman
We don't need to modify our TIRs unless the user requested a change in the hash function/key, for example when changing indirection only. Tested: # Modify TIRs hash is needed ethtool -X ethX hkey <new key> ethtool -X ethX hfunc <new func> # Modify TIRs hash is not needed ethtool -X ethX equal <new indirection table> All cases are verified with TCP Multi-Stream traffic over IPv4 & IPv6. Fixes: bdfc028de1b3 ("net/mlx5e: Fix ethtool RX hash func configuration change") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29net/mlx5e: Support TC encapsulation offloads with upper devicesHadar Hen Zion
When tunneling is used, some virtualizations systems set the (mlx5e) uplink device to be stacked under upper devices such as bridge or ovs internal port, where the VTEP IP address used for the encapsulation is set on that upper device. In order to support such use-cases, we also deal with a setup where the egress mirred device isn't representing a port on the HW e-switch to where the ingress device belongs. We use eswitch service function which returns the uplink and set it as the egress device of the tc encap rule. Fixes: a54e20b4fcae ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29net/mlx5: E-Switch, Re-enable RoCE on mode change only after FDB destroyOr Gerlitz
We must re-enable RoCE on the e-switch management port (PF) only after destroying the FDB in its switchdev/offloaded mode. Otherwise, when encapsulation is supported, this re-enablement will fail. Also, it's more natural and symmetric to disable RoCE on the PF before we create the FDB under switchdev mode, so do that as well and revert if getting into error during the mode change later. Fixes: 9da34cd34e85 ('net/mlx5: Disable RoCE on the e-switch management [..]') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29net/mlx5: E-Switch, Err when retrieving steering name-space failsOr Gerlitz
Make sure to return error when we failed retrieving the FDB steering name space. Also, while around, correctly print the error when mode change revert fails in the warning message. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29net/mlx5: Return EOPNOTSUPP when failing to get steering name-spaceOr Gerlitz
When we fail to retrieve a hardware steering name-space, the returned error code should say that this operation is not supported. Align the various places in the driver where this call is made to this convention. Also, make sure to warn when we fail to retrieve a SW (ANCHOR) name-space. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-29net/mlx5: Change ENOTSUPP to EOPNOTSUPPOr Gerlitz
As ENOTSUPP is specific to NFS, change the return error value to EOPNOTSUPP in various places in the mlx5 driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Suggested-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two trivial overlapping changes conflicts in MPLS and mlx5. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25bpf: add initial bpf tracepointsDaniel Borkmann
This work adds a number of tracepoints to paths that are either considered slow-path or exception-like states, where monitoring or inspecting them would be desirable. For bpf(2) syscall, tracepoints have been placed for main commands when they succeed. In XDP case, tracepoint is for exceptions, that is, f.e. on abnormal BPF program exit such as unknown or XDP_ABORTED return code, or when error occurs during XDP_TX action and the packet could not be forwarded. Both have been split into separate event headers, and can be further extended. Worst case, if they unexpectedly should get into our way in future, they can also removed [1]. Of course, these tracepoints (like any other) can be analyzed by eBPF itself, etc. Example output: # ./perf record -a -e bpf:* sleep 10 # ./perf script sock_example 6197 [005] 283.980322: bpf:bpf_map_create: map type=ARRAY ufd=4 key=4 val=8 max=256 flags=0 sock_example 6197 [005] 283.980721: bpf:bpf_prog_load: prog=a5ea8fa30ea6849c type=SOCKET_FILTER ufd=5 sock_example 6197 [005] 283.988423: bpf:bpf_prog_get_type: prog=a5ea8fa30ea6849c type=SOCKET_FILTER sock_example 6197 [005] 283.988443: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[06 00 00 00] val=[00 00 00 00 00 00 00 00] [...] sock_example 6197 [005] 288.990868: bpf:bpf_map_lookup_elem: map type=ARRAY ufd=4 key=[01 00 00 00] val=[14 00 00 00 00 00 00 00] swapper 0 [005] 289.338243: bpf:bpf_prog_put_rcu: prog=a5ea8fa30ea6849c type=SOCKET_FILTER [1] https://lwn.net/Articles/705270/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25Merge tag 'mlx5-updates-2017-01-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-24-01 The first seven patches from Or Gerlitz in this series further enhances the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the TC tunnel key set (encap) and unset (decap) actions. Or Gerlitz says: ======================== As part of doing this change, few cleanups are done in the IPv4 code, later we move to use the full tunnel key info provided to the driver as the key for our internal hashing which is used to identify cases where the same tunnel is used for encapsulating multiple flows. As done in the IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh lookups and construction of the IPv6 tunnel headers on the encap path and matching on the outer hears in the decap path. The last patch of the series enlarges the HW FDB size for the switchdev mode, so it has now room to contain offloaded flows as many as min(max number of HW flow counters supported, max HW table size supported). ======================== Next to Or's series you can find several patches handling several topics. From Mohamad, add support for SRIOV VF min rate guarantee by using the TSAR BW share weights mechanism. From Or, Two patches to enable Eth VFs to query their min-inline value for user-space. for that we move a mlx5 low level min inline helper function from mlx5 ethernet driver into the core driver and then use it in mlx5_ib to expose the inline mode to rdma applications through libmlx5. From Kamal Heib, Reduce memory consumption on kdump kernel. From Shaker Daibes, code reuse in CQE compression control logic ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24net/mlx5e: CQE compression control code reuseShaker Daibes
This patch is intended for code reuse of mlx5e_modify_rx_cqe_compression function. Signed-off-by: Shaker Daibes <shakerd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5e: Reduce memory consumption on kdump kernelKamal Heib
Reduce memory consumption on kdump kernel by decreasing the number of channels to 1 and the size of RQs and SQs to the minimal values. Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5: Push min-inline mode resolution helper into the coreOr Gerlitz
So we can use that from the IB driver too in downstream patches. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5: Add support for setting VF min rateMohamad Haj Yahia
Add support for SRIOV VF min rate guarantee by using the TSAR BW share weights mechanism. The TSAR BW share vport attribute represents the weight of that vport among the other vports weights which means that the actual vport BW percentage is the same vport weight percentage among the total vports weights sum. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5: E-Switch, Enlarge the FDB size for the switchdev modeOr Gerlitz
The E-Switch FDB size was hard coded to 8k. Change it to be min(max eswitch table size, max flow counters * num flow groups) where the max values are read from the firmware and the number of flow groups is hard-coded as before this change. We don't know upfront the division of flows to group. This setup allows each group to be of size up to the where we want to support (we mandate pairing of flows with counters for offloading). Thus, we don't expect multiple occurences for a group which in turn adds steering hops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Tested-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnelsOr Gerlitz
Add the missing parts for offloading IPv6 tunnels. This includes route and neigh lookups and construnction of the IPv6 tunnel headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5e: Maximize ip tunnel key usage on the TC offloading pathOr Gerlitz
Use more fields out of the tunnel key (e.g the tunnel source IP address) provided by upper layers for the route lookup done on the encap offload path. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5e: Use the full tunnel key info for encapsulation offload house-keepingOr Gerlitz
Currently we use subset of the input tunnel key fields (id, ip daddr, dst port) which are provided by upper layers to indentify flows that should go through the same encapsulation and maintain the HW encapsulation table. This is redundant and can get us wrong. Instead, keep a copy of the ip tunnel info provided by the user through TC and have the tunnel key part as the key to our internal hash. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5e: TC ipv4 tunnel encap offload cosmetic changesOr Gerlitz
Move around some settings of variables as pre-step to make things more robust and clear for the ipv6 case in down-stream patch. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5e: Add TC offloads matching on IPv6 encapsulation headersOr Gerlitz
Enhance the parsing of offloaded TC rules to set HW matching on outer IPv6 encapsulation headers. This effectively adds support for TC tunnel key release action (decapsulation) of SRIOV offloads over IPv6 tunnels. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24net/mlx5: Use exact encap header size for the FW input bufferOr Gerlitz
The current code is allocating the max encap size supported by the firmware and not the size requested by the caller, fix that. Also, spare a warning when the size of the encapsulation headers is bigger from what is supported by the firmware. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-24mlxsw: spectrum: Add packet sample offloading supportYotam Gigi
Using the MPSC register, add the functions that configure port-based packet sampling in hardware and the necessary datatypes in the mlxsw_sp_port struct. In addition, add the necessary trap for sampled packets and integrate with matchall offloading to allow offloading of the sample tc action. The current offload support is for the tc command: tc filter add dev <DEV> parent ffff: \ matchall skip_sw \ action sample rate <RATE> group <GROUP> [trunc <SIZE>] Where only ingress qdiscs are supported, and only a combination of matchall classifier and sample action will lead to activating hardware packet sampling. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24mlxsw: reg: add the Monitoring Packet Sampling Configuration RegisterYotam Gigi
The MPSC register allows to configure ingress packet sampling on specific port of the mlxsw device. The sampled packets are then trapped via PKT_SAMPLE trap. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-24mlxsw: spectrum_router: Correctly reallocate adjacency entriesIdo Schimmel
mlxsw_sp_nexthop_group_mac_update() is called in one of two cases: 1) When the MAC of a nexthop needs to be updated 2) When the size of a nexthop group has changed In the second case the adjacency entries for the nexthop group need to be reallocated from the adjacency table. In this case we must write to the entries the MAC addresses of all the nexthops that should be offloaded and not only those whose MAC changed. Otherwise, these entries would be filled with garbage data, resulting in packet loss. Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-22net/mlx4: use rb_entry()Geliang Tang
To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20Merge tag 'mlx5-updates-2017-01-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 and mlx5e updates 2017-01-19 This series includes some updates for mlx5 core and mlx5e netdevice driver. From Leon, a small fix that remove an unnecessary print. From Eli Cohen, a fix to the FW version printout in case of internal error. From Eugenia Emantayev, two patches, the 1st adds mlx5 1pps (pulse per second) mlx5 infrastructure support and the 2nd adds the necessary bits for mlx5e ptp logic and structures. From Mohamad, add support for s-tagged packet receive when in promiscuous mode. Form Gal Pressman, MCAM (Management capabilities mask register) and PCAM (Ports capabilities mask register) registers infrastructure, those registers are needed in order to query the different statistics registers support in FW, in order for the driver to enable/disable query and reporting them back to user. On top of this infrastructure we've exposed new set of statistics groups: - MPCNT: Physical layer statistical counters (For symbol errors) - PPCNT: PCIe performance counters In addition to the statistics capabilities series we've moved the mlx5 HCA capabilities fields to a dedicated struct under the driver private data. At the end a small patch to update & query statistics in the most desired order. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20net/mlx5e: Do not recycle pages from emergency reserveEric Dumazet
A driver using dev_alloc_page() must not reuse a page allocated from emergency memory reserve. Otherwise all packets using this page will be immediately dropped, unless for very specific sockets having SOCK_MEMALLOC bit set. This issue might be hard to debug, because only a fraction of received packets would be dropped. Fixes: 4415a0319f92 ("net/mlx5e: Implement RX mapped page cache for page recycle") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-19mlx4: support __GFP_MEMALLOC for rxEric Dumazet
Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC") added code that appears to be not needed at that time, since mlx4 never used __GFP_MEMALLOC allocations anyway. As using memory reserves is a must in some situations (swap over NFS or iSCSI), this patch adds this flag. Note that this driver does not reuse pages (yet) so we do not have to add anything else. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-19net/mlx5e: Reorder update statsSaeed Mahameed
Reorder update stats flow to update most important counters last, to get more accurate results. New update order: - PCIe counters - Port counters - Vport counters - Queue counters - Software counters Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2017-01-19net/mlx5: Move cached hca caps to designated caps structGal Pressman
The caps structure consists of hca caps and port/management caps, all under one roof. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19net/mlx5e: Expose PCIe statistics to ethtoolGal Pressman
This patch exposes PCIe performance counters, queried with ethtool -S <devname>. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-01-19net/mlx5e: Expose physical layer statistical counters to ethtoolGal Pressman
Use ethtool -S to query physical layer statistical counters including: - rx_symbol_errors_phy: Number of symbol errors that were not corrected by FEC correction algorithm or that FEC was not active on this interface. - rx_corrected_bits_phy: Number of corrected bits according to active FEC (RS/FC). Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>