summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-07-24bpfilter: fix up a sparse annotationChristoph Hellwig
The __user doesn't make sense when casting to an integer type, just switch to a uintptr_t cast which also removes the need for the __force. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24Merge branch 'TC-datapath-hash-api'David S. Miller
Ariel Levkovich says: ==================== TC datapath hash api Hash based packet classification allows user to set up rules that provide load balancing of traffic across multiple vports and for ECMP path selection while keeping the number of rule at minimum. Instead of matching on exact flow spec, which requires a rule per flow, user can define rules based on a their hash value and distribute the flows to different buckets. The number of rules in this case will be constant and equal to the number of buckets. The series introduces an extention to the cls flower classifier and allows user to add rules that match on the hash value that is stored in skb->hash while assuming the value was set prior to the classification. Setting the skb->hash can be done in various ways and is not defined in this series - for example: 1. By the device driver upon processing an rx packet. 2. Using tc action bpf with a program which computes and sets the skb->hash value. $ tc filter add dev ens1f0_0 ingress \ prio 1 chain 2 proto ip \ flower hash 0x0/0xf \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_0 ingress \ prio 1 chain 2 proto ip \ flower hash 0x1/0xf \ action mirred egress redirect dev ens1f0_2 v3 -> v4: *Drop hash setting code leaving only the classidication parts. Setting the hash will be possible via existing tc action bpf. v2 -> v3: *Split hash algorithm option into 2 different actions. Asym_l4 available via act_skbedit and bpf via new act_hash. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net/sched: cls_flower: Add hash info to flow classificationAriel Levkovich
Adding new cls flower keys for hash value and hash mask and dissect the hash info from the skb into the flow key towards flow classication. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net/flow_dissector: add packet hash dissectionAriel Levkovich
Retreive a hash value from the SKB and store it in the dissector key for future matching. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: hyperv: dump TX indirection table to ethtool regsChi Song
An imbalanced TX indirection table causes netvsc to have low performance. This table is created and managed during runtime. To help better diagnose performance issues caused by imbalanced tables, it needs make TX indirection tables visible. Because TX indirection table is driver specified information, so display it via ethtool register dump. Signed-off-by: Chi Song <chisong@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23vrf: Handle CONFIG_SYSCTL not setDavid Ahern
Randy reported compile failure when CONFIG_SYSCTL is not set/enabled: ERROR: modpost: "sysctl_vals" [drivers/net/vrf.ko] undefined! Fix by splitting out the sysctl init and cleanup into helpers that can be set to do nothing when CONFIG_SYSCTL is disabled. In addition, move vrf_strict_mode and vrf_strict_mode_change to above vrf_shared_table_handler (code move only) and wrap all of it in the ifdef CONFIG_SYSCTL. Update the strict mode tests to check for the existence of the /proc/sys entry. Fixes: 33306f1aaf82 ("vrf: add sysctl parameter for strict mode") Cc: Andrea Mayer <andrea.mayer@uniroma2.it> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David Ahern <dsahern@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23ice: add 1G SGMII PHY typePaul M Stillwell Jr
There isn't a case for 1G SGMII in ice_get_media_type() so add the handling for it. Also handle the special case where some direct attach cables may report that they support 1G SGMII, but that is erroneous since SGMII is supposed to be a backplane media type (between a MAC and a PHY). If the driver doesn't handle this special case then a user could see the 'Port' in ethtool change from 'Direct attach Copper' to 'Backplane' when they have forced the speed to 1G, but the cable hasn't changed. Lastly, change ice_aq_get_phy_caps() to save the module_type info if the function was called with ICE_AQC_REPORT_TOPO_CAP. This call uses the media information to populate the module_type. If no media is present then the values in module_type will be 0. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: Report AOC PHY Types as FiberDoug Dziggel
Report AOC types as fiber instead of unknown. Signed-off-by: Doug Dziggel <douglas.a.dziggel@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: add AQC get link topology handle supportPaul Greenwalt
Add AQC get link topology handle support. This is needed to determine Direct Attach (DA) or backplane media type for PHY types that support either. Get link topology handle cage node type request can be used to determine if a cage is present or not. If a cage is present for PHY types that supports both DA and backplane media type, then the media type is DA, else the media type is backplane. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: Rename low_power_ctrlLev Faerman
Rename the low_power_ctrl field to low_power_ctrl_an to be properly descriptive of it being an autoneg field. Signed-off-by: Lev Faerman <lev.faerman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: update reporting of autoneg capabilitiesPaul Greenwalt
Firmware now reports AN28, AN32, and AN73. Add a helper and check these new values and report PHY autoneg capability. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: add ice_aq_get_phy_caps() debug logsPaul Greenwalt
Add debug logs for ice_aq_get_phy_caps(), and format ice_aq_set_phy_cfg() and ice_aq_get_link_info() debug logs to make them more readable. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: support Total Port Shutdown on devices that support itBruce Allan
When the Port Disable bit is set in the Link Default Override Mask TLV PFA module in the NVM, Total Port Shutdown mode is supported and enabled. In this mode, the driver should act as if the link-down-on-close ethtool private flag is always enabled and dis-allow any change to that flag. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: add link lenient and default override supportPaul Greenwalt
Adds functions to check for link override firmware support and get the override settings for a port. The previously supported/default link mode was strict mode. In strict mode link is configured based on get PHY capabilities PHY types with media. Lenient mode is now the default link mode. In lenient mode the link is configured based on get PHY capabilities PHY types without media. This allows the user to configure link that the media does not report. Limit the minimum supported link mode to 25G for devices that support 100G, and 1G for devices that support less than 100G. Default override is only supported in lenient mode. If default override is supported and enabled, then default override values are used for configuring speed and FEC. Default override provide persistent link settings in the NVM. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Evan Swanson <evan.swanson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: restore PHY settings on media insertionPaul Greenwalt
After the transition from no media to media FW will clear the set-phy-cfg data set by the user. Save initial PHY settings and any settings later requested by the user and use that data to restore PHY settings on media insertion. Since PHY configuration is now being stored, replace calls that were calling FW to get the configuration with the saved copy. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23net: dsa: stop overriding master's ndo_get_phys_port_nameVladimir Oltean
The purpose of this override is to give the user an indication of what the number of the CPU port is (in DSA, the CPU port is a hardware implementation detail and not a network interface capable of traffic). However, it has always failed (by design) at providing this information to the user in a reliable fashion. Prior to commit 3369afba1e46 ("net: Call into DSA netdevice_ops wrappers"), the behavior was to only override this callback if it was not provided by the DSA master. That was its first failure: if the DSA master itself was a DSA port or a switchdev, then the user would not see the number of the CPU port in /sys/class/net/eth0/phys_port_name, but the number of the DSA master port within its respective physical switch. But that was actually ok in a way. The commit mentioned above changed that behavior, and now overrides the master's ndo_get_phys_port_name unconditionally. That comes with problems of its own, which are worse in a way. The idea is that it's typical for switchdev users to have udev rules for consistent interface naming. These are based, among other things, on the phys_port_name attribute. If we let the DSA switch at the bottom to start randomly overriding ndo_get_phys_port_name with its own CPU port, we basically lose any predictability in interface naming, or even uniqueness, for that matter. So, there are reasons to let DSA override the master's callback (to provide a consistent interface, a number which has a clear meaning and must not be interpreted according to context), and there are reasons to not let DSA override it (it breaks udev matching for the DSA master). But, there is an alternative method for users to retrieve the number of the CPU port of each DSA switch in the system: $ devlink port pci/0000:00:00.5/0: type eth netdev swp0 flavour physical port 0 pci/0000:00:00.5/2: type eth netdev swp2 flavour physical port 2 pci/0000:00:00.5/4: type notset flavour cpu port 4 spi/spi2.0/0: type eth netdev sw0p0 flavour physical port 0 spi/spi2.0/1: type eth netdev sw0p1 flavour physical port 1 spi/spi2.0/2: type eth netdev sw0p2 flavour physical port 2 spi/spi2.0/4: type notset flavour cpu port 4 spi/spi2.1/0: type eth netdev sw1p0 flavour physical port 0 spi/spi2.1/1: type eth netdev sw1p1 flavour physical port 1 spi/spi2.1/2: type eth netdev sw1p2 flavour physical port 2 spi/spi2.1/3: type eth netdev sw1p3 flavour physical port 3 spi/spi2.1/4: type notset flavour cpu port 4 So remove this duplicated, unreliable and troublesome method. From this patch on, the phys_port_name attribute of the DSA master will only contain information about itself (if at all). If the users need reliable information about the CPU port they're probably using devlink anyway. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23ice: move auto FEC checks into ice_cfg_phy_fec()Paul Greenwalt
The call to ice_cfg_phy_fec() requires the caller to perform certain actions before calling it. Instead of imposing these preconditions move the operations into the function and perform them ourselves. Also, fix some style issues in nearby touched code. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: refactor FC functionsPaul Greenwalt
Create a helper function for configuring requested flow control so that it can be utilized by other functions looking to configure flow control settings. Utilize the existing helper ice_copy_phy_caps_to_cfg() to copy a PHY capability to configuration instead duplicating the code for it. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: Add advanced power mgmt for WoLAkeem G Abodunrin
Add callbacks needed to support advanced power management for Wake on LAN. Also make ice_pf_state_is_nominal function available for all configurations not just CONFIG_PCI_IOV. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: split ice_discover_caps into two functionsJacob Keller
Using the new ice_aq_list_caps and ice_parse_(dev|func)_caps functions, replace ice_discover_caps with two functions that each take a pointer to the dev_caps and func_caps structures respectively. This makes the side effect of updating the hw->dev_caps and hw->func_caps obvious from reading the implementation of the function. Additionally, it opens the way for enabling reading of device capabilities outside of the initialization flow. By passing in a pointer, another caller will be able to read the capabilities without modifying the HW capabilities structures. As there are no other callers, it is safe to now remove ice_aq_discover_caps and ice_parse_caps. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: split ice_parse_caps into separate functionsJacob Keller
The ice_parse_caps function is used to convert the capability block data coming from firmware into a structured format used by other parts of the code. The current implementation directly updates the hw->func_caps and hw->dev_caps structures. It is directly called from within ice_aq_discover_caps. This causes the discover_caps function to have the side effect of modifying the HW capability structures, which is not intuitive. Split this function into ice_parse_dev_caps and ice_parse_func_caps. These functions will take a pointer to the dev_caps and func_caps respectively. Also create an ice_parse_common_caps for sharing the capability logic that is common to device and function. Doing so enables a future refactor to allow reading and parsing capabilities into a local caps structure instead of modifying the members of the HW structure directly. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23ice: refactor ice_discover_caps to avoid need to retryJacob Keller
The ice_discover_caps function is used to read the device and function capabilities, updating the hardware capabilities structures with relevant data. The exact number of capabilities returned by the hardware is unknown ahead of time. The AdminQ command will report the total number of capabilities in the return buffer. The current implementation involves requesting capabilities once, reading this returned size, and then re-requested with that size. This isn't really necessary. The firmware interface has a maximum size of ICE_AQ_MAX_BUF_LEN. Firmware can never return more than ICE_AQ_MAX_BUF_LEN / sizeof(struct ice_aqc_list_caps_elem) capabilities. Avoid the retry loop by simply allocating a buffer of size ICE_AQ_MAX_BUF_LEN. This is significantly simpler than retrying. The extra allocation isn't a big deal, as it will be released after we finish parsing the capabilities. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-23cxgb4: add loopback ethtool self-testVishal Kulkarni
In this test, loopback pkt is created and sent on default queue. The packet goes until the Multi Port Switch (MPS) just before the MAC and based on the specified channel number, it either goes outside the wire on one of the physical ports or looped back to Rx path by MPS. In this case, we're specifying loopback channel, instead of physical ports, so the packet gets looped back to Rx path, instead of getting transmitted on the wire. v3: - Modify commit message to include test details. v2: - Add only loopback self-test. Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23Merge branch 'l2tp-further-checkpatch-pl-cleanups'David S. Miller
Tom Parkin says: ==================== l2tp: further checkpatch.pl cleanups l2tp hasn't been kept up to date with the static analysis checks offered by checkpatch.pl. This patchset builds on the series "l2tp: cleanup checkpatch.pl warnings". It includes small refactoring changes which improve code quality and resolve a subset of the checkpatch warnings for the l2tp codebase. ==================== Reviewed-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup kzalloc callsTom Parkin
Passing "sizeof(struct blah)" in kzalloc calls is less readable, potentially prone to future bugs if the type of the pointer is changed, and triggers checkpatch warnings. Tweak the kzalloc calls in l2tp which use this form to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup netlink tunnel create address handlingTom Parkin
When creating an L2TP tunnel using the netlink API, userspace must either pass a socket FD for the tunnel to use (for managed tunnels), or specify the tunnel source/destination address (for unmanaged tunnels). Since source/destination addresses may be AF_INET or AF_INET6, the l2tp netlink code has conditionally compiled blocks to support IPv6. Rather than embedding these directly into l2tp_nl_cmd_tunnel_create (where it makes the code difficult to read and confuses checkpatch to boot) split the handling of address-related attributes into a separate function. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup netlink send of tunnel address informationTom Parkin
l2tp_nl_tunnel_send has conditionally compiled code to support AF_INET6, which makes the code difficult to follow and triggers checkpatch warnings. Split the code out into functions to handle the AF_INET v.s. AF_INET6 cases, which both improves readability and resolves the checkpatch warnings. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: check socket address type in l2tp_dfs_seq_tunnel_showTom Parkin
checkpatch warns about indentation and brace balancing around the conditionally compiled code for AF_INET6 support in l2tp_dfs_seq_tunnel_show. By adding another check on the socket address type we can make the code more readable while removing the checkpatch warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup unnecessary braces in if statementsTom Parkin
These checks are all simple and don't benefit from extra braces to clarify intent. Remove them for easier-reading code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup comparisons to NULLTom Parkin
checkpatch warns about comparisons to NULL, e.g. CHECK: Comparison to NULL could be written "!rt" #474: FILE: net/l2tp/l2tp_ip.c:474: + if (rt == NULL) { These sort of comparisons are generally clearer and more readable the way checkpatch suggests, so update l2tp accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23net/ncsi: use eth_zero_addr() to clear mac addressMiaohe Lin
Use eth_zero_addr() to clear mac address insetad of memset(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23cxgb4: use eth_zero_addr() to clear mac addressMiaohe Lin
Use eth_zero_addr() to clear mac address insetad of memset(). Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23Merge branch 'mptcp-non-backup-subflows-pre-reqs'David S. Miller
Paolo Abeni says: ==================== mptcp: non backup subflows pre-reqs This series contains a bunch of MPTCP improvements loosely related to concurrent subflows xmit usage, currently under development. The first 3 patches are actually bugfixes for issues that will become apparent as soon as we will enable the above feature. The later patches improve the handling of incoming additional subflows, improving significantly the performances in stress tests based on a high new connection rate. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23subflow: introduce and use mptcp_can_accept_new_subflow()Paolo Abeni
So that we can easily perform some basic PM-related adimission checks before creating the child socket. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23subflow: use rsk_ops->send_reset()Paolo Abeni
tcp_send_active_reset() is more prone to transient errors (memory allocation or xmit queue full): in stress conditions the kernel may drop the egress packet, and the client will be stuck. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23subflow: explicitly check for plain tcp rskPaolo Abeni
When syncookie are in use, the TCP stack may feed into subflow_syn_recv_sock() plain TCP request sockets. We can't access mptcp_subflow_request_sock-specific fields on such sockets. Explicitly check the rsk ops to do safe accesses. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23mptcp: cleanup subflow_finish_connect()Paolo Abeni
The mentioned function has several unneeded branches, handle each case - MP_CAPABLE, MP_JOIN, fallback - under a single conditional and drop quite a bit of duplicate code. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23mptcp: explicitly track the fully established statusPaolo Abeni
Currently accepted msk sockets become established only after accept() returns the new sk to user-space. As MP_JOIN request are refused as per RFC spec on non fully established socket, the above causes mp_join self-tests instabilities. This change lets the msk entering the established status as soon as it receives the 3rd ack and propagates the first subflow fully established status on the msk socket. Finally we can change the subflow acceptance condition to take in account both the sock state and the msk fully established flag. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23mptcp: mark as fallback even early onesPaolo Abeni
In the unlikely event of a failure at connect time, we currently clear the request_mptcp flag - so that the MPC handshake is not started at all, but the msk is not explicitly marked as fallback. This would lead to later insertion of wrong DSS options in the xmitted packets, in violation of RFC specs and possibly fooling the peer. Fixes: e1ff9e82e2ea ("net: mptcp: improve fallback to TCP") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23mptcp: avoid data corruption on reinsertPaolo Abeni
When updating a partially acked data fragment, we actually corrupt it. This is irrelevant till we send data on a single subflow, as retransmitted data, if any are discarded by the peer as duplicate, but it will cause data corruption as soon as we will start creating non backup subflows. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23subflow: always init 'rel_write_seq'Paolo Abeni
Currently we do not init the subflow write sequence for MP_JOIN subflows. This will cause bad mapping being generated as soon as we will use non backup subflow. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23sfc: convert to new udp_tunnel infrastructureJakub Kicinski
Check MC_CMD_DRV_ATTACH_EXT_OUT_FLAG_TRUSTED, before setting the info, which will hopefully protect us from -EPERM errors the previous code was gracefully ignoring. Ed reports this is not the 100% correct bit, but it's the best approximation we have. Shared code reports the port information back to user space, so we really want to know what was added and what failed. Ignoring -EPERM is not an option. The driver does not call udp_tunnel_get_rx_info(), so its own management of table state is not really all that problematic, we can leave it be. This allows the driver to continue with its copious table syncing, and matching the ports to TX frames, which it will reportedly do one day. Leave the feature checking in the callbacks, as the device may remove the capabilities on reset. Inline the loop from __efx_ef10_udp_tnl_lookup_port() into efx_ef10_udp_tnl_has_port(), since it's the only caller now. With new infra this driver gains port replace - when space frees up in a full table a new port will be selected for offload. Plus efx will no longer sleep in an atomic context. v2: - amend the commit message about TRUSTED not being 100% - add TUNNEL_ENCAP_UDP_PORT_ENTRY_INVALID to mark unsed entries Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-By: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22Merge branch 'qed-qede-improve-chain-API-and-add-XDP_REDIRECT-support'David S. Miller
Alexander Lobakin says: ==================== qed, qede: improve chain API and add XDP_REDIRECT support This series adds missing XDP_REDIRECT case handling in QLogic Everest Ethernet driver with all necessary prerequisites and ops. QEDE Tx relies heavily on chain API, so make sure it is in its best at first. v2 (from [1]): - add missing includes to #003 to pass the build on Alpha; - no functional changes. [1] https://lore.kernel.org/netdev/20200722155349.747-1-alobakin@marvell.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22qede: add .ndo_xdp_xmit() and XDP_REDIRECT supportAlexander Lobakin
Add XDP_REDIRECT case handling and the corresponding NDO to support redirecting XDP frames. This also includes registering driver memory model (currently order-0 page mode) in BPF subsystem. The total number of XDP queues is usually 1:1 with Rx ones. Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22qede: refactor XDP Tx processingAlexander Lobakin
Current XDP Tx logic is suboptimal and can't be reused for XDP_REDIRECT path. Make qede_xdp_{tx_int,xmit}() more universal and effective in general to allow future expanding. Misc: use unlikely() hints where appropriate and replace "fallthrough" comments with pseudo-keywords. Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22qede: reformat net_device_ops declarationsAlexander Lobakin
Correct the indentation of net_device_ops declarations for fancier look. Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22qede: reformat several structures in "qede.h"Alexander Lobakin
Make the file more readable and easier for adding new fields. Misc: use IFNAMSIZ and netdev_name() instead of sizeof_field() and direct net_device::name dereferencing. Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22qed: introduce qed_chain_get_elem_used{,u32}()Alexander Lobakin
Add reverse-variants of qed_chain_get_elem_left{,u32}() to be able to know current chain occupation. They will be used in the upcoming qede XDP_REDIRECT code. They share most of the logics with the mentioned ones, so were reused to collapse the latters. Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22qed: optimize common chain accessorsAlexander Lobakin
Constify chain pointers and refactor qed_chain_get_elem_left{,u32}() a bit. Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22qed: add support for different page sizes for chainsAlexander Lobakin
Extend current infrastructure to store chain page size in a struct and use it in all functions instead of fixed QED_CHAIN_PAGE_SIZE. Its value remains the default one, but can be overridden in qed_chain_init_params before chain allocation. Signed-off-by: Alexander Lobakin <alobakin@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>