Age | Commit message | Author
2017-03-28 | net: dsa: mv88e6xxx: reorder 88E6141 definitions | Vivien Didelot
The related mv88e6xxx_ops and mv88e6xxx_info structures were misplaced. Reorder them correctly to fix this. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | Merge branch 'qed-load-unload-mfw' | David S. Miller
Yuval Mintz says: ==================== qed: load/unload mfw series This series corrects the unload flow and greatly enhances the initialization flow in regard to interactions between driver and management firmware. Patch #1 makes sure unloading is done under management-firmware's 'critical section' protection. Patches #2 - #4 move driver into using a newer scheme for loading in regard to the MFW; this newer scheme would help cleaning the device in case a previous instance has dirtied it [preboot, PDA, etc.]. Patches #5 - #6 let driver inform management-firmware on number of resources which are dependent on the non-management firmware used. Patch #7 then uses a new resource [BDQ] instead of some set value. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | qed: Use BDQ resource for storage protocols | Mintz, Yuval
Until now, qed used some port-defined value as BDQ index for both iSCSI and FCoE. As management firmware now treats BDQ as a resource and tells each PF its BDQ-range, start using a value from that range instead. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | qed: Utilize resource-lock based scheme | Tomer Tayar
Management firmware is used as an arbiter between the various PFs in matters of resources, but some of the resources that need to be divided are dependent on the non-management firmware used, so management firmware first needs to be told how many resources there are before trying to divide them. As part of the initialization sequence, driver would first inform the management firmware of the available resources under a dedicated resource lock, and afterwards request for various resources which might be based on the previous set values. Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | qed: Support management-based resource locking | Tomer Tayar
Global locking can't properly be used to synchronize between different PFs in all scenarios, as those instances might reside in different logical partitions [e.g., when a PF is assigned via PDA to some VM]. The management firmware provides a generic infrastructure for device locks. For each 'resource', it's guaranteed it could be acquired by at most a single PF at any given time [or by management firmware]. This patch adds the necessary logic in qed for utilizing said infrastructure, implementing lock/unlock internal APIs. Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
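The guarantee described above (each resource held by at most one PF at any given time) can be sketched as a small arbiter table. This is only an illustration under stated assumptions: `res_arbiter`, `res_lock()`, `res_unlock()`, `RES_COUNT` and the integer PF ids are hypothetical names, not the qed or management-firmware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RES_COUNT  4      /* hypothetical number of lockable resources */
#define OWNER_NONE (-1)   /* marks a resource that nobody holds */

/* Arbiter state: one owner slot per resource. */
struct res_arbiter {
	int owner[RES_COUNT];
};

static void arbiter_init(struct res_arbiter *a)
{
	assert(a != NULL);
	for (size_t i = 0; i < RES_COUNT; i++)
		a->owner[i] = OWNER_NONE;
}

/* Grant the resource only when it is currently free. */
static bool res_lock(struct res_arbiter *a, size_t res, int pf_id)
{
	if (res >= RES_COUNT || a->owner[res] != OWNER_NONE)
		return false;
	a->owner[res] = pf_id;
	return true;
}

/* Only the current holder may release the resource. */
static bool res_unlock(struct res_arbiter *a, size_t res, int pf_id)
{
	if (res >= RES_COUNT || a->owner[res] != pf_id)
		return false;
	a->owner[res] = OWNER_NONE;
	return true;
}
```

In this sketch a second PF asking for a held resource simply gets `false` and has to retry; how the real firmware handles contention or timeouts is not stated in the commit message.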
2017-03-28 | qed: Send pf-flr as part of initialization | Mintz, Yuval
During HW initialization, driver would set various registers to their needed values - but it assumes all registers start at their reset-value, so there's no need to re-configure a register's default value. This assumption might be incorrect, e.g., in case of preboot driver running and initializing the driver prior to our driver. To overcome this, we now ask management firmware to initiate a PF-flr early during the initialization sequence. That would return everything in the PF's scope back to default and prevent previous configurations from still being applied. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | qed: Move to new load request scheme | Tomer Tayar
Management firmware is used as an arbiter between the various PFs in regard to loading - it causes the various PFs to load/unload sequentially and informs each of its appropriate rule in the init. But the existing flow is too weak to handle some scenarios where PFs aren't properly cleaned prior to loading. The significant scenarios falling under this criteria: a. Preboot drivers in some environment can't properly unload. b. Unexpected driver replacement [kdump, PDA]. Modern management firmware supports a more intricate loading flow, where the driver has the ability to overcome previous limitations. This moves qed into using this newer scheme. Notice new scheme is backward compatible, so new drivers would still be able to load properly on top of older management firmwares and vice versa. Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | qed: hw_init() to receive parameter-struct | Mintz, Yuval
We'll soon need additional information, so start by changing the infrastructure to receive the initializing variables via a parameter struct. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | qed: Correct HW stop flow | Tomer Tayar
Management firmware is used as arbiter between different PFs which are loading/unloading, but in order to use the synchronization it offers, the contending configurations need to be applied either between their LOAD_REQ <-> LOAD_DONE or UNLOAD_REQ <-> UNLOAD_DONE management firmware commands. Existing HW stop flow utilizes 2 different functions: qed_hw_stop() and qed_hw_reset() which don't abide by this requirement; most of the closure is done outside the scope of the unload request. This patch removes qed_hw_reset() and places the relevant stop functionality underneath the management firmware protection. Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | Merge branch 'tipc-subscription-refcount-simplifications' | David S. Miller
Parthasarathy Bhuvaragan says: ==================== tipc: subscription refcount simplifications The first patch makes the subscription refcount cleanup lockless and the second updates the subscription refcount policy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | tipc: adjust the policy of holding subscription kref | Ying Xue
When a new subscription object is inserted into name_seq->subscriptions list, it's under name_seq->lock protection; when a subscription is deleted from the list, it's also under the same lock protection; similarly, when accessing a subscription by going through subscriptions list, the entire process is also protected by the name_seq->lock. Therefore, if subscription refcount is increased before it's inserted into subscriptions list, and its refcount is decreased after it's deleted from the list, it will be unnecessary to hold refcount at all before accessing subscription object which is obtained by going through subscriptions list under name_seq->lock protection. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | tipc: advance the time of deleting subscription from subscriber->subscrp_list | Ying Xue
After a subscription object is created, it's inserted into its subscriber subscrp_list list under subscriber lock protection, similarly, before it's destroyed, it should be first removed from its subscriber->subscrp_list. Since the subscription list is accessed with subscriber lock, all the subscriptions are valid during the lock duration. Hence in tipc_subscrb_subscrp_delete(), we remove subscription get/put and the extra subscriber unlock/lock. After this change, the subscriptions refcount cleanup is very simple and does not access any lock. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | stmmac: use netif_set_real_num_{rx,tx}_queues | Arnd Bergmann
A driver must not access the two fields directly but should instead use the helper functions to set the values and keep a consistent internal state: ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_dvr_probe': ethernet/stmicro/stmmac/stmmac_main.c:4083:8: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'? Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | soc: qcom: smd-rpm: Add msm8996 compatibility | Bjorn Andersson
With the RPM driver transitioned to RPMSG we can reuse the SMD-RPM driver on top of GLINK for 8996, without any modifications. Acked-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | soc: qcom: smd: Remove standalone driver | Bjorn Andersson
Remove the standalone SMD implementation as we have transitioned the client drivers to use the RPMSG based one. Also remove all dependencies on QCOM_SMD from Kconfig files, in order to keep them selectable in the absence of the removed symbol. Acked-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | soc: qcom: smd: Transition client drivers from smd to rpmsg | Bjorn Andersson
By moving these client drivers to use RPMSG instead of the direct SMD API we can reuse them on top of the newly added GLINK wire-protocol support found in the 820 and 835 Qualcomm platforms. As the new (RPMSG-based) and old SMD implementations are mutually exclusive we have to change all client drivers in one commit, to make sure we have a working system before and after this transition. Acked-by: Andy Gross <andy.gross@linaro.org> Acked-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | vxlan: don't age NTF_EXT_LEARNED fdb entries | Roopa Prabhu
vxlan driver already implicitly supports installing of external fdb entries with NTF_EXT_LEARNED. This patch just makes sure these entries are not aged by the vxlan driver. An external entity managing these entries will age them out. This is consistent with the use of NTF_EXT_LEARNED in the bridge driver. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | Merge branch 'net-dpipe' | David S. Miller
Jiri Pirko says:
====================
Add support for pipeline debug (dpipe)
Arkadi says:
While doing the hardware offloading process much of the hardware specifics cannot be presented. An example for such is the routing LPM algorithm, which differs in hardware implementation from the kernel software implementation. The only information the user receives is whether a specific route is offloaded or not, but he cannot really understand the underlying implementation nor get the specific statistics related to that process. Another example is ACL offload using TC, which is commonly implemented using TCAM memory. Currently there is no capability to gain visibility into the TCAM structure and to debug suboptimal resource allocation.
This patchset introduces the capability of exporting the ASIC's pipeline abstraction via the devlink infrastructure, which should serve as a complementary tool. This infrastructure allows the user to get visibility into the ASIC by modeling it as a set of match/action tables.
The main objects defined:
Table - abstraction for a single pipeline stage. Contains the available match/actions and counter availability.
Entry - entry in a specific table with specific matches/actions values and dedicated counter.
Header/field - tuples which describe the table's behavior.
As an example one of the ASIC's L3 blocks will be modeled. The egress rif (router interface) table is the final step in the L3 pipeline processing, which matches on the internal rif index that was determined before by the routing logic. The erif table determines whether to forward or drop the packet and updates the corresponding rif L3 statistics.
To expose these internal resources a special metadata header will be introduced that describes the internal information gathered by the ASIC's pipeline and contains the following fields: rif_port_index, forward and drop.
Some internal hardware resources have direct mapping to kernel objects. For example the rif_port_index is mapped to the net-device's ifindex. By providing this mapping the user gains visibility into the offloading process.
Follow-up work will include exporting more L3 tables which will give visibility into the routing process.
First stage is adding support for dpipe in devlink. Next add support in spectrum driver. Finally implement egress router interface (erif) table for spectrum ASIC as an example.
---
v1->v2: Please see individual patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | mlxsw: spectrum: Add Support for erif table entries access | Arkadi Sharshevsky
Implement dpipe's table ops for erif table which provide:
1. Getting the entries in the table with the associated values.
   - match on "mlxsw_meta:erif_index"
   - action on "mlxsw_meta:forwared_out"
2. Synchronize the hardware in case of enabling/disabling counters, which means removing erif counters from all interfaces.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | mlxsw: spectrum_router: Add rif helper functions | Arkadi Sharshevsky
Add rif helper functions to access the rif index and the rif device's ifindex. These functions will be used by dpipe in order to dump the rif table. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | mlxsw: spectrum: Support for counters on router interfaces | Arkadi Sharshevsky
Add support for counter allocation on router interfaces. The allocation depends on the counter state of the relevant table. In case counting is disabled or no counters are left, the counter index will be set as invalid. Also a counter pool for router allocation is added. Signed-off-by: Arakdi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
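The allocation rule in this message (an invalid index when counting is disabled or the pool is exhausted) might look roughly like the following. It is a minimal sketch, assuming a fixed-size pool; `counter_pool`, `rif_counter_alloc()`, `rif_counter_free()`, `POOL_SIZE` and `COUNTER_INDEX_INVALID` are hypothetical names and do not come from the mlxsw driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SIZE             2            /* hypothetical pool capacity */
#define COUNTER_INDEX_INVALID UINT32_MAX   /* hypothetical "no counter" marker */

struct counter_pool {
	bool used[POOL_SIZE];
};

/* Return a free counter index, or the invalid marker when counting is
 * disabled for the table or when no counters are left in the pool. */
static uint32_t rif_counter_alloc(struct counter_pool *pool, bool counting_enabled)
{
	assert(pool);
	if (!counting_enabled)
		return COUNTER_INDEX_INVALID;
	for (uint32_t i = 0; i < POOL_SIZE; i++) {
		if (!pool->used[i]) {
			pool->used[i] = true;
			return i;
		}
	}
	return COUNTER_INDEX_INVALID;
}

/* Give a counter back to the pool; invalid indices are ignored. */
static void rif_counter_free(struct counter_pool *pool, uint32_t index)
{
	assert(pool);
	if (index < POOL_SIZE)
		pool->used[index] = false;
}
```

Returning a sentinel instead of an error code lets the caller program the interface either way and simply skip statistics when the index is invalid, which matches the behavior the message describes.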
2017-03-28 | mlxsw: reg: Add Router Interface Counter Register | Arkadi Sharshevsky
The RICNT register retrieves per port performance counter. It will be used to query the router interfaces statistics. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | mlxsw: spectrum: Add definition for egress rif table | Arkadi Sharshevsky
Add definition for egress router interface table. This table describes the final part in the routing pipeline. This table matches the egress interface index (rif index, which is set by the previous stages and determines the out port) and makes the decision of forwarding the packet towards the L2 logic or dropping it. The metadata header is added to represent this internal information. The rif index field is mapped logically to netdevice ifindex. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | mlxsw: spectrum: Add placeholder for dpipe | Arkadi Sharshevsky
Add placeholder for dpipe. Support for specific tables and headers will be introduced in following patches. The headers are shared between all mlxsw_sp instances. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | mlxsw: reg: Add counter fields to RITR register | Arkadi Sharshevsky
Update RITR for counter support. This allows adding counters for ASIC's router ports. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | devlink: Support for pipeline debug (dpipe) | Arkadi Sharshevsky
The pipeline debug is used to export the pipeline abstractions for the main objects - tables, headers and entries. The only support for set is for changing the counter parameter on a specific table. The basic structures:
Header - can represent real protocol header information or internal metadata. Generic protocol headers like IPv4 can be shared between drivers. Each driver can add local headers.
Field - part of a header. Can represent a protocol field or a specific ASIC metadata field. Hardware special metadata fields can be mapped to different resources, for example switch ASIC ports can have an internal number which from the system's point of view is mapped to netdevice ifindex.
Match - represents a specific match rule. Can describe a match on a specific field or header. The header index should be specified as well in order to support several header instances of the same type (tunneling).
Action - represents a specific action rule. Actions can describe operations on specific field values, for example set, increment, etc., and header operations like add and delete.
Value - represents a value which can be associated with a specific match or action.
Table - represents a hardware block which can be described with match/action behavior. The match/action can be done on the packet's data or on the internal metadata that it gathered along the packet's traversal through the pipeline, which is vendor specific and should be exported in order to provide understanding of the ASIC's behavior.
Entry - represents a single record in a specific table. The entry is identified by a specific combination of values for match/action.
Prior to accessing the tables/entries the drivers provide the header/field data base which is used by driver to user-space. The data base is split between the shared headers and unique headers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 | net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions | Or Gerlitz
This includes calling the parsing code that translates from pedit speak to the HW API, allocation (deallocation) of a modify header context and setting the modify header id associated with this context to the FTE of that flow. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions | Or Gerlitz
This includes calling the parsing code that translates from pedit speak to the HW API, allocation (deallocation) of a modify header context and setting the modify header id associated with this context to the FTE of that flow. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5e: Add parsing of TC pedit actions to HW format | Or Gerlitz
Parse/translate a set of TC pedit actions to be formed in the HW API format. User-space provides a set of keys where each one of them is made of: command (add or set), header-type, byte offset within that header along with a 32 bit mask and value. The mask dictates what bits in the 32 bit word that starts on the offset we should be dealing with, but under negative polarity (unset bits are to be modified). We do a 1st pass over the set of keys while using the header-type and offset to fill the masks and the values into a data-structure containing all the supported network headers. We then do a 2nd pass over the set of fields whose re-write is supported by the HW, where for each such candidate field, we use the masks filled on the 1st pass to realize if we should offload its re-write. In case offloading is required, we fill a HW descriptor with the following:
(1) the header field to modify
(2) the bit offset within the field from where to modify (set command only)
(3) the value to set/add
(4) the length in bits 1...32 to modify (set command only)
Note that it's possible for a given pedit mask to dictate modifying the same header field multiple times or to modify multiple header fields. Currently such combinations are not supported for offloading, hence, for set commands, the offset within the field is always zero, and the length to modify is the field size. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Amir Vadai <amir@vadai.me> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
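The two passes described above can be illustrated with a simplified model. This is a sketch under several assumptions, not the mlx5e code: a single hypothetical 8-byte header, keys carrying their mask and value as byte arrays (to keep the example independent of host endianness), and the names `pedit_key`, `hw_field`, `hdr_rewrite`, `fill_masks()` and `field_selected()` are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_LEN 8   /* hypothetical header size in bytes */

/* One key: 4 bytes at a byte offset. The mask has negative polarity,
 * so a 0 bit means "modify this bit". */
struct pedit_key {
	size_t  offset;
	uint8_t mask[4];
	uint8_t val[4];
};

/* A field whose re-write the (hypothetical) hardware supports. */
struct hw_field {
	size_t offset;   /* byte offset of the field in the header */
	size_t size;     /* field size in bytes */
};

/* Accumulated request in positive polarity (1 bit = modify). */
struct hdr_rewrite {
	uint8_t mask[HDR_LEN];
	uint8_t val[HDR_LEN];
};

/* 1st pass: fold every key into the per-header masks and values. */
static bool fill_masks(struct hdr_rewrite *rw, const struct pedit_key *keys, size_t n)
{
	assert(rw && keys);
	for (size_t k = 0; k < n; k++) {
		if (keys[k].offset + 4 > HDR_LEN)
			return false;
		for (size_t b = 0; b < 4; b++) {
			uint8_t modify = (uint8_t)~keys[k].mask[b];

			rw->mask[keys[k].offset + b] |= modify;
			rw->val[keys[k].offset + b] |= (uint8_t)(keys[k].val[b] & modify);
		}
	}
	return true;
}

/* 2nd pass helper: a candidate field is offloaded only when the request
 * covers it completely (offset zero, length equal to the field size). */
static bool field_selected(const struct hdr_rewrite *rw, const struct hw_field *f)
{
	if (f->offset + f->size > HDR_LEN)
		return false;
	for (size_t b = 0; b < f->size; b++)
		if (rw->mask[f->offset + b] != 0xff)
			return false;
	return true;
}
```

The full-coverage check in `field_selected()` mirrors the stated restriction that, for set commands, the offset within the field is always zero and the length is the field size; partial-field modifications would be rejected by this sketch rather than offloaded.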
2017-03-28 | net/sched: Add accessor functions to pedit keys for offloading drivers | Or Gerlitz
HW drivers will use the header-type and command fields from the extended keys, and some fields (e.g mask, val, offset) from the legacy keys. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5: Introduce alloc/dealloc modify header context commands | Or Gerlitz
Implement the low-level commands to support packet header re-write. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5: Introduce modify header structures, commands and steering action definitions | Or Gerlitz
Add the definitions related to creation/deletion of a modify header context and the modify header steering action which are used for HW packet header modify (re-write) as part of steering. Add as well the modify header id into two intermediate structs and set it to the FTE. Note that as the push/pop vlan steering actions are emulated by the eswitch management code, we're not breaking any compatibility while changing their values to make room for the modify header action which is not emulated and whose value is part of the FW API. The new bit values for the emulated actions are at the end of the possible range. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5: Reorder few command cases to reflect their natural order | Or Gerlitz
Move the commands related to scheduling elements and vport qos to a suitable location (according to the MLX5_CMD_OP enum values) in the command string and internal error helpers. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5: Add helper to initialize a flow steering actions struct instance | Or Gerlitz
There are a bunch of places in the code where the intermediate struct that keeps the elements related to flow actions is initialized with the same default values. Put that into a small DECLARE type helper. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5e: Properly deal with resource cleanup when adding TC flow fails | Or Gerlitz
The code for adding tc fdb flows leaves things half set when it fails in the middle. Currently we are not leaking things (e.g eswitch vlan reference, encap reference and HW resources) since the main code to add flower rules does a cleanup by calling mlx5e_tc_del_flow(). This cleanup further works just b/c we're checking there if the HW rule for the flow we are attempting to delete is valid before touching it, and since under the current possible combinations of supported actions it's okay to go and blindly deref or delete all the action related resources (encap, vlan). Instead, do things properly, namely make sure that if add flow fails we clean up all that was allocated or referenced. Now, the flow delete code can blindly deref/deallocate both the rule and the actions related resources and when more action combinations are introduced (such as the upcoming header re-write) we are fine with clear and robust code. While here, align all of nic/fdb parse actions/add flow functions to get mlx5e_tc_flow struct param and pick the attributes or whatever else needed from there. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5e: Add intermediate struct for TC flow parsing attributes | Or Gerlitz
Add intermediate structure to store attributes parsed from TC filter matching/actions parts which are soon to be configured into the HW. Currently put there the flow matching spec after being parsed. More content to be added in down-stream patch. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5e: Add NIC attributes for offloaded TC flows | Or Gerlitz
Add structure that contains the attributes related to offloaded NIC flows. Currently it has the actions and flow tag. While here, do xmas tree cleanup of the TC configure function. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 | net/mlx5e: Add prefix for e-switch offloaded TC flow attributes | Or Gerlitz
Add esw_ prefix to the flow attributes attached to offloaded e-switch TC flows. This is a pre-step to add attributes to offloaded NIC TC flows. Also, save one pointer space by using gcc's zero size array, this would be beneficial for environments where 100Ks (or Ms) of flows are offloaded. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-27 | Merge tag 'mlx5e-failsafe' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux | David S. Miller
Saeed Mahameed says: ==================== mlx5e-failsafe 27-03-2017 This series provides a fail-safe mechanism to allow safely re-configuring the mlx5e netdevice and provides resiliency against sporadic configuration failures. To enable this we do some refactoring and code reorganizing to allow breaking the driver's open/close flows into stages: open -> activate -> deactivate -> close. In addition we need to allow creating fresh HW ring resources (mlx5e_channels) with their own "new" set of parameters, while keeping the current ones running and active until the new channels are successfully created with the new configuration, and only then can we safely replace (switch) old channels with new ones. For that we introduce the mlx5e_channels object and an API to manage it:
- channels = open_channels(new_params): open fresh TX/RX channels
- activate_channels(channels): redirect traffic to them and attach them to the netdev
- deactivate_channels(channels): stop traffic and detach from netdev
- close(channels): free the TX/RX HW resources of those channels
With the above strategy it is straightforward to achieve the desired behavior of fail-safe configuration. In pseudo code:
make_new_config(new_params)
{
    old_channels = current_active_channels;
    new_channels = create_channels(new_params);
    if (!new_channels)
        return "Failed, but current channels are still active :)"
    deactivate_channels(old_channels); /* Can't fail */
    set_hw_new_state(); /* If needed */
    activate_channels(new_channels); /* Can't fail */
    close_channels(old_channels);
    current_active_channels = new_channels;
    return "SUCCESS";
}
At the top of this series, we change the following flows to be fail-safe:
ethtool:
- ring parameters
- coalesce parameters
- tx copy break parameters
- cqe compressing/moderation mode setting (priv flags)
ndos:
- tc setup
- set features: LRO
- change mtu
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
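The pseudo code in this cover letter can be turned into a small compilable model. The sketch below is a userspace illustration only: `struct channels`, `struct netdev_state` and `switch_config()` are hypothetical, a single `ring_size` integer stands in for the whole parameter set, and the optional hardware-state step is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct channels {
	int  ring_size;   /* stand-in for the full parameter set */
	bool active;
};

struct netdev_state {
	struct channels *current;
};

/* Opening is the only step that is allowed to fail. */
static struct channels *open_channels(int ring_size)
{
	struct channels *chs;

	if (ring_size <= 0)
		return NULL;
	chs = malloc(sizeof(*chs));
	if (!chs)
		return NULL;
	chs->ring_size = ring_size;
	chs->active = false;
	return chs;
}

static void activate_channels(struct channels *chs)   { chs->active = true; }
static void deactivate_channels(struct channels *chs) { chs->active = false; }
static void close_channels(struct channels *chs)      { free(chs); }

/* Returns 0 on success and -1 on failure. On failure the old channels
 * are left untouched and still active. */
static int switch_config(struct netdev_state *dev, int new_ring_size)
{
	struct channels *old, *fresh;

	assert(dev && dev->current);
	old = dev->current;
	fresh = open_channels(new_ring_size);
	if (!fresh)
		return -1;

	deactivate_channels(old);    /* cannot fail */
	activate_channels(fresh);    /* cannot fail */
	dev->current = fresh;
	close_channels(old);
	return 0;
}
```

The ordering matters: everything that can fail happens before the old channels are touched, so a failed reconfiguration leaves the device exactly as it was.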
2017-03-27 | Merge branch 'bond-link-status-fixes' | David S. Miller
Mahesh Bandewar says: ==================== link-status fixes for mii-monitoring The mii monitoring is divided into two phases - inspect and commit. The inspect phase technically should not make any changes to the state and should defer them to the commit phase. However, we detected link state inconsistencies on several machines and discovered that they are the result of some inconsistent updates to link states and the assumption that you *always* get rtnl-mutex. In reality when trylock() fails to acquire rtnl-mutex, the commit phase is postponed until the next mii-mon run. At the next round, because of the state change performed in the previous inspect-run, this round does not detect any changes and would skip calling the commit phase. This would result in an inconsistent state until the next link event happens (if it ever happens). During the commit phase, it's always assumed that the speed and duplex fetch is successful, but that's not always the case. However the slave state is marked UP irrespective of the speed / duplex fetch operation. If the speed / duplex fetch operation results in insane values for either of these two fields, then keeping internal link state UP is not going to provide fruitful results either. Please see the individual patches for more details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27 | bonding: avoid printing while holding a spinlock | Mahesh Bandewar
Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27 | bonding: correctly update link status during mii-commit phase | Mahesh Bandewar
bond_miimon_commit() marks the link UP after attempting to get the speed and duplex settings for the link. There is a possibility that bond_update_speed_duplex() could fail. This is another place where it could result in an inconsistent bonding link state. With this patch the link will be marked UP only if the speed and duplex values retrieved have sane values and processed further. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27 | bonding: make speed, duplex setting consistent with link state | Mahesh Bandewar
bond_update_speed_duplex() retrieves speed and duplex settings. There is a possibility of failure in retrieving these values but caller has to assume it's always successful. This leads to having inconsistent slave link settings. If these (speed, duplex) values cannot be retrieved, then keeping the link UP causes problems. The updated bond_update_speed_duplex() returns 0 on success if it retrieves sane values for speed and duplex. On failure it returns 1 and marks the link down. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
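The return convention described here (0 when sane values were retrieved, 1 plus link down otherwise) can be sketched as follows. This is a simplified model and not the bonding driver's function: the fetched values are passed in as arguments instead of being queried from the device, and `slave_settings`, `update_speed_duplex()` and the `SKETCH_*` constants are hypothetical names defined locally for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Locally defined stand-ins for "unknown" and valid duplex values. */
#define SKETCH_SPEED_UNKNOWN  UINT32_MAX
#define SKETCH_DUPLEX_HALF    0x00
#define SKETCH_DUPLEX_FULL    0x01
#define SKETCH_DUPLEX_UNKNOWN 0xff

struct slave_settings {
	uint32_t speed;
	uint8_t  duplex;
	bool     link_up;
};

/* Returns 0 and stores the settings when both values are sane.
 * Otherwise returns 1, leaves speed/duplex unknown and marks the link down. */
static int update_speed_duplex(struct slave_settings *s, uint32_t speed, uint8_t duplex)
{
	assert(s);
	s->speed = SKETCH_SPEED_UNKNOWN;
	s->duplex = SKETCH_DUPLEX_UNKNOWN;

	if (speed == 0 || speed == SKETCH_SPEED_UNKNOWN ||
	    (duplex != SKETCH_DUPLEX_HALF && duplex != SKETCH_DUPLEX_FULL)) {
		s->link_up = false;
		return 1;
	}

	s->speed = speed;
	s->duplex = duplex;
	return 0;
}
```

A caller in the commit phase would then mark the link UP only when this helper returns 0, so the link state and the stored settings cannot disagree.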
2017-03-27 | bonding: improve link-status update in mii-monitoring | Mahesh Bandewar
The primary issue is that mii-inspect phase updates link-state and expects changes to be committed during the mii-commit phase. After the inspect phase if it fails to acquire rtnl-mutex, the commit phase (bond_mii_commit) doesn't get to run. This partially updated state stays and makes the internal-state inconsistent. e.g.
setup bond0 => slaves: eth1, eth2
eth1 goes DOWN -> UP
mii_monitor()
    mii-inspect()
        bond_set_slave_link_state(eth1, UP, DontNotify)
    rtnl_trylock() <- fails!
Next mii-monitor round
eth1: No change
mii_monitor()
    mii-inspect()
        eth1->link == current-status (ethtool_ops->get_link)
        no-change-detected
End result:
eth1: Link = BOND_LINK_UP Speed = 0xfffff [SpeedUnknown] Duplex = 0xff [DuplexUnknown]
This doesn't always happen but for some unlucky machines in a large set of machines it creates problems. The fix for this is to avoid making changes during inspect phase and postpone them until acquiring the rtnl-mutex / invoking commit phase. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
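The fix described above (inspect only records a proposal, commit applies it once the lock is held) can be modeled in a few lines. This is a sketch under assumptions rather than the bonding code: `slave_link`, `mii_inspect()`, `mii_commit()` and `mii_monitor()` are hypothetical, and the `lock_acquired` flag stands in for the result of `rtnl_trylock()`.

```c
#include <assert.h>
#include <stdbool.h>

enum link_state { LINK_DOWN, LINK_UP };

struct slave_link {
	enum link_state link;       /* committed state */
	enum link_state new_link;   /* proposal recorded by the inspect phase */
	bool has_proposal;
};

/* Inspect: compare the committed state with the carrier and record a
 * proposal only. Returns the number of pending changes (0 or 1). */
static int mii_inspect(struct slave_link *s, bool carrier_ok)
{
	enum link_state seen = carrier_ok ? LINK_UP : LINK_DOWN;

	assert(s);
	s->has_proposal = (seen != s->link);
	s->new_link = seen;
	return s->has_proposal ? 1 : 0;
}

/* Commit: apply the recorded proposal to the committed state. */
static void mii_commit(struct slave_link *s)
{
	if (s->has_proposal) {
		s->link = s->new_link;
		s->has_proposal = false;
	}
}

/* One monitor round. */
static void mii_monitor(struct slave_link *s, bool carrier_ok, bool lock_acquired)
{
	if (mii_inspect(s, carrier_ok) == 0)
		return;
	if (!lock_acquired)
		return;   /* committed state untouched; retried next round */
	mii_commit(s);
}
```

Because a failed lock attempt leaves `link` unchanged, the next round compares against the old committed state and detects the change again, which is exactly what the original inspect phase prevented.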
2017-03-27bonding: split bond_set_slave_link_state into two partsMahesh Bandewar
Split the function into two phases, (a) propose and (b) commit, without changing the semantics of the original API. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queueDavid S. Miller
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-03-27 This series contains updates to i40e and i40evf only. Alex updates the driver code so that we can do bulk updates of the page reference count instead of just incrementing it by one reference at a time. Fixed an issue where we were not resetting skb back to NULL when we have freed it. Cleaned up i40e_process_skb_fields() to align with other Intel drivers. Removed FCoE code, since it is not supported in any of the Fortville/Fortpark hardware, so there is not much point in carrying the code around, especially if it is broken and untested. Harshitha fixes a bug in the driver where the calculation of the RSS size was not taking into account the number of traffic classes enabled. Robert fixes a potential race condition during VF reset by eliminating IOMMU DMAR faults caused by VF hardware, which occur when the OS initiates a VF reset and we modify the VF's settings before the reset is finished. Bimmy removes a delay that is no longer needed, since it was only needed for preproduction hardware. Colin King fixes a null pointer dereference, where VSI was being dereferenced before the VSI NULL check. Jake fixes an issue with the recent addition of the "client code" to the driver, where we attempt to use an uninitialized variable, so correctly initialize the params variable by calling i40e_client_get_params(). v2: dropped patch 5 of the original series from Carolyn since we need more documentation and a reason for the added delay, so Carolyn is taking the time to update the patch before we re-submit it for kernel inclusion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-27i40e: initialize params before notifying of l2_param_changesJacob Keller
Fix a bug, probably due to some mis-merging, associated with commits d7ce6422d6e6 ("i40e: don't check params until after checking for client instance", 2017-02-09) and 3140aa9a78c9 ("i40e: KISS the client interface", 2017-03-14). The first commit tried to move the initialization of the params structure so that we didn't bother doing this if we didn't have a client interface. You can already see that it looks fishy because of the indentation. The second commit refactors a bunch of the interface, and incorrectly drops the params initialization. I believe what occurred is that internally the two patches were re-ordered, and the resulting merge conflicts were resolved incorrectly. Fix the use of an uninitialized variable by correctly initializing the params variable via i40e_client_get_params(). Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-27i40evf: dereference VSI after VSI has been null checkedColin Ian King
VSI is being dereferenced before the VSI null check; if VSI is null we end up with a null pointer dereference. Fix this by performing the VSI dereference after the VSI null check. Also remove the need for using adapter by using vsi->back->cinst. Detected by CoverityScan, CID#1419696, CID#1419697 ("Dereference before null check") Fixes: ed0e894de7c133 ("i40evf: add client interface") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
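The "dereference before null check" pattern and its fix can be shown with a minimal sketch. The struct and function names below are hypothetical stand-ins, not the i40evf definitions; only the vsi->back->cinst access path is taken from the commit message, and the sketch assumes that back is valid whenever vsi is.

```c
#include <stddef.h>

/* Hypothetical, much reduced stand-ins for the driver structures. */
struct client_instance_sketch { int id; };
struct adapter_sketch { struct client_instance_sketch *cinst; };
struct vsi_sketch { struct adapter_sketch *back; };

/*
 * The buggy ordering would read vsi->back at the top of the function,
 * before the check, and so crash on a NULL vsi. Here every access goes
 * through vsi only after it has been checked, and no separate adapter
 * variable is needed.
 */
static struct client_instance_sketch *
get_client_instance_sketch(struct vsi_sketch *vsi)
{
    if (vsi == NULL)
        return NULL;

    /* Assumption of this sketch: back is valid whenever vsi is. */
    return vsi->back->cinst;
}
```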
2017-03-27i40e: Drop FCoE code that always evaluates to false or 0Alexander Duyck
Since FCoE isn't supported by the i40e products there isn't much point in carrying around code that will always evaluate to false. This patch goes through and strips out the code in several spots so that we don't go around carrying variables and/or code that is always going to evaluate to false or 0. Change-ID: I39d1d779c66c638b75525839db2b6208fdc809d7 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-27i40e: Drop FCoE code from core driver filesAlexander Duyck
Looking over the code for FCoE it looks like the Rx path has been broken at least since the last major Rx refactor almost a year ago. It seems like FCoE isn't supported for any of the Fortville/Fortpark hardware so there isn't much point in carrying the code around, especially if it is broken and untested. Change-ID: I892de8fa551cb129ce2361e738ff82ce55fa229e Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>