summaryrefslogtreecommitdiff
path: root/drivers/net/dsa/ocelot
AgeCommit message (Collapse)Author
2021-11-18net: dsa: felix: use vcap policer to set flow meter for psfpXiaoliang Yang
This patch add police action to set flow meter table which is defined in IEEE802.1Qci. Flow metering is two rates two buckets and three color marker to policing the frames, we only enable one rate one bucket in this patch. Flow metering shares a same policer pool with VCAP policers, so the PSFP policer calls ocelot_vcap_policer_add() and ocelot_vcap_policer_del() to set flow meter police. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: mscc: ocelot: use index to set vcap policerXiaoliang Yang
Policer was previously automatically assigned from the highest index to the lowest index from policer pool. But police action of tc flower now uses index to set an police entry. This patch uses the police index to set vcap policers, so that one policer can be shared by multiple rules. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: dsa: felix: add stream gate settings for psfpXiaoliang Yang
This patch adds stream gate settings for PSFP. Use SGI table to store stream gate entries. Disable the gate entry when it is not used by any stream. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-18net: dsa: felix: support psfp filter on vsc9959Xiaoliang Yang
VSC9959 supports Per-Stream Filtering and Policing(PSFP) that complies with the IEEE 802.1Qci standard. The stream is identified by Null stream identification(DMAC and VLAN ID) defined in IEEE802.1CB. For PSFP, four tables need to be set up: stream table, stream filter table, stream gate table, and flow meter table. Identify the stream by parsing the tc flower keys and add it to the stream table. The stream filter table is automatically maintained, and its index is determined by SGID(flow gate index) and FMID(flow meter index). Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-03net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridgeVladimir Oltean
Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/20210814025003.2449143-1-colin.foster@in-advantage.com/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: ea440cd2d9b2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <po.liu@nxp.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24net: convert users of bitmap_foo() to linkmode_foo()Sean Anderson
This converts instances of bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS) to linkmode_foo(args...) I manually fixed up some lines to prevent them from being excessively long. Otherwise, this change was generated with the following semantic patch: // Generated with // echo linux/linkmode.h > includes // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \ // | sort | uniq | tee new_includes | wc -l && mv new_includes includes // and repeating until the number stopped going up @i@ @@ ( #include <linux/acpi_mdio.h> | #include <linux/brcmphy.h> | #include <linux/dsa/loop.h> | #include <linux/dsa/sja1105.h> | #include <linux/ethtool.h> | #include <linux/ethtool_netlink.h> | #include <linux/fec.h> | #include <linux/fs_enet_pd.h> | #include <linux/fsl/enetc_mdio.h> | #include <linux/fwnode_mdio.h> | #include <linux/linkmode.h> | #include <linux/lsm_audit.h> | #include <linux/mdio-bitbang.h> | #include <linux/mdio.h> | #include <linux/mdio-mux.h> | #include <linux/mii.h> | #include <linux/mii_timestamper.h> | #include <linux/mlx5/accel.h> | #include <linux/mlx5/cq.h> | #include <linux/mlx5/device.h> | #include <linux/mlx5/driver.h> | #include <linux/mlx5/eswitch.h> | #include <linux/mlx5/fs.h> | #include <linux/mlx5/port.h> | #include <linux/mlx5/qp.h> | #include <linux/mlx5/rsc_dump.h> | #include <linux/mlx5/transobj.h> | #include <linux/mlx5/vport.h> | #include <linux/of_mdio.h> | #include <linux/of_net.h> | #include <linux/pcs-lynx.h> | #include <linux/pcs/pcs-xpcs.h> | #include <linux/phy.h> | #include <linux/phy_led_triggers.h> | #include <linux/phylink.h> | #include <linux/platform_data/bcmgenet.h> | #include <linux/platform_data/xilinx-ll-temac.h> | #include <linux/pxa168_eth.h> | #include <linux/qed/qed_eth_if.h> | #include <linux/qed/qed_fcoe_if.h> | #include <linux/qed/qed_if.h> | #include <linux/qed/qed_iov_if.h> | #include <linux/qed/qed_iscsi_if.h> | #include <linux/qed/qed_ll2_if.h> | #include <linux/qed/qed_nvmetcp_if.h> | #include <linux/qed/qed_rdma_if.h> | #include <linux/sfp.h> | #include <linux/sh_eth.h> | #include <linux/smsc911x.h> | #include <linux/soc/nxp/lpc32xx-misc.h> | #include <linux/stmmac.h> | #include <linux/sunrpc/svc_rdma.h> | #include <linux/sxgbe_platform.h> | #include <net/cfg80211.h> | #include <net/dsa.h> | #include <net/mac80211.h> | #include <net/selftests.h> | #include <rdma/ib_addr.h> | #include <rdma/ib_cache.h> | #include <rdma/ib_cm.h> | #include <rdma/ib_hdrs.h> | #include <rdma/ib_mad.h> | #include <rdma/ib_marshall.h> | #include <rdma/ib_pack.h> | #include <rdma/ib_pma.h> | #include <rdma/ib_sa.h> | #include <rdma/ib_smi.h> | #include <rdma/ib_umem.h> | #include <rdma/ib_umem_odp.h> | #include <rdma/ib_verbs.h> | #include <rdma/iw_cm.h> | #include <rdma/mr_pool.h> | #include <rdma/opa_addr.h> | #include <rdma/opa_port_info.h> | #include <rdma/opa_smi.h> | #include <rdma/opa_vnic.h> | #include <rdma/rdma_cm.h> | #include <rdma/rdma_cm_ib.h> | #include <rdma/rdmavt_cq.h> | #include <rdma/rdma_vt.h> | #include <rdma/rdmavt_qp.h> | #include <rdma/rw.h> | #include <rdma/tid_rdma_defs.h> | #include <rdma/uverbs_ioctl.h> | #include <rdma/uverbs_named_ioctl.h> | #include <rdma/uverbs_std_types.h> | #include <rdma/uverbs_types.h> | #include <soc/mscc/ocelot.h> | #include <soc/mscc/ocelot_ptp.h> | #include <soc/mscc/ocelot_vcap.h> | #include <trace/events/ib_mad.h> | #include <trace/events/rdma_core.h> | #include <trace/events/rdma.h> | #include <trace/events/rpcrdma.h> | #include <uapi/linux/ethtool.h> | #include <uapi/linux/ethtool_netlink.h> | #include <uapi/linux/mdio.h> | #include <uapi/linux/mii.h> ) @depends on i@ expression list args; @@ ( - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_zero(args) | - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_copy(args) | - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_and(args) | - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_or(args) | - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_empty(args) | - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_andnot(args) | - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_equal(args) | - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_intersects(args) | - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_subset(args) ) Add missing linux/mii.h include to mellanox. -DaveM Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/testing/selftests/net/ioam6.sh 7b1700e009cc ("selftests: net: modify IOAM tests for undef bits") bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: felix: break at first CPU port during init and teardownVladimir Oltean
The NXP LS1028A switch has two Ethernet ports towards the CPU, but only one of them is capable of acting as an NPI port at a time (inject and extract packets using DSA tags). However, using the alternative ocelot-8021q tagging protocol, it should be possible to use both CPU ports symmetrically, but for that we need to mark both ports in the device tree as DSA masters. In the process of doing that, it can be seen that traffic to/from the network stack gets broken, and this is because the Felix driver iterates through all DSA CPU ports and configures them as NPI ports. But since there can only be a single NPI port, we effectively end up in a situation where DSA thinks the default CPU port is the first one, but the hardware port configured to be an NPI is the last one. I would like to treat this as a bug, because if the updated device trees are going to start circulating, it would be really good for existing kernels to support them, too. Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: felix: purge skb from TX timestamping queue if it cannot be sentVladimir Oltean
At present, when a PTP packet which requires TX timestamping gets dropped under congestion by the switch, things go downhill very fast. The driver keeps a clone of that skb in a queue of packets awaiting TX timestamp interrupts, but interrupts will never be raised for the dropped packets. Moreover, matching timestamped packets to timestamps is done by a 2-bit timestamp ID, and this can wrap around and we can match on the wrong skb. Since with the default NPI-based tagging protocol, we get no notification about packet drops, the best we can do is eventually recover from the drop of a PTP frame: its skb will be dead memory until another skb which was assigned the same timestamp ID happens to find it. However, with the ocelot-8021q tagger which injects packets using the manual register interface, it appears that we can check for more information, such as: - whether the input queue has reached the high watermark or not - whether the injection group's FIFO can accept additional data or not so we know that a PTP frame is likely to get dropped before actually sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet). But when we do that, we can also remove the skb from the timestamping queue, because there surely won't be any timestamp that matches it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch libVladimir Oltean
Michael reported that when using the "ocelot-8021q" tagging protocol, the switch driver module must be manually loaded before the tagging protocol can be loaded/is available. This appears to be the same problem described here: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/ where due to the fact that DSA tagging protocols make use of symbols exported by the switch drivers, circular dependencies appear and this breaks module autoloading. The ocelot_8021q driver needs the ocelot_can_inject() and ocelot_port_inject_frame() functions from the switch library. Previously the wrong approach was taken to solve that dependency: shims were provided for the case where the ocelot switch library was compiled out, but that turns out to be insufficient, because the dependency when the switch lib _is_ compiled is problematic too. We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as static inline functions, because these access I/O functions like __ocelot_write_ix() which is called by ocelot_write_rix(). Making those static inline basically means exposing the whole guts of the ocelot switch library, not ideal... We already have one tagging protocol driver which calls into the switch driver during xmit but not using any exported symbol: sja1105_defer_xmit. We can do the same thing here: create a kthread worker and one work item per skb, and let the switch driver itself do the register accesses to send the skb, and then consume it. Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12net: mscc: ocelot: avoid overflowing the PTP timestamp FIFOVladimir Oltean
PTP packets with 2-step TX timestamp requests are matched to packets based on the egress port number and a 6-bit timestamp identifier. All PTP timestamps are held in a common FIFO that is 128 entry deep. This patch ensures that back-to-back timestamping requests cannot exceed the hardware FIFO capacity. If that happens, simply send the packets without requesting a TX timestamp to be taken (in the case of felix, since the DSA API has a void return code in ds->ops->port_txtstamp) or drop them (in the case of ocelot). I've moved the ts_id_lock from a per-port basis to a per-switch basis, because we need separate accounting for both numbers of PTP frames in flight. And since we need locking to inc/dec the per-switch counter, that also offers protection for the per-port counter and hence there is no reason to have a per-port counter anymore. Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-24net: dsa: felix: accept "ethernet-ports" OF node nameVladimir Oltean
Since both forms are accepted, let's search for both when we pre-validate the PHY modes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-19net: dsa: be compatible with masters which unregister on shutdownVladimir Oltean
Lino reports that on his system with bcmgenet as DSA master and KSZ9897 as a switch, rebooting or shutting down never works properly. What does the bcmgenet driver have special to trigger this, that other DSA masters do not? It has an implementation of ->shutdown which simply calls its ->remove implementation. Otherwise said, it unregisters its network interface on shutdown. This message can be seen in a loop, and it hangs the reboot process there: unregister_netdevice: waiting for eth0 to become free. Usage count = 3 So why 3? A usage count of 1 is normal for a registered network interface, and any virtual interface which links itself as an upper of that will increment it via dev_hold. In the case of DSA, this is the call path: dsa_slave_create -> netdev_upper_dev_link -> __netdev_upper_dev_link -> __netdev_adjacent_dev_insert -> dev_hold So a DSA switch with 3 interfaces will result in a usage count elevated by two, and netdev_wait_allrefs will wait until they have gone away. Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and delete themselves, but DSA cannot just vanish and go poof, at most it can unbind itself from the switch devices, but that must happen strictly earlier compared to when the DSA master unregisters its net_device, so reacting on the NETDEV_UNREGISTER event is way too late. It seems that it is a pretty established pattern to have a driver's ->shutdown hook redirect to its ->remove hook, so the same code is executed regardless of whether the driver is unbound from the device, or the system is just shutting down. As Florian puts it, it is quite a big hammer for bcmgenet to unregister its net_device during shutdown, but having a common code path with the driver unbind helps ensure it is well tested. So DSA, for better or for worse, has to live with that and engage in an arms race of implementing the ->shutdown hook too, from all individual drivers, and do something sane when paired with masters that unregister their net_device there. The only sane thing to do, of course, is to unlink from the master. However, complications arise really quickly. The pattern of redirecting ->shutdown to ->remove is not unique to bcmgenet or even to net_device drivers. In fact, SPI controllers do it too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers and MDIO controllers do it too (this is something I have not researched too deeply, but even if this is not the case today, it is certainly plausible to happen in the future, and must be taken into consideration). Since DSA switches might be SPI devices, I2C devices, MDIO devices, the insane implication is that for the exact same DSA switch device, we might have both ->shutdown and ->remove getting called. So we need to do something with that insane environment. The pattern I've come up with is "if this, then not that", so if either ->shutdown or ->remove gets called, we set the device's drvdata to NULL, and in the other hook, we check whether the drvdata is NULL and just do nothing. This is probably not necessary for platform devices, just for devices on buses, but I would really insist for consistency among drivers, because when code is copy-pasted, it is not always copy-pasted from the best sources. So depending on whether the DSA switch's ->remove or ->shutdown will get called first, we cannot really guarantee even for the same driver if rebooting will result in the same code path on all platforms. But nonetheless, we need to do something minimally reasonable on ->shutdown too to fix the bug. Of course, the ->remove will do more (a full teardown of the tree, with all data structures freed, and this is why the bug was not caught for so long). The new ->shutdown method is kept separate from dsa_unregister_switch not because we couldn't have unregistered the switch, but simply in the interest of doing something quick and to the point. The big question is: does the DSA switch's ->shutdown get called earlier than the DSA master's ->shutdown? If not, there is still a risk that we might still trigger the WARN_ON in unregister_netdevice that says we are attempting to unregister a net_device which has uppers. That's no good. Although the reference to the master net_device won't physically go away even if DSA's ->shutdown comes afterwards, remember we have a dev_hold on it. The answer to that question lies in this comment above device_link_add: * A side effect of the link creation is re-ordering of dpm_list and the * devices_kset list by moving the consumer device and all devices depending * on it to the ends of these lists (that does not happen to devices that have * not been registered when this function is called). so the fact that DSA uses device_link_add towards its master is not exactly for nothing. device_shutdown() walks devices_kset from the back, so this is our guarantee that DSA's shutdown happens before the master's shutdown. Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings") Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/ Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-17net: update NXP copyright textVladimir Oltean
NXP Legal insists that the following are not fine: - Saying "NXP Semiconductors" instead of "NXP", since the company's registered name is "NXP" - Putting a "(c)" sign in the copyright string - Putting a comma in the copyright string The only accepted copyright string format is "Copyright <year-range> NXP". This patch changes the copyright headers in the networking files that were sent by me, or derived from code sent by me. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20net: mscc: ocelot: transmit the VLAN filtering restrictions via extackVladimir Oltean
We need to transmit more restrictions in future patches, convert this one to netlink extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-20net: mscc: ocelot: transmit the "native VLAN" error via extackVladimir Oltean
We need to reject some more configurations in future patches, convert the existing one to netlink extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16net: mscc: ocelot: convert to phylinkVladimir Oltean
The felix DSA driver, which is a wrapper over the same hardware class as ocelot, is integrated with phylink, but ocelot is using the plain PHY library. It makes sense to bring together the two implementations, which is what this patch achieves. This is a large patch and hard to break up, but it does the following: The existing ocelot_adjust_link writes some registers, and felix_phylink_mac_link_up writes some registers, some of them are common, but both functions write to some registers to which the other doesn't. The main reasons for this are: - Felix switches so far have used an NXP PCS so they had no need to write the PCS1G registers that ocelot_adjust_link writes - Felix switches have the MAC fixed at 1G, so some of the MAC speed changes actually break the link and must be avoided. The naming conventions for the functions introduced in this patch are: - vsc7514_phylink_{mac_config,validate} are specific to the Ocelot instantiations and placed in ocelot_net.c which is built only for the ocelot switchdev driver. - ocelot_phylink_mac_link_{up,down} are shared between the ocelot switchdev driver and the felix DSA driver (they are put in the common lib). One by one, the registers written by ocelot_adjust_link are: DEV_MAC_MODE_CFG - felix_phylink_mac_link_up had no need to write this register since its out-of-reset value was fine and did not need changing. The write is moved to the common ocelot_phylink_mac_link_up and on felix it is guarded by a quirk bit that makes the written value identical with the out-of-reset one DEV_PORT_MISC - runtime invariant, was moved to vsc7514_phylink_mac_config PCS1G_MODE_CFG - same as above PCS1G_SD_CFG - same as above PCS1G_CFG - same as above PCS1G_ANEG_CFG - same as above PCS1G_LB_CFG - same as above DEV_MAC_ENA_CFG - both ocelot_adjust_link and ocelot_port_disable touched this. felix_phylink_mac_link_{up,down} also do. We go with what felix does and put it in ocelot_phylink_mac_link_up. DEV_CLOCK_CFG - ocelot_adjust_link and felix_phylink_mac_link_up both write this, but to different values. Move to the common ocelot_phylink_mac_link_up and make sure via the quirk that the old values are preserved for both. ANA_PFC_PFC_CFG - ocelot_adjust_link wrote this, felix_phylink_mac_link_up did not. Runtime invariant, speed does not matter since PFC is disabled via the RX_PFC_ENA bits which are cleared. Move to vsc7514_phylink_mac_config. QSYS_SWITCH_PORT_MODE_PORT_ENA - both ocelot_adjust_link and felix_phylink_mac_link_{up,down} wrote this. Ocelot also wrote this register from ocelot_port_disable. Keep what felix did, move in ocelot_phylink_mac_link_{up,down} and delete ocelot_port_disable. ANA_POL_FLOWC - same as above SYS_MAC_FC_CFG - same as above, except slight behavior change. Whereas ocelot always enabled RX and TX flow control, felix listened to phylink (for the most part, at least - see the 2500base-X comment). The registers which only felix_phylink_mac_link_up wrote are: SYS_PAUSE_CFG_PAUSE_ENA - this is why I am not sure that flow control worked on ocelot. Not it should, since the code is shared with felix where it does. ANA_PORT_PORT_CFG - this is a Frame Analyzer block register, phylink should be the one touching them, deleted. Other changes: - The old phylib registration code was in mscc_ocelot_init_ports. It is hard to work with 2 levels of indentation already in, and with hard to follow teardown logic. The new phylink registration code was moved inside ocelot_probe_port(), right between alloc_etherdev() and register_netdev(). It could not be done before (=> outside of) ocelot_probe_port() because ocelot_probe_port() allocates the struct ocelot_port which we then use to assign ocelot_port->phy_mode to. It is more preferable to me to have all PHY handling logic inside the same function. - On the same topic: struct ocelot_port_private :: serdes is only used in ocelot_port_open to set the SERDES protocol to Ethernet. This is logically a runtime invariant and can be done just once, when the port registers with phylink. We therefore don't even need to keep the serdes reference inside struct ocelot_port_private, or to use the devm variant of of_phy_get(). - Phylink needs a valid phy-mode for phylink_create() to succeed, and the existing device tree bindings in arch/mips/boot/dts/mscc/ocelot_pcb120.dts don't define one for the internal PHY ports. So we patch PHY_INTERFACE_MODE_NA into PHY_INTERFACE_MODE_INTERNAL. - There was a strategically placed: switch (priv->phy_mode) { case PHY_INTERFACE_MODE_NA: continue; which made the code skip the serdes initialization for the internal PHY ports. Frankly that is not all that obvious, so now we explicitly initialize the serdes under an "if" condition and not rely on code jumps, so everything is clearer. - There was a write of OCELOT_SPEED_1000 to DEV_CLOCK_CFG for QSGMII ports. Since that is in fact the default value for the register field DEV_CLOCK_CFG_LINK_SPEED, I can only guess the intention was to clear the adjacent fields, MAC_TX_RST and MAC_RX_RST, aka take the port out of reset, which does match the comment. I don't even want to know why this code is placed there, but if there is indeed an issue that all ports that share a QSGMII lane must all be up, then this logic is already buggy, since mscc_ocelot_init_ports iterates using for_each_available_child_of_node, so nobody prevents the user from putting a 'status = "disabled";' for some QSGMII ports which would break the driver's assumption. In any case, in the eventuality that I'm right, we would have yet another issue if ocelot_phylink_mac_link_down would reset those ports and that would be forbidden, so since the ocelot_adjust_link logic did not do that (maybe for a reason), add another quirk to preserve the old logic. The ocelot driver teardown goes through all ports in one fell swoop. When initialization of one port fails, the ocelot->ports[port] pointer for that is reset to NULL, and teardown is done only for non-NULL ports, so there is no reason to do partial teardowns, let the central mscc_ocelot_release_ports() do its job. Tested bind, unbind, rebind, link up, link down, speed change on mock-up hardware (modified the driver to probe on Felix VSC9959). Also regression tested the felix DSA driver. Could not test the Ocelot specific bits (PCS1G, SERDES, device tree bindings). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-16net: dsa: felix: stop calling ocelot_port_{enable,disable}Vladimir Oltean
ocelot_port_enable touches ANA_PORT_PORT_CFG, which has the following fields: - LOCKED_PORTMOVE_CPU, LEARNDROP, LEARNCPU, LEARNAUTO, RECV_ENA, all of which are written with their hardware default values, also runtime invariants. So it makes no sense to write these during every .ndo_open. - PORTID_VAL: this field has an out-of-reset value of zero for all ports and must be initialized by software. Additionally, the ocelot_setup_logical_port_ids() code path sets up different logical port IDs for the ports in a hardware LAG, and we absolutely don't want .ndo_open to interfere there and reset those values. So in fact the write from ocelot_port_enable can better be moved to ocelot_init_port, and the .ndo_open hook deleted. ocelot_port_disable touches DEV_MAC_ENA_CFG and QSYS_SWITCH_PORT_MODE_PORT_ENA, in an attempt to undo what ocelot_adjust_link did. But since .ndo_stop does not get called each time the link falls (i.e. this isn't a substitute for .phylink_mac_link_down), felix already does better at this by writing those registers already in felix_phylink_mac_link_down. So keep ocelot_port_disable (for now, until ocelot is converted to phylink too), and just delete the felix call to it, which is not necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-13ethernet: fix PTP_1588_CLOCK dependenciesArnd Bergmann
The 'imply' keyword does not do what most people think it does, it only politely asks Kconfig to turn on another symbol, but does not prevent it from being disabled manually or built as a loadable module when the user is built-in. In the ICE driver, the latter now causes a link failure: aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl': ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config' ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config' aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config' ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset': ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release' ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release' aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild': This is a recurring problem in many drivers, and we have discussed it several times befores, without reaching a consensus. I'm providing a link to the previous email thread for reference, which discusses some related problems. To solve the dependency issue better than the 'imply' keyword, introduce a separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver can depend on if it is able to use PTP support when available, but works fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are then prevented from being built-in, the same way as with a 'depends on PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick, but that can be rather confusing when you first see it. Since this should cover the dependencies correctly, the IS_REACHABLE() hack in the header is no longer needed now, and can be turned back into a normal IS_ENABLED() check. Any driver that gets the dependency wrong will now cause a link time failure rather than being unable to use PTP support when that is in a loadable module. However, the two recently added ptp_get_vclocks_index() and ptp_convert_timestamp() interfaces are only called from builtin code with ethtool and socket timestamps, so keep the current behavior by stubbing those out completely when PTP is in a loadable module. This should be addressed properly in a follow-up. As Richard suggested, we may want to actually turn PTP support into a 'bool' option later on, preventing it from being a loadable module altogether, which would be one way to solve the problem with the ethtool interface. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Link: https://lore.kernel.org/netdev/20210804121318.337276-1-arnd@kernel.org/ Link: https://lore.kernel.org/netdev/CAK8P3a06enZOf=XyZ+zcAwBczv41UuCTz+=0FMf2gBz1_cOnZQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/CAK8P3a3=eOxE-K25754+fB_-i_0BZzf9a9RfPTX3ppSwu9WZXw@mail.gmail.com/ Link: https://lore.kernel.org/netdev/20210726084540.3282344-1-arnd@kernel.org/ Acked-by: Shannon Nelson <snelson@pensando.io> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210812183509.1362782-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-07-20net: dsa: tag_8021q: absorb dsa_8021q_setup into dsa_tag_8021q_{,un}registerVladimir Oltean
Right now, setting up tag_8021q is a 2-step operation for a driver, first the context structure needs to be created, then the VLANs need to be installed on the ports. A similar thing is true for teardown. Merge the 2 steps into the register/unregister methods, to be as transparent as possible for the driver as to what tag_8021q does behind the scenes. This also gets rid of the funny "bool setup == true means setup, == false means teardown" API that tag_8021q used to expose. Note that dsa_tag_8021q_register() must be called at least in the .setup() driver method and never earlier (like in the driver probe function). This is because the DSA switch tree is not initialized at probe time, and the cross-chip notifiers will not work. For symmetry with .setup(), the unregister method should be put in .teardown(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: dsa: make tag_8021q operations part of the coreVladimir Oltean
Make tag_8021q a more central element of DSA and move the 2 driver specific operations outside of struct dsa_8021q_context (which is supposed to hold dynamic data and not really constant function pointers). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: dsa: let the core manage the tag_8021q contextVladimir Oltean
The basic problem description is as follows: Be there 3 switches in a daisy chain topology: | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] | +---------+ | sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] | +---------+ | sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ user ] [ dsa ] The CPU will not be able to ping through the user ports of the bottom-most switch (like for example sw2p0), simply because tag_8021q was not coded up for this scenario - it has always assumed DSA switch trees with a single switch. To add support for the topology above, we must admit that the RX VLAN of sw2p0 must be added on some ports of switches 0 and 1 as well. This is in fact a textbook example of thing that can use the cross-chip notifier framework that DSA has set up in switch.c. There is only one problem: core DSA (switch.c) is not able right now to make the connection between a struct dsa_switch *ds and a struct dsa_8021q_context *ctx. Right now, it is drivers who call into tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer, and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del} methods. But with cross-chip notifiers, it is possible for tag_8021q to call drivers without drivers having ever asked for anything. A good example is right above: when sw2p0 wants to set itself up for tag_8021q, the .tag_8021q_vlan_add method needs to be called for switches 1 and 0, so that they transport sw2p0's VLANs towards the CPU without dropping them. So instead of letting drivers manage the tag_8021q context, add a tag_8021q_ctx pointer inside of struct dsa_switch, which will be populated when dsa_tag_8021q_register() returns success. The patch is fairly long-winded because we are partly reverting commit 5899ee367ab3 ("net: dsa: tag_8021q: add a context structure") which made the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we can access "ctx" directly from "ds", this is no longer needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: dsa: tag_8021q: create dsa_tag_8021q_{register,unregister} helpersVladimir Oltean
In preparation of moving tag_8021q to core DSA, move all initialization and teardown related to tag_8021q which is currently done by drivers in 2 functions called "register" and "unregister". These will gather more functionality in future patches, which will better justify the chosen naming scheme. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-08net: dsa: felix: set TX flow control according to the phylink_mac_link_up ↵Vladimir Oltean
resolution Instead of relying on the static initialization done by ocelot_init_port() which enables flow control unconditionally, set SYS_PAUSE_CFG_PAUSE_ENA according to the parameters negotiated by the PHY. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07net: mscc: ocelot: check return value after calling platform_get_resource()Yang Yingliang
It will cause null-ptr-deref if platform_get_resource() returns NULL, we need check the return value. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-10net: dsa: felix: re-enable TAS guard band modeMichael Walle
Commit 316bcffe4479 ("net: dsa: felix: disable always guard band bit for TAS config") disabled the guard band and broke 802.3Qbv compliance. There are two issues here: (1) Without the guard band the end of the scheduling window could be overrun by a frame in transit. (2) Frames that don't fit into a configured window will still be sent. The reason for both issues is that the switch will schedule the _start_ of a frame transmission inside the predefined window without taking the length of the frame into account. Thus, we'll need the guard band which will close the gate early, so that a complete frame can still be sent. Revert the commit and add a note. For a lengthy discussion see [1]. [1] https://lore.kernel.org/netdev/c7618025da6723418c56a54fe4683bd7@walle.cc/ Fixes: 316bcffe4479 ("net: dsa: felix: disable always guard band bit for TAS config") Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: mscc: ocelot: convert to ocelot_port_txtstamp_request()Yangbo Lu
Convert to a common ocelot_port_txtstamp_request() for TX timestamp request handling. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: free skb->cb usage in core driverYangbo Lu
Free skb->cb usage in core driver and let device drivers decide to use or not. The reason having a DSA_SKB_CB(skb)->clone was because dsa_skb_tx_timestamp() which may set the clone pointer was called before p->xmit() which would use the clone if any, and the device driver has no way to initialize the clone pointer. This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning of dsa_slave_xmit(). Some new features in the future, like one-step timestamp may need more bytes of skb->cb to use in dsa_skb_tx_timestamp(), and p->xmit(). Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: no longer clone skb in core driverYangbo Lu
It was a waste to clone skb directly in dsa_skb_tx_timestamp(). For one-step timestamping, a clone was not needed. For any failure of port_txtstamp (this may usually happen), the skb clone had to be freed. So this patch moves skb cloning for tx timestamp out of dsa core, and let drivers clone skb in port_txtstamp if they really need. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: no longer identify PTP packet in core driverYangbo Lu
Move ptp_classify_raw out of dsa core driver for handling tx timestamp request. Let device drivers do this if they want. Not all drivers want to limit tx timestamping for only PTP packet. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: check tx timestamp request in core driverYangbo Lu
Check tx timestamp request in core driver at very beginning of dsa_skb_tx_timestamp(), so that most skbs not requiring tx timestamp just return. And drop such checking in device drivers. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-20net: dsa: felix: disable always guard band bit for TAS configXiaoliang Yang
ALWAYS_GUARD_BAND_SCH_Q bit in TAS config register is descripted as this: 0: Guard band is implemented for nonschedule queues to schedule queues transition. 1: Guard band is implemented for any queue to schedule queue transition. The driver set guard band be implemented for any queue to schedule queue transition before, which will make each GCL time slot reserve a guard band time that can pass the max SDU frame. Because guard band time could not be set in tc-taprio now, it will use about 12000ns to pass 1500B max SDU. This limits each GCL time interval to be more than 12000ns. This patch change the guard band to be only implemented for nonschedule queues to schedule queues transition, so that there is no need to reserve guard band on each GCL. Users can manually add guard band time for each schedule queues in their configuration if they want. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-29net: mscc: ocelot: remove redundant dev_err call in vsc9959_mdio_bus_alloc()Guobin Huang
There is a error message within devm_ioremap_resource already, so remove the dev_err call to avoid redundant error message. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Guobin Huang <huangguobin4@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23net: ocelot: replay switchdev events when joining bridgeVladimir Oltean
The premise of this change is that the switchdev port attributes and objects offloaded by ocelot might have been missed when we are joining an already existing bridge port, such as a bonding interface. The patch pulls these switchdev attributes and objects from the bridge, on behalf of the 'bridge port' net device which might be either the ocelot switch interface, or the bonding upper interface. The ocelot_net.c belongs strictly to the switchdev ocelot driver, while ocelot.c is part of a library shared with the DSA felix driver. The ocelot_port_bridge_leave function (part of the common library) used to call ocelot_port_vlan_filtering(false), something which is not necessary for DSA, since the framework deals with that already there. So we move this function to ocelot_switchdev_unsync, which is specific to the switchdev driver. The code movement described above makes ocelot_port_bridge_leave no longer return an error code, so we change its type from int to void. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net: dsa: felix: Add support for MRPHoratiu Vultur
Implement functions 'port_mrp_add', 'port_mrp_del', 'port_mrp_add_ring_role' and 'port_mrp_del_ring_role' to call the mrp functions from ocelot. Also all MRP frames that arrive to CPU on queue number OCELOT_MRP_CPUQ will be forward by the SW. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net: dsa: felix: perform teardown on error in felix_setupVladimir Oltean
If the driver fails to probe, it would be nice to not leak memory. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net: dsa: felix: don't deinitialize unused portsVladimir Oltean
ocelot_init_port is called only if dsa_is_unused_port == false, however ocelot_deinit_port is called unconditionally. This causes a warning in the skb_queue_purge inside ocelot_deinit_port saying that the spin lock protecting ocelot_port->tx_skbs was not initialized. Fixes: e5fb512d81d0 ("net: mscc: ocelot: deinitialize only initialized ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: propagate extack to .port_vlan_filteringVladimir Oltean
Some drivers can't dynamically change the VLAN filtering option, or impose some restrictions, it would be nice to propagate this info through netlink instead of printing it to a kernel log that might never be read. Also netlink extack includes the module that emitted the message, which means that it's easier to figure out which ones are driver-generated errors as opposed to command misuse. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: propagate extack to .port_vlan_addVladimir Oltean
Allow drivers to communicate their restrictions to user space directly, instead of printing to the kernel log. Where the conversion would have been lossy and things like VLAN ID could no longer be conveyed (due to the lack of support for printf format specifier in netlink extack), I chose to keep the messages in full form to the kernel log only, and leave it up to individual driver maintainers to move more messages to extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: tag_ocelot_8021q: add support for PTP timestampingVladimir Oltean
For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021qVladimir Oltean
Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: tag_ocelot: create separate tagger for SevilleVladimir Oltean
The ocelot tagger is a hot mess currently, it relies on memory initialized by the attached driver for basic frame transmission. This is against all that DSA tagging protocols stand for, which is that the transmission and reception of a DSA-tagged frame, the data path, should be independent from the switch control path, because the tag protocol is in principle hot-pluggable and reusable across switches (even if in practice it wasn't until very recently). But if another driver like dsa_loop wants to make use of tag_ocelot, it couldn't. This was done to have common code between Felix and Ocelot, which have one bit difference in the frame header format. Quoting from commit 67c2404922c2 ("net: dsa: felix: create a template for the DSA tags on xmit"): Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. The really interesting part is that Seville should have had its own tagging protocol defined - it is not compatible on the wire with Ocelot, even for that single bit. In principle, a packet generated by DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain way, but when booted on NXP T1040 it would look differently. The reverse is also true: a packet generated by a Seville switch would be interpreted incorrectly by Wireshark if it was told it was generated by an Ocelot switch. Actually things are a bit more nuanced. If we concentrate only on the DSA tag, what I said above is true, but Ocelot/Seville also support an optional DSA tag prefix, which can be short or long, and it is possible to distinguish the two taggers based on an integer constant put in that prefix. Nonetheless, creating a separate tagger is still justified, since the tag prefix is optional, and without it, there is again no way to distinguish. Claiming backwards binary compatibility is a bit more tough, since I've already changed the format of tag_ocelot once, in commit 5124197ce58b ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress"). Therefore I am not very concerned with treating this as a bugfix and backporting it to stable kernels (which would be another mess due to the fact that there would be lots of conflicts with the other DSA_TAG_PROTO* definitions). It's just simpler to say that the string values of the taggers have ABI value starting with kernel 5.12, which will be when the changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging goes live. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: mscc: ocelot: use common tag parsing code with DSAVladimir Oltean
The Injection Frame Header and Extraction Frame Header that the switch prepends to frames over the NPI port is also prepended to frames delivered over the CPU port module's queues. Let's unify the handling of the frame headers by making the ocelot driver call some helpers exported by the DSA tagger. Among other things, this allows us to get rid of the strange cpu_to_be32 when transmitting the Injection Frame Header on ocelot, since the packing API uses network byte order natively (when "quirks" is 0). The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and the cpu extraction queue mask to something, but the code doesn't do it, so we don't do it either. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: mscc: ocelot: offload bridge port flags to deviceVladimir Oltean
We should not be unconditionally enabling address learning, since doing that is actively detrimential when a port is standalone and not offloading a bridge. Namely, if a port in the switch is standalone and others are offloading the bridge, then we could enter a situation where we learn an address towards the standalone port, but the bridged ports could not forward the packet there, because the CPU is the only path between the standalone and the bridged ports. The solution of course is to not enable address learning unless the bridge asks for it. We need to set up the initial port flags for no learning and flooding everything, and also when the port joins and leaves the bridge. The flood configuration was already configured ok for standalone mode in ocelot_init, we just need to disable learning in ocelot_init_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: mscc: ocelot: use separate flooding PGID for broadcastVladimir Oltean
In preparation of offloading the bridge port flags which have independent settings for unknown multicast and for broadcast, we should also start reserving one destination Port Group ID for the flooding of broadcast packets, to allow configuring it individually. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: dsa: felix: restore multicast flood to CPU when NPI tagger reinitializesVladimir Oltean
ocelot_init sets up PGID_MC to include the CPU port module, and that is fine, but the ocelot-8021q tagger removes the CPU port module from the unknown multicast replicator. So after a transition from the default ocelot tagger towards ocelot-8021q and then again towards ocelot, multicast flooding towards the CPU port module will be disabled. Fixes: e21268efbe26 ("net: dsa: felix: perform switch setup for tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2021-02-09net: dsa: felix: implement port flushing on .phylink_mac_link_downVladimir Oltean
There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-06net: dsa: felix: propagate the LAG offload ops towards the ocelot libVladimir Oltean
The ocelot switch has been supporting LAG offload since its initial commit, however felix could not make use of that, due to lack of a LAG abstraction in DSA. Now that we have that, let's forward DSA's calls towards the ocelot library, who will deal with setting up the bonding. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: dsa: felix: perform switch setup for tag_8021qVladimir Oltean
Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>