summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-20xenbus: limit when state is forced to closedPaul Durrant
If a driver probe() fails then leave the xenstore state alone. There is no reason to modify it as the failure may be due to transient resource allocation issues and hence a subsequent probe() may succeed. If the driver supports re-binding then only force state to closed during remove() only in the case when the toolstack may need to clean up. This can be detected by checking whether the state in xenstore has been set to closing prior to device removal. NOTE: Re-bind support is indicated by new boolean in struct xenbus_driver, which defaults to false. Subsequent patches will add support to some backend drivers. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-20xenbus: move xenbus_dev_shutdown() into frontend code...Paul Durrant
...and make it static xenbus_dev_shutdown() is seemingly intended to cause clean shutdown of PV frontends when a guest is rebooted. Indeed the function waits for a conpletion which is only set by a call to xenbus_frontend_closed(). This patch removes the shutdown() method from backends and moves xenbus_dev_shutdown() from xenbus_probe.c into xenbus_probe_frontend.c, renaming it appropriately and making it static. NOTE: In the case where the backend is running in a driver domain, the toolstack should have already terminated any frontends that may be using it (since Xen does not support re-startable PV driver domains) so xenbus_dev_shutdown() should never be called. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-20xen/blkfront: Adjust indentation in xlvbd_alloc_gendiskNathan Chancellor
Clang warns: ../drivers/block/xen-blkfront.c:1117:4: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] nr_parts = PARTS_PER_DISK; ^ ../drivers/block/xen-blkfront.c:1115:3: note: previous statement is here if (err) ^ This is because there is a space at the beginning of this line; remove it so that the indentation is consistent according to the Linux kernel coding style and clang no longer warns. While we are here, the previous line has some trailing whitespace; clean that up as well. Fixes: c80a420995e7 ("xen-blkfront: handle Xen major numbers other than XENVBD") Link: https://github.com/ClangBuiltLinux/linux/issues/791 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-20riscv: move sifive_l2_cache.c to drivers/socChristoph Hellwig
The sifive_l2_cache.c is in no way related to RISC-V architecture memory management. It is a little stub driver working around the fact that the EDAC maintainers prefer their drivers to be structured in a certain way that doesn't fit the SiFive SOCs. Move the file to drivers/soc and add a Kconfig option for it, as well as the whole drivers/soc boilerplate for CONFIG_SOC_SIFIVE. Fixes: a967a289f169 ("RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Borislav Petkov <bp@suse.de> [paul.walmsley@sifive.com: keep the MAINTAINERS change specific to the L2$ controller code] Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-20riscv: define vmemmap before pfn_to_page callsDavid Abdurachmanov
pfn_to_page & page_to_pfn depend on vmemmap being available before the calls if kernel is configured with CONFIG_SPARSEMEM_VMEMMAP=y. This was caused by NOMMU changes which moved vmemmap definition bellow functions definitions calling pfn_to_page & page_to_pfn. Noticed while compiled 5.5-rc2 kernel for Fedora/RISCV. v2: - Add a comment for vmemmap in source Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com> Fixes: 6bd33e1ece52 ("riscv: add nommu support") Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-20riscv: fix scratch register clearing in M-mode.Greentime Hu
This patch fixes that the sscratch register clearing in M-mode. It cleared sscratch register in M-mode, but it should clear mscratch register. That will cause kernel trap if the CPU core doesn't support S-mode when trying to access sscratch. Fixes: 9e80635619b5 ("riscv: clear the instruction cache and all registers when booting") Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-20riscv: Fix use of undefined config option CONFIG_CONFIG_MMUAndreas Schwab
In Kconfig files, config options are written without the CONFIG_ prefix. Fixes: 6bd33e1ece52 ("riscv: add nommu support") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-19NFC: pn544: Adjust indentation in pn544_hci_check_presenceNathan Chancellor
Clang warns ../drivers/nfc/pn544/pn544.c:696:4: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] return nfc_hci_send_cmd(hdev, NFC_HCI_RF_READER_A_GATE, ^ ../drivers/nfc/pn544/pn544.c:692:3: note: previous statement is here if (target->nfcid1_len != 4 && target->nfcid1_len != 7 && ^ 1 warning generated. This warning occurs because there is a space after the tab on this line. Remove it so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. Fixes: da052850b911 ("NFC: Add pn544 presence check for different targets") Link: https://github.com/ClangBuiltLinux/linux/issues/814 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge branch 'bcmgenet-Turn-on-offloads-by-default'David S. Miller
Doug Berger says: ==================== net: bcmgenet: Turn on offloads by default This commit stack is based on Florian's commit 4e8aedfe78c7 ("net: systemport: Turn on offloads by default") and enables the offloads for the bcmgenet driver by default. The first commit adds support for the HIGHDMA feature to the driver. The second converts the Tx checksum implementation to use the generic hardware logic rather than the deprecated IP centric methods. The third modifies the Rx checksum implementation to use the hardware offload to compute the complete checksum rather than filtering out bad packets detected by the hardware's IP centric implementation. This may increase processing load by passing bad packets to the network stack, but it provides for more flexible handling of packets by the network stack without requiring software computation of the checksum. The remaining commits mirror the extensions Florian made to the sysport driver to retain symmetry with that driver and to make the benefits of the hardware offloads more ubiquitous. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: Add software counters to track reallocationsDoug Berger
When inserting the TSB, keep track of how many times we had to do it and if there was a failure in doing so, this helps profile the driver for possibly incorrect headroom settings. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: Be drop monitor friendly while re-allocating headroomDoug Berger
During bcmgenet_put_tx_csum() make sure we differentiate a SKB headroom re-allocation failure from the normal swap and replace path. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: Turn on offloads by defaultDoug Berger
We can turn on the RX/TX checksum offloads and the scatter/gather features by default and make sure that those are properly reflected back to e.g: stacked devices such as VLAN. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: Utilize bcmgenet_set_features() during resume/openDoug Berger
During driver resume and open, the HW may have lost its context/state, utilize bcmgenet_set_features() to make sure we do restore the correct set of features that were previously configured. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: Refactor bcmgenet_set_features()Doug Berger
In preparation for unconditionally enabling TX and RX checksum offloads, refactor bcmgenet_set_features() a bit such that __netdev_update_features() during register_netdev() can make sure that features are correctly programmed during network device registration. Since we can now be called during register_netdev() with clocks gated, we need to temporarily turn them on/off in order to have a successful register programming. We also move the CRC forward setting read into bcmgenet_set_features() since priv->crc_fwd_en matters while turning on RX checksum offload, that way we are guaranteed they are in sync in case we ever add support for NETIF_F_RXFCS at some point in the future. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: use CHECKSUM_COMPLETE for NETIF_F_RXCSUMDoug Berger
This commit updates the Rx checksum offload behavior of the driver to use the more generic CHECKSUM_COMPLETE method that supports all protocols over the CHECKSUM_UNNECESSARY method that only applies to some protocols known by the hardware. This behavior is perceived to be superior. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: enable NETIF_F_HW_CSUM featureDoug Berger
The GENET hardware should be capable of generating IP checksums using the NETIF_F_HW_CSUM feature, so switch to using that feature instead of the depricated NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: bcmgenet: enable NETIF_F_HIGHDMA flagDoug Berger
This commit configures the DMA masks for the GENET driver and sets the NETIF_F_HIGHDMA flag to report support of the feature. Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: systemport: Set correct DMA maskFlorian Fainelli
SYSTEMPORT is capabable of doing up to 40-bit of physical addresses, set an appropriate DMA mask to permit that. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge branch 'cls_u32-fix-refcount-leak'David S. Miller
Davide Caratti says: ==================== net/sched: cls_u32: fix refcount leak a refcount leak in the error path of u32_change() has been recently introduced. It can be observed with the following commands: [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 97 \ > u32 match ip src 127.0.0.1/32 indev notexist20 flowid 1:1 action drop RTNETLINK answers: Invalid argument We have an error talking to the kernel [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 98 \ > handle 42:42 u32 divisor 256 Error: cls_u32: Divisor can only be used on a hash table. We have an error talking to the kernel [root@f31 ~]# tc filter replace dev eth0 ingress protocol ip prio 99 \ > u32 ht 47:47 Error: cls_u32: Specified hash table not found. We have an error talking to the kernel they all legitimately return -EINVAL; however, they leave semi-configured filters at eth0 tc ingress: [root@f31 ~]# tc filter show dev eth0 ingress filter protocol ip pref 97 u32 chain 0 filter protocol ip pref 97 u32 chain 0 fh 800: ht divisor 1 filter protocol ip pref 98 u32 chain 0 filter protocol ip pref 98 u32 chain 0 fh 801: ht divisor 1 filter protocol ip pref 99 u32 chain 0 filter protocol ip pref 99 u32 chain 0 fh 802: ht divisor 1 With older kernels, filters were unconditionally considered empty (and thus de-refcounted) on the error path of ->change(). After commit 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution"), filters were considered empty when the walk() function didn't set 'walker.stop' to 1. Finally, with commit 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty"), tc filters are considered empty unless the walker function is called with a non-NULL handle. This last change doesn't fit cls_u32 design, because at least the "root hnode" is (almost) always non-NULL, as it's allocated in u32_init(). - patch 1/2 is a proposal to restore the original kernel behavior, where no filter was installed in the error path of u32_change(). - patch 2/2 adds tdc selftests that can be ued to verify the correct behavior of u32 in the error path of ->change(). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19tc-testing: initial tdc selftests for cls_u32Davide Caratti
- move test "e9a3 - Add u32 with source match" to u32.json, and change the match pattern to catch all hnodes - add testcases for relevant error paths of cls_u32 module Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net/sched: cls_u32: fix refcount leak in the error path of u32_change()Davide Caratti
when users replace cls_u32 filters with new ones having wrong parameters, so that u32_change() fails to validate them, the kernel doesn't roll-back correctly, and leaves semi-configured rules. Fix this in u32_walk(), avoiding a call to the walker function on filters that don't have a match rule connected. The side effect is, these "empty" filters are not even dumped when present; but that shouldn't be a problem as long as we are restoring the original behaviour, where semi-configured filters were not even added in the error path of u32_change(). Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge branch 'nfp-tls-implement-the-stream-sync-RX-resync'David S. Miller
Jakub Kicinski says: ==================== nfp: tls: implement the stream sync RX resync This small series adds support for using the device in stream scan RX resync mode which improves the RX resync success rate. Without stream scan it's pretty much impossible to successfully resync a continuous stream. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: tls: implement the stream sync RX resyncJakub Kicinski
The simple RX resync strategy controlled by the kernel does not guarantee as good results as if the device helps by detecting the potential record boundaries and keeping track of them. We've called this strategy stream scan in the tls-offload doc. Implement this strategy for the NFP. The device sends a request for record boundary confirmation, which is then recorded in per-TLS socket state and responded to once record is reached. Because the device keeps track of records passing after the request was sent the response is not as latency sensitive as when kernel just tries to tell the device the information about the next record. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net/tls: add helper for testing if socket is RX offloadedJakub Kicinski
There is currently no way for driver to reliably check that the socket it has looked up is in fact RX offloaded. Add a helper. This allows drivers to catch misbehaving firmware. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: pass packet pointer to nfp_net_parse_meta()Jakub Kicinski
Make nfp_net_parse_meta() take a packet pointer and return a drop/no drop decision. Right now it returns the end of metadata and caller compares it to the packet pointer. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge branch 'nfp-ipv6-tunnel'David S. Miller
John Hurley says: ==================== Add ipv6 tunnel support to NFP The following patches add support for IPv6 tunnel offload to the NFP driver. Patches 1-2 do some code tidy up and prepare existing code for reuse in IPv6 tunnels. Patches 3-4 handle IPv6 tunnel decap (match) rules. Patches 5-8 handle encap (action) rules. Patch 9 adds IPv6 support to the merge and pre-tunnel rule functions. v1->v2: - fix compiler warning when building without CONFIG_IPV6 set - Jakub Kicinski (patch 7) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: update flow merge code to support IPv6 tunnelsJohn Hurley
Both pre-tunnel match rules and flow merge functions parse compiled match/action fields for validation. Update these validation functions to include IPv6 match and action fields. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: support ipv6 tunnel keep-alive messages from fwJohn Hurley
FW sends an update of IPv6 tunnels that are active in a given period. Use this information to update the kernel table so that neighbour entries do not time out when active on the NIC. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: handle notifiers for ipv6 route changesJohn Hurley
A notifier is used to track route changes in the kernel. If a change is made to a route that is offloaded to fw then an update is sent to the NIC. The driver tracks all routes that are offloaded to determine if a kernel change is of interest. Extend the notifier to track IPv6 route changes and create a new list that stores offloaded IPv6 routes. Modify the IPv4 route helper functions to accept varying address lengths. This way, the same core functions can be used to handle IPv4 and IPv6. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: handle ipv6 tunnel no neigh requestJohn Hurley
When fw does not know the next hop for an IPv6 tunnel, it sends a request to the driver. Handle this request by doing a route lookup on the IPv6 address and offloading the next hop to the fw neighbour table. Similar functions already exist to handle IPv4 no neighbour requests. To avoid confusion, append these functions with the _ipv4 tag. There is no change in functionality with this. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: modify pre-tunnel and set tunnel action for ipv6John Hurley
The IPv4 set tunnel action allows the setting of tunnel metadata such as the TTL and ToS values. The pre-tunnel action includes the destination IP address and is used to calculate the next hop from from the neighbour table. Much of the IPv4 tunnel actions can be reused for IPv6 tunnels. Change the names of associated functions and structs to remove the IPv4 identifier and make minor modifcations to support IPv6 tunnel actions. Ensure the pre-tunnel action contains the IPv6 address along with an identifying flag when an IPv6 tunnel action is required. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: offload list of IPv6 tunnel endpoint addressesJohn Hurley
Fw requires a list of IPv6 addresses that are used as tunnel endpoints to enable correct decap of tunneled packets. Store a list of IPv6 endpoints used in rules with a ref counter to track how many times it is in use. Offload the entire list any time a new IPv6 address is added or when an address is removed (ref count is 0). Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: compile match for IPv6 tunnelsJohn Hurley
IPv6 tunnel matches are now supported by firmware. Modify the NFP driver to compile these match rules. IPv6 matches are handled similar to IPv4 tunnels with the difference the address length. The type of tunnel is indicated by the same bitmap that is used in IPv4 with an extra bit signifying that the IPv6 variation should be used. Only compile IPv6 tunnel matches when the fw features symbol indicated that they are compatible with the currently loaded fw. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: move udp tunnel key match compilation to helper functionJohn Hurley
IPv4 UDP and GRE tunnel match rule compile helpers share functions for compiling fields such as IP addresses. However, they handle fields such tunnel IDs differently. Create new helper functions for compiling GRE and UDP tunnel key data. This is in preparation for supporting IPv6 tunnels where these new functions can be reused. This patch does not change functionality. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfp: flower: pass flow rule pointer directly to match functionsJohn Hurley
In kernel 5.1, the flow offload API was introduced along with a helper function to extract the flow_rule from the TC offload struct. Each of the match helper functions are passed the offload struct and extract the flow rule to a local variable. Simplify the code while also removing the extra compat and local variable calls by extracting the rule once in the main match handler, and passing a reference to the rule direct to each helper. This patch does not change driver functionality. Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19hdlcdrv: replace unnecessary assertion in hdlcdrv_registerAditya Pakki
In hdlcdrv_register, failure to register the driver causes a crash. The three callers of hdlcdrv_register all pass valid pointers and do not fail. The patch eliminates the unnecessary BUG_ON assertion. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19nfc: s3fwrn5: replace the assertion with a WARN_ONAditya Pakki
In s3fwrn5_fw_recv_frame, if fw_info->rsp is not empty, the current code causes a crash via BUG_ON. However, s3fwrn5_fw_send_msg does not crash in such a scenario. The patch replaces the BUG_ON by returning the error to the callers and frees up skb. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19Merge branch 'macb-fix-probing-of-PHY-not-described-in-the-dt'David S. Miller
Antoine Tenart says: ==================== net: macb: fix probing of PHY not described in the dt The macb Ethernet driver supports various ways of referencing its network PHY. When a device tree is used the PHY can be referenced with a phy-handle or, if connected to its internal MDIO bus, described in a child node. Some platforms omitted the PHY description while connecting the PHY to the internal MDIO bus and in such cases the MDIO bus has to be scanned "manually" by the macb driver. Prior to the phylink conversion the driver registered the MDIO bus with of_mdiobus_register and then in case the PHY couldn't be retrieved using dt or using phy_find_first (because registering an MDIO bus with of_mdiobus_register masks all PHYs) the macb driver was "manually" scanning the MDIO bus (like mdiobus_register does). The phylink conversion did break this particular case but reimplementing the manual scan of the bus in the macb driver wouldn't be very clean. The solution seems to be registering the MDIO bus based on if the PHYs are described in the device tree or not. There are multiple ways to do this, none is perfect. I chose to check if any of the child nodes of the macb node was a network PHY and based on this to register the MDIO bus with the of_ helper or not. The drawback is boards referencing the PHY through phy-handle, would scan the entire MDIO bus of the macb at boot time (as the MDIO bus would be registered with mdiobus_register). For this solution to work properly of_mdiobus_child_is_phy has to be exported, which means the patch doing so has to be backported to -stable as well. Another possible solution could have been to simply check if the macb node has a child node by counting its sub-nodes. This isn't techically perfect, as there could be other sub-nodes (in practice this should be fine, fixed-link being taken care of in the driver). We could also simply s/of_mdiobus_register/mdiobus_register/ but that could break boards using the PHY description in child node as a selector (which really would be not a proper way to do this...). The real issue here being having PHYs not described in the dt but we have dt backward compatibility, so we have to live with that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: macb: fix probing of PHY not described in the dtAntoine Tenart
This patch fixes the case where the PHY isn't described in the device tree. This is due to the way the MDIO bus is registered in the driver: whether the PHY is described in the device tree or not, the bus is registered through of_mdiobus_register. The function masks all the PHYs and only allow probing the ones described in the device tree. Prior to the Phylink conversion this was also done but later on in the driver the MDIO bus was manually scanned to circumvent the fact that the PHY wasn't described. This patch fixes it in a proper way, by registering the MDIO bus based on if the PHY attached to a given interface is described in the device tree or not. Fixes: 7897b071ac3b ("net: macb: convert to phylink") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19of: mdio: export of_mdiobus_child_is_phyAntoine Tenart
This patch exports of_mdiobus_child_is_phy, allowing to check if a child node is a network PHY. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: mvpp2: cycle comphy to power it downRussell King
Presently, at boot time, the comphys are enabled. For firmware compatibility reasons, the comphy driver does not power down the comphys at boot. Consequently, the ethernet comphys are left active until the network interfaces are brought through an up/down cycle. If the port is never used, the port wastes power needlessly. Arrange for the ethernet comphys to be cycled by the mvpp2 driver as if the interface went through an up/down cycle during driver probe, thereby powering them down. This saves: 270mW per 10G SFP+ port on the Macchiatobin Single Shot (eth0/eth1) 370mW per 10G PHY port on the Macchiatobin Double Shot (eth0/eth1) 160mW on the SFP port on either Macchiatobin flavour (eth3) Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: sfp: report error on failure to read sfp soft statusRussell King
Report a rate-limited error if we fail to read the SFP soft status, and preserve the current status in that case. This avoids I2C bus errors from triggering a link flap. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19tracing: Have the histogram compare functions convert to u64 firstSteven Rostedt (VMware)
The compare functions of the histogram code would be specific for the size of the value being compared (byte, short, int, long long). It would reference the value from the array via the type of the compare, but the value was stored in a 64 bit number. This is fine for little endian machines, but for big endian machines, it would end up comparing zeros or all ones (depending on the sign) for anything but 64 bit numbers. To fix this, first derference the value as a u64 then convert it to the type being compared. Link: http://lkml.kernel.org/r/20191211103557.7bed6928@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 08d43a5fa063e ("tracing: Add lock-free tracing_map") Acked-by: Tom Zanussi <zanussi@kernel.org> Reported-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-19tracing: Avoid memory leak in process_system_preds()Keita Suzuki
When failing in the allocation of filter_item, process_system_preds() goes to fail_mem, where the allocated filter is freed. However, this leads to memory leak of filter->filter_string and filter->prog, which is allocated before and in process_preds(). This bug has been detected by kmemleak as well. Fix this by changing kfree to __free_fiter. unreferenced object 0xffff8880658007c0 (size 32): comm "bash", pid 579, jiffies 4295096372 (age 17.752s) hex dump (first 32 bytes): 63 6f 6d 6d 6f 6e 5f 70 69 64 20 20 3e 20 31 30 common_pid > 10 00 00 00 00 00 00 00 00 65 73 00 00 00 00 00 00 ........es...... backtrace: [<0000000067441602>] kstrdup+0x2d/0x60 [<00000000141cf7b7>] apply_subsystem_event_filter+0x378/0x932 [<000000009ca32334>] subsystem_filter_write+0x5a/0x90 [<0000000072da2bee>] vfs_write+0xe1/0x240 [<000000004f14f473>] ksys_write+0xb4/0x150 [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0 [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 unreferenced object 0xffff888060c22d00 (size 64): comm "bash", pid 579, jiffies 4295096372 (age 17.752s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 e8 d7 41 80 88 ff ff ...........A.... 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000b8c1b109>] process_preds+0x243/0x1820 [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932 [<000000009ca32334>] subsystem_filter_write+0x5a/0x90 [<0000000072da2bee>] vfs_write+0xe1/0x240 [<000000004f14f473>] ksys_write+0xb4/0x150 [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0 [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 unreferenced object 0xffff888041d7e800 (size 512): comm "bash", pid 579, jiffies 4295096372 (age 17.752s) hex dump (first 32 bytes): 70 bc 85 97 ff ff ff ff 0a 00 00 00 00 00 00 00 p............... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000001e04af34>] process_preds+0x71a/0x1820 [<000000003972c7f0>] apply_subsystem_event_filter+0x3be/0x932 [<000000009ca32334>] subsystem_filter_write+0x5a/0x90 [<0000000072da2bee>] vfs_write+0xe1/0x240 [<000000004f14f473>] ksys_write+0xb4/0x150 [<00000000a968b4a0>] do_syscall_64+0x6d/0x1e0 [<000000001a189f40>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: http://lkml.kernel.org/r/20191211091258.11310-1-keitasuzuki.park@sslab.ics.keio.ac.jp Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 404a3add43c9c ("tracing: Only add filter list when needed") Signed-off-by: Keita Suzuki <keitasuzuki.park@sslab.ics.keio.ac.jp> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2019-12-19 The following pull-request contains BPF updates for your *net* tree. We've added 10 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 269 insertions(+), 108 deletions(-). The main changes are: 1) Fix lack of synchronization between xsk wakeup and destroying resources used by xsk wakeup, from Maxim Mikityanskiy. 2) Fix pruning with tail call patching, untrack programs in case of verifier error and fix a cgroup local storage tracking bug, from Daniel Borkmann. 3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer. 4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when only cBPF is present, from Alexander Lobakin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19bpf: Add further test_verifier cases for record_func_keyDaniel Borkmann
Expand dummy prog generation such that we can easily check on return codes and add few more test cases to make sure we keep on tracking pruning behavior. # ./test_verifier [...] #1066/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK #1067/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK Summary: 1580 PASSED, 0 SKIPPED, 0 FAILED Also verified that JIT dump of added test cases looks good. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/df7200b6021444fd369376d227de917357285b65.1576789878.git.daniel@iogearbox.net
2019-12-19bpf: Fix record_func_key to perform backtracking on r3Daniel Borkmann
While testing Cilium with /unreleased/ Linus' tree under BPF-based NodePort implementation, I noticed a strange BPF SNAT engine behavior from time to time. In some cases it would do the correct SNAT/DNAT service translation, but at a random point in time it would just stop and perform an unexpected translation after SYN, SYN/ACK and stack would send a RST back. While initially assuming that there is some sort of a race condition in BPF code, adding trace_printk()s for debugging purposes at some point seemed to have resolved the issue auto-magically. Digging deeper on this Heisenbug and reducing the trace_printk() calls to an absolute minimum, it turns out that a single call would suffice to trigger / not trigger the seen RST issue, even though the logic of the program itself remains unchanged. Turns out the single call changed verifier pruning behavior to get everything to work. Reconstructing a minimal test case, the incorrect JIT dump looked as follows: # bpftool p d j i 11346 0xffffffffc0cba96c: [...] 21: movzbq 0x30(%rdi),%rax 26: cmp $0xd,%rax 2a: je 0x000000000000003a 2c: xor %edx,%edx 2e: movabs $0xffff89cc74e85800,%rsi 38: jmp 0x0000000000000049 3a: mov $0x2,%edx 3f: movabs $0xffff89cc74e85800,%rsi 49: mov -0x224(%rbp),%eax 4f: cmp $0x20,%eax 52: ja 0x0000000000000062 54: add $0x1,%eax 57: mov %eax,-0x224(%rbp) 5d: jmpq 0xffffffffffff6911 62: mov $0x1,%eax [...] Hence, unexpectedly, JIT emitted a direct jump even though retpoline based one would have been needed since in line 2c and 3a we have different slot keys in BPF reg r3. Verifier log of the test case reveals what happened: 0: (b7) r0 = 14 1: (73) *(u8 *)(r1 +48) = r0 2: (71) r0 = *(u8 *)(r1 +48) 3: (15) if r0 == 0xd goto pc+4 R0_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0 4: (b7) r3 = 0 5: (18) r2 = 0xffff89cc74d54a00 7: (05) goto pc+3 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit from 3 to 8: R0_w=inv13 R1=ctx(id=0,off=0,imm=0) R10=fp0 8: (b7) r3 = 2 9: (18) r2 = 0xffff89cc74d54a00 11: safe processed 13 insns (limit 1000000) [...] Second branch is pruned by verifier since considered safe, but issue is that record_func_key() couldn't have seen the index in line 3a and therefore decided that emitting a direct jump at this location was okay. Fix this by reusing our backtracking logic for precise scalar verification in order to prevent pruning on the slot key. This means verifier will track content of r3 all the way backwards and only prune if both scalars were unknown in state equivalence check and therefore poisoned in the first place in record_func_key(). The range is [x,x] in record_func_key() case since the slot always would have to be constant immediate. Correct verification after fix: 0: (b7) r0 = 14 1: (73) *(u8 *)(r1 +48) = r0 2: (71) r0 = *(u8 *)(r1 +48) 3: (15) if r0 == 0xd goto pc+4 R0_w=invP(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=ctx(id=0,off=0,imm=0) R10=fp0 4: (b7) r3 = 0 5: (18) r2 = 0x0 7: (05) goto pc+3 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit from 3 to 8: R0_w=invP13 R1=ctx(id=0,off=0,imm=0) R10=fp0 8: (b7) r3 = 2 9: (18) r2 = 0x0 11: (85) call bpf_tail_call#12 12: (b7) r0 = 1 13: (95) exit processed 15 insns (limit 1000000) [...] And correct corresponding JIT dump: # bpftool p d j i 11 0xffffffffc0dc34c4: [...] 21: movzbq 0x30(%rdi),%rax 26: cmp $0xd,%rax 2a: je 0x000000000000003a 2c: xor %edx,%edx 2e: movabs $0xffff9928b4c02200,%rsi 38: jmp 0x0000000000000049 3a: mov $0x2,%edx 3f: movabs $0xffff9928b4c02200,%rsi 49: cmp $0x4,%rdx 4d: jae 0x0000000000000093 4f: and $0x3,%edx 52: mov %edx,%edx 54: cmp %edx,0x24(%rsi) 57: jbe 0x0000000000000093 59: mov -0x224(%rbp),%eax 5f: cmp $0x20,%eax 62: ja 0x0000000000000093 64: add $0x1,%eax 67: mov %eax,-0x224(%rbp) 6d: mov 0x110(%rsi,%rdx,8),%rax 75: test %rax,%rax 78: je 0x0000000000000093 7a: mov 0x30(%rax),%rax 7e: add $0x19,%rax 82: callq 0x000000000000008e 87: pause 89: lfence 8c: jmp 0x0000000000000087 8e: mov %rax,(%rsp) 92: retq 93: mov $0x1,%eax [...] Also explicitly adding explicit env->allow_ptr_leaks to fixup_bpf_calls() since backtracking is enabled under former (direct jumps as well, but use different test). In case of only tracking different map pointers as in c93552c443eb ("bpf: properly enforce index mask to prevent out-of-bounds speculation"), pruning cannot make such short-cuts, neither if there are paths with scalar and non-scalar types as r3. mark_chain_precision() is only needed after we know that register_is_const(). If it was not the case, we already poison the key on first path and non-const key in later paths are not matching the scalar range in regsafe() either. Cilium NodePort testing passes fine as well now. Note, released kernels not affected. Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/ac43ffdeb7386c5bd688761ed266f3722bb39823.1576789878.git.daniel@iogearbox.net
2019-12-19Merge branch 'phylib-consolidation'David S. Miller
Russell King says: ==================== phylib consolidation Over the last few releases, there has been a push to clean up and consolidate the phylib code. Some cases have been missed, and this series catches those cases. 1. Remove redundant .aneg_done initialisers; calling genphy_aneg_done() for clause 22 PHYs is the default when .aneg_done is not set. 2. Some PHY drivers manually set phydev->pause and phydev->asym_pause, but we have a helper for this - phy_resolve_aneg_pause(), introduced in 2d880b8709c0 ("net: phy: extract pause mode"). Use this in the lxt, marvell and uPD60620 drivers. Incidentally, this brings up the question whether marvell fiber mode is correctly interpreting and advertising the pause parameters. 3. Add a genphy_check_and_restart_aneg() helper, which complements the clause 45 version of this. This will be useful for PHY drivers that open code this logic (e.g. marvell.c) 4. Add a genphy_read_status_fixed() helper to read the fixed-mode status from a clause 22 PHY. lxt and marvell both contain copies of this code, so convert them over. 5. Arrange marvell driver to use genphy_read_lpa() for copper mode. This needs some rearrangement of the code in marvell_read_status_page_an(), but preserves using the PHY specific status register to derive the current negotiation results. 6. Simplify the marvell driver so we can use the genphy_read_status_fixed() helper directly rather than marvell_read_status_page_fixed(). 7. Use positive logic in the marvell driver to determine the link state, and get rid of the REGISTER_LINK_STATUS definition; we already have a definition for this. 8. The marvell driver reads the PHY specific status register multiple times when determining the status: once in marvell_update_link() and again in marvell_read_status_page_an(). This is a waste; rearrange to read the status register once, and pass its value into marvell_read_status_page_an(). We preserve using genphy_update_link() for the copper side. 9. The marvell driver was using private clause 37 definitions, but we have clause 37 definitions in uapi/linux/mii.h. Use the generic definitions. 10. Switch the marvell driver to use phy_modify_changed() to modify the fiber advertisement. 11. Switch the marvell driver to use genphy_check_and_restart_aneg() introduced above rather than open-coding this functionality. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: marvell: use genphy_check_and_restart_aneg()Russell King
Use the helper to check and restart autonegotiation for the marvell fiber page negotiation setting. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19net: phy: marvell: use phy_modify_changed()Russell King
Use phy_modify_changed() to change the fiber advertisement register rather than open coding this functionality. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>