summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-26bonding: rename AD_STATE_* to LACP_STATE_*Andy Roulin
As the LACP actor/partner state is now part of the uapi, rename the 3ad state defines with LACP prefix. The LACP prefix is preferred over BOND_3AD as the LACP standard moved to 802.1AX. Fixes: 826f66b30c2e3 ("bonding: move 802.3ad port state flags to uapi") Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-26sctp: move trace_sctp_probe_path into sctp_outq_sackKevin Kou
The original patch bringed in the "SCTP ACK tracking trace event" feature was committed at Dec.20, 2017, it replaced jprobe usage with trace events, and bringed in two trace events, one is TRACE_EVENT(sctp_probe), another one is TRACE_EVENT(sctp_probe_path). The original patch intended to trigger the trace_sctp_probe_path in TRACE_EVENT(sctp_probe) as below code, +TRACE_EVENT(sctp_probe, + + TP_PROTO(const struct sctp_endpoint *ep, + const struct sctp_association *asoc, + struct sctp_chunk *chunk), + + TP_ARGS(ep, asoc, chunk), + + TP_STRUCT__entry( + __field(__u64, asoc) + __field(__u32, mark) + __field(__u16, bind_port) + __field(__u16, peer_port) + __field(__u32, pathmtu) + __field(__u32, rwnd) + __field(__u16, unack_data) + ), + + TP_fast_assign( + struct sk_buff *skb = chunk->skb; + + __entry->asoc = (unsigned long)asoc; + __entry->mark = skb->mark; + __entry->bind_port = ep->base.bind_addr.port; + __entry->peer_port = asoc->peer.port; + __entry->pathmtu = asoc->pathmtu; + __entry->rwnd = asoc->peer.rwnd; + __entry->unack_data = asoc->unack_data; + + if (trace_sctp_probe_path_enabled()) { + struct sctp_transport *sp; + + list_for_each_entry(sp, &asoc->peer.transport_addr_list, + transports) { + trace_sctp_probe_path(sp, asoc); + } + } + ), But I found it did not work when I did testing, and trace_sctp_probe_path had no output, I finally found that there is trace buffer lock operation(trace_event_buffer_reserve) in include/trace/trace_events.h: static notrace void \ trace_event_raw_event_##call(void *__data, proto) \ { \ struct trace_event_file *trace_file = __data; \ struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\ struct trace_event_buffer fbuffer; \ struct trace_event_raw_##call *entry; \ int __data_size; \ \ if (trace_trigger_soft_disabled(trace_file)) \ return; \ \ __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \ \ entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ \ trace_event_buffer_commit(&fbuffer); \ } The reason caused no output of trace_sctp_probe_path is that trace_sctp_probe_path written in TP_fast_assign part of TRACE_EVENT(sctp_probe), and it will be placed( { assign; } ) after the trace_event_buffer_reserve() when compiler expands Macro, entry = trace_event_buffer_reserve(&fbuffer, trace_file, \ sizeof(*entry) + __data_size); \ \ if (!entry) \ return; \ \ tstruct \ \ { assign; } \ so trace_sctp_probe_path finally can not acquire trace_event_buffer and return no output, that is to say the nest of tracepoint entry function is not allowed. The function call flow is: trace_sctp_probe() -> trace_event_raw_event_sctp_probe() -> lock buffer -> trace_sctp_probe_path() -> trace_event_raw_event_sctp_probe_path() --nested -> buffer has been locked and return no output. This patch is to remove trace_sctp_probe_path from the TP_fast_assign part of TRACE_EVENT(sctp_probe) to avoid the nest of entry function, and trigger sctp_probe_path_trace in sctp_outq_sack. After this patch, you can enable both events individually, # cd /sys/kernel/debug/tracing # echo 1 > events/sctp/sctp_probe/enable # echo 1 > events/sctp/sctp_probe_path/enable Or, you can enable all the events under sctp. # echo 1 > events/sctp/enable Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-26net/mlxfw: Fix out-of-memory error in mfa2 flash burningVladyslav Tarasiuk
The burning process requires to perform internal allocations of large chunks of memory. This memory doesn't need to be contiguous and can be safely allocated by vzalloc() instead of kzalloc(). This patch changes such allocation to avoid possible out-of-memory failure. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-26netfilter: nft_meta: add support for slave device ifindex matchingFlorian Westphal
Allow to match on vrf slave ifindex or name. In case there was no slave interface involved, store 0 in the destination register just like existing iif/oif matching. sdif(name) is restricted to the ipv4/ipv6 input and forward hooks, as it depends on ip(6) stack parsing/storing info in skb->cb[]. Cc: Martin Willi <martin@strongswan.org> Cc: David Ahern <dsahern@kernel.org> Cc: Shrijeet Mukherjee <shrijeet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: place rtclassid handling in a helperFlorian Westphal
skb_dst is an inline helper with a WARN_ON(), so this is a bit more code than it looks like. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: place prandom handling in a helperFlorian Westphal
Move this out of the main eval loop, the numgen expression provides a better alternative to meta random. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: move all interface related keys to helperFlorian Westphal
Reduces repetiveness and reduces size of meta eval function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: move interface kind handling to helperFlorian Westphal
checkpatch complains about == NULL checks in original code, so use !in instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: move cgroup handling to helperFlorian Westphal
Reduce size of main eval function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: move sk uid/git handling to helperFlorian Westphal
Not a hot path. Also, both have copy&paste case statements, so use a common helper for both. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: move pkttype handling to helperFlorian Westphal
When pkttype is loopback, nft_meta performs guesswork to detect broad/multicast packets. Place this in a helper, this is hardly a hot path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26netfilter: nft_meta: move time handling to helperFlorian Westphal
reduce size of the (large) meta evaluation function. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-12-26bpf: Print error message for bpftool cgroup showHechao Li
Currently, when bpftool cgroup show <path> has an error, no error message is printed. This is confusing because the user may think the result is empty. Before the change: $ bpftool cgroup show /sys/fs/cgroup ID AttachType AttachFlags Name $ echo $? 255 After the change: $ ./bpftool cgroup show /sys/fs/cgroup Error: can't query bpf programs attached to /sys/fs/cgroup: Operation not permitted v2: Rename check_query_cgroup_progs to cgroup_has_attached_progs Signed-off-by: Hechao Li <hechaol@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191224011742.3714301-1-hechaol@fb.com
2019-12-26libbpf: Support CO-RE relocations for LDX/ST/STX instructionsAndrii Nakryiko
Clang patch [0] enables emitting relocatable generic ALU/ALU64 instructions (i.e, shifts and arithmetic operations), as well as generic load/store instructions. The former ones are already supported by libbpf as is. This patch adds further support for load/store instructions. Relocatable field offset is encoded in BPF instruction's 16-bit offset section and are adjusted by libbpf based on target kernel BTF. These Clang changes and corresponding libbpf changes allow for more succinct generated BPF code by encoding relocatable field reads as a single ST/LDX/STX instruction. It also enables relocatable access to BPF context. Previously, if context struct (e.g., __sk_buff) was accessed with CO-RE relocations (e.g., due to preserve_access_index attribute), it would be rejected by BPF verifier due to modified context pointer dereference. With Clang patch, such context accesses are both relocatable and have a fixed offset from the point of view of BPF verifier. [0] https://reviews.llvm.org/D71790 Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191223180305.86417-1-andriin@fb.com
2019-12-25Merge branch 'Peer-to-Peer-One-Step-time-stamping'David S. Miller
Richard Cochran says: ==================== Peer to Peer One-Step time stamping This series adds support for PTP (IEEE 1588) P2P one-step time stamping along with a driver for a hardware device that supports this. If the hardware supports p2p one-step, it subtracts the ingress time stamp value from the Pdelay_Request correction field. The user space software stack then simply copies the correction field into the Pdelay_Response, and on transmission the hardware adds the egress time stamp into the correction field. This new functionality extends CONFIG_NETWORK_PHY_TIMESTAMPING to cover MII snooping devices, but it still depends on phylib, just as that option does. Expanding beyond phylib is not within the scope of the this series. User space support is available in the current linuxptp master branch. - Patch 1 adds phy_device methods for existing time stamping fields. - Patches 2-5 convert the stack and drivers to the new methods. - Patch 6 moves code around the dp83640 driver. - Patches 7-10 add support for MII time stamping in non-PHY devices. - Patch 11 adds the new P2P 1-step option. - Patch 12 adds a driver implementing the new option. Thanks, Richard Changed in v9: ~~~~~~~~~~~~~~ - Fix two more drivers' switch/case blocks WRT the new HWTSTAMP ioctl. - Picked up two more review tags from Andrew. Changed in v8: ~~~~~~~~~~~~~~ - Avoided adding forward functional declarations in the dp83640 driver. - Picked up Florian's new review tags and another one from Andrew. Changed in v7: ~~~~~~~~~~~~~~ - Converted pr_debug|err to dev_ variants in new driver. - Fixed device tree documentation per Rob's v6 review. - Picked up Andrew's and Rob's review tags. - Silenced sparse warnings in new driver. Changed in v6: ~~~~~~~~~~~~~~ - Added methods for accessing the phy_device time stamping fields. - Adjust the device tree documentation per Rob's v5 review. - Fixed the build failures due to missing exports. Changed in v5: ~~~~~~~~~~~~~~ - Fixed build failure in macvlan. - Fixed latent bug with its gcc warning in the driver. Changed in v4: ~~~~~~~~~~~~~~ - Correct error paths and PTR_ERR return values in the framework. - Expanded KernelDoc comments WRT PHY locking. - Pick up Andrew's review tag. Changed in v3: ~~~~~~~~~~~~~~ - Simplify the device tree binding and document the time stamping phandle by itself. Changed in v2: ~~~~~~~~~~~~~~ - Per the v1 review, changed the modeling of MII time stamping devices. They are no longer a kind of mdio device. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25ptp: Add a driver for InES time stamping IP core.Richard Cochran
The InES at the ZHAW offers a PTP time stamping IP core. The FPGA logic recognizes and time stamps PTP frames on the MII bus. This patch adds a driver for the core along with a device tree binding to allow hooking the driver to MII buses. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: Introduce peer to peer one step PTP time stamping.Richard Cochran
The 1588 standard defines one step operation for both Sync and PDelay_Resp messages. Up until now, hardware with P2P one step has been rare, and kernel support was lacking. This patch adds support of the mode in anticipation of new hardware developments. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: mdio: of: Register discovered MII time stampers.Richard Cochran
When parsing a PHY node, register its time stamper, if any, and attach the instance to the PHY device. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25dt-bindings: ptp: Introduce MII time stamping devices.Richard Cochran
This patch add a new binding that allows non-PHY MII time stamping devices to find their buses. The new documentation covers both the generic binding and one upcoming user. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: Add a layer for non-PHY MII time stamping drivers.Richard Cochran
While PHY time stamping drivers can simply attach their interface directly to the PHY instance, stand alone drivers require support in order to manage their services. Non-PHY MII time stamping drivers have a control interface over another bus like I2C, SPI, UART, or via a memory mapped peripheral. The controller device will be associated with one or more time stamping channels, each of which sits snoops in on a MII bus. This patch provides a glue layer that will enable time stamping channels to find their controlling device. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: Introduce a new MII time stamping interface.Richard Cochran
Currently the stack supports time stamping in PHY devices. However, there are newer, non-PHY devices that can snoop an MII bus and provide time stamps. In order to support such devices, this patch introduces a new interface to be used by both PHY and non-PHY devices. In addition, the one and only user of the old PHY time stamping API is converted to the new interface. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: phy: dp83640: Move the probe and remove methods around.Richard Cochran
An upcoming patch will change how the PHY time stamping functions are registered with the networking stack, and adapting this driver would entail adding forward declarations for four time stamping methods. However, forward declarations are considered to be stylistic defects. This patch avoids the issue by moving the probe and remove methods immediately above the phy_driver interface structure. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: netcp_ethss: Use the PHY time stamping interface.Richard Cochran
The netcp_ethss driver tests fields of the phy_device in order to determine whether to defer to the PHY's time stamping functionality. This patch replaces the open coded logic with an invocation of the proper methods. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: ethtool: Use the PHY time stamping interface.Richard Cochran
The ethtool layer tests fields of the phy_device in order to determine whether to invoke the PHY's tsinfo ethtool callback. This patch replaces the open coded logic with an invocation of the proper methods. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: vlan: Use the PHY time stamping interface.Richard Cochran
The vlan layer tests fields of the phy_device in order to determine whether to invoke the PHY's tsinfo ethtool callback. This patch replaces the open coded logic with an invocation of the proper methods. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: macvlan: Use the PHY time stamping interface.Richard Cochran
The macvlan layer tests fields of the phy_device in order to determine whether to invoke the PHY's tsinfo ethtool callback. This patch replaces the open coded logic with an invocation of the proper methods. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25net: phy: Introduce helper functions for time stamping support.Richard Cochran
Some parts of the networking stack and at least one driver test fields within the 'struct phy_device' in order to query time stamping capabilities and to invoke time stamping methods. This patch adds a functional interface around the time stamping fields. This will allow insulating the callers from future changes to the details of the time stamping implemenation. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25ata: ahci_brcm: Add missing clock management during recoveryFlorian Fainelli
The downstream implementation of ahci_brcm.c did contain clock management recovery, but until recently, did that outside of the libahci_platform helpers and this was unintentionally stripped out while forward porting the patch upstream. Add the missing clock management during recovery and sleep for 10 milliseconds per the design team recommendations to ensure the SATA PHY controller and AFE have been fully quiesced. Fixes: eb73390ae241 ("ata: ahci_brcm: Recover from failures to identify devices") Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINEFlorian Fainelli
Set AHCI_HFLAG_DELAY_ENGINE for the BCM7425 AHCI controller thus making it conforming to the 'strict' AHCI implementation which this controller is based on. This solves long link establishment with specific hard drives (e.g.: Seagate ST1000VM002-9ZL1 SC12) that would otherwise have to complete the error recovery handling before finally establishing a succesful SATA link at the desired speed. We re-order the hpriv->flags assignment to also remove the NONCQ quirk since we can set the flag directly. Fixes: 9586114cf1e9 ("ata: ahci_brcmstb: add support MIPS-based platforms") Fixes: 423be77daabe ("ata: ahci_brcmstb: add quirk for broken ncq") Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25ata: ahci_brcm: Fix AHCI resources managementFlorian Fainelli
The AHCI resources management within ahci_brcm.c is a little convoluted, largely because it historically had a dedicated clock that was managed within this file in the downstream tree. Once brough upstream though, the clock was left to be managed by libahci_platform.c which is entirely appropriate. This patch series ensures that the AHCI resources are fetched and enabled before any register access is done, thus avoiding bus errors on platforms which clock gate the controller by default. As a result we need to re-arrange the suspend() and resume() functions in order to avoid accessing registers after the clocks have been turned off respectively before the clocks have been turned on. Finally, we can refactor brcm_ahci_get_portmask() in order to fetch the number of ports from hpriv->mmio which is now accessible without jumping through hoops like we used to do. The commit pointed in the Fixes tag is both old and new enough not to require major headaches for backporting of this patch. Fixes: eba68f829794 ("ata: ahci_brcmstb: rename to support across Broadcom SoC's") Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys()Florian Fainelli
This reverts commit 6bb86fefa086faba7b60bb452300b76a47cde1a5 ("libahci_platform: Staticize ahci_platform_<en/dis>able_phys()") we are going to need ahci_platform_{enable,disable}_phys() in a subsequent commit for ahci_brcm.c in order to properly control the PHY initialization order. Also make sure the function prototypes are declared in include/linux/ahci_platform.h as a result. Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25Merge branch 'hsr-fix-several-bugs-in-hsr-module'David S. Miller
Taehee Yoo says: ==================== hsr: fix several bugs in hsr module 1. The first patch fixes debugfs warning when it's opened when hsr module is being removed. debugfs file is opened, it tries to hold .owner module, but it would print warning messages if it couldn't hold .owner module. In order to avoid the warning message, this patch makes hsr module does not set .owner. Unsetting .owner is safe because these are protected by inode_lock(). 2. The second patch fixes wrong error handling of hsr_dev_finalize() a) hsr_dev_finalize() calls debugfs_create_{dir/file} to create debugfs. it checks NULL pointer but debugfs don't return NULL so it's wrong code. b) hsr_dev_finalize() calls register_netdevice(). so if it fails after register_netdevice(), it should call unregister_netdevice(). But it doesn't. c) debugfs doesn't affect any actual logic of hsr module. So, the failure of creating of debugfs could be ignored. 3. The third patch adds hsr root debugfs directory. When hsr interface is created, it creates debugfs directory in /sys/kernel/debug/<interface name>. It's a little bit faulty path because if an interface is the same with another directory name in the same path, it will fail. If hsr root directory is existing, the possibility of failure of creating debugfs file will be reduced. 4. The fourth patch adds debugfs rename routine. debugfs directory name is the same with hsr interface name. So hsr interface name is changed, debugfs directory name should be changed too. 5. The fifth patch fixes a race condition in node list add and del. hsr nodes are protected by RCU and there is no write side lock. But node insertions and deletions could be being operated concurrently. So write side locking is needed. 6. The Sixth patch resets network header Tap routine is enabled, below message will be printed. [ 175.852292][ C3] protocol 88fb is buggy, dev veth0 hsr module doesn't set network header for supervision frame. But tap routine validates network header. If network header wasn't set, it resets and warns about it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25hsr: reset network header when supervision frame is createdTaehee Yoo
The supervision frame is L2 frame. When supervision frame is created, hsr module doesn't set network header. If tap routine is enabled, dev_queue_xmit_nit() is called and it checks network_header. If network_header pointer wasn't set(or invalid), it resets network_header and warns. In order to avoid unnecessary warning message, resetting network_header is needed. Test commands: ip netns add nst ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link set veth1 netns nst ip link set veth3 netns nst ip link set veth0 up ip link set veth2 up ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip netns exec nst ip link set veth1 up ip netns exec nst ip link set veth3 up ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3 ip netns exec nst ip a a 192.168.100.2/24 dev hsr1 ip netns exec nst ip link set hsr1 up tcpdump -nei veth0 Splat looks like: [ 175.852292][ C3] protocol 88fb is buggy, dev veth0 Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25hsr: fix a race condition in node list insertion and deletionTaehee Yoo
hsr nodes are protected by RCU and there is no write side lock. But node insertions and deletions could be being operated concurrently. So write side locking is needed. Test commands: ip netns add nst ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link set veth1 netns nst ip link set veth3 netns nst ip link set veth0 up ip link set veth2 up ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip netns exec nst ip link set veth1 up ip netns exec nst ip link set veth3 up ip netns exec nst ip link add hsr1 type hsr slave1 veth1 slave2 veth3 ip netns exec nst ip a a 192.168.100.2/24 dev hsr1 ip netns exec nst ip link set hsr1 up for i in {0..9} do for j in {0..9} do for k in {0..9} do for l in {0..9} do arping 192.168.100.2 -I hsr0 -s 00:01:3$i:4$j:5$k:6$l -c1 & done done done done Splat looks like: [ 236.066091][ T3286] list_add corruption. next->prev should be prev (ffff8880a5940300), but was ffff8880a5940d0. [ 236.069617][ T3286] ------------[ cut here ]------------ [ 236.070545][ T3286] kernel BUG at lib/list_debug.c:25! [ 236.071391][ T3286] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 236.072343][ T3286] CPU: 0 PID: 3286 Comm: arping Tainted: G W 5.5.0-rc1+ #209 [ 236.073463][ T3286] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 236.074695][ T3286] RIP: 0010:__list_add_valid+0x74/0xd0 [ 236.075499][ T3286] Code: 48 39 da 75 27 48 39 f5 74 36 48 39 dd 74 31 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 b [ 236.078277][ T3286] RSP: 0018:ffff8880aaa97648 EFLAGS: 00010286 [ 236.086991][ T3286] RAX: 0000000000000075 RBX: ffff8880d4624c20 RCX: 0000000000000000 [ 236.088000][ T3286] RDX: 0000000000000075 RSI: 0000000000000008 RDI: ffffed1015552ebf [ 236.098897][ T3286] RBP: ffff88809b53d200 R08: ffffed101b3c04f9 R09: ffffed101b3c04f9 [ 236.099960][ T3286] R10: 00000000308769a1 R11: ffffed101b3c04f8 R12: ffff8880d4624c28 [ 236.100974][ T3286] R13: ffff8880d4624c20 R14: 0000000040310100 R15: ffff8880ce17ee02 [ 236.138967][ T3286] FS: 00007f23479fa680(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000 [ 236.144852][ T3286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 236.145720][ T3286] CR2: 00007f4a14bab210 CR3: 00000000a61c6001 CR4: 00000000000606f0 [ 236.146776][ T3286] Call Trace: [ 236.147222][ T3286] hsr_add_node+0x314/0x490 [hsr] [ 236.153633][ T3286] hsr_forward_skb+0x2b6/0x1bc0 [hsr] [ 236.154362][ T3286] ? rcu_read_lock_sched_held+0x90/0xc0 [ 236.155091][ T3286] ? rcu_read_lock_bh_held+0xa0/0xa0 [ 236.156607][ T3286] hsr_dev_xmit+0x70/0xd0 [hsr] [ 236.157254][ T3286] dev_hard_start_xmit+0x160/0x740 [ 236.157941][ T3286] __dev_queue_xmit+0x1961/0x2e10 [ 236.158565][ T3286] ? netdev_core_pick_tx+0x2e0/0x2e0 [ ... ] Reported-by: syzbot+3924327f9ad5f4d2b343@syzkaller.appspotmail.com Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25hsr: rename debugfs file when interface name is changedTaehee Yoo
hsr interface has own debugfs file, which name is same with interface name. So, interface name is changed, debugfs file name should be changed too. Fixes: fc4ecaeebd26 ("net: hsr: add debugfs support for display node list") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25hsr: add hsr root debugfs directoryTaehee Yoo
In current hsr code, when hsr interface is created, it creates debugfs directory /sys/kernel/debug/<interface name>. If there is same directory or file name in there, it fails. In order to reduce possibility of failure of creation of debugfs, this patch adds root directory. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 Before this patch: /sys/kernel/debug/hsr0/node_table After this patch: /sys/kernel/debug/hsr/hsr0/node_table Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25hsr: fix error handling routine in hsr_dev_finalize()Taehee Yoo
hsr_dev_finalize() is called to create new hsr interface. There are some wrong error handling codes. 1. wrong checking return value of debugfs_create_{dir/file}. These function doesn't return NULL. If error occurs in there, it returns error pointer. So, it should check error pointer instead of NULL. 2. It doesn't unregister interface if it fails to setup hsr interface. If it fails to initialize hsr interface after register_netdevice(), it should call unregister_netdevice(). 3. Ignore failure of creation of debugfs If creating of debugfs dir and file is failed, creating hsr interface will be failed. But debugfs doesn't affect actual logic of hsr module. So, ignoring this is more correct and this behavior is more general. Fixes: c5a759117210 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25hsr: avoid debugfs warning message when module is removeTaehee Yoo
When hsr module is being removed, debugfs_remove() is called to remove both debugfs directory and file. When module is being removed, module state is changed to MODULE_STATE_GOING then exit() is called. At this moment, module couldn't be held so try_module_get() will be failed. debugfs's open() callback tries to hold the module if .owner is existing. If it fails, warning message is printed. CPU0 CPU1 delete_module() try_stop_module() hsr_exit() open() <-- WARNING debugfs_remove() In order to avoid the warning message, this patch makes hsr module does not set .owner. Unsetting .owner is safe because these are protected by inode_lock(). Test commands: #SHELL1 ip link add dummy0 type dummy ip link add dummy1 type dummy while : do ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 modprobe -rv hsr done #SHELL2 while : do cat /sys/kernel/debug/hsr0/node_table done Splat looks like: [ 101.223783][ T1271] ------------[ cut here ]------------ [ 101.230309][ T1271] debugfs file owner did not clean up at exit: node_table [ 101.230380][ T1271] WARNING: CPU: 3 PID: 1271 at fs/debugfs/file.c:309 full_proxy_open+0x10f/0x650 [ 101.233153][ T1271] Modules linked in: hsr(-) dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_d] [ 101.237112][ T1271] CPU: 3 PID: 1271 Comm: cat Tainted: G W 5.5.0-rc1+ #204 [ 101.238270][ T1271] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 101.240379][ T1271] RIP: 0010:full_proxy_open+0x10f/0x650 [ 101.241166][ T1271] Code: 48 c1 ea 03 80 3c 02 00 0f 85 c1 04 00 00 49 8b 3c 24 e8 04 86 7e ff 84 c0 75 2d 4c 8 [ 101.251985][ T1271] RSP: 0018:ffff8880ca22fa38 EFLAGS: 00010286 [ 101.273355][ T1271] RAX: dffffc0000000008 RBX: ffff8880cc6e6200 RCX: 0000000000000000 [ 101.274466][ T1271] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff8880c4dd5c14 [ 101.275581][ T1271] RBP: 0000000000000000 R08: fffffbfff2922f5d R09: 0000000000000000 [ 101.276733][ T1271] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffffc0551bc0 [ 101.277853][ T1271] R13: ffff8880c4059a48 R14: ffff8880be50a5e0 R15: ffffffff941adaa0 [ 101.278956][ T1271] FS: 00007f8871cda540(0000) GS:ffff8880da800000(0000) knlGS:0000000000000000 [ 101.280216][ T1271] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 101.282832][ T1271] CR2: 00007f88717cfd10 CR3: 00000000b9440005 CR4: 00000000000606e0 [ 101.283974][ T1271] Call Trace: [ 101.285328][ T1271] do_dentry_open+0x63c/0xf50 [ 101.286077][ T1271] ? open_proxy_open+0x270/0x270 [ 101.288271][ T1271] ? __x64_sys_fchdir+0x180/0x180 [ 101.288987][ T1271] ? inode_permission+0x65/0x390 [ 101.289682][ T1271] path_openat+0x701/0x2810 [ 101.290294][ T1271] ? path_lookupat+0x880/0x880 [ 101.290957][ T1271] ? check_chain_key+0x236/0x5d0 [ 101.291676][ T1271] ? __lock_acquire+0xdfe/0x3de0 [ 101.292358][ T1271] ? sched_clock+0x5/0x10 [ 101.292962][ T1271] ? sched_clock_cpu+0x18/0x170 [ 101.293644][ T1271] ? find_held_lock+0x39/0x1d0 [ 101.305616][ T1271] do_filp_open+0x17a/0x270 [ 101.306061][ T1271] ? may_open_dev+0xc0/0xc0 [ ... ] Fixes: fc4ecaeebd26 ("net: hsr: add debugfs support for display node list") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25MAINTAINERS: Add additional maintainers to ENA Ethernet driverNetanel Belgazal
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-25libata: Fix retrieving of active qcsSascha Hauer
ata_qc_complete_multiple() is called with a mask of the still active tags. mv_sata doesn't have this information directly and instead calculates the still active tags from the started tags (ap->qc_active) and the finished tags as (ap->qc_active ^ done_mask) Since 28361c40368 the hw_tag and tag are no longer the same and the equation is no longer valid. In ata_exec_internal_sg() ap->qc_active is initialized as 1ULL << ATA_TAG_INTERNAL, but in hardware tag 0 is started and this will be in done_mask on completion. ap->qc_active ^ done_mask becomes 0x100000000 ^ 0x1 = 0x100000001 and thus tag 0 used as the internal tag will never be reported as completed. This is fixed by introducing ata_qc_get_active() which returns the active hardware tags and calling it where appropriate. This is tested on mv_sata, but sata_fsl and sata_nv suffer from the same problem. There is another case in sata_nv that most likely needs fixing as well, but this looks a little different, so I wasn't confident enough to change that. Fixes: 28361c403683 ("libata: add extra internal command") Cc: stable@vger.kernel.org Tested-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Add missing export of ata_qc_get_active(), as per Pali. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-25Merge tag 'devfreq-fixes-for-5.5-rc4' of ↵Rafael J. Wysocki
git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux Pull devfreq fixes for 5.5-rc4 from Chanwoo Choi: "1. Fix the build error of tegra*-devfreq.c when COMPILE_TEST is enabled. 2. Drop unneeded PM_OPP dependency from each driver in Kconfig." * tag 'devfreq-fixes-for-5.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux: PM / devfreq: tegra: Add COMMON_CLK dependency PM / devfreq: Drop explicit selection of PM_OPP
2019-12-24Merge branch 's390-qeth-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2019-12-23 please apply the following patch series for qeth to your net tree. This brings two fixes for errors during device initialization, deals with several issues in the vnicc control code, and adds a missing lock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24s390/qeth: fix initialization on old HWJulian Wiedmann
I stumbled over an old OSA model that claims to support DIAG_ASSIST, but then rejects the cmd to query its DIAG capabilities. In the old code this was ok, as the returned raw error code was > 0. Now that we translate the raw codes to errnos, the "rc < 0" causes us to fail the initialization of the device. The fix is trivial: don't bail out when the DIAG query fails. Such an error is not critical, we can still use the device (with a slightly reduced set of features). Fixes: 742d4d40831d ("s390/qeth: convert remaining legacy cmd callbacks") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24s390/qeth: vnicc Fix init to defaultAlexandra Winter
During vnicc_init wanted_char should be compared to cur_char and not to QETH_VNICC_DEFAULT. Without this patch there is no way to enforce the default values as desired values. Note, that it is expected, that a card comes online with default values. This patch was tested with private card firmware. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24s390/qeth: Fix vnicc_is_in_use if rx_bcast not setAlexandra Winter
Symptom: After vnicc/rx_bcast has been manually set to 0, bridge_* sysfs parameters can still be set or written. Only occurs on HiperSockets, as OSA doesn't support changing rx_bcast. Vnic characteristics and bridgeport settings are mutually exclusive. rx_bcast defaults to 1, so manually setting it to 0 should disable bridge_* parameters. Instead it makes sense here to check the supported mask. If the card does not support vnicc at all, bridge commands are always allowed. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24s390/qeth: fix false reporting of VNIC CHAR config failureAlexandra Winter
Symptom: Error message "Configuring the VNIC characteristics failed" in dmesg whenever an OSA interface on z15 is set online. The VNIC characteristics get re-programmed when setting a L2 device online. This follows the selected 'wanted' characteristics - with the exception that the INVISIBLE characteristic unconditionally gets switched off. For devices that don't support INVISIBLE (ie. OSA), the resulting IO failure raises a noisy error message ("Configuring the VNIC characteristics failed"). For IQD, INVISIBLE is off by default anyways. So don't unnecessarily special-case the INVISIBLE characteristic, and thereby suppress the misleading error message on OSA devices. Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24s390/qeth: lock the card while changing its hsuidJulian Wiedmann
qeth_l3_dev_hsuid_store() initially checks the card state, but doesn't take the conf_mutex to ensure that the card stays in this state while being reconfigured. Rework the code to take this lock, and drop a redundant state check in a helper function. Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24s390/qeth: fix qdio teardown after early init errorJulian Wiedmann
qeth_l?_set_online() goes through a number of initialization steps, and on any error uses qeth_l?_stop_card() to tear down the residual state. The first initialization step is qeth_core_hardsetup_card(). When this fails after having established a QDIO context on the device (ie. somewhere after qeth_mpc_initialize()), qeth_l?_stop_card() doesn't shut down this QDIO context again (since the card state hasn't progressed from DOWN at this stage). Even worse, we then call qdio_free() as final teardown step to free the QDIO data structures - while some of them are still hooked into wider QDIO infrastructure such as the IRQ list. This is inevitably followed by use-after-frees and other nastyness. Fix this by unconditionally calling qeth_qdio_clear_card() to shut down the QDIO context, and also to halt/clear any pending activity on the various IO channels. Remove the naive attempt at handling the teardown in qeth_mpc_initialize(), it clearly doesn't suffice and we're handling it properly now in the wider teardown code. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24Merge branch 'Simplify-IPv6-route-offload-API'David S. Miller
Ido Schimmel says: ==================== Simplify IPv6 route offload API Motivation ========== This is the IPv6 counterpart of "Simplify IPv4 route offload API" [1]. The aim of this patch set is to simplify the IPv6 route offload API by making the stack a bit smarter about the notifications it is generating. This allows driver authors to focus on programming the underlying device instead of having to duplicate the IPv6 route insertion logic in their driver, which is error-prone. Details ======= Today, whenever an IPv6 route is added or deleted a notification is sent in the FIB notification chain and it is up to offload drivers to decide if the route should be programmed to the hardware or not. This is not an easy task as in hardware routes are keyed by {prefix, prefix length, table id}, whereas the kernel can store multiple such routes that only differ in metric / nexthop info. This series makes sure that only routes that are actually used in the data path are notified to offload drivers. This greatly simplifies the work these drivers need to do, as they are now only concerned with programming the hardware and do not need to replicate the IPv6 route insertion logic and store multiple identical routes. The route that is notified is the first route in the IPv6 FIB node, which represents a single prefix and length in a given table. In case the route is deleted and there is another route with the same key, a replace notification is emitted. Otherwise, a delete notification is emitted. Unlike IPv4, in IPv6 it is possible to append individual nexthops to an existing multipath route. Therefore, in addition to the replace and delete notifications present in IPv4, an append notification is also used. Testing ======= To ensure there is no degradation in route insertion rates, I averaged the insertion rate of 512k routes (/64 and /128) over 50 runs. Did not observe any degradation. Functional tests are available here [2]. They rely on route trap indication, which is added in a subsequent patch set. In addition, I have been running syzkaller for the past couple of weeks with debug options enabled. Did not observe any problems. Patch set overview ================== Patches #1-#7 gradually introduce the new FIB notifications Patch #8 converts mlxsw to use the new notifications Patch #9 remove the old notifications [1] https://patchwork.ozlabs.org/cover/1209738/ [2] https://github.com/idosch/linux/tree/fib-notifier ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24ipv6: Remove old route notifications and convert listenersIdo Schimmel
Now that mlxsw is converted to use the new FIB notifications it is possible to delete the old ones and use the new replace / append / delete notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>