Age | Commit message | Author
2021-04-16 | gianfar: Drop GFAR_MQ_POLLING support | Claudiu Manoil
Gianfar used to enable all 8 Rx queues (DMA rings) per ethernet device, even though the controller can only support 2 interrupt lines at most. This meant that multiple Rx queues would have to be grouped per NAPI poll routine, and the CPU would have to split the budget and service them in a round robin manner. The overhead of this scheme proved to outweigh the potential benefits. The alternative was to introduce the "Single Queue" polling mode, supporting one Rx queue per NAPI, which became the default packet processing option and helped improve the performance of the driver. MQ_POLLING also relies on undocumented device tree properties to specify how to map the 8 Rx and Tx queues to a given interrupt line (aka "interrupt group"). Using module parameters to enable this mode wasn't an option either. Long story short, MQ_POLLING became obsolete, now it is just dead code, and no one asked for it so far. For the Tx queues, multi-queue support (more than 1 Tx queue per CPU) could be revisited by adding tc MQPRIO support, but again, one has to consider that there are only 2 interrupt lines. So the NAPI poll routine would have to service multiple Tx rings. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | veth: check for NAPI instead of xdp_prog before xmit of XDP frame | Toke Høiland-Jørgensen
The recent patch that tied enabling of veth NAPI to the GRO flag also has the nice side effect that a veth device can be the target of an XDP_REDIRECT without an XDP program needing to be loaded on the peer device. However, the patch adding this extra NAPI mode didn't actually change the check in veth_xdp_xmit() to also look at the new NAPI pointer, so let's fix that. Fixes: 6788fa154546 ("veth: allow enabling NAPI even without XDP") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mld: fix suspicious RCU usage in __ipv6_dev_mc_dec() | Taehee Yoo
__ipv6_dev_mc_dec() internally uses sleepable functions, so its caller must not hold atomic locks. But the caller, addrconf_verify_rtnl(), acquires rcu_read_lock_bh(), so this warning occurs in __ipv6_dev_mc_dec().

Test commands:
    ip netns add A
    ip link add veth0 type veth peer name veth1
    ip link set veth1 netns A
    ip link set veth0 up
    ip netns exec A ip link set veth1 up
    ip a a 2001:db8::1/64 dev veth0 valid_lft 2 preferred_lft 1

Splat looks like:
============================
WARNING: suspicious RCU usage
5.12.0-rc6+ #515 Not tainted
-----------------------------
kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
4 locks held by kworker/4:0/1997:
 #0: ffff88810bd72d48 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x761/0x1440
 #1: ffff888105c8fe00 ((addr_chk_work).work){+.+.}-{0:0}, at: process_one_work+0x795/0x1440
 #2: ffffffffb9279fb0 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0xa/0x20
 #3: ffffffffb8e30860 (rcu_read_lock_bh){....}-{1:2}, at: addrconf_verify_rtnl+0x23/0xc60

stack backtrace:
CPU: 4 PID: 1997 Comm: kworker/4:0 Not tainted 5.12.0-rc6+ #515
Workqueue: ipv6_addrconf addrconf_verify_work
Call Trace:
 dump_stack+0xa4/0xe5
 ___might_sleep+0x27d/0x2b0
 __mutex_lock+0xc8/0x13f0
 ? lock_downgrade+0x690/0x690
 ? __ipv6_dev_mc_dec+0x49/0x2a0
 ? mark_held_locks+0xb7/0x120
 ? mutex_lock_io_nested+0x1270/0x1270
 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
 ? _raw_spin_unlock_irqrestore+0x47/0x50
 ? trace_hardirqs_on+0x41/0x120
 ? __wake_up_common_lock+0xc9/0x100
 ? __wake_up_common+0x620/0x620
 ? memset+0x1f/0x40
 ? netlink_broadcast_filtered+0x2c4/0xa70
 ? __ipv6_dev_mc_dec+0x49/0x2a0
 __ipv6_dev_mc_dec+0x49/0x2a0
 ? netlink_broadcast_filtered+0x2f6/0xa70
 addrconf_leave_solict.part.64+0xad/0xf0
 ? addrconf_join_solict.part.63+0xf0/0xf0
 ? nlmsg_notify+0x63/0x1b0
 __ipv6_ifa_notify+0x22c/0x9c0
 ? inet6_fill_ifaddr+0xbe0/0xbe0
 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0
 ? __local_bh_enable_ip+0xa5/0xf0
 ? ipv6_del_addr+0x347/0x870
 ipv6_del_addr+0x3b1/0x870
 ? addrconf_ifdown+0xfe0/0xfe0
 ? rcu_read_lock_any_held.part.27+0x20/0x20
 addrconf_verify_rtnl+0x8a9/0xc60
 addrconf_verify_work+0xf/0x20
 process_one_work+0x84c/0x1440

To avoid this problem, the RCU-bh read lock is dropped with rcu_read_unlock_bh() for a short time. RCU is used to avoid freeing ifp (struct inet6_ifaddr *) while it is in use. But ifp will not be freed even after rcu_read_unlock_bh(), because in6_ifa_hold(ifp) is called before rcu_read_unlock_bh(). So this is safe.

Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data")
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | Merge branch 'ipa-fw-names' | David S. Miller
Alex Elder says: ==================== net: ipa: allow different firmware names Add the ability to define a "firmware-name" property in the IPA DT node, specifying an alternate name to use for the firmware file. Used only if the AP (Trust Zone) does early IPA initialization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | net: ipa: optionally define firmware name via DT | Alex Elder
IPA initialization includes loading some firmware. This step is done either by the modem or by the AP under Trust Zone. If the AP loads firmware, the name of the firmware file is currently hard-coded ("ipa_fws.mdt"). Add the ability to specify the relative path of the firmware file to use in a property in the Device Tree IPA node. If the property is not found (or if any other error occurs attempting to get it), fall back to using a default relative path. Use the "old" fixed name as the default. Rename the symbol that represents this default to emphasize its purpose. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | dt-bindings: net: qcom,ipa: add firmware-name property | Alex Elder
Add a new optional firmware-name property to the IPA DT node. It is used only if the modem is not doing early initialization (i.e., if the modem-init property is not present). Its value is the name of the firmware file to use; if it's not specified, a default name ("ipa_fws.mdt") is used. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
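The lookup rule described by this binding can be sketched in a few lines of Python. This is only a model, not the driver code: the Device Tree node is represented as a plain dict, the function name `ipa_firmware_name` and the constant name `IPA_FW_DEFAULT` are hypothetical, and only the default file name and the two property names come from the commit messages.

```python
from typing import Optional

# Default file name quoted in the commit message; the constant name is hypothetical.
IPA_FW_DEFAULT = "ipa_fws.mdt"


def ipa_firmware_name(dt_node: dict) -> Optional[str]:
    """Return the firmware name the AP should load, or None if the modem does
    early initialization (modelled as the presence of "modem-init")."""
    if "modem-init" in dt_node:
        return None  # the modem loads the firmware; the AP loads nothing
    name = dt_node.get("firmware-name")
    if isinstance(name, str) and name:
        return name
    # Property missing or unusable: fall back to the default name.
    return IPA_FW_DEFAULT
```

A node with neither property yields the default name, and a malformed `firmware-name` value is treated the same way as a missing one, matching the "any other error" fallback described for the driver.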
2021-04-16 | virtio-net: page_to_skb() use build_skb when there's sufficient tailroom | Xuan Zhuo
In page_to_skb(), if there is enough tailroom to hold skb_shared_info, we can use build_skb() to create the skb directly. There is no need to allocate additional space, and it saves a 'frags slot', which is very friendly to GRO.

If the payload of the received packet is too small (less than GOOD_COPY_LEN), we still copy it directly into the space obtained from napi_alloc_skb(), so that the pages can be reused.

Testing machine: the four queues of the network card are bound to cpu1.

Test command:
    for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done

The size of each UDP packet is 1000, so with this patch there is always enough tailroom to use build_skb(). The sent UDP packets are discarded because no port is listening for them. The softirq load of the machine is 100%; the receive rate reported by sar -n DEV 1 is:
    no build_skb:  956864.00 rxpck/s
    build_skb:    1158465.00 rxpck/s

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
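The decision described above can be summarized as a small Python model. It is a sketch under stated assumptions rather than the driver's logic: the numeric values of `GOOD_COPY_LEN` and `SHINFO_SIZE` are placeholders chosen for illustration, the function and path names are hypothetical, and the exact boundary comparisons in the real code may differ.

```python
# Placeholder values for illustration only; the real ones come from the kernel.
GOOD_COPY_LEN = 128   # assumed small-packet threshold
SHINFO_SIZE = 320     # assumed aligned size of struct skb_shared_info


def choose_rx_path(payload_len: int, tailroom: int) -> str:
    """Pick how a received buffer would be turned into an skb in this model."""
    if payload_len < GOOD_COPY_LEN:
        # Small packet: copy into a freshly allocated skb so the page can be reused.
        return "copy"
    if tailroom >= SHINFO_SIZE:
        # Enough room behind the data for skb_shared_info: wrap the buffer in place.
        return "build_skb"
    # Otherwise allocate an skb and attach the page as a fragment.
    return "alloc_and_frag"
```

With the 1000-byte packets used in the test above, the model takes the `build_skb` path whenever the buffer leaves at least `SHINFO_SIZE` bytes of tailroom.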
2021-04-16 | net: Add Qcom WWAN control driver | Loic Poulain
The MHI WWAN control driver allows MHI QCOM-based modems to expose different modem control protocols/ports via the WWAN framework, so that userspace modem tools or daemons (e.g. ModemManager) can control WWAN config and state (APN config, SMS, provider selection...). A QCOM-based modem can expose one or several of the following protocols:
- AT: Well known AT commands interactive protocol (microcom, minicom...)
- MBIM: Mobile Broadband Interface Model (libmbim, mbimcli)
- QMI: QCOM MSM/Modem Interface (libqmi, qmicli)
- QCDM: QCOM Modem diagnostic interface (libqcdm)
- FIREHOSE: XML-based protocol for Modem firmware management (qmi-firmware-update)
Note that this patch is mostly a rework of the earlier MHI UCI attempt, which was a generic interface for accessing the MHI bus from userspace. As suggested, this new version is WWAN specific and is dedicated to exposing only the channels used for controlling a modem, and for which related open-source userspace support exists.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | net: Add a WWAN subsystem | Loic Poulain
This change introduces initial support for a WWAN framework. Given the complexity and heterogeneity of existing WWAN hardware and interfaces, there is no strict definition of what a WWAN device is and how it should be represented. It's often a collection of multiple devices that perform the global WWAN feature (netdev, tty, chardev, etc). One usual way to expose modem controls and configuration is via high level protocols such as the well known AT command protocol, MBIM or QMI. The USB modems started to expose them as character devices, and user daemons such as ModemManager learnt to use them. This initial version adds the concept of WWAN port, which is a logical pipe to a modem control protocol. The protocols are exposed raw to the user via a character device, allowing straightforward support in existing tools (ModemManager, ofono...). The WWAN core takes care of the generic part, including character device management, and relies on port driver operations to receive/submit protocol data. Since the different devices exposing protocols for the same WWAN hardware do not necessarily know about each other (e.g. two different USB interfaces, PCI/MHI channel devices...) and can be created/removed in different orders, the WWAN core ensures that all WWAN ports contributing to the 'whole' WWAN feature are grouped under the same virtual WWAN device, relying on the provided parent device (e.g. mhi controller, USB device). It's a 'trick' I copied from Johannes's earlier WWAN subsystem proposal. This initial version is purposely minimalist, it's essentially moving the generic part of the previously proposed mhi_wwan_ctrl driver inside a common WWAN framework, but the implementation is open and flexible enough to allow extension for further drivers. Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | net: mvpp2: Add parsing support for different IPv4 IHL values | Stefan Chulski
Add parser entries for different IPv4 IHL values. Each entry will set the L4 header offset according to the IPv4 IHL field. The L3 header offset will be set during the parsing of the IPv4 protocol. Because of missing parser support for IP header length > 20, RX IPv4 checksum HW offload fails and skb->ip_summed is set to CHECKSUM_NONE (checksum done by the network stack). This patch adds RX IPv4 checksum HW offload capability for frames with IP header length > 20. v1 --> v2 - Improve commit message. Suggested-by: Dana Vardi <danat@marvell.com> Signed-off-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
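As an illustration of the relationship the parser entries encode, the following Python sketch derives the L4 header offset from the IHL field, which counts the IPv4 header length in 32-bit words. It is not the mvpp2 parser; the function name `l4_offset_from_ipv4` is hypothetical, and the offset is measured from the start of the IPv4 header.

```python
def l4_offset_from_ipv4(header: bytes) -> int:
    """Return the byte offset of the L4 header, relative to the IPv4 header start."""
    if not header:
        raise ValueError("empty header")
    version = header[0] >> 4      # high nibble of the first byte
    ihl = header[0] & 0x0F        # low nibble: header length in 32-bit words
    if version != 4:
        raise ValueError("not an IPv4 header")
    if ihl < 5:
        raise ValueError("IHL below the 20-byte minimum")
    return ihl * 4
```

An IHL of 5 gives the option-free 20-byte header, while any larger value (up to 15, or 60 bytes) is the "IP header length > 20" case that this patch teaches the parser to handle.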
2021-04-16 | Merge branch 'r8152--new-chips' | David S. Miller
Hayes Wang says: ==================== r8152: support new chips Support new RTL8153 and RTL8156 series. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | r8152: search the configuration of vendor mode | Hayes Wang
The vendor mode is not always at config #1, so it is necessary to set the correct configuration number. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | r8152: support PHY firmware for RTL8156 series | Hayes Wang
Support new firmware type and method for RTL8156 series. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | r8152: support new chips | Hayes Wang
Support RTL8153C, RTL8153D, RTL8156A, and RTL8156B. The RTL8156A and RTL8156B are 2.5G Ethernet chips. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | r8152: add help function to change mtu | Hayes Wang
Different chips may have different requirements when changing the MTU. Therefore, add a new helper function to rtl_ops to change the MTU. Besides, reset the tx/rx after changing the MTU. Additionally, add mtu_to_size() and size_to_mtu() macros to simplify the code. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | r8152: adjust rtl8152_check_firmware function | Hayes Wang
Use bit operations to record and check the firmware. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | r8152: set inter fram gap time depending on speed | Hayes Wang
Set the maximum inter frame gap time (144ns) for speed 10M/half and 100M/half. It improves the performance for those speeds and has no effect on the other speeds. For 10M/half and 100M/half, a short inter frame gap time prevents the device from using the aggregation feature effectively, because transfers complete too quickly. Therefore, use the maximum value to improve the effect of the aggregation. However, the improvement may not be noticeable with fast CPUs, because they compensate for the effect of the aggregation. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | MAINTAINERS: update my email | Lijun Pan
Update my email and change myself to Reviewer. Signed-off-by: Lijun Pan <lijunp213@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | net: ethernet: mediatek: ppe: fix busy wait loop | Ilya Lipnitskiy
The intention is for the loop to time out if the body does not succeed. The current logic calls time_is_before_jiffies(timeout) which is false until after the timeout, so the loop body never executes. Fix by using readl_poll_timeout as a more standard and less error-prone solution. Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Ilya Lipnitskiy <ilya.lipnitskiy@gmail.com> Cc: Felix Fietkau <nbd@nbd.name> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
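The inverted loop condition is easy to reproduce in a user-space model. The Python sketch below is only an illustration, not the driver code: `FakeClock`, `buggy_wait` and `poll_until_idle` are hypothetical names, the jiffies counter is simulated and ignores wraparound, and `poll_until_idle` merely mimics the general "check, then test the deadline, then wait" shape of a poll-with-timeout helper.

```python
import errno


class FakeClock:
    """Simulated jiffies counter (hypothetical test helper)."""

    def __init__(self):
        self.jiffies = 0

    def tick(self):
        self.jiffies += 1


def time_is_before_jiffies(clock, t):
    # True only once the current time has moved past t (wraparound ignored).
    return clock.jiffies > t


def buggy_wait(clock, is_busy, timeout_ticks):
    timeout = clock.jiffies + timeout_ticks
    # Bug: the condition is false until the timeout has already expired,
    # so the body never runs and the hardware state is never checked.
    while time_is_before_jiffies(clock, timeout):
        if not is_busy():
            return 0
        clock.tick()
    return -errno.ETIMEDOUT


def poll_until_idle(clock, is_busy, timeout_ticks):
    deadline = clock.jiffies + timeout_ticks
    while True:
        if not is_busy():
            return 0                    # condition met
        if clock.jiffies > deadline:
            return -errno.ETIMEDOUT     # gave up after the deadline
        clock.tick()                    # stands in for a short sleep
```

In this model `buggy_wait` reports a timeout even when the device is idle from the start, whereas `poll_until_idle` returns 0 on the first check.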
2021-04-16 | Merge branch 'mptcp-socket-options' | David S. Miller
Mat Martineau says:

====================
mptcp: Improve socket option handling

MPTCP sockets have previously had limited socket option support. The architecture of MPTCP sockets (one userspace-facing MPTCP socket that manages one or more in-kernel TCP subflow sockets) adds complexity for passing options through to lower levels. This patch set adds MPTCP support for socket options commonly used with TCP.

Patch 1 reverts an interim socket option fix (a socket option blocklist) that was merged in the net tree for v5.12.
Patch 2 moves the socket option code to a separate file, with no functional changes.
Patch 3 adds an allowlist for socket options that are known to function with MPTCP. Later patches in this set add more allowed options.
Patches 4 and 5 add infrastructure for syncing MPTCP-level options with the TCP subflows.
Patches 6-12 add support for specific socket options.
Patch 13 adds a socket option self test.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | selftests: mptcp: add packet mark test case | Florian Westphal
Extend mptcp_connect tool with SO_MARK support (-M <value>) and add a test case that checks that the packet mark gets copied to all subflows. This is done by only allowing packets with either skb->mark 1 or 2 via iptables. The DROP rule packet counter is checked; if it's not zero, print an error message and fail the test case. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: sockopt: add TCP_CONGESTION and TCP_INFO | Florian Westphal
TCP_CONGESTION is set for all subflows. The mptcp socket gains icsk_ca_ops too so it can be used to keep the authoritative state that should be set on new/future subflows. TCP_INFO returns information for the first subflow only. The out-of-tree kernel has an MPTCP_INFO getsockopt; this could be added later on. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: setsockopt: SO_DEBUG and no-op options | Florian Westphal
Handle SO_DEBUG and set it on all subflows. Ignore those values not implemented on TCP sockets. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: setsockopt: add SO_INCOMING_CPU | Florian Westphal
Replicate to all subflows. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: setsockopt: add SO_MARK support | Florian Westphal
Value is synced to all subflows. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: setsockopt: support SO_LINGER | Florian Westphal
Similar to PRIORITY/KEEPALIVE: needs to be mirrored to all subflows. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: setsockopt: handle receive/send buffer and device bind | Florian Westphal
Similar to previous patch: needs to be mirrored to all subflows. Device bind is simpler: it is only done on the initial (listener) sk. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY | Florian Westphal
Start with something simple: both take an integer value, both need to be mirrored to all subflows. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: tag sequence_seq with socket state | Florian Westphal
Paolo Abeni suggested to avoid re-syncing new subflows because they inherit options from listener. In case options were set on listener but are not set on mptcp-socket there is no need to do any synchronisation for new subflows. This change sets sockopt_seq of new mptcp sockets to the seq of the mptcp listener sock. Subflow sequence is set to the embedded tcp listener sk. Add a comment explaining why sk_state is involved in sockopt_seq generation. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: add skeleton to sync msk socket options to subflows | Florian Westphal
Handle the following cases:
1. setsockopt is called with multiple subflows. Change might have to be mirrored to all of them. This is done directly in process context/setsockopt call.
2. Outgoing subflow is created after one or several setsockopt() calls have been made. Old setsockopt changes should be synced to the new socket.
3. Incoming subflow, after setsockopt call(s).
Cases 2 and 3 are handled right after the join list is spliced to the conn list. Not all sockopt values can just be copied by value, some require helper calls. Those can acquire the socket lock (which can sleep). If the join->conn list splicing is done from preemptible context, synchronization can be done right away, otherwise it's deferred to a work queue.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
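The three cases can be illustrated with a small Python model. This is a sketch only: the classes `MptcpSock` and `Subflow` and their attributes are hypothetical stand-ins for the kernel structures, options are copied by value, and the deferral to a work queue from non-preemptible context is not modelled.

```python
class Subflow:
    """Stand-in for an in-kernel TCP subflow socket (hypothetical model)."""

    def __init__(self):
        self.opts = {}
        self.sockopt_seq = 0


class MptcpSock:
    """Stand-in for the userspace-facing MPTCP socket (hypothetical model)."""

    def __init__(self):
        self.opts = {}
        self.sockopt_seq = 0
        self.conn_list = []   # subflows already attached to the connection
        self.join_list = []   # new subflows waiting to be spliced in

    def setsockopt(self, name, value):
        # Case 1: record the option and mirror it to every existing subflow.
        self.opts[name] = value
        self.sockopt_seq += 1
        for ssk in self.conn_list:
            ssk.opts[name] = value
            ssk.sockopt_seq = self.sockopt_seq

    def splice_join_list(self):
        # Cases 2 and 3: after splicing, bring any stale subflow up to date.
        self.conn_list.extend(self.join_list)
        self.join_list.clear()
        for ssk in self.conn_list:
            if ssk.sockopt_seq != self.sockopt_seq:
                ssk.opts.update(self.opts)
                ssk.sockopt_seq = self.sockopt_seq
```

Comparing a per-socket sequence number lets the splice step skip subflows that are already in sync, so only subflows that missed a `setsockopt()` call are updated.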
2021-04-16 | mptcp: only admit explicitly supported sockopt | Paolo Abeni
Unrolling mcast state at msk dismantle time is bug-prone, as syzkaller reported:

======================================================
WARNING: possible circular locking dependency detected
5.11.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor905/8822 is trying to acquire lock:
ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323

but task is already holding lock:
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline]
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507

which lock already depends on the new lock.

Instead we can simply forbid any mcast-related setsockopt. Let's do the same with all other unsupported sockopts.

Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations")
Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: move sockopt function into a new file | Paolo Abeni
The MPTCP sockopt implementation is going to be much bigger and more complex soon. Let's move it to a different source file. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | mptcp: revert "mptcp: forbit mcast-related sockopt on MPTCP sockets" | Matthieu Baerts
This change reverts commit 86581852d771 ("mptcp: forbit mcast-related sockopt on MPTCP sockets"). As announced in the cover letter of the mentioned patch above, the following commits introduce a larger MPTCP sockopt implementation refactor. This time, we switch from a blocklist to an allowlist. This is safer for the future where new socket options could be added while not being fully supported with MPTCP sockets and thus causing instabilities. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
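The safety argument for an allowlist can be shown with a short Python comparison. This is an illustrative model, not the MPTCP code: the option sets are small made-up samples, and `SO_FUTURE_OPTION` is a hypothetical name standing in for a socket option added after the lists were written.

```python
import errno

# Illustrative sample sets only; the real lists live in the kernel sources.
BLOCKLIST = {"IP_ADD_MEMBERSHIP", "IPV6_JOIN_GROUP"}
ALLOWLIST = {"SO_KEEPALIVE", "SO_PRIORITY"}


def check_with_blocklist(opt: str) -> int:
    # Anything not explicitly forbidden passes, including unknown options.
    return -errno.EOPNOTSUPP if opt in BLOCKLIST else 0


def check_with_allowlist(opt: str) -> int:
    # Anything not explicitly supported is rejected, including unknown options.
    return 0 if opt in ALLOWLIST else -errno.EOPNOTSUPP
```

An option that appears later in the socket layer passes the blocklist check by default but is rejected by the allowlist check until someone verifies it and adds it.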
2021-04-16 | atl1c: move tx cleanup processing out of interrupt | Gatis Peisenieks
Tx queue cleanup happens in interrupt handler on same core as rx queue processing. Both can take considerable amount of processing in high packet-per-second scenarios. Sending big amounts of packets can stall the rx processing which is unfair and also can lead to out-of-memory condition since __dev_kfree_skb_irq queues the skbs for later kfree in softirq which is not allowed to happen with heavy load in interrupt handler. This puts tx cleanup in its own napi and enables threaded napi to allow the rx/tx queue processing to happen on different cores. The ability to sustain equal amounts of tx/rx traffic increased: from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming Mikrotik 10/25G NIC, from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter. Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | Merge branch 'BR_FDB_LOCAL' | David S. Miller
Vladimir Oltean says: ==================== Pass the BR_FDB_LOCAL information to switchdev drivers Bridge FDB entries with the is_local flag are entries which are terminated locally and not forwarded. Switchdev drivers might want to be notified of these addresses so they can trap them. If they don't program these entries to hardware, there is no guarantee that they will do the right thing with these entries, and they won't be, let's say, flooded. Ideally none of the switchdev drivers should ignore these entries, but having access to the is_local bit is the bare minimum change that should be done in the bridge layer, before this is even possible. These 2 changes are extracted from the larger "RX filtering in DSA" series: https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-8-olteanv@gmail.com/ https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-9-olteanv@gmail.com/ and submitted separately, because they touch all switchdev drivers, while the rest is mostly specific to DSA. This change is not a functional one, in the sense that everybody still ignores the local FDB entries, but this will be changed by further patches at least for DSA. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | net: bridge: switchdev: include local flag in FDB notifications | Vladimir Oltean
As explained in bugfix commit 6ab4c3117aec ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug, which was that drivers would not know what to do with local FDB entries, because they were not told that they are local. The bug fix was to simply not notify them of those addresses. Let us now add the 'is_local' bit to bridge FDB entries, and make all drivers ignore these entries by their own choice. Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
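A minimal Python model of the resulting behaviour: the notification now carries the local flag, and a driver that does not yet support local entries skips them by its own choice. The names `FdbNotification` and `driver_handle_fdb_add` are hypothetical, and the hardware table is modelled as a plain set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FdbNotification:
    """Hypothetical stand-in for the info passed to switchdev drivers."""
    addr: str
    vid: int
    is_local: bool


def driver_handle_fdb_add(info: FdbNotification, hw_fdb: set) -> bool:
    """Program a forwarding entry; return True if hardware was touched."""
    if info.is_local:
        # Locally terminated address: this model driver ignores it for now.
        return False
    hw_fdb.add((info.addr, info.vid))
    return True
```

Because the decision is now made inside the driver, a driver that wants to trap local addresses can later change only its own handler instead of relying on the bridge to filter the notification.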
2021-04-16 | net: bridge: switchdev: refactor br_switchdev_fdb_notify | Tobias Waldekranz
Instead of having to add more and more arguments to br_switchdev_fdb_call_notifiers, get rid of it and build the info struct directly in br_switchdev_fdb_notify. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16 | bpf: Update selftests to reflect new error states | Daniel Borkmann
Update various selftest error messages:
* The 'Rx tried to sub from different maps, paths, or prohibited types' is reworked into more specific/differentiated error messages for better guidance.
* The change into 'value -4294967168 makes map_value pointer be out of bounds' is due to moving the mixed bounds check into the speculation handling and thus occurring slightly later than above mentioned sanity check.
* The change into 'math between map_value pointer and register with unbounded min value' is similarly due to register sanity check coming before the mixed bounds check.
* The case of 'map access: known scalar += value_ptr from different maps' now loads fine given masks are the same from the different paths (despite max map value size being different).
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 | bpf: Tighten speculative pointer arithmetic mask | Daniel Borkmann
This work tightens the offset mask we use for unprivileged pointer arithmetic in order to mitigate a corner case reported by Piotr and Benedict where in the speculative domain it is possible to advance, for example, the map value pointer by up to value_size-1 out-of-bounds in order to leak kernel memory via side-channel to user space. Before this change, the computed ptr_limit for retrieve_ptr_limit() helper represents largest valid distance when moving pointer to the right or left which is then fed as aux->alu_limit to generate masking instructions against the offset register. After the change, the derived aux->alu_limit represents the largest potential value of the offset register which we mask against which is just a narrower subset of the former limit. For minimal complexity, we call sanitize_ptr_alu() from 2 observation points in adjust_ptr_min_max_vals(), that is, before and after the simulated alu operation. In the first step, we retrieve the alu_state and alu_limit before the operation as well as we branch-off a verifier path and push it to the verification stack as we did before which checks the dst_reg under truncation, in other words, when the speculative domain would attempt to move the pointer out-of-bounds. In the second step, we retrieve the new alu_limit and calculate the absolute distance between both. Moreover, we commit the alu_state and final alu_limit via update_alu_sanitation_state() to the env's instruction aux data, and bail out from there if there is a mismatch due to coming from different verification paths with different states. Reported-by: Piotr Krysiuk <piotras@gmail.com> Reported-by: Benedict Schlueter <benedict.schlueter@rub.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Benedict Schlueter <benedict.schlueter@rub.de>
2021-04-16 | bpf: Move sanitize_val_alu out of op switch | Daniel Borkmann
Add a small sanitize_needed() helper function and move sanitize_val_alu() out of the main opcode switch. In upcoming work, we'll move sanitize_ptr_alu() as well out of its opcode switch so this helps to streamline both. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 | bpf: Refactor and streamline bounds check into helper | Daniel Borkmann
Move the bounds check in adjust_ptr_min_max_vals() into a small helper named sanitize_check_bounds() in order to simplify the former a bit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 | bpf: Improve verifier error messages for users | Daniel Borkmann
Consolidate all error handling and provide more user-friendly error messages from sanitize_ptr_alu() and sanitize_val_alu(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 | bpf: Rework ptr_limit into alu_limit and add common error path | Daniel Borkmann
Small refactor with no semantic changes in order to consolidate the max ptr_limit boundary check. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 | bpf: Ensure off_reg has no mixed signed bounds for all types | Daniel Borkmann
The mixed signed bounds check really belongs in retrieve_ptr_limit() instead of outside of it in adjust_ptr_min_max_vals(). The reason is that this check is not tied to PTR_TO_MAP_VALUE only, but to all pointer types that we handle in retrieve_ptr_limit() and given errors from the latter propagate back to adjust_ptr_min_max_vals() and lead to rejection of the program, it's a better place to reside to avoid anything slipping through for future types. The reason why we must reject such off_reg is that we otherwise would not be able to derive a mask, see details in 9d7eceede769 ("bpf: restrict unknown scalars of mixed signed bounds for unprivileged"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 bpf: Move off_reg into sanitize_ptr_alu Daniel Borkmann
Small refactor to drag off_reg into sanitize_ptr_alu(), so that we can later use off_reg to generalize some of the checks for all pointer types. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 bpf: Use correct permission flag for mixed signed bounds arithmetic Daniel Borkmann
We forbid adding unknown scalars with mixed signed bounds due to the spectre v1 masking mitigation. Hence this also needs the bypass_spec_v1 flag instead of allow_ptr_leaks. Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-04-16 igc: Expose LPI counters Sasha Neftin
Expose EEE Tx and Rx low power idle (LPI) counters via ethtool. An EEE Tx or Rx LPI event occurs when the transmitter or the receiver enters the EEE (IEEE802.3az) LPI state. The counters can be read with `ethtool --statistics <iface>`. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-16 igc: Fix overwrites return value Sasha Neftin
drivers/net/ethernet/intel/igc/igc_i225.c:235 igc_write_nvm_srwr() warn: loop overwrites return value 'ret_val' Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
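The class of bug behind this warning can be illustrated with a generic sketch. This is not the code of igc_write_nvm_srwr(); `write_fn`, `write_all()` and `fail_on_second()` are hypothetical names chosen for the example. The point is that a loop which assigns ret_val on every iteration must stop at the first failure, otherwise a later success hides the earlier error.

```c
#include <stddef.h>

/* Hypothetical per-item operation: 0 on success, negative on error. */
typedef int (*write_fn)(size_t index);

/* Example operation that fails only for the second item (index 1). */
static int fail_on_second(size_t index)
{
	return index == 1 ? -1 : 0;
}

/* Runs write_one() for each index and returns the first error seen. */
static int write_all(size_t count, write_fn write_one)
{
	int ret_val = 0;

	for (size_t i = 0; i < count; i++) {
		ret_val = write_one(i);
		if (ret_val)
			break;	/* without this, the next iteration overwrites ret_val */
	}

	return ret_val;
}
```

With the break removed, write_all(3, fail_on_second) would report success, because the third item succeeds and replaces the error recorded for the second.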
2021-04-16 igc: enable auxiliary PHC functions for the i225 Ederson de Souza
The i225 device offers a number of special PTP Hardware Clock (PHC) features on the Software Defined Pins (SDPs), much like the i210, which is used as inspiration for this patch. It enables two possible functions, namely time stamping external events and periodic output signals. The assignment of PHC functions to the four SDPs can be freely chosen by the user. For external event time stamping, when the level of an SDP (configured as input by the user) changes, an interrupt is generated and the kernel Precision Time Protocol (PTP) subsystem is informed. For the periodic output signals, the i225 is configured to generate them (so the SDP level will change periodically) and the driver also has to keep updating the time of the next level change. However, this work is not necessary for some frequencies, as the i225 takes care of them (namely, anything with a half-cycle of 500ms, 250ms, 125ms or < 70ms). While the i225 allows up to four timers to be used to source the time used on the external events or output signals, this patch uses only one of those timers. The main reason is to keep it simple, as it's not clear how these extra timers would be exposed to users. Note that currently a NIC can expose a single PTP device. Signed-off-by: Ederson de Souza <ederson.desouza@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
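The half-cycle rule stated above can be written down as a small predicate. This is only a sketch of the rule as described in the commit message, not the driver's code: `half_cycle_handled_by_hw()` and `NS_PER_MS` are hypothetical names, and rejecting non-positive values is an assumption added for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_MS 1000000LL	/* nanoseconds per millisecond */

/* True if the hardware alone can keep generating a periodic output with
 * this half-cycle, so no software update of the next level change is
 * needed. Values <= 0 are rejected (assumption, not from the commit). */
static bool half_cycle_handled_by_hw(int64_t half_cycle_ns)
{
	return half_cycle_ns > 0 &&
	       (half_cycle_ns == 500 * NS_PER_MS ||
		half_cycle_ns == 250 * NS_PER_MS ||
		half_cycle_ns == 125 * NS_PER_MS ||
		half_cycle_ns < 70 * NS_PER_MS);
}
```

For example, a 1 Hz square wave has a half-cycle of 500ms and falls under the hardware-handled case, while a half-cycle of 100ms would need the driver to keep reprogramming the next level change.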
2021-04-16 igc: Enable internal i225 PPS Ederson de Souza
The i225 device can produce one interrupt on the full second, much like the i210, which inspired this patch. This patch sets up the full-second interrupt on the i225 and, on receiving it, sends a PPS event to the PTP (Precision Time Protocol) kernel subsystem. The PTP subsystem exposes the PPS events via ioctl and sysfs, and one can use the `testptp` tool (tools/testing/selftests/ptp) to check that the events are being generated. Signed-off-by: Ederson de Souza <ederson.desouza@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>