summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-20Merge branch 'VSOCK-add-vsock_test-test-suite'David S. Miller
Stefano Garzarella says: ==================== VSOCK: add vsock_test test suite The vsock_diag.ko module already has a test suite but the core AF_VSOCK functionality has no tests. This patch series adds several test cases that exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv, connect/accept, half-closed connections, simultaneous connections). The v1 of this series was originally sent by Stefan. v3: - Patch 6: * check the byte received in the recv_byte() * use send(2)/recv(2) instead of write(2)/read(2) to test also flags (e.g. MSG_PEEK) - Patch 8: * removed unnecessary control_expectln("CLOSED") [Stefan]. - removed patches 9,10,11 added in the v2 - new Patch 9 add parameters to list and skip tests (e.g. useful for vmci that doesn't support half-closed socket in the host) - new Patch 10 prints a list of options in the help - new Patch 11 tests MSG_PEEK flags of recv(2) v2: https://patchwork.ozlabs.org/cover/1140538/ v1: https://patchwork.ozlabs.org/cover/847998/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20vsock_test: add SOCK_STREAM MSG_PEEK testStefano Garzarella
Test if the MSG_PEEK flags of recv(2) works as expected. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20testing/vsock: print list of options and descriptionStefano Garzarella
Since we now have several options, in the help we print the list of all supported options and a brief description of them. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20testing/vsock: add parameters to list and skip testsStefano Garzarella
Some tests can fail with transports that have a slightly different behavior, so let's add the possibility to specify which tests to skip. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20vsock_test: wait for the remote to close the connectionStefano Garzarella
Before check if a send returns -EPIPE, we need to make sure the connection is closed. To do that, we use epoll API to wait EPOLLRDHUP or EPOLLHUP events on the socket. Reported-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20VSOCK: add AF_VSOCK test casesStefan Hajnoczi
The vsock_test.c program runs a test suite of AF_VSOCK test cases. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20VSOCK: add send_byte()/recv_byte() test utilitiesStefan Hajnoczi
Test cases will want to transfer data. This patch adds utility functions to do this. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20VSOCK: add full barrier between test casesStefan Hajnoczi
See code comment for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20VSOCK: extract connect/accept functions from vsock_diag_test.cStefan Hajnoczi
Many test cases will need to connect to the server or accept incoming connections. This patch extracts these operations into utility functions that can be reused. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20VSOCK: extract utility functions from vsock_diag_test.cStefan Hajnoczi
Move useful functions into a separate file in preparation for more vsock test programs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20VSOCK: add SPDX identifiers to vsock testsStefan Hajnoczi
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20VSOCK: fix header include in vsock_diag_testStefan Hajnoczi
The vsock_diag_test program directly included ../../../include/uapi/ headers from the source tree. Tests are supposed to use the usr/include/linux/ headers that have been prepared with make headers_install instead. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20net: dsa: ksz: use common define for tag lenMichael Grzeschik
Remove special taglen define KSZ8795_INGRESS_TAG_LEN and use generic KSZ_INGRESS_TAG_LEN instead. Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20Merge branch 's390-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2019-12-18 please apply the following patch series to your net tree. This brings two fixes for initialization / teardown issues, and one ENOTSUPP cleanup. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20s390/qeth: don't return -ENOTSUPP to userspaceJulian Wiedmann
ENOTSUPP is not uapi, use EOPNOTSUPP instead. Fixes: d66cb37e9664 ("qeth: Add new priority queueing options") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20s390/qeth: fix promiscuous mode after resetJulian Wiedmann
When managing the promiscuous mode during an RX modeset, qeth caches the current HW state to avoid repeated programming of the same state on each modeset. But while tearing down a device, we forget to clear the cached state. So when the device is later set online again, the initial RX modeset doesn't program the promiscuous mode since we believe it is already enabled. Fix this by clearing the cached state in the tear-down path. Note that for the SBP variant of promiscuous mode, this accidentally works right now because we unconditionally restore the SBP role while re-initializing. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20s390/qeth: handle error due to unsupported transport modeJulian Wiedmann
Along with z/VM NICs, there's additional device types that only support a specific transport mode (eg. external-bridged IQD). Identify the corresponding error code, and raise a fitting error message so that the user knows to adjust their device configuration. On top of that also fix the subsequent error path, so that the rejected cmd doesn't need to wait for a timeout but gets cancelled straight away. Fixes: 4a71df50047f ("qeth: new qeth device driver") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20Merge branch 'add-dsa-switch-support-for-ar9331'David S. Miller
Oleksij Rempel says: ==================== add dsa switch support for ar9331 changes v6: - remove ag71xx changes from this patch set. It needs more work. - ar9331: fix register definition and add ASCII art switch documentation. changes v6: - rebase against net-next changes v5: - remote support for port5. The effort of using this port is questionable. Currently, it is better to not use it at all, then adding buggy support. - remove port enable call back. There is nothing what we actually need to enable. - rebase it against v5.5-rc1 changes v4: - ag71xx: ag71xx_mac_validate fix always false comparison (&& -> ||) - tag_ar9331: use skb_pull_rcsum() instead of skb_pull(). - tag_ar9331: drop skb_set_mac_header() changes v3: - ag71xx: ag71xx_mac_config: ignore MLO_AN_INBAND mode. It is not supported by HW and SW. - ag71xx: ag71xx_mac_validate: return all supported bits on PHY_INTERFACE_MODE_NA changes v2: - move Atheros AR9331 TAG format to separate patch - use netdev_warn_once in the tag driver to reduce potential message spam - typo fixes - reorder tag driver alphabetically - configure switch to maximal frame size - use mdiobus_read/write - fail if mdio sub node is not found - add comment for post reset state - remove deprecated comment about device id - remove phy-handle option for node with fixed-link - ag71xx: set 1G support only for GMII mode This patch series provides dsa switch support for Atheros ar9331 WiSoC. As side effect ag71xx needed to be ported to phylink to make the switch driver (as well phylink based) work properly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20net: dsa: add support for Atheros AR9331 built-in switchOleksij Rempel
Provide basic support for Atheros AR9331 built-in switch. So far it works as port multiplexer without any hardware offloading support. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20net: dsa: add support for Atheros AR9331 TAG formatOleksij Rempel
Add support for tag format used in Atheros AR9331 built-in switch. Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20MIPS: ath79: ar9331: add ar9331-switch nodeOleksij Rempel
Add switch node supported by dsa ar9331 driver. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20dt-bindings: net: dsa: qca, ar9331 switch documentationOleksij Rempel
Atheros AR9331 has built-in 5 port switch. The switch can be configured to use all 5 or 4 ports. One of built-in PHYs can be used by first built-in ethernet controller or to be used directly by the switch over second ethernet controller. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20samples/bpf: Xdp_redirect_cpu fix missing tracepoint attachJesper Dangaard Brouer
When sample xdp_redirect_cpu was converted to use libbpf, the tracepoints used by this sample were not getting attached automatically like with bpf_load.c. The BPF-maps was still getting loaded, thus nobody notice that the tracepoints were not updating these maps. This fix doesn't use the new skeleton code, as this bug was introduced in v5.1 and stable might want to backport this. E.g. Red Hat QA uses this sample as part of their testing. Fixes: bbaf6029c49c ("samples/bpf: Convert XDP samples to libbpf usage") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157685877642.26195.2798780195186786841.stgit@firesoul
2019-12-20Merge branch 'xdpsock'Alexei Starovoitov
Jay Jayatheerthan says: ==================== This series of patches enhances xdpsock application with command line parameters to set transmit packet size and fill pattern among other options. The application has also been enhanced to use Linux Ethernet/IP/UDP header structs and calculate IP and UDP checksums. I have measured the performance of the xdpsock application before and after this patch set and have not been able to detect any difference. Packet Size: ------------ There is a new option '-s' or '--tx-pkt-size' to specify the transmit packet size. It ranges from 47 to 4096 bytes. Default packet size is 64 bytes which is same as before. Fill Pattern: ------------- The transmit UDP payload fill pattern is specified using '-P' or '--tx-pkt-pattern'option. It is an unsigned 32 bit field and defaulted to 0x12345678. Packet Count: ------------- The number of packets to send is specified using '-C' or '--tx-pkt-count' option. If it is not specified, the application sends packets forever. Batch Size: ----------- The batch size for transmit, receive and l2fwd features of the application is specified using '-b' or '--batch-size' options. Default value when this option is not provided is 64 (same as before). Duration: --------- The application supports '-d' or '--duration' option to specify number of seconds to run. This is used in tx, rx and l2fwd features. If this option is not provided, the application runs for ever. ==================== Tested-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-12-20samples/bpf: xdpsock: Add option to specify transmit fill patternJay Jayatheerthan
The UDP payload fill pattern can be specified using '-P' or '--tx-pkt-pattern' option. It is an unsigned 32 bit field and defaulted to 0x12345678. The IP and UDP checksum is calculated by the code as per the content of the packet before transmission. Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191220085530.4980-7-jay.jayatheerthan@intel.com
2019-12-20samples/bpf: xdpsock: Add option to specify tx packet sizeJay Jayatheerthan
New option '-s' or '--tx-pkt-size' has been added to specify the transmit packet size. The packet size ranges from 47 to 4096 bytes. When this option is not provided, it defaults to 64 byte packet. The code uses struct ethhdr, struct iphdr and struct udphdr to form the transmit packet. The MAC address, IP address and UDP ports are set to default values. The code calculates IP and UDP checksums before sending the packet. Checksum calculation code in Linux kernel is used for this purpose. The Ethernet FCS is not filled by the code. Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191220085530.4980-6-jay.jayatheerthan@intel.com
2019-12-20samples/bpf: xdpsock: Add option to specify number of packets to sendJay Jayatheerthan
Use '-C' or '--tx-pkt-count' to specify number of packets to send. If it is not specified, the application sends packets forever. If packet count is not a multiple of batch size, last batch sent is less than the batch size. Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191220085530.4980-5-jay.jayatheerthan@intel.com
2019-12-20samples/bpf: xdpsock: Add option to specify batch sizeJay Jayatheerthan
New option to specify batch size for tx, rx and l2fwd has been added. This allows fine tuning to maximize performance. It is specified using '-b' or '--batch_size' options. When not specified default is 64. Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191220085530.4980-4-jay.jayatheerthan@intel.com
2019-12-20samples/bpf: xdpsock: Use common code to handle signal and main exitJay Jayatheerthan
Add code to do cleanup for signals and application completion in a unified fashion. The signal handler sets benckmark_done flag terminating the threads. The cleanup is called before returning from main() function. Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191220085530.4980-3-jay.jayatheerthan@intel.com
2019-12-20samples/bpf: xdpsock: Add duration option to specify how long to runJay Jayatheerthan
The application now supports '-d' or '--duration' option to specify number of seconds to run. This is used in tx, rx and l2fwd features. If this option is not provided, the application runs forever. Signed-off-by: Jay Jayatheerthan <jay.jayatheerthan@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191220085530.4980-2-jay.jayatheerthan@intel.com
2019-12-20selftests/bpf: Preserve errno in test_progs CHECK macrosAndrey Ignatov
It's follow-up for discussion [1] CHECK and CHECK_FAIL macros in test_progs.h can affect errno in some circumstances, e.g. if some code accidentally closes stdout. It makes checking errno in patterns like this unreliable: if (CHECK(!bpf_prog_attach_xattr(...), "tag", "msg")) goto err; CHECK_FAIL(errno != ENOENT); , since by CHECK_FAIL time errno could be affected not only by bpf_prog_attach_xattr but by CHECK as well. Fix it by saving and restoring errno in the macros. There is no "Fixes" tag since no problems were discovered yet and it's rather precaution. test_progs was run with this change and no difference was identified. [1] https://lore.kernel.org/bpf/20191219210907.GD16266@rdna-mbp.dhcp.thefacebook.com/ Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191220000511.1684853-1-rdna@fb.com
2019-12-20Merge branch 'xsk-cleanup'Alexei Starovoitov
Magnus Karlsson says: ==================== This patch set cleans up the ring access functions of AF_XDP in hope that it will now be easier to understand and maintain. I used to get a headache every time I looked at this code in order to really understand it, but now I do think it is a lot less painful. The code has been simplified a lot and as a bonus we get better performance in nearly all cases. On my new 2.1 GHz Cascade Lake machine with a standard default config plus AF_XDP support and CONFIG_PREEMPT on I get the following results in percent performance increases with this patch set compared to without it: Zero-copy (-N): rxdrop txpush l2fwd 1 core: -2% 0% 3% 2 cores: 4% 0% 3% Zero-copy with poll() (-N -p): rxdrop txpush l2fwd 1 core: 3% 0% 1% 2 cores: 21% 0% 9% Skb mode (-S): Shows a 0% to 5% performance improvement over the same benchmarks as above. Here 1 core means that we are running the driver processing and the application on the same core, while 2 cores means that they execute on separate cores. The applications are from the xdpsock sample app. On my older 2.0 Ghz Broadwell machine that I used for the v1, I get the following results: Zero-copy (-N): rxdrop txpush l2fwd 1 core: 4% 5% 4% 2 cores: 1% 0% 2% Zero-copy with poll() (-N -p): rxdrop txpush l2fwd 1 core: 1% 3% 3% 2 cores: 22% 0% 5% Skb mode (-S): Shows a 0% to 1% performance improvement over the same benchmarks as above. When a results says 21 or 22% better, as in the case of poll mode with 2 cores and rxdrop, my first reaction is that it must be a bug. Everything else shows between 0% and 5% performance improvement. What is giving rise to 22%? A quick bisect indicates that it is patches 2, 3, 4, 5, and 6 that are giving rise to most of this improvement. So not one patch in particular, but something around 4% improvement from each one of them. Note that exactly this benchmark has previously had an extraordinary slow down compared to when running without poll syscalls. For all the other poll tests above, the slowdown has always been around 4% for using poll syscalls. But with the bad performing test in question, it was above 25%. Interestingly, after this clean up, the slow down is 4%, just like all the other poll tests. Please take an extra peek at this so I have not messed up something. The 0% for several txpush results are due to the test bottlenecking on a non-CPU HW resource. If I eliminated that bottleneck on my system, I would expect to see an increase there too. Changes v1 -> v2: * Corrected textual errors in the commit logs (Sergei and Martin) * Fixed the functions that detect empty and full rings so that they now operate on the global ring state (Maxim) This patch has been applied against commit a352a82496d1 ("Merge branch 'libbpf-extern-followups'") Structure of the patch set: Patch 1: Eliminate the lazy update threshold used when preallocating entries in the completion ring Patch 2: Simplify the detection of empty and full rings Patch 3: Consolidate the two local producer pointers into one Patch 4: Standardize the naming of the producer ring access functions Patch 5: Eliminate the Rx batch size used for the fill ring Patch 6: Simplify the functions xskq_nb_avail and xskq_nb_free Patch 7: Simplify and standardize the naming of the consumer ring access functions Patch 8: Change the names of the validation functions to improve readability and also the return value of these functions Patch 9: Change the name of xsk_umem_discard_addr() to xsk_umem_release_addr() to better reflect the new names. Requires a name change in the drivers that support AF_XDP zero-copy. Patch 10: Remove unnecessary READ_ONCE of data in the ring Patch 11: Add overall function naming comment and reorder the functions for easier reference Patch 12: Use the struct_size helper function when allocating rings ==================== Reviewed-by: Björn Töpel <bjorn.topel@intel.com> Tested-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-12-20xsk: Use struct_size() helperMagnus Karlsson
Improve readability and maintainability by using the struct_size() helper when allocating the AF_XDP rings. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-13-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Add function naming comments and reorder functionsMagnus Karlsson
Add comments on how the ring access functions are named and how they are supposed to be used for producers and consumers. The functions are also reordered so that the consumer functions are in the beginning and the producer functions in the end, for easier reference. Put this in a separate patch as the diff might look a little odd, but no functionality has changed in this patch. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-12-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Remove unnecessary READ_ONCE of dataMagnus Karlsson
There are two unnecessary READ_ONCE of descriptor data. These are not needed since the data is written by the producer before it signals that the data is available by incrementing the producer pointer. As the access to this producer pointer is serialized and the consumer always reads the descriptor after it has read and synchronized with the producer counter, the write of the descriptor will have fully completed and it does not matter if the consumer has any read tearing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-11-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: ixgbe: i40e: ice: mlx5: Xsk_umem_discard_addr to xsk_umem_release_addrMagnus Karlsson
Change the name of xsk_umem_discard_addr to xsk_umem_release_addr to better reflect the new naming of the AF_XDP queue manipulation functions. As this functions is used by drivers implementing support for AF_XDP zero-copy, it requires a name change to these drivers. The function xsk_umem_release_addr_rq has also changed name in the same fashion. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-10-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Change names of validation functionsMagnus Karlsson
Change the names of the validation functions to better reflect what they are doing. The uppermost ones are reading entries from the rings and only the bottom ones validate entries. So xskq_cons_read_ is a better prefix name. Also change the xskq_cons_read_ functions to return a bool as the the descriptor or address is already returned by reference in the parameters. Everyone is using the return value as a bool anyway. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-9-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Simplify the consumer ring access functionsMagnus Karlsson
Simplify and refactor consumer ring functions. The consumer first "peeks" to find descriptors or addresses that are available to read from the ring, then reads them and finally "releases" these descriptors once it is done. The two local variables cons_tail and cons_head are turned into one single variable called cached_cons. cached_tail referred to the cached value of the global consumer pointer and will be stored in cached_cons. For cached_head, we just use cached_prod instead as it was not used for a consumer queue before. It also better reflects what it really is now: a cached copy of the producer pointer. The names of the functions are also renamed in the same manner as the producer functions. The new functions are called xskq_cons_ followed by what it does. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-8-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Simplify xskq_nb_avail and xskq_nb_freeMagnus Karlsson
At this point, there are no users of the functions xskq_nb_avail and xskq_nb_free that take any other number of entries argument than 1, so let us get rid of the second argument that takes the number of entries. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-7-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Eliminate the RX batch sizeMagnus Karlsson
In the xsk consumer ring code there is a variable called RX_BATCH_SIZE that dictates the minimum number of entries that we try to grab from the fill and Tx rings. In fact, the code always try to grab the maximum amount of entries from these rings. The only thing this variable does is to throw an error if there is less than 16 (as it is defined) entries on the ring. There is no reason to do this and it will just lead to weird behavior from user space's point of view. So eliminate this variable. With this change, we will be able to simplify the xskq_nb_free and xskq_nb_avail code in the next commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-6-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Standardize naming of producer ring access functionsMagnus Karlsson
Adopt the naming of the producer ring access functions to have a similar naming convention as the functions in libbpf, but adapted to the kernel. You first reserve a number of entries that you later submit to the global state of the ring. This is much clearer, IMO, than the one that was in the kernel part. Once renamed, we also discover that two functions are actually the same, so remove one of them. Some of the primitive ring submission operations are also the same so break these out into __xskq_prod_submit that the upper level ring access functions can use. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-5-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Consolidate to one single cached producer pointerMagnus Karlsson
Currently, the xsk ring code has two cached producer pointers: prod_head and prod_tail. This patch consolidates these two into a single one called cached_prod to make the code simpler and easier to maintain. This will be in line with the user space part of the the code found in libbpf, that only uses a single cached pointer. The Rx path only uses the two top level functions xskq_produce_batch_desc and xskq_produce_flush_desc and they both use prod_head and never prod_tail. So just move them over to cached_prod. The Tx XDP_DRV path uses xskq_produce_addr_lazy and xskq_produce_flush_addr_n and unnecessarily operates on both prod_tail and prod_head, so move them over to just use cached_prod by skipping the intermediate step of updating prod_tail. The Tx path in XDP_SKB mode uses xskq_reserve_addr and xskq_produce_addr. They currently use both cached pointers, but we can operate on the global producer pointer in xskq_produce_addr since it has to be updated anyway, thus eliminating the use of both cached pointers. We can also remove the xskq_nb_free in xskq_produce_addr since it is already called in xskq_reserve_addr. No need to do it twice. When there is only one cached producer pointer, we can also simplify xskq_nb_free by removing one argument. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-4-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Simplify detection of empty and full ringsMagnus Karlsson
In order to set the correct return flags for poll, the xsk code has to check if the Rx queue is empty and if the Tx queue is full. This code was unnecessarily large and complex as it used the functions that are used to update the local state from the global state (xskq_nb_free and xskq_nb_avail). Since we are not doing this nor updating any data dependent on this state, we can simplify the functions. Another benefit from this is that we can also simplify the xskq_nb_free and xskq_nb_avail functions in a later commit. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-3-git-send-email-magnus.karlsson@intel.com
2019-12-20xsk: Eliminate the lazy update thresholdMagnus Karlsson
The lazy update threshold was introduced to keep the producer and consumer some distance apart in the completion ring. This was important in the beginning of the development of AF_XDP as the ring format as that point in time was very sensitive to the producer and consumer being on the same cache line. This is not the case anymore as the current ring format does not degrade in any noticeable way when this happens. Moreover, this threshold makes it impossible to run the system with rings that have less than 128 entries. So let us remove this threshold and just get one entry from the ring as in all other functions. This will enable us to remove this function in a later commit. Note that xskq_produce_addr_lazy followed by xskq_produce_flush_addr_n are still not the same function as xskq_produce_addr() as it operates on another cached pointer. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1576759171-28550-2-git-send-email-magnus.karlsson@intel.com
2019-12-20sbitmap: only queue kyber's wait callback if not already activeDavid Jeffery
Under heavy loads where the kyber I/O scheduler hits the token limits for its scheduling domains, kyber can become stuck. When active requests complete, kyber may not be woken up leaving the I/O requests in kyber stuck. This stuck state is due to a race condition with kyber and the sbitmap functions it uses to run a callback when enough requests have completed. The running of a sbt_wait callback can race with the attempt to insert the sbt_wait. Since sbitmap_del_wait_queue removes the sbt_wait from the list first then sets the sbq field to NULL, kyber can see the item as not on a list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This results in the sbt_wait being inserted onto the wait list but ws_active doesn't get incremented. So the sbitmap queue does not know there is a waiter on a wait list. Since sbitmap doesn't think there is a waiter, kyber may never be informed that there are domain tokens available and the I/O never advances. With the sbt_wait on a wait list, kyber believes it has an active waiter so cannot insert a new waiter when reaching the domain's full state. This race can be fixed by only adding the sbt_wait to the queue if the sbq field is NULL. If sbq is not NULL, there is already an action active which will trigger the re-running of kyber. Let it run and add the sbt_wait to the wait list if still needing to wait. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Reported-by: John Pittman <jpittman@redhat.com> Tested-by: John Pittman <jpittman@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20cxgb4: fix refcount init for TC-MQPRIO offloadRahul Lakkireddy
Properly initialize refcount to 1 when hardware queue arrays for TC-MQPRIO offload have been freshly allocated. Otherwise, following warning is observed. Also fix up error path to only free hardware queue arrays when refcount reaches 0. [ 130.075342] ------------[ cut here ]------------ [ 130.075343] refcount_t: addition on 0; use-after-free. [ 130.075355] WARNING: CPU: 0 PID: 10870 at lib/refcount.c:25 refcount_warn_saturate+0xe1/0x100 [ 130.075356] Modules linked in: sch_mqprio iptable_nat ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_umad iw_cxgb4 libcxgb ib_uverbs x86_pkg_temp_thermal cxgb4 igb [ 130.075361] CPU: 0 PID: 10870 Comm: tc Kdump: loaded Not tainted 5.5.0-rc1+ #11 [ 130.075362] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.2 01/16/2015 [ 130.075363] RIP: 0010:refcount_warn_saturate+0xe1/0x100 [ 130.075364] Code: e8 14 41 c1 ff 0f 0b c3 80 3d 44 f4 10 01 00 0f 85 63 ff ff ff 48 c7 c7 38 9f 83 8c 31 c0 c6 05 2e f4 10 01 01 e8 ef 40 c1 ff <0f> 0b c3 48 c7 c7 10 9f 83 8c 31 c0 c6 05 17 f4 10 01 01 e8 d7 40 [ 130.075365] RSP: 0018:ffffa48d00c0b768 EFLAGS: 00010286 [ 130.075366] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001 [ 130.075366] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffff8a2e9fa187d0 [ 130.075367] RBP: ffff8a2e93890000 R08: 0000000000000398 R09: 000000000000003c [ 130.075367] R10: 00000000000142a0 R11: 0000000000000397 R12: ffffa48d00c0b848 [ 130.075368] R13: ffff8a2e94746498 R14: ffff8a2e966f7000 R15: 0000000000000031 [ 130.075368] FS: 00007f689015f840(0000) GS:ffff8a2e9fa00000(0000) knlGS:0000000000000000 [ 130.075369] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 130.075369] CR2: 00000000006762a0 CR3: 00000007cf164005 CR4: 00000000001606f0 [ 130.075370] Call Trace: [ 130.075377] cxgb4_setup_tc_mqprio+0xbee/0xc30 [cxgb4] [ 130.075382] ? cxgb4_ethofld_restart+0x50/0x50 [cxgb4] [ 130.075384] ? pfifo_fast_init+0x7e/0xf0 [ 130.075386] mqprio_init+0x5f4/0x630 [sch_mqprio] [ 130.075389] qdisc_create+0x1bf/0x4a0 [ 130.075390] tc_modify_qdisc+0x1ff/0x770 [ 130.075392] rtnetlink_rcv_msg+0x28b/0x350 [ 130.075394] ? rtnl_calcit.isra.32+0x110/0x110 [ 130.075395] netlink_rcv_skb+0xc6/0x100 [ 130.075396] netlink_unicast+0x1db/0x330 [ 130.075397] netlink_sendmsg+0x2f5/0x460 [ 130.075399] ? _copy_from_user+0x2e/0x60 [ 130.075400] sock_sendmsg+0x59/0x70 [ 130.075401] ____sys_sendmsg+0x1f0/0x230 [ 130.075402] ? copy_msghdr_from_user+0xd7/0x140 [ 130.075403] ___sys_sendmsg+0x77/0xb0 [ 130.075404] ? ___sys_recvmsg+0x84/0xb0 [ 130.075406] ? __handle_mm_fault+0x377/0xaf0 [ 130.075407] __sys_sendmsg+0x53/0xa0 [ 130.075409] do_syscall_64+0x44/0x130 [ 130.075412] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 130.075413] RIP: 0033:0x7f688f13af10 [ 130.075414] Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24 [ 130.075414] RSP: 002b:00007ffe6c7d9988 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 130.075415] RAX: ffffffffffffffda RBX: 00000000006703a0 RCX: 00007f688f13af10 [ 130.075415] RDX: 0000000000000000 RSI: 00007ffe6c7d99f0 RDI: 0000000000000003 [ 130.075416] RBP: 000000005df38312 R08: 0000000000000002 R09: 0000000000008000 [ 130.075416] R10: 00007ffe6c7d93e0 R11: 0000000000000246 R12: 0000000000000000 [ 130.075417] R13: 00007ffe6c7e9c50 R14: 0000000000000001 R15: 000000000067c600 [ 130.075418] ---[ end trace 8fbb3bf36a8671db ]--- v2: - Move the refcount_set() closer to where the hardware queue arrays are being allocated. - Fix up error path to only free hardware queue arrays when refcount reaches 0. Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20Merge tag 'devicetree-fixes-for-5.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull Devicetree fix from Rob Herring: "Add missing 'properties' keyword enclosing 'snps,tso' in snps,dwmac binding" * tag 'devicetree-fixes-for-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: Add missing 'properties' keyword enclosing 'snps,tso'
2019-12-20Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Leftover put_cpu() in the perf/smmuv3 error path. - Add Hisilicon TSV110 to spectre-v2 safe list * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: cpu_errata: Add Hisilicon TSV110 to spectre-v2 safe list perf/smmuv3: Remove the leftover put_cpu() in error path
2019-12-20Merge tag 'drm-fixes-2019-12-21' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Probably the last one before Christmas, I'll see if there is much demand over next few weeks for more fixes, I expect it'll be quiet enough. This has one exynos fix, and a bunch of i915 core and i915 GVT fixes. Summary: exynos: - component delete fix i915: - Fix to drop an unused and harmful display W/A - Fix to define EHL power wells independent of ICL - Fix for priority inversion on bonded requests - Fix in mmio offset calculation of DSB instance - Fix memory leak from get_task_pid when banning clients - Fixes to avoid dereference of uninitialized ops in dma_fence tracing and keep reference to execbuf object until submitted. - vGPU state setting locking fix (Zhenyu) - Fix vGPU display dmabuf as read-only (Zhenyu) - Properly handle vGPU display dmabuf page pin when rendering (Tina) - Fix one guest boot warning to handle guc reset state (Fred)" * tag 'drm-fixes-2019-12-21' of git://anongit.freedesktop.org/drm/drm: drm/exynos: gsc: add missed component_del drm/i915: Fix pid leak with banned clients drm/i915/gem: Keep request alive while attaching fences drm/i915: Fix WARN_ON condition for cursor plane ddb allocation drm/i915/gvt: Fix guest boot warning drm/i915/tgl: Drop Wa#1178 drm/i915/ehl: Define EHL powerwells independently of ICL drm/i915: Set fence_work.ops before dma_fence_init drm/i915: Copy across scheduler behaviour flags across submit fences drm/i915/dsb: Fix in mmio offset calculation of DSB instance drm/i915/gvt: Pin vgpu dma address before using drm/i915/gvt: set guest display buffer as readonly drm/i915/gvt: use vgpu lock for active state setting
2019-12-20Merge tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Here's a set of fixes that should go into 5.5-rc3 for io_uring. This is bigger than I'd like it to be, mainly because we're fixing the case where an application reuses sqe data right after issue. This really must work, or it's confusing. With 5.5 we're flagging us as submit stable for the actual data, this must also be the case for SQEs. Honestly, I'd really like to add another series on top of this, since it cleans it up considerable and prevents any SQE reuse by design. I posted that here: https://lore.kernel.org/io-uring/20191220174742.7449-1-axboe@kernel.dk/T/#u and may still send it your way early next week once it's been looked at and had some more soak time (does pass all regression tests). With that series, we've unified the prep+issue handling, and only the prep phase even has access to the SQE. Anyway, outside of that, fixes in here for a few other issues that have been hit in testing or production" * tag 'io_uring-5.5-20191220' of git://git.kernel.dk/linux-block: io_uring: io_wq_submit_work() should not touch req->rw io_uring: don't wait when under-submitting io_uring: warn about unhandled opcode io_uring: read opcode and user_data from SQE exactly once io_uring: make IORING_OP_TIMEOUT_REMOVE deferrable io_uring: make IORING_OP_CANCEL_ASYNC deferrable io_uring: make IORING_POLL_ADD and IORING_POLL_REMOVE deferrable io_uring: make HARDLINK imply LINK io_uring: any deferred command must have stable sqe data io_uring: remove 'sqe' parameter to the OP helpers that take it io_uring: fix pre-prepped issue with force_nonblock == true io-wq: re-add io_wq_current_is_worker() io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG io_uring: fix stale comment and a few typos