summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-29mlxsw: spectrum: Use dedicated policer for VRRP packetsIdo Schimmel
Currently, VRRP packets and packets that hit exceptions during routing (e.g., MTU error) are policed using the same policer towards the CPU. This means, for example, that misconfiguration of the MTU on a routed interface can prevent VRRP packets from reaching the CPU, which in turn can cause the VRRP daemon to assume it is the Master router. Fix this by using a dedicated policer for VRRP packets. Fixes: 11566d34f895 ("mlxsw: spectrum: Add VRRP traps") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-29mlxsw: spectrum_router: Skip loopback RIFs during MAC validationAmit Cohen
When a router interface (RIF) is created the MAC address of the backing netdev is verified to have the same MSBs as existing RIFs. This is required in order to avoid changing existing RIF MAC addresses that all share the same MSBs. Loopback RIFs are special in this regard as they do not have a MAC address, given they are only used to loop packets from the overlay to the underlay. Without this change, an error is returned when trying to create a RIF after the creation of a GRE tunnel that is represented by a loopback RIF. 'rif->dev->dev_addr' points to the GRE device's local IP, which does not share the same MSBs as physical interfaces. Adding an IP address to any physical interface results in: Error: mlxsw_spectrum: All router interface MAC addresses must have the same prefix. Fix this by skipping loopback RIFs during MAC validation. Fixes: 74bc99397438 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses") Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-29Merge tag 'riscv/for-v5.5-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "One important fix for RISC-V: - Redirect any incoming syscall with an ID less than -1 to sys_ni_syscall, rather than allowing them to fall through into the syscall handler. and two minor build fixes: - Export __asm_copy_{from,to}_user() from where they are defined. This fixes a build error triggered by some randconfigs. - Export flush_icache_all(). I'd resisted this before, since historically we didn't want modules to be able to flush the I$ directly; but apparently everyone else is doing it now" * tag 'riscv/for-v5.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: export flush_icache_all to modules riscv: reject invalid syscalls below -1 riscv: fix compile failure with EXPORT_SYMBOL() & !MMU
2019-12-29Merge tag 'locks-v5.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull /proc/locks formatting fix from Jeff Layton: "This is a trivial fix for a _very_ long standing bug in /proc/locks formatting. Ordinarily, I'd wait for the merge window for something like this, but it is making it difficult to validate some overlayfs fixes. I've also gone ahead and marked this for stable" * tag 'locks-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: print unsigned ino in /proc/locks
2019-12-29Merge tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "One performance fix for large directory searches, and one minor style cleanup noticed by Clang" * tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Optimize readdir on reparse points cifs: Adjust indentation in smb2_open_file
2019-12-29locks: print unsigned ino in /proc/locksAmir Goldstein
An ino is unsigned, so display it as such in /proc/locks. Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-12-28Merge branch 'DSA-TX-tstamp'David S. Miller
Vladimir Oltean says: ==================== The DSA TX timestamping situation This series is the moral v2 of "[PATCH net] net: dsa: sja1105: Fix double delivery of TX timestamps to socket error queue" [0] which did not manage to convince public opinion (actually it didn't convince me neither). This fixes PTP timestamping on one particular board, where the DSA switch is sja1105 and the master is gianfar. Unfortunately there is no way to make the fix more general without committing logical inaccuracies: the SKBTX_IN_PROGRESS flag does serve a purpose, even if the sja1105 driver is not using it now: it prevents delivering a SW timestamp to the app socket when the HW timestamp will be provided. So not setting this flag (the approach from v1) might create avoidable complications in the future (not to mention that there isn't any satisfactory explanation on why that would be the correct solution). So the goal of this change set is to create a more strict framework for DSA master devices when attached to PTP switches, and to fix the first master driver that is overstepping its duties and is delivering unsolicited TX timestamps. [0]: https://www.spinics.net/lists/netdev/msg619699.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28net: dsa: Deny PTP on master if switch supports itVladimir Oltean
It is possible to kill PTP on a DSA switch completely and absolutely, until a reboot, with a simple command: tcpdump -i eth2 -j adapter_unsynced where eth2 is the switch's DSA master. Why? Well, in short, the PTP API in place today is a bit rudimentary and relies on applications to retrieve the TX timestamps by polling the error queue and looking at the cmsg structure. But there is no timestamp identification of any sorts (except whether it's HW or SW), you don't know how many more timestamps are there to come, which one is this one, from whom it is, etc. In other words, the SO_TIMESTAMPING API is fundamentally limited in that you can get a single HW timestamp from the stack. And the "-j adapter_unsynced" flag of tcpdump enables hardware timestamping. So let's imagine what happens when the DSA master decides it wants to deliver TX timestamps to the skb's socket too: - The timestamp that the user space sees is taken by the DSA master. Whereas the RX timestamp will eventually be overwritten by the DSA switch. So the RX and TX timestamps will be in different time bases (aka garbage). - The user space applications have no way to deal with the second (real) TX timestamp finally delivered by the DSA switch, or even to know to wait for it. Take ptp4l from the linuxptp project, for example. This is its behavior after running tcpdump, before the patch: ptp4l[172]: [6469.594] Unexpected data on socket err queue: ptp4l[172]: [6469.693] rms 8 max 16 freq -21257 +/- 11 delay 748 +/- 0 ptp4l[172]: [6469.711] Unexpected data on socket err queue: ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.721] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.838] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.848] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02 ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05 ptp4l[172]: 0040 de 06 00 01 ptp4l[172]: [6469.855] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.974] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 The ptp4l program itself is heavily patched to show this (more details here [0]). Otherwise, by default it just hangs. On the other hand, with the DSA patch to disallow HW timestamping applied: tcpdump -i eth2 -j adapter_unsynced tcpdump: SIOCSHWTSTAMP failed: Device or resource busy So it is a fact of life that PTP timestamping on the DSA master is incompatible with timestamping on the switch MAC, at least with the current API. And if the switch supports PTP, taking the timestamps from the switch MAC is highly preferable anyway, due to the fact that those don't contain the queuing latencies of the switch. So just disallow PTP on the DSA master if there is any PTP-capable switch attached. [0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/ Fixes: 0336369d3a4d ("net: dsa: forward hardware timestamping ioctls to switch driver") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28gianfar: Fix TX timestamping with a stacked DSA driverVladimir Oltean
The driver wrongly assumes that it is the only entity that can set the SKBTX_IN_PROGRESS bit of the current skb. Therefore, in the gfar_clean_tx_ring function, where the TX timestamp is collected if necessary, the aforementioned bit is used to discriminate whether or not the TX timestamp should be delivered to the socket's error queue. But a stacked driver such as a DSA switch can also set the SKBTX_IN_PROGRESS bit, which is actually exactly what it should do in order to denote that the hardware timestamping process is undergoing. Therefore, gianfar would misinterpret the "in progress" bit as being its own, and deliver a second skb clone in the socket's error queue, completely throwing off a PTP process which is not expecting to receive it, _even though_ TX timestamping is not enabled for gianfar. There have been discussions [0] as to whether non-MAC drivers need or not to set SKBTX_IN_PROGRESS at all (whose purpose is to avoid sending 2 timestamps, a sw and a hw one, to applications which only expect one). But as of this patch, there are at least 2 PTP drivers that would break in conjunction with gianfar: the sja1105 DSA switch and the felix switch, by way of its ocelot core driver. So regardless of that conclusion, fix the gianfar driver to not do stuff based on flags set by others and not intended for it. [0]: https://www.spinics.net/lists/netdev/msg619699.html Fixes: f0ee7acfcdd4 ("gianfar: Add hardware TX timestamping support") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-28net/wan/fsl_ucc_hdlc: remove set but not used variables 'ut_info' and 'ret'Chen Zhou
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/wan/fsl_ucc_hdlc.c: In function ucc_hdlc_irq_handler: drivers/net/wan/fsl_ucc_hdlc.c:643:23: warning: variable ut_info set but not used [-Wunused-but-set-variable] drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_suspend: drivers/net/wan/fsl_ucc_hdlc.c:880:23: warning: variable ut_info set but not used [-Wunused-but-set-variable] drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_resume: drivers/net/wan/fsl_ucc_hdlc.c:925:6: warning: variable ret set but not used [-Wunused-but-set-variable] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27riscv: export flush_icache_all to modulesOlof Johansson
This is needed by LKDTM (crash dump test module), it calls flush_icache_range(), which on RISC-V turns into flush_icache_all(). On other architectures, the actual implementation is exported, so follow that precedence and export it here too. Fixes build of CONFIG_LKDTM that fails with: ERROR: "flush_icache_all" [drivers/misc/lkdtm/lkdtm.ko] undefined! Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-27riscv: reject invalid syscalls below -1David Abdurachmanov
Running "stress-ng --enosys 4 -t 20 -v" showed a large number of kernel oops with "Unable to handle kernel paging request at virtual address" message. This happens when enosys stressor starts testing random non-valid syscalls. I forgot to redirect any syscall below -1 to sys_ni_syscall. With the patch kernel oops messages are gone while running stress-ng enosys stressor. Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com> Fixes: 5340627e3fe0 ("riscv: add support for SECCOMP and SECCOMP_FILTER") Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-27riscv: fix compile failure with EXPORT_SYMBOL() & !MMULuc Van Oostenryck
When support for !MMU was added, the declaration of __asm_copy_to_user() & __asm_copy_from_user() were #ifdefed out hence their EXPORT_SYMBOL() give an error message like: .../riscv_ksyms.c:13:15: error: '__asm_copy_to_user' undeclared here .../riscv_ksyms.c:14:15: error: '__asm_copy_from_user' undeclared here Since these symbols are not defined with !MMU it's wrong to export them. Same for __clear_user() (even though this one is also declared in include/asm-generic/uaccess.h and thus doesn't give an error message). Fix this by doing the EXPORT_SYMBOL() directly where these symbols are defined: inside lib/uaccess.S itself. Fixes: 6bd33e1ece52 ("riscv: fix compile failure with EXPORT_SYMBOL() & !MMU") Reported-by: kbuild test robot <lkp@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-12-27Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four fixes and one spelling update, all in drivers: two in lpfc and the rest in mp3sas, cxgbi and target" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target/iblock: Fix protection error with blocks greater than 512B scsi: libcxgbi: fix NULL pointer dereference in cxgbi_device_destroy() scsi: lpfc: fix spelling mistakes of asynchronous scsi: lpfc: fix build failure with DEBUGFS disabled scsi: mpt3sas: Fix double free in attach error handling
2019-12-27Merge branch 'ethtool-netlink-part-one'David S. Miller
Michal Kubecek says: ==================== ethtool netlink interface, part 1 This is first part of netlink based alternative userspace interface for ethtool. It aims to address some long known issues with the ioctl interface, mainly lack of extensibility, raciness, limited error reporting and absence of notifications. The goal is to allow userspace ethtool utility to provide all features it currently does but without using the ioctl interface. However, some features provided by ethtool ioctl API will be available through other netlink interfaces (rtnetlink, devlink) if it's more appropriate. The interface uses generic netlink family "ethtool" and provides multicast group "monitor" which is used for notifications. Documentation for the interface is in Documentation/networking/ethtool-netlink.rst file. The netlink interface is optional, it is built when CONFIG_ETHTOOL_NETLINK (bool) option is enabled. There are three types of request messages distinguished by suffix "_GET" (query for information), "_SET" (modify parameters) and "_ACT" (perform an action). Kernel reply messages have name with additional suffix "_REPLY" (e.g. ETHTOOL_MSG_SETTINGS_GET_REPLY). Most "_SET" and "_ACT" message types do not have matching reply type as only some of them need additional reply data beyond numeric error code and extack. Kernel also broadcasts notification messages ("_NTF" suffix) on changes. Basic concepts: - make extensions easier not only by allowing new attributes but also by imposing as few artificial limits as possible, e.g. by using arbitrary size bit sets for most bitmap attributes or by not using fixed size strings - use extack for error reporting and warnings - send netlink notifications on changes (even if they were done using the ioctl interface) and actions - avoid the racy read/modify/write cycle between kernel and userspace by sending only attributes which userspace wants to change; there is still a read/modify/write cycle between generic kernel code and ethtool_ops handler in NIC driver but it is only in kernel and under RTNL lock - reduce the number of name lists that need to be kept in sync between kernel and userspace (e.g. recognized link modes) - where feasible, allow dump requests to query specific information for all network devices - as parsing and generating netlink messages is more complicated than simply copying data structures between userspace API and ethtool_ops handlers (which most ioctl commands do), split the code into multiple files in net/ethtool directory; move net/core/ethtool.c also to this directory and rename it to ioctl.c Changes between v8 and v9: - fix ethnl_update_u8() - fix description of ETHTOOL_A_LINKSTATE_LINK in rst file - add explanation of verbose vs. compact bitset usage to documentation - link ethtool-netlink.rst into toctree Main changes between v7 and v8: - preliminary patches sent as a separate series (already in net-next) - split notification related changes out of _SET patches - drop request specific flags from common header - use FLAG/flag rather than GFLAG/gflag for global flags (as there are only global flags now) - allow device names up to ALTIFNAMSIZ characters - rename ETHTOOL_A_BITSET_LIST to ETHTOOL_A_BITSET_NOMASK - rename ETHTOOL_A_BIT{,S}_* to ETHTOOL_A_BITSET_BIT{,S}_* - use standard bitset helpers for link modes (rather than in-place conversion) - use "default" rather than "standard" for unified _GET handlers - fixed 64-bit big endian bitset code Main changes between v6 and v7: - split complex messages into small single purpose ones (drop info and request masks and one level of nesting) - separate request information and reply data into two structures - refactor bitset handling (no simultaneous u32/ulong handling but avoid kmalloc() except for long bitmaps on 64-bit big endian architectures) - use only fixed size strings internally (will be replaced by char * eventually but that will require rewriting also existing ioctl code) - rework ethnl_update_* helpers to return error code - rename request flag constants (to ETHTOOL_[GR]FLAG_ prefix) - convert documentation to rst Main changes between v5 and v6: - use ETHTOOL_MSG_ prefix for message types - replace ETHA_ prefix for netlink attributes by ETHTOOL_A_ - replace ETH_x_IM_y for infomask bits by ETHTOOL_IM_x_y - split GET reply types from SET requests and notifications - split kernel and userspace message types into different enums - remove INFO_GET requests from submitted part - drop EVENT notifications (use rtnetlink and on-demand string set load) - reorganize patches to reduce the number of intermitent warnings - unify request/reply header and its processing - another nest around strings in a string set for consistency - more consistent identifier naming - coding style cleanup - get rid of some of the helpers - set bad attribute in extack where applicable - various bug fixes - improve documentation and code comments, more kerneldoc comments - more verbose commit messages Changes between v4 and v5: - do not panic on failed initialization, only WARN() Main changes between RFC v3 and v4: - use more kerneldoc style comments - strict attribute policy checking - use macros for tables of link mode names and parameters - provide permanent hardware address in rtnetlink - coding style cleanup - split too long patches, reorder - wrap more ETHA_SETTINGS_* attributes in nests - add also some SET_* implementation into submitted part Main changes between RFC v2 and RFC v3: - do not allow building as a module (no netdev notifiers needed) - drop some obsolete fields - add permanent hw address, timestamping and private flags support - rework bitset handling to get rid of variable length arrays - notify monitor on device renames - restructure GET_SETTINGS/SET_SETTINGS messages - split too long patches and submit only first part of the series Main changes between RFC v1 and RFC v2: - support dumps for all "get" requests - provide notifications for changes related to supported request types - support getting string sets (both global and per device) - support getting/setting device features - get rid of family specific header, everything passed as attributes - split netlink code into multiple files in net/ethtool/ directory ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide link state with LINKSTATE_GET requestMichal Kubecek
Implement LINKSTATE_GET netlink request to get link state information. At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command is returned. LINKSTATE_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: add LINKMODES_NTF notificationMichal Kubecek
Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKMODES_GET request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKMODES_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: set link modes related data with LINKMODES_SET requestMichal Kubecek
Implement LINKMODES_SET netlink request to set advertised linkmodes and related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do. The request allows setting autonegotiation flag, speed, duplex and advertised link modes. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide link mode information with LINKMODES_GET requestMichal Kubecek
Implement LINKMODES_GET netlink request to get link modes related information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands. This request provides supported, advertised and peer advertised link modes, autonegotiation flag, speed and duplex. LINKMODES_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: add LINKINFO_NTF notificationMichal Kubecek
Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands. The notification message has the same format as reply to LINKINFO_GET request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the notification if there is a change but the ioctl command handlers do not check if there is an actual change and trigger the notification whenever the commands are executed. As all work is done by ethnl_default_notify() handler and callback functions introduced to handle LINKINFO_GET requests, all that remains is adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where needed. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: add default notification handlerMichal Kubecek
The ethtool netlink notifications have the same format as related GET replies so that if generic GET handling framework is used to process GET requests, its callbacks and instance of struct get_request_ops can be also used to compose corresponding notification message. Provide function ethnl_std_notify() to be used as notification handler in ethnl_notify_handlers table. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: set link settings with LINKINFO_SET requestMichal Kubecek
Implement LINKINFO_SET netlink request to set link settings queried by LINKINFO_GET message. Only physical port, phy MDIO address and MDI(-X) control can be set, attempt to modify MDI(-X) status and transceiver is rejected. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide link settings with LINKINFO_GET requestMichal Kubecek
Implement LINKINFO_GET netlink request to get basic link settings provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands. This request provides settings not directly related to autonegotiation and link mode selection: physical port, phy MDIO address, MDI(-X) status, MDI(-X) control and transceiver. LINKINFO_GET request can be used with NLM_F_DUMP (without device identification) to request the information for all devices in current network namespace providing the data. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: provide string sets with STRSET_GET requestMichal Kubecek
Requests a contents of one or more string sets, i.e. indexed arrays of strings; this information is provided by ETHTOOL_GSSET_INFO and ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all information can be retrieved with one request and mulitple string sets can be requested at once. There are three types of requests: - no NLM_F_DUMP, no device: get "global" stringsets - no NLM_F_DUMP, with device: get string sets related to the device - NLM_F_DUMP, no device: get device related string sets for all devices Client can request either all string sets of given type (global or device related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only set sizes (numbers of strings) are returned. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: default handlers for GET requestsMichal Kubecek
Significant part of GET request processing is common for most request types but unfortunately it cannot be easily separated from type specific code as we need to alternate between common actions (parsing common request header, allocating message and filling netlink/genetlink headers etc.) and specific actions (querying the device, composing the reply). The processing also happens in three different situations: "do" request, "dump" request and notification, each doing things in slightly different way. The request specific code is implemented in four or five callbacks defined in an instance of struct get_request_ops: parse_request() - parse incoming message prepare_data() - retrieve data from driver or NIC reply_size() - estimate reply message size fill_reply() - compose reply message cleanup_data() - (optional) clean up additional data Other members of struct get_request_ops describe the data structure holding information from client request and data used to compose the message. The default handlers ethnl_default_doit(), ethnl_default_dumpit(), ethnl_default_start() and ethnl_default_done() can be then used in genl_ops handler. Notification handler will be introduced in a later patch. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: support for netlink notificationsMichal Kubecek
Add infrastructure for ethtool netlink notifications. There is only one multicast group "monitor" which is used to notify userspace about changes and actions performed. Notification messages (types using suffix _NTF) share the format with replies to GET requests. Notifications are supposed to be broadcasted on every configuration change, whether it is done using the netlink interface or ioctl one. Netlink SET requests only trigger a notification if some data is actually changed. To trigger an ethtool notification, both ethtool netlink and external code use ethtool_notify() helper. This helper requires RTNL to be held and may sleep. Handlers sending messages for specific notification message types are registered in ethnl_notify_handlers array. As notifications can be triggered from other code, ethnl_ok flag is used to prevent an attempt to send notification before genetlink family is registered. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: netlink bitset handlingMichal Kubecek
The ethtool netlink code uses common framework for passing arbitrary length bit sets to allow future extensions. A bitset can be a list (only one bitmap) or can consist of value and mask pair (used e.g. when client want to modify only some bits). A bitset can use one of two formats: verbose (bit by bit) or compact. Verbose format consists of bitset size (number of bits), list flag and an array of bit nests, telling which bits are part of the list or which bits are in the mask and which of them are to be set. In requests, bits can be identified by index (position) or by name. In replies, kernel provides both index and name. Verbose format is suitable for "one shot" applications like standard ethtool command as it avoids the need to either keep bit names (e.g. link modes) in sync with kernel or having to add an extra roundtrip for string set request (e.g. for private flags). Compact format uses one (list) or two (value/mask) arrays of 32-bit words to store the bitmap(s). It is more suitable for long running applications (ethtool in monitor mode or network management daemons) which can retrieve the names once and then pass only compact bitmaps to save space. Userspace requests can use either format; ETHTOOL_FLAG_COMPACT_BITSETS flag in request header tells kernel which format to use in reply. Notifications always use compact format. As some code uses arrays of unsigned long for internal representation and some arrays of u32 (or even a single u32), two sets of parse/compose helpers are introduced. To avoid code duplication, helpers for unsigned long arrays are implemented as wrappers around helpers for u32 arrays. There are two reasons for this choice: (1) u32 arrays are more frequent in ethtool code and (2) unsigned long array can be always interpreted as an u32 array on little endian 64-bit and all 32-bit architectures while we would need special handling for odd number of u32 words in the opposite direction. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: helper functions for netlink interfaceMichal Kubecek
Add common request/reply header definition and helpers to parse request header and fill reply header. Provide ethnl_update_* helpers to update structure members from request attributes (to be used for *_SET requests). Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ethtool: introduce ethtool netlink interfaceMichal Kubecek
Basic genetlink and init infrastructure for the netlink interface, register genetlink family "ethtool". Add CONFIG_ETHTOOL_NETLINK Kconfig option to make the build optional. Add initial overall interface description into Documentation/networking/ethtool-netlink.rst, further patches will add more detailed information. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCsMartin Blumenstingl
GXBB and newer SoCs use the fixed FCLK_DIV2 (1GHz) clock as input for the m250_sel clock. Meson8b and Meson8m2 use MPLL2 instead, whose rate can be adjusted at runtime. So far we have been running MPLL2 with ~250MHz (and the internal m250_div with value 1), which worked enough that we could transfer data with an TX delay of 4ns. Unfortunately there is high packet loss with an RGMII PHY when transferring data (receiving data works fine though). Odroid-C1's u-boot is running with a TX delay of only 2ns as well as the internal m250_div set to 2 - no lost (TX) packets can be observed with that setting in u-boot. Manual testing has shown that the TX packet loss goes away when using the following settings in Linux (the vendor kernel uses the same settings): - MPLL2 clock set to ~500MHz - m250_div set to 2 - TX delay set to 2ns on the MAC side Update the m250_div divider settings to only accept dividers greater or equal 2 to fix the TX delay generated by the MAC. iperf3 results before the change: [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 182 MBytes 153 Mbits/sec 514 sender [ 5] 0.00-10.00 sec 182 MBytes 152 Mbits/sec receiver iperf3 results after the change (including an updated TX delay of 2ns): [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-10.00 sec 927 MBytes 778 Mbits/sec 0 sender [ 5] 0.00-10.01 sec 927 MBytes 777 Mbits/sec receiver Fixes: 4f6a71b84e1afd ("net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27net/sched: act_mirred: Pull mac prior redir to non mac_header_xmit deviceShmulik Ladkani
There's no skb_pull performed when a mirred action is set at egress of a mac device, with a target device/action that expects skb->data to point at the network header. As a result, either the target device is errornously given an skb with data pointing to the mac (egress case), or the net stack receives the skb with data pointing to the mac (ingress case). E.g: # tc qdisc add dev eth9 root handle 1: prio # tc filter add dev eth9 parent 1: prio 9 protocol ip handle 9 basic \ action mirred egress redirect dev tun0 (tun0 is a tun device. result: tun0 errornously gets the eth header instead of the iph) Revise the push/pull logic of tcf_mirred_act() to not rely on the skb_at_tc_ingress() vs tcf_mirred_act_wants_ingress() comparison, as it does not cover all "pull" cases. Instead, calculate whether the required action on the target device requires the data to point at the network header, and compare this to whether skb->data points to network header - and make the push/pull adjustments as necessary. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Shmulik Ladkani <sladkani@proofpoint.com> Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27sctp: do trace_sctp_probe after SACK validation and checkKevin Kou
The function sctp_sf_eat_sack_6_2 now performs the Verification Tag validation, Chunk length validation, Bogu check, and also the detection of out-of-order SACK based on the RFC2960 Section 6.2 at the beginning, and finally performs the further processing of SACK. The trace_sctp_probe now triggered before the above necessary validation and check. this patch is to do the trace_sctp_probe after the chunk sanity tests, but keep doing trace if the SACK received is out of order, for the out-of-order SACK is valuable to congestion control debugging. v1->v2: - keep doing SCTP trace if the SACK is out of order as Marcelo's suggestion. v2->v3: - regenerate the patch as v2 generated on top of v1, and add 'net-next' tag to the new one as Marcelo's comments. Signed-off-by: Kevin Kou <qdkevin.kou@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27mv88e6xxx: Add serdes Rx statisticsNikita Yushchenko
If packet checker is enabled in the serdes, then Rx counter registers start working, and no side effects have been detected. This patch enables packet checker automatically when powering serdes on, and exposes Rx counter registers via ethtool statistics interface. Code partially basded by older attempt by Andrew Lunn. Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27net: ena: remove set but not used variable 'rx_ring'YueHaibing
drivers/net/ethernet/amazon/ena/ena_netdev.c: In function ena_xdp_xmit_buff: drivers/net/ethernet/amazon/ena/ena_netdev.c:316:19: warning: variable rx_ring set but not used [-Wunused-but-set-variable] commit 548c4940b9f1 ("net: ena: Implement XDP_TX action") left behind this unused variable. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27net: dsa: qca: ar9331: drop pointless static qualifier in ar9331_sw_mbus_initMao Wenan
There is no need to set variable 'mbus' static since new value always be assigned before use it. Signed-off-by: Mao Wenan <maowenan@huawei.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27ppp: Remove redundant BUG_ON() check in ppp_pernetXu Wang
Passing NULL to ppp_pernet causes a crash via BUG_ON. Dereferencing net in net_generic() also has the same effect. This patch removes the redundant BUG_ON check on the same parameter. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27Merge branch 'tcp_cubic-various-fixes'David S. Miller
Eric Dumazet says: ==================== tcp_cubic: various fixes This patch series converts tcp_cubic to usec clock resolution for Hystart logic. This makes Hystart more relevant for data-center flows. Prior to this series, Hystart was not kicking, or was kicking without good reason, since the 1ms clock was too coarse. Last patch also fixes an issue with Hystart vs TCP pacing. v2: removed a last-minute debug chunk from last patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27tcp_cubic: make Hystart aware of pacingEric Dumazet
For years we disabled Hystart ACK train detection at Google because it was fooled by TCP pacing. ACK train detection uses a simple heuristic, detecting if we receive ACK past half the RTT, to exit slow start before hitting the bottleneck and experience massive drops. But pacing by design might delay packets up to RTT/2, so we need to tweak the Hystart logic to be aware of this extra delay. Tested: Added a 100 usec delay at receiver. Before: nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 9117 7057 9553 8300 7030 6849 9533 10126 6876 8473 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 1230 0.0 After : nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 9845 10103 10866 11096 11936 11487 11773 12188 11066 11894 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 6462 0.0 Disabling Hystart ACK Train detection gives similar numbers echo 2 >/sys/module/tcp_cubic/parameters/hystart_detect nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 11173 10954 12455 10627 11578 11583 11222 10880 10665 11366 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27tcp_cubic: tweak Hystart detection for short RTT flowsEric Dumazet
After switching ca->delay_min to usec resolution, we exit slow start prematurely for very low RTT flows, setting snd_ssthresh to 20. The reason is that delay_min is fed with RTT of small packet trains. Then as cwnd is increased, TCP sends bigger TSO packets. LRO/GRO aggregation and/or interrupt mitigation strategies on receiver tend to inflate RTT samples. Fix this by adding to delay_min the expected delay of two TSO packets, given current pacing rate. Tested: Sender uses pfifo_fast qdisc Before : $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 11348 11707 11562 11428 11773 11534 9878 11693 10597 10968 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 200 0.0 After : $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 14877 14517 15797 18466 17376 14833 17558 17933 16039 18059 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 1670 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27tcp_cubic: switch bictcp_clock() to usec resolutionEric Dumazet
Current 1ms clock feeds ca->round_start, ca->delay_min, ca->last_ack. This is quite problematic for data-center flows, where delay_min is way below 1 ms. This means Hystart Train detection triggers every time jiffies value is updated, since "((s32)(now - ca->round_start) > ca->delay_min >> 4)" expression becomes true. This kind of random behavior can be solved by reusing the existing usec timestamp that TCP keeps in tp->tcp_mstamp Note that a followup patch will tweak things a bit, because during slow start, GRO aggregation on receivers naturally increases the RTT as TSO packets gradually come to ~64KB size. To recap, right after this patch CUBIC Hystart train detection is more aggressive, since short RTT flows might exit slow start at cwnd = 20, instead of being possibly unbounded. Following patch will address this problem. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27tcp_cubic: remove one conditional from hystart_update()Eric Dumazet
If we initialize ca->curr_rtt to ~0U, we do not need to test for zero value in hystart_update() We only read ca->curr_rtt if at least HYSTART_MIN_SAMPLES have been processed, and thus ca->curr_rtt will have a sane value. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27tcp_cubic: optimize hystart_update()Eric Dumazet
We do not care which bit in ca->found is set. We avoid accessing hystart and hystart_detect unless really needed, possibly avoiding one cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-12-27 The following pull-request contains BPF updates for your *net-next* tree. We've added 127 non-merge commits during the last 17 day(s) which contain a total of 110 files changed, 6901 insertions(+), 2721 deletions(-). There are three merge conflicts. Conflicts and resolution looks as follows: 1) Merge conflict in net/bpf/test_run.c: There was a tree-wide cleanup c593642c8be0 ("treewide: Use sizeof_field() macro") which gets in the way with b590cb5f802d ("bpf: Switch to offsetofend in BPF_PROG_TEST_RUN"): <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) + sizeof_field(struct __sk_buff, priority), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c There are a few occasions that look similar to this. Always take the chunk with offsetofend(). Note that there is one where the fields differ in here: <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) + sizeof_field(struct __sk_buff, tstamp), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to 850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN"). 2) Merge conflict in arch/riscv/net/bpf_jit_comp.c: (I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.) <<<<<<< HEAD if (is_13b_check(off, insn)) return -1; emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx); ======= emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx); >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Result should look like: emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx); 3) Merge conflict in arch/riscv/include/asm/pgtable.h: <<<<<<< HEAD ======= #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. */ #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) #define vmemmap ((struct page *)VMEMMAP_START) >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Only take the BPF_* defines from there and move them higher up in the same file. Remove the rest from the chunk. The VMALLOC_* etc defines got moved via 01f52e16b868 ("riscv: define vmemmap before pfn_to_page calls"). Result: [...] #define __S101 PAGE_READ_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. */ #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) [...] Let me know if there are any other issues. Anyway, the main changes are: 1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific to a provided BPF object file. This provides an alternative, simplified API compared to standard libbpf interaction. Also, add libbpf extern variable resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko. 2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also, add various BPF riscv JIT improvements, from Björn Töpel. 3) Extend bpftool to allow matching BPF programs and maps by name, from Paul Chaignon. 4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI flag for allowing updates without service interruption, from Andrey Ignatov. 5) Cleanup and simplification of ring access functions for AF_XDP with a bonus of 0-5% performance improvement, from Magnus Karlsson. 6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa. 7) Move and extend test_select_reuseport into BPF program tests under BPF selftests, from Jakub Sitnicki. 8) Various BPF sample improvements for xdpsock for customizing parameters to set up and benchmark AF_XDP, from Jay Jayatheerthan. 9) Improve libbpf to provide a ulimit hint on permission denied errors. Also change XDP sample programs to attach in driver mode by default, from Toke Høiland-Jørgensen. 10) Extend BPF test infrastructure to allow changing skb mark from tc BPF programs, from Nikita V. Shirokov. 11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King. 12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after libbpf conversion, from Jesper Dangaard Brouer. 13) Minor misc improvements from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27Merge tag 'drm-fixes-2019-12-28' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Post-xmas food coma recovery fixes. Only three fixes for i915 since I expect most people are holidaying. i915: - power management rc6 fix - framebuffer tracking fix - display power management ratelimit fix" * tag 'drm-fixes-2019-12-28' of git://anongit.freedesktop.org/drm/drm: drm/i915: Hold reference to intel_frontbuffer as we track activity drm/i915/gt: Ratelimit display power w/a drm/i915/pmu: Ensure monotonic rc6
2019-12-27Merge tag 'linux-kselftest-5.5-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: - rseq build failures fixes related to glibc 2.30 compatibility from Mathieu Desnoyers - Kunit fixes and cleanups from SeongJae Park - Fixes to filesystems/epoll, firmware, and livepatch build failures and skip handling. * tag 'linux-kselftest-5.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: rseq/selftests: Clarify rseq_prepare_unload() helper requirements rseq/selftests: Fix: Namespace gettid() for compatibility with glibc 2.30 rseq/selftests: Turn off timeout setting kunit/kunit_tool_test: Test '--build_dir' option run kunit: Rename 'kunitconfig' to '.kunitconfig' kunit: Place 'test.log' under the 'build_dir' kunit: Create default config in '--build_dir' kunit: Remove duplicated defconfig creation docs/kunit/start: Use in-tree 'kunit_defconfig' selftests: livepatch: Fix it to do root uid check and skip selftests: firmware: Fix it to do root uid check and skip selftests: filesystems/epoll: fix build error
2019-12-27Merge tag 'pm-5.5-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Fix compile test of the Tegra devfreq driver (Arnd Bergmann) and remove redundant Kconfig dependencies from multiple devfreq drivers (Leonard Crestez)" * tag 'pm-5.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / devfreq: tegra: Add COMMON_CLK dependency PM / devfreq: Drop explicit selection of PM_OPP
2019-12-27Merge tag 'io_uring-5.5-20191226' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Removal of now unused busy wqe list (Hillf) - Add cond_resched() to io-wq work processing (Hillf) - And then the series that I hinted at from last week, which removes the sqe from the io_kiocb and keeps all sqe handling on the prep side. This guarantees that an opcode can't do the wrong thing and read the sqe more than once. This is unchanged from last week, no issues have been observed with this in testing. Hence I really think we should fold this into 5.5. * tag 'io_uring-5.5-20191226' of git://git.kernel.dk/linux-block: io-wq: add cond_resched() to worker thread io-wq: remove unused busy list from io_sqe io_uring: pass in 'sqe' to the prep handlers io_uring: standardize the prep methods io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handler io_uring: move all prep state for IORING_OP_CONNECT to prep handler io_uring: add and use struct io_rw for read/writes io_uring: use u64_to_user_ptr() consistently
2019-12-27Merge tag 'libata-5.5-20191226' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull libata fixes from Jens Axboe: "Two things in here: - First half of a series that fixes ahci_brcm, also marked for stable. The other part of the series is going into 5.6 (Florian) - sata_nv regression fix that is also marked for stable (Sascha)" * tag 'libata-5.5-20191226' of git://git.kernel.dk/linux-block: ata: ahci_brcm: Add missing clock management during recovery ata: ahci_brcm: BCM7425 AHCI requires AHCI_HFLAG_DELAY_ENGINE ata: ahci_brcm: Fix AHCI resources management ata: libahci_platform: Export again ahci_platform_<en/dis>able_phys() libata: Fix retrieving of active qcs
2019-12-27Merge tag 'block-5.5-20191226' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Only thing here are the changes from Arnd from last week, which now have the appropriate header include to ensure they actually compile if COMPAT is enabled" * tag 'block-5.5-20191226' of git://git.kernel.dk/linux-block: compat_ioctl: block: handle Persistent Reservations compat_ioctl: block: handle add zone open, close and finish ioctl compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE pktcdvd: fix regression on 64-bit architectures
2019-12-27Merge tag 'gpio-v5.5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "A set of fixes for the v5.5 series: - Fix the build for the Xtensa driver. - Make sure to set up the parent device for mpc8xxx. - Clarify the look-up error message. - Fix the usage of the line direction in the mockup device. - Fix a type warning on the Aspeed driver. - Remove the pointless __exit annotation on the xgs-iproc which is causing a compilation problem. - Fix up emultation of open drain outputs .get_direction() - Fix the IRQ callbacks on the PCA953xx to use bitops and work properly. - Fix the Kconfig on the Tegra driver" * tag 'gpio-v5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: tegra186: Allow building on Tegra194-only configurations gpio: pca953x: Switch to bitops in IRQ callbacks gpiolib: fix up emulated open drain outputs MAINTAINERS: Append missed file to the database gpio: xgs-iproc: remove __exit annotation for iproc_gpio_remove gpio: aspeed: avoid return type warning gpio: mockup: Fix usage of new GPIO_LINE_DIRECTION gpio: Fix error message on out-of-range GPIO in lookup table gpio: mpc8xxx: Add platform device to gpiochip->parent gpio: xtensa: fix driver build