summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-06tools: hv: Enable network manager for bonding scripts on RHELHaiyang Zhang
We found network manager is necessary on RHEL to make the synthetic NIC, VF NIC bonding operations handled automatically. So, enabling network manager here. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05netlink: Do not schedule work from sk_destructHerbert Xu
It is wrong to schedule a work from sk_destruct using the socket as the memory reserve because the socket will be freed immediately after the return from sk_destruct. Instead we should do the deferral prior to sk_free. This patch does just that. Fixes: 707693c8a498 ("netlink: Call cb->done from a worker thread") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05uapi: export nf_log.hstephen hemminger
File is in uapi directory but not being copied on make install_headers Fixes commit 4ec9c8fbbc22 ("netfilter: nft_log: complete NFTA_LOG_FLAGS attr support"). Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05uapi: export tc_skbmod.hstephen hemminger
Fixes commit 735cffe5d800 ("net_sched: Introduce skbmod action") Not used by iproute2 but maybe in future. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: ep93xx_eth: Do not crash unloading moduleFlorian Fainelli
When we unload the ep93xx_eth, whether we have opened the network interface or not, we will either hit a kernel paging request error, or a simple NULL pointer de-reference because: - if ep93xx_open has been called, we have created a valid DMA mapping for ep->descs, when we call ep93xx_stop, we also call ep93xx_free_buffers, ep->descs now has a stale value - if ep93xx_open has not been called, we have a NULL pointer for ep->descs, so performing any operation against that address just won't work Fix this by adding a NULL pointer check for ep->descs which means that ep93xx_free_buffers() was able to successfully tear down the descriptors and free the DMA cookie as well. Fixes: 1d22e05df818 ("[PATCH] Cirrus Logic ep93xx ethernet driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: calxeda: xgmac: use new api ethtool_{get|set}_link_ksettingsPhilippe Reynes
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'bpf-prog-digest'David S. Miller
Daniel Borkmann says: ==================== Minor BPF cleanups and digest First two patches are minor cleanups, and the third one adds a prog digest. For details, please see individual patches. After this one, I have a set with tracepoint support that makes use of this facility as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bpf: add prog_digest and expose it via fdinfo/netlinkDaniel Borkmann
When loading a BPF program via bpf(2), calculate the digest over the program's instruction stream and store it in struct bpf_prog's digest member. This is done at a point in time before any instructions are rewritten by the verifier. Any unstable map file descriptor number part of the imm field will be zeroed for the hash. fdinfo example output for progs: # cat /proc/1590/fdinfo/5 pos: 0 flags: 02000002 mnt_id: 11 prog_type: 1 prog_jited: 1 prog_digest: b27e8b06da22707513aa97363dfb11c7c3675d28 memlock: 4096 When programs are pinned and retrieved by an ELF loader, the loader can check the program's digest through fdinfo and compare it against one that was generated over the ELF file's program section to see if the program needs to be reloaded. Furthermore, this can also be exposed through other means such as netlink in case of a tc cls/act dump (or xdp in future), but also through tracepoints or other facilities to identify the program. Other than that, the digest can also serve as a base name for the work in progress kallsyms support of programs. The digest doesn't depend/select the crypto layer, since we need to keep dependencies to a minimum. iproute2 will get support for this facility. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bpf, cls: consolidate prog deletion pathDaniel Borkmann
Commit 18cdb37ebf4c ("net: sched: do not use tcf_proto 'tp' argument from call_rcu") removed the last usage of tp from cls_bpf_delete_prog(), so also remove it from the function as argument to not give a wrong impression. tp is illegal to access from this callback, since it could already have been freed. Refactor the deletion code a bit, so that cls_bpf_destroy() can call into the same code for prog deletion as cls_bpf_delete() op, instead of having it unnecessarily duplicated. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bpf: remove type arg from __is_valid_{,xdp_}accessDaniel Borkmann
Commit d691f9e8d440 ("bpf: allow programs to write to certain skb fields") pushed access type check outside of __is_valid_access() to have different restrictions for socket filters and tc programs. type is thus not used anymore within __is_valid_access() and should be removed as a function argument. Same for __is_valid_xdp_access() introduced by 6a773a15a1e8 ("bpf: add XDP prog type for early driver filter"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'ethoc-next'David S. Miller
Florian Fainelli says: ==================== net: ethoc: Misc improvements This patch series fixes/improves a few things: - implement a proper PHYLIB adjust_link callback to set the duplex mode accordingly - do not open code the fetching of a MAC address in OF/DT environments - demote an error message that occurs more frequently than expected in low CPU/memory/bandwidth environments Tested on a Cirrus Logic EP93xx / TS7300 board. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: ethoc: Demote packet dropped error message to debugFlorian Fainelli
Spamming the console with: net eth1: packet dropped can happen fairly frequently if the adapter is busy transmitting, demote the message to a debug print. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: ethoc: Utilize of_get_mac_address()Florian Fainelli
Do not open code getting the MAC address exclusively from the "local-mac-address" property, but instead use of_get_mac_address() which looks up the MAC address using the 3 typical property names. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: ethoc: Account for duplex changesFlorian Fainelli
ethoc_mdio_poll() which is our PHYLIB adjust_link callback does nothing, we should at least react to duplex changes and change MODER accordingly. Speed changes is not a problem, since the OpenCores Ethernet core seems to be reacting okay without us telling it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net_sched: gen_estimator: complete rewrite of rate estimatorsEric Dumazet
1) Old code was hard to maintain, due to complex lock chains. (We probably will be able to remove some kfree_rcu() in callers) 2) Using a single timer to update all estimators does not scale. 3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity is not supposed to work well) In this rewrite : - I removed the RB tree that had to be scanned in gen_estimator_active(). qdisc dumps should be much faster. - Each estimator has its own timer. - Estimations are maintained in net_rate_estimator structure, instead of dirtying the qdisc. Minor, but part of the simplification. - Reading the estimator uses RCU and a seqcount to provide proper support for 32bit kernels. - We reduce memory need when estimators are not used, since we store a pointer, instead of the bytes/packets counters. - xt_rateest_mt() no longer has to grab a spinlock. (In the future, xt_rateest_tg() could be switched to per cpu counters) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'bnx2x-fixes'David S. Miller
Yuval Mintz says: ==================== bnx2x: fixes series Two unrelated fixes for bnx2x - the first one is nice-to-have, while the other fixes fatal behaviour in older adapters. Please consider applying them to `net'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnx2x: Prevent tunnel config for 577xxMintz, Yuval
Only the 578xx adapters are capable of configuring UDP ports for the purpose of tunnelling - doing the same on 577xx might lead to a firmware assertion. We're already not claiming support for any related feature for such devices, but we also need to prevent the configuration of the UDP ports to the device in this case. Fixes: f34fa14cc033 ("bnx2x: Add vxlan RSS support") Reported-by: Anikina Anna <anikina@gmail.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnx2x: Correct ringparam estimate when DOWNMintz, Yuval
Until interface is up [and assuming ringparams weren't explicitly configured] when queried for the size of its rings bnx2x would claim they're the maximal size by default. That is incorrect as by default the maximal number of buffers would be equally divided between the various rx rings. This prevents the user from actually setting the number of elements on each rx ring to be of maximal size prior to transitioning the interface into up state. To fix this, make a rough estimation about the number of buffers. It wouldn't always be accurate, but it would be much better than current estimation and would allow users to increase number of buffers during early initialization of the interface. Reported-by: Seymour, Shane <shane.seymour@hpe.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net/sched: cls_flower: Set the filter Hardware device for all use-casesHadar Hen Zion
Check if the returned device from tcf_exts_get_dev function supports tc offload and in case the rule can't be offloaded, set the filter hw_dev parameter to the original device given by the user. The filter hw_device parameter should always be set by fl_hw_replace_filter function, since this pointer is used by dump stats and destroy filter for each flower rule (offloaded or not). Fixes: 7091d8c7055d ('net/sched: cls_flower: Add offload support using egress Hardware device') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reported-by: Simon Horman <horms@verge.net.au> Tested-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05isdn: hisax: set error code on failurePan Bian
In function hfc4s8s_probe(), the value of return variable err should be negative on failures. However, when the call to request_region() returns NULL, the value of err is 0. This patch fixes the bug, assigning "-EBUSY" to err on the path that request_region() fails. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188931 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: bnx2x: fix improper return valuePan Bian
Macro BNX2X_ALLOC_AND_SET(arr, lbl, func) calls kmalloc() to allocate memory, and jumps to label "lbl" if the allocation fails. Label "lbl" first cleans memory and then returns variable rc. Before calling the macro, the value of variable rc is 0. Because 0 means no error, the callers of bnx2x_init_firmware() may be misled. This patch fixes the bug, assigning "-ENOMEM" to rc before calling macro NX2X_ALLOC_AND_SET(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189141 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: ethernet: qlogic: set error code on failurePan Bian
When calling dma_mapping_error(), the value of return variable rc is 0. And when the call returns an unexpected value, rc is not set to a negative errno. Thus, it will return 0 on the error path, and its callers cannot detect the bug. This patch fixes the bug, assigning "-ENOMEM" to err. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189041 Signed-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05atm: fix improper return valuePan Bian
It returns variable "error" when ioremap_nocache() returns a NULL pointer. The value of "error" is 0 then, which will mislead the callers to believe that there is no error. This patch fixes the bug, returning "-ENOMEM". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=189021 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: irda: set error code on failuresPan Bian
When the calls to kzalloc() fail, the value of return variable ret may be 0. 0 means success in this context. This patch fixes the bug, assigning "-ENOMEM" to ret before calling kzalloc(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188971 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05ipv6: Allow IPv4-mapped address as next-hopErik Nordmark
Made kernel accept IPv6 routes with IPv4-mapped address as next-hop. It is possible to configure IP interfaces with IPv4-mapped addresses, and one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior to this fix the kernel returned an EINVAL when attempting to add an IPv6 route with an IPv4-mapped address as a nexthop/gateway. RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops, thus in order to support that type of address configuration the kernel needs to allow IPv4-mapped addresses as nexthops. Signed-off-by: Erik Nordmark <nordmark@arista.com> Signed-off-by: Bob Gilligan <gilligan@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: caif: remove ineffective checkPan Bian
The check of the return value of sock_register() is ineffective. "if(!err)" seems to be a typo. It is better to propagate the error code to the callers of caif_sktinit_module(). This patch removes the check statment and directly returns the result of sock_register(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188751 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bpf: Preserve const register type on const OR alu opsGianluca Borello
Occasionally, clang (e.g. version 3.8.1) translates a sum between two constant operands using a BPF_OR instead of a BPF_ADD. The verifier is currently not handling this scenario, and the destination register type becomes UNKNOWN_VALUE even if it's still storing a constant. As a result, the destination register cannot be used as argument to a helper function expecting a ARG_CONST_STACK_*, limiting some use cases. Modify the verifier to handle this case, and add a few tests to make sure all combinations are supported, and stack boundaries are still verified even with BPF_OR. Signed-off-by: Gianluca Borello <g.borello@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05r8169: Add support for restarting auto-negotiationFlorian Fainelli
Implement ethtooll::nway_restart by utilizing mii_nway_restart. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2016-12-03 Here's a set of Bluetooth & 802.15.4 patches for net-next (i.e. 4.10 kernel): - Fix for a potential NULL deref in the ieee802154 netlink code - Fix for the ED values of the at86rf2xx driver - Documentation updates to ieee802154 - Cleanups to u8 vs __u8 usage - Timer API usage cleanups in HCI drivers Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: ping: check minimum size on ICMP header lengthKees Cook
Prior to commit c0371da6047a ("put iov_iter into msghdr") in v3.19, there was no check that the iovec contained enough bytes for an ICMP header, and the read loop would walk across neighboring stack contents. Since the iov_iter conversion, bad arguments are noticed, but the returned error is EFAULT. Returning EINVAL is a clearer error and also solves the problem prior to v3.19. This was found using trinity with KASAN on v3.18: BUG: KASAN: stack-out-of-bounds in memcpy_fromiovec+0x60/0x114 at addr ffffffc071077da0 Read of size 8 by task trinity-c2/9623 page:ffffffbe034b9a08 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 0 PID: 9623 Comm: trinity-c2 Tainted: G BU 3.18.0-dirty #15 Hardware name: Google Tegra210 Smaug Rev 1,3+ (DT) Call trace: [<ffffffc000209c98>] dump_backtrace+0x0/0x1ac arch/arm64/kernel/traps.c:90 [<ffffffc000209e54>] show_stack+0x10/0x1c arch/arm64/kernel/traps.c:171 [< inline >] __dump_stack lib/dump_stack.c:15 [<ffffffc000f18dc4>] dump_stack+0x7c/0xd0 lib/dump_stack.c:50 [< inline >] print_address_description mm/kasan/report.c:147 [< inline >] kasan_report_error mm/kasan/report.c:236 [<ffffffc000373dcc>] kasan_report+0x380/0x4b8 mm/kasan/report.c:259 [< inline >] check_memory_region mm/kasan/kasan.c:264 [<ffffffc00037352c>] __asan_load8+0x20/0x70 mm/kasan/kasan.c:507 [<ffffffc0005b9624>] memcpy_fromiovec+0x5c/0x114 lib/iovec.c:15 [< inline >] memcpy_from_msg include/linux/skbuff.h:2667 [<ffffffc000ddeba0>] ping_common_sendmsg+0x50/0x108 net/ipv4/ping.c:674 [<ffffffc000dded30>] ping_v4_sendmsg+0xd8/0x698 net/ipv4/ping.c:714 [<ffffffc000dc91dc>] inet_sendmsg+0xe0/0x12c net/ipv4/af_inet.c:749 [< inline >] __sock_sendmsg_nosec net/socket.c:624 [< inline >] __sock_sendmsg net/socket.c:632 [<ffffffc000cab61c>] sock_sendmsg+0x124/0x164 net/socket.c:643 [< inline >] SYSC_sendto net/socket.c:1797 [<ffffffc000cad270>] SyS_sendto+0x178/0x1d8 net/socket.c:1761 CVE-2016-8399 Reported-by: Qidan He <i@flanker017.me> Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'tcp-tsq-perf'David S. Miller
Eric Dumazet says: ==================== tcp: tsq: performance series Under very high TX stress, CPU handling NIC TX completions can spend considerable amount of cycles handling TSQ (TCP Small Queues) logic. This patch series avoids some atomic operations, but most notable patch is the 3rd one, allowing other cpus processing ACK packets and calling tcp_write_xmit() to grab TCP_TSQ_DEFERRED so that tcp_tasklet_func() can skip already processed sockets. This avoid lots of lock acquisitions and cache lines accesses, particularly under load. In v2, I added : - tcp_small_queue_check() change to allow 1st and 2nd packets in write queue to be sent, even in the case TX completion of already acknowledged packets did not happen yet. This helps when TX completion coalescing parameters are set even to insane values, and/or busy polling is used. - A reorganization of struct sock fields to lower false sharing and increase data locality. - Then I moved tsq_flags from tcp_sock to struct sock also to reduce cache line misses during TX completions. I measured an overall throughput gain of 22 % for heavy TCP use over a single TX queue. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tsq: move tsq_flags close to sk_wmem_allocEric Dumazet
tsq_flags being in the same cache line than sk_wmem_alloc makes a lot of sense. Both fields are changed from tcp_wfree() and more generally by various TSQ related functions. Prior patch made room in struct sock and added sk_tsq_flags, this patch deletes tsq_flags from struct tcp_sock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: reorganize struct sock for better data localityEric Dumazet
Group fields used in TX path, and keep some cache lines mostly read to permit sharing among cpus. Gained two 4 bytes holes on 64bit arches. Added a place holder for tcp tsq_flags, next to sk_wmem_alloc to speed up tcp_wfree() in the following patch. I have not added ____cacheline_aligned_in_smp, this might be done later. I prefer doing this once inet and tcp/udp sockets reorg is also done. Tested with both TCP and UDP. UDP receiver performance under flood increased by ~20 % : Accessing sk_filter/sk_wq/sk_napi_id no longer stalls because sk_drops was moved away from a critical cache line, now mostly read and shared. /* --- cacheline 4 boundary (256 bytes) --- */ unsigned int sk_napi_id; /* 0x100 0x4 */ int sk_rcvbuf; /* 0x104 0x4 */ struct sk_filter * sk_filter; /* 0x108 0x8 */ union { struct socket_wq * sk_wq; /* 0x8 */ struct socket_wq * sk_wq_raw; /* 0x8 */ }; /* 0x110 0x8 */ struct xfrm_policy * sk_policy[2]; /* 0x118 0x10 */ struct dst_entry * sk_rx_dst; /* 0x128 0x8 */ struct dst_entry * sk_dst_cache; /* 0x130 0x8 */ atomic_t sk_omem_alloc; /* 0x138 0x4 */ int sk_sndbuf; /* 0x13c 0x4 */ /* --- cacheline 5 boundary (320 bytes) --- */ int sk_wmem_queued; /* 0x140 0x4 */ atomic_t sk_wmem_alloc; /* 0x144 0x4 */ long unsigned int sk_tsq_flags; /* 0x148 0x8 */ struct sk_buff * sk_send_head; /* 0x150 0x8 */ struct sk_buff_head sk_write_queue; /* 0x158 0x18 */ __s32 sk_peek_off; /* 0x170 0x4 */ int sk_write_pending; /* 0x174 0x4 */ long int sk_sndtimeo; /* 0x178 0x8 */ Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tcp_mtu_probe() is likely to exit earlyEric Dumazet
Adding a likely() in tcp_mtu_probe() moves its code which used to be inlined in front of tcp_write_xmit() We still have a cache line miss to access icsk->icsk_mtup.enabled, we will probably have to reorganize fields to help data locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tsq: add a shortcut in tcp_small_queue_check()Eric Dumazet
Always allow the two first skbs in write queue to be sent, regardless of sk_wmem_alloc/sk_pacing_rate values. This helps a lot in situations where TX completions are delayed either because of driver latencies or softirq latencies. Test is done with no cache line misses. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tsq: avoid one atomic in tcp_wfree()Eric Dumazet
Under high load, tcp_wfree() has an atomic operation trying to schedule a tasklet over and over. We can schedule it only if our per cpu list was empty. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tsq: add shortcut in tcp_tasklet_func()Eric Dumazet
Under high stress, I've seen tcp_tasklet_func() consuming ~700 usec, handling ~150 tcp sockets. By setting TCP_TSQ_DEFERRED in tcp_wfree(), we give a chance for other cpus/threads entering tcp_write_xmit() to grab it, allowing tcp_tasklet_func() to skip sockets that already did an xmit cycle. In the future, we might give to ACK processing an increased budget to reduce even more tcp_tasklet_func() amount of work. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tsq: remove one locked operation in tcp_wfree()Eric Dumazet
Instead of atomically clear TSQ_THROTTLED and atomically set TSQ_QUEUED bits, use one cmpxchg() to perform a single locked operation. Since the following patch will also set TCP_TSQ_DEFERRED here, this cmpxchg() will make this addition free. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05tcp: tsq: add tsq_flags / tsq_enumEric Dumazet
This is a cleanup, to ease code review of following patches. Old 'enum tsq_flags' is renamed, and a new enumeration is added with the flags used in cmpxchg() operations as opposed to single bit operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge tag 'powerpc-4.9-7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Four fixes, the first for code we merged this cycle and three that are also going to stable: - On 64-bit Book3E we were not placing the .text section where we said we would in the asm. - We broke building the boot wrapper on some 32-bit toolchains. - Lazy icache flushing was broken on pre-POWER5 machines. - One of the error paths in our EEH code would lead to a deadlock. Thanks to: Andrew Donnellan, Ben Hutchings, Benjamin Herrenschmidt, Nicholas Piggin" * tag 'powerpc-4.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix placement of .text to be immediately following .head.text powerpc/eeh: Fix deadlock when PE frozen state can't be cleared powerpc/mm: Fix lazy icache flush on pre-POWER5 powerpc/boot: Fix build failure in 32-bit boot wrapper
2016-12-05atm: lanai: set error code when ioremap failsPan Bian
In function lanai_dev_open(), when the call to ioremap() fails, the value of return variable result is 0. 0 means no error in this context. This patch fixes the bug, assigning "-ENOMEM" to result when ioremap() returns a NULL pointer. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188791 Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: usb: set error code when usb_alloc_urb failsPan Bian
In function lan78xx_probe(), variable ret takes the errno code on failures. However, when the call to usb_alloc_urb() fails, its value will keeps 0. 0 indicates success in the context, which is inconsistent with the execution result. This patch fixes the bug, assigning "-ENOMEM" to ret when usb_alloc_urb() returns a NULL pointer. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188771 Signed-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: bridge: set error code on failurePan Bian
Function br_sysfs_addbr() does not set error code when the call kobject_create_and_add() returns a NULL pointer. It may be better to return "-ENOMEM" when kobject_create_and_add() fails. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188781 Signed-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05net: af_mpls.c add space before open parenthesisSuraj Deshmukh
Adding space after switch keyword before open parenthesis for readability purpose. This patch fixes the checkpatch.pl warning: space required before the open parenthesis '(' Signed-off-by: Suraj Deshmukh <surajssd009005@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05netdev: broadcom: propagate error codePan Bian
Function bnxt_hwrm_stat_ctx_alloc() always returns 0, even if the call to _hwrm_send_message() fails. It may be better to propagate the errors to the caller of bnxt_hwrm_stat_ctx_alloc(). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188661 Signed-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05Merge branch 'bnxt_en-dcbnl'David S. Miller
Michael Chan says: ==================== bnxt_en: Add DCBNL support. This series adds DCBNL operations to support host-based IEEE DCBX. v2: Updated to the latest firmware interface spec. David, please consider this series for net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Add PFC statistics.Michael Chan
Report PFC statistics to ethtool -S and DCBNL. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Implement DCBNL to support host-based DCBX.Michael Chan
Support only IEEE DCBX initially. Add IEEE DCBNL ops and functions to get and set the hardware DCBX parameters. The DCB code is conditional on Kconfig CONFIG_BNXT_DCB. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Update firmware header file to latest 1.6.0.Michael Chan
Latest interface has the latest DCB command structs. Get and store the max number of lossless TCs the hardware can support. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-05bnxt_en: Re-factor bnxt_setup_tc().Michael Chan
Add a new function bnxt_setup_mq_tc() to handle MQPRIO. This new function will be called during ETS setup when we add DCBNL in the next patch. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>