summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-11-06Merge branch 'for-4.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "Another fix for a really old bug. It only affects drain_workqueue() which isn't used often and even then triggers only during a pretty small race window, so it isn't too surprising that it stayed hidden for so long. The fix is straight-forward and low-risk. Kudos to Li Bin for reporting and fixing the bug" * 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix NULL pointer dereference
2017-11-06scripts: add leaking_addresses.plTobin C. Harding
Currently we are leaking addresses from the kernel to user space. This script is an attempt to find some of those leakages. Script parses `dmesg` output and /proc and /sys files for hex strings that look like kernel addresses. Only works for 64 bit kernels, the reason being that kernel addresses on 64 bit kernels have 'ffff' as the leading bit pattern making greping possible. On 32 kernels we don't have this luxury. Scripts is _slightly_ smarter than a straight grep, we check for false positives (all 0's or all 1's, and vsyscall start/finish addresses). [ I think there is a lot of room for improvement here, but it's already useful, so I'm merging it as-is. The whole "hash %p format" series is expected to go into 4.15, but will not fix %x users, and will not incentivize people to look at what they are leaking. - Linus ] Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-06ALSA: seq: Avoid invalid lockdep class warningTakashi Iwai
The recent fix for adding rwsem nesting annotation was using the given "hop" argument as the lock subclass key. Although the idea itself works, it may trigger a kernel warning like: BUG: looking up invalid subclass: 8 .... since the lockdep has a smaller number of subclasses (8) than we currently allow for the hops there (10). The current definition is merely a sanity check for avoiding the too deep delivery paths, and the 8 hops are already enough. So, as a quick fix, just follow the max hops as same as the max lockdep subclasses. Fixes: 1f20f9ff57ca ("ALSA: seq: Fix nested rwsem annotation for lockdep splat") Reported-by: syzbot <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-06Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes an unaligned panic in x86/sha-mb and a bug in ccm that triggers with certain underlying implementations" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ccm - preserve the IV buffer crypto: x86/sha1-mb - fix panic due to unaligned access crypto: x86/sha256-mb - fix panic due to unaligned access
2017-11-06netfilter: conntrack: move nf_ct_netns_{get,put}() to corePablo Neira Ayuso
So we can call this from other expression that need conntrack in place to work. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2017-11-06netfilter: conntrack: don't cache nlattr_tuple_size result in nla_sizeFlorian Westphal
We currently call ->nlattr_tuple_size() once at register time and cache result in l4proto->nla_size. nla_size is the only member that is written to, avoiding this would allow to make l4proto trackers const. We can use ->nlattr_tuple_size() at run time, and cache result in the individual trackers instead. This is an intermediate step, next patch removes nlattr_size() callback and computes size at compile time, then removes nla_size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: nft_hash: fix nft_hash_deactivateFlorian Westphal
Jindřich Makovička says: The logical OR looks fishy to me. Shouldn't be && there instead? Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1199 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: xt_connlimit: remove mask argumentFlorian Westphal
Instead of passing mask to all the helpers, just fixup the search key early. After rbtree conversion, each rbtree node stores connections of same 'addr & mask', so no need to pass the mask too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ebtables: clean up initialization of bufColin Ian King
buf is initialized to buf_start and then set on the next statement to buf_start + offsets[i]. Clean this up to just initialize buf to buf_start + offsets[i] to clean up the clang build warning: "Value stored to 'buf' during its initialization is never read" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ipvs: Fix inappropriate output of procfsKUWAZAWA Takuya
Information about ipvs in different network namespace can be seen via procfs. How to reproduce: # ip netns add ns01 # ip netns add ns02 # ip netns exec ns01 ip a add dev lo 127.0.0.1/8 # ip netns exec ns02 ip a add dev lo 127.0.0.1/8 # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80 # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80 The ipvsadm displays information about its own network namespace only. # ip netns exec ns01 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.1:80 wlc # ip netns exec ns02 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.2:80 wlc But I can see information about other network namespace via procfs. # ip netns exec ns01 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010101:0050 wlc TCP 0A010102:0050 wlc # ip netns exec ns02 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010102:0050 wlc Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ipvs: Use %pS printk format for direct addressesHelge Deller
The debug and error printk functions in ipvs uses wrongly the %pF instead of the %pS printk format specifier for printing symbols for the address returned by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06ALSA: timer: Limit max instances per timerTakashi Iwai
Currently we allow unlimited number of timer instances, and it may bring the system hogging way too much CPU when too many timer instances are opened and processed concurrently. This may end up with a soft-lockup report as triggered by syzkaller, especially when hrtimer backend is deployed. Since such insane number of instances aren't demanded by the normal use case of ALSA sequencer and it merely opens a risk only for abuse, this patch introduces the upper limit for the number of instances per timer backend. As default, it's set to 1000, but for the fine-grained timer like hrtimer, it's set to 100. Reported-by: syzbot Tested-by: Jérôme Glisse <jglisse@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-11-05Linux 4.14-rc8v4.14-rc8Linus Torvalds
2017-11-05Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes: - A PCID related revert that fixes power management and performance regressions. - The module loader robustization and sanity check commit is rather fresh, but it looked like a good idea to apply because of the hidden data corruption problem such invalid modules could cause" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/module: Detect and skip invalid relocations Revert "x86/mm: Stop calling leave_mm() in idle code"
2017-11-05Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fix from Ingo Molnar: "Fix an RCU warning that triggers when /dev/mcelog is used" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mcelog: Get rid of RCU remnants
2017-11-05Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Various fixes: - synchronize kernel and tooling headers - cgroup support fix - two tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/headers: Synchronize kernel ABI headers perf/cgroup: Fix perf cgroup hierarchy support perf tools: Unwind properly location after REJECT perf symbols: Fix memory corruption because of zero length symbols
2017-11-05Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "An irqchip driver init fix" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/irq-mvebu-gicp: Add missing spin_lock init
2017-11-05Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Ingo Molnar: - workaround for gcc asm handling - futex race fixes - objtool build warning fix - two watchdog fixes: a crash fix (revert) and a bug fix for /proc/sys/kernel/watchdog_thresh handling. * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Prevent GCC from merging annotate_unreachable(), take 2 objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version watchdog/hardlockup/perf: Use atomics to track in-use cpu counter watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy") futex: Fix more put_pi_state() vs. exit_pi_state_list() races
2017-11-05Merge tag 'enforcement-4.14-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull enforcement statement update from Greg KH: "Documentation: enforcement-statement: name updates Here are 12 patches for the kernel-enforcement-statement.rst file that add new names, fix the ordering of them, remove a duplicate, and remove some company markings that wished to be removed. All of these have passed the 0-day testing, even-though it is just a documentation file update :)" * tag 'enforcement-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Documentation: Add Frank Rowand to list of enforcement statement endorsers doc: add Willy Tarreau to the list of enforcement statement endorsers Documentation: Add Tim Bird to list of enforcement statement endorsers Documentation: Add my name to kernel enforcement statement Documentation: kernel-enforcement-statement.rst: proper sort names Documentation: Add Arm Ltd to kernel-enforcement-statement.rst Documentation: kernel-enforcement-statement.rst: Remove Red Hat markings Documentation: Add myself to the enforcement statement list Documentation: Sign kernel enforcement statement Add ack for Trond Myklebust to the enforcement statement Documentation: update kernel enforcement support list Documentation: add my name to supporters
2017-11-05Merge branch 'eBPF-based-device-cgroup-controller'David S. Miller
Roman Gushchin says: ==================== eBPF-based device cgroup controller This patchset introduces an eBPF-based device controller for cgroup v2. Patches (1) and (2) are a preparational work required to share some code with the existing device controller implementation. Patch (3) is the main patch, which introduces a new bpf prog type and all necessary infrastructure. Patch (4) moves cgroup_helpers.c/h to use them by patch (4). Patch (5) implements an example of eBPF program which controls access to device files and corresponding userspace test. v3: Renamed constants introduced by patch (3) to BPF_DEVCG_* v2: Added patch (1). v1: https://lkml.org/lkml/2017/11/1/363 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05selftests/bpf: add a test for device cgroup controllerRoman Gushchin
Add a test for device cgroup controller. The test loads a simple bpf program which logs all device access attempts using trace_printk() and forbids all operations except operations with /dev/zero and /dev/urandom. Then the test creates and joins a test cgroup, and attaches the bpf program to it. Then it tries to perform some simple device operations and checks the result: create /dev/null (should fail) create /dev/zero (should pass) copy data from /dev/urandom to /dev/zero (should pass) copy data from /dev/urandom to /dev/full (should fail) copy data from /dev/random to /dev/zero (should fail) Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/Roman Gushchin
The purpose of this move is to use these files in bpf tests. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05bpf, cgroup: implement eBPF-based device controller for cgroup v2Roman Gushchin
Cgroup v2 lacks the device controller, provided by cgroup v1. This patch adds a new eBPF program type, which in combination of previously added ability to attach multiple eBPF programs to a cgroup, will provide a similar functionality, but with some additional flexibility. This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type. A program takes major and minor device numbers, device type (block/character) and access type (mknod/read/write) as parameters and returns an integer which defines if the operation should be allowed or terminated with -EPERM. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05device_cgroup: prepare code for bpf-based device controllerRoman Gushchin
This is non-functional change to prepare the device cgroup code for adding eBPF-based controller for cgroups v2. The patch performs the following changes: 1) __devcgroup_inode_permission() and devcgroup_inode_mknod() are moving to the device-cgroup.h and converting into static inline. 2) __devcgroup_check_permission() is exported. 3) devcgroup_check_permission() wrapper is introduced to be used by both existing and new bpf-based implementations. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constantsRoman Gushchin
Rename device type and access type constants defined in security/device_cgroup.c by adding the DEVCG_ prefix. The reason behind this renaming is to make them global namespace friendly, as they will be moved to the corresponding header file by following patches. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: Tejun Heo <tj@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05Merge tag 'mlx5-updates-2017-11-04' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-11-04 This series includes: From Huy: dscp to priority mapping for Ethernet packet. =================================================== First six patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. =================================================== Plus to the dscp series, some small misc changes are include as well: From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic From Or Gerlitz, Enlarge the NIC TC offload table size From Rabie, Initialize destination_flow struct to 0 From Feras, Add inner TTC table to IPoIB flow steering From Tal, Enable CQE based moderation on TX CQ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05Merge branch 'nfp-ethtool-and-related-improvements'David S. Miller
Simon Horman says: ==================== nfp: ethtool and related improvements Dirk van der Merwe says: This patch series throws a couple of loosely related items into a single series. Patch 1: Clang compilation fix reported by Matthias Kaehlcke <mka@chromium.org> Patch 2: Driver can now do MAC reinit on load when there has been a media override set in the NSP. Patch 3: Refactor the nfp_app_reprs_set API. Patch 4: Similar to vNICs, representors must be able to deal with media override changes in the NSP. Patch 5: Since representors can now handle media overrides, we can allocate the get/set link ndo's to them. Patch 6 & 7: Add support for FEC mode modification. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: implement ethtool FEC mode settingsDirk van der Merwe
Add support in the driver ethtool ops to modify the NFP FEC modes. The FEC modes can be set for vNIC associated with physical ports or for MAC representor netdevs. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: add helpers for FEC supportDirk van der Merwe
Implement helpers to determine and modify FEC modes via the NSP. The NSP advertises FEC capabilities on a per port basis and provides support for: * Auto mode selection * Reed Solomon * BaseR * None/Off Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: add get/set link settings ndos to representorsDirk van der Merwe
Since it is now safe to modify link settings for representors, we can attach the get/set link settings ndos to it. The get/set link settings are nfp_port based operations. If a port becomes invalid, the representor will be removed in the same way a vnic would be. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: resync repr state when port table syncDirk van der Merwe
If the NSP port table has been refreshed, resync the representor state with the new port information. At the moment, this only entails looking for invalid ports and killing off representors associated with them. The repr instance becomes NULL which is safe since the app accessor function for reprs returns NULL when it cannot access a repr. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: refactor nfp_app_reprs_setDirk van der Merwe
The criteria that reprs cannot be replaced with another new set of reprs has been removed. This check is not needed since the only use case that could exercise this at the moment, would be to modify the number of SRIOV VFs without first disabling them. This case is explicitly disallowed in any case and subsequent patches in this series need to be able to replace the running set of reprs. All cases where the return code used to be checked for the nfp_app_reprs_set function have been removed. As stated above, it is not possible for the current code to encounter a case where reprs exist and need to be replaced. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: make use of MAC reinitJakub Kicinski
Recent management FW images can perform full reinit of MAC cores without requiring a reboot. When loading the driver check if there are changes pending and if so call NSP MAC reinit. Full application FW reload is still required, and all MACs need to be reinited at the same time (not only the ones which have been reconfigured, and thus potentially causing disruption to unrelated netdevs) therefore for now changing MAC config without reloading the driver still remains future work. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05nfp: don't depend on compiler constant propagationJakub Kicinski
Matthias reports: nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to identify the 'mask' parameter as known to be constant at compile time, which is required to use the FIELD_GET() macro. The forced inlining does the trick for gcc, but for kernel builds with clang it results in undefined symbols: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function `__nfp_eth_set_aneg': drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787): undefined reference to `__compiletime_assert_492' drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1): undefined reference to `__compiletime_assert_496' These __compiletime_assert_xyx() calls would have been optimized away if the compiler had seen 'mask' as a constant. Add a macro to extract the mask and shift and pass those to nfp_eth_set_bit_config() separately. Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: fix DSACK-based undo on non-duplicate ACKPriyaranjan Jha
Fixes DSACK-based undo when sender is in Open State and an ACK advances snd_una. Example scenario: - Sender goes into recovery and makes some spurious rtx. - It comes out of recovery and enters into open state. - It sends some more packets, let's say 4. - The receiver sends an ACK for the first two, but this ACK is lost. - The sender receives ack for first two, and DSACK for previous spurious rtx. Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yousuk Seung <ysseung@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tcp: higher throughput under reordering with adaptive RACK reordering wndPriyaranjan Jha
Currently TCP RACK loss detection does not work well if packets are being reordered beyond its static reordering window (min_rtt/4).Under such reordering it may falsely trigger loss recoveries and reduce TCP throughput significantly. This patch improves that by increasing and reducing the reordering window based on DSACK, which is now supported in major TCP implementations. It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries. - If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded by srtt), since there is possibility that spurious retransmission was due to reordering delay longer than reo_wnd. - Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16) no. of successful recoveries (accounts for full DSACK-based loss recovery undo). After that, reset it to default (min_rtt/4). - At max, reo_wnd is incremented only once per rtt. So that the new DSACK on which we are reacting, is due to the spurious retx (approx) after the reo_wnd has been updated last time. - reo_wnd is tracked in terms of steps (of min_rtt/4), rather than absolute value to account for change in rtt. In our internal testing, we observed significant increase in throughput, in scenarios where reordering exceeds min_rtt/4 (previous static value). Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05Merge branch 'dsa-parsing-stage'David S. Miller
Vivien Didelot says: ==================== net: dsa: parsing stage When registering a DSA switch, there is basically two stages. The first stage is the parsing of the switch device, from either device tree or platform data. It fetches the DSA tree to which it belongs, and validates its ports. The switch device is then added to the tree, and the second stage is called if this was the last switch of the tree. The second stage is the setup of the tree, which validates that the tree is complete, sets up the routing tables, the default CPU port for user ports, sets up the switch drivers and finally the master interfaces, which makes the whole switch fabric functional. This patch series covers the first parsing stage. It fixes the type of the switch and tree indexes to unsigned int, simplifies the tree reference counting and the switch and CPU ports parsing. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: resolve tagging protocol at parse timeVivien Didelot
Extend the dsa_port_parse_cpu() function to resolve the tagging protocol at port parsing time, instead of waiting for the whole tree to be complete. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: add one port parsing function per typeVivien Didelot
Add dsa_port_parse_user, dsa_port_parse_dsa and dsa_port_parse_cpu functions to factorize the code shared by both OF and pdata parsing. They don't do much for the moment but will be extended later to support tagging protocol resolution for example. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: only check presence of link propertyVivien Didelot
When parsing a port, simply use of_property_read_bool which checks the presence of a given property, instead of parsing the link phandle. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: rework switch parsingVivien Didelot
When parsing a switch, we have to identify to which tree it belongs and parse its ports. Provide two functions to separate the OF and platform data specific paths. Also use the of_property_read_variable_u32_array function to parse the OF member array instead of calling of_property_read_u32_index twice. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: get tree before parsing portsVivien Didelot
We will need a reference to the dsa_switch_tree when parsing a CPU port, so fetch it right after parsing the member and before parsing ports. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: rework switch addition and removalVivien Didelot
This patch removes the unnecessary index argument from the dsa_dst_add_ds and dsa_dst_del_ds functions and renames them to dsa_tree_add_switch and dsa_tree_remove_switch respectively. In addition to a more explicit scope, we now check the presence of an existing switch with the same index directly within dsa_tree_add_switch. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: provide a find or new tree helperVivien Didelot
Rename dsa_get_dst to dsa_tree_find since it doesn't increment the reference counter, rename dsa_add_dst to dsa_tree_alloc for symmetry with dsa_tree_free, and provide a convenient dsa_tree_touch function to find or allocate a new tree. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: get and put tree reference countingVivien Didelot
Provide convenient dsa_tree_get and dsa_tree_put functions scoping a DSA tree used to increment and decrement its reference counter, instead of poking directly its kref structure. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: simplify tree reference countingVivien Didelot
DSA trees have a refcount used to automatically free the dsa_switch_tree structure once there is no switch devices inside of it. The refcount is incremented when a switch is added to the tree, and decremented when it is removed from it. But because of kref_init, the refcount is also incremented at initialization, and when looking up the tree from the list for symmetry. Thus the current code stores the number of switches plus one, and makes the switch registration more complex. To simplify the switch registration function, we reset the refcount to zero after initialization and don't increment it when looking up a tree. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: make tree index unsignedVivien Didelot
Similarly to a DSA switch and port, rename the tree index from "tree" to "index" and make it an unsigned int because it isn't supposed to be less than 0. u32 is an OF specific data used to retrieve the value and has no need to be propagated up to the tree index. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05net: dsa: make switch index unsignedVivien Didelot
Define the DSA switch index as an unsigned int, because it will never be less than 0. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05liquidio: do not consider packets dropped by network stack as driver Rx droppedIntiyaz Basha
netdev->rx_dropped was including packets dropped by napi_gro_receive. If a packet is dropped by network stack, it should not be counted under driver Rx dropped. Made necessary changes to not include network stack drops under netdev->rx_dropped. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05tools: bpftool: move p_err() and p_info() from main.h to common.cQuentin Monnet
The two functions were declared as static inline in a header file. There is no particular reason why they should be inlined, they just happened to remain in the same header file when they were turned from macros to functions in a precious commit. Make them non-inlined functions and move them to common.c file instead. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>