summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-15libbpf: Add generic bpf_program__attach()Andrii Nakryiko
Generalize BPF program attaching and allow libbpf to auto-detect type (and extra parameters, where applicable) and attach supported BPF program types based on program sections. Currently this is supported for: - kprobe/kretprobe; - tracepoint; - raw tracepoint; - tracing programs (typed raw TP/fentry/fexit). More types support can be trivially added within this framework. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-3-andriin@fb.com
2019-12-15libbpf: Don't require root for bpf_object__open()Andrii Nakryiko
Reorganize bpf_object__open and bpf_object__load steps such that bpf_object__open doesn't need root access. This was previously done for feature probing and BTF sanitization. This doesn't have to happen on open, though, so move all those steps into the load phase. This is important, because it makes it possible for tools like bpftool, to just open BPF object file and inspect their contents: programs, maps, BTF, etc. For such operations it is prohibitive to require root access. On the other hand, there is a lot of custom libbpf logic in those steps, so its best avoided for tools to reimplement all that on their own. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-2-andriin@fb.com
2019-12-15libbpf: Fix readelf output parsing for FedoraThadeu Lima de Souza Cascardo
Fedora binutils has been patched to show "other info" for a symbol at the end of the line. This was done in order to support unmaintained scripts that would break with the extra info. [1] [1] https://src.fedoraproject.org/rpms/binutils/c/b8265c46f7ddae23a792ee8306fbaaeacba83bf8 This in turn has been done to fix the build of ruby, because of checksec. [2] Thanks Michael Ellerman for the pointer. [2] https://bugzilla.redhat.com/show_bug.cgi?id=1479302 As libbpf Makefile is not unmaintained, we can simply deal with either output format, by just removing the "other info" field, as it always comes inside brackets. Fixes: 3464afdf11f9 (libbpf: Fix readelf output parsing on powerpc with recent binutils) Reported-by: Justin Forbes <jmforbes@linuxtx.org> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Link: https://lore.kernel.org/bpf/20191213101114.GA3986@calabresa
2019-12-15Merge branch 'bpftool-match-by-name'Alexei Starovoitov
Paul Chaignon says: ==================== When working with frequently modified BPF programs, both the ID and the tag may change. bpftool currently doesn't provide a "stable" way to match such programs. This patchset allows bpftool to match programs and maps by name. When given a tag that matches several programs, bpftool currently only considers the first match. The first patch changes that behavior to either process all matching programs (for the show and dump commands) or error out. The second patch implements program lookup by name, with the same behavior as for tags in case of ambiguity. The last patch implements map lookup by name. Changelogs: Changes in v2: - Fix buffer overflow after realloc. - Add example output to commit message. - Properly close JSON arrays on errors. - Fix style errors (line breaks, for loops, exit labels, type for tagname). - Move do_show code for argc == 2 to do_show_subset functions. - Rebase. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-12-15bpftool: Match maps by namePaul Chaignon
This patch implements lookup by name for maps and changes the behavior of lookups by tag to be consistent with prog subcommands. Similarly to program subcommands, the show and dump commands will return all maps with the given name (or tag), whereas other commands will error out if several maps have the same name (resp. tag). When a map has BTF info, it is dumped in JSON with available BTF info. This patch requires that all matched maps have BTF info before switching the output format to JSON. Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/8de1c9f273860b3ea1680502928f4da2336b853e.1576263640.git.paul.chaignon@gmail.com
2019-12-15bpftool: Match programs by namePaul Chaignon
When working with frequently modified BPF programs, both the ID and the tag may change. bpftool currently doesn't provide a "stable" way to match such programs. This patch implements lookup by name for programs. The show and dump commands will return all programs with the given name, whereas other commands will error out if several programs have the same name. Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Link: https://lore.kernel.org/bpf/b5fc1a5dcfaeb5f16fc80295cdaa606dd2d91534.1576263640.git.paul.chaignon@gmail.com
2019-12-15bpftool: Match several programs with same tagPaul Chaignon
When several BPF programs have the same tag, bpftool matches only the first (in ID order). This patch changes that behavior such that dump and show commands return all matched programs. Commands that require a single program (e.g., pin and attach) will error out if given a tag that matches several. bpftool prog dump will also error out if file or visual are given and several programs have the given tag. In the case of the dump command, a program header is added before each dump only if the tag matches several programs; this patch doesn't change the output if a single program matches. The output when several programs match thus looks as follows. $ ./bpftool prog dump xlated tag 6deef7357e7b4530 3: cgroup_skb tag 6deef7357e7b4530 gpl 0: (bf) r6 = r1 [...] 7: (95) exit 4: cgroup_skb tag 6deef7357e7b4530 gpl 0: (bf) r6 = r1 [...] 7: (95) exit Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/fb1fe943202659a69cd21dd5b907c205af1e1e22.1576263640.git.paul.chaignon@gmail.com
2019-12-13selftests/bpf: Test wire_len/gso_segs in BPF_PROG_TEST_RUNStanislav Fomichev
Make sure we can pass arbitrary data in wire_len/gso_segs. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191213223028.161282-2-sdf@google.com
2019-12-13bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUNStanislav Fomichev
wire_len should not be less than real len and is capped by GSO_MAX_SIZE. gso_segs is capped by GSO_MAX_SEGS. v2: * set wire_len to skb->len when passed wire_len is 0 (Alexei Starovoitov) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191213223028.161282-1-sdf@google.com
2019-12-13Merge branch 'bpf-dispatcher'Alexei Starovoitov
Björn Töpel says: ==================== Overview ======== This is the 6th iteration of the series that introduces the BPF dispatcher, which is a mechanism to avoid indirect calls. The BPF dispatcher is a multi-way branch code generator, targeted for BPF programs. E.g. when an XDP program is executed via the bpf_prog_run_xdp(), it is invoked via an indirect call. With retpolines enabled, the indirect call has a substantial performance impact. The dispatcher is a mechanism that transform indirect calls to direct calls, and therefore avoids the retpoline. The dispatcher is generated using the BPF JIT, and relies on text poking provided by bpf_arch_text_poke(). The dispatcher hijacks a trampoline function it via the __fentry__ nop of the trampoline. One dispatcher instance currently supports up to 48 dispatch points. This can be extended in the future. In this series, only one dispatcher instance is supported, and the only user is XDP. The dispatcher is updated when an XDP program is attached/detached to/from a netdev. An alternative to this could have been to update the dispatcher at program load point, but as there are usually more XDP programs loaded than attached, so the latter was picked. The XDP dispatcher is always enabled, if available, because it helps even when retpolines are disabled. Please refer to the "Performance" section below. The first patch refactors the image allocation from the BPF trampoline code. Patch two introduces the dispatcher, and patch three adds a dispatcher for XDP, and wires up the XDP control-/ fast-path. Patch four adds the dispatcher to BPF_TEST_RUN. Patch five adds a simple selftest, and the last adds alignment to jump targets. I have rebased the series on commit 679152d3a32e ("libbpf: Fix printf compilation warnings on ppc64le arch"). Generated code, x86-64 ====================== The dispatcher currently has a maximum of 48 entries, where one entry is a unique BPF program. Multiple users of a dispatcher instance using the same BPF program will share that entry. The program/slot lookup is performed by a binary search, O(log n). Let's have a look at the generated code. The trampoline function has the following signature: unsigned int tramp(const void *ctx, const struct bpf_insn *insnsi, unsigned int (*bpf_func)(const void *, const struct bpf_insn *)) On Intel x86-64 this means that rdx will contain the bpf_func. To, make it easier to read, I've let the BPF programs have the following range: 0xffffffffffffffff (-1) to 0xfffffffffffffff0 (-16). 0xffffffff81c00f10 is the retpoline thunk, in this case __x86_indirect_thunk_rdx. If retpolines are disabled the thunk will be a regular indirect call. The minimal dispatcher will then look like this: ffffffffc0002000: cmp rdx,0xffffffffffffffff ffffffffc0002007: je 0xffffffffffffffff ; -1 ffffffffc000200d: jmp 0xffffffff81c00f10 A 16 entry dispatcher looks like this: ffffffffc0020000: cmp rdx,0xfffffffffffffff7 ; -9 ffffffffc0020007: jg 0xffffffffc0020130 ffffffffc002000d: cmp rdx,0xfffffffffffffff3 ; -13 ffffffffc0020014: jg 0xffffffffc00200a0 ffffffffc002001a: cmp rdx,0xfffffffffffffff1 ; -15 ffffffffc0020021: jg 0xffffffffc0020060 ffffffffc0020023: cmp rdx,0xfffffffffffffff0 ; -16 ffffffffc002002a: jg 0xffffffffc0020040 ffffffffc002002c: cmp rdx,0xfffffffffffffff0 ; -16 ffffffffc0020033: je 0xfffffffffffffff0 ; -16 ffffffffc0020039: jmp 0xffffffff81c00f10 ffffffffc002003e: xchg ax,ax ffffffffc0020040: cmp rdx,0xfffffffffffffff1 ; -15 ffffffffc0020047: je 0xfffffffffffffff1 ; -15 ffffffffc002004d: jmp 0xffffffff81c00f10 ffffffffc0020052: nop DWORD PTR [rax+rax*1+0x0] ffffffffc002005a: nop WORD PTR [rax+rax*1+0x0] ffffffffc0020060: cmp rdx,0xfffffffffffffff2 ; -14 ffffffffc0020067: jg 0xffffffffc0020080 ffffffffc0020069: cmp rdx,0xfffffffffffffff2 ; -14 ffffffffc0020070: je 0xfffffffffffffff2 ; -14 ffffffffc0020076: jmp 0xffffffff81c00f10 ffffffffc002007b: nop DWORD PTR [rax+rax*1+0x0] ffffffffc0020080: cmp rdx,0xfffffffffffffff3 ; -13 ffffffffc0020087: je 0xfffffffffffffff3 ; -13 ffffffffc002008d: jmp 0xffffffff81c00f10 ffffffffc0020092: nop DWORD PTR [rax+rax*1+0x0] ffffffffc002009a: nop WORD PTR [rax+rax*1+0x0] ffffffffc00200a0: cmp rdx,0xfffffffffffffff5 ; -11 ffffffffc00200a7: jg 0xffffffffc00200f0 ffffffffc00200a9: cmp rdx,0xfffffffffffffff4 ; -12 ffffffffc00200b0: jg 0xffffffffc00200d0 ffffffffc00200b2: cmp rdx,0xfffffffffffffff4 ; -12 ffffffffc00200b9: je 0xfffffffffffffff4 ; -12 ffffffffc00200bf: jmp 0xffffffff81c00f10 ffffffffc00200c4: nop DWORD PTR [rax+rax*1+0x0] ffffffffc00200cc: nop DWORD PTR [rax+0x0] ffffffffc00200d0: cmp rdx,0xfffffffffffffff5 ; -11 ffffffffc00200d7: je 0xfffffffffffffff5 ; -11 ffffffffc00200dd: jmp 0xffffffff81c00f10 ffffffffc00200e2: nop DWORD PTR [rax+rax*1+0x0] ffffffffc00200ea: nop WORD PTR [rax+rax*1+0x0] ffffffffc00200f0: cmp rdx,0xfffffffffffffff6 ; -10 ffffffffc00200f7: jg 0xffffffffc0020110 ffffffffc00200f9: cmp rdx,0xfffffffffffffff6 ; -10 ffffffffc0020100: je 0xfffffffffffffff6 ; -10 ffffffffc0020106: jmp 0xffffffff81c00f10 ffffffffc002010b: nop DWORD PTR [rax+rax*1+0x0] ffffffffc0020110: cmp rdx,0xfffffffffffffff7 ; -9 ffffffffc0020117: je 0xfffffffffffffff7 ; -9 ffffffffc002011d: jmp 0xffffffff81c00f10 ffffffffc0020122: nop DWORD PTR [rax+rax*1+0x0] ffffffffc002012a: nop WORD PTR [rax+rax*1+0x0] ffffffffc0020130: cmp rdx,0xfffffffffffffffb ; -5 ffffffffc0020137: jg 0xffffffffc00201d0 ffffffffc002013d: cmp rdx,0xfffffffffffffff9 ; -7 ffffffffc0020144: jg 0xffffffffc0020190 ffffffffc0020146: cmp rdx,0xfffffffffffffff8 ; -8 ffffffffc002014d: jg 0xffffffffc0020170 ffffffffc002014f: cmp rdx,0xfffffffffffffff8 ; -8 ffffffffc0020156: je 0xfffffffffffffff8 ; -8 ffffffffc002015c: jmp 0xffffffff81c00f10 ffffffffc0020161: nop DWORD PTR [rax+rax*1+0x0] ffffffffc0020169: nop DWORD PTR [rax+0x0] ffffffffc0020170: cmp rdx,0xfffffffffffffff9 ; -7 ffffffffc0020177: je 0xfffffffffffffff9 ; -7 ffffffffc002017d: jmp 0xffffffff81c00f10 ffffffffc0020182: nop DWORD PTR [rax+rax*1+0x0] ffffffffc002018a: nop WORD PTR [rax+rax*1+0x0] ffffffffc0020190: cmp rdx,0xfffffffffffffffa ; -6 ffffffffc0020197: jg 0xffffffffc00201b0 ffffffffc0020199: cmp rdx,0xfffffffffffffffa ; -6 ffffffffc00201a0: je 0xfffffffffffffffa ; -6 ffffffffc00201a6: jmp 0xffffffff81c00f10 ffffffffc00201ab: nop DWORD PTR [rax+rax*1+0x0] ffffffffc00201b0: cmp rdx,0xfffffffffffffffb ; -5 ffffffffc00201b7: je 0xfffffffffffffffb ; -5 ffffffffc00201bd: jmp 0xffffffff81c00f10 ffffffffc00201c2: nop DWORD PTR [rax+rax*1+0x0] ffffffffc00201ca: nop WORD PTR [rax+rax*1+0x0] ffffffffc00201d0: cmp rdx,0xfffffffffffffffd ; -3 ffffffffc00201d7: jg 0xffffffffc0020220 ffffffffc00201d9: cmp rdx,0xfffffffffffffffc ; -4 ffffffffc00201e0: jg 0xffffffffc0020200 ffffffffc00201e2: cmp rdx,0xfffffffffffffffc ; -4 ffffffffc00201e9: je 0xfffffffffffffffc ; -4 ffffffffc00201ef: jmp 0xffffffff81c00f10 ffffffffc00201f4: nop DWORD PTR [rax+rax*1+0x0] ffffffffc00201fc: nop DWORD PTR [rax+0x0] ffffffffc0020200: cmp rdx,0xfffffffffffffffd ; -3 ffffffffc0020207: je 0xfffffffffffffffd ; -3 ffffffffc002020d: jmp 0xffffffff81c00f10 ffffffffc0020212: nop DWORD PTR [rax+rax*1+0x0] ffffffffc002021a: nop WORD PTR [rax+rax*1+0x0] ffffffffc0020220: cmp rdx,0xfffffffffffffffe ; -2 ffffffffc0020227: jg 0xffffffffc0020240 ffffffffc0020229: cmp rdx,0xfffffffffffffffe ; -2 ffffffffc0020230: je 0xfffffffffffffffe ; -2 ffffffffc0020236: jmp 0xffffffff81c00f10 ffffffffc002023b: nop DWORD PTR [rax+rax*1+0x0] ffffffffc0020240: cmp rdx,0xffffffffffffffff ; -1 ffffffffc0020247: je 0xffffffffffffffff ; -1 ffffffffc002024d: jmp 0xffffffff81c00f10 The nops are there to align jump targets to 16 B. Performance =========== The tests were performed using the xdp_rxq_info sample program with the following command-line: 1. XDP_DRV: # xdp_rxq_info --dev eth0 --action XDP_DROP 2. XDP_SKB: # xdp_rxq_info --dev eth0 -S --action XDP_DROP 3. xdp-perf, from selftests/bpf: # test_progs -v -t xdp_perf Run with mitigations=auto ------------------------- Baseline: 1. 21.7 Mpps (21736190) 2. 3.8 Mpps (3837582) 3. 15 ns Dispatcher: 1. 30.2 Mpps (30176320) 2. 4.0 Mpps (4015579) 3. 5 ns Dispatcher (full; walk all entries, and fallback): 1. 22.0 Mpps (21986704) 2. 3.8 Mpps (3831298) 3. 17 ns Run with mitigations=off ------------------------ Baseline: 1. 29.9 Mpps (29875135) 2. 4.1 Mpps (4100179) 3. 4 ns Dispatcher: 1. 30.4 Mpps (30439241) 2. 4.1 Mpps (4109350) 1. 4 ns Dispatcher (full; walk all entries, and fallback): 1. 28.9 Mpps (28903269) 2. 4.1 Mpps (4080078) 3. 5 ns xdp-perf runs, aliged vs non-aligned jump targets ------------------------------------------------- In this test dispatchers of different sizes, with and without jump target alignment, were exercised. As outlined above the function lookup is performed via binary search. This means that depending on the pointer value of the function, it can reside in the upper or lower part of the search table. The performed tests were: 1. aligned, mititations=auto, function entry < other entries 2. aligned, mititations=auto, function entry > other entries 3. non-aligned, mititations=auto, function entry < other entries 4. non-aligned, mititations=auto, function entry > other entries 5. aligned, mititations=off, function entry < other entries 6. aligned, mititations=off, function entry > other entries 7. non-aligned, mititations=off, function entry < other entries 8. non-aligned, mititations=off, function entry > other entries The micro benchmarks showed that alignment of jump target has some positive impact. A reply to this cover letter will contain complete data for all runs. Multiple xdp-perf baseline with mitigations=auto ------------------------------------------------ Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs): 16.69 msec task-clock # 0.984 CPUs utilized ( +- 0.08% ) 2 context-switches # 0.123 K/sec ( +- 1.11% ) 0 cpu-migrations # 0.000 K/sec ( +- 70.68% ) 97 page-faults # 0.006 M/sec ( +- 0.05% ) 49,254,635 cycles # 2.951 GHz ( +- 0.09% ) (12.28%) 42,138,558 instructions # 0.86 insn per cycle ( +- 0.02% ) (36.15%) 7,315,291 branches # 438.300 M/sec ( +- 0.01% ) (59.43%) 1,011,201 branch-misses # 13.82% of all branches ( +- 0.01% ) (83.31%) 15,440,788 L1-dcache-loads # 925.143 M/sec ( +- 0.00% ) (99.40%) 39,067 L1-dcache-load-misses # 0.25% of all L1-dcache hits ( +- 0.04% ) 6,531 LLC-loads # 0.391 M/sec ( +- 0.05% ) 442 LLC-load-misses # 6.76% of all LL-cache hits ( +- 0.77% ) <not supported> L1-icache-loads 57,964 L1-icache-load-misses ( +- 0.06% ) 15,442,496 dTLB-loads # 925.246 M/sec ( +- 0.00% ) 514 dTLB-load-misses # 0.00% of all dTLB cache hits ( +- 0.73% ) (40.57%) 130 iTLB-loads # 0.008 M/sec ( +- 2.75% ) (16.69%) <not counted> iTLB-load-misses ( +- 8.71% ) (0.60%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 0.0169558 +- 0.0000127 seconds time elapsed ( +- 0.07% ) Multiple xdp-perf dispatcher with mitigations=auto -------------------------------------------------- Note that this includes generating the dispatcher. Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs): 4.80 msec task-clock # 0.953 CPUs utilized ( +- 0.06% ) 1 context-switches # 0.258 K/sec ( +- 1.57% ) 0 cpu-migrations # 0.000 K/sec 97 page-faults # 0.020 M/sec ( +- 0.05% ) 14,185,861 cycles # 2.955 GHz ( +- 0.17% ) (50.49%) 45,691,935 instructions # 3.22 insn per cycle ( +- 0.01% ) (99.19%) 8,346,008 branches # 1738.709 M/sec ( +- 0.00% ) 13,046 branch-misses # 0.16% of all branches ( +- 0.10% ) 15,443,735 L1-dcache-loads # 3217.365 M/sec ( +- 0.00% ) 39,585 L1-dcache-load-misses # 0.26% of all L1-dcache hits ( +- 0.05% ) 7,138 LLC-loads # 1.487 M/sec ( +- 0.06% ) 671 LLC-load-misses # 9.40% of all LL-cache hits ( +- 0.73% ) <not supported> L1-icache-loads 56,213 L1-icache-load-misses ( +- 0.08% ) 15,443,735 dTLB-loads # 3217.365 M/sec ( +- 0.00% ) <not counted> dTLB-load-misses (0.00%) <not counted> iTLB-loads (0.00%) <not counted> iTLB-load-misses (0.00%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 0.00503705 +- 0.00000546 seconds time elapsed ( +- 0.11% ) Revisions ========= v4->v5: [1] * Fixed s/xdp_ctx/ctx/ type-o (Toke) * Marked dispatcher trampoline with noinline attribute (Alexei) v3->v4: [2] * Moved away from doing dispatcher lookup based on the trampoline function, to a model where the dispatcher instance is explicitly passed to the bpf_dispatcher_change_prog() (Alexei) v2->v3: [3] * Removed xdp_call, and instead make the dispatcher available to all XDP users via bpf_prog_run_xdp() and dev_xdp_install(). (Toke) * Always enable the dispatcher, if available (Alexei) * Reuse BPF trampoline image allocator (Alexei) * Make sure the dispatcher is exercised in selftests (Alexei) * Only allow one dispatcher, and wire it to XDP v1->v2: [4] * Fixed i386 build warning (kbuild robot) * Made bpf_dispatcher_lookup() static (kbuild robot) * Make sure xdp_call.h is only enabled for builtins * Add xdp_call() to ixgbe, mlx4, and mlx5 RFC->v1: [5] * Improved error handling (Edward and Andrii) * Explicit cleanup (Andrii) * Use 32B with sext cmp (Alexei) * Align jump targets to 16B (Alexei) * 4 to 16 entries (Toke) * Added stats to xdp_call_run() [1] https://lore.kernel.org/bpf/20191211123017.13212-1-bjorn.topel@gmail.com/ [2] https://lore.kernel.org/bpf/20191209135522.16576-1-bjorn.topel@gmail.com/ [3] https://lore.kernel.org/bpf/20191123071226.6501-1-bjorn.topel@gmail.com/ [4] https://lore.kernel.org/bpf/20191119160757.27714-1-bjorn.topel@gmail.com/ [5] https://lore.kernel.org/bpf/20191113204737.31623-1-bjorn.topel@gmail.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-12-13bpf, x86: Align dispatcher branch targets to 16BBjörn Töpel
>From Intel 64 and IA-32 Architectures Optimization Reference Manual, 3.4.1.4 Code Alignment, Assembly/Compiler Coding Rule 11: All branch targets should be 16-byte aligned. This commits aligns branch targets according to the Intel manual. The nops used to align branch targets make the dispatcher larger, and therefore the number of supported dispatch points/programs are descreased from 64 to 48. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-7-bjorn.topel@gmail.com
2019-12-13selftests: bpf: Add xdp_perf testBjörn Töpel
The xdp_perf is a dummy XDP test, only used to measure the the cost of jumping into a naive XDP program one million times. To build and run the program: $ cd tools/testing/selftests/bpf $ make $ ./test_progs -v -t xdp_perf Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-6-bjorn.topel@gmail.com
2019-12-13bpf: Start using the BPF dispatcher in BPF_TEST_RUNBjörn Töpel
In order to properly exercise the BPF dispatcher, this commit adds BPF dispatcher usage to BPF_TEST_RUN when executing XDP programs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-5-bjorn.topel@gmail.com
2019-12-13bpf, xdp: Start using the BPF dispatcher for XDPBjörn Töpel
This commit adds a BPF dispatcher for XDP. The dispatcher is updated from the XDP control-path, dev_xdp_install(), and used when an XDP program is run via bpf_prog_run_xdp(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com
2019-12-13bpf: Introduce BPF dispatcherBjörn Töpel
The BPF dispatcher is a multi-way branch code generator, mainly targeted for XDP programs. When an XDP program is executed via the bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect call has a substantial performance impact, when retpolines are enabled. The dispatcher transform indirect calls to direct calls, and therefore avoids the retpoline. The dispatcher is generated using the BPF JIT, and relies on text poking provided by bpf_arch_text_poke(). The dispatcher hijacks a trampoline function it via the __fentry__ nop of the trampoline. One dispatcher instance currently supports up to 64 dispatch points. A user creates a dispatcher with its corresponding trampoline with the DEFINE_BPF_DISPATCHER macro. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-3-bjorn.topel@gmail.com
2019-12-13bpf: Move trampoline JIT image allocation to a functionBjörn Töpel
Refactor the image allocation in the BPF trampoline code into a separate function, so it can be shared with the BPF dispatcher in upcoming commits. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-2-bjorn.topel@gmail.com
2019-12-13selftests/bpf: Fix perf_buffer test on systems w/ offline CPUsAndrii Nakryiko
Fix up perf_buffer.c selftest to take into account offline/missing CPUs. Fixes: ee5cf82ce04a ("selftests/bpf: test perf buffer API") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212013621.1691858-1-andriin@fb.com
2019-12-13libbpf: Don't attach perf_buffer to offline/missing CPUsAndrii Nakryiko
It's quite common on some systems to have more CPUs enlisted as "possible", than there are (and could ever be) present/online CPUs. In such cases, perf_buffer creationg will fail due to inability to create perf event on missing CPU with error like this: libbpf: failed to open perf buffer event on cpu #16: No such device This patch fixes the logic of perf_buffer__new() to ignore CPUs that are missing or currently offline. In rare cases where user explicitly listed specific CPUs to connect to, behavior is unchanged: libbpf will try to open perf event buffer on specified CPU(s) anyways. Fixes: fb84b8224655 ("libbpf: add perf buffer API") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212013609.1691168-1-andriin@fb.com
2019-12-13selftests/bpf: Add CPU mask parsing testsAndrii Nakryiko
Add a bunch of test validating CPU mask parsing logic and error handling. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212013559.1690898-1-andriin@fb.com
2019-12-13libbpf: Extract and generalize CPU mask parsing logicAndrii Nakryiko
This logic is re-used for parsing a set of online CPUs. Having it as an isolated piece of code working with input string makes it conveninent to test this logic as well. While refactoring, also improve the robustness of original implementation. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212013548.1690564-1-andriin@fb.com
2019-12-13Merge branch 'reuseport_to_test_progs'Alexei Starovoitov
Jakub Sitnicki says: ==================== This change has been suggested by Martin Lau [0] during a review of a related patch set that extends reuseport tests [1]. Patches 1 & 2 address a warning due to unrecognized section name from libbpf when running reuseport tests. We don't want to carry this warning into test_progs. Patches 3-8 massage the reuseport tests to ease the switch to test_progs framework. The intention here is to show the work. Happy to squash these, if needed. Patches 9-10 do the actual move and conversion to test_progs. Output from a test_progs run after changes pasted below. Thanks, Jakub [0] https://lore.kernel.org/bpf/20191123110751.6729-1-jakub@cloudflare.com/T/#m607d822caeb1eb5db101172821a78cc3896ff1c3 [1] https://lore.kernel.org/bpf/20191123110751.6729-1-jakub@cloudflare.com/T/#m55881bae9fb6e34837d07a0c0a7ffbc138f8d06f ==================== Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-12-13selftests/bpf: Switch reuseport tests for test_progs frameworkJakub Sitnicki
The tests were originally written in abort-on-error style. With the switch to test_progs we can no longer do that. So at the risk of not cleaning up some resource on failure, we now return to the caller on error. That said, failure inside one test should not affect others because we run setup/cleanup before/after every test. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-11-jakub@cloudflare.com
2019-12-13selftests/bpf: Move reuseport tests under prog_tests/Jakub Sitnicki
Do a pure move the show the actual work needed to adapt the tests in subsequent patch at the cost of breaking test_progs build for the moment. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-10-jakub@cloudflare.com
2019-12-13selftests/bpf: Pull up printing the test name into test runnerJakub Sitnicki
Again, prepare for switching reuseport tests to test_progs framework. test_progs framework will print the subtest name for us if we set it. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-9-jakub@cloudflare.com
2019-12-13selftests/bpf: Propagate errors during setup for reuseport testsJakub Sitnicki
Prepare for switching reuseport tests to test_progs framework, where we don't have the luxury to terminate the process on failure. Modify setup helpers to signal failure via the return value with the help of a macro similar to the one currently in use by the tests. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-8-jakub@cloudflare.com
2019-12-13selftests/bpf: Run reuseport tests in a loopJakub Sitnicki
Prepare for switching reuseport tests to test_progs framework. Loop over the tests and perform setup/cleanup for each test separately, remembering that with test_progs we can select tests to run. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-7-jakub@cloudflare.com
2019-12-13selftests/bpf: Unroll the main loop in reuseport testJakub Sitnicki
Prepare for iterating over individual tests without introducing another nested loop in the main test function. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-6-jakub@cloudflare.com
2019-12-13selftests/bpf: Add helpers for getting socket family & type nameJakub Sitnicki
Having string arrays to map socket family & type to a name prevents us from unrolling the test runner loop in the subsequent patch. Introduce helpers that do the same thing. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-5-jakub@cloudflare.com
2019-12-13selftests/bpf: Use sa_family_t everywhere in reuseport testsJakub Sitnicki
Update the only function that is not using sa_family_t in this source file. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-4-jakub@cloudflare.com
2019-12-13selftests/bpf: Let libbpf determine program type from section nameJakub Sitnicki
Now that libbpf can recognize SK_REUSEPORT programs, we no longer have to pass a prog_type hint before loading the object file. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-3-jakub@cloudflare.com
2019-12-13libbpf: Recognize SK_REUSEPORT programs from section nameJakub Sitnicki
Allow loading BPF object files that contain SK_REUSEPORT programs without having to manually set the program type before load if the the section name is set to "sk_reuseport". Makes user-space code needed to load SK_REUSEPORT BPF program more concise. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191212102259.418536-2-jakub@cloudflare.com
2019-12-12libbpf: Fix printf compilation warnings on ppc64le archAndrii Nakryiko
On ppc64le __u64 and __s64 are defined as long int and unsigned long int, respectively. This causes compiler to emit warning when %lld/%llu are used to printf 64-bit numbers. Fix this by casting to size_t/ssize_t with %zu and %zd format specifiers, respectively. v1->v2: - use size_t/ssize_t instead of custom typedefs (Martin). Fixes: 1f8e2bcb2cd5 ("libbpf: Refactor relocation handling") Fixes: abd29c931459 ("libbpf: allow specifying map definitions using BTF") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191212171918.638010-1-andriin@fb.com
2019-12-11bpf, x86, arm64: Enable jit by default when not built as always-onDaniel Borkmann
After Spectre 2 fix via 290af86629b2 ("bpf: introduce BPF_JIT_ALWAYS_ON config") most major distros use BPF_JIT_ALWAYS_ON configuration these days which compiles out the BPF interpreter entirely and always enables the JIT. Also given recent fix in e1608f3fa857 ("bpf: Avoid setting bpf insns pages read-only when prog is jited"), we additionally avoid fragmenting the direct map for the BPF insns pages sitting in the general data heap since they are not used during execution. Latter is only needed when run through the interpreter. Since both x86 and arm64 JITs have seen a lot of exposure over the years, are generally most up to date and maintained, there is more downside in !BPF_JIT_ALWAYS_ON configurations to have the interpreter enabled by default rather than the JIT. Add a ARCH_WANT_DEFAULT_BPF_JIT config which archs can use to set the bpf_jit_{enable,kallsyms} to 1. Back in the days the bpf_jit_kallsyms knob was set to 0 by default since major distros still had /proc/kallsyms addresses exposed to unprivileged user space which is not the case anymore. Hence both knobs are set via BPF_JIT_DEFAULT_ON which is set to 'y' in case of BPF_JIT_ALWAYS_ON or ARCH_WANT_DEFAULT_BPF_JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/f78ad24795c2966efcc2ee19025fa3459f622185.1575903816.git.daniel@iogearbox.net
2019-12-11bpf: Emit audit messages upon successful prog load and unloadDaniel Borkmann
Allow for audit messages to be emitted upon BPF program load and unload for having a timeline of events. The load itself is in syscall context, so additional info about the process initiating the BPF prog creation can be logged and later directly correlated to the unload event. The only info really needed from BPF side is the globally unique prog ID where then audit user space tooling can query / dump all info needed about the specific BPF program right upon load event and enrich the record, thus these changes needed here can be kept small and non-intrusive to the core. Raw example output: # auditctl -D # auditctl -a always,exit -F arch=x86_64 -S bpf # ausearch --start recent -m 1334 ... ---- time->Wed Nov 27 16:04:13 2019 type=PROCTITLE msg=audit(1574867053.120:84664): proctitle="./bpf" type=SYSCALL msg=audit(1574867053.120:84664): arch=c000003e syscall=321 \ success=yes exit=3 a0=5 a1=7ffea484fbe0 a2=70 a3=0 items=0 ppid=7477 \ pid=12698 auid=1001 uid=1001 gid=1001 euid=1001 suid=1001 fsuid=1001 \ egid=1001 sgid=1001 fsgid=1001 tty=pts2 ses=4 comm="bpf" \ exe="/home/jolsa/auditd/audit-testsuite/tests/bpf/bpf" \ subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null) type=UNKNOWN[1334] msg=audit(1574867053.120:84664): prog-id=76 op=LOAD ---- time->Wed Nov 27 16:04:13 2019 type=UNKNOWN[1334] msg=audit(1574867053.120:84665): prog-id=76 op=UNLOAD ... Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/bpf/20191206214934.11319-1-jolsa@kernel.org
2019-12-11bpf: Switch to offsetofend in BPF_PROG_TEST_RUNStanislav Fomichev
Switch existing pattern of "offsetof(..., member) + FIELD_SIZEOF(..., member)' to "offsetofend(..., member)" which does exactly what we need without all the copy-paste. Suggested-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191210191933.105321-1-sdf@google.com
2019-12-11libbpf: Bump libpf current version to v0.0.7Andrii Nakryiko
New development cycles starts, bump to v0.0.7 proactively. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191209224022.3544519-1-andriin@fb.com
2019-12-11ARM: net: bpf: Improve prologue code sequenceRussell King
Improve the prologue code sequence to be able to take advantage of 64-bit stores, changing the code from: push {r4, r5, r6, r7, r8, r9, fp, lr} mov fp, sp sub ip, sp, #80 ; 0x50 sub sp, sp, #600 ; 0x258 str ip, [fp, #-100] ; 0xffffff9c mov r6, #0 str r6, [fp, #-96] ; 0xffffffa0 mov r4, #0 mov r3, r4 mov r2, r0 str r4, [fp, #-104] ; 0xffffff98 str r4, [fp, #-108] ; 0xffffff94 to the tighter: push {r4, r5, r6, r7, r8, r9, fp, lr} mov fp, sp mov r3, #0 sub r2, sp, #80 ; 0x50 sub sp, sp, #600 ; 0x258 strd r2, [fp, #-100] ; 0xffffff9c mov r2, #0 strd r2, [fp, #-108] ; 0xffffff94 mov r2, r0 resulting in a saving of three instructions. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/E1ieH2g-0004ih-Rb@rmk-PC.armlinux.org.uk
2019-12-10cxgb4: add support for high priority filtersShahjada Abul Husain
T6 has a separate region known as high priority filter region that allows classifying packets going through ULD path. So, query firmware for HPFILTER resources and enable the high priority offload filter support when it is available. Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10enetc: remove variable 'tc_max_sized_frame' set but not usedChen Wandun
Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/freescale/enetc/enetc_qos.c: In function enetc_setup_tc_cbs: drivers/net/ethernet/freescale/enetc/enetc_qos.c:195:6: warning: variable tc_max_sized_frame set but not used [-Wunused-but-set-variable] Fixes: c431047c4efe ("enetc: add support Credit Based Shaper(CBS) for hardware offload") Signed-off-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10nfp: add support for TLV device statsJakub Kicinski
Device stats are currently hard coded in the PCI BAR0 layout. Add a ability to read them from the TLV area instead. Names for the stats are maintained by the driver, and their meaning documented. This allows us to more easily add and remove device stats. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10tcp: Cleanup duplicate initialization of sk->sk_state.Kuniyuki Iwashima
When a TCP socket is created, sk->sk_state is initialized twice as TCP_CLOSE in sock_init_data() and tcp_init_sock(). The tcp_init_sock() is always called after the sock_init_data(), so it is not necessary to update sk->sk_state in the tcp_init_sock(). Before v2.1.8, the code of the two functions was in the inet_create(). In the patch of v2.1.8, the tcp_v4/v6_init_sock() were added and the code of initialization of sk->state was duplicated. Signed-off-by: Kuniyuki Iwashima <kuni1840@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10enetc: add software timestampingMichael Walle
Provide a software TX timestamp and add it to the ethtool query interface. skb_tx_timestamp() is also needed if one would like to use PHY timestamping. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10Merge branch 'tipc-introduce-variable-window-congestion-control'David S. Miller
Jon Maloy says: ==================== tipc: introduce variable window congestion control We improve thoughput greatly by introducing a variety of the Reno congestion control algorithm at the link level. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10tipc: introduce variable window congestion controlJon Maloy
We introduce a simple variable window congestion control for links. The algorithm is inspired by the Reno algorithm, covering both 'slow start', 'congestion avoidance', and 'fast recovery' modes. - We introduce hard lower and upper window limits per link, still different and configurable per bearer type. - We introduce a 'slow start theshold' variable, initially set to the maximum window size. - We let a link start at the minimum congestion window, i.e. in slow start mode, and then let is grow rapidly (+1 per rceived ACK) until it reaches the slow start threshold and enters congestion avoidance mode. - In congestion avoidance mode we increment the congestion window for each window-size number of acked packets, up to a possible maximum equal to the configured maximum window. - For each non-duplicate NACK received, we drop back to fast recovery mode, by setting the both the slow start threshold to and the congestion window to (current_congestion_window / 2). - If the timeout handler finds that the transmit queue has not moved since the previous timeout, it drops the link back to slow start and forces a probe containing the last sent sequence number to the sent to the peer, so that this can discover the stale situation. This change does in reality have effect only on unicast ethernet transport, as we have seen that there is no room whatsoever for increasing the window max size for the UDP bearer. For now, we also choose to keep the limits for the broadcast link unchanged and equal. This algorithm seems to give a 50-100% throughput improvement for messages larger than MTU. Suggested-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10tipc: eliminate more unnecessary nacks and retransmissionsJon Maloy
When we increase the link tranmsit window we often observe the following scenario: 1) A STATE message bypasses a sequence of traffic packets and arrives far ahead of those to the receiver. STATE messages contain a 'peers_nxt_snt' field to indicate which was the last packet sent from the peer. This mechanism is intended as a last resort for the receiver to detect missing packets, e.g., during very low traffic when there is no packet flow to help early loss detection. 3) The receiving link compares the 'peer_nxt_snt' field to its own 'rcv_nxt', finds that there is a gap, and immediately sends a NACK message back to the peer. 4) When this NACKs arrives at the sender, all the requested retransmissions are performed, since it is a first-time request. Just like in the scenario described in the previous commit this leads to many redundant retransmissions, with decreased throughput as a consequence. We fix this by adding two more conditions before we send a NACK in this sitution. First, the deferred queue must be empty, so we cannot assume that the potential packet loss has already been detected by other means. Second, we check the 'peers_snd_nxt' field only in probe/ probe_reply messages, thus turning this into a true mechanism of last resort as it was really meant to be. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10tipc: eliminate gap indicator from ACK messagesJon Maloy
When we increase the link send window we sometimes observe the following scenario: 1) A packet #N arrives out of order far ahead of a sequence of older packets which are still under way. The packet is added to the deferred queue. 2) The missing packets arrive in sequence, and for each 16th of them an ACK is sent back to the receiver, as it should be. 3) When building those ACK messages, it is checked if there is a gap between the link's 'rcv_nxt' and the first packet in the deferred queue. This is always the case until packet number #N-1 arrives, and a 'gap' indicator is added, effectively turning them into NACK messages. 4) When those NACKs arrive at the sender, all the requested retransmissions are done, since it is a first-time request. This sometimes leads to a huge amount of redundant retransmissions, causing a drop in max throughput. This problem gets worse when we in a later commit introduce variable window congestion control, since it drops the link back to 'fast recovery' much more often than necessary. We now fix this by not sending any 'gap' indicator in regular ACK messages. We already have a mechanism for sending explicit NACKs in place, and this is sufficient to keep up the packet flow. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09ppp: Adjust indentation into ppp_async_inputNathan Chancellor
Clang warns: ../drivers/net/ppp/ppp_async.c:877:6: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] ap->rpkt = skb; ^ ../drivers/net/ppp/ppp_async.c:875:5: note: previous statement is here if (!skb) ^ 1 warning generated. This warning occurs because there is a space before the tab on this line. Clean up this entire block's indentation so that it is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 6722e78c9005 ("[PPP]: handle misaligned accesses") Link: https://github.com/ClangBuiltLinux/linux/issues/800 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09net: smc911x: Adjust indentation in smc911x_phy_configureNathan Chancellor
Clang warns: ../drivers/net/ethernet/smsc/smc911x.c:939:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] if (!lp->ctl_rfduplx) ^ ../drivers/net/ethernet/smsc/smc911x.c:936:2: note: previous statement is here if (lp->ctl_rspeed != 100) ^ 1 warning generated. This warning occurs because there is a space after the tab on this line. Remove it so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. Fixes: 0a0c72c9118c ("[PATCH] RE: [PATCH 1/1] net driver: Add support for SMSC LAN911x line of ethernet chips") Link: https://github.com/ClangBuiltLinux/linux/issues/796 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09net: tulip: Adjust indentation in {dmfe, uli526x}_init_moduleNathan Chancellor
Clang warns: ../drivers/net/ethernet/dec/tulip/uli526x.c:1812:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] switch (mode) { ^ ../drivers/net/ethernet/dec/tulip/uli526x.c:1809:2: note: previous statement is here if (cr6set) ^ 1 warning generated. ../drivers/net/ethernet/dec/tulip/dmfe.c:2217:3: warning: misleading indentation; statement is not part of the previous 'if' [-Wmisleading-indentation] switch(mode) { ^ ../drivers/net/ethernet/dec/tulip/dmfe.c:2214:2: note: previous statement is here if (cr6set) ^ 1 warning generated. This warning occurs because there is a space before the tab on these lines. Remove them so that the indentation is consistent with the Linux kernel coding style and clang no longer warns. While we are here, adjust the default block in dmfe_init_module to have a proper break between the label and assignment and add a space between the switch and opening parentheses to avoid a checkpatch warning. Fixes: e1c3e5014040 ("[PATCH] initialisation cleanup for ULI526x-net-driver") Link: https://github.com/ClangBuiltLinux/linux/issues/795 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-09Merge branch 'dp83867-fix-fifo-depth'David S. Miller
Dan Murphy says: ==================== Fix Tx/Rx FIFO depth for DP83867 The DP83867 supports both the RGMII and SGMII modes. The Tx and Rx FIFO depths are configurable in these modes but may not applicable for both modes. When the device is configured for RGMII mode the Tx FIFO depth is applicable and for SGMII mode both Tx and Rx FIFO depth settings are applicable. When the driver was originally written only the RGMII device was available and there were no standard fifo-depth DT properties. The patchset converts the special ti,fifo-depth property to the standard tx-fifo-depth property while still allowing the ti,fifo-depth property to be set as to maintain backward compatibility. In addition to this change the rx-fifo-depth property support was added and only written when the device is configured for SGMII mode. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>