summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-08-31xsk: Add shared umem support between queue idsMagnus Karlsson
Add support to share a umem between queue ids on the same device. This mode can be invoked with the XDP_SHARED_UMEM bind flag. Previously, sharing was only supported within the same queue id and device, and you shared one set of fill and completion rings. However, note that when sharing a umem between queue ids, you need to create a fill ring and a completion ring and tie them to the socket before you do the bind with the XDP_SHARED_UMEM flag. This so that the single-producer single-consumer semantics can be upheld. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-12-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for better ↵Magnus Karlsson
performance Test for dma_need_sync earlier to increase performance. xsk_buff_dma_sync_for_cpu() takes an xdp_buff as parameter and from that the xsk_buff_pool reference is dug out. Perf shows that this dereference causes a lot of cache misses. But as the buffer pool is now sent down to the driver at zero-copy initialization time, we might as well use this pointer directly, instead of going via the xsk_buff and we can do so already in xsk_buff_dma_sync_for_cpu() instead of in xp_dma_sync_for_cpu. This gets rid of these cache misses. Throughput increases with 3% for the xdpsock l2fwd sample application on my machine. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-11-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Rearrange internal structs for better performanceMagnus Karlsson
Rearrange the xdp_sock, xdp_umem and xsk_buff_pool structures so that they get smaller and align better to the cache lines. In the previous commits of this patch set, these structs have been reordered with the focus on functionality and simplicity, not performance. This patch improves throughput performance by around 3%. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-10-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Enable sharing of dma mappingsMagnus Karlsson
Enable the sharing of dma mappings by moving them out from the buffer pool. Instead we put each dma mapped umem region in a list in the umem structure. If dma has already been mapped for this umem and device, it is not mapped again and the existing dma mappings are reused. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-9-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move addrs from buffer pool to umemMagnus Karlsson
Replicate the addrs pointer in the buffer pool to the umem. This mapping will be the same for all buffer pools sharing the same umem. In the buffer pool we leave the addrs pointer for performance reasons. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-8-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move xsk_tx_list and its lock to buffer poolMagnus Karlsson
Move the xsk_tx_list and the xsk_tx_list_lock from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one xsk_tx_list per device and queue id, so it should be located in the buffer pool. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-7-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move queue_id, dev and need_wakeup to buffer poolMagnus Karlsson
Move queue_id, dev, and need_wakeup from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queues. There is one buffer pool per dev and queue id, so these variables should belong to the buffer pool, not the umem. Need_wakeup is also something that is set on a per napi level, so there is usually one per device and queue id. So move this to the buffer pool too. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-6-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Move fill and completion rings to buffer poolMagnus Karlsson
Move the fill and completion rings from the umem to the buffer pool. This so that we in a later commit can share the umem between multiple HW queue ids. In this case, we need one fill and completion ring per queue id. As the buffer pool is per queue id and napi id this is a natural place for it and one umem struture can be shared between these buffer pools. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-5-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: Create and free buffer pool independently from umemMagnus Karlsson
Create and free the buffer pool independently from the umem. Move these operations that are performed on the buffer pool from the umem create and destroy functions to new create and destroy functions just for the buffer pool. This so that in later commits we can instantiate multiple buffer pools per umem when sharing a umem between HW queues and/or devices. We also erradicate the back pointer from the umem to the buffer pool as this will not work when we introduce the possibility to have multiple buffer pools per umem. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-4-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Rename xsk zero-copy driver interfacesMagnus Karlsson
Rename the AF_XDP zero-copy driver interface functions to better reflect what they do after the replacement of umems with buffer pools in the previous commit. Mostly it is about replacing the umem name from the function names with xsk_buff and also have them take the a buffer pool pointer instead of a umem. The various ring functions have also been renamed in the process so that they have the same naming convention as the internal functions in xsk_queue.h. This so that it will be clearer what they do and also for consistency. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
2020-08-31xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umemMagnus Karlsson
Replace the explicit umem reference passed to the driver in AF_XDP zero-copy mode with the buffer pool instead. This in preparation for extending the functionality of the zero-copy mode so that umems can be shared between queues on the same netdev and also between netdevs. In this commit, only an umem reference has been added to the buffer pool struct. But later commits will add other entities to it. These are going to be entities that are different between different queue ids and netdevs even though the umem is shared between them. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
2020-08-31netlink: policy: correct validation type checkJohannes Berg
In the policy export for binary attributes I erroneously used a != NLA_VALIDATE_NONE comparison instead of checking for the two possible values, which meant that if a validation function pointer ended up aliasing the min/max as negatives, we'd hit a warning in nla_get_range_unsigned(). Fix this to correctly check for only the two types that should be handled here, i.e. range with or without warn-too-long. Reported-by: syzbot+353df1490da781637624@syzkaller.appspotmail.com Fixes: 8aa26c575fb3 ("netlink: make NLA_BINARY validation more flexible") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-31bpf: Fix build without BPF_LSM.Alexei Starovoitov
resolve_btfids doesn't like empty set. Add unused ID when BPF_LSM is off. Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Reported-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Song Liu <songliubraving@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200831163132.66521-1-alexei.starovoitov@gmail.com
2020-08-31bpf: Fix build without BPF_SYSCALL, but with BPF_JIT.Alexei Starovoitov
When CONFIG_BPF_SYSCALL is not set, but CONFIG_BPF_JIT=y the kernel build fails: In file included from ../kernel/bpf/trampoline.c:11: ../kernel/bpf/trampoline.c: In function ‘bpf_trampoline_update’: ../kernel/bpf/trampoline.c:220:39: error: ‘call_rcu_tasks_trace’ undeclared ../kernel/bpf/trampoline.c: In function ‘__bpf_prog_enter_sleepable’: ../kernel/bpf/trampoline.c:411:2: error: implicit declaration of function ‘rcu_read_lock_trace’ ../kernel/bpf/trampoline.c: In function ‘__bpf_prog_exit_sleepable’: ../kernel/bpf/trampoline.c:416:2: error: implicit declaration of function ‘rcu_read_unlock_trace’ This is due to: obj-$(CONFIG_BPF_JIT) += trampoline.o obj-$(CONFIG_BPF_JIT) += dispatcher.o There is a number of functions that arch/x86/net/bpf_jit_comp.c is using from these two files, but none of them will be used when only cBPF is on (which is the case for BPF_SYSCALL=n BPF_JIT=y). Add rcu_trace functions to rcupdate_trace.h. The JITed code won't execute them and BPF trampoline logic won't be used without BPF_SYSCALL. Fixes: 1e6c62a88215 ("bpf: Introduce sleepable BPF programs") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/bpf/20200831155155.62754-1-alexei.starovoitov@gmail.com
2020-08-28Merge branch 'bpf-sleepable'Daniel Borkmann
Alexei Starovoitov says: ==================== v2->v3: - switched to minimal allowlist approach. Essentially that means that syscall entry, few btrfs allow_error_inject functions, should_fail_bio(), and two LSM hooks: file_mprotect and bprm_committed_creds are the only hooks that allow attaching of sleepable BPF programs. When comprehensive analysis of LSM hooks will be done this allowlist will be extended. - added patch 1 that fixes prototypes of two mm functions to reliably work with error injection. It's also necessary for resolve_btfids tool to recognize these two funcs, but that's secondary. v1->v2: - split fmod_ret fix into separate patch - added denylist v1: This patch set introduces the minimal viable support for sleepable bpf programs. In this patch only fentry/fexit/fmod_ret and lsm progs can be sleepable. Only array and pre-allocated hash and lru maps allowed. Here is 'perf report' difference of sleepable vs non-sleepable: 3.86% bench [k] __srcu_read_unlock 3.22% bench [k] __srcu_read_lock 0.92% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.50% bench [k] bpf_trampoline_10297 0.26% bench [k] __bpf_prog_exit_sleepable 0.21% bench [k] __bpf_prog_enter_sleepable vs 0.88% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry 0.84% bench [k] bpf_trampoline_10297 0.13% bench [k] __bpf_prog_enter 0.12% bench [k] __bpf_prog_exit vs 0.79% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.72% bench [k] bpf_trampoline_10381 0.31% bench [k] __bpf_prog_exit_sleepable 0.29% bench [k] __bpf_prog_enter_sleepable Sleepable vs non-sleepable program invocation overhead is only marginally higher due to rcu_trace. srcu approach is much slower. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-08-28selftests/bpf: Add sleepable testsAlexei Starovoitov
Modify few tests to sanity test sleepable bpf functionality. Running 'bench trig-fentry-sleep' vs 'bench trig-fentry' and 'perf report': sleepable with SRCU: 3.86% bench [k] __srcu_read_unlock 3.22% bench [k] __srcu_read_lock 0.92% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.50% bench [k] bpf_trampoline_10297 0.26% bench [k] __bpf_prog_exit_sleepable 0.21% bench [k] __bpf_prog_enter_sleepable sleepable with RCU_TRACE: 0.79% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep 0.72% bench [k] bpf_trampoline_10381 0.31% bench [k] __bpf_prog_exit_sleepable 0.29% bench [k] __bpf_prog_enter_sleepable non-sleepable with RCU: 0.88% bench [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry 0.84% bench [k] bpf_trampoline_10297 0.13% bench [k] __bpf_prog_enter 0.12% bench [k] __bpf_prog_exit Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-6-alexei.starovoitov@gmail.com
2020-08-28libbpf: Support sleepable progsAlexei Starovoitov
Pass request to load program as sleepable via ".s" suffix in the section name. If it happens in the future that all map types and helpers are allowed with BPF_F_SLEEPABLE flag "fmod_ret/" and "lsm/" can be aliased to "fmod_ret.s/" and "lsm.s/" to make all lsm and fmod_ret programs sleepable by default. The fentry and fexit programs would always need to have sleepable vs non-sleepable distinction, since not all fentry/fexit progs will be attached to sleepable kernel functions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: KP Singh <kpsingh@google.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-5-alexei.starovoitov@gmail.com
2020-08-28bpf: Add bpf_copy_from_user() helper.Alexei Starovoitov
Sleepable BPF programs can now use copy_from_user() to access user memory. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-4-alexei.starovoitov@gmail.com
2020-08-28bpf: Introduce sleepable BPF programsAlexei Starovoitov
Introduce sleepable BPF programs that can request such property for themselves via BPF_F_SLEEPABLE flag at program load time. In such case they will be able to use helpers like bpf_copy_from_user() that might sleep. At present only fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only when they are attached to kernel functions that are known to allow sleeping. The non-sleepable programs are relying on implicit rcu_read_lock() and migrate_disable() to protect life time of programs, maps that they use and per-cpu kernel structures used to pass info between bpf programs and the kernel. The sleepable programs cannot be enclosed into rcu_read_lock(). migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs should not be enclosed in migrate_disable() as well. Therefore rcu_read_lock_trace is used to protect the life time of sleepable progs. There are many networking and tracing program types. In many cases the 'struct bpf_prog *' pointer itself is rcu protected within some other kernel data structure and the kernel code is using rcu_dereference() to load that program pointer and call BPF_PROG_RUN() on it. All these cases are not touched. Instead sleepable bpf programs are allowed with bpf trampoline only. The program pointers are hard-coded into generated assembly of bpf trampoline and synchronize_rcu_tasks_trace() is used to protect the life time of the program. The same trampoline can hold both sleepable and non-sleepable progs. When rcu_read_lock_trace is held it means that some sleepable bpf program is running from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru maps. These map types are waiting on programs to complete via synchronize_rcu_tasks_trace(); Updates to trampoline now has to do synchronize_rcu_tasks_trace() and synchronize_rcu_tasks() to wait for sleepable progs to finish and for trampoline assembly to finish. This is the first step of introducing sleepable progs. Eventually dynamically allocated hash maps can be allowed and networking program types can become sleepable too. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
2020-08-28mm/error_inject: Fix allow_error_inject function signatures.Alexei Starovoitov
'static' and 'static noinline' function attributes make no guarantees that gcc/clang won't optimize them. The compiler may decide to inline 'static' function and in such case ALLOW_ERROR_INJECT becomes meaningless. The compiler could have inlined __add_to_page_cache_locked() in one callsite and didn't inline in another. In such case injecting errors into it would cause unpredictable behavior. It's worse with 'static noinline' which won't be inlined, but it still can be optimized. Like the compiler may decide to remove one argument or constant propagate the value depending on the callsite. To avoid such issues make sure that these functions are global noinline. Fixes: af3b854492f3 ("mm/page_alloc.c: allow error injection") Fixes: cfcbfb1382db ("mm/filemap.c: enable error injection at add_to_page_cache()") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/bpf/20200827220114.69225-2-alexei.starovoitov@gmail.com
2020-08-28netlabel: remove unused param from audit_log_format()Alex Dewar
Commit d3b990b7f327 ("netlabel: fix problems with mapping removal") added a check to return an error if ret_val != 0, before ret_val is later used in a log message. Now it will unconditionally print "... res=1". So just drop the check. Addresses-Coverity: ("Dead code") Fixes: d3b990b7f327 ("netlabel: fix problems with mapping removal") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28Merge branch 'ionic-memory-usage-rework'David S. Miller
Shannon Nelson says: ==================== ionic memory usage rework Previous review comments have suggested [1],[2] that this driver needs to rework how queue resources are managed and reconfigured so that we don't do a full driver reset and to better handle potential allocation failures. This patchset is intended to address those comments. The first few patches clean some general issues and simplify some of the memory structures. The last 4 patches specifically address queue parameter changes without a full ionic_stop()/ionic_open(). [1] https://lore.kernel.org/netdev/20200706103305.182bd727@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [2] https://lore.kernel.org/netdev/20200724.194417.2151242753657227232.davem@davemloft.net/ v3: use PTR_ALIGN without typecast fix up Neel's attribution v2: use PTR_ALIGN recovery if netif_set_real_num_tx/rx_queues fails less racy queue bring up after reconfig common-ize the reconfig queue stop and start ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: pull reset_queues into tx_timeout handlerShannon Nelson
Convert tx_timeout handler to not do the full reset. As this was the last user of ionic_reset_queues(), we can drop it. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: change queue count with no resetShannon Nelson
Add to our new ionic_reconfigure_queues() to also be able to change the number of queues in use, and to change the queue interrupt layout between split and combined. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: change the descriptor ring length without full resetShannon Nelson
The original way of changing ring length was to completely tear down the lif's queue structure and then rebuild it, while running the risk of allocations that might fail in the middle and leave us with a broken driver. Instead, we can set up all the new queue and descriptor allocations first, then swap them out and delete the old allocations. If the new allocations fail, we report the error, stay with the old setup and continue running. This gives us a safer path, and a smaller window of time where we're not processing traffic. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: change mtu without full queue rebuildShannon Nelson
We really don't need to tear down and rebuild the whole queue structure when changing the MTU; we can simply stop the queues, clean and refill, then restart the queues. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: use index not pointer for queue trackingShannon Nelson
Use index counters rather than pointers for tracking head and tail in the queues to save a little memory and to perhaps slightly faster queue processing. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: reduce contiguous memory allocation requirementShannon Nelson
Split out the queue descriptor blocks into separate dma allocations to make for smaller blocks. Co-developed-by: Neel Patel <neel@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: clean up unnecessary non-static functionsShannon Nelson
ionic_open() and ionic_stop() are not referenced outside of their defining file, so make them static. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: rework and simplify handling of the queue stats blockShannon Nelson
Use a block of stats structs attached to the lif instead of little ones attached to each qcq. This simplifies our memory management and gets rid of a lot of unnecessary indirection. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: remove lif list conceptShannon Nelson
As we aren't yet supporting multiple lifs, we can remove complexity by removing the list concept and related code, to be re-engineered later when actually needed. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: use kcalloc for new arraysShannon Nelson
Use kcalloc for allocating arrays of structures. Following along after commit e71642009cbdA ("ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()") there are a couple more array allocations that can be converted to using devm_kcalloc(). Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: fix up a couple of debug stringsShannon Nelson
Fix the queue name displayed. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28ionic: set MTU floor at ETH_MIN_MTUShannon Nelson
The NIC might tell us its minimum MTU, but let's be sure not to use something smaller than ETH_MIN_MTU. Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28Merge branch 'Enable-Fiber-on-DP83822-PHY'David S. Miller
Dan Murphy says: ==================== Enable Fiber on DP83822 PHY The DP83822 Ethernet PHY has the ability to connect via a Fiber port. The derivative PHYs DP83825 and DP83826 do not have this ability. In fiber mode the DP83822 disables auto negotiation and has a fixed 100Mbps speed with support for full or half duplex modes. A devicetree binding was added to set the signal polarity for the fiber connection. This property is only applicable if the FX_EN strap is set in hardware other wise the signal loss detection is disabled on the PHY. If the FX_EN is not strapped the device can be configured to run in fiber mode via the device tree. All be it the PHY will not perform signal loss detection. v2 review from a long time ago can be found here - https://lore.kernel.org/patchwork/patch/1270958/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28net: phy: DP83822: Add ability to advertise Fiber connectionDan Murphy
The DP83822 can be configured to use a Fiber connection. The strap register is read to determine if the device has been configured to use a fiber connection. With the fiber connection the PHY can be configured to detect whether the fiber connection is active by either a high signal or a low signal. Fiber mode is only applicable to the DP83822 so rework the PHY match table so that non-fiber PHYs can still use the same driver but not call or use any of the fiber features. Signed-off-by: Dan Murphy <dmurphy@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28dt-bindings: net: dp83822: Add TI dp83822 phyDan Murphy
Add a dt binding for the TI dp83822 ethernet phy device. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Dan Murphy <dmurphy@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28net: add option to not create fall-back tunnels in root-ns as wellMahesh Bandewar
The sysctl that was added earlier by commit 79134e6ce2c ("net: do not create fallback tunnels for non-default namespaces") to create fall-back only in root-ns. This patch enhances that behavior to provide option not to create fallback tunnels in root-ns as well. Since modules that create fallback tunnels could be built-in and setting the sysctl value after booting is pointless, so added a kernel cmdline options to change this default. The default setting is preseved for backward compatibility. The kernel command line option of fb_tunnels=initns will set the sysctl value to 1 and will create fallback tunnels only in initns while kernel cmdline fb_tunnels=none will set the sysctl value to 2 and fallback tunnels are skipped in every netns. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Maciej Zenczykowski <maze@google.com> Cc: Jian Yang <jianyang@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28Merge branch 'Add-phylib-support-to-smsc95xx'David S. Miller
Andre Edich says: ==================== Add phylib support to smsc95xx To allow to probe external PHY drivers, this patch series adds use of phylib to the smsc95xx driver. Changes in v5: - Removed all phy_read calls from the smsc95xx driver. Changes in v4: - Removed useless inline type qualifier. Changes in v3: - Moved all MDI-X functionality to the corresponding phy driver; - Removed field internal_phy from a struct smsc95xx_priv; - Initialized field is_internal of a struct phy_device; - Kconfig: Added selection of PHYLIB and SMSC_PHY for USB_NET_SMSC95XX. Changes in v2: - Moved 'net' patches from here to the separate patch series; - Removed redundant call of the phy_start_aneg after phy_start; - Removed netif_dbg tracing "speed, duplex, lcladv, and rmtadv"; - mdiobus: added dependency from the usbnet device; - Moved making of the MII address from 'phy_id' and 'idx' into the function mii_address; - Moved direct MDIO accesses under condition 'if (pdata->internal_phy)', as they only need for the internal PHY; - To be sure, that this set of patches is git-bisectable, tested each sub-set of patches to be functional for both, internal and external PHYs, including suspend/resume test for the 'devices' (5.7.8-1-ARCH, Raspberry Pi 3 Model B). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28smsc95xx: add phylib supportAndre Edich
Generally, each PHY has their own configuration and it can be done through an external PHY driver. The smsc95xx driver uses only the hard-coded internal PHY configuration. This patch adds phylib support to probe external PHY drivers for configuring external PHYs. The MDI-X configuration for the internal PHYs moves from drivers/net/usb/smsc95xx.c to drivers/net/phy/smsc.c. Signed-off-by: Andre Edich <andre.edich@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28smsc95xx: use usbnet->driver_privAndre Edich
Using `void *driver_priv` instead of `unsigned long data[]` is more straightforward way to recover the `struct smsc95xx_priv *` from the `struct net_device *`. Signed-off-by: Andre Edich <andre.edich@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28smsc95xx: remove redundant function argumentsAndre Edich
This patch removes arguments netdev and phy_id from the functions smsc95xx_mdio_read_nopm and smsc95xx_mdio_write_nopm. Both removed arguments are recovered from a new argument `struct usbnet *dev`. Signed-off-by: Andre Edich <andre.edich@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28bpf: selftests: Add test for different inner map sizeMartin KaFai Lau
This patch tests the inner map size can be different for reuseport_sockarray but has to be the same for arraymap. A new subtest "diff_size" is added for this. The existing test is moved to a subtest "lookup_update". Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011819.1970825-1-kafai@fb.com
2020-08-28bpf: Relax max_entries check for most of the inner map typesMartin KaFai Lau
Most of the maps do not use max_entries during verification time. Thus, those map_meta_equal() do not need to enforce max_entries when it is inserted as an inner map during runtime. The max_entries check is removed from the default implementation bpf_map_meta_equal(). The prog_array_map and xsk_map are exception. Its map_gen_lookup uses max_entries to generate inline lookup code. Thus, they will implement its own map_meta_equal() to enforce max_entries. Since there are only two cases now, the max_entries check is not refactored and stays in its own .c file. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011813.1970516-1-kafai@fb.com
2020-08-28bpf: Add map_meta_equal map opsMartin KaFai Lau
Some properties of the inner map is used in the verification time. When an inner map is inserted to an outer map at runtime, bpf_map_meta_equal() is currently used to ensure those properties of the inserting inner map stays the same as the verification time. In particular, the current bpf_map_meta_equal() checks max_entries which turns out to be too restrictive for most of the maps which do not use max_entries during the verification time. It limits the use case that wants to replace a smaller inner map with a larger inner map. There are some maps do use max_entries during verification though. For example, the map_gen_lookup in array_map_ops uses the max_entries to generate the inline lookup code. To accommodate differences between maps, the map_meta_equal is added to bpf_map_ops. Each map-type can decide what to check when its map is used as an inner map during runtime. Also, some map types cannot be used as an inner map and they are currently black listed in bpf_map_meta_alloc() in map_in_map.c. It is not unusual that the new map types may not aware that such blacklist exists. This patch enforces an explicit opt-in and only allows a map to be used as an inner map if it has implemented the map_meta_equal ops. It is based on the discussion in [1]. All maps that support inner map has its map_meta_equal points to bpf_map_meta_equal in this patch. A later patch will relax the max_entries check for most maps. bpf_types.h counts 28 map types. This patch adds 23 ".map_meta_equal" by using coccinelle. -5 for BPF_MAP_TYPE_PROG_ARRAY BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE BPF_MAP_TYPE_STRUCT_OPS BPF_MAP_TYPE_ARRAY_OF_MAPS BPF_MAP_TYPE_HASH_OF_MAPS The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc() is moved such that the same error is returned. [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/ Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
2020-08-28Merge tag 'mac80211-next-for-davem-2020-08-28' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== This time we have: * some code to support SAE (WPA3) offload in AP mode * many documentation (wording) fixes/updates * netlink policy updates, including the use of NLA_RANGE with binary attributes * regulatory improvements for adjacent frequency bands * and a few other small additions/refactorings/cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-28bpf: Make bpf_link_info.iter similar to bpf_iter_link_infoYonghong Song
bpf_link_info.iter is used by link_query to return bpf_iter_link_info to user space. Fields may be different, e.g., map_fd vs. map_id, so we cannot reuse the exact structure. But make them similar, e.g., struct bpf_link_info { /* common fields */ union { struct { ... } raw_tracepoint; struct { ... } tracing; ... struct { /* common fields for iter */ union { struct { __u32 map_id; } map; /* other structs for other targets */ }; }; }; }; so the structure is extensible the same way as bpf_iter_link_info. Fixes: 6b0a249a301e ("bpf: Implement link_query for bpf iterators") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200828051922.758950-1-yhs@fb.com
2020-08-28tools, bpf/build: Cleanup feature files on make cleanJesper Dangaard Brouer
The system for "Auto-detecting system features" located under tools/build/ are (currently) used by perf, libbpf and bpftool. It can contain stalled feature detection files, which are not cleaned up by libbpf and bpftool on make clean (side-note: perf tool is correct). Fix this by making the users invoke the make clean target. Some details about the changes. The libbpf Makefile already had a clean-config target (which seems to be copy-pasted from perf), but this target was not "connected" (a make dependency) to clean target. Choose not to rename target as someone might be using it. Did change the output from "CLEAN config" to "CLEAN feature-detect", to make it more clear what happens. This is related to the complaint and troubleshooting in the following link: https://lore.kernel.org/lkml/20200818122007.2d1cfe2d@carbon/ Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/lkml/20200818122007.2d1cfe2d@carbon/ Link: https://lore.kernel.org/bpf/159851841661.1072907.13770213104521805592.stgit@firesoul
2020-08-27gtp: add notification mechanismNicolas Dichtel
Like all other network functions, let's notify gtp context on creation and deletion. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Gabriel Ganne <gabriel.ganne@6wind.com> Acked-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-27Merge branch 's390-qeth-next'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: updates 2020-08-27 please apply the following patch series for qeth to netdev's net-next tree. Patch 8 makes some improvements to how we handle HW address events, avoiding some uncertainty around processing stale events after we switched off the feature. Except for that it's all straight-forward cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>