summaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)Author
2023-05-01rxrpc: Fix timeout of a call that hasn't yet been granted a channelDavid Howells
afs_make_call() calls rxrpc_kernel_begin_call() to begin a call (which may get stalled in the background waiting for a connection to become available); it then calls rxrpc_kernel_set_max_life() to set the timeouts - but that starts the call timer so the call timer might then expire before we get a connection assigned - leading to the following oops if the call stalled: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... CPU: 1 PID: 5111 Comm: krxrpcio/0 Not tainted 6.3.0-rc7-build3+ #701 RIP: 0010:rxrpc_alloc_txbuf+0xc0/0x157 ... Call Trace: <TASK> rxrpc_send_ACK+0x50/0x13b rxrpc_input_call_event+0x16a/0x67d rxrpc_io_thread+0x1b6/0x45f ? _raw_spin_unlock_irqrestore+0x1f/0x35 ? rxrpc_input_packet+0x519/0x519 kthread+0xe7/0xef ? kthread_complete_and_exit+0x1b/0x1b ret_from_fork+0x22/0x30 Fix this by noting the timeouts in struct rxrpc_call when the call is created. The timer will be started when the first packet is transmitted. It shouldn't be possible to trigger this directly from userspace through AF_RXRPC as sendmsg() will return EBUSY if the call is in the waiting-for-conn state if it dropped out of the wait due to a signal. Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-27SUNRPC: Recognize control messages in server-side TCP socket codeChuck Lever
To support kTLS, the server-side TCP socket receive path needs to watch for CMSGs. Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-04-27xsk: Use pool->dma_pages to check for DMAKal Conley
Compare pool->dma_pages instead of pool->dma_pages_cnt to check for an active DMA mapping. pool->dma_pages needs to be read anyway to access the map so this compiles to more efficient code. Signed-off-by: Kal Conley <kal.conley@dectris.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230423180157.93559-1-kal.conley@dectris.com
2023-04-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-24Merge tag 'nf-next-23-04-22' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Reduce jumpstack footprint: Stash chain in last rule marker in blob for tracing. Remove last rule and chain from jumpstack. From Florian Westphal. 2) nf_tables validates all tables before committing the new rules. Unfortunately, this has two drawbacks: - Since addition of the transaction mutex pernet state gets written to outside of the locked section from the cleanup callback, this is wrong so do this cleanup directly after table has passed all checks. - Revalidate tables that saw no changes. This can be avoided by keeping the validation state per table, not per netns. From Florian Westphal. 3) Get rid of a few redundant pointers in the traceinfo structure. The three removed pointers are used in the expression evaluation loop, so gcc keeps them in registers. Passing them to the (inlined) helpers thus doesn't increase nft_do_chain text size, while stack is reduced by another 24 bytes on 64bit arches. From Florian Westphal. 4) IPVS cleanups in several ways without implementing any functional changes, aside from removing some debugging output: - Update width of source for ip_vs_sync_conn_options The operation is safe, use an annotation to describe it properly. - Consistently use array_size() in ip_vs_conn_init() It seems better to use helpers consistently. - Remove {Enter,Leave}Function. These seem to be well past their use-by date. - Correct spelling in comments. From Simon Horman. 5) Extended netlink error report for netdevice in flowtables and netdev/chains. Allow for incrementally add/delete devices to netdev basechain. Allow to create netdev chain without device. * tag 'nf-next-23-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: allow to create netdev chain without device netfilter: nf_tables: support for deleting devices in an existing netdev chain netfilter: nf_tables: support for adding new devices to an existing netdev chain netfilter: nf_tables: rename function to destroy hook list netfilter: nf_tables: do not send complete notification of deletions netfilter: nf_tables: extended netlink error reporting for netdevice ipvs: Correct spelling in comments ipvs: Remove {Enter,Leave}Function ipvs: Consistently use array_size() in ip_vs_conn_init() ipvs: Update width of source for ip_vs_sync_conn_options netfilter: nf_tables: do not store rule in traceinfo structure netfilter: nf_tables: do not store verdict in traceinfo structure netfilter: nf_tables: do not store pktinfo in traceinfo structure netfilter: nf_tables: remove unneeded conditional netfilter: nf_tables: make validation state per table netfilter: nf_tables: don't write table validation state without mutex netfilter: nf_tables: don't store chain address on jump netfilter: nf_tables: don't store address of last rule on jump netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob marker ==================== Link: https://lore.kernel.org/r/20230421235021.216950-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-23Bluetooth: hci_sync: Only allow hci_cmd_sync_queue if runningLuiz Augusto von Dentz
This makes sure hci_cmd_sync_queue only queue new work if HCI_RUNNING has been set otherwise there is a risk of commands being sent while turning off. Because hci_cmd_sync_queue can no longer queue work while HCI_RUNNING is not set it cannot be used to power on adapters so instead hci_cmd_sync_submit is introduced which bypass the HCI_RUNNING check, so it behaves like the old implementation. Link: https://lore.kernel.org/all/CAB4PzUpDMvdc8j2MdeSAy1KkAE-D3woprCwAdYWeOc-3v3c9Sw@mail.gmail.com/ Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: Add new quirk for broken set random RPA timeout for ATS2851Raul Cheleguini
The ATS2851 based controller advertises support for command "LE Set Random Private Address Timeout" but does not actually implement it, impeding the controller initialization. Add the quirk HCI_QUIRK_BROKEN_SET_RPA_TIMEOUT to unblock the controller initialization. < HCI Command: LE Set Resolvable Private... (0x08|0x002e) plen 2 Timeout: 900 seconds > HCI Event: Command Status (0x0f) plen 4 LE Set Resolvable Private Address Timeout (0x08|0x002e) ncmd 1 Status: Unknown HCI Command (0x01) Co-developed-by: imoc <wzj9912@gmail.com> Signed-off-by: imoc <wzj9912@gmail.com> Signed-off-by: Raul Cheleguini <raul.cheleguini@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: hci_conn: Fix not waiting for HCI_EVT_LE_CIS_ESTABLISHEDLuiz Augusto von Dentz
When submitting HCI_OP_LE_CREATE_CIS the code shall wait for HCI_EVT_LE_CIS_ESTABLISHED thus enforcing the serialization of HCI_OP_LE_CREATE_CIS as the Core spec does not allow to send them in parallel: BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2566: If the Host issues this command before all the HCI_LE_CIS_Established events from the previous use of the command have been generated, the Controller shall return the error code Command Disallowed (0x0C). Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: hci_conn: Fix not matching by CIS IDLuiz Augusto von Dentz
This fixes only matching CIS by address which prevents creating new hcon if upper layer is requesting a specific CIS ID. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: hci_conn: Add support for linking multiple hconLuiz Augusto von Dentz
Since it is required for some configurations to have multiple CIS with the same peer which is now covered by iso-tester in the following test cases: ISO AC 6(i) - Success ISO AC 7(i) - Success ISO AC 8(i) - Success ISO AC 9(i) - Success ISO AC 11(i) - Success Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: Enable all supported LE PHY by defaultLuiz Augusto von Dentz
This enables 2M and Coded PHY by default if they are marked as supported in the LE features bits. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: Split bt_iso_qos into dedicated structuresIulia Tanasescu
Split bt_iso_qos into dedicated unicast and broadcast structures and add additional broadcast parameters. Fixes: eca0ae4aea66 ("Bluetooth: Add initial implementation of BIS connections") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: Add support for hci devcoredumpAbhishek Pandit-Subedi
Add devcoredump APIs to hci core so that drivers only have to provide the dump skbs instead of managing the synchronization and timeouts. The devcoredump APIs should be used in the following manner: - hci_devcoredump_init is called to allocate the dump. - hci_devcoredump_append is called to append any skbs with dump data OR hci_devcoredump_append_pattern is called to insert a pattern. - hci_devcoredump_complete is called when all dump packets have been sent OR hci_devcoredump_abort is called to indicate an error and cancel an ongoing dump collection. The high level APIs just prepare some skbs with the appropriate data and queue it for the dump to process. Packets part of the crashdump can be intercepted in the driver in interrupt context and forwarded directly to the devcoredump APIs. Internally, there are 5 states for the dump: idle, active, complete, abort and timeout. A devcoredump will only be in active state after it has been initialized. Once active, it accepts data to be appended, patterns to be inserted (i.e. memset) and a completion event or an abort event to generate a devcoredump. The timeout is initialized at the same time the dump is initialized (defaulting to 10s) and will be cleared either when the timeout occurs or the dump is complete or aborted. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: Add new quirk for broken local ext features page 2Vasily Khoruzhick
Some adapters (e.g. RTL8723CS) advertise that they have more than 2 pages for local ext features, but they don't support any features declared in these pages. RTL8723CS reports max_page = 2 and declares support for sync train and secure connection, but it responds with either garbage or with error in status on corresponding commands. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Bastian Germann <bage@debian.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: L2CAP: Delay identity address updatesLuiz Augusto von Dentz
This delays the identity address updates to give time for userspace to process the new address otherwise there is a risk that userspace creates a duplicated device if the MGMT event is delayed for some reason. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: hci_core: Make hci_conn_hash_add append to the listLuiz Augusto von Dentz
This makes hci_conn_hash_add append to the tail of the conn_hash so it matches the order they are created, this is required if the controller attempts to match the order of ACL with CIS which uses append logic when programming the CIS ids on the CIG. The result of this change affects Create CIS: Before: < HCI Command: LE Create Connected Isochronous Stream (0x08|0x0064) plen 9 Number of CIS: 2 CIS Handle: 2560 ACL Handle: 3586 CIS Handle: 2561 ACL Handle: 3585 After: < HCI Command: LE Create Connected Isochronous Stream (0x08|0x0064) plen 9 Number of CIS: 2 CIS Handle: 2560 ACL Handle: 3585 CIS Handle: 2561 ACL Handle: 3586 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-23Bluetooth: MGMT: Use BIT macro when defining bitfieldsLuiz Augusto von Dentz
This makes use of BIT macro when defining bitfields which makes it clearer what bit it is toggling. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-22rxrpc: Fix potential race in error handling in afs_make_call()David Howells
If the rxrpc call set up by afs_make_call() receives an error whilst it is transmitting the request, there's the possibility that it may get to the point the rxrpc call is ended (after the error_kill_call label) just as the call is queued for async processing. This could manifest itself as call->rxcall being seen as NULL in afs_deliver_to_call() when it tries to lock the call. Fix this by splitting rxrpc_kernel_end_call() into a function to shut down an rxrpc call and a function to release the caller's reference and calling the latter only when we get to afs_put_call(). Reported-by: Jeffrey Altman <jaltman@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: kafs-testing+fedora36_64checkkafs-build-306@auristor.com cc: Marc Dionne <marc.dionne@auristor.com> cc: "David S. Miller" <davem@davemloft.net> cc: Eric Dumazet <edumazet@google.com> cc: Jakub Kicinski <kuba@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: linux-afs@lists.infradead.org cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-04-21 We've added 71 non-merge commits during the last 8 day(s) which contain a total of 116 files changed, 13397 insertions(+), 8896 deletions(-). The main changes are: 1) Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward, from Florian Westphal. 2) Fix race between btf_put and btf_idr walk which caused a deadlock, from Alexei Starovoitov. 3) Second big batch to migrate test_verifier unit tests into test_progs for ease of readability and debugging, from Eduard Zingerman. 4) Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree, from Dave Marchevsky. 5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF selftests into libbpf-provided bpf_helpers.h header and improve kfunc handling, from Andrii Nakryiko. 6) Support 64-bit pointers to kfuncs needed for archs like s390x, from Ilya Leoshkevich. 7) Support BPF progs under getsockopt with a NULL optval, from Stanislav Fomichev. 8) Improve verifier u32 scalar equality checking in order to enable LLVM transformations which earlier had to be disabled specifically for BPF backend, from Yonghong Song. 9) Extend bpftool's struct_ops object loading to support links, from Kui-Feng Lee. 10) Add xsk selftest follow-up fixes for hugepage allocated umem, from Magnus Karlsson. 11) Support BPF redirects from tc BPF to ifb devices, from Daniel Borkmann. 12) Add BPF support for integer type when accessing variable length arrays, from Feng Zhou. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits) selftests/bpf: verifier/value_ptr_arith converted to inline assembly selftests/bpf: verifier/value_illegal_alu converted to inline assembly selftests/bpf: verifier/unpriv converted to inline assembly selftests/bpf: verifier/subreg converted to inline assembly selftests/bpf: verifier/spin_lock converted to inline assembly selftests/bpf: verifier/sock converted to inline assembly selftests/bpf: verifier/search_pruning converted to inline assembly selftests/bpf: verifier/runtime_jit converted to inline assembly selftests/bpf: verifier/regalloc converted to inline assembly selftests/bpf: verifier/ref_tracking converted to inline assembly selftests/bpf: verifier/map_ptr_mixing converted to inline assembly selftests/bpf: verifier/map_in_map converted to inline assembly selftests/bpf: verifier/lwt converted to inline assembly selftests/bpf: verifier/loops1 converted to inline assembly selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly selftests/bpf: verifier/direct_packet_access converted to inline assembly selftests/bpf: verifier/d_path converted to inline assembly selftests/bpf: verifier/ctx converted to inline assembly selftests/bpf: verifier/btf_ctx_access converted to inline assembly selftests/bpf: verifier/bpf_get_stack converted to inline assembly ... ==================== Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-22netfilter: nf_tables: support for adding new devices to an existing netdev chainPablo Neira Ayuso
This patch allows users to add devices to an existing netdev chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Correct spelling in commentsSimon Horman
Correct some spelling errors flagged by codespell and found by inspection. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Remove {Enter,Leave}FunctionSimon Horman
Remove EnterFunction and LeaveFunction. These debugging macros seem well past their use-by date. And seem to have little value these days. Removing them allows some trivial cleanup of some exit paths for some functions. These are also included in this patch. There is likely scope for further cleanup of both debugging and unwind paths. But let's leave that for another day. Only intended to change debug output, and only when CONFIG_IP_VS_DEBUG is enabled. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Update width of source for ip_vs_sync_conn_optionsSimon Horman
In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options. That structure looks like this: struct ip_vs_sync_conn_options { struct ip_vs_seq in_seq; struct ip_vs_seq out_seq; }; The source of the copy is the in_seq field of struct ip_vs_conn. Whose type is struct ip_vs_seq. Thus we can see that the source - is not as wide as the amount of data copied, which is the width of struct ip_vs_sync_conn_option. The copy is safe because the next field in is another struct ip_vs_seq. Make use of struct_group() to annotate this. Flagged by gcc-13 as: In file included from ./include/linux/string.h:254, from ./include/linux/bitmap.h:11, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/timex.h:5, from ./include/linux/timex.h:67, from ./include/linux/time32.h:13, from ./include/linux/time.h:60, from ./include/linux/stat.h:19, from ./include/linux/module.h:13, from net/netfilter/ipvs/ip_vs_sync.c:38: In function 'fortify_memcpy_chk', inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3: ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 529 | __read_overflow2_field(q_size_field, size); | Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store rule in traceinfo structureFlorian Westphal
pass it as argument instead. This reduces size of traceinfo to 16 bytes. Total stack usage: nf_tables_core.c:252 nft_do_chain 304 static While its possible to also pass basechain as argument, doing so increases nft_do_chaininfo function size. Unlike pktinfo/verdict/rule the basechain info isn't used in the expression evaluation path. gcc places it on the stack, which results in extra push/pop when it gets passed to the trace helpers as argument rather than as part of the traceinfo structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store verdict in traceinfo structureFlorian Westphal
Just pass it as argument to nft_trace_notify. Stack is reduced by 8 bytes: nf_tables_core.c:256 nft_do_chain 312 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store pktinfo in traceinfo structureFlorian Westphal
pass it as argument. No change in object size. stack usage decreases by 8 byte: nf_tables_core.c:254 nft_do_chain 320 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: make validation state per tableFlorian Westphal
We only need to validate tables that saw changes in the current transaction. The existing code revalidates all tables, but this isn't needed as cross-table jumps are not allowed (chains have table scope). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: don't store chain address on jumpFlorian Westphal
Now that the rule trailer/end marker and the rcu head reside in the same structure, we no longer need to save/restore the chain pointer when performing/returning from a jump. We can simply let the trace infra walk the evaluated rule until it hits the end marker and then fetch the chain pointer from there. When the rule is NULL (policy tracing), then chain and basechain pointers were already identical, so just use the basechain. This cuts size of jumpstack in half, from 256 to 128 bytes in 64bit, scripts/stackusage says: nf_tables_core.c:251 nft_do_chain 328 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-21bpf: minimal support for programs hooked into netfilter frameworkFlorian Westphal
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs that will be invoked via the NF_HOOK() points in the ip stack. Invocation incurs an indirect call. This is not a necessity: Its possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the program invocation with the same method already done for xdp progs. This isn't done here to keep the size of this chunk down. Verifier restricts verdicts to either DROP or ACCEPT. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpf: add bpf_link support for BPF_NETFILTER programsFlorian Westphal
Add bpf_link support skeleton. To keep this reviewable, no bpf program can be invoked yet, if a program is attached only a c-stub is called and not the actual bpf program. Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig. Uapi example usage: union bpf_attr attr = { }; attr.link_create.prog_fd = progfd; attr.link_create.attach_type = 0; /* unused */ attr.link_create.netfilter.pf = PF_INET; attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN; attr.link_create.netfilter.priority = -128; err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr)); ... this would attach progfd to ipv4:input hook. Such hook gets removed automatically if the calling program exits. BPF_NETFILTER program invocation is added in followup change. NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it allows to tell userspace which program is attached at the given hook when user runs 'nft hook list' command rather than just the priority and not-very-helpful 'this hook runs a bpf prog but I can't tell which one'. Will also be used to disallow registration of two bpf programs with same priority in a followup patch. v4: arm32 cmpxchg only supports 32bit operand s/prio/priority/ v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if more use cases pop up (arptables, ebtables, netdev ingress/egress etc). Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21Merge tag 'wireless-next-2023-04-21' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.4 Most likely the last -next pull request for v6.4. We have changes all over. rtw88 now supports SDIO bus and iwlwifi continues to work on Wi-Fi 7 support. Not much stack changes this time. Major changes: cfg80211/mac80211 - fix some Fine Time Measurement (FTM) frames not being bufferable - flush frames before key removal to avoid potential unencrypted transmission depending on the hardware design iwlwifi - preparation for Wi-Fi 7 EHT and multi-link support rtw88 - SDIO bus support - RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support rtw89 - framework firmware backwards compatibility brcmfmac - Cypress 43439 SDIO support mt76 - mt7921 P2P support - mt7996 mesh A-MSDU support - mt7996 EHT support - mt7996 coredump support wcn36xx - support for pronto v3 hardware ath11k - PCIe DeviceTree bindings - WCN6750: enable SAR support ath10k - convert DeviceTree bindings to YAML * tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (261 commits) wifi: rtw88: Update spelling in main.h wifi: airo: remove ISA_DMA_API dependency wifi: rtl8xxxu: Simplify setting the initial gain wifi: rtl8xxxu: Add rtl8xxxu_write{8,16,32}_{set,clear} wifi: rtl8xxxu: Don't print the vendor/product/serial wifi: rtw88: Fix memory leak in rtw88_usb wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant wifi: rtw88: set pkg_type correctly for specific rtw8821c variants wifi: rtw88: rtw8821c: Fix rfe_option field width wifi: rtw88: usb: fix priority queue to endpoint mapping wifi: rtw88: 8822c: add iface combination wifi: rtw88: handle station mode concurrent scan with AP mode wifi: rtw88: prevent scan abort with other VIFs wifi: rtw88: refine reserved page flow for AP mode wifi: rtw88: disallow PS during AP mode wifi: rtw88: 8822c: extend reserved page number wifi: rtw88: add port switch for AP mode wifi: rtw88: add bitmap for dynamic port settings wifi: rtw89: mac: use regular int as return type of DLE buffer request wifi: mac80211: remove return value check of debugfs_create_dir() ... ==================== Link: https://lore.kernel.org/r/20230421104726.800BCC433D2@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21sctp: delete the nested flexible array peer_initXin Long
This patch deletes the flexible-array peer_init[] from the structure sctp_cookie to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/sm_make_chunk.c: note: in included file (through include/net/sctp/sctp.h): ./include/net/sctp/structs.h:1588:28: warning: nested flexible array ./include/net/sctp/structs.h:343:28: warning: nested flexible array Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21sctp: delete the nested flexible array skipXin Long
This patch deletes the flexible-array skip[] from the structure sctp_ifwdtsn/fwdtsn_hdr to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/stream_interleave.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h): ./include/linux/sctp.h:611:32: warning: nested flexible array ./include/linux/sctp.h:628:33: warning: nested flexible array Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21sctp: delete the nested flexible array paramsXin Long
This patch deletes the flexible-array params[] from the structure sctp_inithdr, sctp_addiphdr and sctp_reconf_chunk to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/input.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h): ./include/linux/sctp.h:278:29: warning: nested flexible array ./include/linux/sctp.h:675:30: warning: nested flexible array This warning is reported if a structure having a flexible array member is included by other structures. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-20mac80211: use the new drop reasons infrastructureJohannes Berg
It can be really hard to analyse or debug why packets are going missing in mac80211, so add the needed infrastructure to use use the new per-subsystem drop reasons. We actually use two drop reason subsystems here because of the different handling of frames that are dropped but still go to monitor for old versions of hostapd, and those that are just completely unusable (e.g. crypto failed.) Annotate a few reasons here just to illustrate this, we'll need to go through and annotate more of them later. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: extend drop reasons for multiple subsystemsJohannes Berg
Extend drop reasons to make them usable by subsystems other than core by reserving the high 16 bits for a new subsystem ID, of which 0 of course is used for the existing reasons immediately. To still be able to have string reasons, restructure that code a bit to make the loopup under RCU, the only user of this (right now) is drop_monitor. Link: https://lore.kernel.org/netdev/00659771ed54353f92027702c5bbb84702da62ce.camel@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: move dropreason.h to dropreason-core.hJohannes Berg
This will, after the next patch, hold only the core drop reasons and minimal infrastructure. Fix a small kernel-doc issue while at it, to avoid the move triggering a checker. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20ipv6: add icmpv6_error_anycast_as_unicast for ICMPv6Mahesh Bandewar
ICMPv6 error packets are not sent to the anycast destinations and this prevents things like traceroute from working. So create a setting similar to ECHO when dealing with Anycast sources (icmpv6_echo_ignore_anycast). Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20230419013238.2691167-1-maheshb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20flow_dissector: Address kdoc warningsSimon Horman
Address a number of warnings flagged by ./scripts/kernel-doc -none include/net/flow_dissector.h include/net/flow_dissector.h:23: warning: Function parameter or member 'addr_type' not described in 'flow_dissector_key_control' include/net/flow_dissector.h:23: warning: Function parameter or member 'flags' not described in 'flow_dissector_key_control' include/net/flow_dissector.h:46: warning: Function parameter or member 'padding' not described in 'flow_dissector_key_basic' include/net/flow_dissector.h:145: warning: Function parameter or member 'tipckey' not described in 'flow_dissector_key_addrs' include/net/flow_dissector.h:157: warning: cannot understand function prototype: 'struct flow_dissector_key_arp ' include/net/flow_dissector.h:171: warning: cannot understand function prototype: 'struct flow_dissector_key_ports ' include/net/flow_dissector.h:203: warning: cannot understand function prototype: 'struct flow_dissector_key_icmp ' Also improve indentation on adjacent lines to those changed to address the above. No functional changes intended. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230419-flow-dissector-kdoc-v1-1-1aa0cca1118b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20page_pool: unlink from napi during destroyJakub Kicinski
Jesper points out that we must prevent recycling into cache after page_pool_destroy() is called, because page_pool_destroy() is not synchronized with recycling (some pages may still be outstanding when destroy() gets called). I assumed this will not happen because NAPI can't be scheduled if its page pool is being destroyed. But I missed the fact that NAPI may get reused. For instance when user changes ring configuration driver may allocate a new page pool, stop NAPI, swap, start NAPI, and then destroy the old pool. The NAPI is running so old page pool will think it can recycle to the cache, but the consumer at that point is the destroy() path, not NAPI. To avoid extra synchronization let the drivers do "unlinking" during the "swap" stage while NAPI is indeed disabled. Fixes: 8c48eea3adf3 ("page_pool: allow caching from safely localized NAPI") Reported-by: Jesper Dangaard Brouer <jbrouer@redhat.com> Link: https://lore.kernel.org/all/e8df2654-6a5b-3c92-489d-2fe5e444135f@redhat.com/ Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20230419182006.719923-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net/handshake: Add a kernel API for requesting a TLSv1.3 handshakeChuck Lever
To enable kernel consumers of TLS to request a TLS handshake, add support to net/handshake/ to request a handshake upcall. This patch also acts as a template for adding handshake upcall support for other kernel transport layer security providers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net: skbuff: hide wifi_acked when CONFIG_WIRELESS not setJakub Kicinski
Datacenter kernel builds will very likely not include WIRELESS, so let them shave 2 bits off the skb by hiding the wifi fields. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-19netfilter: conntrack: fix wrong ct->timeout valueTzung-Bi Shih
(struct nf_conn)->timeout is an interval before the conntrack confirmed. After confirmed, it becomes a timestamp. It is observed that timeout of an unconfirmed conntrack: - Set by calling ctnetlink_change_timeout(). As a result, `nfct_time_stamp` was wrongly added to `ct->timeout` twice. - Get by calling ctnetlink_dump_timeout(). As a result, `nfct_time_stamp` was wrongly subtracted. Call Trace: <TASK> dump_stack_lvl ctnetlink_dump_timeout __ctnetlink_glue_build ctnetlink_glue_build __nfqnl_enqueue_packet nf_queue nf_hook_slow ip_mc_output ? __pfx_ip_finish_output ip_send_skb ? __pfx_dst_output udp_send_skb udp_sendmsg ? __pfx_ip_generic_getfrag sock_sendmsg Separate the 2 cases in: - Setting `ct->timeout` in __nf_ct_set_timeout(). - Getting `ct->timeout` in ctnetlink_dump_timeout(). Pablo appends: Update ctnetlink to set up the timeout _after_ the IPS_CONFIRMED flag is set on, otherwise conntrack creation via ctnetlink breaks. Note that the problem described in this patch occurs since the introduction of the nfnetlink_queue conntrack support, select a sufficiently old Fixes: tag for -stable kernel to pick up this fix. Fixes: a4b4766c3ceb ("netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info") Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Unbreak br_netfilter physdev match support, from Florian Westphal. 2) Use GFP_KERNEL_ACCOUNT for stateful/policy objects, from Chen Aotian. 3) Use IS_ENABLED() in nf_reset_trace(), from Florian Westphal. 4) Fix validation of catch-all set element. 5) Tighten requirements for catch-all set elements. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements netfilter: nf_tables: validate catch-all set elements netfilter: nf_tables: fix ifdef to also consider nf_tables=m netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT netfilter: br_netfilter: fix recent physdev match breakage ==================== Link: https://lore.kernel.org/r/20230418145048.67270-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-18wifi: mac80211: remove ieee80211_tx_status_8023Felix Fietkau
It is unused and should not be used. In order to avoid limitations in 4-address mode, the driver should always use ieee80211_tx_status_ext for 802.3 frames with a valid sta pointer. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230417133751.79160-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-04-18net: add macro netif_subqueue_completed_wakeHeiner Kallweit
Add netif_subqueue_completed_wake, complementing the subqueue versions netif_subqueue_try_stop and netif_subqueue_maybe_stop. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-18netfilter: nf_tables: validate catch-all set elementsPablo Neira Ayuso
catch-all set element might jump/goto to chain that uses expressions that require validation. Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-17sctp: delete the obsolete code for the host name address paramXin Long
In the latest RFC9260, the Host Name Address param has been deprecated. For INIT chunk: Note 3: An INIT chunk MUST NOT contain the Host Name Address parameter. The receiver of an INIT chunk containing a Host Name Address parameter MUST send an ABORT chunk and MAY include an "Unresolvable Address" error cause. For Supported Address Types: The value indicating the Host Name Address parameter MUST NOT be used when sending this parameter and MUST be ignored when receiving this parameter. Currently Linux SCTP doesn't really support Host Name Address param, but only saves some flag and print debug info, which actually won't even be triggered due to the verification in sctp_verify_param(). This patch is to delete those dead code. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-14page_pool: allow caching from safely localized NAPIJakub Kicinski
Recent patches to mlx5 mentioned a regression when moving from driver local page pool to only using the generic page pool code. Page pool has two recycling paths (1) direct one, which runs in safe NAPI context (basically consumer context, so producing can be lockless); and (2) via a ptr_ring, which takes a spin lock because the freeing can happen from any CPU; producer and consumer may run concurrently. Since the page pool code was added, Eric introduced a revised version of deferred skb freeing. TCP skbs are now usually returned to the CPU which allocated them, and freed in softirq context. This places the freeing (producing of pages back to the pool) enticingly close to the allocation (consumer). If we can prove that we're freeing in the same softirq context in which the consumer NAPI will run - lockless use of the cache is perfectly fine, no need for the lock. Let drivers link the page pool to a NAPI instance. If the NAPI instance is scheduled on the same CPU on which we're freeing - place the pages in the direct cache. With that and patched bnxt (XDP enabled to engage the page pool, sigh, bnxt really needs page pool work :() I see a 2.6% perf boost with a TCP stream test (app on a different physical core than softirq). The CPU use of relevant functions decreases as expected: page_pool_refill_alloc_cache 1.17% -> 0% _raw_spin_lock 2.41% -> 0.98% Only consider lockless path to be safe when NAPI is scheduled - in practice this should cover majority if not all of steady state workloads. It's usually the NAPI kicking in that causes the skb flush. The main case we'll miss out on is when application runs on the same CPU as NAPI. In that case we don't use the deferred skb free path. Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>