Age | Commit message (Collapse) | Author |
|
Prepare NFCT to support IPv6 for FTP:
- Do not restrict the expectation callback to PF_INET
- Split the debug messages, so that the 160-byte limitation
in IP_VS_DBG_BUF is not exceeded when printing many IPv6
addresses. This means no more than 3 addresses in one message,
i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This allows us to forward packets from the netdev family via neighbour
layer, so you don't need an explicit link-layer destination when using
this expression from rules. The ttl/hop_limit field is decremented.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before
using nft_trans_set(trans). Otherwise we can get out of bounds read.
For example, KASAN reported the one when running 0001_cache_handling_0 nft
test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE:
[75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356
...
[75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1
[75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2
[75517.618129] Call Trace:
[75517.648821] dump_stack+0xd1/0x13b
[75517.691040] ? show_regs_print_info+0x5/0x5
[75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5
[75517.799300] ? lock_acquire+0x143/0x310
[75517.846738] print_address_description+0x85/0x3a0
[75517.904547] kasan_report+0x18d/0x4b0
[75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables]
[75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables]
[75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables]
[75518.357154] ? nla_parse+0x1a5/0x300
[75518.401455] ? kasan_kmalloc+0xa6/0xd0
[75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
[75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink]
[75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink]
[75518.631711] ? lock_acquire+0x143/0x310
[75518.679133] ? netlink_deliver_tap+0x9b/0x1070
[75518.733840] ? kasan_unpoison_shadow+0x31/0x40
[75518.788542] netlink_unicast+0x45d/0x680
[75518.837111] ? __isolate_free_page+0x890/0x890
[75518.891913] ? netlink_attachskb+0x6b0/0x6b0
[75518.944542] netlink_sendmsg+0x6fa/0xd30
[75518.993107] ? netlink_unicast+0x680/0x680
[75519.043758] ? netlink_unicast+0x680/0x680
[75519.094402] sock_sendmsg+0xd9/0x160
[75519.138810] ___sys_sendmsg+0x64d/0x980
[75519.186234] ? copy_msghdr_from_user+0x350/0x350
[75519.243118] ? lock_downgrade+0x650/0x650
[75519.292738] ? do_raw_spin_unlock+0x5d/0x250
[75519.345456] ? _raw_spin_unlock+0x24/0x30
[75519.395065] ? __handle_mm_fault+0xbde/0x3410
[75519.448830] ? sock_setsockopt+0x3d2/0x1940
[75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0
[75519.558448] ? lock_downgrade+0x650/0x650
[75519.608057] ? __audit_syscall_entry+0x317/0x720
[75519.664960] ? __fget_light+0x58/0x250
[75519.711325] ? __sys_sendmsg+0xde/0x170
[75519.758850] __sys_sendmsg+0xde/0x170
[75519.804193] ? __ia32_sys_shutdown+0x90/0x90
[75519.856725] ? syscall_trace_enter+0x897/0x10e0
[75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920
[75519.979432] ? __audit_syscall_entry+0x720/0x720
[75520.036118] do_syscall_64+0xa3/0x3d0
[75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0
[75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[75520.201680] RIP: 0033:0x7fc153320ba0
[75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0
[75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003
[75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090
[75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660
[75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001
[75520.790946] Allocated by task 7356:
[75520.833994] kasan_kmalloc+0xa6/0xd0
[75520.878088] __kmalloc+0x189/0x450
[75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables]
[75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables]
[75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink]
[75521.108655] netlink_unicast+0x45d/0x680
[75521.157013] netlink_sendmsg+0x6fa/0xd30
[75521.205271] sock_sendmsg+0xd9/0x160
[75521.249365] ___sys_sendmsg+0x64d/0x980
[75521.296686] __sys_sendmsg+0xde/0x170
[75521.341822] do_syscall_64+0xa3/0x3d0
[75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[75521.467867] Freed by task 23454:
[75521.507804] __kasan_slab_free+0x132/0x180
[75521.558137] kfree+0x14d/0x4d0
[75521.596005] free_rt_sched_group+0x153/0x280
[75521.648410] sched_autogroup_create_attach+0x19a/0x520
[75521.711330] ksys_setsid+0x2ba/0x400
[75521.755529] __ia32_sys_setsid+0xa/0x10
[75521.802850] do_syscall_64+0xa3/0x3d0
[75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[75521.929000] The buggy address belongs to the object at ffff881bdb643f80
which belongs to the cache kmalloc-96 of size 96
[75522.079797] The buggy address is located 72 bytes inside of
96-byte region [ffff881bdb643f80, ffff881bdb643fe0)
[75522.221234] The buggy address belongs to the page:
[75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0
[75522.377443] flags: 0x2fffff80000100(slab)
[75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020
[75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000
[75522.615601] page dumped because: kasan: bad access detected
Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The helper and timeout strings are from user-space, we need to make
sure they are null terminated. If not, evil user could make kernel
read the unexpected memory, even print it when fail to find by the
following codes.
pr_info_ratelimited("No such helper \"%s\"\n", helper_name);
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible attrs and adds
sanity-checks at both registration and usage to make sure nothing
gets out of sync.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Some drivers, such as vxlan and wireguard, use the skb's dst in order to
determine things like PMTU. They therefore loose functionality when flow
offloading is enabled. So, we ensure the skb has it before xmit'ing it
in the offloading path.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The following ruleset:
add table ip filter
add chain ip filter input { type filter hook input priority 4; }
add chain ip filter ap
add rule ip filter input jump ap
add rule ip filter ap masquerade
results in a panic, because the masquerade extension should be rejected
from the filter chain. The existing validation is missing a chain
dependency check when the rule is added to the non-base chain.
This patch fixes the problem by walking down the rules from the
basechains, searching for either immediate or lookup expressions, then
jumping to non-base chains and again walking down the rules to perform
the expression validation, so we make sure the full ruleset graph is
validated. This is done only once from the commit phase, in case of
problem, we abort the transaction and perform fine grain validation for
error reporting. This patch requires 003087911af2 ("netfilter:
nfnetlink: allow commit to fail") to achieve this behaviour.
This patch also adds a cleanup callback to nfnl batch interface to reset
the validate state from the exit path.
As a result of this patch, nf_tables_check_loops() doesn't use
->validate to check for loops, instead it just checks for immediate
expressions.
Reported-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This extends log statement to support the behaviour achieved with
AUDIT target in iptables.
Audit logging is enabled via a pseudo log level 8. In this case any
other settings like log prefix are ignored since audit log format is
fixed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Now it can only match the transparent flag of an ip/ipv6 socket.
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
net/netfilter/nft_numgen.c:117:1-3: WARNING: PTR_ERR_OR_ZERO can be used
net/netfilter/nft_hash.c:180:1-3: WARNING: PTR_ERR_OR_ZERO can be used
net/netfilter/nft_hash.c:223:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations")
Fixes: d734a2888922 ("netfilter: nft_numgen: add map lookups for numgen statements")
CC: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch reorders the error cases in showing the XPS configuration so
that we hold off on memory allocation until after we have verified that we
can support XPS on a given ring.
Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the quest to remove all stack VLA usage from the kernel[1], this
allocates the maximum size expected for all possible types and adds
sanity-checks at both registration and usage to make sure nothing gets
out of sync. This matches the proposed VLA solution for nfnetlink[2]. The
values chosen here were based on finding assignments for .maxtype and
.slave_maxtype and manually counting the enums:
slave_maxtype (max 33):
IFLA_BRPORT_MAX 33
IFLA_BOND_SLAVE_MAX 9
maxtype (max 45):
IFLA_BOND_MAX 28
IFLA_BR_MAX 45
__IFLA_CAIF_HSI_MAX 8
IFLA_CAIF_MAX 4
IFLA_CAN_MAX 16
IFLA_GENEVE_MAX 12
IFLA_GRE_MAX 25
IFLA_GTP_MAX 5
IFLA_HSR_MAX 7
IFLA_IPOIB_MAX 4
IFLA_IPTUN_MAX 21
IFLA_IPVLAN_MAX 3
IFLA_MACSEC_MAX 15
IFLA_MACVLAN_MAX 7
IFLA_PPP_MAX 2
__IFLA_RMNET_MAX 4
IFLA_VLAN_MAX 6
IFLA_VRF_MAX 2
IFLA_VTI_MAX 7
IFLA_VXLAN_MAX 28
VETH_INFO_MAX 2
VXCAN_INFO_MAX 2
This additionally changes maxtype and slave_maxtype fields to unsigned,
since they're only ever using positive values.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
[2] https://patchwork.kernel.org/patch/10439647/
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With CONFIG_CC_STACKPROTECTOR enabled the kernel panics as below when
parsing a NCSI_CMD_PKG_INFO command:
[ 150.149711] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08
[ 150.149711]
[ 150.159919] CPU: 0 PID: 1301 Comm: ncsi-netlink Not tainted 4.13.16-468cbec6d2c91239332cb91b1f0a73aafcb6f0c6 #1
[ 150.170004] Hardware name: Generic DT based system
[ 150.174852] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24)
[ 150.182641] [<80106bc4>] (show_stack) from [<805d36e4>] (dump_stack+0x20/0x28)
[ 150.189888] [<805d36e4>] (dump_stack) from [<801163ac>] (panic+0xdc/0x278)
[ 150.196780] [<801163ac>] (panic) from [<801162cc>] (__stack_chk_fail+0x20/0x24)
[ 150.204111] [<801162cc>] (__stack_chk_fail) from [<805cff08>] (ncsi_pkg_info_all_nl+0x244/0x258)
[ 150.212912] [<805cff08>] (ncsi_pkg_info_all_nl) from [<804f939c>] (genl_lock_dumpit+0x3c/0x54)
[ 150.221535] [<804f939c>] (genl_lock_dumpit) from [<804f873c>] (netlink_dump+0xf8/0x284)
[ 150.229550] [<804f873c>] (netlink_dump) from [<804f8d44>] (__netlink_dump_start+0x124/0x17c)
[ 150.237992] [<804f8d44>] (__netlink_dump_start) from [<804f9880>] (genl_rcv_msg+0x1c8/0x3d4)
[ 150.246440] [<804f9880>] (genl_rcv_msg) from [<804f9174>] (netlink_rcv_skb+0xd8/0x134)
[ 150.254361] [<804f9174>] (netlink_rcv_skb) from [<804f96a4>] (genl_rcv+0x30/0x44)
[ 150.261850] [<804f96a4>] (genl_rcv) from [<804f7790>] (netlink_unicast+0x198/0x234)
[ 150.269511] [<804f7790>] (netlink_unicast) from [<804f7ffc>] (netlink_sendmsg+0x368/0x3b0)
[ 150.277783] [<804f7ffc>] (netlink_sendmsg) from [<804abea4>] (sock_sendmsg+0x24/0x34)
[ 150.285625] [<804abea4>] (sock_sendmsg) from [<804ac1dc>] (___sys_sendmsg+0x244/0x260)
[ 150.293556] [<804ac1dc>] (___sys_sendmsg) from [<804ad98c>] (__sys_sendmsg+0x5c/0x9c)
[ 150.301400] [<804ad98c>] (__sys_sendmsg) from [<804ad9e4>] (SyS_sendmsg+0x18/0x1c)
[ 150.308984] [<804ad9e4>] (SyS_sendmsg) from [<80102640>] (ret_fast_syscall+0x0/0x3c)
[ 150.316743] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08
This turns out to be because the attrs array in ncsi_pkg_info_all_nl()
is initialised to a length of NCSI_ATTR_MAX which is the maximum
attribute number, not the number of attributes.
Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family")
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we fail to modify a rule, we incorrectly release the idr handle
of the unmodified old rule.
Fix that by checking if we need to release it.
Fixes: fe2502e49b58 ("net_sched: remove cls_flower idr on failure")
Reported-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A driver might need to react to changes in settings of brentry VLANs.
Therefore send switchdev port notifications for these as well. Reuse
SWITCHDEV_OBJ_ID_PORT_VLAN for this purpose. Listeners should use
netif_is_bridge_master() on orig_dev to determine whether the
notification is about a bridge port or a bridge.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.
Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extract the code that deals with adding a preexisting VLAN to bridge CPU
port to a separate function. A follow-up patch introduces a need to roll
back operations in this block due to an error, and this split will make
the error-handling code clearer.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A call to switchdev_port_obj_add() or switchdev_port_obj_del() involves
initializing a struct switchdev_obj_port_vlan, a piece of code that
repeats on each call site almost verbatim. While in the current codebase
there is just one duplicated add call, the follow-up patches add more of
both add and del calls.
Thus to remove the duplication, extract the repetition into named
functions and reuse.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Checking netif_xmit_frozen_or_stopped() at the end of sch_direct_xmit()
is being bypassed. This is because "ret" from sch_direct_xmit() will be
either NETDEV_TX_OK or NETDEV_TX_BUSY, and only ret == NETDEV_TX_OK == 0
will reach the condition:
if (ret && netif_xmit_frozen_or_stopped(txq))
return false;
This patch cleans up the code by removing the whole condition.
For more discussion about this, please refer to
https://marc.info/?t=152727195700008
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is additional to the
commit ea1627c20c34 ("tcp: minor optimizations around tcp_hdr() usage").
At this point, skb->data is same with tcp_hdr() as tcp header has not
been pulled yet. So use the less expensive one to get the tcp header.
Remove the third parameter of tcp_rcv_established() and put it into
the function body.
Furthermore, the local variables are listed as a reverse christmas tree :)
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We may derference an invalid pointer in the error path of
xfrm_bundle_create(). Fix this by returning this error
pointer directly instead of assigning it to xdst0.
Fixes: 45b018beddb6 ("ipsec: Create and use new helpers for dst child access.")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
families
Update bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address
families. Allows userspace to probe for support as more are added
(e.g., AF_MPLS).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Re-use kstrtobool_from_user() instead of open coded variant.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Verify flags argument contains only known flags. Allows programs to probe
for support as more are added.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The local variable is only used while CONFIG_IPV6 enabled
net/core/filter.c: In function ‘sk_msg_convert_ctx_access’:
net/core/filter.c:6489:6: warning: unused variable ‘off’ [-Wunused-variable]
int off;
^
This puts it into #ifdef.
Fixes: 303def35f64e ("bpf: allow sk_msg programs to read sock fields")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
gcc-7.3.0 report following err:
HOSTCC net/bpfilter/main.o
In file included from net/bpfilter/main.c:9:0:
./include/uapi/linux/bpf.h:12:10: fatal error: linux/bpf_common.h: No such file or directory
#include <linux/bpf_common.h>
remove it by adding a include path.
Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for IFA_RT_PRIORITY to ipv6 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be atomically replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for IFA_RT_PRIORITY to ipv4 addresses.
If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be replaced.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update inet6_addr_modify to take ifa6_config argument versus a parameter
list. This is an argument move only; no functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the creation of struct ifa6_config up to callers of inet6_addr_add.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move config parameters for adding an ipv6 address to a struct. struct
names stem from inet6_rtm_newaddr which is the modern handler for
adding an address.
Start the conversion to ifa6_config with ipv6_add_addr. This is an argument
move only; no functional change intended. Mapping of variable changes:
addr --> cfg->pfx
peer_addr --> cfg->peer_pfx
pfxlen --> cfg->plen
flags --> cfg->ifa_flags
scope, valid_lft, prefered_lft have the same names within cfg
(with corrected spelling).
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the message be freed immediately, no need to trim it
back to the previous size.
Inspired by commit 7a9b3ec1e19f ("nl80211: remove unnecessary genlmsg_cancel() calls")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes the following sparse warnings:
net/ipv4/bpfilter/sockopt.c:13:5: warning:
symbol 'bpfilter_mbox_request' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MQ doesn't hold any statistics on its own, however, statistic
from offloads are requested starting from the root, hence MQ
will read the old values for its sums. Call into the drivers,
because of the additive nature of the stats drivers are aware
of how much "pending updates" they have to children of the MQ.
Since MQ reset its stats on every dump we can simply offset
the stats, predicting how stats of offloaded children will
change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mq offload is trivial, we just need to let the device know
that the root qdisc is mq. Alternative approach would be
to export qdisc_lookup() and make drivers check the root
type themselves, but notification via ndo_setup_tc is more
in line with other qdiscs.
Note that mq doesn't hold any stats on it's own, it just
adds up stats of its children.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The comment and trace_loginfo are not used anymore.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We can make all dumps and lookups lockless.
Dumps currently only hold the nfnl mutex on the dump request itself.
Dumps can span multiple syscalls, dump continuation doesn't acquire the
nfnl mutex anywhere, i.e. the dump callbacks in nf_tables already use
rcu and never rely on nfnl mutex being held.
So, just switch all dumpers to rcu.
This requires taking a module reference before dropping the rcu lock
so rmmod is blocked, we also need to hold module reference over
the entire dump operation sequence. netlink already supports this
via the .module member in the netlink_dump_control struct.
For the non-dump case (i.e. lookup of a specific tables, chains, etc),
we need to swtich to _rcu list iteration primitive and make sure we
use GFP_ATOMIC.
This patch also adds the new nft_netlink_dump_start_rcu() helper that
takes care of the get_ref, drop-rcu-lock,start dump,
get-rcu-lock,put-ref sequence.
The helper will be reused for all dumps.
Rationale in all dump requests is:
- use the nft_netlink_dump_start_rcu helper added in first patch
- use GFP_ATOMIC and rcu list iteration
- switch to .call_rcu
... thus making all dumps in nf_tables not depend on the
nfnl mutex anymore.
In the nf_tables_getgen: This callback just fetches the current base
sequence, there is no need to serialize this with nfnl nft mutex.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
abort batch processing and return so task can exit faster.
Otherwise even SIGKILL has no immediate effect.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
harmless, but it avoids sparse warnings:
nf_tables_api.c:2813:16: warning: incorrect type in return expression (different base types)
nf_tables_api.c:2863:47: warning: incorrect type in argument 3 (different base types)
nf_tables_api.c:3524:47: warning: incorrect type in argument 3 (different base types)
nf_tables_api.c:3538:55: warning: incorrect type in argument 3 (different base types)
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Just use .call_rcu instead. We can drop the rcu read lock
after obtaining a reference and re-acquire on return.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Fixes the following sparse warning:
net/netfilter/nf_nat_core.c:1039:20: warning:
symbol 'nat_hook' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
synchronize_rcu() is expensive.
The commit phase currently enforces an unconditional
synchronize_rcu() after incrementing the generation counter.
This is to make sure that a packet always sees a consistent chain, either
nft_do_chain is still using old generation (it will skip the newly added
rules), or the new one (it will skip old ones that might still be linked
into the list).
We could just remove the synchronize_rcu(), it would not cause a crash but
it could cause us to evaluate a rule that was removed and new rule for the
same packet, instead of either-or.
To resolve this, add rule pointer array holding two generations, the
current one and the future generation.
In commit phase, allocate the rule blob and populate it with the rules that
will be active in the new generation.
Then, make this rule blob public, replacing the old generation pointer.
Then the generation counter can be incremented.
nft_do_chain() will either continue to use the current generation
(in case loop was invoked right before increment), or the new one.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
bpfilter_process_sockopt is a callback that gets called from
ip_setsockopt() and ip_getsockopt(). However, when CONFIG_INET is
disabled, it never gets called at all, and assigning a function to the
callback pointer results in a link failure:
net/bpfilter/bpfilter_kern.o: In function `__stop_umh':
bpfilter_kern.c:(.text.unlikely+0x3): undefined reference to `bpfilter_process_sockopt'
net/bpfilter/bpfilter_kern.o: In function `load_umh':
bpfilter_kern.c:(.init.text+0x73): undefined reference to `bpfilter_process_sockopt'
Since there is no caller in this configuration, I assume we can
simply make the assignment conditional.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
seg6_do_srh_encap and seg6_do_srh_inline can possibly do an
out-of-bounds access when adding the SRH to the packet. This no longer
happen when expanding the skb not only by the size of the SRH (+
outer IPv6 header), but also by skb->mac_len.
[ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620
[ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674
[ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90
[ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 53.796673] Call Trace:
[ 53.796679] <IRQ>
[ 53.796689] dump_stack+0x71/0xab
[ 53.796700] print_address_description+0x6a/0x270
[ 53.796707] kasan_report+0x258/0x380
[ 53.796715] ? seg6_do_srh_encap+0x284/0x620
[ 53.796722] memmove+0x34/0x50
[ 53.796730] seg6_do_srh_encap+0x284/0x620
[ 53.796741] ? seg6_do_srh+0x29b/0x360
[ 53.796747] seg6_do_srh+0x29b/0x360
[ 53.796756] seg6_input+0x2e/0x2e0
[ 53.796765] lwtunnel_input+0x93/0xd0
[ 53.796774] ipv6_rcv+0x690/0x920
[ 53.796783] ? ip6_input+0x170/0x170
[ 53.796791] ? eth_gro_receive+0x2d0/0x2d0
[ 53.796800] ? ip6_input+0x170/0x170
[ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0
[ 53.796820] ? netdev_info+0x110/0x110
[ 53.796827] ? napi_complete_done+0xb6/0x170
[ 53.796834] ? e1000_clean+0x6da/0xf70
[ 53.796845] ? process_backlog+0x129/0x2a0
[ 53.796853] process_backlog+0x129/0x2a0
[ 53.796862] net_rx_action+0x211/0x5c0
[ 53.796870] ? napi_complete_done+0x170/0x170
[ 53.796887] ? run_rebalance_domains+0x11f/0x150
[ 53.796891] __do_softirq+0x10e/0x39e
[ 53.796894] do_softirq_own_stack+0x2a/0x40
[ 53.796895] </IRQ>
[ 53.796898] do_softirq.part.16+0x54/0x60
[ 53.796900] __local_bh_enable_ip+0x5b/0x60
[ 53.796903] ip6_finish_output2+0x416/0x9f0
[ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110
[ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390
[ 53.796911] ? __rcu_read_unlock+0x66/0x80
[ 53.796913] ? ip6_mtu+0x44/0xf0
[ 53.796916] ? ip6_output+0xfc/0x220
[ 53.796918] ip6_output+0xfc/0x220
[ 53.796921] ? ip6_finish_output+0x2b0/0x2b0
[ 53.796923] ? memcpy+0x34/0x50
[ 53.796926] ip6_send_skb+0x43/0xc0
[ 53.796929] rawv6_sendmsg+0x1216/0x1530
[ 53.796932] ? __orc_find+0x6b/0xc0
[ 53.796934] ? rawv6_rcv_skb+0x160/0x160
[ 53.796937] ? __rcu_read_unlock+0x66/0x80
[ 53.796939] ? __rcu_read_unlock+0x66/0x80
[ 53.796942] ? is_bpf_text_address+0x1e/0x30
[ 53.796944] ? kernel_text_address+0xec/0x100
[ 53.796946] ? __kernel_text_address+0xe/0x30
[ 53.796948] ? unwind_get_return_address+0x2f/0x50
[ 53.796950] ? __save_stack_trace+0x92/0x100
[ 53.796954] ? save_stack+0x89/0xb0
[ 53.796956] ? kasan_kmalloc+0xa0/0xd0
[ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0
[ 53.796961] ? prepare_creds+0x23/0x160
[ 53.796963] ? __x64_sys_capset+0x252/0x3e0
[ 53.796966] ? do_syscall_64+0x69/0x160
[ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.796971] ? __alloc_pages_nodemask+0x170/0x380
[ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0
[ 53.796977] ? tty_vhangup+0x20/0x20
[ 53.796979] ? policy_nodemask+0x1a/0x90
[ 53.796982] ? __mod_node_page_state+0x8d/0xa0
[ 53.796986] ? __check_object_size+0xe7/0x240
[ 53.796989] ? __sys_sendto+0x229/0x290
[ 53.796991] ? rawv6_rcv_skb+0x160/0x160
[ 53.796993] __sys_sendto+0x229/0x290
[ 53.796996] ? __ia32_sys_getpeername+0x50/0x50
[ 53.796999] ? commit_creds+0x2de/0x520
[ 53.797002] ? security_capset+0x57/0x70
[ 53.797004] ? __x64_sys_capset+0x29f/0x3e0
[ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0
[ 53.797011] ? __do_page_fault+0x664/0x770
[ 53.797014] __x64_sys_sendto+0x74/0x90
[ 53.797017] do_syscall_64+0x69/0x160
[ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.797022] RIP: 0033:0x7f43b7a6714a
[ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
[ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a
[ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004
[ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c
[ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661
[ 53.797171] Allocated by task 642:
[ 53.797460] kasan_kmalloc+0xa0/0xd0
[ 53.797463] kmem_cache_alloc+0xd2/0x1f0
[ 53.797465] getname_flags+0x40/0x210
[ 53.797467] user_path_at_empty+0x1d/0x40
[ 53.797469] do_faccessat+0x12a/0x320
[ 53.797471] do_syscall_64+0x69/0x160
[ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.797607] Freed by task 642:
[ 53.797869] __kasan_slab_free+0x130/0x180
[ 53.797871] kmem_cache_free+0xa8/0x230
[ 53.797872] filename_lookup+0x15b/0x230
[ 53.797874] do_faccessat+0x12a/0x320
[ 53.797876] do_syscall_64+0x69/0x160
[ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.798014] The buggy address belongs to the object at ffff88011975e600
which belongs to the cache names_cache of size 4096
[ 53.799043] The buggy address is located 1786 bytes inside of
4096-byte region [ffff88011975e600, ffff88011975f600)
[ 53.800013] The buggy address belongs to the page:
[ 53.800414] page:ffffea000465d600 count:1 mapcount:0
mapping:0000000000000000 index:0x0 compound_mapcount: 0
[ 53.801259] flags: 0x17fff0000008100(slab|head)
[ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000
0000000100070007
[ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40
0000000000000000
[ 53.803787] page dumped because: kasan: bad access detected
[ 53.804384] Memory state around the buggy address:
[ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.806577] ^
[ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 53.808356] ==================================================================
[ 53.808949] Disabling lock debugging due to kernel taint
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: David Lebrun <dlebrun@google.com>
Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The failover module provides a generic interface for paravirtual drivers
to register a netdev and a set of ops with a failover instance. The ops
are used as event handlers that get called to handle netdev register/
unregister/link change/name change events on slave pci ethernet devices
with the same mac address as the failover netdev.
This enables paravirtual drivers to use a VF as an accelerated low latency
datapath. It also allows migration of VMs with direct attached VFs by
failing over to the paravirtual datapath when the VF is unplugged.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) Null pointer dereference when dumping conntrack helper configuration,
from Taehee Yoo.
2) Missing sanitization in ebtables extension name through compat,
from Paolo Abeni.
3) Broken fetch of tracing value, from Taehee Yoo.
4) Incorrect arithmetics in packet ratelimiting.
5) Buffer overflow in IPVS sync daemon, from Julian Anastasov.
6) Wrong argument to nla_strlcpy() in nfnetlink_{acct,cthelper},
from Eric Dumazet.
7) Fix splat in nft_update_chain_stats().
8) Null pointer dereference from object netlink dump path, from
Taehee Yoo.
9) Missing static_branch_inc() when enabling counters in existing
chain, from Taehee Yoo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
->commit() cannot fail at the moment.
Followup-patch adds kmalloc calls in the commit phase, so we'll need
to be able to handle errors.
Make it so that -EGAIN causes a full replay, and make other errors
cause the transaction to fail.
Failing is ok from a consistency point of view as long as we
perform all actions that could return an error before
we increment the generation counter and the base seq.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Similar to previous patch, this time, merge redirect+nat.
The redirect module is just 2k in size, get rid of it and make
redirect part available from the nat core.
before:
text data bss dec hex filename
19461 1484 4138 25083 61fb net/netfilter/nf_nat.ko
1236 792 0 2028 7ec net/netfilter/nf_nat_redirect.ko
after:
20340 1508 4138 25986 6582 net/netfilter/nf_nat.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Instead of using extra modules for these, turn the config options into
an implicit dependency that adds masq feature to the protocol specific nf_nat module.
before:
text data bss dec hex filename
2001 860 4 2865 b31 net/ipv4/netfilter/nf_nat_masquerade_ipv4.ko
5579 780 2 6361 18d9 net/ipv4/netfilter/nf_nat_ipv4.ko
2860 836 8 3704 e78 net/ipv6/netfilter/nf_nat_masquerade_ipv6.ko
6648 780 2 7430 1d06 net/ipv6/netfilter/nf_nat_ipv6.ko
after:
text data bss dec hex filename
7245 872 8 8125 1fbd net/ipv4/netfilter/nf_nat_ipv4.ko
9165 848 12 10025 2729 net/ipv6/netfilter/nf_nat_ipv6.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When a chain is updated, a counter can be attached. if so,
the nft_counters_enabled should be increased.
test commands:
%nft add table ip filter
%nft add chain ip filter input { type filter hook input priority 4\; }
%iptables-compat -Z input
%nft delete chain ip filter input
we can see below messages.
[ 286.443720] jump label: negative count!
[ 286.448278] WARNING: CPU: 0 PID: 1459 at kernel/jump_label.c:197 __static_key_slow_dec_cpuslocked+0x6f/0xf0
[ 286.449144] Modules linked in: nf_tables nfnetlink ip_tables x_tables
[ 286.449144] CPU: 0 PID: 1459 Comm: nft Tainted: G W 4.17.0-rc2+ #12
[ 286.449144] RIP: 0010:__static_key_slow_dec_cpuslocked+0x6f/0xf0
[ 286.449144] RSP: 0018:ffff88010e5176f0 EFLAGS: 00010286
[ 286.449144] RAX: 000000000000001b RBX: ffffffffc0179500 RCX: ffffffffb8a82522
[ 286.449144] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88011b7e5eac
[ 286.449144] RBP: 0000000000000000 R08: ffffed00236fce5c R09: ffffed00236fce5b
[ 286.449144] R10: ffffffffc0179503 R11: ffffed00236fce5c R12: 0000000000000000
[ 286.449144] R13: ffff88011a28e448 R14: ffff88011a28e470 R15: dffffc0000000000
[ 286.449144] FS: 00007f0384328700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
[ 286.449144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 286.449144] CR2: 00007f038394bf10 CR3: 0000000104a86000 CR4: 00000000001006f0
[ 286.449144] Call Trace:
[ 286.449144] static_key_slow_dec+0x6a/0x70
[ 286.449144] nf_tables_chain_destroy+0x19d/0x210 [nf_tables]
[ 286.449144] nf_tables_commit+0x1891/0x1c50 [nf_tables]
[ 286.449144] nfnetlink_rcv+0x1148/0x13d0 [nfnetlink]
[ ... ]
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|