Age  Commit message  Author
2021-06-17  usb: core: hub: Disable autosuspend for Cypress CY7C65632  Andrew Lunn
The Cypress CY7C65632 appears to have an issue with auto suspend and detecting devices, not too dissimilar to the SMSC 5534B hub. It is easiest to reproduce by connecting multiple mass storage devices to the hub at the same time. On a Lenovo Yoga, around 1 in 3 attempts result in the devices not being detected. It is however possible to make them appear using lsusb -v. Disabling autosuspend for this hub resolves the issue. Fixes: 1208f9e1d758 ("USB: hub: Fix the broken detection of USB3 device in SMSC hub") Cc: stable@vger.kernel.org Signed-off-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20210614155524.2228800-1-andrew@lunn.ch Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-17  selftests/bpf: Fix selftests build with old system-wide headers  Andrii Nakryiko
migrate_reuseport.c selftest relies on having TCP_FASTOPEN_CONNECT defined in system-wide netinet/tcp.h. Selftests can use up-to-date uapi/linux/tcp.h, but that one doesn't have SOL_TCP. So instead of switching everything to uapi header, add #define for TCP_FASTOPEN_CONNECT to fix the build. Fixes: c9d0bdef89a6 ("bpf: Test BPF_SK_REUSEPORT_SELECT_OR_MIGRATE.") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20210617041446.425283-1-andrii@kernel.org
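The fallback described above can be sketched as a guarded definition placed after the system header. This is a minimal illustration, not the selftest itself: the helper `fastopen_connect_optname()` is hypothetical, and the value 30 is taken from the Linux UAPI `linux/tcp.h`.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>	/* system-wide header; may predate TCP_FASTOPEN_CONNECT */

/*
 * Fallback for old system headers. The guard keeps the system definition
 * when there is one; 30 is the value used by the Linux UAPI linux/tcp.h.
 */
#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30
#endif

/* Hypothetical helper: the option name a test would pass to setsockopt(). */
static int fastopen_connect_optname(void)
{
	return TCP_FASTOPEN_CONNECT;
}
```

Keeping the system header and adding only the missing constant avoids the SOL_TCP problem mentioned above, because nothing else about the include set changes.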
2021-06-17  bpf: Fix up register-based shifts in interpreter to silence KUBSAN  Daniel Borkmann
syzbot reported a shift-out-of-bounds that KUBSAN observed in the interpreter: [...] UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2 shift exponent 255 is too large for 64-bit type 'long long unsigned int' CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327 ___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420 __bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735 bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline] bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline] bpf_prog_run_clear_cb include/linux/filter.h:755 [inline] run_filter+0x1a1/0x470 net/packet/af_packet.c:2031 packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104 dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387 xmit_one net/core/dev.c:3588 [inline] dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609 __dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182 __bpf_tx_skb net/core/filter.c:2116 [inline] __bpf_redirect_no_mac net/core/filter.c:2141 [inline] __bpf_redirect+0x548/0xc80 net/core/filter.c:2164 ____bpf_clone_redirect net/core/filter.c:2448 [inline] bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420 ___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523 __bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737 bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline] bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50 bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582 bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline] __do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae [...] Generally speaking, KUBSAN reports from the kernel should be fixed. 
However, in the case of BPF, this particular report caused concern, since the large shift is not wrong from a BPF point of view, just undefined. In the verifier, K-based shifts that are >= {64,32} (depending on the bitwidth of the instruction) are already rejected. The register-based cases were not, given that their content might not be known at verification time. Ideas such as a verifier instruction rewrite with an additional AND instruction for the source register were brought up, but regularly rejected due to the additional runtime overhead they incur.

As Edward Cree rightly put it:

  Shifts by more than insn bitness are legal in the BPF ISA; they are implementation-defined behaviour [of the underlying architecture], rather than UB, and have been made legal for performance reasons. Each of the JIT backends compiles the BPF shift operations to machine instructions which produce implementation-defined results in such a case; the resulting contents of the register may be arbitrary but program behaviour as a whole remains defined.

  Guard checks in the fast path (i.e. affecting JITted code) will thus not be accepted.

  The case of division by zero is not truly analogous here, as division instructions on many of the JIT-targeted architectures will raise a machine exception / fault on division by zero, whereas (to the best of my knowledge) none will do so on an out-of-bounds shift.

Given that the KUBSAN report only affects the BPF interpreter, but not the JITs, one solution is to add the ANDs with 63 or 31 into ___bpf_prog_run(). That makes the shifts defined, and thus silences KUBSAN, and the compiler optimizes out the AND on any CPU that interprets the shift amounts modulo the width anyway (e.g., it was confirmed from disassembly that on x86-64 and arm64 the generated interpreter code is the same before and after this fix).
The BPF interpreter is slow path, and most likely compiled out anyway as distros select BPF_JIT_ALWAYS_ON to avoid speculative execution of BPF instructions by the interpreter. Given the main argument was to avoid sacrificing performance, the fact that the AND is optimized away from compiler for mainstream archs helps as well as a solution moving forward. Also add a comment on LSH/RSH/ARSH translation for JIT authors to provide guidance when they see the ___bpf_prog_run() interpreter code and use it as a model for a new JIT backend. Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Co-developed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com Cc: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com
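The masking idea from this commit message can be sketched in plain C. The helper names below are hypothetical and this is not the interpreter code; it only shows how ANDing the shift count with 63 (64-bit) or 31 (32-bit) keeps every shift within the range C defines.

```c
#include <stdint.h>

/* 64-bit left shift with the count reduced modulo 64. */
static inline uint64_t alu64_lsh(uint64_t dst, uint64_t src)
{
	return dst << (src & 63);
}

/* 64-bit logical right shift with the count reduced modulo 64. */
static inline uint64_t alu64_rsh(uint64_t dst, uint64_t src)
{
	return dst >> (src & 63);
}

/* 32-bit left shift with the count reduced modulo 32. */
static inline uint32_t alu32_lsh(uint32_t dst, uint32_t src)
{
	return dst << (src & 31);
}

/*
 * Arithmetic right shift. Shifting a negative signed value right is
 * implementation-defined in C; GCC and Clang sign-extend, which this
 * sketch assumes.
 */
static inline int64_t alu64_arsh(int64_t dst, uint64_t src)
{
	return dst >> (src & 63);
}
```

With the shift exponent 255 from the report, `alu64_lsh(1, 255)` shifts by 255 & 63 = 63, so the operation is well defined. On CPUs whose shift instructions already take the count modulo the width, a compiler can typically drop the AND.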
2021-06-17  btrfs: zoned: fix negative space_info->bytes_readonly  Naohiro Aota
Consider a block group in use on zoned btrfs.

  |<- ZU ->|<- used ->|<---free--->|
                      `- Alloc offset
  ZU: Zone unusable

Marking the block group read-only will migrate the zone unusable bytes to the read-only bytes. So, we will have this.

  |<- RO ->|<- used ->|<--- RO --->|
  RO: Read only

When marking it back to read-write, btrfs_dec_block_group_ro() subtracts the above "RO" bytes from the space_info->bytes_readonly. And, it moves the zone unusable bytes back and again subtracts those bytes from the space_info->bytes_readonly, leading to negative bytes_readonly.

This can be observed in the output as e.g.:

  Data, single: total=512.00MiB, used=165.21MiB, zone_unusable=16.00EiB
  Data, single: total=536870912, used=173256704, zone_unusable=18446744073603186688

This commit fixes the issue by reordering the operations.

Link: https://github.com/naota/linux/issues/37
Reported-by: David Sterba <dsterba@suse.com>
Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones")
CC: stable@vger.kernel.org # 5.12+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
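The double subtraction can be modelled with a few unsigned counters. This is a simplified sketch under stated assumptions: the struct, its fields and both functions are hypothetical stand-ins, and the real accounting tracks more terms (reserved, pinned, superblock bytes). It shows why a 64-bit unsigned counter ends up near 2^64 instead of "negative", and how reordering avoids that.

```c
#include <stdint.h>

/* Hypothetical, reduced model of a block group; sizes in arbitrary units. */
struct block_group_sketch {
	uint64_t length;
	uint64_t used;
	uint64_t alloc_offset;
	uint64_t zone_unusable;	/* 0 while the group is read-only */
};

/* Old order: the RO bytes are computed while zone_unusable is still 0. */
static uint64_t dec_ro_buggy(struct block_group_sketch *bg, uint64_t bytes_readonly)
{
	uint64_t num_bytes = bg->length - bg->zone_unusable - bg->used;

	bytes_readonly -= num_bytes;		/* already includes the ZU bytes */
	bg->zone_unusable = bg->alloc_offset - bg->used;
	bytes_readonly -= bg->zone_unusable;	/* subtracted a second time */
	return bytes_readonly;
}

/* New order: migrate the zone unusable bytes back first. */
static uint64_t dec_ro_fixed(struct block_group_sketch *bg, uint64_t bytes_readonly)
{
	bg->zone_unusable = bg->alloc_offset - bg->used;
	bytes_readonly -= bg->zone_unusable;

	uint64_t num_bytes = bg->length - bg->zone_unusable - bg->used;

	bytes_readonly -= num_bytes;
	return bytes_readonly;
}
```

For a group with length 512, used 165 and 16 zone unusable units, the read-only counter starts at 347. The old order returns 0 - 16, which wraps to a value just below 2^64; the new order returns 0.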
2021-06-16  libbpf: Fail compilation if target arch is missing  Lorenz Bauer
bpf2go is the Go equivalent of libbpf skeleton. The convention is that the compiled BPF is checked into the repository to facilitate distributing BPF as part of Go packages. To make this portable, bpf2go by default generates both bpfel and bpfeb variants of the C. Using bpf_tracing.h is inherently non-portable since the fields of struct pt_regs differ between platforms, so CO-RE can't help us here. The only way of working around this is to compile for each target platform independently. bpf2go can't do this by default since there are too many platforms. Define the various PT_... macros when no target can be determined and turn them into compilation failures. This works because bpf2go always compiles for bpf targets, so the compiler fallback doesn't kick in. Conditionally define __BPF_MISSING_TARGET so that we can inject a more appropriate error message at build time. The user can then choose which platform to target explicitly. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210616083635.11434-1-lmb@cloudflare.com
2021-06-16  samples/bpf: Add missing option to xdp_sample_pkts usage  Wang Hai
The usage() text of xdp_sample_pkts does not describe the "-S" option; this patch adds the description. Fixes: d50ecc46d18f ("samples/bpf: Attach XDP programs in driver mode by default") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210615135724.29528-1-wanghai38@huawei.com
2021-06-16  samples/bpf: Add missing option to xdp_fwd usage  Wang Hai
The usage() text of xdp_fwd does not describe the "-S" and "-F" options; this patch adds the descriptions. Fixes: d50ecc46d18f ("samples/bpf: Attach XDP programs in driver mode by default") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210615135554.29158-1-wanghai38@huawei.com
2021-06-16  bpf: Fix typo in kernel/bpf/bpf_lsm.c  Shuyi Cheng
Fix s/sleeable/sleepable/ typo in a comment. Signed-off-by: Shuyi Cheng <chengshuyi@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1623809076-97907-1-git-send-email-chengshuyi@linux.alibaba.com
2021-06-16  selftests/bpf: Whitelist test_progs.h from .gitignore  Daniel Xu
Somehow test_progs.h was being included by the existing rule:

  /test_progs*

This is bad because:

  1) test_progs.h is a checked in file
  2) grep-like tools like ripgrep[0] respect gitignore and test_progs.h was being hidden from searches

[0]: https://github.com/BurntSushi/ripgrep

Fixes: 74b5a5968fe8 ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/a46f64944bf678bc652410ca6028d3450f4f7f4b.1623880296.git.dxu@dxuuu.xyz
2021-06-16  net/mlx5: Reset mkey index on creation  Aya Levin
Reset only the index part of the mkey and keep the variant part. On devlink reload, driver recreates mkeys, so the mkey index may change. Trying to preserve the variant part of the mkey, driver mistakenly merged the mkey index with current value. In case of a devlink reload, current value of index part is dirty, so the index may be corrupted. Fixes: 54c62e13ad76 ("{IB,net}/mlx5: Setup mkey variant before mr create command invocation") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
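The difference between merging and resetting the index can be sketched with plain bit operations. This is an illustration under an assumed layout (low 8 bits hold the variant, the upper 24 bits hold the index); all helper names are hypothetical and are not the driver's functions.

```c
#include <stdint.h>

/* Assumed layout for illustration: [31:8] index, [7:0] variant. */
static uint32_t idx_to_mkey(uint32_t idx)
{
	return idx << 8;
}

static uint32_t mkey_to_idx(uint32_t key)
{
	return key >> 8;
}

static uint8_t mkey_variant(uint32_t key)
{
	return key & 0xff;
}

/* Merging: stale index bits left in key are ORed into the new index. */
static uint32_t set_index_merged(uint32_t key, uint32_t new_idx)
{
	return key | idx_to_mkey(new_idx);
}

/* Resetting: keep only the variant byte, then install the new index. */
static uint32_t set_index_reset(uint32_t key, uint32_t new_idx)
{
	return (uint32_t)mkey_variant(key) | idx_to_mkey(new_idx);
}
```

Starting from key 0x12345 (variant 0x45, stale index 0x123) and a new index 0xa0, merging yields 0x1a345 with the corrupted index 0x1a3, while resetting yields 0xa045 with the intended index and the original variant.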
2021-06-16  net/mlx5e: Don't create devices during unload flow  Dmytro Linkin
Running the devlink reload command for a port in switchdev mode causes resource corruption: the driver can't release the allocated EQ and reclaim memory pages, because the "rdma" auxiliary device has added CQs, which block the EQ from deletion.

The erroneous sequence happens during the reload-down phase, and is as follows:

  1. detach device - suspends auxiliary devices which support it, destroys others. During this step "eth-rep" and "rdma-rep" are destroyed, "eth" - suspended.
  2. disable SRIOV - moves device to legacy mode; as part of disablement - rescans drivers. This step adds the "rdma" auxiliary device.
  3. destroy EQ table - <failure>.

The driver shouldn't create any device during unload flows. To handle that, implement the MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset it on device attach. If the flag is set, the drivers rescan is a no-op.

Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
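The flag-based guard described in this commit can be modelled in a few lines. This is only a sketch: the struct, the bit position and the three functions are hypothetical, and a counter stands in for the auxiliary devices.

```c
#define PRIV_FLAGS_DETACH (1u << 0)	/* hypothetical bit position */

/* Hypothetical reduced device state. */
struct core_dev_sketch {
	unsigned int priv_flags;
	int aux_devices;	/* auxiliary devices currently created */
};

/* A rescan normally creates missing devices; it is a no-op while detached. */
static void rescan_drivers(struct core_dev_sketch *dev)
{
	if (dev->priv_flags & PRIV_FLAGS_DETACH)
		return;
	dev->aux_devices++;
}

/* Detach marks the device first, then tears the auxiliary devices down. */
static void detach_device(struct core_dev_sketch *dev)
{
	dev->priv_flags |= PRIV_FLAGS_DETACH;
	dev->aux_devices = 0;
}

/* Attach clears the mark, after which a rescan may create devices again. */
static void attach_device(struct core_dev_sketch *dev)
{
	dev->priv_flags &= ~PRIV_FLAGS_DETACH;
	rescan_drivers(dev);
}
```

In this model, a rescan issued between detach and attach (step 2 in the sequence above) leaves the device count at zero, so nothing new holds resources when the EQ table is destroyed.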
2021-06-16  net/mlx5: DR, Fix STEv1 incorrect L3 decapsulation padding  Alex Vesker
Decapsulation L3 on small inner packets which are less than 64 Bytes was done incorrectly. In small packets there is an extra padding added in L2 which should not be included in L3 length. The issue was that after decapL3 the extra L2 padding caused an update on the L3 length. To avoid this issue the new header is pushed to the beginning of the packet (offset 0) which should not cause a HW reparse and update the L3 length. Fixes: c349b4137cfd ("net/mlx5: DR, Add STEv1 modify header logic") Reviewed-by: Erez Shitrit <erezsh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16  net/mlx5: SF_DEV, remove SF device on invalid state  Parav Pandit
When auxiliary bus autoprobe is disabled and the SF is in the ACTIVE state, on SF port deletion it transitions from ACTIVE->ALLOCATED->INVALID. When the VHCA event handler queries the state, it has already transitioned to the INVALID state. In this scenario, the event handler failed to delete the SF device. Fix it by deleting the SF when the SF state is INVALID. Fixes: 90d010b8634b ("net/mlx5: SF, Add auxiliary device support") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16  net/mlx5: E-Switch, Allow setting GUID for host PF vport  Parav Pandit
The E-switch should be able to set the GUID of the host PF vport. Currently it returns an error. This results in the error below when a user attempts to configure the MAC address of the PF of an external controller.

  $ devlink port function set pci/0000:03:00.0/196608 \
      hw_addr 00:00:00:11:22:33

  mlx5_core 0000:03:00.0: mlx5_esw_set_vport_mac_locked:1876:(pid 6715):\
  "Failed to set vport 0 node guid, err = -22. RDMA_CM will not function properly for this VF."

The check for a zero vport is no longer needed.

Fixes: 330077d14de1 ("net/mlx5: E-switch, Supporting setting devlink port function mac address")
Signed-off-by: Yuval Avnery <yuvalav@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16  net/mlx5: E-Switch, Read PF mac address  Parav Pandit
The external controller PF's MAC address is not read from the device during vport setup. Failing to read it results in showing all zeros to the user, while the factory-programmed MAC is a valid value.

  $ devlink port show eth1 -jp
  { "port": { "pci/0000:03:00.0/196608": { "type": "eth", "netdev": "eth1", "flavour": "pcipf", "controller": 1, "pfnum": 0, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00" } } } }

Hence, read it when enabling a vport. After the fix:

  $ devlink port show eth1 -jp
  { "port": { "pci/0000:03:00.0/196608": { "type": "eth", "netdev": "eth1", "flavour": "pcipf", "controller": 1, "pfnum": 0, "splittable": false, "function": { "hw_addr": "98:03:9b:a0:60:11" } } } }

Fixes: f099fde16db3 ("net/mlx5: E-switch, Support querying port function mac address")
Signed-off-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16  net/mlx5: Check that driver was probed prior attaching the device  Leon Romanovsky
The device can be requested to be attached despite being not probed. This situation is possible if devlink reload races with module removal, and the following kernel panic is an outcome of such race. mlx5_core 0000:00:09.0: firmware version: 4.7.9999 mlx5_core 0000:00:09.0: 0.000 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x255 link) BUG: unable to handle page fault for address: fffffffffffffff0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 3218067 P4D 3218067 PUD 321a067 PMD 0 Oops: 0000 [#1] SMP KASAN NOPTI CPU: 7 PID: 250 Comm: devlink Not tainted 5.12.0-rc2+ #2836 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5_attach_device+0x80/0x280 [mlx5_core] Code: f8 48 c1 e8 03 42 80 3c 38 00 0f 85 80 01 00 00 48 8b 45 68 48 8d 78 f0 48 89 fe 48 c1 ee 03 42 80 3c 3e 00 0f 85 70 01 00 00 <48> 8b 40 f0 48 85 c0 74 0d 48 89 ef ff d0 85 c0 0f 85 84 05 0e 00 RSP: 0018:ffff8880129675f0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff827407f1 RDX: 1ffff110011336cf RSI: 1ffffffffffffffe RDI: fffffffffffffff0 RBP: ffff888008e0c000 R08: 0000000000000008 R09: ffffffffa0662ee7 R10: fffffbfff40cc5dc R11: 0000000000000000 R12: ffff88800ea002e0 R13: ffffed1001d459f7 R14: ffffffffa05ef4f8 R15: dffffc0000000000 FS: 00007f51dfeaf740(0000) GS:ffff88806d5c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffff0 CR3: 000000000bc82006 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_load_one+0x117/0x1d0 [mlx5_core] devlink_reload+0x2d5/0x520 ? devlink_remote_reload_actions_performed+0x30/0x30 ? mutex_trylock+0x24b/0x2d0 ? devlink_nl_cmd_reload+0x62b/0x1070 devlink_nl_cmd_reload+0x66d/0x1070 ? devlink_reload+0x520/0x520 ? 
devlink_nl_pre_doit+0x64/0x4d0 genl_family_rcv_msg_doit+0x1e9/0x2f0 ? mutex_lock_io_nested+0x1130/0x1130 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? security_capable+0x51/0x90 genl_rcv_msg+0x27f/0x4a0 ? genl_get_cmd+0x3c0/0x3c0 ? lock_acquire+0x1a9/0x6d0 ? devlink_reload+0x520/0x520 ? lock_release+0x6c0/0x6c0 netlink_rcv_skb+0x11d/0x340 ? genl_get_cmd+0x3c0/0x3c0 ? netlink_ack+0x9f0/0x9f0 ? lock_release+0x1f9/0x6c0 genl_rcv+0x24/0x40 netlink_unicast+0x433/0x700 ? netlink_attachskb+0x730/0x730 ? _copy_from_iter_full+0x178/0x650 ? __alloc_skb+0x113/0x2b0 netlink_sendmsg+0x6f1/0xbd0 ? netlink_unicast+0x700/0x700 ? netlink_unicast+0x700/0x700 sock_sendmsg+0xb0/0xe0 __sys_sendto+0x193/0x240 ? __x64_sys_getpeername+0xb0/0xb0 ? copy_page_range+0x2300/0x2300 ? __up_read+0x1a1/0x7b0 ? do_user_addr_fault+0x219/0xdc0 __x64_sys_sendto+0xdd/0x1b0 ? syscall_enter_from_user_mode+0x1d/0x50 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f51dffb514a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007ffcaef22e78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f51dffb514a RDX: 0000000000000030 RSI: 000055750daf2440 RDI: 0000000000000003 RBP: 000055750daf2410 R08: 00007f51e0081200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: mlx5_core(-) ptp pps_core ib_ipoib rdma_ucm rdma_cm iw_cm ib_cm ib_umad ib_uverbs ib_core [last unloaded: mlx5_ib] CR2: fffffffffffffff0 ---[ end trace 7789831bfe74fa42 ]--- Fixes: a925b5e309c9 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed 
<saeedm@nvidia.com>
2021-06-16  net/mlx5: Fix error path for set HCA defaults  Leon Romanovsky
In the case of the failure to execute mlx5_core_set_hca_defaults(), we used wrong goto label to execute error unwind flow. Fixes: 5bef709d76a2 ("net/mlx5: Enable host PF HCA after eswitch is initialized") Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16  drm/amdgpu/gfx10: enlarge CP_MEC_DOORBELL_RANGE_UPPER to cover full doorbell.  Yifan Zhang
If GC has entered CGPG, ringing a doorbell beyond the first page doesn't wake up GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to work around this issue. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-06-16  drm/amdgpu/gfx9: fix the doorbell missing when in CGPG issue.  Yifan Zhang
If GC has entered CGPG, ringing a doorbell beyond the first page doesn't wake up GC. Enlarge CP_MEC_DOORBELL_RANGE_UPPER to work around this issue. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-06-16  r8169: Avoid memcpy() over-reading of ETH_SS_STATS  Kees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally reading across neighboring array fields. The memcpy() is copying the entire structure, not just the first array. Adjust the source argument so the compiler can do appropriate bounds checking. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  sh_eth: Avoid memcpy() over-reading of ETH_SS_STATS  Kees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally reading across neighboring array fields. The memcpy() is copying the entire structure, not just the first array. Adjust the source argument so the compiler can do appropriate bounds checking. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  r8152: Avoid memcpy() over-reading of ETH_SS_STATS  Kees Cook
In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally reading across neighboring array fields. The memcpy() is copying the entire structure, not just the first array. Adjust the source argument so the compiler can do appropriate bounds checking. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
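The source-argument adjustment described in the three commits above can be illustrated with a table of fixed-width strings. This is a sketch under assumptions: `GSTRING_LEN`, `gstrings` and `get_strings()` are hypothetical stand-ins, and the "before" form is shown only in a comment.

```c
#include <string.h>

#define GSTRING_LEN 32	/* stand-in for ETH_GSTRING_LEN */

/* Hypothetical statistics name table: several fixed-width rows. */
static const char gstrings[][GSTRING_LEN] = {
	"tx_packets",
	"rx_packets",
	"tx_errors",
};

/*
 * Copy every row into data, which must hold sizeof(gstrings) bytes.
 *
 * A pattern that bounds checking would flag is
 *     memcpy(data, *gstrings, sizeof(gstrings));
 * because *gstrings names only the first 32-byte row while the length
 * covers all rows. Passing the whole array makes the source object and
 * the length agree, so the compiler can check the bounds.
 */
static void get_strings(char *data)
{
	memcpy(data, gstrings, sizeof(gstrings));
}
```

The bytes copied are the same either way; only the type of the source expression changes, which is what lets field-level bounds checking reason about the call.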
2021-06-16  Merge tag 'wireless-drivers-next-2021-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next  David S. Miller
Kalle Valo says:

====================
wireless-drivers-next patches for v5.14

First set of patches for v5.14. The major new features here are WCN6855 PCI support in ath11k and WoWLAN support for wcn36xx. Also smaller fixes and cleanups all over.

ath9k
* provide STBC info in the received frames

brcmfmac
* fix setting of station info chains bitmask
* correctly report average RSSI in station info

rsi
* support for changing beacon interval in AP mode

ath11k
* support for WCN6855 PCI hardware

wcn36xx
* WoWLAN support with magic packets and GTK rekeying
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  Merge branch 'marvell-prestera-flower-match-all'  David S. Miller
Vadym Kochan says:

====================
Marvell Prestera add flower and match all support

Add ACL infrastructure for Prestera Switch ASICs family devices to offload cls_flower rules to be processed in the HW.

ACL implementation is based on tc filter api. The flower classifier is supported to configure ACL rules/matches/action.

Supported actions:
- drop
- trap
- pass

Supported dissector keys:
- indev
- src_mac
- dst_mac
- src_ip
- dst_ip
- ip_proto
- src_port
- dst_port
- vlan_id
- vlan_ethtype
- icmp type/code

- Introduce matchall filter support
- Add SPAN API to configure port mirroring.
- Add tc mirror action.

At this moment, only mirror (egress) action is supported.

Example:
  tc filter ... action mirred egress mirror dev DEV

v2:
  Fixed "newline at EOF warnings" from "git am" by re-applying with --whitespace=fix

  patch #1:
    1) Set TC HW Offload always enabled, without the user being able to disable it. This reduced the logic by removing feature handling and acl block disable counting. [suggested by Vladimir Oltean]

  patch #2:
    1) Removed an unneeded diff that exchanged the prestera_port and prestera_switch lines in prestera_acl.h [suggested by Vladimir Oltean]
    2) Fix local variables ordering to reverse christmas tree [suggested by Vladimir Oltean]
    3) Use tc_cls_can_offload_and_chain0() in prestera_span_replace() [suggested by Vladimir Oltean]
    4) Removed TODO about prio check [suggested by Vladimir Oltean]
    5) Rephrase error message if prestera_netdev_check() fails in prestera_span_replace() [suggested by Vladimir Oltean]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  net: marvell: prestera: Add matchall support  Serhiy Boiko
- Introduce matchall filter support - Add SPAN API to configure port mirroring. - Add tc mirror action. At this moment, only mirror (egress) action is supported. Example: tc filter ... action mirred egress mirror dev DEV Co-developed-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Vadym Kochan <vkochan@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  net: marvell: Implement TC flower offload  Serhiy Boiko
Add ACL infrastructure for Prestera Switch ASICs family devices to offload cls_flower rules to be processed in the HW. ACL implementation is based on tc filter api. The flower classifier is supported to configure ACL rules/matches/action. Supported actions: - drop - trap - pass Supported dissector keys: - indev - src_mac - dst_mac - src_ip - dst_ip - ip_proto - src_port - dst_port - vlan_id - vlan_ethtype - icmp type/code Co-developed-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Vadym Kochan <vkochan@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  selftests: net: use bash to run udpgro_fwd test case  Andrea Righi
udpgro_fwd.sh contains many bash-specific constructs ("[[", "local -r"), but it uses /bin/sh; on some distros /bin/sh is mapped to /bin/dash, which doesn't support them. Force the test to use /bin/bash explicitly to prevent false-positive test failures. Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  Merge branch 'net-smc-stats'  David S. Miller
Karsten Graul says: ==================== net/smc: Add SMC statistic support Please apply the following patch series for smc to netdev's net-next tree. This v2 is a resend of the code contained in v1 but with an updated cover letter to describe why we have chosen to use the generic netlink mechanism to access the smc protocol's statistic data. The patchset adds statistic support to the SMC protocol. Per-cpu variables are used to collect the statistic information for better performance and for reducing concurrency pitfalls. The code that is collecting statistic data is implemented in macros to increase code reuse and readability. The generic netlink mechanism in SMC is extended to provide the collected statistics to userspace. Network namespace awareness is also part of the statistics implementation. SMC is a protocol interacting with PCI devices (like RoCE Cards) and runs on top of the TCP protocol. As SMC is a network protocol and not an ethernet device driver, we decided to use the generic netlink interface. This should be comparable to what other protocols in the net subsystem like tipc, ncsi, ieee802154 or tcp, et al, do. There is already an established internal generic netlink interface mechanism in SMC which is used to collect SMC Protocol internal information. This patchset extends that existing mechanism. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  net/smc: Make SMC statistics network namespace aware  Guvenc Gulce
Make the gathered SMC statistics network namespace aware: collect a separate set of statistics for each namespace. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  net/smc: Add netlink support for SMC fallback statistics  Guvenc Gulce
Add support to collect more detailed SMC fallback reason statistics and provide these statistics to user space on the netlink interface. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  net/smc: Add netlink support for SMC statistics  Guvenc Gulce
Add the netlink function which collects the statistics information and delivers it to the userspace. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16  net/smc: Add SMC statistics support  Guvenc Gulce
Add the ability to collect SMC statistics information. Per-cpu variables are used to collect the statistic information for better performance and for reducing concurrency pitfalls. The code that is collecting statistic data is implemented in macros to increase code reuse and readability. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
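The combination of per-CPU counters and collection macros can be sketched in userspace C. Everything below is hypothetical (the struct, the fixed CPU count, the macro names); a plain array indexed by CPU number stands in for kernel per-cpu variables.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */

/* Hypothetical counters; one private copy per CPU. */
struct stats_sketch {
	uint64_t tx_cnt;
	uint64_t rx_cnt;
};

static struct stats_sketch stats[NR_CPUS_SKETCH];

/* Writers bump only their own CPU's copy, so no lock is taken. */
#define STAT_INC(cpu, field) (stats[(cpu)].field++)

/* Generate one summing reader per field, used when statistics are reported. */
#define DEFINE_STAT_SUM(field)					\
static uint64_t stat_sum_##field(void)				\
{								\
	uint64_t sum = 0;					\
								\
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)		\
		sum += stats[cpu].field;			\
	return sum;						\
}

DEFINE_STAT_SUM(tx_cnt)
DEFINE_STAT_SUM(rx_cnt)
```

A caller on CPU 2 would write `STAT_INC(2, tx_cnt)`, and a reporting path would call `stat_sum_tx_cnt()`. The macros keep each counter update to a single short line at the call site, which is the reuse and readability argument made above.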
2021-06-16  net/af_unix: fix a data-race in unix_dgram_sendmsg / unix_release_sock  Eric Dumazet
While unix_may_send(sk, osk) is called while osk is locked, it appears unix_release_sock() can overwrite unix_peer() after this lock has been released, making KCSAN unhappy. Changing unix_release_sock() to access/change unix_peer() before lock is released should fix this issue. BUG: KCSAN: data-race in unix_dgram_sendmsg / unix_release_sock write to 0xffff88810465a338 of 8 bytes by task 20852 on cpu 1: unix_release_sock+0x4ed/0x6e0 net/unix/af_unix.c:558 unix_release+0x2f/0x50 net/unix/af_unix.c:859 __sock_release net/socket.c:599 [inline] sock_close+0x6c/0x150 net/socket.c:1258 __fput+0x25b/0x4e0 fs/file_table.c:280 ____fput+0x11/0x20 fs/file_table.c:313 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:209 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:302 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88810465a338 of 8 bytes by task 20888 on cpu 0: unix_may_send net/unix/af_unix.c:189 [inline] unix_dgram_sendmsg+0x923/0x1610 net/unix/af_unix.c:1712 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff888167905400 -> 0x0000000000000000 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 20888 Comm: syz-executor.0 Not tainted 5.13.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
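The idea described in this commit, changing the peer pointer while the lock is still held rather than after it is dropped, can be sketched in userspace C. This is only an illustration under stated assumptions: `struct fake_sock`, `fake_lock()`, `fake_unlock()` and `fake_release()` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for the kernel's socket state lock. It is not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket: a tiny spinlock plus a peer pointer. */
struct fake_sock {
	atomic_flag lock;
	struct fake_sock *peer;
};

static void fake_lock(struct fake_sock *sk)
{
	/* Spin until the flag was previously clear, i.e. we took the lock. */
	while (atomic_flag_test_and_set_explicit(&sk->lock, memory_order_acquire))
		;
}

static void fake_unlock(struct fake_sock *sk)
{
	atomic_flag_clear_explicit(&sk->lock, memory_order_release);
}

/*
 * Detach the peer while the lock is still held, so a sender that inspects
 * the peer under the same lock cannot race with this store.
 * Returns the old peer so the caller can finish tearing it down.
 */
static struct fake_sock *fake_release(struct fake_sock *sk)
{
	struct fake_sock *old_peer;

	assert(sk != NULL);

	fake_lock(sk);
	old_peer = sk->peer;
	sk->peer = NULL;	/* the write now happens inside the critical section */
	fake_unlock(sk);

	return old_peer;
}
```

A reader that takes the same lock before looking at `peer` then sees either the old pointer or NULL, never a store that lands after the unlock.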
2021-06-16 | selftests: net: veth: make test compatible with dash | Andrea Righi

veth.sh is a shell script that uses /bin/sh; some distros (Ubuntu, for example) use dash as /bin/sh, and in this case the test reports the following error:

  # ./veth.sh: 21: local: -r: bad variable name
  # ./veth.sh: 21: local: -r: bad variable name

This happens because dash doesn't support the option "-r" with local.

Moreover, when the bpf object is missing, the script exits with -1, which is an illegal number for dash:

  exit: Illegal number: -1

Change the script to be compatible with both bash and dash and prevent the errors above.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | Merge branch 'net-packet-data-races' | David S. Miller

Eric Dumazet says:

====================
net/packet: annotate data races

KCSAN sent two reports about data races in af_packet.
Nothing serious, but worth fixing.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | net/packet: annotate accesses to po->ifindex | Eric Dumazet

Like the prior patch, we need to annotate lockless accesses to po->ifindex.

For instance, packet_getname() is reading po->ifindex (twice) while another thread is able to change po->ifindex.

KCSAN reported:

BUG: KCSAN: data-race in packet_do_bind / packet_getname

write to 0xffff888143ce3cbc of 4 bytes by task 25573 on cpu 1:
 packet_do_bind+0x420/0x7e0 net/packet/af_packet.c:3191
 packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
 __sys_bind+0x200/0x290 net/socket.c:1637
 __do_sys_bind net/socket.c:1648 [inline]
 __se_sys_bind net/socket.c:1646 [inline]
 __x64_sys_bind+0x3d/0x50 net/socket.c:1646
 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888143ce3cbc of 4 bytes by task 25578 on cpu 0:
 packet_getname+0x5b/0x1a0 net/packet/af_packet.c:3525
 __sys_getsockname+0x10e/0x1a0 net/socket.c:1887
 __do_sys_getsockname net/socket.c:1902 [inline]
 __se_sys_getsockname net/socket.c:1899 [inline]
 __x64_sys_getsockname+0x3e/0x50 net/socket.c:1899
 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000000 -> 0x00000001

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 25578 Comm: syz-executor.5 Not tainted 5.13.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
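The "read twice" hazard mentioned above can be illustrated with a small userspace sketch: if a field may change between two loads, taking a single snapshot into a local variable keeps the derived values consistent with each other. Everything here is an assumption made for illustration: `struct ifindex_sock`, `struct fake_sockaddr` and `fake_getname()` are hypothetical, and `READ_ONCE()` is a simplified stand-in for the kernel macro that relies on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>

/* Simplified userspace stand-in for the kernel macro: force a single load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical socket with a field another thread may rewrite at any time. */
struct ifindex_sock {
	int ifindex;
};

/* Hypothetical result structure filled in by the getname-style helper. */
struct fake_sockaddr {
	int ifindex;
	int has_device;	/* 1 if ifindex names a device, 0 otherwise */
};

static void fake_getname(struct ifindex_sock *po, struct fake_sockaddr *out)
{
	int ifindex;

	assert(po && out);

	/* One load only; both outputs below are derived from this snapshot. */
	ifindex = READ_ONCE(po->ifindex);

	out->ifindex = ifindex;
	out->has_device = ifindex > 0;
}
```

Reading `po->ifindex` separately for each output could yield a `has_device` flag that disagrees with the reported `ifindex` if a concurrent bind lands between the two loads.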
2021-06-16 | net/packet: annotate accesses to po->bind | Eric Dumazet

tpacket_snd(), packet_snd(), packet_getname() and packet_seq_show() can read po->num without holding a lock. This means other threads can change po->num at the same time.

KCSAN complained about this known fact [1].
Add READ_ONCE()/WRITE_ONCE() to address the issue.

[1] BUG: KCSAN: data-race in packet_do_bind / packet_sendmsg

write to 0xffff888131a0dcc0 of 2 bytes by task 24714 on cpu 0:
 packet_do_bind+0x3ab/0x7e0 net/packet/af_packet.c:3181
 packet_bind+0xc3/0xd0 net/packet/af_packet.c:3255
 __sys_bind+0x200/0x290 net/socket.c:1637
 __do_sys_bind net/socket.c:1648 [inline]
 __se_sys_bind net/socket.c:1646 [inline]
 __x64_sys_bind+0x3d/0x50 net/socket.c:1646
 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888131a0dcc0 of 2 bytes by task 24719 on cpu 1:
 packet_snd net/packet/af_packet.c:2899 [inline]
 packet_sendmsg+0x317/0x3570 net/packet/af_packet.c:3040
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg net/socket.c:674 [inline]
 ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
 ___sys_sendmsg net/socket.c:2404 [inline]
 __sys_sendmsg+0x1ed/0x270 net/socket.c:2433
 __do_sys_sendmsg net/socket.c:2442 [inline]
 __se_sys_sendmsg net/socket.c:2440 [inline]
 __x64_sys_sendmsg+0x42/0x50 net/socket.c:2440
 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000 -> 0x1200

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 24719 Comm: syz-executor.5 Not tainted 5.13.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
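As a rough illustration of the READ_ONCE()/WRITE_ONCE() annotation pattern named in this commit, the sketch below pairs a lockless writer with a lockless reader of a 16-bit field. The macros are simplified userspace stand-ins (volatile casts using the GCC/Clang `__typeof__` extension), not the kernel definitions, and `struct num_sock`, `fake_bind()` and `fake_snd_proto()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel macros. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical socket with a protocol number that is read without a lock. */
struct num_sock {
	uint16_t num;
};

/* Writer side: publish the new protocol with a single marked store. */
static void fake_bind(struct num_sock *po, uint16_t proto)
{
	assert(po);
	WRITE_ONCE(po->num, proto);
}

/* Reader side: fetch the protocol with a single marked load. */
static uint16_t fake_snd_proto(struct num_sock *po)
{
	assert(po);
	return READ_ONCE(po->num);
}
```

The annotations do not add locking; they mark the access as intentionally concurrent and keep the compiler from tearing, fusing, or repeating the load or store.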
2021-06-16 | mlxsw: spectrum_router: remove redundant continue statement | Colin Ian King

The continue statement at the end of a for-loop has no effect; remove it.

Addresses-Coverity: ("Continue has no effect")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
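For readers unfamiliar with this class of Coverity finding, here is a small, hypothetical C example (not the mlxsw code) that contrasts a `continue` that changes control flow with one that would not. `count_nonzero()` is an invented helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the non-zero entries of an array. */
static size_t count_nonzero(const int *vals, size_t n)
{
	size_t i, count = 0;

	assert(vals || n == 0);

	for (i = 0; i < n; i++) {
		if (vals[i] == 0)
			continue;	/* meaningful: skips the increment below */
		count++;
		/*
		 * A "continue;" placed here, as the last statement of the
		 * loop body, would be a no-op: the loop advances to the next
		 * iteration anyway. That is the pattern the commit removes.
		 */
	}
	return count;
}
```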
2021-06-16 | Merge tag 'linux-can-fixes-for-5.13-20210616' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can | David S. Miller

Marc Kleine-Budde says:

====================
pull-request: can 2021-06-16

This is a pull request of 4 patches for net/master.

The first patch is by Oleksij Rempel and fixes a Use-after-Free found by syzbot in the j1939 stack.

The next patch is by Tetsuo Handa and fixes a hung task detected by syzbot in the bcm, raw and isotp protocols.

Norbert Slusarek's patch fixes an infoleak in bcm's struct bcm_msg_head.

Pavel Skripkin's patch fixes a memory leak in the mcba_usb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | Merge branch 'nfp-ct-part-two' | David S. Miller

Simon Horman says:

====================
Next set of conntrack patches for the nfp driver

Louis Peens says:

This follows on from the previous series of a similar nature. Looking at the diagram as explained in the previous series this implements changes up to the point where the merged nft entries are saved. There are still bits of stubbed out code where offloading of the flows will be implemented.

    +-------------+                      +----------+
    | pre_ct flow +--------+             | nft flow |
    +-------------+        v             +------+---+
                      +----------+              |
                      | tc_merge +--------+     |
                      +----------+        v     v
    +--------------+       ^           +-------------+
    | post_ct flow +-------+       +---+nft_tc merge |
    +--------------+               |   +-------------+
                                   |
                                   |
                                   v
                            Offload to nfp
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: implement action_merge check | Louis Peens

Fill in code stub to check that the flow actions are valid for merge. The actions of flow X should not conflict with the matches of flow X+1. For now this check is quite strict and set_actions are very limited; this will need to be updated when NAT support is added.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: fill ct metadata check function | Louis Peens

Fill in the check_meta stub to check that the ct_metadata action fields in the nft flow match the ct_match data of the post_ct flow.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: fill in ct merge check function | Louis Peens

Replace merge check stub code with the actual implementation. This checks that the match parts of two tc flows do not conflict. Only overlapping keys need to be checked, and only the narrowest masked parts need to be checked, so each key is masked with the AND'd result of both masks before comparing.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
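The masking rule described above (compare two keys only on the bits that both masks cover) can be sketched as a byte-wise check. This is a minimal illustration, not the nfp driver code: `keys_conflict()` and its flat byte-array layout are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when two match keys disagree on any bit
 * that both flows actually match on. Bits masked out by either flow are
 * ignored, because that flow places no constraint on them.
 */
static bool keys_conflict(const uint8_t *key1, const uint8_t *mask1,
			  const uint8_t *key2, const uint8_t *mask2,
			  size_t len)
{
	size_t i;

	assert(key1 && mask1 && key2 && mask2);

	for (i = 0; i < len; i++) {
		/* Narrowest common mask: the AND of both masks. */
		uint8_t both = mask1[i] & mask2[i];

		if ((key1[i] & both) != (key2[i] & both))
			return true;
	}
	return false;
}
```

For instance, a flow matching 192.168.1.5 with a full mask does not conflict with a flow matching 192.168.0.0 under a 255.255.0.0 mask, since only the first two bytes are compared and they agree.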
2021-06-16 | nfp: flower-ct: implement code to save merge of tc and nft flows | Louis Peens

Add in the code to merge the tc_merge objects with the flows received from nft. At the moment flows are just merged blindly, as the validity check functions are stubbed out; these will be populated in follow-up patches.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: add nft_merge table | Louis Peens

Add table and struct to save the result of the three-way merge between pre_ct, post_ct, and nft flows. Merging code is to be added in follow-up patches.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: make a full copy of the rule when it is a NFT flow | Yinjun Zhang

The nft flow will be destroyed after the offload cb returns. This means we need to save a full copy of it, since it can be referenced through paths other than just the offload cb, for example when a new pre_ct or post_ct entry is added and needs to be merged with an existing nft entry.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: add nft flows to nft list | Louis Peens

Implement code to add and remove nft flows to the relevant list. Registering and deregistering the callback function for the nft table is quite complicated. The safest approach is to delete the callback on the removal of the last pre_ct flow. This is because if this is also the last pre_ct flow in software, it means that this specific nft table will be freed, so there will not be a later opportunity to do this.

Another place where it looks possible to delete the callback is when the last nft_flow is deleted, but this happens under the flow_table lock, which is also taken when deregistering the callback, leading to a deadlock situation.

This means the final solution here is to delete the callback when removing the last pre_ct flow, and then clean up any remaining nft_flow entries which may still be present, since there will never be a callback now to do this, leaving them orphaned if not cleaned up here as well.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: add nft callback stubs | Louis Peens

Add register/unregister of the nft callback. For now just add stub code to accept the flows, but don't do anything with them. Decided to accept the flows since netfilter will keep on trying to offload a flow if it was rejected, which is quite noisy. Follow-up patches will start implementing the functions to add nft flows to the relevant tables.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | nfp: flower-ct: add delete flow handling for ct | Louis Peens

Add functions to handle delete flow callbacks for ct flows. Also accept the flows for offloading by returning 0 instead of -EOPNOTSUPP. Flows will still not actually be offloaded to hw, but at this point it's difficult to not accept the flows and also exercise the cleanup paths properly. Traffic will still be handled safely through the fallback path.

Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-16 | net: ipv4: fix memory leak in ip_mc_add1_src | Chengyang Fan

BUG: memory leak
unreferenced object 0xffff888101bc4c00 (size 32):
  comm "syz-executor527", pid 360, jiffies 4294807421 (age 19.329s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    01 00 00 00 00 00 00 00 ac 14 14 bb 00 00 02 00 ................
  backtrace:
    [<00000000f17c5244>] kmalloc include/linux/slab.h:558 [inline]
    [<00000000f17c5244>] kzalloc include/linux/slab.h:688 [inline]
    [<00000000f17c5244>] ip_mc_add1_src net/ipv4/igmp.c:1971 [inline]
    [<00000000f17c5244>] ip_mc_add_src+0x95f/0xdb0 net/ipv4/igmp.c:2095
    [<000000001cb99709>] ip_mc_source+0x84c/0xea0 net/ipv4/igmp.c:2416
    [<0000000052cf19ed>] do_ip_setsockopt net/ipv4/ip_sockglue.c:1294 [inline]
    [<0000000052cf19ed>] ip_setsockopt+0x114b/0x30c0 net/ipv4/ip_sockglue.c:1423
    [<00000000477edfbc>] raw_setsockopt+0x13d/0x170 net/ipv4/raw.c:857
    [<00000000e75ca9bb>] __sys_setsockopt+0x158/0x270 net/socket.c:2117
    [<00000000bdb993a8>] __do_sys_setsockopt net/socket.c:2128 [inline]
    [<00000000bdb993a8>] __se_sys_setsockopt net/socket.c:2125 [inline]
    [<00000000bdb993a8>] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2125
    [<000000006a1ffdbd>] do_syscall_64+0x40/0x80 arch/x86/entry/common.c:47
    [<00000000b11467c4>] entry_SYSCALL_64_after_hwframe+0x44/0xae

In commit 24803f38a5c0 ("igmp: do not remove igmp souce list info when set link down"), the ip_mc_clear_src() in ip_mc_destroy_dev() was removed, because it was also called in igmpv3_clear_delrec().

Rough callgraph:

inetdev_destroy
-> ip_mc_destroy_dev
   -> igmpv3_clear_delrec
      -> ip_mc_clear_src
-> RCU_INIT_POINTER(dev->ip_ptr, NULL)

However, ip_mc_clear_src() called in igmpv3_clear_delrec() doesn't release in_dev->mc_list->sources. And RCU_INIT_POINTER() assigns NULL to dev->ip_ptr. As a result, in_dev cannot be obtained through inetdev_by_index(), and then in_dev->mc_list->sources cannot be released by ip_mc_del1_src() in sock_close.

Rough call sequence goes like:

sock_close
-> __sock_release
   -> inet_release
      -> ip_mc_drop_socket
         -> inetdev_by_index
         -> ip_mc_leave_src
            -> ip_mc_del_src
               -> ip_mc_del1_src

So we still need to call ip_mc_clear_src() in ip_mc_destroy_dev() to free in_dev->mc_list->sources.

Fixes: 24803f38a5c0 ("igmp: do not remove igmp souce list info ...")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>