summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-06-11Merge tag 'drm-misc-fixes-2021-06-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes One fix for snu4i that prevents it from probing, two locking fixes for ttm and drm_auth, one off-by-x1000 fix for mcde and a fix for vc4 to prevent an out-of-bounds access. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210610171653.lqsoadxrhdk73cdy@gilmour
2021-06-11Merge tag 'drm-msm-fixes-2021-06-10' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-fixes - NULL ptr deref fix - CP_PROTECT reg programming fix - incorrect register shift fix - DSI blank screen fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvbcz0=QxGYnX9u7cD1SCvFSx20dzrZuOccjtRRBTJd5Q@mail.gmail.com
2021-06-10ARC: fix CONFIG_HARDENED_USERCOPYVineet Gupta
Currently enabling this triggers a warning | usercopy: Kernel memory overwrite attempt detected to kernel text (offset 155633, size 11)! | usercopy: BUG: failure at mm/usercopy.c:99/usercopy_abort()! | |gcc generated __builtin_trap |Path: /bin/busybox |CPU: 0 PID: 84 Comm: init Not tainted 5.4.22 | |[ECR ]: 0x00090005 => gcc generated __builtin_trap |[EFA ]: 0x9024fcaa |[BLINK ]: usercopy_abort+0x8a/0x8c |[ERET ]: memfd_fcntl+0x0/0x470 |[STAT32]: 0x80080802 : IE K |... |... |Stack Trace: | memfd_fcntl+0x0/0x470 | usercopy_abort+0x8a/0x8c | __check_object_size+0x10e/0x138 | copy_strings+0x1f4/0x38c | __do_execve_file+0x352/0x848 | EV_Trap+0xcc/0xd0 The issue is triggered by an allocation in "init reclaimed" region. ARC _stext emcompasses the init region (for historical reasons we wanted the init.text to be under .text as well). This however trips up __check_object_size()->check_kernel_text_object() which treats this as object bleeding into kernel text. Fix that by rezoning _stext to start from regular kernel .text and leave out .init altogether. Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/15 Reported-by: Evgeniy Didin <didin@synopsys.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-06-10ARCv2: save ABI registers across signal handlingVineet Gupta
ARCv2 has some configuration dependent registers (r30, r58, r59) which could be targetted by the compiler. To keep the ABI stable, these were unconditionally part of the glibc ABI (sysdeps/unix/sysv/linux/arc/sys/ucontext.h:mcontext_t) however we missed populating them (by saving/restoring them across signal handling). This patch fixes the issue by - adding arcv2 ABI regs to kernel struct sigcontext - populating them during signal handling Change to struct sigcontext might seem like a glibc ABI change (although it primarily uses ucontext_t:mcontext_t) but the fact is - it has only been extended (existing fields are not touched) - the old sigcontext was ABI incomplete to begin with anyways Fixes: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/53 Cc: <stable@vger.kernel.org> Tested-by: kernel test robot <lkp@intel.com> Reported-by: Vladimir Isaev <isaev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-06-10Merge branch 'mptcp-fixes'David S. Miller
Mat Martineau says: ==================== mptcp: More v5.13 fixes Here's another batch of MPTCP fixes for v5.13. Patch 1 cleans up memory accounting between the MPTCP-level socket and the subflows to more reliably transfer forward allocated memory under pressure. Patch 2 wakes up socket readers more reliably. Patch 3 changes a WARN_ONCE to a pr_debug. Patch 4 changes the selftests to only use syncookies in test cases where they do not cause spurious failures. Patch 5 modifies socket error reporting to avoid a possible soft lockup. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10mptcp: fix soft lookup in subflow_error_report()Paolo Abeni
Maxim reported a soft lookup in subflow_error_report(): watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0] RIP: 0010:native_queued_spin_lock_slowpath RSP: 0018:ffffa859c0003bc0 EFLAGS: 00000202 RAX: 0000000000000101 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff9195c2772d88 RSI: 0000000000000000 RDI: ffff9195c2772d88 RBP: ffff9195c2772d00 R08: 00000000000067b0 R09: c6e31da9eb1e44f4 R10: ffff9195ef379700 R11: ffff9195edb50710 R12: ffff9195c2772d88 R13: ffff9195f500e3d0 R14: ffff9195ef379700 R15: ffff9195ef379700 FS: 0000000000000000(0000) GS:ffff91961f400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c000407000 CR3: 0000000002988000 CR4: 00000000000006f0 Call Trace: <IRQ> _raw_spin_lock_bh subflow_error_report mptcp_subflow_data_available __mptcp_move_skbs_from_subflow mptcp_data_ready tcp_data_queue tcp_rcv_established tcp_v4_do_rcv tcp_v4_rcv ip_protocol_deliver_rcu ip_local_deliver_finish __netif_receive_skb_one_core netif_receive_skb rtl8139_poll 8139too __napi_poll net_rx_action __do_softirq __irq_exit_rcu common_interrupt </IRQ> The calling function - mptcp_subflow_data_available() - can be invoked from different contexts: - plain ssk socket lock - ssk socket lock + mptcp_data_lock - ssk socket lock + mptcp_data_lock + msk socket lock. Since subflow_error_report() tries to acquire the mptcp_data_lock, the latter two call chains will cause soft lookup. This change addresses the issue moving the error reporting call to outer functions, where the held locks list is known and the we can acquire only the needed one. Reported-by: Maxim Galaganov <max@internet.ru> Fixes: 15cc10453398 ("mptcp: deliver ssk errors to msk") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/199 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10selftests: mptcp: enable syncookie only in absence of reordersPaolo Abeni
Syncookie validation may fail for OoO packets, causing spurious resets and self-tests failures, so let's force syncookie only for tests iteration with no OoO. Fixes: fed61c4b584c ("selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/198 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10mptcp: do not warn on bad input from the networkPaolo Abeni
warn_bad_map() produces a kernel WARN on bad input coming from the network. Use pr_debug() to avoid spamming the system log. Additionally, when the right bound check fails, warn_bad_map() reports the wrong ssn value, let's fix it. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/107 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10mptcp: wake-up readers only for in sequence dataPaolo Abeni
Currently we rely on the subflow->data_avail field, which is subject to races: ssk1 skb len = 500 DSS(seq=1, len=1000, off=0) # data_avail == MPTCP_SUBFLOW_DATA_AVAIL ssk2 skb len = 500 DSS(seq = 501, len=1000) # data_avail == MPTCP_SUBFLOW_DATA_AVAIL ssk1 skb len = 500 DSS(seq = 1, len=1000, off =500) # still data_avail == MPTCP_SUBFLOW_DATA_AVAIL, # as the skb is covered by a pre-existing map, # which was in-sequence at reception time. Instead we can explicitly check if some has been received in-sequence, propagating the info from __mptcp_move_skbs_from_subflow(). Additionally add the 'ONCE' annotation to the 'data_avail' memory access, as msk will read it outside the subflow socket lock. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10mptcp: try harder to borrow memory from subflow under pressurePaolo Abeni
If the host is under sever memory pressure, and RX forward memory allocation for the msk fails, we try to borrow the required memory from the ingress subflow. The current attempt is a bit flaky: if skb->truesize is less than SK_MEM_QUANTUM, the ssk will not release any memory, and the next schedule will fail again. Instead, directly move the required amount of pages from the ssk to the msk, if available Fixes: 9c3f94e1681b ("mptcp: add missing memory scheduling in the rx path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10riscv: code patching only works on !XIP_KERNELJisheng Zhang
Some features which need code patching such as KPROBES, DYNAMIC_FTRACE KGDB can only work on !XIP_KERNEL. Add dependencies for these features that rely on code patching. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-10riscv: xip: support runtime trap patchingVitaly Wool
RISCV_ERRATA_ALTERNATIVE patches text at runtime which is currently not possible when the kernel is executed from the flash in XIP mode. Since runtime patching concerns only traps at the moment, let's just have all the traps reside in RAM anyway if RISCV_ERRATA_ALTERNATIVE is set. Thus, these functions will be patch-able even when the .text section is in flash. Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-10io_uring: add feature flag for rsrc tagsPavel Begunkov
Add IORING_FEAT_RSRC_TAGS indicating that io_uring supports a bunch of new IORING_REGISTER operations, in particular IORING_REGISTER_[FILES[,UPDATE]2,BUFFERS[2,UPDATE]] that support rsrc tagging, and also indicating implemented dynamic fixed buffer updates. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9b995d4045b6c6b4ab7510ca124fd25ac2203af7.1623339162.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-10io_uring: change registration/upd/rsrc tagging ABIPavel Begunkov
There are ABI moments about recently added rsrc registration/update and tagging that might become a nuisance in the future. First, IORING_REGISTER_RSRC[_UPD] hide different types of resources under it, so breaks fine control over them by restrictions. It works for now, but once those are wanted under restrictions it would require a rework. It was also inconvenient trying to fit a new resource not supporting all the features (e.g. dynamic update) into the interface, so better to return to IORING_REGISTER_* top level dispatching. Second, register/update were considered to accept a type of resource, however that's not a good idea because there might be several ways of registration of a single resource type, e.g. we may want to add non-contig buffers or anything more exquisite as dma mapped memory. So, remove IORING_RSRC_[FILE,BUFFER] out of the ABI, and place them internally for now to limit changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9b554897a7c17ad6e3becc48dfed2f7af9f423d5.1623339162.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix a crash when stateful expression with its own gc callback is used in a set definition. 2) Skip IPv6 packets from any link-local address in IPv6 fib expression. Add a selftest for this scenario, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10Merge branch 'tcp-options-oob-fixes'David S. Miller
Maxim Mikityanskiy says: ==================== Fix out of bounds when parsing TCP options This series fixes out-of-bounds access in various places in the kernel where parsing of TCP options takes place. Fortunately, many more occurrences don't have this bug. v2 changes: synproxy: Added an early return when length < 0 to avoid calling skb_header_pointer with negative length. sch_cake: Added doff validation to avoid parsing garbage. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10sch_cake: Fix out of bounds when parsing TCP options and headerMaxim Mikityanskiy
The TCP option parser in cake qdisc (cake_get_tcpopt and cake_tcph_may_drop) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1. This fix is inspired by commit 9609dad263f8 ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options."). v2 changes: Added doff validation in cake_get_tcphdr to avoid parsing garbage as TCP header. Although it wasn't strictly an out-of-bounds access (memory was allocated), garbage values could be read where CAKE expected the TCP header if doff was smaller than 5. Cc: Young Xiao <92siuyang@gmail.com> Fixes: 8b7138814f29 ("sch_cake: Add optional ACK filter") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10mptcp: Fix out of bounds when parsing TCP optionsMaxim Mikityanskiy
The TCP option parser in mptcp (mptcp_get_options) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1. This fix is inspired by commit 9609dad263f8 ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options."). Cc: Young Xiao <92siuyang@gmail.com> Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10netfilter: synproxy: Fix out of bounds when parsing TCP optionsMaxim Mikityanskiy
The TCP option parser in synproxy (synproxy_parse_options) could read one byte out of bounds. When the length is 1, the execution flow gets into the loop, reads one byte of the opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the length of 1. This fix is inspired by commit 9609dad263f8 ("ipv4: tcp_input: fix stack out of bounds when parsing TCP options."). v2 changes: Added an early return when length < 0 to avoid calling skb_header_pointer with negative length. Cc: Young Xiao <92siuyang@gmail.com> Fixes: 48b1de4c110a ("netfilter: add SYNPROXY core/target") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10net/packet: annotate data race in packet_sendmsg()Eric Dumazet
There is a known race in packet_sendmsg(), addressed in commit 32d3182cd2cd ("net/packet: fix race in tpacket_snd()") Now we have data_race(), we can use it to avoid a future KCSAN warning, as syzbot loves stressing af_packet sockets :) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10inet: annotate date races around sk->sk_txhashEric Dumazet
UDP sendmsg() path can be lockless, it is possible for another thread to re-connect an change sk->sk_txhash under us. There is no serious impact, but we can use READ_ONCE()/WRITE_ONCE() pair to document the race. BUG: KCSAN: data-race in __ip4_datagram_connect / skb_set_owner_w write to 0xffff88813397920c of 4 bytes by task 30997 on cpu 1: sk_set_txhash include/net/sock.h:1937 [inline] __ip4_datagram_connect+0x69e/0x710 net/ipv4/datagram.c:75 __ip6_datagram_connect+0x551/0x840 net/ipv6/datagram.c:189 ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272 inet_dgram_connect+0xfd/0x180 net/ipv4/af_inet.c:580 __sys_connect_file net/socket.c:1837 [inline] __sys_connect+0x245/0x280 net/socket.c:1854 __do_sys_connect net/socket.c:1864 [inline] __se_sys_connect net/socket.c:1861 [inline] __x64_sys_connect+0x3d/0x50 net/socket.c:1861 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88813397920c of 4 bytes by task 31039 on cpu 0: skb_set_hash_from_sk include/net/sock.h:2211 [inline] skb_set_owner_w+0x118/0x220 net/core/sock.c:2101 sock_alloc_send_pskb+0x452/0x4e0 net/core/sock.c:2359 sock_alloc_send_skb+0x2d/0x40 net/core/sock.c:2373 __ip6_append_data+0x1743/0x21a0 net/ipv6/ip6_output.c:1621 ip6_make_skb+0x258/0x420 net/ipv6/ip6_output.c:1983 udpv6_sendmsg+0x160a/0x16b0 net/ipv6/udp.c:1527 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xbca3c43d -> 0xfdb309e0 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 31039 Comm: syz-executor.2 Not tainted 5.13.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10net: annotate data race in sock_error()Eric Dumazet
sock_error() is known to be racy. The code avoids an atomic operation is sk_err is zero, and this field could be changed under us, this is fine. Sysbot reported: BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock write to 0xffff888131855630 of 4 bytes by task 9365 on cpu 1: unix_release_sock+0x2e9/0x6e0 net/unix/af_unix.c:550 unix_release+0x2f/0x50 net/unix/af_unix.c:859 __sock_release net/socket.c:599 [inline] sock_close+0x6c/0x150 net/socket.c:1258 __fput+0x25b/0x4e0 fs/file_table.c:280 ____fput+0x11/0x20 fs/file_table.c:313 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888131855630 of 4 bytes by task 9385 on cpu 0: sock_error include/net/sock.h:2269 [inline] sock_alloc_send_pskb+0xe4/0x4e0 net/core/sock.c:2336 unix_dgram_sendmsg+0x478/0x1610 net/unix/af_unix.c:1671 unix_seqpacket_sendmsg+0xc2/0x100 net/unix/af_unix.c:2055 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 __sys_sendmsg_sock+0x25/0x30 net/socket.c:2416 io_sendmsg fs/io_uring.c:4367 [inline] io_issue_sqe+0x231a/0x6750 fs/io_uring.c:6135 __io_queue_sqe+0xe9/0x360 fs/io_uring.c:6414 __io_req_task_submit fs/io_uring.c:2039 [inline] io_async_task_func+0x312/0x590 fs/io_uring.c:5074 __tctx_task_work fs/io_uring.c:1910 [inline] tctx_task_work+0x1d4/0x3d0 fs/io_uring.c:1924 task_work_run+0xae/0x130 kernel/task_work.c:164 tracehook_notify_signal include/linux/tracehook.h:212 [inline] handle_signal_work kernel/entry/common.c:145 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0xf8/0x190 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301 do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x00000000 -> 0x00000068 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 9385 Comm: syz-executor.3 Not tainted 5.13.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10Merge branch 'bridge-egress-fixes'David S. Miller
Nikolay Aleksandrov says: ==================== net: bridge: vlan tunnel egress path fixes These two fixes take care of tunnel_dst problems in the vlan tunnel egress path. Patch 01 fixes a null ptr deref due to the lockless use of tunnel_dst pointer without checking it first, and patch 02 fixes a use-after-free issue due to wrong dst refcounting (dst_clone() -> dst_hold_safe()). Both fix the same commit and should be queued for stable backports: Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") v2: no changes, added stable list to CC ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10net: bridge: fix vlan tunnel dst refcnt when egressingNikolay Aleksandrov
The egress tunnel code uses dst_clone() and directly sets the result which is wrong because the entry might have 0 refcnt or be already deleted, causing number of problems. It also triggers the WARN_ON() in dst_hold()[1] when a refcnt couldn't be taken. Fix it by using dst_hold_safe() and checking if a reference was actually taken before setting the dst. [1] dmesg WARN_ON log and following refcnt errors WARNING: CPU: 5 PID: 38 at include/net/dst.h:230 br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge] Modules linked in: 8021q garp mrp bridge stp llc bonding ipv6 virtio_net CPU: 5 PID: 38 Comm: ksoftirqd/5 Kdump: loaded Tainted: G W 5.13.0-rc3+ #360 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 RIP: 0010:br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge] Code: e8 85 bc 01 e1 45 84 f6 74 90 45 31 f6 85 db 48 c7 c7 a0 02 19 a0 41 0f 94 c6 31 c9 31 d2 44 89 f6 e8 64 bc 01 e1 85 db 75 02 <0f> 0b 31 c9 31 d2 44 89 f6 48 c7 c7 70 02 19 a0 e8 4b bc 01 e1 49 RSP: 0018:ffff8881003d39e8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa01902a0 RBP: ffff8881040c6700 R08: 0000000000000000 R09: 0000000000000001 R10: 2ce93d0054fe0d00 R11: 54fe0d00000e0000 R12: ffff888109515000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000401 FS: 0000000000000000(0000) GS:ffff88822bf40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f42ba70f030 CR3: 0000000109926000 CR4: 00000000000006e0 Call Trace: br_handle_vlan+0xbc/0xca [bridge] __br_forward+0x23/0x164 [bridge] deliver_clone+0x41/0x48 [bridge] br_handle_frame_finish+0x36f/0x3aa [bridge] ? skb_dst+0x2e/0x38 [bridge] ? br_handle_ingress_vlan_tunnel+0x3e/0x1c8 [bridge] ? br_handle_frame_finish+0x3aa/0x3aa [bridge] br_handle_frame+0x2c3/0x377 [bridge] ? __skb_pull+0x33/0x51 ? vlan_do_receive+0x4f/0x36a ? br_handle_frame_finish+0x3aa/0x3aa [bridge] __netif_receive_skb_core+0x539/0x7c6 ? __list_del_entry_valid+0x16e/0x1c2 __netif_receive_skb_list_core+0x6d/0xd6 netif_receive_skb_list_internal+0x1d9/0x1fa gro_normal_list+0x22/0x3e dev_gro_receive+0x55b/0x600 ? detach_buf_split+0x58/0x140 napi_gro_receive+0x94/0x12e virtnet_poll+0x15d/0x315 [virtio_net] __napi_poll+0x2c/0x1c9 net_rx_action+0xe6/0x1fb __do_softirq+0x115/0x2d8 run_ksoftirqd+0x18/0x20 smpboot_thread_fn+0x183/0x19c ? smpboot_unregister_percpu_thread+0x66/0x66 kthread+0x10a/0x10f ? kthread_mod_delayed_work+0xb6/0xb6 ret_from_fork+0x22/0x30 ---[ end trace 49f61b07f775fd2b ]--- dst_release: dst:00000000c02d677a refcnt:-1 dst_release underflow Cc: stable@vger.kernel.org Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10net: bridge: fix vlan tunnel dst null pointer dereferenceNikolay Aleksandrov
This patch fixes a tunnel_dst null pointer dereference due to lockless access in the tunnel egress path. When deleting a vlan tunnel the tunnel_dst pointer is set to NULL without waiting a grace period (i.e. while it's still usable) and packets egressing are dereferencing it without checking. Use READ/WRITE_ONCE to annotate the lockless use of tunnel_id, use RCU for accessing tunnel_dst and make sure it is read only once and checked in the egress path. The dst is already properly RCU protected so we don't need to do anything fancy than to make sure tunnel_id and tunnel_dst are read only once and checked in the egress path. Cc: stable@vger.kernel.org Fixes: 11538d039ac6 ("bridge: vlan dst_metadata hooks in ingress and egress paths") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10coredump: Limit what can interrupt coredumpsEric W. Biederman
Olivier Langlois has been struggling with coredumps being incompletely written in processes using io_uring. Olivier Langlois <olivier@trillion01.com> writes: > io_uring is a big user of task_work and any event that io_uring made a > task waiting for that occurs during the core dump generation will > generate a TIF_NOTIFY_SIGNAL. > > Here are the detailed steps of the problem: > 1. io_uring calls vfs_poll() to install a task to a file wait queue > with io_async_wake() as the wakeup function cb from io_arm_poll_handler() > 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL > 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling > set_notify_signal() The coredump code deliberately supports being interrupted by SIGKILL, and depends upon prepare_signal to filter out all other signals. Now that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL this hack in dump_emitted by the coredump code no longer works. Make the coredump code more robust by explicitly testing for all of the wakeup conditions the coredump code supports. This prevents new wakeup conditions from breaking the coredump code, as well as fixing the current issue. The filesystem code that the coredump code uses already limits itself to only aborting on fatal_signal_pending. So it should not develop surprising wake-up reasons either. v2: Don't remove the now unnecessary code in prepare_signal. Cc: stable@vger.kernel.org Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") Reported-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-10ping: Check return value of function 'ping_queue_rcv_skb'Zheng Yongjun
Function 'ping_queue_rcv_skb' not always return success, which will also return fail. If not check the wrong return value of it, lead to function `ping_rcv` return success. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10skbuff: fix incorrect msg_zerocopy copy notificationsWillem de Bruijn
msg_zerocopy signals if a send operation required copying with a flag in serr->ee.ee_code. This field can be incorrect as of the below commit, as a result of both structs uarg and serr pointing into the same skb->cb[]. uarg->zerocopy must be read before skb->cb[] is reinitialized to hold serr. Similar to other fields len, hi and lo, use a local variable to temporarily hold the value. This was not a problem before, when the value was passed as a function argument. Fixes: 75518851a2a0 ("skbuff: Push status and refcounts into sock_zerocopy_callback") Reported-by: Talal Ahmad <talalahmad@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10Merge tag 'mlx5-fixes-2021-06-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2021-06-09 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-10Merge branch 'for-5.13-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "This is a high priority but low risk fix for a cgroup1 bug where rename(2) can change a cgroup's name to something which can break parsing of /proc/PID/cgroup" * 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup1: don't allow '\n' in renaming
2021-06-10usb: typec: mux: Fix copy-paste mistake in typec_mux_matchBjorn Andersson
Fix the copy-paste mistake in the return path of typec_mux_match(), where dev is considered a member of struct typec_switch rather than struct typec_mux. The two structs are identical in regards to having the struct device as the first entry, so this provides no functional change. Fixes: 3370db35193b ("usb: typec: Registering real device entries for the muxes") Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210610002132.3088083-1-bjorn.andersson@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10usb: typec: ucsi: Clear PPM capability data in ucsi_init() error pathMayank Rana
If ucsi_init() fails for some reason (e.g. ucsi_register_port() fails or general communication failure to the PPM), particularly at any point after the GET_CAPABILITY command had been issued, this results in unwinding the initialization and returning an error. However the ucsi structure's ucsi_capability member retains its current value, including likely a non-zero num_connectors. And because ucsi_init() itself is done in a workqueue a UCSI interface driver will be unaware that it failed and may think the ucsi_register() call was completely successful. Later, if ucsi_unregister() is called, due to this stale ucsi->cap value it would try to access the items in the ucsi->connector array which might not be in a proper state or not even allocated at all and results in NULL or invalid pointer dereference. Fix this by clearing the ucsi->cap value to 0 during the error path of ucsi_init() in order to prevent a later ucsi_unregister() from entering the connector cleanup loop. Fixes: c1b0bc2dabfa ("usb: typec: Add support for UCSI interface") Cc: stable@vger.kernel.org Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Mayank Rana <mrana@codeaurora.org> Signed-off-by: Jack Pham <jackp@codeaurora.org> Link: https://lore.kernel.org/r/20210609073535.5094-1-jackp@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10usb: gadget: fsl: Re-enable driver for ARM SoCsJoel Stanley
The commit a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver") dropped the ARCH_MXC dependency from USB_FSL_USB2, leaving it depending solely on FSL_SOC. FSL_SOC is powerpc only; it was briefly available on ARM in 2014 but was removed by commit cfd074ad8600 ("ARM: imx: temporarily remove CONFIG_SOC_FSL from LS1021A"). Therefore the driver can no longer be enabled on ARM platforms. This appears to be a mistake as arm64's ARCH_LAYERSCAPE and arm32 SOC_LS1021A SoCs use this symbol. It's enabled in these defconfigs: arch/arm/configs/imx_v6_v7_defconfig:CONFIG_USB_FSL_USB2=y arch/arm/configs/multi_v7_defconfig:CONFIG_USB_FSL_USB2=y arch/powerpc/configs/mgcoge_defconfig:CONFIG_USB_FSL_USB2=y arch/powerpc/configs/mpc512x_defconfig:CONFIG_USB_FSL_USB2=y To fix, expand the dependencies so USB_FSL_USB2 can be enabled on the ARM platforms, and with COMPILE_TEST. Fixes: a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver") Signed-off-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/20210610034957.93376-1-joel@jms.id.au Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10usb: typec: wcove: Use LE to CPU conversion when accessing msg->headerAndy Shevchenko
As LKP noticed the Sparse is not happy about strict type handling: .../typec/tcpm/wcove.c:380:50: sparse: expected unsigned short [usertype] header .../typec/tcpm/wcove.c:380:50: sparse: got restricted __le16 const [usertype] header Fix this by switching to use pd_header_cnt_le() instead of pd_header_cnt() in the affected code. Fixes: ae8a2ca8a221 ("usb: typec: Group all TCPCI/TCPM code together") Fixes: 3c4fb9f16921 ("usb: typec: wcove: start using tcpm for USB PD support") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210609172202.83377-1-andriy.shevchenko@linux.intel.com Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-06-10Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "A mixture of small bug fixes and a small security issue: - WARN_ON when IPoIB is automatically moved between namespaces - Long standing bug where mlx5 would use the wrong page for the doorbell recovery memory if fork is used - Security fix for mlx4 that disables the timestamp feature - Several crashers for mlx5 - Plug a recent mlx5 memory leak for the sig_mr" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/mlx5: Fix initializing CQ fragments buffer RDMA/mlx5: Delete right entry from MR signature database RDMA: Verify port when creating flow rule RDMA/mlx5: Block FDB rules when not in switchdev mode RDMA/mlx4: Do not map the core_clock page to user space unless enabled RDMA/mlx5: Use different doorbell memory for different processes RDMA/ipoib: Fix warning caused by destroying non-initial netns
2021-06-10irqchip/gic-v3: Workaround inconsistent PMR setting on NMI entryMarc Zyngier
The arm64 entry code suffers from an annoying issue on taking a NMI, as it sets PMR to a value that actually allows IRQs to be acknowledged. This is done for consistency with other parts of the code, and is in the process of being fixed. This shouldn't be a problem, as we are not enabling interrupts whilst in NMI context. However, in the infortunate scenario that we took a spurious NMI (retired before the read of IAR) *and* that there is an IRQ pending at the same time, we'll ack the IRQ in NMI context. Too bad. In order to avoid deadlocks while running something like perf, teach the GICv3 driver about this situation: if we were in a context where no interrupt should have fired, transiently set PMR to a value that only allows NMIs before acking the pending interrupt, and restore the original value after that. This papers over the core issue for the time being, and makes NMIs great again. Sort of. Fixes: 4d6a38da8e79e94c ("arm64: entry: always set GIC_PRIO_PSR_I_SET during entry") Co-developed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/lkml/20210610145731.1350460-1-maz@kernel.org
2021-06-10hwmon: (tps23861) correct shunt LSB valuesRobert Marko
Current shunt LSB values got reversed during in the original driver commit. So, correct the current shunt LSB values according to the datasheet. This caused reading slightly skewed current values. Fixes: fff7b8ab2255 ("hwmon: add Texas Instruments TPS23861 driver") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Link: https://lore.kernel.org/r/20210609220728.499879-3-robert.marko@sartura.hr Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-10hwmon: (tps23861) set current shunt valueRobert Marko
TPS23861 has a configuration bit for setting of the current shunt value used on the board. Its bit 0 of the General Mask 1 register. According to the datasheet bit values are: 0 for 255 mOhm (Default) 1 for 250 mOhm So, configure the bit before registering the hwmon device according to the value passed in the DTS or default one if none is passed. This caused potentially reading slightly skewed values due to max current value being 1.02A when 250mOhm shunt is used instead of 1.0A when 255mOhm is used. Fixes: fff7b8ab2255 ("hwmon: add Texas Instruments TPS23861 driver") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Link: https://lore.kernel.org/r/20210609220728.499879-2-robert.marko@sartura.hr Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-10hwmon: (tps23861) define regmap max registerRobert Marko
Define the max register address the device supports. This allows reading the whole register space via regmap debugfs, without it only register 0x0 is visible. This was forgotten in the original driver commit. Fixes: fff7b8ab2255 ("hwmon: add Texas Instruments TPS23861 driver") Signed-off-by: Robert Marko <robert.marko@sartura.hr> Link: https://lore.kernel.org/r/20210609220728.499879-1-robert.marko@sartura.hr Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2021-06-10ALSA: seq: Fix race of snd_seq_timer_open()Takashi Iwai
The timer instance per queue is exclusive, and snd_seq_timer_open() should have managed the concurrent accesses. It looks as if it's checking the already existing timer instance at the beginning, but it's not right, because there is no protection, hence any later concurrent call of snd_seq_timer_open() may override the timer instance easily. This may result in UAF, as the leftover timer instance can keep running while the queue itself gets closed, as spotted by syzkaller recently. For avoiding the race, add a proper check at the assignment of tmr->timeri again, and return -EBUSY if it's been already registered. Reported-by: syzbot+ddc1260a83ed1cbf6fb5@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/000000000000dce34f05c42f110c@google.com Link: https://lore.kernel.org/r/20210610152059.24633-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-06-10USB: serial: cp210x: fix CP2102N-A01 modem controlJohan Hovold
CP2102N revision A01 (firmware version <= 1.0.4) has a buggy flow-control implementation that uses the ulXonLimit instead of ulFlowReplace field of the flow-control settings structure (erratum CP2102N_E104). A recent change that set the input software flow-control limits incidentally broke RTS control for these devices when CRTSCTS is not set as the new limits would always enable hardware flow control. Fix this by explicitly disabling flow control for the buggy firmware versions and only updating the input software flow-control limits when IXOFF is requested. This makes sure that the terminal settings matches the default zero ulXonLimit (ulFlowReplace) for these devices. Link: https://lore.kernel.org/r/20210609161509.9459-1-johan@kernel.org Reported-by: David Frey <dpfrey@gmail.com> Reported-by: Alex Villacís Lasso <a_villacis@palosanto.com> Tested-by: Alex Villacís Lasso <a_villacis@palosanto.com> Fixes: f61309d9c96a ("USB: serial: cp210x: set IXOFF thresholds") Cc: stable@vger.kernel.org # 5.12 Signed-off-by: Johan Hovold <johan@kernel.org>
2021-06-10drm/msm/dsi: Stash away calculated vco frequency on recalcStephen Boyd
A problem was reported on CoachZ devices where the display wouldn't come up, or it would be distorted. It turns out that the PLL code here wasn't getting called once dsi_pll_10nm_vco_recalc_rate() started returning the same exact frequency, down to the Hz, that the bootloader was setting instead of 0 when the clk was registered with the clk framework. After commit 001d8dc33875 ("drm/msm/dsi: remove temp data from global pll structure") we use a hardcoded value for the parent clk frequency, i.e. VCO_REF_CLK_RATE, and we also hardcode the value for FRAC_BITS, instead of getting it from the config structure. This combination of changes to the recalc function allows us to properly calculate the frequency of the PLL regardless of whether or not the PLL has been clk_prepare()d or clk_set_rate()d. That's a good improvement. Unfortunately, this means that now we won't call down into the PLL clk driver when we call clk_set_rate() because the frequency calculated in the framework matches the frequency that is set in hardware. If the rate is the same as what we want it should be OK to not call the set_rate PLL op. The real problem is that the prepare op in this driver uses a private struct member to stash away the vco frequency so that it can call the set_rate op directly during prepare. Once the set_rate op is never called because recalc_rate told us the rate is the same, we don't set this private struct member before the prepare op runs, so we try to call the set_rate function directly with a frequency of 0. This effectively kills the PLL and configures it for a rate that won't work. Calling set_rate from prepare is really quite bad and will confuse any downstream clks about what the rate actually is of their parent. Fixing that will be a rather large change though so we leave that to later. For now, let's stash away the rate we calculate during recalc so that the prepare op knows what frequency to set, instead of 0. This way things keep working and the display can enable the PLL properly. In the future, we should remove that code from the prepare op so that it doesn't even try to call the set rate function. Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Abhinav Kumar <abhinavk@codeaurora.org> Fixes: 001d8dc33875 ("drm/msm/dsi: remove temp data from global pll structure") Signed-off-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/20210608195519.125561-1-swboyd@chromium.org Signed-off-by: Rob Clark <robdclark@chromium.org>
2021-06-10cgroup1: don't allow '\n' in renamingAlexander Kuznetsov
cgroup_mkdir() have restriction on newline usage in names: $ mkdir $'/sys/fs/cgroup/cpu/test\ntest2' mkdir: cannot create directory '/sys/fs/cgroup/cpu/test\ntest2': Invalid argument But in cgroup1_rename() such check is missed. This allows us to make /proc/<pid>/cgroup unparsable: $ mkdir /sys/fs/cgroup/cpu/test $ mv /sys/fs/cgroup/cpu/test $'/sys/fs/cgroup/cpu/test\ntest2' $ echo $$ > $'/sys/fs/cgroup/cpu/test\ntest2' $ cat /proc/self/cgroup 11:pids:/ 10:freezer:/ 9:hugetlb:/ 8:cpuset:/ 7:blkio:/user.slice 6:memory:/user.slice 5:net_cls,net_prio:/ 4:perf_event:/ 3:devices:/user.slice 2:cpu,cpuacct:/test test2 1:name=systemd:/ 0::/ Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru> Reported-by: Andrey Krasichkov <buglloc@yandex-team.ru> Acked-by: Dmitry Yakunin <zeil@yandex-team.ru> Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org>
2021-06-10KVM: x86: Immediately reset the MMU context when the SMM flag is clearedSean Christopherson
Immediately reset the MMU context when the vCPU's SMM flag is cleared so that the SMM flag in the MMU role is always synchronized with the vCPU's flag. If RSM fails (which isn't correctly emulated), KVM will bail without calling post_leave_smm() and leave the MMU in a bad state. The bad MMU role can lead to a NULL pointer dereference when grabbing a shadow page's rmap for a page fault as the initial lookups for the gfn will happen with the vCPU's SMM flag (=0), whereas the rmap lookup will use the shadow page's SMM flag, which comes from the MMU (=1). SMM has an entirely different set of memslots, and so the initial lookup can find a memslot (SMM=0) and then explode on the rmap memslot lookup (SMM=1). general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 8410 Comm: syz-executor382 Not tainted 5.13.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__gfn_to_rmap arch/x86/kvm/mmu/mmu.c:935 [inline] RIP: 0010:gfn_to_rmap+0x2b0/0x4d0 arch/x86/kvm/mmu/mmu.c:947 Code: <42> 80 3c 20 00 74 08 4c 89 ff e8 f1 79 a9 00 4c 89 fb 4d 8b 37 44 RSP: 0018:ffffc90000ffef98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888015b9f414 RCX: ffff888019669c40 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 RBP: 0000000000000001 R08: ffffffff811d9cdb R09: ffffed10065a6002 R10: ffffed10065a6002 R11: 0000000000000000 R12: dffffc0000000000 R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000 FS: 000000000124b300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000028e31000 CR4: 00000000001526e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rmap_add arch/x86/kvm/mmu/mmu.c:965 [inline] mmu_set_spte+0x862/0xe60 arch/x86/kvm/mmu/mmu.c:2604 __direct_map arch/x86/kvm/mmu/mmu.c:2862 [inline] direct_page_fault+0x1f74/0x2b70 arch/x86/kvm/mmu/mmu.c:3769 kvm_mmu_do_page_fault arch/x86/kvm/mmu.h:124 [inline] kvm_mmu_page_fault+0x199/0x1440 arch/x86/kvm/mmu/mmu.c:5065 vmx_handle_exit+0x26/0x160 arch/x86/kvm/vmx/vmx.c:6122 vcpu_enter_guest+0x3bdd/0x9630 arch/x86/kvm/x86.c:9428 vcpu_run+0x416/0xc20 arch/x86/kvm/x86.c:9494 kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9722 kvm_vcpu_ioctl+0x70f/0xbb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3460 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:1069 [inline] __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x440ce9 Cc: stable@vger.kernel.org Reported-by: syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com Fixes: 9ec19493fb86 ("KVM: x86: clear SMM flags before loading state while leaving SMM") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-10IB/mlx5: Fix initializing CQ fragments bufferAlaa Hleihel
The function init_cq_frag_buf() can be called to initialize the current CQ fragments buffer cq->buf, or the temporary cq->resize_buf that is filled during CQ resize operation. However, the offending commit started to use function get_cqe() for getting the CQEs, the issue with this change is that get_cqe() always returns CQEs from cq->buf, which leads us to initialize the wrong buffer, and in case of enlarging the CQ we try to access elements beyond the size of the current cq->buf and eventually hit a kernel panic. [exception RIP: init_cq_frag_buf+103] [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib] [ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core] [ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt] [ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt] [ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt] [ffff9f799ddcbec8] kthread at ffffffffa66c5da1 [ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd Fix it by getting the needed CQE by calling mlx5_frag_buf_get_wqe() that takes the correct source buffer as a parameter. Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)") Link: https://lore.kernel.org/r/90a0e8c924093cfa50a482880ad7e7edb73dc19a.1623309971.git.leonro@nvidia.com Signed-off-by: Alaa Hleihel <alaa@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10RDMA/mlx5: Delete right entry from MR signature databaseAharon Landau
The value mr->sig is stored in the entry upon mr allocation, however, ibmr is wrongly entered here as "old", therefore, xa_cmpxchg() does not replace the entry with NULL, which leads to the following trace: WARNING: CPU: 28 PID: 2078 at drivers/infiniband/hw/mlx5/main.c:3643 mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib] Modules linked in: nvme_rdma nvme_fabrics nvme_core 8021q garp mrp bonding bridge stp llc rfkill rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_tad CPU: 28 PID: 2078 Comm: reboot Tainted: G X --------- --- 5.13.0-0.rc2.19.el9.x86_64 #1 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 2.9.1 12/07/2018 RIP: 0010:mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib] Code: 8d bb 70 1f 00 00 be 00 01 00 00 e8 9d 94 ce da 48 3d 00 01 00 00 75 02 5b c3 0f 0b 5b c3 0f 0b 48 83 bb b0 20 00 00 00 74 d5 <0f> 0b eb d1 4 RSP: 0018:ffffa8db06d33c90 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff97f890a44000 RCX: ffff97f900ec0160 RDX: 0000000000000000 RSI: 0000000080080001 RDI: ffff97f890a44000 RBP: ffffffffc0c189b8 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000300 R12: ffff97f890a44000 R13: ffffffffc0c36030 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f0d5a8a3b40(0000) GS:ffff98077fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555acbf4f450 CR3: 00000002a6f56002 CR4: 00000000001706e0 Call Trace: mlx5r_remove+0x39/0x60 [mlx5_ib] auxiliary_bus_remove+0x1b/0x30 __device_release_driver+0x17a/0x230 device_release_driver+0x24/0x30 bus_remove_device+0xdb/0x140 device_del+0x18b/0x3e0 mlx5_detach_device+0x59/0x90 [mlx5_core] mlx5_unload_one+0x22/0x60 [mlx5_core] shutdown+0x31/0x3a [mlx5_core] pci_device_shutdown+0x34/0x60 device_shutdown+0x15b/0x1c0 __do_sys_reboot.cold+0x2f/0x5b ? vfs_writev+0xc7/0x140 ? handle_mm_fault+0xc5/0x290 ? do_writev+0x6b/0x110 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") Link: https://lore.kernel.org/r/f3f585ea0db59c2a78f94f65eedeafc5a2374993.1623309971.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10RDMA: Verify port when creating flow ruleMaor Gottlieb
Validate port value provided by the user and with that remove no longer needed validation by the driver. The missing check in the mlx5_ib driver could cause to the below oops. Call trace: _create_flow_rule+0x2d4/0xf28 [mlx5_ib] mlx5_ib_create_flow+0x2d0/0x5b0 [mlx5_ib] ib_uverbs_ex_create_flow+0x4cc/0x624 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd4/0x150 [ib_uverbs] ib_uverbs_cmd_verbs.isra.7+0xb28/0xc50 [ib_uverbs] ib_uverbs_ioctl+0x158/0x1d0 [ib_uverbs] do_vfs_ioctl+0xd0/0xaf0 ksys_ioctl+0x84/0xb4 __arm64_sys_ioctl+0x28/0xc4 el0_svc_common.constprop.3+0xa4/0x254 el0_svc_handler+0x84/0xa0 el0_svc+0x10/0x26c Code: b9401260 f9615681 51000400 8b001c20 (f9403c1a) Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs") Link: https://lore.kernel.org/r/faad30dc5219a01727f47db3dc2f029d07c82c00.1623309971.git.leonro@nvidia.com Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-10KVM: x86: Fix fall-through warnings for ClangGustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple of warnings by explicitly adding break statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Message-Id: <20210528200756.GA39320@embeddedor> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-10KVM: SVM: fix doc warningsChenXiaoSong
Fix kernel-doc warnings: arch/x86/kvm/svm/avic.c:233: warning: Function parameter or member 'activate' not described in 'avic_update_access_page' arch/x86/kvm/svm/avic.c:233: warning: Function parameter or member 'kvm' not described in 'avic_update_access_page' arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'e' not described in 'get_pi_vcpu_info' arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'kvm' not described in 'get_pi_vcpu_info' arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'svm' not described in 'get_pi_vcpu_info' arch/x86/kvm/svm/avic.c:781: warning: Function parameter or member 'vcpu_info' not described in 'get_pi_vcpu_info' arch/x86/kvm/svm/avic.c:1009: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Message-Id: <20210609122217.2967131-1-chenxiaosong2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-10KVM: selftests: Fix compiling errors when initializing the static structureYanan Wang
Errors like below were produced from test_util.c when compiling the KVM selftests on my local platform. lib/test_util.c: In function 'vm_mem_backing_src_alias': lib/test_util.c:177:12: error: initializer element is not constant .flag = anon_flags, ^~~~~~~~~~ lib/test_util.c:177:12: note: (near initialization for 'aliases[0].flag') The reason is that we are using non-const expressions to initialize the static structure, which will probably trigger a compiling error/warning on stricter GCC versions. Fix it by converting the two const variables "anon_flags" and "anon_huge_flags" into more stable macros. Fixes: b3784bc28ccc0 ("KVM: selftests: refactor vm_mem_backing_src_type flags") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Yanan Wang <wangyanan55@huawei.com> Message-Id: <20210610085418.35544-1-wangyanan55@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>