linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2022-03-29	ax25: fix UAF bug in ax25_send_control()	Duoming Zhou
	There are UAF bugs in ax25_send_control(), when we call ax25_release() to deallocate ax25_dev. The possible race condition is shown below: (Thread 1) \| (Thread 2) ax25_dev_device_up() //(1) \| \| ax25_kill_by_device() ax25_bind() //(2) \| ax25_connect() \| ... ax25->state = AX25_STATE_1 \| ... \| ax25_dev_device_down() //(3) (Thread 3) ax25_release() \| ax25_dev_put() //(4) FREE \| case AX25_STATE_1: \| ax25_send_control() \| alloc_skb() //USE \| The refcount of ax25_dev increases in position (1) and (2), and decreases in position (3) and (4). The ax25_dev will be freed before dereference sites in ax25_send_control(). The following is part of the report: [ 102.297448] BUG: KASAN: use-after-free in ax25_send_control+0x33/0x210 [ 102.297448] Read of size 8 at addr ffff888009e6e408 by task ax25_close/602 [ 102.297448] Call Trace: [ 102.303751] ax25_send_control+0x33/0x210 [ 102.303751] ax25_release+0x356/0x450 [ 102.305431] __sock_release+0x6d/0x120 [ 102.305431] sock_close+0xf/0x20 [ 102.305431] __fput+0x11f/0x420 [ 102.305431] task_work_run+0x86/0xd0 [ 102.307130] get_signal+0x1075/0x1220 [ 102.308253] arch_do_signal_or_restart+0x1df/0xc00 [ 102.308253] exit_to_user_mode_prepare+0x150/0x1e0 [ 102.308253] syscall_exit_to_user_mode+0x19/0x50 [ 102.308253] do_syscall_64+0x48/0x90 [ 102.308253] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 102.308253] RIP: 0033:0x405ae7 This patch defers the free operation of ax25_dev and net_device after all corresponding dereference sites in ax25_release() to avoid UAF. Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29	openvswitch: Fixed nd target mask field in the flow dump.	Martin Varghese
	IPv6 nd target mask was not getting populated in flow dump. In the function __ovs_nla_put_key the icmp code mask field was checked instead of icmp code key field to classify the flow as neighbour discovery. ufid:bdfbe3e5-60c2-43b0-a5ff-dfcac1c37328, recirc_id(0),dp_hash(0/0), skb_priority(0/0),in_port(ovs-nm1),skb_mark(0/0),ct_state(0/0), ct_zone(0/0),ct_mark(0/0),ct_label(0/0), eth(src=00:00:00:00:00:00/00:00:00:00:00:00, dst=00:00:00:00:00:00/00:00:00:00:00:00), eth_type(0x86dd), ipv6(src=::/::,dst=::/::,label=0/0,proto=58,tclass=0/0,hlimit=0/0,frag=no), icmpv6(type=135,code=0), nd(target=2001::2/::, sll=00:00:00:00:00:00/00:00:00:00:00:00, tll=00:00:00:00:00:00/00:00:00:00:00:00), packets:10, bytes:860, used:0.504s, dp:ovs, actions:ovs-nm2 Fixes: e64457191a25 (openvswitch: Restructure datapath.c and flow.c) Signed-off-by: Martin Varghese <martin.varghese@nokia.com> Link: https://lore.kernel.org/r/20220328054148.3057-1-martinvarghesenokia@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-29	nvme-multipath: fix hang when disk goes live over reconnect	Anton Eidelman
	nvme_mpath_init_identify() invoked from nvme_init_identify() fetches a fresh ANA log from the ctrl. This is essential to have an up to date path states for both existing namespaces and for those scan_work may discover once the ctrl is up. This happens in the following cases: 1) A new ctrl is being connected. 2) An existing ctrl is successfully reconnected. 3) An existing ctrl is being reset. While in (1) ctrl->namespaces is empty, (2 & 3) may have namespaces, and nvme_read_ana_log() may call nvme_update_ns_ana_state(). This result in a hang when the ANA state of an existing namespace changes and makes the disk live: nvme_mpath_set_live() issues IO to the namespace through the ctrl, which does NOT have IO queues yet. See sample hang below. Solution: - nvme_update_ns_ana_state() to call set_live only if ctrl is live - nvme_read_ana_log() call from nvme_mpath_init_identify() therefore only fetches and parses the ANA log; any erros in this process will fail the ctrl setup as appropriate; - a separate function nvme_mpath_update() is called in nvme_start_ctrl(); this parses the ANA log without fetching it. At this point the ctrl is live, therefore, disks can be set live normally. Sample failure: nvme nvme0: starting error recovery nvme nvme0: Reconnecting in 10 seconds... block nvme0n6: no usable path - requeuing I/O INFO: task kworker/u8:3:312 blocked for more than 122 seconds. Tainted: G E 5.14.5-1.el7.elrepo.x86_64 #1 Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp] Call Trace: __schedule+0x2a2/0x7e0 schedule+0x4e/0xb0 io_schedule+0x16/0x40 wait_on_page_bit_common+0x15c/0x3e0 do_read_cache_page+0x1e0/0x410 read_cache_page+0x12/0x20 read_part_sector+0x46/0x100 read_lba+0x121/0x240 efi_partition+0x1d2/0x6a0 bdev_disk_changed.part.0+0x1df/0x430 bdev_disk_changed+0x18/0x20 blkdev_get_whole+0x77/0xe0 blkdev_get_by_dev+0xd2/0x3a0 __device_add_disk+0x1ed/0x310 device_add_disk+0x13/0x20 nvme_mpath_set_live+0x138/0x1b0 [nvme_core] nvme_update_ns_ana_state+0x2b/0x30 [nvme_core] nvme_update_ana_state+0xca/0xe0 [nvme_core] nvme_parse_ana_log+0xac/0x170 [nvme_core] nvme_read_ana_log+0x7d/0xe0 [nvme_core] nvme_mpath_init_identify+0x105/0x150 [nvme_core] nvme_init_identify+0x2df/0x4d0 [nvme_core] nvme_init_ctrl_finish+0x8d/0x3b0 [nvme_core] nvme_tcp_setup_ctrl+0x337/0x390 [nvme_tcp] nvme_tcp_reconnect_ctrl_work+0x24/0x40 [nvme_tcp] process_one_work+0x1bd/0x360 worker_thread+0x50/0x3d0 Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29	nvme: fix RCU hole that allowed for endless looping in multipath round robin	Chris Leech
	Make nvme_ns_remove match the assumptions elsewhere. 1) !NVME_NS_READY needs to be srcu synchronized to make sure nothing is running in __nvme_find_path or nvme_round_robin_path that will re-assign this ns to current_path. 2) Any matching current_path entries need to be cleared before removing from the siblings list, to prevent calling nvme_round_robin_path with an "old" ns that's off list. 3) Finally the list_del_rcu can happen, and then synchronize again before releasing any reference counts. Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29	nvme: allow duplicate NSIDs for private namespaces	Sungup Moon
	A NVMe subsystem with multiple controller can have private namespaces that use the same NSID under some conditions: "If Namespace Management, ANA Reporting, or NVM Sets are supported, the NSIDs shall be unique within the NVM subsystem. If the Namespace Management, ANA Reporting, and NVM Sets are not supported, then NSIDs: a) for shared namespace shall be unique; and b) for private namespace are not required to be unique." Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec. Make sure this specific setup is supported in Linux. Fixes: 9ad1927a3bc2 ("nvme: always search for namespace head") Signed-off-by: Sungup Moon <sungup.moon@samsung.com> [hch: refactored and fixed the controller vs subsystem based naming conflict] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2022-03-29	nvmet: remove redundant assignment after left shift	Colin Ian King
	The left shift is followed by a re-assignment back to cc_css, the assignment is redundant. Fix this by replacing the "<<=" operator with "<<" instead. This cleans up the clang scan build warning: drivers/nvme/target/core.c:1124:10: warning: Although the value stored to 'cc_css' is used in the enclosing expression, the value is never actually read from 'cc_css' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29	nvmet: use a private workqueue instead of the system workqueue	Sagi Grimberg
	Any attempt to flush kernel-global WQs has possibility of deadlock so we should simply stop using them, instead introduce nvmet_wq which is the generic nvmet workqueue for work elements that don't explicitly require a dedicated workqueue (by the mere fact that they are using the system_wq). Changes were done using the following replaces: - s/schedule_work(/queue_work(nvmet_wq, /g - s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g - s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-28	selftests/bpf: Fix clang compilation errors	Yonghong Song
	llvm upstream patch ([1]) added to issue warning for code like void test() { int j = 0; for (int i = 0; i < 1000; i++) j++; return; } This triggered several errors in selftests/bpf build since compilation flag -Werror is used. ... test_lpm_map.c:212:15: error: variable 'n_matches' set but not used [-Werror,-Wunused-but-set-variable] size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups; ^ test_lpm_map.c:212:26: error: variable 'n_matches_after_delete' set but not used [-Werror,-Wunused-but-set-variable] size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups; ^ ... prog_tests/get_stack_raw_tp.c:32:15: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable] static __u64 cnt; ^ ... For test_lpm_map.c, 'n_matches'/'n_matches_after_delete' are changed to be volatile in order to silent the warning. I didn't remove these two declarations since they are referenced in a commented code which might be used by people in certain cases. For get_stack_raw_tp.c, the variable 'cnt' is removed. [1] https://reviews.llvm.org/D122271 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220325200304.2915588-1-yhs@fb.com
2022-03-28	Merge branch 'xsk: another round of fixes'	Alexei Starovoitov
	Maciej Fijalkowski says: ==================== Hello, yet another fixes for XSK from Magnus and me. Magnus addresses the fact that xp_alloc() can return NULL, so this needs to be handled to avoid clearing entries in the SW ring on driver side. Then he addresses the off-by-one problem in Tx desc cleaning routine for ice ZC driver. From my side, I am adding protection to ZC Rx processing loop so that cleaning of descriptors wouldn't go over already processed entries. Then I also fix an issue with assigning XSK pool to Tx queues. This is directed to bpf tree. Thanks! Maciej Fijalkowski (2): ice: xsk: stop Rx processing when ntc catches ntu ice: xsk: fix indexing in ice_tx_xsk_pool() ==================== Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-03-28	ice: xsk: Fix indexing in ice_tx_xsk_pool()	Maciej Fijalkowski
	Ice driver tries to always create XDP rings array to be num_possible_cpus() sized, regardless of user's queue count setting that can be changed via ethtool -L for example. Currently, ice_tx_xsk_pool() calculates the qid by decrementing the ring->q_index by the count of XDP queues, but ring->q_index is set to 'i + vsi->alloc_txq'. When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool() will do OOB access and in the final result ring would not get xsk_pool pointer assigned. Then, each ice_xsk_wakeup() call will fail with error and it will not be possible to get into NAPI and do the processing from driver side. Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the setting of ring->q_index. Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
2022-03-28	ice: xsk: Stop Rx processing when ntc catches ntu	Maciej Fijalkowski
	This can happen with big budget values and some breakage of re-filling descriptors as we do not clear the entry that ntu is pointing at the end of ice_alloc_rx_bufs_zc. So if ntc is at ntu then it might be the case that status_error0 has an old, uncleared value and ntc would go over with processing which would result in false results. Break Rx loop when ntc == ntu to avoid broken behavior. Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-4-maciej.fijalkowski@intel.com
2022-03-28	ice: xsk: Eliminate unnecessary loop iteration	Magnus Karlsson
	The NIC Tx ring completion routine cleans entries from the ring in batches. However, it processes one more batch than it is supposed to. Note that this does not matter from a functionality point of view since it will not find a set DD bit for the next batch and just exit the loop. But from a performance perspective, it is faster to terminate the loop before and not issue an expensive read over PCIe to get the DD bit. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-3-maciej.fijalkowski@intel.com
2022-03-28	xsk: Do not write NULL in SW ring at allocation failure	Magnus Karlsson
	For the case when xp_alloc_batch() is used but the batched allocation cannot be used, there is a slow path that uses the non-batched xp_alloc(). When it fails to allocate an entry, it returns NULL. The current code wrote this NULL into the entry of the provided results array (pointer to the driver SW ring usually) and returned. This might not be what the driver expects and to make things simpler, just write successfully allocated xdp_buffs into the SW ring,. The driver might have information in there that is still important after an allocation failure. Note that at this point in time, there are no drivers using xp_alloc_batch() that could trigger this slow path. But one might get added. Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220328142123.170157-2-maciej.fijalkowski@intel.com
2022-03-28	Merge branch 'kprobes: rethook: x86: Replace kretprobe trampoline with rethook'	Alexei Starovoitov
	Masami Hiramatsu says: ==================== Here are the 3rd version for generic kretprobe and kretprobe on x86 for replacing the kretprobe trampoline with rethook. The previous version is here[1] [1] https://lore.kernel.org/all/164821817332.2373735.12048266953420821089.stgit@devnote2/T/#u This version fixed typo and build issues for bpf-next and CONFIG_RETHOOK=y error. I also add temporary mitigation lines for ANNOTATE_NOENDBR macro issue for bpf-next tree [2/4]. This will be removed after merging kernel IBT series. Background: This rethook came from Jiri's request of multiple kprobe for bpf[2]. He tried to solve an issue that starting bpf with multiple kprobe will take a long time because bpf-kprobe will wait for RCU grace period for sync rcu events. Jiri wanted to attach a single bpf handler to multiple kprobes and he tried to introduce multiple-probe interface to kprobe. So I asked him to use ftrace and kretprobe-like hook if it is only for the function entry and exit, instead of adding ad-hoc interface to kprobes. For this purpose, I introduced the fprobe (kprobe like interface for ftrace) with the rethook (this is a generic return hook feature for fprobe exit handler)[3]. [2] https://lore.kernel.org/all/20220104080943.113249-1-jolsa@kernel.org/T/#u [3] https://lore.kernel.org/all/164191321766.806991.7930388561276940676.stgit@devnote2/T/#u The rethook is basically same as the kretprobe trampoline. I just made it decoupled from kprobes. Eventually, the all arch dependent kretprobe trampolines will be replaced with the rethook trampoline instead of cloning and set HAVE_RETHOOK=y. When I port the rethook for all arch which supports kretprobe, the legacy kretprobe specific code (which is for CONFIG_KRETPROBE_ON_RETHOOK=n) will be removed eventually. ==================== Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-03-28	x86,kprobes: Fix optprobe trampoline to generate complete pt_regs	Masami Hiramatsu
	Currently the optprobe trampoline template code ganerate an almost complete pt_regs on-stack, everything except regs->ss. The 'regs->ss' points to the top of stack, which is not a valid segment decriptor. As same as the rethook does, complete the job by also pushing ss. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164826166027.2455864.14759128090648961900.stgit@devnote2
2022-03-28	x86,rethook: Fix arch_rethook_trampoline() to generate a complete pt_regs	Peter Zijlstra
	Currently arch_rethook_trampoline() generates an almost complete pt_regs on-stack, everything except regs->ss that is, that currently points to the fake return address, which is not a valid segment descriptor. Since interpretation of regs->[sb]p should be done in the context of regs->ss, and we have code actually doing that (see arch/x86/lib/insn-eval.c for instance), complete the job by also pushing ss. This ensures that anybody who does do look at regs->ss doesn't mysteriously malfunction, avoiding much future pain. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/bpf/164826164851.2455864.17272661073069737350.stgit@devnote2
2022-03-28	x86,rethook,kprobes: Replace kretprobe with rethook on x86	Masami Hiramatsu
	Replaces the kretprobe code with rethook on x86. With this patch, kretprobe on x86 uses the rethook instead of kretprobe specific trampoline code. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/164826163692.2455864.13745421016848209527.stgit@devnote2
2022-03-28	kprobes: Use rethook for kretprobe if possible	Masami Hiramatsu
	Use rethook for kretprobe function return hooking if the arch sets CONFIG_HAVE_RETHOOK=y. In this case, CONFIG_KRETPROBE_ON_RETHOOK is set to 'y' automatically, and the kretprobe internal data fields switches to use rethook. If not, it continues to use kretprobe specific function return hooks. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164826162556.2455864.12255833167233452047.stgit@devnote2
2022-03-28	bpftool: Fix generated code in codegen_asserts	Jiri Olsa
	Arnaldo reported perf compilation fail with: $ make -k BUILD_BPF_SKEL=1 CORESIGHT=1 PYTHON=python3 ... In file included from util/bpf_counter.c:28: /tmp/build/perf//util/bpf_skel/bperf_leader.skel.h: In function ‘bperf_leader_bpf__assert’: /tmp/build/perf//util/bpf_skel/bperf_leader.skel.h:351:51: error: unused parameter ‘s’ [-Werror=unused-parameter] 351 \| bperf_leader_bpf__assert(struct bperf_leader_bpf *s) \| ~~~~~~~~~~~~~~~~~~~~~~~~~^ cc1: all warnings being treated as errors If there's nothing to generate in the new assert function, we will get unused 's' warn/error, adding 'unused' attribute to it. Fixes: 08d4dba6ae77 ("bpftool: Bpf skeletons assert type sizes") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/bpf/20220328083703.2880079-1-jolsa@kernel.org
2022-03-28	selftests/bpf: fix selftest after random: Urandom_read tracepoint removal	Andrii Nakryiko
	14c174633f34 ("random: remove unused tracepoints") removed all the tracepoints from drivers/char/random.c, one of which, random:urandom_read, was used by stacktrace_build_id selftest to trigger stack trace capture. Fix breakage by switching to kprobing urandom_read() function. Suggested-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220325225643.2606-1-andrii@kernel.org
2022-03-28	bpf: Fix maximum permitted number of arguments check	Yuntao Wang
	Since the m->arg_size array can hold up to MAX_BPF_FUNC_ARGS argument sizes, it's ok that nargs is equal to MAX_BPF_FUNC_ARGS. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220324164238.1274915-1-ytcoode@gmail.com
2022-03-28	bpf: Sync comments for bpf_get_stack	Geliang Tang
	Commit ee2a098851bf missed updating the comments for helper bpf_get_stack in tools/include/uapi/linux/bpf.h. Sync it. Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0") Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/ce54617746b7ed5e9ba3b844e55e74cb8a60e0b5.1648110794.git.geliang.tang@suse.com
2022-03-28	Merge branch 'fprobe: Fixes for Sparse and Smatch warnings'	Alexei Starovoitov
	Masami Hiramatsu says: ==================== Hi, These fprobe patches are for fixing the warnings by Smatch and sparse. This is arch independent part of the fixes. Thank you, --- ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-03-28	fprobe: Fix sparse warning for acccessing __rcu ftrace_hash	Masami Hiramatsu
	Since ftrace_ops::local_hash::filter_hash field is an __rcu pointer, we have to use rcu_access_pointer() to access it. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164802093635.1732982.4938094876018890866.stgit@devnote2
2022-03-28	fprobe: Fix smatch type mismatch warning	Masami Hiramatsu
	Fix the type mismatching warning of 'rethook_node vs fprobe_rethook_node' found by Smatch. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164802092611.1732982.12268174743437084619.stgit@devnote2
2022-03-28	bpf/bpftool: Add unprivileged_bpf_disabled check against value of 2	Milan Landaverde
	In [1], we added a kconfig knob that can set /proc/sys/kernel/unprivileged_bpf_disabled to 2 We now check against this value in bpftool feature probe [1] https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net Signed-off-by: Milan Landaverde <milan@mdaverde.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Acked-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20220322145012.1315376-1-milan@mdaverde.com
2022-03-28	dt-bindings: Fix missing '/schemas' in $ref paths	Rob Herring
	Absolute paths in $ref should always begin with '/schemas'. The tools mostly work with it omitted, but for correctness the path should be everything except the hostname as that is taken from the schema's $id value. This scheme is defined in the json-schema spec. Cc: Hector Martin <marcan@marcan.st> Cc: Sven Peter <sven@svenpeter.dev> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mukesh Savaliya <msavaliy@codeaurora.org> Cc: Akash Asthana <akashast@codeaurora.org> Cc: Bayi Cheng <bayi.cheng@mediatek.com> Cc: Chuanhong Guo <gch981213@gmail.com> Cc: Min Guo <min.guo@mediatek.com> Cc: netdev@vger.kernel.org Cc: linux-spi@vger.kernel.org Cc: linux-usb@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mark Brown <broonie@debian.org> Link: https://lore.kernel.org/r/20220325215652.525383-1-robh@kernel.org
2022-03-28	dt-bindings: media: mediatek,vcodec: Fix addressing cell sizes	Rob Herring
	'dma-ranges' in the example is written for cell sizes of 2 cells, but the schema and example specify sizes of 1 cell. As the h/w has a bus address of >32-bits, cell sizes of 2 is correct. Update the schema's '#address-cells' and '#size-cells' to be 2 and adjust the example throughout. There's no error currently because dtc only checks 'dma-ranges' is a correct multiple number of cells (3) and the schema checking is based on bracketing of entries. Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220301233501.2110047-1-robh@kernel.org
2022-03-28	dt-bindings: net: snps,dwmac: modify available values of PBL	Biao Huang
	PBL can be any of the following values: 1, 2, 4, 8, 16 or 32 according to the datasheet, so modify available values of PBL in snps,dwmac.yaml. Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220324012112.7016-2-biao.huang@mediatek.com
2022-03-28	dt-bindings: display: mediatek: Fix examples on new bindings	AngeloGioacchino Del Regno
	To avoid failure of dt_binding_check perform a slight refactoring of the examples: the main block is kept, but that required fixing the address and size cells, plus the inclusion of missing dt-bindings headers, required to parse some of the values assigned to various properties. Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Tested-by: jason-jh.lin <jason-jh.lin@medaitek.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220309134702.9942-5-jason-jh.lin@mediatek.com
2022-03-28	dt-bindings: display: mediatek, ovl: Fix 'iommu' required property typo	AngeloGioacchino Del Regno
	The property is called 'iommus' and not 'iommu'. Fix this typo. Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220309134702.9942-4-jason-jh.lin@mediatek.com
2022-03-28	dt-bindings: display: mediatek, mutex: Fix mediatek, gce-events type	AngeloGioacchino Del Regno
	The mediatek,gce-events property needs as value an array of uint32 corresponding to the CMDQ events to listen to, and not any phandle. Fixes: 4ed545e7d100 ("dt-bindings: display: mediatek: disp: split each block to individual yaml") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220309134702.9942-3-jason-jh.lin@mediatek.com
2022-03-28	Revert "dt-bindings: display: mediatek: add ethdr definition for mt8195"	jason-jh.lin
	This reverts commit e7dcfe64204a5cd9a74a9ca7d9c7a22434dc7fe5. Because examples property of mediatek,ethdr.yaml should base on [1][2]. Reverting it until [1][2] are applied. [1] dt-bindings: mediatek: mt8195: Add binding for MM IOMMU https://patchwork.kernel.org/project/linux-mediatek/patch/20220217113453.13658-2-yong.wu@mediatek.com/ [2] dt-bindings: reset: mt8195: add vdosys1 reset control bit https://patchwork.kernel.org/project/linux-mediatek/patch/20220222100741.30138-5-nancy.lin@mediatek.com/ Signed-off-by: jason-jh.lin <jason-jh.lin@mediatek.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220309134702.9942-2-jason-jh.lin@mediatek.com
2022-03-28	Merge tag 'ptrace-cleanups-for-v5.18' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace cleanups from Eric Biederman: "This set of changes removes tracehook.h, moves modification of all of the ptrace fields inside of siglock to remove races, adds a missing permission check to ptrace.c The removal of tracehook.h is quite significant as it has been a major source of confusion in recent years. Much of that confusion was around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the semantics clearer). For people who don't know tracehook.h is a vestiage of an attempt to implement uprobes like functionality that was never fully merged, and was later superseeded by uprobes when uprobes was merged. For many years now we have been removing what tracehook functionaly a little bit at a time. To the point where anything left in tracehook.h was some weird strange thing that was difficult to understand" * tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ptrace: Remove duplicated include in ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE ptrace: Return the signal to continue with from ptrace_stop ptrace: Move setting/clearing ptrace_message into ptrace_stop tracehook: Remove tracehook.h resume_user_mode: Move to resume_user_mode.h resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume signal: Move set_notify_signal and clear_notify_signal into sched/signal.h task_work: Decouple TIF_NOTIFY_SIGNAL and task_work task_work: Call tracehook_notify_signal from get_signal on all architectures task_work: Introduce task_work_pending task_work: Remove unnecessary include from posix_timers.h ptrace: Remove tracehook_signal_handler ptrace: Remove arch_syscall_{enter,exit}_tracehook ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h ptrace/arm: Rename tracehook_report_syscall report_syscall ptrace: Move ptrace_report_syscall into ptrace.h
2022-03-28	Merge tag 'ucount-rlimit-for-v5.18' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull shm ucounts fix from Eric Biederman: "The introduction of a new failure mode when the code was converted to ucounts resulted in user_shm_lock misbehaving. The change simplifies the code to make the code easier to follow and removes the known misbehaviors" * tag 'ucount-rlimit-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: mm/mlock: fix two bugs in user_shm_lock()
2022-03-28	Merge tag 'net-5.18-rc0' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - llc: only change llc->dev when bind() succeeds, fix null-deref Current release - new code bugs: - smc: fix a memory leak in smc_sysctl_net_exit() - dsa: realtek: make interface drivers depend on OF Previous releases - regressions: - sched: act_ct: fix ref leak when switching zones Previous releases - always broken: - netfilter: egress: report interface as outgoing - vsock/virtio: enable VQs early on probe and finish the setup before using them Misc: - memcg: enable accounting for nft objects" * tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (39 commits) Revert "selftests: net: Add tls config dependency for tls selftests" net/smc: Send out the remaining data in sndbuf before close net: move net_unlink_todo() out of the header net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator net: bnxt_ptp: fix compilation error selftests: net: Add tls config dependency for tls selftests memcg: enable accounting for nft objects net/sched: act_ct: fix ref leak when switching zones net/smc: fix a memory leak in smc_sysctl_net_exit() selftests: tls: skip cmsg_to_pipe tests with TLS=n octeontx2-af: initialize action variable net: sparx5: switchdev: fix possible NULL pointer dereference net/x25: Fix null-ptr-deref caused by x25_disconnect qlcnic: dcb: default to returning -EOPNOTSUPP net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL net: hns3: fix phy can not link up when autoneg off and reset net: hns3: add NULL pointer check for hns3_set/get_ringparam() net: hns3: add netdev reset check for hns3_set_tunable() net: hns3: clean residual vf config after disable sriov net: hns3: add max order judgement for tx spare buffer ...
2022-03-28	XArray: Fix xas_create_range() when multi-order entry present	Matthew Wilcox (Oracle)
	If there is already an entry present that is of order >= XA_CHUNK_SHIFT when we call xas_create_range(), xas_create_range() will misinterpret that entry as a node and dereference xa_node->parent, generally leading to a crash that looks something like this: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0 RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline] RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725 It's deterministically reproducable once you know what the problem is, but producing it in a live kernel requires khugepaged to hit a race. While the problem has been present since xas_create_range() was introduced, I'm not aware of a way to hit it before the page cache was converted to use multi-index entries. Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Reported-by: syzbot+0d2b0bf32ca5cfd09f2e@syzkaller.appspotmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-28	Revert "selftests: net: Add tls config dependency for tls selftests"	Jakub Kicinski
	This reverts commit d9142e1cf3bbdaf21337767114ecab26fe702d47. The test is supposed to run cleanly with TLS is disabled, to test compatibility with TCP behavior. I can't repro the failure [1], the problem should be debugged rather than papered over. Link: https://lore.kernel.org/all/20220325161203.7000698c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [1] Fixes: d9142e1cf3bb ("selftests: net: Add tls config dependency for tls selftests") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220328212904.2685395-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-28	net/smc: Send out the remaining data in sndbuf before close	Wen Gu
	The current autocork algorithms will delay the data transmission in BH context to smc_release_cb() when sock_lock is hold by user. So there is a possibility that when connection is being actively closed (sock_lock is hold by user now), some corked data still remains in sndbuf, waiting to be sent by smc_release_cb(). This will cause: - smc_close_stream_wait(), which is called under the sock_lock, has a high probability of timeout because data transmission is delayed until sock_lock is released. - Unexpected data sends may happen after connction closed and use the rtoken which has been deleted by remote peer through LLC_DELETE_RKEY messages. So this patch will try to send out the remaining corked data in sndbuf before active close process, to ensure data integrity and avoid unexpected data transmission after close. Reported-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Fixes: 6b88af839d20 ("net/smc: don't send in the BH context if sock_owned_by_user") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/1648447836-111521-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-28	smb3: cleanup and clarify status of tree connections	Steve French
	Currently the way the tid (tree connection) status is tracked is confusing. The same enum is used for structs cifs_tcon and cifs_ses and TCP_Server_info, but each of these three has different states that they transition among. The current code also unnecessarily uses camelCase. Convert from use of statusEnum to a new tid_status_enum for tree connections. The valid states for a tid are: TID_NEW = 0, TID_GOOD, TID_EXITING, TID_NEED_RECON, TID_NEED_TCON, TID_IN_TCON, TID_NEED_FILES_INVALIDATE, /* unused, considering removing in future */ TID_IN_FILES_INVALIDATE It also removes CifsNeedTcon, CifsInTcon, CifsNeedFilesInvalidate and CifsInFilesInvalidate from the statusEnum used for session and TCP_Server_Info since they are not relevant for those. A follow on patch will fix the places where we use the tcon->need_reconnect flag to be more consistent with the tid->status. Also fixes a bug that was: Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-28	Merge tag 'kgdb-5.18-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux Pull kgdb update from Daniel Thompson: "Only a single patch this cycle. Fix an obvious mistake with the kdb memory accessors. It was a stupid mistake (to/from backwards) but it has been there for a long time since many architectures tolerated it with surprisingly good grace" * tag 'kgdb-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux: kdb: Fix the putarea helper function
2022-03-28	Merge tag 'hexagon-5.18-0' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux Pull hexagon update from Brian Cain: "Maintainer email update" * tag 'hexagon-5.18-0' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux: MAINTAINERS: update hexagon maintainer email, tree
2022-03-28	Merge tag 'microblaze-v5.18' of git://git.monstr.eu/linux-2.6-microblaze	Linus Torvalds
	Pull microblaze updates from Michal Simek: - Small fixups - Remove unused pci_phys_mem_access_prot() * tag 'microblaze-v5.18' of git://git.monstr.eu/linux-2.6-microblaze: microblaze/PCI: Remove pci_phys_mem_access_prot() dead code microblaze: add const to of_device_id microblaze: fix typo in a comment
2022-03-28	net: move net_unlink_todo() out of the header	Johannes Berg
	There's no reason for this to be in netdevice.h, it's all just used in dev.c. Also make it no longer inline and let the compiler decide to do that by itself. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20220325225023.f49b9056fe1c.I6b901a2df00000837a9bd251a8dd259bd23f5ded@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-28	net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator	Xiaomeng Tong
	The bug is here: return rule; The list iterator value 'rule' will always be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, return 'rule' when found, otherwise return NULL. Fixes: ae7a5aff783c7 ("net: dsa: bcm_sf2: Keep copy of inserted rules") Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Link: https://lore.kernel.org/r/20220328032431.22538-1-xiam0nd.tong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-28	Merge tag 'livepatching-for-5.18' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching updates from Petr Mladek: - Forced transitions block only to-be-removed livepatches [Chengming] - Detect when ftrace handler could not be disabled in self-tests [David] - Calm down warning from a static analyzer [Tom] * tag 'livepatching-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Reorder to use before freeing a pointer livepatch: Don't block removal of patches that are safe to unload livepatch: Skip livepatch tests if ftrace cannot be configured
2022-03-28	Documentation: kunit: Fix cross-referencing warnings	David Gow
	The Architecture chapter of the KUnit documentation tried to include copies of the kernel-doc for a couple of things, despite these already existing in the API documentation. This lead to some warnings: architecture:31: ./include/kunit/test.h:3: WARNING: Duplicate C declaration, also defined at dev-tools/kunit/api/test:66. Declaration is '.. c:struct:: kunit_case'. architecture:163: ./include/kunit/test.h:1217: WARNING: Duplicate C declaration, also defined at dev-tools/kunit/api/test:1217. Declaration is '.. c:macro:: KUNIT_ARRAY_PARAM'. architecture.rst:3: WARNING: Duplicate C declaration, also defined at dev-tools/kunit/api/test:66. Declaration is '.. c:struct:: kunit_case'. architecture.rst:1217: WARNING: Duplicate C declaration, also defined at dev-tools/kunit/api/test:1217. Declaration is '.. c:macro:: KUNIT_ARRAY_PARAM'. Get rid of these, and cleanup the mentions of the struct and macro in question so that sphinx generates a link to the existing copy of the documentation in the api/test document. Fixes: bc145b370c11 ("Documentation: KUnit: Added KUnit Architecture") Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Brendan Higgins <brendanhiggins@google.com> Tested-by: Brendan Higgins <brendanhiggins@google.com> Link: https://lore.kernel.org/r/20220326054414.637293-1-davidgow@google.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-28	Merge tag 'for-linus-5.18-rc1-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - A bunch of minor cleanups - A fix for kexec in Xen dom0 when executed on a high cpu number - A fix for resuming after suspend of a Xen guest with assigned PCI devices - A fix for a crash due to not disabled preemption when resuming as Xen dom0 * tag 'for-linus-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: fix is_xen_pmu() xen: don't hang when resuming PCI device arch:x86:xen: Remove unnecessary assignment in xen_apic_read() xen/grant-table: remove readonly parameter from functions xen/grant-table: remove gnttab_transfer() functions drivers/xen: use helper macro __ATTR_RW x86/xen: Fix kerneldoc warning xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32 xen: use time_is_before_eq_jiffies() instead of open coding it
2022-03-28	s390/alternatives: avoid using jgnop mnemonic	Vasily Gorbik
	jgnop mnemonic is only available since binutils 2.36, kernel minimal required version is 2.23. Stick to brcl to avoid build errors. Reported-by: Nathan Chancellor <nathan@kernel.org> Fixes: 4afeb670710e ("s390/alternatives: use instructions instead of byte patterns") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2022-03-28	vdpa/mlx5: Avoid processing works if workqueue was destroyed	Eli Cohen
	If mlx5_vdpa gets unloaded while a VM is running, the workqueue will be destroyed. However, vhost might still have reference to the kick function and might attempt to push new works. This could lead to null pointer dereference. To fix this, set mvdev->wq to NULL just before destroying and verify that the workqueue is not NULL in mlx5_vdpa_kick_vq before attempting to push a new work. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220321141303.9586-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>