summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2021-10-28Merge branch irq/remove-handle-domain-irq-20211026 into irq/irqchip-nextMarc Zyngier
* irq/remove-handle-domain-irq-20211026: : Large rework of the architecture entry code from Mark Rutland. : From the cover letter: : : <quote> : The handle_domain_{irq,nmi}() functions were oringally intended as a : convenience, but recent rework to entry code across the kernel tree has : demonstrated that they cause more pain than they're worth and prevent : architectures from being able to write robust entry code. : : This series reworks the irq code to remove them, handling the necessary : entry work consistently in entry code (be it architectural or generic). : </quote> MIPS: irq: Avoid an unused-variable error irq: remove handle_domain_{irq,nmi}() irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY irq: riscv: perform irqentry in entry code irq: openrisc: perform irqentry in entry code irq: csky: perform irqentry in entry code irq: arm64: perform irqentry in entry code irq: arm: perform irqentry in entry code irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ irq: add generic_handle_arch_irq() irq: unexport handle_irq_desc() irq: simplify handle_domain_{irq,nmi}() irq: mips: simplify do_domain_IRQ() irq: mips: stop (ab)using handle_domain_irq() irq: mips: simplify bcm6345_l1_irq_handle() irq: mips: avoid nested irq_enter() Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-10-27tracing: Do not warn when connecting eprobe to non existing eventSteven Rostedt (VMware)
When the syscall trace points are not configured in, the kselftests for ftrace will try to attach an event probe (eprobe) to one of the system call trace points. This triggered a WARNING, because the failure only expects to see memory issues. But this is not the only failure. The user may attempt to attach to a non existent event, and the kernel must not warn about it. Link: https://lkml.kernel.org/r/20211027120854.0680aa0f@gandalf.local.home Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27bpf: Use u64_stats_t in struct bpf_prog_statsEric Dumazet
Commit 316580b69d0a ("u64_stats: provide u64_stats_t type") fixed possible load/store tearing on 64bit arches. For instance the following C code stats->nsecs += sched_clock() - start; Could be rightfully implemented like this by a compiler, confusing concurrent readers a lot: stats->nsecs += sched_clock(); // arbitrary delay stats->nsecs -= start; Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211026214133.3114279-4-eric.dumazet@gmail.com
2021-10-27bpf: Fixes possible race in update_prog_stats() for 32bit archesEric Dumazet
It seems update_prog_stats() suffers from same issue fixed in the prior patch: As it can run while interrupts are enabled, it could be re-entered and the u64_stats syncp could be mangled. Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211026214133.3114279-3-eric.dumazet@gmail.com
2021-10-27tracing: Show size of requested perf bufferRobin H. Johnson
If the perf buffer isn't large enough, provide a hint about how large it needs to be for whatever is running. Link: https://lkml.kernel.org/r/20210831043723.13481-1-robbat2@gentoo.org Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27ftrace: do CPU checking after preemption disabled王贇
With CONFIG_DEBUG_PREEMPT we observed reports like: BUG: using smp_processor_id() in preemptible caller is perf_ftrace_function_call+0x6f/0x2e0 CPU: 1 PID: 680 Comm: a.out Not tainted Call Trace: <TASK> dump_stack_lvl+0x8d/0xcf check_preemption_disabled+0x104/0x110 ? optimize_nops.isra.7+0x230/0x230 ? text_poke_bp_batch+0x9f/0x310 perf_ftrace_function_call+0x6f/0x2e0 ... __text_poke+0x5/0x620 text_poke_bp_batch+0x9f/0x310 This telling us the CPU could be changed after task is preempted, and the checking on CPU before preemption will be invalid. Since now ftrace_test_recursion_trylock() will help to disable the preemption, this patch just do the checking after trylock() to address the issue. Link: https://lkml.kernel.org/r/54880691-5fe2-33e7-d12f-1fa6136f5183@linux.alibaba.com CC: Steven Rostedt <rostedt@goodmis.org> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27ftrace: disable preemption when recursion locked王贇
As the documentation explained, ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() were supposed to disable and enable preemption properly, however currently this work is done outside of the function, which could be missing by mistake. And since the internal using of trace_test_and_set_recursion() and trace_clear_recursion() also require preemption disabled, we can just merge the logical. This patch will make sure the preemption has been disabled when trace_test_and_set_recursion() return bit >= 0, and trace_clear_recursion() will enable the preemption if previously enabled. Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com CC: Petr Mladek <pmladek@suse.com> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: Miroslav Benes <mbenes@suse.cz> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> [ Removed extra line in comment - SDR ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-27fsnotify: clarify contract for create event hooksAmir Goldstein
Clarify argument names and contract for fsnotify_create() and fsnotify_mkdir() to reflect the anomaly of kernfs, which leaves dentries negavite after mkdir/create. Remove the WARN_ON(!inode) in audit code that were added by the Fixes commit under the wrong assumption that dentries cannot be negative after mkdir/create. Fixes: aa93bdc5500c ("fsnotify: use helpers to access data by data_type") Link: https://lore.kernel.org/linux-fsdevel/87mtp5yz0q.fsf@collabora.com/ Link: https://lore.kernel.org/r/20211025192746.66445-4-krisman@collabora.com Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27dma-mapping: use 'bitmap_zalloc()' when applicableChristophe JAILLET
'dma_mem->bitmap' is a bitmap. So use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid some open-coded arithmetic in allocator arguments. Also change the corresponding 'kfree()' into 'bitmap_free()' to keep consistency. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-26tracing/histogram: Optimize division by a power of 2Kalesh Singh
The division is a slow operation. If the divisor is a power of 2, use a shift instead. Results were obtained using Android's version of perf (simpleperf[1]) as described below: 1. hist_field_div() is modified to call 2 test functions: test_hist_field_div_[not]_optimized(); passing them the same args. Use noinline and volatile to ensure these are not optimized out by the compiler. 2. Create a hist event trigger that uses division: events/kmem/rss_stat$ echo 'hist:keys=common_pid:x=size/<divisor>' >> trigger events/kmem/rss_stat$ echo 'hist:keys=common_pid:vals=$x' >> trigger 3. Run Android's lmkd_test[2] to generate rss_stat events, and record CPU samples with Android's simpleperf: simpleperf record -a --exclude-perf --post-unwind=yes -m 16384 -g -f 2000 -o perf.data == Results == Divisor is a power of 2 (divisor == 32): test_hist_field_div_not_optimized | 8,717,091 cpu-cycles test_hist_field_div_optimized | 1,643,137 cpu-cycles If the divisor is a power of 2, the optimized version is ~5.3x faster. Divisor is not a power of 2 (divisor == 33): test_hist_field_div_not_optimized | 4,444,324 cpu-cycles test_hist_field_div_optimized | 5,497,958 cpu-cycles If the divisor is not a power of 2, as expected, the optimized version is slightly slower (~24% slower). [1] https://android.googlesource.com/platform/system/extras/+/master/simpleperf/doc/README.md [2] https://cs.android.com/android/platform/superproject/+/master:system/memory/lmkd/tests/lmkd_test.cpp Link: https://lkml.kernel.org/r/20211025200852.3002369-7-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Covert expr to const if both operands are constantsKalesh Singh
If both operands of a hist trigger expression are constants, convert the expression to a constant. This optimization avoids having to perform the same calculation multiple times and also saves on memory since the merged constants are represented by a single struct hist_field instead or multiple. Link: https://lkml.kernel.org/r/20211025200852.3002369-6-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing/histogram: Simplify handling of .sym-offset in expressionsKalesh Singh
The '-' in .sym-offset can confuse the hist trigger arithmetic expression parsing. Simplify the handling of this by replacing the 'sym-offset' with 'symXoffset'. This allows us to correctly evaluate expressions where the user may have inadvertently added a .sym-offset modifier to one of the operands in an expression, instead of bailing out. In this case the .sym-offset has no effect on the evaluation of the expression. The only valid use of the .sym-offset is as a hist key modifier. Link: https://lkml.kernel.org/r/20211025200852.3002369-5-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Fix operator precedence for hist triggers expressionKalesh Singh
The current histogram expression evaluation logic evaluates the expression from right to left. This can lead to incorrect results if the operations are not associative (as is the case for subtraction and, the now added, division operators). e.g. 16-8-4-2 should be 2 not 10 --> 16-8-4-2 = ((16-8)-4)-2 64/8/4/2 should be 1 not 16 --> 64/8/4/2 = ((64/8)/4)/2 Division and multiplication are currently limited to single operation expression due to operator precedence support not yet implemented. Rework the expression parsing to support the correct evaluation of expressions containing operators of different precedences; and fix the associativity error by evaluating expressions with operators of the same precedence from left to right. Examples: (1) echo 'hist:keys=common_pid:a=8,b=4,c=2,d=1,w=$a-$b-$c-$d' \ >> event/trigger (2) echo 'hist:keys=common_pid:x=$a/$b/3/2' >> event/trigger (3) echo 'hist:keys=common_pid:y=$a+10/$c*1024' >> event/trigger (4) echo 'hist:keys=common_pid:z=$a/$b+$c*$d' >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Add division and multiplication support for hist triggersKalesh Singh
Adds basic support for division and multiplication operations for hist trigger variable expressions. For simplicity this patch only supports, division and multiplication for a single operation expression (e.g. x=$a/$b), as currently expressions are always evaluated right to left. This can lead to some incorrect results: e.g. echo 'hist:keys=common_pid:x=8-4-2' >> event/trigger 8-4-2 should evaluate to 2 i.e. (8-4)-2 but currently x evaluate to 6 i.e. 8-(4-2) Multiplication and division in sub-expressions will work correctly, once correct operator precedence support is added (See next patch in this series). For the undefined case of division by 0, the histogram expression evaluates to (u64)(-1). Since this cannot be detected when the expression is created, it is the responsibility of the user to be aware and account for this possibility. Examples: echo 'hist:keys=common_pid:a=8,b=4,x=$a/$b' \ >> event/trigger echo 'hist:keys=common_pid:y=5*$b' \ >> event/trigger Link: https://lkml.kernel.org/r/20211025200852.3002369-3-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Add support for creating hist trigger variables from literalKalesh Singh
Currently hist trigger expressions don't support the use of numeric literals: e.g. echo 'hist:keys=common_pid:x=$y-1234' --> is not valid expression syntax Having the ability to use numeric constants in hist triggers supports a wider range of expressions for creating variables. Add support for creating trace event histogram variables from numeric literals. e.g. echo 'hist:keys=common_pid:x=1234,y=size-1024' >> event/trigger A negative numeric constant is created, using unary minus operator (parentheses are required). e.g. echo 'hist:keys=common_pid:z=-(2)' >> event/trigger Constants can be used with division/multiplication (added in the next patch in this series) to implement granularity filters for frequent trace events. For instance we can limit emitting the rss_stat trace event to when there is a 512KB cross over in the rss size: # Create a synthetic event to monitor instead of the high frequency # rss_stat event echo 'rss_stat_throttled unsigned int mm_id; unsigned int curr; int member; long size' >> tracing/synthetic_events # Create a hist trigger that emits the synthetic rss_stat_throttled # event only when the rss size crosses a 512KB boundary. echo 'hist:keys=keys=mm_id,member:bucket=size/0x80000:onchange($bucket) .rss_stat_throttled(mm_id,curr,member,size)' >> events/kmem/rss_stat/trigger A use case for using constants with addition/subtraction is not yet known, but for completeness the use of constants are supported for all operators. Link: https://lkml.kernel.org/r/20211025200852.3002369-2-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-10-26 We've added 12 non-merge commits during the last 7 day(s) which contain a total of 23 files changed, 118 insertions(+), 98 deletions(-). The main changes are: 1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen. 2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang. 3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai. 4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer. 5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun. 6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian. 7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix potential race in tail call compatibility check bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET selftests/bpf: Use recv_timeout() instead of retries net: Implement ->sock_is_readable() for UDP and AF_UNIX skmsg: Extract and reuse sk_msg_is_readable() net: Rename ->stream_memory_read to ->sock_is_readable tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function cgroup: Fix memory leak caused by missing cgroup_bpf_offline bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch() bpf: Prevent increasing bpf_jit_limit above max bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT bpf: Define bpf_jit_alloc_exec_limit for riscv JIT ==================== Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26test_kprobes: Move it from kernel/ to lib/Tiezhu Yang
Since config KPROBES_SANITY_TEST is in lib/Kconfig.debug, it is better to let test_kprobes.c in lib/, just like other similar tests found in lib/. Link: https://lkml.kernel.org/r/1635213091-24387-4-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26kprobes: Add a test case for stacktrace from kretprobe handlerMasami Hiramatsu
Add a test case for stacktrace from kretprobe handler and nested kretprobe handlers. This test checks both of stack trace inside kretprobe handler and stack trace from pt_regs. Those stack trace must include actual function return address instead of kretprobe trampoline. The nested kretprobe stacktrace test checks whether the unwinder can correctly unwind the call frame on the stack which has been modified by the kretprobe. Since the stacktrace on kretprobe is correctly fixed only on x86, this introduces a meta kconfig ARCH_CORRECT_STACKTRACE_ON_KRETPROBE which tells user that the stacktrace on kretprobe is correct or not. The test results will be shown like below; TAP version 14 1..1 # Subtest: kprobes_test 1..6 ok 1 - test_kprobe ok 2 - test_kprobes ok 3 - test_kretprobe ok 4 - test_kretprobes ok 5 - test_stacktrace_on_kretprobe ok 6 - test_stacktrace_on_nested_kretprobe # kprobes_test: pass:6 fail:0 skip:0 total:6 # Totals: pass:6 fail:0 skip:0 total:6 ok 1 - kprobes_test Link: https://lkml.kernel.org/r/163516211244.604541.18350507860972214415.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26bpf: Fix potential race in tail call compatibility checkToke Høiland-Jørgensen
Lorenzo noticed that the code testing for program type compatibility of tail call maps is potentially racy in that two threads could encounter a map with an unset type simultaneously and both return true even though they are inserting incompatible programs. The race window is quite small, but artificially enlarging it by adding a usleep_range() inside the check in bpf_prog_array_compatible() makes it trivial to trigger from userspace with a program that does, essentially: map_fd = bpf_create_map(BPF_MAP_TYPE_PROG_ARRAY, 4, 4, 2, 0); pid = fork(); if (pid) { key = 0; value = xdp_fd; } else { key = 1; value = tc_fd; } err = bpf_map_update_elem(map_fd, &key, &value, 0); While the race window is small, it has potentially serious ramifications in that triggering it would allow a BPF program to tail call to a program of a different type. So let's get rid of it by protecting the update with a spinlock. The commit in the Fixes tag is the last commit that touches the code in question. v2: - Use a spinlock instead of an atomic variable and cmpxchg() (Alexei) v3: - Put lock and the members it protects into an embedded 'owner' struct (Daniel) Fixes: 3324b584b6f6 ("ebpf: misc core cleanup") Reported-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211026110019.363464-1-toke@redhat.com
2021-10-26PM: suspend: Use valid_state() consistentlyRafael J. Wysocki
Make valid_state() check if the ->enter callback is present in suspend_ops (only PM_SUSPEND_TO_IDLE can be valid otherwise) and make sleep_state_supported() call valid_state() consistently to validate the states other than PM_SUSPEND_TO_IDLE. While at it, clean up the comment in valid_state(). No expected functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-26PM: sleep: Pause cpuidle later and resume it earlier during system transitionsRafael J. Wysocki
Commit 8651f97bd951 ("PM / cpuidle: System resume hang fix with cpuidle") that introduced cpuidle pausing during system suspend did that to work around a platform firmware issue causing systems to hang during resume if CPUs were allowed to enter idle states in the system suspend and resume code paths. However, pausing cpuidle before the last phase of suspending devices is the source of an otherwise arbitrary difference between the suspend-to-idle path and other system suspend variants, so it is cleaner to do that later, before taking secondary CPUs offline (it is still safer to take secondary CPUs offline with cpuidle paused, though). Modify the code accordingly, but in order to avoid code duplication, introduce new wrapper functions, pm_sleep_disable_secondary_cpus() and pm_sleep_enable_secondary_cpus(), to combine cpuidle_pause() and cpuidle_resume(), respectively, with the handling of secondary CPUs during system-wide transitions to sleep states. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26PM: suspend: Do not pause cpuidle in the suspend-to-idle pathRafael J. Wysocki
It is pointless to pause cpuidle in the suspend-to-idle path, because it is going to be resumed in the same path later and pausing it does not serve any particular purpose in that case. Rework the code to avoid doing that. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
2021-10-26tracing/hwlat: Make some internal symbols staticWang ShaoBo
The sparse tool complains as follows: kernel/trace/trace_hwlat.c:82:27: warning: symbol 'hwlat_single_cpu_data' was not declared. Should it be static? kernel/trace/trace_hwlat.c:83:1: warning: symbol '__pcpu_scope_hwlat_per_cpu_data' was not declared. Should it be static? This symbol is not used outside of trace_hwlat.c, so this commit marks it static. Link: https://lkml.kernel.org/r/20211021035225.1050685-1-bobo.shaobowang@huawei.com Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26tracing: Fix missing trace_boot_init_histograms kstrdup NULL checksMathieu Desnoyers
trace_boot_init_histograms misses NULL pointer checks for kstrdup failure. Link: https://lkml.kernel.org/r/20211015195550.22742-1-mathieu.desnoyers@efficios.com Fixes: 64dc7f6958ef5 ("tracing/boot: Show correct histogram error command") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-26genirq: Hide irq_cpu_{on,off}line() behind a deprecated optionMarc Zyngier
irq_cpu_{on,off}line() are now only used by the Octeon platform. Make their use conditional on this plaform being enabled, and otherwise hidden away. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20211021170414.3341522-4-maz@kernel.org
2021-10-26irq: remove handle_domain_{irq,nmi}()Mark Rutland
Now that entry code handles IRQ entry (including setting the IRQ regs) before calling irqchip code, irqchip code can safely call generic_handle_domain_irq(), and there's no functional reason for it to call handle_domain_irq(). Let's cement this split of responsibility and remove handle_domain_irq() entirely, updating irqchip drivers to call generic_handle_domain_irq(). For consistency, handle_domain_nmi() is similarly removed and replaced with a generic_handle_domain_nmi() function which also does not perform any entry logic. Previously handle_domain_{irq,nmi}() had a WARN_ON() which would fire when they were called in an inappropriate context. So that we can identify similar issues going forward, similar WARN_ON_ONCE() logic is added to the generic_handle_*() functions, and comments are updated for clarity and consistency. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-26irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRYMark Rutland
Now that all users of CONFIG_HANDLE_DOMAIN_IRQ perform the irq entry work themselves, we can remove the legacy CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY behaviour. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-26signal: Add an optional check for altstack sizeThomas Gleixner
New x86 FPU features will be very large, requiring ~10k of stack in signal handlers. These new features require a new approach called "dynamic features". The kernel currently tries to ensure that altstacks are reasonably sized. Right now, on x86, sys_sigaltstack() requires a size of >=2k. However, that 2k is a constant. Simply raising that 2k requirement to >10k for the new features would break existing apps which have a compiled-in size of 2k. Instead of universally enforcing a larger stack, prohibit a process from using dynamic features without properly-sized altstacks. This must be enforced in two places: * A dynamic feature can not be enabled without an large-enough altstack for each process thread. * Once a dynamic feature is enabled, any request to install a too-small altstack will be rejected The dynamic feature enabling code must examine each thread in a process to ensure that the altstacks are large enough. Add a new lock (sigaltstack_lock()) to ensure that threads can not race and change their altstack after being examined. Add the infrastructure in form of a config option and provide empty stubs for architectures which do not need dynamic altstack size checks. This implementation will be fleshed out for x86 in a future patch called x86/arch_prctl: Add controls for dynamic XSTATE components [dhansen: commit message. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-2-chang.seok.bae@intel.com
2021-10-25trace/timerlat: Add migrate-disabled field to the timerlat headerDaniel Bristot de Oliveira
Since "54357f0c9149 tracing: Add migrate-disabled counter to tracing output," the migrate disabled field is also printed in the !PREEMPR_RT kernel config. While this information was added to the vast majority of tracers, osnoise and timerlat were not updated (because they are new tracers). Fix timerlat header by adding the information about migrate disabled. Link: https://lkml.kernel.org/r/bc0c234ab49946cdd63effa6584e1d5e8662cb44.1634308385.git.bristot@kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: x86@kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: 54357f0c9149 ("tracing: Add migrate-disabled counter to tracing output.") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-25trace/osnoise: Add migrate-disabled field to the osnoise headerDaniel Bristot de Oliveira
Since "54357f0c9149 tracing: Add migrate-disabled counter to tracing output," the migrate disabled field is also printed in the !PREEMPR_RT kernel config. While this information was added to the vast majority of tracers, osnoise and timerlat were not updated (because they are new tracers). Fix osnoise header by adding the information about migrate disabled. Link: https://lkml.kernel.org/r/9cb3d54e29e0588dbba12e81486bd8a09adcd8ca.1634308385.git.bristot@kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: x86@kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: 54357f0c9149 ("tracing: Add migrate-disabled counter to tracing output.") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-25perf/core: allow ftrace for functions in kernel/event/core.cSong Liu
It is useful to trace functions in kernel/event/core.c. Allow ftrace for them by removing $(CC_FLAGS_FTRACE) from Makefile. Link: https://lkml.kernel.org/r/20211006210732.2826289-1-songliubraving@fb.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-25ftrace: Make ftrace_profile_pages_init staticchongjiapeng
This symbol is not used outside of ftrace.c, so marks it static. Fixes the following sparse warning: kernel/trace/ftrace.c:579:5: warning: symbol 'ftrace_profile_pages_init' was not declared. Should it be static? Link: https://lkml.kernel.org/r/1634640534-18280-1-git-send-email-jiapeng.chong@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: cafb168a1c92 ("tracing: make the function profiler per cpu") Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-10-25cgroup: no need for cgroup_mutex for /proc/cgroupsShakeel Butt
On the real systems, the cgroups hierarchies are setup early and just once by the node controller, so, other than number of cgroups, all information in /proc/cgroups remain same for the system uptime. Let's remove the cgroup_mutex usage on reading /proc/cgroups. There is a chance of inconsistent number of cgroups for co-mounted cgroups while printing the information from /proc/cgroups but that is not a big issue. In addition /proc/cgroups is a v1 specific interface, so the dependency on it should reduce over time. The main motivation for removing the cgroup_mutex from /proc/cgroups is to reduce the avenues of its contention. On our fleet, we have observed buggy application hammering on /proc/cgroups and drastically slowing down the node controller on the system which have many negative consequences on other workloads running on the system. Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-25cgroup: remove cgroup_mutex from cgroupstats_buildShakeel Butt
The function cgroupstats_build extracts cgroup from the kernfs_node's priv pointer which is a RCU pointer. So, there is no need to grab cgroup_mutex. Just get the reference on the cgroup before using and remove the cgroup_mutex altogether. Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-25cgroup: reduce dependency on cgroup_mutexShakeel Butt
Currently cgroup_get_from_path() and cgroup_get_from_id() grab cgroup_mutex before traversing the default hierarchy to find the kernfs_node corresponding to the path/id and then extract the linked cgroup. Since cgroup_mutex is still held, it is guaranteed that the cgroup will be alive and the reference can be taken on it. However similar guarantee can be provided without depending on the cgroup_mutex and potentially reducing avenues of cgroup_mutex contentions. The kernfs_node's priv pointer is RCU protected pointer and with just rcu read lock we can grab the reference on the cgroup without cgroup_mutex. So, remove cgroup_mutex from them. Signed-off-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-10-25irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRYMark Rutland
Going forward we want architecture/entry code to perform all the necessary work to enter/exit IRQ context, with irqchip code merely handling the mapping of the interrupt to any handler(s). Among other reasons, this is necessary to consistently fix some longstanding issues with the ordering of lockdep/RCU/tracing instrumentation which many architectures get wrong today in their entry code. Importantly, rcu_irq_{enter,exit}() must be called precisely once per IRQ exception, so that rcu_is_cpu_rrupt_from_idle() can correctly identify when an interrupt was taken from an idle context which must be explicitly preempted. Currently handle_domain_irq() calls rcu_irq_{enter,exit}() via irq_{enter,exit}(), but entry code needs to be able to call rcu_irq_{enter,exit}() earlier for correct ordering across lockdep/RCU/tracing updates for sequences such as: lockdep_hardirqs_off(CALLER_ADDR0); rcu_irq_enter(); trace_hardirqs_off_finish(); To permit each architecture to be converted to the new style in turn, this patch adds a new CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY selected by all current users of HANDLE_DOMAIN_IRQ, which gates the existing behaviour. When CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY is not selected, handle_domain_irq() requires entry code to perform the irq_{enter,exit}() work, with an explicit check for this matching the style of handle_domain_nmi(). Subsequent patches will: 1) Add the necessary IRQ entry accounting to each architecture in turn, dropping CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY from that architecture's Kconfig. 2) Remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY once it is no longer selected. 3) Convert irqchip drivers to consistently use generic_handle_domain_irq() rather than handle_domain_irq(). 4) Remove handle_domain_irq() and CONFIG_HANDLE_DOMAIN_IRQ. ... which should leave us with a clear split of responsiblity across the entry and irqchip code, making it possible to perform additional cleanups and fixes for the aforementioned longstanding issues with entry code. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: add generic_handle_arch_irq()Mark Rutland
Several architectures select GENERIC_IRQ_MULTI_HANDLER and branch to handle_arch_irq() without performing any entry accounting. Add a generic wrapper to handle the common irqentry work when invoking handle_arch_irq(). Where an architecture needs to perform some entry accounting itself, it will need to invoke handle_arch_irq() itself. In subsequent patches it will become the responsibilty of the entry code to set the irq regs when entering an IRQ (rather than deferring this to an irqchip handler), so generic_handle_arch_irq() is made to set the irq regs now. This can be redundant in some cases, but is never harmful as saving/restoring the old regs nests safely. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: unexport handle_irq_desc()Mark Rutland
There are no modular users of handle_irq_desc(). Remove the export before we gain any. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-25irq: simplify handle_domain_{irq,nmi}()Mark Rutland
There's no need for handle_domain_{irq,nmi}() to open-code the NULL check performed by handle_irq_desc(), nor the resolution of the desc performed by generic_handle_domain_irq(). Use generic_handle_domain_irq() directly, as this is functioanlly equivalent and clearer. At the same time, delete the stale comments, which are no longer helpful. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2021-10-24Merge tag 'sched_urgent_for_v5.15_rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: "Reset clang's Shadow Call Stack on hotplug to prevent it from overflowing" * tag 'sched_urgent_for_v5.15_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/scs: Reset the shadow stack when idle_task_exit
2021-10-22cgroup: Fix memory leak caused by missing cgroup_bpf_offlineQuanyang Wang
When enabling CONFIG_CGROUP_BPF, kmemleak can be observed by running the command as below: $mount -t cgroup -o none,name=foo cgroup cgroup/ $umount cgroup/ unreferenced object 0xc3585c40 (size 64): comm "mount", pid 425, jiffies 4294959825 (age 31.990s) hex dump (first 32 bytes): 01 00 00 80 84 8c 28 c0 00 00 00 00 00 00 00 00 ......(......... 00 00 00 00 00 00 00 00 6c 43 a0 c3 00 00 00 00 ........lC...... backtrace: [<e95a2f9e>] cgroup_bpf_inherit+0x44/0x24c [<1f03679c>] cgroup_setup_root+0x174/0x37c [<ed4b0ac5>] cgroup1_get_tree+0x2c0/0x4a0 [<f85b12fd>] vfs_get_tree+0x24/0x108 [<f55aec5c>] path_mount+0x384/0x988 [<e2d5e9cd>] do_mount+0x64/0x9c [<208c9cfe>] sys_mount+0xfc/0x1f4 [<06dd06e0>] ret_fast_syscall+0x0/0x48 [<a8308cb3>] 0xbeb4daa8 This is because that since the commit 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") root_cgrp->bpf.refcnt.data is allocated by the function percpu_ref_init in cgroup_bpf_inherit which is called by cgroup_setup_root when mounting, but not freed along with root_cgrp when umounting. Adding cgroup_bpf_offline which calls percpu_ref_kill to cgroup_kill_sb can free root_cgrp->bpf.refcnt.data in umount path. This patch also fixes the commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself"). A cgroup_bpf_offline is needed to do a cleanup that frees the resources which are allocated by cgroup_bpf_inherit in cgroup_setup_root. And inside cgroup_bpf_offline, cgroup_get() is at the beginning and cgroup_put is at the end of cgroup_bpf_release which is called by cgroup_bpf_offline. So cgroup_bpf_offline can keep the balance of cgroup's refcount. Fixes: 2b0d3d3e4fcf ("percpu_ref: reduce memory footprint of percpu_ref in fast path") Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211018075623.26884-1-quanyang.wang@windriver.com
2021-10-22bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch()Xu Kuohai
1. The ufd in generic_map_update_batch() should be read from batch.map_fd; 2. A call to fdget() should be followed by a symmetric call to fdput(). Fixes: aa2e93b8e58e ("bpf: Add generic support for update and delete batch ops") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211019032934.1210517-1-xukuohai@huawei.com
2021-10-22bpf: Prevent increasing bpf_jit_limit above maxLorenz Bauer
Restrict bpf_jit_limit to the maximum supported by the arch's JIT. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211014142554.53120-4-lmb@cloudflare.com
2021-10-22bpf: Add BTF_KIND_DECL_TAG typedef supportYonghong Song
The llvm patches ([1], [2]) added support to attach btf_decl_tag attributes to typedef declarations. This patch added support in kernel. [1] https://reviews.llvm.org/D110127 [2] https://reviews.llvm.org/D112259 Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211021195628.4018847-1-yhs@fb.com
2021-10-22sched/core: Remove rq_relock()Peng Wang
After the removal of migrate_tasks(), there is no user of rq_relock() left, so remove it. Signed-off-by: Peng Wang <rocking@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/449948fdf9be4764b3929c52572917dd25eef758.1634611953.git.rocking@linux.alibaba.com
2021-10-22sched: Improve wake_up_all_idle_cpus() take #2Peter Zijlstra
As reported by syzbot and experienced by Pavel, using cpus_read_lock() in wake_up_all_idle_cpus() generates lock inversion (against mmap_sem and possibly others). Instead, shrink the preempt disable region by iterating all CPUs and checking the online status for each individual CPU while having preemption disabled. Fixes: 8850cb663b5c ("sched: Simplify wake_up_*idle*()") Reported-by: syzbot+d5b23b18d2f4feae8a67@syzkaller.appspotmail.com Reported-by: Pavel Machek <pavel@ucw.cz> Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Qian Cai <quic_qiancai@quicinc.com>
2021-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21Merge branch 'ucount-fixes-for-v5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fixes from Eric Biederman: "There has been one very hard to track down bug in the ucount code that we have been tracking since roughly v5.14 was released. Alex managed to find a reliable reproducer a few days ago and then I was able to instrument the code and figure out what the issue was. It turns out the sigqueue_alloc single atomic operation optimization did not play nicely with ucounts multiple level rlimits. It turned out that either sigqueue_alloc or sigqueue_free could be operating on multiple levels and trigger the conditions for the optimization on more than one level at the same time. To deal with that situation I have introduced inc_rlimit_get_ucounts and dec_rlimit_put_ucounts that just focuses on the optimization and the rlimit and ucount changes. While looking into the big bug I found I couple of other little issues so I am including those fixes here as well. When I have time I would very much like to dig into process ownership of the shared signal queue and see if we could pick a single owner for the entire queue so that all of the rlimits can count to that owner. That should entirely remove the need to call get_ucounts and put_ucounts in sigqueue_alloc and sigqueue_free. It is difficult because Linux unlike POSIX supports setuid that works on a single thread" * 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring ucounts: Proper error handling in set_cred_ucounts ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds ucounts: Fix signal ucount refcounting
2021-10-21bpf: Add verified_insns to bpf_prog_info and fdinfoDave Marchevsky
This stat is currently printed in the verifier log and not stored anywhere. To ease consumption of this data, add a field to bpf_prog_aux so it can be exposed via BPF_OBJ_GET_INFO_BY_FD and fdinfo. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211020074818.1017682-2-davemarchevsky@fb.com
2021-10-21bpf: Add bpf_skc_to_unix_sock() helperHengqi Chen
The helper is used in tracing programs to cast a socket pointer to a unix_sock pointer. The return value could be NULL if the casting is illegal. Suggested-by: Yonghong Song <yhs@fb.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021134752.1223426-2-hengqi.chen@gmail.com