summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/kprobes
AgeCommit message (Collapse)Author
2023-02-23Merge tag 'probes-v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull kprobes updates from Masami Hiramatsu: - Skip negative return code check for snprintf in eprobe - Add recursive call test cases for kprobe unit test - Add 'char' type to probe events to show it as the character instead of value - Update kselftest kprobe-event testcase to ignore '__pfx_' symbols - Fix kselftest to check filter on eprobe event correctly - Add filter on eprobe to the README file in tracefs - Fix optprobes to check whether there is 'under unoptimizing' optprobe when optimizing another kprobe correctly - Fix optprobe to check whether there is 'under unoptimizing' optprobe when fetching the original instruction correctly - Fix optprobe to free 'forcibly unoptimized' optprobe correctly * tag 'probes-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/eprobe: no need to check for negative ret value for snprintf test_kprobes: Add recursed kprobe test case tracing/probe: add a char type to show the character value of traced arguments selftests/ftrace: Fix probepoint testcase to ignore __pfx_* symbols selftests/ftrace: Fix eprobe syntax test case to check filter support tracing/eprobe: Fix to add filter on eprobe description in README file x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range x86/kprobes: Fix __recover_optprobed_insn check optimizing logic kprobes: Fix to handle forcibly unoptimized kprobes on freeing_list
2023-02-21Merge tag 'x86_alternatives_for_v6.3_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm alternatives updates from Borislav Petkov: - Teach the static_call patching infrastructure to handle conditional tall calls properly which can be static calls too - Add proper struct alt_instr.flags which controls different aspects of insn patching behavior * tag 'x86_alternatives_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/static_call: Add support for Jcc tail-calls x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions x86/alternatives: Introduce int3_emulate_jcc() x86/alternatives: Add alt_instr.flags
2023-02-20Merge tag 'perf-core-2023-02-20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: - Optimize perf_sample_data layout - Prepare sample data handling for BPF integration - Update the x86 PMU driver for Intel Meteor Lake - Restructure the x86 uncore code to fix a SPR (Sapphire Rapids) discovery breakage - Fix the x86 Zhaoxin PMU driver - Cleanups * tag 'perf-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) perf/x86/intel/uncore: Add Meteor Lake support x86/perf/zhaoxin: Add stepping check for ZXC perf/x86/intel/ds: Fix the conversion from TSC to perf time perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table perf/x86/uncore: Add a quirk for UPI on SPR perf/x86/uncore: Ignore broken units in discovery table perf/x86/uncore: Fix potential NULL pointer in uncore_get_alias_name perf/x86/uncore: Factor out uncore_device_to_die() perf/core: Call perf_prepare_sample() before running BPF perf/core: Introduce perf_prepare_header() perf/core: Do not pass header for sample ID init perf/core: Set data->sample_flags in perf_prepare_sample() perf/core: Add perf_sample_save_brstack() helper perf/core: Add perf_sample_save_raw_data() helper perf/core: Add perf_sample_save_callchain() helper perf/core: Save the dynamic parts of sample data size x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulation perf/core: Change the layout of perf_sample_data perf/x86/msr: Add Meteor Lake support perf/x86/cstate: Add Meteor Lake support ...
2023-02-21x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe rangeYang Jihong
When arch_prepare_optimized_kprobe calculating jump destination address, it copies original instructions from jmp-optimized kprobe (see __recover_optprobed_insn), and calculated based on length of original instruction. arch_check_optimized_kprobe does not check KPROBE_FLAG_OPTIMATED when checking whether jmp-optimized kprobe exists. As a result, setup_detour_execution may jump to a range that has been overwritten by jump destination address, resulting in an inval opcode error. For example, assume that register two kprobes whose addresses are <func+9> and <func+11> in "func" function. The original code of "func" function is as follows: 0xffffffff816cb5e9 <+9>: push %r12 0xffffffff816cb5eb <+11>: xor %r12d,%r12d 0xffffffff816cb5ee <+14>: test %rdi,%rdi 0xffffffff816cb5f1 <+17>: setne %r12b 0xffffffff816cb5f5 <+21>: push %rbp 1.Register the kprobe for <func+11>, assume that is kp1, corresponding optimized_kprobe is op1. After the optimization, "func" code changes to: 0xffffffff816cc079 <+9>: push %r12 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp Now op1->flags == KPROBE_FLAG_OPTIMATED; 2. Register the kprobe for <func+9>, assume that is kp2, corresponding optimized_kprobe is op2. register_kprobe(kp2) register_aggr_kprobe alloc_aggr_kprobe __prepare_optimized_kprobe arch_prepare_optimized_kprobe __recover_optprobed_insn // copy original bytes from kp1->optinsn.copied_insn, // jump address = <func+14> 3. disable kp1: disable_kprobe(kp1) __disable_kprobe ... if (p == orig_p || aggr_kprobe_disabled(orig_p)) { ret = disarm_kprobe(orig_p, true) // add op1 in unoptimizing_list, not unoptimized orig_p->flags |= KPROBE_FLAG_DISABLED; // op1->flags == KPROBE_FLAG_OPTIMATED | KPROBE_FLAG_DISABLED ... 4. unregister kp2 __unregister_kprobe_top ... if (!kprobe_disabled(ap) && !kprobes_all_disarmed) { optimize_kprobe(op) ... if (arch_check_optimized_kprobe(op) < 0) // because op1 has KPROBE_FLAG_DISABLED, here not return return; p->kp.flags |= KPROBE_FLAG_OPTIMIZED; // now op2 has KPROBE_FLAG_OPTIMIZED } "func" code now is: 0xffffffff816cc079 <+9>: int3 0xffffffff816cc07a <+10>: push %rsp 0xffffffff816cc07b <+11>: jmp 0xffffffffa0210000 0xffffffff816cc080 <+16>: incl 0xf(%rcx) 0xffffffff816cc083 <+19>: xchg %eax,%ebp 0xffffffff816cc084 <+20>: (bad) 0xffffffff816cc085 <+21>: push %rbp 5. if call "func", int3 handler call setup_detour_execution: if (p->flags & KPROBE_FLAG_OPTIMIZED) { ... regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX; ... } The code for the destination address is 0xffffffffa021072c: push %r12 0xffffffffa021072e: xor %r12d,%r12d 0xffffffffa0210731: jmp 0xffffffff816cb5ee <func+14> However, <func+14> is not a valid start instruction address. As a result, an error occurs. Link: https://lore.kernel.org/all/20230216034247.32348-3-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-21x86/kprobes: Fix __recover_optprobed_insn check optimizing logicYang Jihong
Since the following commit: commit f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") modified the update timing of the KPROBE_FLAG_OPTIMIZED, a optimized_kprobe may be in the optimizing or unoptimizing state when op.kp->flags has KPROBE_FLAG_OPTIMIZED and op->list is not empty. The __recover_optprobed_insn check logic is incorrect, a kprobe in the unoptimizing state may be incorrectly determined as unoptimizing. As a result, incorrect instructions are copied. The optprobe_queued_unopt function needs to be exported for invoking in arch directory. Link: https://lore.kernel.org/all/20230216034247.32348-2-yangjihong1@huawei.com/ Fixes: f66c0447cca1 ("kprobes: Set unoptimized flag after unoptimizing code") Cc: stable@vger.kernel.org Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-02-08x86/kprobes: Fix 1 byte conditional jump targetNadav Amit
Commit 3bc753c06dd0 ("kbuild: treat char as always unsigned") broke kprobes. Setting a probe-point on 1 byte conditional jump can cause the kernel to crash when the (signed) relative jump offset gets treated as unsigned. Fix by replacing the unsigned 'immediate.bytes' (plus a cast) with the signed 'immediate.value' when assigning to the relative jump offset. [ dhansen: clarified changelog ] Fixes: 3bc753c06dd0 ("kbuild: treat char as always unsigned") Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20230208071708.4048-1-namit%40vmware.com
2023-01-31x86/alternatives: Introduce int3_emulate_jcc()Peter Zijlstra
Move the kprobe Jcc emulation into int3_emulate_jcc() so it can be used by more code -- specifically static_call() will need this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20230123210607.057678245@infradead.org
2023-01-18Merge tag 'v6.2-rc4' into perf/core, to pick up fixesIngo Molnar
Move from the -rc1 base to the fresher -rc4 kernel that has various fixes included, before applying a larger patchset. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-10x86/kprobes: Use switch-case for 0xFF opcodes in prepare_emulationChuang Wang
For the `FF /digit` opcodes in prepare_emulation, use switch-case instead of hand-written code to make the logic easier to understand. Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20221129084022.718355-1-nashuiliang@gmail.com
2023-01-07x86/kprobes: Drop removed INT3 handling codeMasami Hiramatsu (Google)
Drop removed INT3 handling code from kprobe_int3_handler() because this case (get_kprobe() doesn't return corresponding kprobe AND the INT3 is removed) must not happen with the kprobe managed INT3, but can happen with the non-kprobe INT3, which should be handled by other callbacks. For the kprobe managed INT3, it is already safe. The commit 5c02ece81848d ("x86/kprobes: Fix ordering while text-patching") introduced text_poke_sync() to the arch_disarm_kprobe() right after removing INT3. Since this text_poke_sync() uses IPI to call sync_core() on all online cpus, that ensures that all running INT3 exception handlers have done. And, the unregister_kprobe() will remove the kprobe from the hash table after arch_disarm_kprobe(). Thus, when the kprobe managed INT3 hits, kprobe_int3_handler() should be able to find corresponding kprobe always by get_kprobe(). If it can not find any kprobe, this means that is NOT a kprobe managed INT3. Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/166981518895.1131462.4693062055762912734.stgit@devnote3
2022-12-27x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNKMasami Hiramatsu (Google)
Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping speculative execution after function return, kprobe jump optimization always fails on the functions with such INT3 inside the function body. (It already checks the INT3 padding between functions, but not inside the function) To avoid this issue, as same as kprobes, check whether the INT3 comes from kgdb or not, and if so, stop decoding and make it fail. The other INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be treated as a one-byte instruction. Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/167146051929.1374301.7419382929328081706.stgit@devnote3
2022-12-27x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNKMasami Hiramatsu (Google)
Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping speculative execution after RET instruction, kprobes always failes to check the probed instruction boundary by decoding the function body if the probed address is after such sequence. (Note that some conditional code blocks will be placed after function return, if compiler decides it is not on the hot path.) This is because kprobes expects kgdb puts the INT3 as a software breakpoint and it will replace the original instruction. But these INT3 are not such purpose, it doesn't need to recover the original instruction. To avoid this issue, kprobes checks whether the INT3 is owned by kgdb or not, and if so, stop decoding and make it fail. The other INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be treated as a one-byte instruction. Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation") Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/167146051026.1374301.392728975473572291.stgit@devnote3
2022-12-17Merge tag 'x86_mm_for_6.2_v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Dave Hansen: "New Feature: - Randomize the per-cpu entry areas Cleanups: - Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it - Move to "native" set_memory_rox() helper - Clean up pmd_get_atomic() and i386-PAE - Remove some unused page table size macros" * tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) x86/mm: Ensure forced page table splitting x86/kasan: Populate shadow for shared chunk of the CPU entry area x86/kasan: Add helpers to align shadow addresses up and down x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area x86/mm: Recompute physical address for every page of per-CPU CEA mapping x86/mm: Rename __change_page_attr_set_clr(.checkalias) x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias() x86/mm: Untangle __change_page_attr_set_clr(.checkalias) x86/mm: Add a few comments x86/mm: Fix CR3_ADDR_MASK x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros mm: Convert __HAVE_ARCH_P..P_GET to the new style mm: Remove pointless barrier() after pmdp_get_lockless() x86/mm/pae: Get rid of set_64bit() x86_64: Remove pointless set_64bit() usage x86/mm/pae: Be consistent with pXXp_get_and_clear() x86/mm/pae: Use WRITE_ONCE() x86/mm/pae: Don't (ab)use atomic64 mm/gup: Fix the lockless PMD access ...
2022-12-15mm: Introduce set_memory_rox()Peter Zijlstra
Because endlessly repeating: set_memory_ro() set_memory_x() is getting tedious. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Y1jek64pXOsougmz@hirez.programming.kicks-ass.net
2022-10-17x86/modules: Set VM_FLUSH_RESET_PERMS in module_alloc()Thomas Gleixner
Instead of resetting permissions all over the place when freeing module memory tell the vmalloc code to do so. Avoids the exercise for the next upcoming user. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111143.406703869@infradead.org
2022-09-27x86: kprobes: Remove unused macro stack_addrChen Zhongjin
An unused macro reported by [-Wunused-macros]. This macro is used to access the sp in pt_regs because at that time x86_32 can only get sp by kernel_stack_pointer(regs). '3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs")' This commit have unified the pt_regs and from them we can get sp from pt_regs with regs->sp easily. Nowhere is using this macro anymore. Refrencing pt_regs directly is more clear. Remove this macro for code cleaning. Link: https://lkml.kernel.org/r/20220924072629.104759-1-chenzhongjin@huawei.com Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-14x86/kprobes: Fix JNG/JNLE emulationNadav Amit
When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is performed if (ZF == 1 or SF != OF). However the kernel emulation currently uses 'and' instead of 'or'. As a result, setting a kprobe on JNG/JNLE might cause the kernel to behave incorrectly whenever the kprobe is hit. Fix by changing the 'and' to 'or'. Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step") Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220813225943.143767-1-namit@vmware.com
2022-08-02x86/kprobes: Update kcb status flag after singlesteppingMasami Hiramatsu (Google)
Fix kprobes to update kcb (kprobes control block) status flag to KPROBE_HIT_SSDONE even if the kp->post_handler is not set. This bug may cause a kernel panic if another INT3 user runs right after kprobes because kprobe_int3_handler() misunderstands the INT3 is kprobe's single stepping INT3. Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step") Reported-by: Daniel Müller <deso@posteo.net> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Daniel Müller <deso@posteo.net> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20220727210136.jjgc3lpqeq42yr3m@muellerd-fedora-PC2BDTX9 Link: https://lore.kernel.org/r/165942025658.342061.12452378391879093249.stgit@devnote2
2022-03-28x86,kprobes: Fix optprobe trampoline to generate complete pt_regsMasami Hiramatsu
Currently the optprobe trampoline template code ganerate an almost complete pt_regs on-stack, everything except regs->ss. The 'regs->ss' points to the top of stack, which is not a valid segment decriptor. As same as the rethook does, complete the job by also pushing ss. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164826166027.2455864.14759128090648961900.stgit@devnote2
2022-03-28x86,rethook,kprobes: Replace kretprobe with rethook on x86Masami Hiramatsu
Replaces the kretprobe code with rethook on x86. With this patch, kretprobe on x86 uses the rethook instead of kretprobe specific trampoline code. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/164826163692.2455864.13745421016848209527.stgit@devnote2
2022-03-15x86/ibt: Annotate text referencesPeter Zijlstra
Annotate away some of the generic code references. This is things where we take the address of a symbol for exception handling or return addresses (eg. context switch). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.877758523@infradead.org
2022-03-15x86/ibt,kprobes: Cure sym+0 equals fentry woesPeter Zijlstra
In order to allow kprobes to skip the ENDBR instructions at sym+0 for X86_KERNEL_IBT builds, change _kprobe_addr() to take an architecture callback to inspect the function at hand and modify the offset if needed. This streamlines the existing interface to cover more cases and require less hooks. Once PowerPC gets fully converted there will only be the one arch hook. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.405947704@infradead.org
2022-03-15x86/ibt,ftrace: Search for __fentry__ locationPeter Zijlstra
Currently a lot of ftrace code assumes __fentry__ is at sym+0. However with Intel IBT enabled the first instruction of a function will most likely be ENDBR. Change ftrace_location() to not only return the __fentry__ location when called for the __fentry__ location, but also when called for the sym+0 location. Then audit/update all callsites of this function to consistently use these new semantics. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.227581603@infradead.org
2021-12-08x86: Prepare inline-asm for straight-line-speculationPeter Zijlstra
Replace all ret/retq instructions with ASM_RET in preparation of making it more than a single instruction. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.964635458@infradead.org
2021-10-27ftrace: disable preemption when recursion locked王贇
As the documentation explained, ftrace_test_recursion_trylock() and ftrace_test_recursion_unlock() were supposed to disable and enable preemption properly, however currently this work is done outside of the function, which could be missing by mistake. And since the internal using of trace_test_and_set_recursion() and trace_clear_recursion() also require preemption disabled, we can just merge the logical. This patch will make sure the preemption has been disabled when trace_test_and_set_recursion() return bit >= 0, and trace_clear_recursion() will enable the preemption if previously enabled. Link: https://lkml.kernel.org/r/13bde807-779c-aa4c-0672-20515ae365ea@linux.alibaba.com CC: Petr Mladek <pmladek@suse.com> Cc: Guo Ren <guoren@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: Miroslav Benes <mbenes@suse.cz> Reported-by: Abaci <abaci@linux.alibaba.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> [ Removed extra line in comment - SDR ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30x86/kprobes: Fixup return address in generic trampoline handlerMasami Hiramatsu
In x86, the fake return address on the stack saved by __kretprobe_trampoline() will be replaced with the real return address after returning from trampoline_handler(). Before fixing the return address, the real return address can be found in the 'current->kretprobe_instances'. However, since there is a window between updating the 'current->kretprobe_instances' and fixing the address on the stack, if an interrupt happens at that timing and the interrupt handler does stacktrace, it may fail to unwind because it can not get the correct return address from 'current->kretprobe_instances'. This will eliminate that window by fixing the return address right before updating 'current->kretprobe_instances'. Link: https://lkml.kernel.org/r/163163057094.489837.9044470370440745866.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30x86/kprobes: Push a fake return address at kretprobe_trampolineMasami Hiramatsu
Change __kretprobe_trampoline() to push the address of the __kretprobe_trampoline() as a fake return address at the bottom of the stack frame. This fake return address will be replaced with the correct return address in the trampoline_handler(). With this change, the ORC unwinder can check whether the return address is modified by kretprobes or not. Link: https://lkml.kernel.org/r/163163054185.489837.14338744048957727386.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline()Josh Poimboeuf
Add UNWIND_HINT_FUNC on __kretprobe_trampoline() code so that ORC information is generated on the __kretprobe_trampoline() correctly. Also, this uses STACK_FRAME_NON_STANDARD_FP(), CONFIG_FRAME_POINTER- -specific version of STACK_FRAME_NON_STANDARD(). Link: https://lkml.kernel.org/r/163163049242.489837.11970969750993364293.stgit@devnote2 Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Make it harder to refer kretprobe_trampoline directlyMasami Hiramatsu
Since now there is kretprobe_trampoline_addr() for referring the address of kretprobe trampoline code, we don't need to access kretprobe_trampoline directly. Make it harder to refer by renaming it to __kretprobe_trampoline(). Link: https://lkml.kernel.org/r/163163045446.489837.14510577516938803097.stgit@devnote2 Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()Masami Hiramatsu
The __kretprobe_trampoline_handler() callback, called from low level arch kprobes methods, has the 'trampoline_address' parameter, which is entirely superfluous as it basically just replicates: dereference_kernel_function_descriptor(kretprobe_trampoline) In fact we had bugs in arch code where it wasn't replicated correctly. So remove this superfluous parameter and use kretprobe_trampoline_addr() instead. Link: https://lkml.kernel.org/r/163163044546.489837.13505751885476015002.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-09-30kprobes: treewide: Use 'kprobe_opcode_t *' for the code address in ↵Masami Hiramatsu
get_optimized_kprobe() Since get_optimized_kprobe() is only used inside kprobes, it doesn't need to use 'unsigned long' type for 'addr' parameter. Make it use 'kprobe_opcode_t *' for the 'addr' parameter and subsequent call of arch_within_optimized_kprobe() also should use 'kprobe_opcode_t *'. Note that MAX_OPTIMIZED_LENGTH and RELATIVEJUMP_SIZE are defined by byte-size, but the size of 'kprobe_opcode_t' depends on the architecture. Therefore, we must be careful when calculating addresses using those macros. Link: https://lkml.kernel.org/r/163163040680.489837.12133032364499833736.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-07-02Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "190 patches. Subsystems affected by this patch series: mm (hugetlb, userfaultfd, vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock, migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap, zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc, core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs, signals, exec, kcov, selftests, compress/decompress, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) ipc/util.c: use binary search for max_idx ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock ipc: use kmalloc for msg_queue and shmid_kernel ipc sem: use kvmalloc for sem_undo allocation lib/decompressors: remove set but not used variabled 'level' selftests/vm/pkeys: exercise x86 XSAVE init state selftests/vm/pkeys: refill shadow register after implicit kernel write selftests/vm/pkeys: handle negative sys_pkey_alloc() return code selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random kcov: add __no_sanitize_coverage to fix noinstr for all architectures exec: remove checks in __register_bimfmt() x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned hfsplus: report create_date to kstat.btime hfsplus: remove unnecessary oom message nilfs2: remove redundant continue statement in a while-loop kprobes: remove duplicated strong free_insn_page in x86 and s390 init: print out unknown kernel parameters checkpatch: do not complain about positive return values starting with EPOLL checkpatch: improve the indented label test checkpatch: scripts/spdxcheck.py now requires python3 ...
2021-07-01kprobes: remove duplicated strong free_insn_page in x86 and s390Barry Song
free_insn_page() in x86 and s390 is same with the common weak function in kernel/kprobes.c. Plus, the comment "Recover page to RW mode before releasing it" in x86 seems insensible to be there since resetting mapping is done by common code in vfree() of module_memfree(). So drop these two duplicated strong functions and related comment, then mark the common one in kernel/kprobes.c strong. Link: https://lkml.kernel.org/r/20210608065736.32656-1-song.bao.hua@hisilicon.com Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Qi Liu <liuqi115@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-28Merge tag 'x86-cleanups-2021-06-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups & removal of obsolete code" * tag 'x86-cleanups-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sgx: Correct kernel-doc's arg name in sgx_encl_release() doc: Remove references to IBM Calgary x86/setup: Document that Windows reserves the first MiB x86/crash: Remove crash_reserve_low_1M() x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options x86/alternative: Align insn bytes vertically x86: Fix leftover comment typos x86/asm: Simplify __smp_mb() definition x86/alternatives: Make the x86nops[] symbol static
2021-06-03kprobes: Do not increment probe miss count in the fault handlerNaveen N. Rao
Kprobes has a counter 'nmissed', that is used to count the number of times a probe handler was not called. This generally happens when we hit a kprobe while handling another kprobe. However, if one of the probe handlers causes a fault, we are currently incrementing 'nmissed'. The comment in fault handler indicates that this can be used to account faults taken by the probe handlers. But, this has never been the intention as is evident from the comment above 'nmissed' in 'struct kprobe': /*count the number of times this probe was temporarily disarmed */ unsigned long nmissed; Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20210601120150.672652-1-naveen.n.rao@linux.vnet.ibm.com
2021-06-01kprobes: Remove kprobe::fault_handlerPeter Zijlstra
The reason for kprobe::fault_handler(), as given by their comment: * We come here because instructions in the pre/post * handler caused the page_fault, this could happen * if handler tries to access user space by * copy_from_user(), get_user() etc. Let the * user-specified handler try to fix it first. Is just plain bad. Those other handlers are ran from non-preemptible context and had better use _nofault() functions. Also, there is no upstream usage of this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210525073213.561116662@infradead.org
2021-05-12x86: Fix leftover comment typosIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-04-27Merge tag 'x86_core_for_v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Borislav Petkov: - Turn the stack canary into a normal __percpu variable on 32-bit which gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferrably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. * tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, sched: Treat Intel SNC topology as default, COD as exception x86/cpu: Comment Skylake server stepping too x86/cpu: Resort and comment Intel models objtool/x86: Rewrite retpoline thunk calls objtool: Skip magical retpoline .altinstr_replacement objtool: Cache instruction relocs objtool: Keep track of retpoline call sites objtool: Add elf_create_undef_symbol() objtool: Extract elf_symbol_add() objtool: Extract elf_strtab_concat() objtool: Create reloc sections implicitly objtool: Add elf_create_reloc() helper objtool: Rework the elf_rebuild_reloc_section() logic objtool: Fix static_call list generation objtool: Handle per arch retpoline naming objtool: Correctly handle retpoline thunk calls x86/retpoline: Simplify retpolines x86/alternatives: Optimize optimize_nops() x86: Add insn_decode_kernel() x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration ...
2021-04-02Merge branch 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve ↵Ingo Molnar
a semantic conflict Conflict-merge this main commit in essence: a89dfde3dc3c: ("x86: Remove dynamic NOP selection") With this upstream commit: b90829704780: ("bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG") Semantic merge conflict: arch/x86/net/bpf_jit_comp.c - memcpy(prog, ideal_nops[NOP_ATOMIC5], X86_PATCH_SIZE); + memcpy(prog, x86_nops[5], X86_PATCH_SIZE); Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-04-02Merge tag 'v5.12-rc5' into WIP.x86/core, to pick up recent NOP related changesIngo Molnar
In particular we want to have this upstream commit: b90829704780: ("bpf: Use NOP_ATOMIC5 instead of emit_nops(&prog, 5) for BPF_TRAMP_F_CALL_ORIG") ... before merging in x86/cpu changes and the removal of the NOP optimizations, and applying PeterZ's !retpoline objtool series. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-31x86: Add insn_decode_kernel()Peter Zijlstra
Add a helper to decode kernel instructions; there's no point in endlessly repeating those last two arguments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210326151259.379242587@infradead.org
2021-03-25x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declarationWei Yongjun
Address this GCC warning: arch/x86/kernel/kprobes/core.c:940:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration] 940 | static int nokprobe_inline kprobe_is_ss(struct kprobe_ctlblk *kcb) | ^~~~~~ [ mingo: Tidied up the changelog. ] Fixes: 6256e668b7af: ("x86/kprobes: Use int3 instead of debug trap for single-step") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20210324144502.1154883-1-weiyongjun1@huawei.com
2021-03-25x86/kprobes: Fix to identify indirect jmp and others using range caseMasami Hiramatsu
Fix can_boost() to identify indirect jmp and others using range case correctly. Since the condition in switch statement is opcode & 0xf0, it can not evaluate to 0xff case. This should be under the 0xf0 case. However, there is no reason to use the conbinations of the bit-masked condition and lower bit checking. Use range case to clean up the switch statement too. Fixes: 6256e668b7 ("x86/kprobes: Use int3 instead of debug trap for single-step") Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/161666692308.1120877.4675552834049546493.stgit@devnote2
2021-03-25x86/kprobes: Fix to check non boostable prefixes correctlyMasami Hiramatsu
There are 2 bugs in the can_boost() function because of using x86 insn decoder. Since the insn->opcode never has a prefix byte, it can not find CS override prefix in it. And the insn->attr is the attribute of the opcode, thus inat_is_address_size_prefix( insn->attr) always returns false. Fix those by checking each prefix bytes with for_each_insn_prefix loop and getting the correct attribute for each prefix byte. Also, this removes unlikely, because this is a slow path. Fixes: a8d11cd0714f ("kprobes/x86: Consolidate insn decoder users for copying code") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/161666691162.1120877.2808435205294352583.stgit@devnote2
2021-03-23x86/kprobes: Use int3 instead of debug trap for single-stepMasami Hiramatsu
Use int3 instead of debug trap exception for single-stepping the probed instructions. Some instructions which change the ip registers or modify IF flags are emulated because those are not able to be single-stepped by int3 or may allow the interrupt while single-stepping. This actually changes the kprobes behavior. - kprobes can not probe following instructions; int3, iret, far jmp/call which get absolute address as immediate, indirect far jmp/call, indirect near jmp/call with addressing by memory (register-based indirect jmp/call are OK), and vmcall/vmlaunch/vmresume/vmxoff. - If the kprobe post_handler doesn't set before registering, it may not be called in some case even if you set it afterwards. (IOW, kprobe booster is enabled at registration, user can not change it) But both are rare issue, unsupported instructions will not be used in the kernel (or rarely used), and post_handlers are rarely used (I don't see it except for the test code). Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/161469874601.49483.11985325887166921076.stgit@devnote2
2021-03-23x86/kprobes: Identify far indirect JMP correctlyMasami Hiramatsu
Since Grp5 far indirect JMP is FF "mod 101 r/m", it should be (modrm & 0x38) == 0x28, and near indirect JMP is also 0x38 == 0x20. So we can mask modrm with 0x30 and check 0x20. This is actually what the original code does, it also doesn't care the last bit. So the result code is same. Thus, I think this is just a cosmetic cleanup. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/161469873475.49483.13257083019966335137.stgit@devnote2
2021-03-23x86/kprobes: Retrieve correct opcode for group instructionMasami Hiramatsu
Since the opcodes start from 0xff are group5 instruction group which is not 2 bytes opcode but the extended opcode determined by the MOD/RM byte. The commit abd82e533d88 ("x86/kprobes: Do not decode opcode in resume_execution()") used insn->opcode.bytes[1], but that is not correct. We have to refer the insn->modrm.bytes[1] instead. Fixes: abd82e533d88 ("x86/kprobes: Do not decode opcode in resume_execution()") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/161469872400.49483.18214724458034233166.stgit@devnote2
2021-03-18x86: Fix various typos in commentsIngo Molnar
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org
2021-03-16ftrace: Fix spelling mistake "disabed" -> "disabled"Colin Ian King
There is a spelling mistake in a comment, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-03-15x86: Remove dynamic NOP selectionPeter Zijlstra
This ensures that a NOP is a NOP and not a random other instruction that is also a NOP. It allows simplification of dynamic code patching that wants to verify existing code before writing new instructions (ftrace, jump_label, static_call, etc..). Differentiating on NOPs is not a feature. This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS). 32bit is not a performance target. Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL being required for x86_64, all x86_64 CPUs can use NOPL. So stop caring about NOPs, simplify things and get on with life. [ The problem seems to be that some uarchs can only decode NOPL on a single front-end port while others have severe decode penalties for excessive prefixes. All modern uarchs can handle both, except Atom, which has prefix penalties. ] [ Also, much doubt you can actually measure any of this on normal workloads. ] After this, FEATURE_NOPL is unused except for required-features for x86_64. FEATURE_K8 is only used for PTI. [ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> # bpf Acked-by: Linus Torvalds <torvalds@linuxfoundation.org> Link: https://lkml.kernel.org/r/20210312115749.065275711@infradead.org