path: root/kernel/events/uprobes.c
Age / Commit message / Author
2012-09-29uprobes: Introduce copy_opcode(), kill read_opcode()Oleg Nesterov
No functional changes, preparations. 1. Extract the kmap-and-memcpy code from read_opcode() into the new trivial helper, copy_opcode(). The next patch will add another user. 2. read_opcode() becomes really trivial, fold it into its single caller, is_swbp_at_addr(). 3. Remove "auprobe" argument from write_opcode(), it is not used since f403072c6. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29uprobes: Kill set_swbp()->is_swbp_at_addr()Oleg Nesterov
A separate patch for better documentation. set_swbp()->is_swbp_at_addr() is not needed for correctness, it is harmless to do the unnecessary __replace_page(old_page, new_page) when these 2 pages are identical. And it can not be counted as optimization. mmap/register races are very unlikely, while in the likely case is_swbp_at_addr() adds the extra get_user_pages() even if the caller is uprobe_mmap(current->mm) and returns false. Note also that the semantics/usage of is_swbp_at_addr() in uprobe.c is confusing. set_swbp() uses it to detect the case when this insn was already modified by uprobes, that is why it should always compare the opcode with UPROBE_SWBP_INSN even if the hardware (like powerpc) has other trap insns. It doesn't matter if this breakpoint was in fact installed by gdb or application itself, we are going to "steal" this breakpoint anyway and execute the original insn from vm_file even if it no longer matches the memory. OTOH, handle_swbp()->find_active_uprobe() uses is_swbp_at_addr() to figure out whether we need to send SIGTRAP or not if we can not find uprobe, so in this case it should return true for all trap variants, not only for UPROBE_SWBP_INSN. This patch removes set_swbp()->is_swbp_at_addr(), the next patches will remove it from set_orig_insn() which is similar to set_swbp() in this respect. So the only caller will be handle_swbp() and we can make its semantics clear. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29uprobes: Restrict valid_vma(false) to skip VM_SHARED vmasOleg Nesterov
valid_vma(false) ignores ->vm_flags, this is not actually right. We should never try to write into MAP_SHARED mapping, this can confuse an application which actually writes to ->vm_file. With this patch valid_vma(false) ignores VM_WRITE only but checks other (immutable) bits checked by valid_vma(true). This can also speed up uprobe_munmap() and uprobe_unregister(). Note: even after this patch _unregister can confuse the probed application if it does mprotect(PROT_WRITE) after _register and installs "int3", but this is hardly possible to avoid and this doesn't differ from gdb case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
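The flag test described in this entry and the VM_MAYEXEC entry that follows can be sketched in plain userspace C. This is only a model under stated assumptions: `struct vma_flags_model` and `valid_vma_model()` are hypothetical names, the flag values are illustrative constants, and only the bits named in the two commit messages are modeled (the kernel function may inspect more).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the numeric values are assumptions of this sketch. */
#define VM_WRITE_MODEL   0x00000002UL
#define VM_SHARED_MODEL  0x00000008UL
#define VM_MAYEXEC_MODEL 0x00000040UL

/* Hypothetical stand-in for the fields of a vma that the check looks at. */
struct vma_flags_model {
    const void *vm_file;     /* non-NULL for a file-backed mapping */
    unsigned long vm_flags;
};

bool valid_vma_model(const struct vma_flags_model *vma, bool is_register)
{
    /* Bits inspected in both modes: may-execute and shared. */
    unsigned long mask = VM_MAYEXEC_MODEL | VM_SHARED_MODEL;

    assert(vma != NULL);

    /* Only the register path also looks at the current VM_WRITE bit. */
    if (is_register)
        mask |= VM_WRITE_MODEL;

    /* Of the inspected bits, exactly the may-execute bit must be set. */
    return vma->vm_file != NULL && (vma->vm_flags & mask) == VM_MAYEXEC_MODEL;
}
```

In this model a shared mapping is rejected in both modes, while a private mapping that is currently writable is rejected only when `is_register` is true.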
2012-09-29uprobes: Change valid_vma() to demand VM_MAYEXEC rather than VM_EXECOleg Nesterov
uprobe_register() or uprobe_mmap() requires VM_READ | VM_EXEC, this is not right. An application can do mprotect(PROT_EXEC) later and execute this code. Change valid_vma(is_register => true) to check VM_MAYEXEC instead. No need to check VM_MAYREAD, it is always set. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29uprobes: Change write_opcode() to use FOLL_FORCEOleg Nesterov
write_opcode()->get_user_pages() needs FOLL_FORCE to ensure we can read the page even if the probed task did mprotect(PROT_NONE) after uprobe_register(). Without FOLL_WRITE, FOLL_FORCE doesn't have any side effect but allows reading the !VM_READ memory. Otherwise the subsequent uprobe_unregister()->set_orig_insn() fails and we leak "int3". If that task does mprotect(PROT_READ | EXEC) and executes the probed insn later it will be killed. Note: in fact this is also needed for _register, see the next patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29uprobes: Move clear_thread_flag(TIF_UPROBE) to uprobe_notify_resume()Oleg Nesterov
Move clear_thread_flag(TIF_UPROBE) from do_notify_resume() to uprobe_notify_resume() for !CONFIG_UPROBES case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29uprobes: Kill UTASK_BP_HIT stateOleg Nesterov
Kill UTASK_BP_HIT state, it buys nothing but complicates the code. It is only used in uprobe_notify_resume() to decide who should be called, we can check utask->active_uprobe != NULL instead. And this allows us to simplify handle_swbp(), no need to clear utask->state. Likewise we could kill UTASK_SSTEP, but UTASK_BP_HIT is worse and imho should die. The problem is, it creates the special case when task->utask is NULL, we can't distinguish RUNNING and BP_HIT. With this patch utask == NULL always means RUNNING. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
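The decision this entry describes, choosing a handler from utask->active_uprobe instead of a state value, can be modeled as below. This is a hedged sketch and not the kernel function: `struct utask_model`, `enum resume_action` and `pick_resume_action()` are hypothetical names, and the mapping of "active uprobe present" to the single-step handler is an assumption drawn from the handler names in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct uprobe_model { int id; };
struct utask_model { struct uprobe_model *active_uprobe; };

enum resume_action {
    RESUME_HANDLE_SWBP,        /* treat the stop as a fresh breakpoint hit */
    RESUME_HANDLE_SINGLESTEP   /* finish the single-step of an active probe */
};

enum resume_action pick_resume_action(const struct utask_model *utask)
{
    /* No utask at all, or no probe in flight: the task was simply running. */
    if (utask == NULL || utask->active_uprobe == NULL)
        return RESUME_HANDLE_SWBP;

    /* A probe is in flight, so this resume completes its single-step. */
    return RESUME_HANDLE_SINGLESTEP;
}
```

Because a NULL utask and a utask without an active probe take the same branch, the model has no special case for a task whose utask was never allocated.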
2012-09-29uprobes: Fix UPROBE_SKIP_SSTEP checks in handle_swbp()Oleg Nesterov
If handle_swbp()->add_utask() fails but UPROBE_SKIP_SSTEP is set, the cleanup_ret: path does not restart the insn, this is wrong. Remove this check and add the additional label for can_skip_sstep() = T case. Note also that UPROBE_SKIP_SSTEP can be false positive, we simply can not trust it unless arch_uprobe_skip_sstep() was already called. Also, move another UPROBE_SKIP_SSTEP check before can_skip_sstep() into this helper, this looks cleaner and more understandable. Note: probably we should rename "skip" to "emulate" and I think that "clear UPROBE_SKIP_SSTEP" should be moved to arch_can_skip. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29uprobes: Do not setup ->active_uprobe/state prematurelyOleg Nesterov
handle_swbp() sets utask->active_uprobe before handler_chain(), and UTASK_SSTEP before pre_ssout(). This complicates the code for no reason, arch_ hooks or consumer->handler() should not (and can't) use this info. Change handle_swbp() to initialize them after pre_ssout(), and remove the no longer needed cleanup-utask code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-29uprobes: Do not leak UTASK_BP_HIT if find_active_uprobe() failsOleg Nesterov
If handle_swbp()->find_active_uprobe() fails we return with utask->state = UTASK_BP_HIT. Change handle_swbp() to reset utask->state at the start. Note that we do this unconditionally, see the next patch(es). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Introduce arch_uprobe_enable/disable_step()Sebastian Andrzej Siewior
As Oleg pointed out in [0] uprobe should not use the ptrace interface for enabling/disabling single stepping. [0] http://lkml.kernel.org/r/20120730141638.GA5306@redhat.com Add the new "__weak arch" helpers which simply call user_*_single_step() as a preparation. This is only needed to not break the powerpc port, we will fold this logic into arch_uprobe_pre/post_xol() hooks later. We should also change handle_singlestep(), _disable_step(&uprobe->arch) should be called before put_uprobe(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Teach find_active_uprobe() to clear MMF_HAS_UPROBESOleg Nesterov
The wrong MMF_HAS_UPROBES doesn't really hurt, it just triggers the "slow" and unnecessary handle_swbp() path if the task hits the non-uprobe breakpoint. So this patch changes find_active_uprobe() to check every valid vma and clear MMF_HAS_UPROBES if no uprobes were found. This adds the slow O(n) path, but it is only called in the unlikely case when the task hits the normal breakpoint for the first time after uprobe_unregister(). Note the "not strictly accurate" comment in mmf_recalc_uprobes(). We can fix this, we only need to teach vma_has_uprobes() to return a bit more info, but I am not sure this is worth the trouble. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Introduce MMF_RECALC_UPROBESOleg Nesterov
Add the new MMF_RECALC_UPROBES flag, it means that MMF_HAS_UPROBES can be false positive after remove_breakpoint() or uprobe_munmap(). It is also set by uprobe_dup_mmap(), this is not optimal but simple. We could add the new hook, uprobe_dup_vma(), to set MMF_HAS_UPROBES only if the new mm actually has uprobes, but I don't think this makes sense. The next patch will use this flag to clear MMF_HAS_UPROBES. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: uprobes_treelock should not disable irqsOleg Nesterov
Nobody plays with uprobes_tree/uprobes_treelock in interrupt context, no need to disable irqs. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Don't put NULL pointer in uprobe_register()Sebastian Andrzej Siewior
alloc_uprobe() might return a NULL pointer, put_uprobe() can't deal with this. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Remove "verify" argument from set_orig_insn()Oleg Nesterov
Nobody does set_orig_insn(verify => false), and I think nobody will. Remove this argument. IIUC set_orig_insn(verify => false) was needed to single-step without xol area. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Fold uprobe_reset_state() into uprobe_dup_mmap()Oleg Nesterov
Now that we have uprobe_dup_mmap() we can fold uprobe_reset_state() into the new hook and remove it. mmput()->uprobe_clear_state() can't be called before dup_mmap(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Introduce MMF_HAS_UPROBESOleg Nesterov
Add the new MMF_HAS_UPROBES flag. It is set by install_breakpoint() and it is copied by dup_mmap(), uprobe_pre_sstep_notifier() checks it to avoid the slow path if the task was never probed. Perhaps it makes sense to check it in valid_vma(is_register => false) as well. This needs the new dup_mmap()->uprobe_dup_mmap() hook. We can't use uprobe_reset_state() or put MMF_HAS_UPROBES into MMF_INIT_MASK, we need oldmm->mmap_sem to avoid the race with uprobe_register() or mmap() from another thread. Currently we never clear this bit, it can be false-positive after uprobe_unregister() or uprobe_munmap() or if dup_mmap() hits the probed VM_DONTCOPY vma. But this is fine correctness-wise and has no effect unless the task hits the non-uprobe breakpoint. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Do not use -EEXIST in install_breakpoint() pathsOleg Nesterov
-EEXIST from install_breakpoint() no longer makes sense, all callers should simply treat it as "success". Change the code to return zero and simplify register_for_each_vma(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Change uprobe_mmap() to ignore the errors but check fatal_signal_pending()Oleg Nesterov
Once install_breakpoint() fails uprobe_mmap() "ignores" all other uprobes and returns the error. It was never really needed to stop after the first error, and in fact it was always wrong at least in -ENOTSUPP case. Change uprobe_mmap() to ignore the errors and always return 0. This is not what we want in the long term, but until we teach the callers to handle the failure it would be better to remove the pointless complications. And this doesn't look too bad, the only "reasonable" error is ENOMEM but in this case the caller should be oom-killed in the likely case or the system has more serious problems. However it makes sense to stop if fatal_signal_pending() == T. In particular this helps if the task was oom-killed. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Kill dup_mmap()->uprobe_mmap(), simplify uprobe_mmap/munmapOleg Nesterov
1. Kill dup_mmap()->uprobe_mmap(), it was only needed to calculate new_mm->uprobes_state.count removed by the previous patch. If the forking process has a pending uprobe (int3) in vma, it will be copied by copy_page_range(), note that it checks vma->anon_vma so "Don't copy ptes" is not possible after install_breakpoint() which does anon_vma_prepare(). 2. Remove is_swbp_at_addr() and "int count" in uprobe_mmap(). Again, this was needed for uprobes_state.count. As a side effect this fixes the bug pointed out by Srikar, this code lacked the necessary put_uprobe(). 3. uprobe_munmap() becomes a nop after the previous patch. Remove the meaningless code but do not remove the helper, we will need it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Kill uprobes_state->countOleg Nesterov
uprobes_state->count is only needed to avoid the slow path in uprobe_pre_sstep_notifier(). It is also checked in uprobe_munmap() but ironically its only goal is to decrement this counter. However, it is very broken. Just some examples: - uprobe_mmap() can race with uprobe_unregister() and wrongly increment the counter if it hits the non-uprobe "int3". Note that install_breakpoint() checks ->consumers first and returns -EEXIST if it is NULL. "atomic_sub() if error" in uprobe_mmap() looks obviously wrong too. - uprobe_munmap() can race with uprobe_register() and wrongly decrement the counter for the same reason. - Suppose an application tries to increase the mmapped area via sys_mremap(). vma_adjust() does uprobe_munmap(whole_vma) first, this can nullify the counter temporarily and race with another thread which can hit the bp, the application will be killed by SIGTRAP. - Suppose an application mmaps 2 consecutive areas in the same file and one (or both) of these areas has uprobes. In the likely case mmap_region()->vma_merge() succeeds. Like above, this leads to uprobe_munmap/uprobe_mmap from vma_merge()->vma_adjust() but then mmap_region() does another uprobe_mmap(resulting_vma) and doubles the counter. This patch only removes this counter and fixes the compile errors, then we will try to cleanup the changed code and add something else instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-08-28uprobes: Remove check for uprobe variable in handle_swbp()Sebastian Andrzej Siewior
By the time we get here (after we pass cleanup_ret) uprobe is always set. If it is NULL we leave very early in the code. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2012-08-28uprobes: Remove redundant lock_page/unlock_pageSrikar Dronamraju
Since read_opcode() reads from the referenced page and modifies neither the page contents nor the page attributes, there is no need to lock the page. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2012-07-30uprobes: __replace_page() needs munlock_vma_page()Oleg Nesterov
Like do_wp_page(), __replace_page() should do munlock_vma_page() for the case when the old page still has other !VM_LOCKED mappings. Unfortunately this needs mm/internal.h. Also, move put_page() outside of ptl lock. This doesn't really matter but looks a bit better. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182249.GA20372@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Rename vma_address() and make it return "unsigned long"Oleg Nesterov
1. vma_address() returns loff_t, this looks confusing and this is unnecessary after the previous change. Make it return "ulong", all callers truncate the result anyway. 2. Its name conflicts with mm/rmap.c:vma_address(), rename it to offset_to_vaddr(), this matches vaddr_to_offset(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182247.GA20365@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Fix register_for_each_vma()->vma_address() checkOleg Nesterov
1. register_for_each_vma() checks that vma_address() == vaddr, but this is not enough. We should also ensure that vaddr >= vm_start, find_vma() guarantees "vaddr < vm_end" only. 2. After the previous changes, register_for_each_vma() is the only reason why vma_address() has to return loff_t, all other users know that we have the valid mapping at this offset and thus the overflow is not possible. Change the code to use vaddr_to_offset() instead, imho this looks more clean/understandable and now we can change vma_address(). 3. While at it, remove the unnecessary type-cast. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182242.GA20362@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Introduce vaddr_to_offset(vma, vaddr)Oleg Nesterov
Add the new helper, vaddr_to_offset(vma, vaddr) which returns the offset in vma->vm_file this vaddr is mapped at. Change build_probe_list() and find_active_uprobe() to use the new helper, the next patch adds another user. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182242.GA20355@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
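The address arithmetic behind vaddr_to_offset(), and its inverse that the later rename entry in this log calls offset_to_vaddr(), can be illustrated with a small userspace model. Everything here is an assumption made for the sketch: the `struct vma_range_model` type, the `_model` function names, fixed-width integer types in place of the kernel's, and a 4 KiB page size.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12   /* assumes 4 KiB pages */

/* Hypothetical stand-in for the fields of a vma used by the helpers. */
struct vma_range_model {
    uint64_t vm_start;   /* first virtual address of the mapping */
    uint64_t vm_end;     /* one past the last virtual address */
    uint64_t vm_pgoff;   /* file offset of vm_start, counted in pages */
};

/* File offset that the given virtual address is mapped at. */
int64_t vaddr_to_offset_model(const struct vma_range_model *vma, uint64_t vaddr)
{
    /* The caller must already know that vaddr lies inside the mapping. */
    assert(vaddr >= vma->vm_start && vaddr < vma->vm_end);
    return ((int64_t)vma->vm_pgoff << PAGE_SHIFT_MODEL)
           + (int64_t)(vaddr - vma->vm_start);
}

/* Inverse direction: virtual address that maps the given file offset. */
uint64_t offset_to_vaddr_model(const struct vma_range_model *vma, int64_t offset)
{
    return vma->vm_start + (uint64_t)offset
           - (vma->vm_pgoff << PAGE_SHIFT_MODEL);
}
```

For a mapping that starts at 0x400000 with a page offset of 3, the address 0x400010 corresponds to file offset 0x3010 in this model, and converting back returns the original address.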
2012-07-30uprobes: Teach build_probe_list() to consider the rangeOleg Nesterov
Currently build_probe_list() builds the list of all uprobes attached to the given inode, and the caller should filter out those who don't fall into the [start,end) range, this is sub-optimal. This patch turns find_least_offset_node() into find_node_in_range() which returns the first node inside the [min,max] range, and changes build_probe_list() to use this node as a starting point for rb_prev() and rb_next() to find all other nodes the caller needs. The resulting list is no longer sorted but we do not care. This can speed up both build_probe_list() and the callers, but there is another reason to introduce find_node_in_range(). It can be used to figure out whether the given vma has uprobes or not, this will be needed soon. While at it, shift INIT_LIST_HEAD(tmp_list) into build_probe_list(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182240.GA20352@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
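The search strategy in this entry (find any one node inside the range, then walk to its neighbours in both directions) can be sketched with a sorted array standing in for the rb-tree. This is an illustration under assumptions, not the kernel code: `find_in_range()` and `collect_range()` are hypothetical names, and array indices play the role of rb_prev() and rb_next().

```c
#include <assert.h>
#include <stddef.h>

/* Index of some element inside [min, max], or -1 if there is none.
 * The array must be sorted in ascending order. */
ptrdiff_t find_in_range(const long *offsets, size_t n, long min, long max)
{
    size_t lo = 0, hi = n;

    assert(min <= max);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (offsets[mid] < min)
            lo = mid + 1;          /* everything up to mid is too small */
        else if (offsets[mid] > max)
            hi = mid;              /* everything from mid on is too large */
        else
            return (ptrdiff_t)mid; /* first hit found during the descent */
    }
    return -1;
}

/* Copies every element inside [min, max] into out and returns the count.
 * The output starts at the hit and is therefore not sorted. */
size_t collect_range(const long *offsets, size_t n, long min, long max, long *out)
{
    ptrdiff_t hit = find_in_range(offsets, n, min, max);
    size_t count = 0;

    if (hit < 0)
        return 0;

    /* Walk backwards from the hit, as rb_prev() would. */
    for (ptrdiff_t i = hit; i >= 0 && offsets[i] >= min; i--)
        out[count++] = offsets[i];

    /* Walk forwards from the element after the hit, as rb_next() would. */
    for (size_t i = (size_t)hit + 1; i < n && offsets[i] <= max; i++)
        out[count++] = offsets[i];

    return count;
}
```

As in the commit message, the collected list is complete but unordered, and a return value of zero answers the "does this range contain any probe" question without building a list.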
2012-07-30uprobes: Fix overflow in vma_address()/find_active_uprobe()Oleg Nesterov
vma->vm_pgoff is "unsigned long", it should be promoted to loff_t before the multiplication to avoid the overflow. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182233.GA20339@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
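The overflow described here is easy to reproduce in userspace. In the sketch below `uint32_t` stands in for a 32-bit "unsigned long" and `int64_t` for loff_t; the function names and the 4 KiB page size are assumptions of the example, and the scaling by the page size is written as a shift.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_MODEL 12   /* assumes 4 KiB pages */

/* Scaling done in the narrow type: the high bits are lost. */
uint32_t pgoff_to_offset_narrow(uint32_t pgoff)
{
    return pgoff << PAGE_SHIFT_MODEL;
}

/* Widening first, as the fix does, keeps the full byte offset. */
int64_t pgoff_to_offset_wide(uint32_t pgoff)
{
    return (int64_t)pgoff << PAGE_SHIFT_MODEL;
}
```

With a page offset of 0x100000 (a mapping that begins 4 GiB into the file) the narrow version wraps to 0, while the widened version yields 4294967296.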
2012-07-30uprobes: Suppress uprobe_munmap() from mmput()Oleg Nesterov
uprobe_munmap() does get_user_pages() and it is also called from the final mmput()->exit_mmap() path. This slows down exit/mmput() for no reason, and I think it is simply dangerous/wrong to try to fault-in a page into the dying mm. If nothing else, this happens after the last sync_mm_rss(), afaics handle_mm_fault() can change the task->rss_stat and make the subsequent check_mm() unhappy. Change uprobe_munmap() to check mm->mm_users != 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182231.GA20336@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()Oleg Nesterov
The bug was introduced by me in 449d0d7c ("uprobes: Simplify the usage of uprobe->pending_list"). Yes, we do not care about uprobe->pending_list after return and nobody can remove the current list entry, but put_uprobe(uprobe) can actually free it and thus we need list_for_each_safe(). Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Link: http://lkml.kernel.org/r/20120729182229.GA20329@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
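The reason a "safe" iterator is needed can be shown with a minimal userspace list. This sketch does not use the kernel's list macros; `struct probe_node`, `put_probe()`, `push_probe()` and `put_all()` are hypothetical names. The point is only that the next pointer has to be read before the current node may be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted list node. */
struct probe_node {
    struct probe_node *next;
    int refs;
};

/* Drops one reference and frees the node when the count reaches zero. */
void put_probe(struct probe_node *p)
{
    assert(p->refs > 0);
    if (--p->refs == 0)
        free(p);
}

/* Prepends a new node holding the given number of references. */
struct probe_node *push_probe(struct probe_node *head, int refs)
{
    struct probe_node *p = malloc(sizeof(*p));

    if (p == NULL)
        abort();
    p->next = head;
    p->refs = refs;
    return p;
}

/* Drops one reference on every node and returns how many were visited. */
size_t put_all(struct probe_node *head)
{
    size_t visited = 0;
    struct probe_node *p = head;

    while (p != NULL) {
        /* Save the successor first: put_probe() may free p. */
        struct probe_node *next = p->next;

        put_probe(p);
        visited++;
        p = next;
    }
    return visited;
}
```

A loop that advanced with `p = p->next` after calling `put_probe(p)` would read freed memory whenever the reference just dropped was the last one.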
2012-07-30uprobes: Clean up and document write_opcode()->lock_page(old_page)Oleg Nesterov
The comment above write_opcode()->lock_page(old_page) tells about the race with do_wp_page(). I don't really understand which exactly race it means, but afaics this lock_page() was not enough to close all races with do_wp_page(). Anyway, since: 77fc4af1b59d uprobes: Change register_for_each_vma() to take mm->mmap_sem for writing this code is always called with ->mmap_sem held for writing, so we can forget about do_wp_page(). However, we can't simply remove this lock_page(), and the only (afaics) reason is __replace_page()->try_to_free_swap(). Nothing in write_opcode() needs it, move it into __replace_page() and fix the comment. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182220.GA20322@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Kill write_opcode()->lock_page(new_page)Oleg Nesterov
write_opcode() does lock_page(new_page) for no reason. Nobody can see this page until __replace_page() exposes it under ptl lock, and we do nothing with this page after pte_unmap_unlock(). If nothing else, the similar code in do_wp_page() doesn't lock the new page for page_add_new_anon_rmap/set_pte_at_notify. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182218.GA20315@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: __replace_page() should not use page_address_in_vma()Oleg Nesterov
page_address_in_vma(old_page) in __replace_page() is ugly and wrong. The caller already knows the correct virtual address, this page was found by get_user_pages(vaddr). However, page_address_in_vma() can actually fail if page->mapping was cleared by __delete_from_page_cache() after get_user_pages() returns. But this means the race with page reclaim, write_opcode() should not fail, it should retry and read this page again. Probably the race with remove_mapping() is not possible due to page_freeze_refs() logic, but afaics at least shmem_writepage()->shmem_delete_from_page_cache() can clear ->mapping. We could change __replace_page() to return -EAGAIN in this case, but it would be better to simply use the caller's vaddr and rely on page_check_address(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182216.GA20311@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-30uprobes: Don't recheck vma/f_mapping in write_opcode()Oleg Nesterov
write_opcode() rechecks valid_vma() and ->f_mapping, this is pointless. The caller, register_for_each_vma() or uprobe_mmap(), has already done these checks under mmap_sem. To clarify, uprobe_mmap() checks valid_vma() only, but we can rely on build_probe_list(vm_file->f_mapping->host). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar.vnet.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120729182212.GA20304@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Remove the unnecessary initialization in add_utask()Oleg Nesterov
Trivial cleanup. No need to nullify ->active_uprobe after kzalloc(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120615154401.GA9633@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: __copy_insn() needs "loff_t offset"Oleg Nesterov
1. __copy_insn() needs "loff_t offset", not "unsigned long", to read the file. 2. use pgoff_t for "idx" and remove the unnecessary typecast. 3. fix the typo, "&=" is not what we want 4. can't resist, rename off1 to off. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154359.GA9625@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Don't use loff_t for the valid virtual addressOleg Nesterov
loff_t looks confusing when it is used for the virtual address. Change map_info and install_breakpoint/remove_breakpoint paths to use "unsigned long". The patch doesn't change vma_address(), it can't return "long" because it is used to verify the mapping. But probably this needs some cleanups too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Anton Arapov <anton@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154355.GA9622@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Simplify the usage of uprobe->pending_listOleg Nesterov
uprobe->pending_list is only used to create the temporary list, it has no meaning after we drop uprobes_mmap_hash(inode). No need to initialize this node or remove it from tmp_list, and we can use list_for_each_entry(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120615154353.GA9614@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Move BUG_ON(UPROBE_SWBP_INSN_SIZE) from write_opcode() to install_breakpoint()Oleg Nesterov
write_opcode() ensures that UPROBE_SWBP_INSN doesn't cross the page boundary. This looks a bit confusing, the check does not depend on vaddr and it is enough to do it only once right after install_breakpoint()->arch_uprobe_analyze_insn(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154350.GA9611@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: No need to re-check vma_address() in write_opcode()Oleg Nesterov
write_opcode() is called by register_for_each_vma() and uprobe_mmap() paths. In both cases the caller has already verified this vaddr under mmap_sem, no need to re-check. Note also that this check is wrong anyway, we should not truncate loff_t returned by vma_address() if we do not trust this mapping. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154347.GA9604@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Copy_insn() should not return -ENOMEM if __copy_insn() failsOleg Nesterov
copy_insn() returns -ENOMEM if the first __copy_insn() fails, it should return the correct error code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154344.GA9601@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Copy_insn() shouldn't depend on mm/vma/vaddrOleg Nesterov
1. copy_insn() doesn't need "addr", it can use uprobe->offset. Remove this argument. 2. Change copy_insn/__copy_insn to accept "struct file*" instead of vma. copy_insn() is called only once and mm/vma/vaddr are random, it shouldn't depend on them. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154342.GA9598@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Document uprobe_register() vs uprobe_mmap() racePeter Zijlstra
Because the mind is treacherous and makes us forget we need to write stuff down. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120615154339.GA9591@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Change build_map_info() to try kmalloc(GFP_NOWAIT) firstOleg Nesterov
build_map_info() doesn't allocate the memory under i_mmap_mutex to avoid the deadlock with page reclaim. But it can try GFP_NOWAIT first, it should work in the likely case and thus we almost never need the pre-alloc-and-retry path. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Link: http://lkml.kernel.org/r/20120615154336.GA9588@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Rework register_for_each_vma() to make it O(n)Oleg Nesterov
Currently register_for_each_vma() is O(n ** 2) + O(n ** 3), every time find_next_vma_info() "restarts" the vma_prio_tree_foreach() loop and each iteration rechecks the whole try_list. This also means that try_list can grow "indefinitely" if register/unregister races with munmap/mmap activity even if the number of mapping is bounded at any time. With this patch register_for_each_vma() builds the list of mm/vaddr structures only once and does install_breakpoint() for each entry. We do not care about the new mappings which can be created after build_map_info() drops mapping->i_mmap_mutex, uprobe_mmap() should do its work. Note that we do not allocate map_info under i_mmap_mutex, this can deadlock with page reclaim (but see the next patch). So we use 2 lists, "curr" which we are going to return, and "prev" which holds the already allocated memory. The main loop deques the entry from "prev" (initially it is empty), and if "prev" becomes empty again it counts the number of entries we need to pre-allocate outside of i_mmap_mutex. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Link: http://lkml.kernel.org/r/20120615154333.GA9581@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Install_breakpoint() should fail if is_swbp_insn() == TOleg Nesterov
install_breakpoint() returns -EEXIST if is_swbp_insn(orig_insn) == T, the caller treats this code as success. This is doubly wrong. The successful return should set UPROBE_COPY_INSN, but the real problem is that it shouldn't succeed. If the probed insn is int3 the application should get SIGTRAP, this won't happen with uprobe. Probably we can fix this, we can add the UPROBE_SHARED_BP flag and teach handle_swbp/set_orig_insn to handle this case correctly. But this needs some complications and we have other insns which can't be probed, lets make a simple fix for now. I think this needs a cleanup. UPROBE_COPY_INSN should die, copy_insn() should be called by alloc_uprobe(). arch_uprobe_analyze_insn() depends on ->mm (ia32_compat) but it is called only once. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154331.GA9578@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-16uprobes: Write_opcode()->__replace_page() can race with try_to_unmap()Oleg Nesterov
write_opcode() gets old_page via get_user_pages() and then calls __replace_page() which assumes that this old_page is still mapped after pte_offset_map_lock(). This is not true if this old_page was already try_to_unmap()'ed, and in this case everything __replace_page() does with old_page is wrong. Just for example, put_page() is not balanced. I think it is possible to teach __replace_page() to handle this unlikely case correctly, but this patch simply changes it to use page_check_address() and return -EAGAIN if it fails. The caller should notice this error code and retry. Note: write_opcode() asks for the cleanups, I'll try to do this in a separate patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154328.GA9571@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
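The "return -EAGAIN and let the caller retry" contract can be sketched as follows. This is a userspace illustration with hypothetical names (`replace_fn`, `write_with_retry()`, `flaky_replace()`). The retry bound `max_tries` is an assumption added so the example always terminates; the commit message does not say whether the caller's retry is bounded.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical replace step: returns 0, -EAGAIN, or another negative errno. */
typedef int (*replace_fn)(void *ctx);

/* Calls the replace step again for as long as it reports -EAGAIN. */
int write_with_retry(replace_fn replace, void *ctx, int max_tries)
{
    int ret = -EAGAIN;

    assert(max_tries > 0);
    for (int i = 0; i < max_tries && ret == -EAGAIN; i++)
        ret = replace(ctx);

    /* Either success, a hard error, or -EAGAIN after the last attempt. */
    return ret;
}

/* Demo step: fails with -EAGAIN until the counter behind ctx reaches zero. */
int flaky_replace(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -EAGAIN;
    }
    return 0;
}
```

Any result other than -EAGAIN, whether success or a different error, ends the loop immediately and is passed back unchanged.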
2012-06-16uprobes: __copy_insn() should ensure a_ops->readpage != NULLOleg Nesterov
__copy_insn() blindly calls read_mapping_page(), this will crash the kernel if ->readpage == NULL, add the necessary check. For example, hugetlbfs_aops->readpage is NULL. Perhaps we should change read_mapping_page() instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154325.GA9568@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
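The guard this entry adds amounts to checking a function pointer before calling through it. A minimal sketch, assuming hypothetical names (`struct aops_model`, `read_page_checked()`, `demo_readpage()`) and an assumed error code of -EIO, since the commit message does not say which error is returned:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an address-space operations table. */
struct aops_model {
    int (*readpage)(void *page);   /* may be NULL for some filesystems */
};

/* Reads a page through the table, refusing tables without a readpage hook. */
int read_page_checked(const struct aops_model *aops, void *page)
{
    assert(aops != NULL);

    /* Calling through a NULL hook would crash, so fail cleanly instead. */
    if (aops->readpage == NULL)
        return -EIO;   /* error code chosen for this sketch */

    return aops->readpage(page);
}

/* Demo hook that always succeeds. */
int demo_readpage(void *page)
{
    (void)page;
    return 0;
}
```

A table whose hook is NULL, as the message says is the case for hugetlbfs, takes the early return instead of dereferencing the pointer.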