summaryrefslogtreecommitdiff
path: root/kernel/tracepoint.c
AgeCommit message (Collapse)Author
2012-01-16tracepoints/module: Fix disabling tracepoints with taint CRAP or OOTSteven Rostedt
Tracepoints are disabled for tainted modules, which is usually because the module is either proprietary or was forced, and we don't want either of them using kernel tracepoints. But, a module can also be tainted by being in the staging directory or compiled out of tree. Either is fine for use with tracepoints, no need to punish them. I found this out when I noticed that my sample trace event module, when done out of tree, stopped working. Cc: stable@vger.kernel.org # 3.2 Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Dave Jones <davej@redhat.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-08-10Tracepoint: Dissociate from module mutexMathieu Desnoyers
Copy the information needed from struct module into a local module list held within tracepoint.c from within the module coming/going notifier. This vastly simplifies locking of tracepoint registration / unregistration, because we don't have to take the module mutex to register and unregister tracepoints anymore. Steven Rostedt ran into dependency problems related to modules mutex vs kprobes mutex vs ftrace mutex vs tracepoint mutex that seems to be hard to fix without removing this dependency between tracepoint and module mutex. (note: it should be investigated whether kprobes could benefit of being dissociated from the modules mutex too.) This also fixes module handling of tracepoint list iterators, because it was expecting the list to be sorted by pointer address. Given we have control on our own list now, it's OK to sort this list which has tracepoints as its only purpose. The reason why this sorting is required is to handle the fact that seq files (and any read() operation from user-space) cannot hold the tracepoint mutex across multiple calls, so list entries may vanish between calls. With sorting, the tracepoint iterator becomes usable even if the list don't contain the exact item pointed to by the iterator anymore. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Jason Baron <jbaron@redhat.com> CC: Ingo Molnar <mingo@elte.hu> CC: Lai Jiangshan <laijs@cn.fujitsu.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Thomas Gleixner <tglx@linutronix.de> CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20110810191839.GC8525@Krystal Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-04-04jump label: Introduce static_branch() interfaceJason Baron
Introduce: static __always_inline bool static_branch(struct jump_label_key *key); instead of the old JUMP_LABEL(key, label) macro. In this way, jump labels become really easy to use: Define: struct jump_label_key jump_key; Can be used as: if (static_branch(&jump_key)) do unlikely code enable/disale via: jump_label_inc(&jump_key); jump_label_dec(&jump_key); that's it! For the jump labels disabled case, the static_branch() becomes an atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(), atomic_dec() operations. We show testing results for this change below. Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct. Since we now require a 'struct jump_label_key *key', we can store a pointer into the jump table addresses. In this way, we can enable/disable jump labels, in basically constant time. This change allows us to completely remove the previous hashtable scheme. Thanks to Peter Zijlstra for this re-write. Testing: I ran a series of 'tbench 20' runs 5 times (with reboots) for 3 configurations, where tracepoints were disabled. jump label configured in avg: 815.6 jump label *not* configured in (using atomic reads) avg: 800.1 jump label *not* configured in (regular reads) avg: 803.4 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20110316212947.GA8792@redhat.com> Signed-off-by: Jason Baron <jbaron@redhat.com> Suggested-by: H. Peter Anvin <hpa@linux.intel.com> Tested-by: David Daney <ddaney@caviumnetworks.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-02-03tracepoints: Fix section alignment using pointer arrayMathieu Desnoyers
Make the tracepoints more robust, making them solid enough to handle compiler changes by not relying on anything based on compiler-specific behavior with respect to structure alignment. Implement an approach proposed by David Miller: use an array of const pointers to refer to the individual structures, and export this pointer array through the linker script rather than the structures per se. It will consume 32 extra bytes per tracepoint (24 for structure padding and 8 for the pointers), but are less likely to break due to compiler changes. History: commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() added the aligned(32) type and variable attribute to the tracepoint structures to deal with gcc happily aligning statically defined structures on 32-byte multiples. One attempt was to use a 8-byte alignment for tracepoint structures by applying both the variable and type attribute to tracepoint structures definitions and declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5. The reason is that the "aligned" attribute only specify the _minimum_ alignment for a structure, leaving both the compiler and the linker free to align on larger multiples. Because tracepoint.c expects the structures to be placed as an array within each section, up-alignment cause NULL-pointer exceptions due to the extra unexpected padding. (this patch applies on top of -tip) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: David S. Miller <davem@davemloft.net> LKML-Reference: <20110126222622.GA10794@Krystal> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de> CC: Andrew Morton <akpm@linux-foundation.org> CC: Peter Zijlstra <peterz@infradead.org> CC: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-10-18jump_label: Use more consistent namingPeter Zijlstra
Now that there's still only a few users around, rename things to make them more consistent. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101014203625.448565169@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-22jump label: Tracepoint support for jump labelsJason Baron
Make use of the jump label infrastructure for tracepoints. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <a9ba2056e2c9cf332c3c300b577463ce66ff23a8.1284733808.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-14tracing: Let tracepoints have data passed to tracepoint callbacksSteven Rostedt
This patch adds data to be passed to tracepoint callbacks. The created functions from DECLARE_TRACE() now need a mandatory data parameter. For example: DECLARE_TRACE(mytracepoint, int value, value) Will create the register function: int register_trace_mytracepoint((void(*)(void *data, int value))probe, void *data); As the first argument, all callbacks (probes) must take a (void *data) parameter. So a callback for the above tracepoint will look like: void myprobe(void *data, int value) { } The callback may choose to ignore the data parameter. This change allows callbacks to register a private data pointer along with the function probe. void mycallback(void *data, int value); register_trace_mytracepoint(mycallback, mydata); Then the mycallback() will receive the "mydata" as the first parameter before the args. A more detailed example: DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status)); /* In the C file */ DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status)); [...] trace_mytracepoint(status); /* In a file registering this tracepoint */ int my_callback(void *data, int status) { struct my_struct my_data = data; [...] } [...] my_data = kmalloc(sizeof(*my_data), GFP_KERNEL); init_my_data(my_data); register_trace_mytracepoint(my_callback, my_data); The same callback can also be registered to the same tracepoint as long as the data registered is different. Note, the data must also be used to unregister the callback: unregister_trace_mytracepoint(my_callback, my_data); Because of the data parameter, tracepoints declared this way can not have no args. That is: DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS()); will cause an error. If no arguments are needed, a new macro can be used instead: DECLARE_TRACE_NOARGS(mytracepoint); Since there are no arguments, the proto and args fields are left out. This is part of a series to make the tracepoint footprint smaller: text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4914025 1088868 861512 6864405 68be15 vmlinux.class 4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint Again, this patch also increases the size of the kernel, but lays the ground work for decreasing it. v5: Fixed net/core/drop_monitor.c to handle these updates. v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the #ifdef CONFIG_TRACE_POINTS, since the two are the same in both cases. The __DECLARE_TRACE() is what changes. Thanks to Frederic Weisbecker for pointing this out. v3: Made all register_* functions require data to be passed and all callbacks to take a void * parameter as its first argument. This makes the calling functions comply with C standards. Also added more comments to the modifications of DECLARE_TRACE(). v2: Made the DECLARE_TRACE() have the ability to pass arguments and added a new DECLARE_TRACE_NOARGS() for tracepoints that do not need any arguments. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-09-21trivial: fix typo "to to" in multiple filesAnand Gadiyar
Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-08-26tracing: Don't trace kernel thread syscallsHendrik Brueckner
Kernel threads don't call syscalls using the sysenter/sysexit path. Instead they directly call the sys_* or do_* functions that implement the syscalls inside the kernel. The current syscall tracepoints only bind the sysenter/sysexit path, then it has no effect to trace the kernel thread calls to syscalls in that path. Setting the TIF_SYSCALL_TRACEPOINT flag is then useless for these. Actually there is only one case when a kernel thread can reach the usual syscall exit tracing path: when we create a kernel thread, the child comes to ret_from_fork and is the fork() return is then traced. But this information alone is useless, then we don't want to set the TIF flags for these threads. Kernel threads have task_struct->mm set to NULL. (Thanks to Heiko for that hint ;-) The idea is then to check the mm field in syscall_regfunc() and set the flag accordingly. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> LKML-Reference: <20090825160237.GG4639@cetus.boeblingen.de.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26tracing: Move tracepoint callbacks from declaration to definitionJosh Stone
It's not strictly correct for the tracepoint reg/unreg callbacks to occur when a client is hooking up, because the actual tracepoint may not be present yet. This happens to be fine for syscall, since that's in the core kernel, but it would cause problems for tracepoints defined in a module that hasn't been loaded yet. It also means the reg/unreg has to be EXPORTed for any modules to use the tracepoint (as in SystemTap). This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces DEFINE_TRACE_FN which stores the callbacks in struct tracepoint. The callbacks are used now when the active state of the tracepoint changes in set_tracepoint & disable_tracepoint. This also introduces TRACE_EVENT_FN, so ftrace events can also provide registration callbacks if needed. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26tracing: Make syscall tracepoints conditionalJosh Stone
The syscall enter/exit tracepoints are only supported on archs that HAVE_SYSCALL_TRACEPOINTS, so the declarations should be #ifdef'ed. Also, the definition of syscall_regfunc and syscall_unregfunc should depend on this same config, rather than the ftrace-specific one. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <1251150194-1713-3-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-26tracing: Rename FTRACE_SYSCALLS for tracepointsJosh Stone
s/HAVE_FTRACE_SYSCALLS/HAVE_SYSCALL_TRACEPOINTS/g s/TIF_SYSCALL_FTRACE/TIF_SYSCALL_TRACEPOINT/g The syscall enter/exit tracing is no longer specific to just ftrace, so they now have names that reflect their tie to tracepoints instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> LKML-Reference: <1251150194-1713-2-git-send-email-jistone@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-08-25tracing: Eliminate code duplication in kernel/tracepoint.cAnirban Sinha
Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: "Oleg Nesterov" <oleg@tv-sign.ru> LKML-Reference: <DDFD17CC94A9BD49A82147DDF7D545C501EA9047@exchange.ZeugmaSystems.local> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-13tracing: Fix syscall tracing on !HAVE_FTRACE_SYSCALLS architecturesIngo Molnar
The new syscall_regfunc()/unregfunc() functions rely on the existence of TIF_SYSCALL_FTRACE - but that TIF flag is only offered by HAVE_FTRACE_SYSCALLS. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-11tracing: Add syscall tracepointsJason Baron
add two tracepoints in syscall exit and entry path, conditioned on TIF_SYSCALL_FTRACE. Supports the syscall trace event code. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-03-18tracepoints: dont update zero-sized tracepoint sectionsIngo Molnar
Zero-sized tracepoint sections can occur if tracing is enabled but no tracepoint is defined. Do not emit a warning in that case. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> LKML-Reference: <1237394936.3132.1.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18tracing: fix oops in tracepoint_update_probe_range()Jaswinder Singh Rajput
Change this crash: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8107d4de>] tracepoint_update_probe_range+0x1f/0x9b PGD 13d5fb067 PUD 13d688067 PMD 0 Oops: 0000 [#1] SMP To a more debuggable WARN_ONCE(). Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1237394936.3132.1.camel@localhost.localdomain> [ moved the check outside the lock and added a WARN_ON(). ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16markers/tracpoints: fix non-modular buildIngo Molnar
fix: kernel/marker.c: In function 'marker_module_notify': kernel/marker.c:905: error: 'MODULE_STATE_COMING' undeclared (first use in this function) [...] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16tracepoints: use modules notifiersMathieu Desnoyers
Impact: cleanup Use module notifiers for tracepoint updates rather than adding a hook in module.c. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16tracepoints: fix disableMathieu Desnoyers
Impact: fix race Set the probe array pointer to NULL when the tracepoint is disabled. The probe array point not being NULL could generate a race condition where the reader would dereference a freed pointer. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03tracepoint: introduce *_noupdate APIs.Lai Jiangshan
Impact: add new tracepoint APIs to allow the batched registration of probes new APIs separate tracepoint_probe_register(), tracepoint_probe_unregister() into 2 steps. The first step of them is just update tracepoint_entry, not connect or disconnect. this patch introduces tracepoint_probe_update_all() for update all. these APIs are very useful for registering lots of probes but just updating once. Another very important thing is that *_noupdate APIs do not require module_mutex. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03tracepoint: simplification for tracepoints using RCULai Jiangshan
Impact: simplify implementation Now, unused memory is handled by struct tp_probes. old code use these three field to handle unused memory. struct tracepoint_entry { ... struct rcu_head rcu; void *oldptr; unsigned char rcu_pending:1; ... }; in this way, unused memory is handled by struct tracepoint_entry. it bring reenter bug(it was fixed) and tracepoint.c is filled full of ".*rcu.*" code statements. this patch removes all these. and: rcu_barrier_sched() is removed. Do not need regain tracepoints_mutex after tracepoint_update_probes() several little cleanup. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-27tracepoint: check if the probe has been registeredFrederic Weisbecker
Impact: fix kernel crash that can trigger during tracing If we try to remove a probe that has not been already registered, the tracepoint_entry_remove_probe() function will dereference a NULL pointer. Check the probe before removing it to avoid crashes. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14tracepoints: fix reentrancyMathieu Desnoyers
The tracepoints had the same problem markers did have wrt reentrancy. Apply a similar fix using a rcu_barrier after each tracepoint mutex lock. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14tracepoints: use rcu schedMathieu Desnoyers
Make tracepoints use rcu sched. (cleanup) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14tracepoints: use TABLE_SIZE macroMathieu Desnoyers
Steven Rostedt suggested: | Wouldn't it look nicer to have: (TRACEPOINT_TABLE_SIZE - 1) ? Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-14tracing: Kernel TracepointsMathieu Desnoyers
Implementation of kernel tracepoints. Inspired from the Linux Kernel Markers. Allows complete typing verification by declaring both tracing statement inline functions and probe registration/unregistration static inline functions within the same macro "DEFINE_TRACE". No format string is required. See the tracepoint Documentation and Samples patches for usage examples. Taken from the documentation patch : "A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or "off" (no probe is attached). When a tracepoint is "off" it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section). When a tracepoint is "on", the function you provide is called each time the tracepoint is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the tracepoint site). You can put tracepoints at important locations in the code. They are lightweight hooks that can pass an arbitrary number of parameters, which prototypes are described in a tracepoint declaration placed in a header file." Addition and removal of tracepoints is synchronized by RCU using the scheduler (and preempt_disable) as guarantees to find a quiescent state (this is really RCU "classic"). The update side uses rcu_barrier_sched() with call_rcu_sched() and the read/execute side uses "preempt_disable()/preempt_enable()". We make sure the previous array containing probes, which has been scheduled for deletion by the rcu callback, is indeed freed before we proceed to the next update. It therefore limits the rate of modification of a single tracepoint to one update per RCU period. The objective here is to permit fast batch add/removal of probes on _different_ tracepoints. Changelog : - Use #name ":" #proto as string to identify the tracepoint in the tracepoint table. This will make sure not type mismatch happens due to connexion of a probe with the wrong type to a tracepoint declared with the same name in a different header. - Add tracepoint_entry_free_old. - Change __TO_TRACE to get rid of the 'i' iterator. Masami Hiramatsu <mhiramat@redhat.com> : Tested on x86-64. Performance impact of a tracepoint : same as markers, except that it adds about 70 bytes of instructions in an unlikely branch of each instrumented function (the for loop, the stack setup and the function call). It currently adds a memory read, a test and a conditional branch at the instrumentation site (in the hot path). Immediate values will eventually change this into a load immediate, test and branch, which removes the memory read which will make the i-cache impact smaller (changing the memory read for a load immediate removes 3-4 bytes per site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it also saves the d-cache hit). About the performance impact of tracepoints (which is comparable to markers), even without immediate values optimizations, tests done by Hideo Aoki on ia64 show no regression. His test case was using hackbench on a kernel where scheduler instrumentation (about 5 events in code scheduler code) was added. Quoting Hideo Aoki about Markers : I evaluated overhead of kernel marker using linux-2.6-sched-fixes git tree, which includes several markers for LTTng, using an ia64 server. While the immediate trace mark feature isn't implemented on ia64, there is no major performance regression. So, I think that we don't have any issues to propose merging marker point patches into Linus's tree from the viewpoint of performance impact. I prepared two kernels to evaluate. The first one was compiled without CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS. I downloaded the original hackbench from the following URL: http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c I ran hackbench 5 times in each condition and calculated the average and difference between the kernels. The parameter of hackbench: every 50 from 50 to 800 The number of CPUs of the server: 2, 4, and 8 Below is the results. As you can see, major performance regression wasn't found in any case. Even if number of processes increases, differences between marker-enabled kernel and marker- disabled kernel doesn't increase. Moreover, if number of CPUs increases, the differences doesn't increase either. Curiously, marker-enabled kernel is better than marker-disabled kernel in more than half cases, although I guess it comes from the difference of memory access pattern. * 2 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 4.811 | 4.872 | +0.061 | +1.27 | 100 | 9.854 | 10.309 | +0.454 | +4.61 | 150 | 15.602 | 15.040 | -0.562 | -3.6 | 200 | 20.489 | 20.380 | -0.109 | -0.53 | 250 | 25.798 | 25.652 | -0.146 | -0.56 | 300 | 31.260 | 30.797 | -0.463 | -1.48 | 350 | 36.121 | 35.770 | -0.351 | -0.97 | 400 | 42.288 | 42.102 | -0.186 | -0.44 | 450 | 47.778 | 47.253 | -0.526 | -1.1 | 500 | 51.953 | 52.278 | +0.325 | +0.63 | 550 | 58.401 | 57.700 | -0.701 | -1.2 | 600 | 63.334 | 63.222 | -0.112 | -0.18 | 650 | 68.816 | 68.511 | -0.306 | -0.44 | 700 | 74.667 | 74.088 | -0.579 | -0.78 | 750 | 78.612 | 79.582 | +0.970 | +1.23 | 800 | 85.431 | 85.263 | -0.168 | -0.2 | -------------------------------------------------------------- * 4 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.586 | 2.584 | -0.003 | -0.1 | 100 | 5.254 | 5.283 | +0.030 | +0.56 | 150 | 8.012 | 8.074 | +0.061 | +0.76 | 200 | 11.172 | 11.000 | -0.172 | -1.54 | 250 | 13.917 | 14.036 | +0.119 | +0.86 | 300 | 16.905 | 16.543 | -0.362 | -2.14 | 350 | 19.901 | 20.036 | +0.135 | +0.68 | 400 | 22.908 | 23.094 | +0.186 | +0.81 | 450 | 26.273 | 26.101 | -0.172 | -0.66 | 500 | 29.554 | 29.092 | -0.461 | -1.56 | 550 | 32.377 | 32.274 | -0.103 | -0.32 | 600 | 35.855 | 35.322 | -0.533 | -1.49 | 650 | 39.192 | 38.388 | -0.804 | -2.05 | 700 | 41.744 | 41.719 | -0.025 | -0.06 | 750 | 45.016 | 44.496 | -0.520 | -1.16 | 800 | 48.212 | 47.603 | -0.609 | -1.26 | -------------------------------------------------------------- * 8 CPUs Number of | without | with | diff | diff | processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] | -------------------------------------------------------------- 50 | 2.094 | 2.072 | -0.022 | -1.07 | 100 | 4.162 | 4.273 | +0.111 | +2.66 | 150 | 6.485 | 6.540 | +0.055 | +0.84 | 200 | 8.556 | 8.478 | -0.078 | -0.91 | 250 | 10.458 | 10.258 | -0.200 | -1.91 | 300 | 12.425 | 12.750 | +0.325 | +2.62 | 350 | 14.807 | 14.839 | +0.032 | +0.22 | 400 | 16.801 | 16.959 | +0.158 | +0.94 | 450 | 19.478 | 19.009 | -0.470 | -2.41 | 500 | 21.296 | 21.504 | +0.208 | +0.98 | 550 | 23.842 | 23.979 | +0.137 | +0.57 | 600 | 26.309 | 26.111 | -0.198 | -0.75 | 650 | 28.705 | 28.446 | -0.259 | -0.9 | 700 | 31.233 | 31.394 | +0.161 | +0.52 | 750 | 34.064 | 33.720 | -0.344 | -1.01 | 800 | 36.320 | 36.114 | -0.206 | -0.57 | -------------------------------------------------------------- Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: 'Peter Zijlstra' <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>