diff options
author | Alexei Starovoitov <ast@kernel.org> | 2022-11-20 09:16:21 -0800 |
---|---|---|
committer | Alexei Starovoitov <ast@kernel.org> | 2022-11-20 09:16:21 -0800 |
commit | efc1970d683fa7c53a2bc561d40436bf11a18dc0 (patch) | |
tree | de1d9429ce4cf0c4308ffda5fb17f0f61b25829f /kernel | |
parent | ee748cd95e3adf4acdb05194b2ea68e4073e09b6 (diff) | |
parent | fe147956fca4604b920e6be652abc9bea8ce8952 (diff) |
Merge branch 'Support storing struct task_struct objects as kptrs'
David Vernet says:
====================
Now that BPF supports adding new kernel functions with kfuncs, and
storing kernel objects in maps with kptrs, we can add a set of kfuncs
which allow struct task_struct objects to be stored in maps as
referenced kptrs.
The possible use cases for doing this are plentiful. During tracing,
for example, it would be useful to be able to collect some tasks that
performed a certain operation, and then periodically summarize who they
are, which cgroup they're in, how much CPU time they've utilized, etc.
Doing this now would require storing the tasks' pids along with some
relevant data to be exported to user space, and later associating the
pids to tasks in other event handlers where the data is recorded.
Another useful by-product of this is that it allows a program to pin a
task in a BPF program, and by proxy therefore also e.g. pin its task
local storage.
In order to support this, we'll need to expand KF_TRUSTED_ARGS to
support receiving trusted, non-refcounted pointers. It currently only
supports either PTR_TO_CTX pointers, or refcounted pointers. What this
means in terms of the implementation is that check_kfunc_args() would
have to also check for the PTR_TRUSTED or MEM_ALLOC type modifiers when
determining if a trusted KF_ARG_PTR_TO_ALLOC_BTF_ID or
KF_ARG_PTR_TO_BTF_ID pointer requires a refcount.
Note that PTR_UNTRUSTED is insufficient for this purpose, as it does not
cover all of the possible types of potentially unsafe pointers. For
example, a pointer obtained from walking a struct is not PTR_UNTRUSTED.
To account for this and enable us to expand KF_TRUSTED_ARGS to include
allow-listed arguments such as those passed by the kernel to tracepoints
and struct_ops callbacks, this patch set also introduces a new
PTR_TRUSTED type flag modifier which records if a pointer was obtained
passed from the kernel in a trusted context.
Currently, both PTR_TRUSTED and MEM_ALLOC are used to imply that a
pointer is trusted. Longer term, PTR_TRUSTED should be the sole source
of truth for whether a pointer is trusted. This requires us to set
PTR_TRUSTED when appropriate (e.g. when setting MEM_ALLOC), and unset it
when appropriate (e.g. when setting PTR_UNTRUSTED). We don't do that in
this patch, as we need to do more clean up before this can be done in a
clear and well-defined manner.
In closing, this patch set:
1. Adds the new PTR_TRUSTED register type modifier flag, and updates the
verifier and existing selftests accordingly. Also expands
KF_TRUSTED_ARGS to also include trusted pointers that were not obtained
from walking structs.
2. Adds a new set of kfuncs that allows struct task_struct* objects to be
used as kptrs.
3. Adds a new selftest suite to validate these new task kfuncs.
---
Changelog:
v8 -> v9:
- Moved check for release register back to where we check for
!PTR_TO_BTF_ID || socket. Change the verifier log message to
reflect really what's being tested (the presence of unsafe
modifiers) (Alexei)
- Fix verifier_test error tests to reflect above changes
- Remove unneeded parens around bitwise operator checks (Alexei)
- Move updates to reg_type_str() which allow multiple type modifiers
to be present in the prefix string, to a separate patch (Alexei)
- Increase TYPE_STR_BUF_LEN size to 128 to reflect larger prefix size
in reg_type_str().
v7 -> v8:
- Rebased onto Kumar's latest patch set which, adds a new MEM_ALLOC reg
type modifier for bpf_obj_new() calls.
- Added comments to bpf_task_kptr_get() describing some of the subtle
races we're protecting against (Alexei and John)
- Slightly rework process_kf_arg_ptr_to_btf_id(), and add a new
reg_has_unsafe_modifiers() function which validates that a register
containing a kfunc release arg doesn't have unsafe modifiers. Note
that this is slightly different than the check for KF_TRUSTED_ARGS.
An alternative here would be to treat KF_RELEASE as implicitly
requiring KF_TRUSTED_ARGS.
- Export inline bpf_type_has_unsafe_modifiers() function from
bpf_verifier.h so that it can be used from bpf_tcp_ca.c. Eventually this
function should likely be changed to bpf_type_is_trusted(), once
PTR_TRUSTED is the real source of truth.
v6 -> v7:
- Removed the PTR_WALKED type modifier, and instead define a new
PTR_TRUSTED type modifier which is set on registers containing
pointers passed from trusted contexts (i.e. as tracepoint or
struct_ops callback args) (Alexei)
- Remove the new KF_OWNED_ARGS kfunc flag. This can be accomplished
by defining a new type that wraps an existing type, such as with
struct nf_conn___init (Alexei)
- Add a test_task_current_acquire_release testcase which verifies we can
acquire a task struct returned from bpf_get_current_task_btf().
- Make bpf_task_acquire() no longer return NULL, as it can no longer be
called with a NULL task.
- Removed unnecessary is_test_kfunc_task() checks from failure
testcases.
v5 -> v6:
- Add a new KF_OWNED_ARGS kfunc flag which may be used by kfuncs to
express that they require trusted, refcounted args (Kumar)
- Rename PTR_NESTED -> PTR_WALKED in the verifier (Kumar)
- Convert reg_type_str() prefixes to use snprintf() instead of strncpy()
(Kumar)
- Add PTR_TO_BTF_ID | PTR_WALKED to missing struct btf_reg_type
instances -- specifically btf_id_sock_common_types, and
percpu_btf_ptr_types.
- Add a missing PTR_TO_BTF_ID | PTR_WALKED switch case entry in
check_func_arg_reg_off(), which is required when validating helper
calls (Kumar)
- Update reg_type_mismatch_ok() to check base types for the registers
(i.e. to accommodate type modifiers). Additionally, add a lengthy
comment that explains why this is being done (Kumar)
- Update convert_ctx_accesses() to also issue probe reads for
PTR_TO_BTF_ID | PTR_WALKED (Kumar)
- Update selftests to expect new prefix reg type strings.
- Rename task_kfunc_acquire_trusted_nested testcase to
task_kfunc_acquire_trusted_walked, and fix a comment (Kumar)
- Remove KF_TRUSTED_ARGS from bpf_task_release(), which already includes
KF_RELEASE (Kumar)
- Add bpf-next in patch subject lines (Kumar)
v4 -> v5:
- Fix an improperly formatted patch title.
v3 -> v4:
- Remove an unnecessary check from my repository that I forgot to remove
after debugging something.
v2 -> v3:
- Make bpf_task_acquire() check for NULL, and include KF_RET_NULL
(Martin)
- Include new PTR_NESTED register modifier type flag which specifies
whether a pointer was obtained from walking a struct. Use this to
expand the meaning of KF_TRUSTED_ARGS to include trusted pointers that
were passed from the kernel (Kumar)
- Add more selftests to the task_kfunc selftest suite which verify that
you cannot pass a walked pointer to bpf_task_acquire().
- Update bpf_task_acquire() to also specify KF_TRUSTED_ARGS.
v1 -> v2:
- Rename tracing_btf_ids to generic_kfunc_btf_ids, and add the new
kfuncs to that list instead of making a separate btf id list (Alexei).
- Don't run the new selftest suite on s390x, which doesn't appear to
support invoking kfuncs.
- Add a missing __diag_ignore block for -Wmissing-prototypes
(lkp@intel.com).
- Fix formatting on some of the SPDX-License-Identifier tags.
- Clarified the function header comment a bit on bpf_task_kptr_get().
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/bpf/btf.c | 8 | ||||
-rw-r--r-- | kernel/bpf/helpers.c | 78 | ||||
-rw-r--r-- | kernel/bpf/verifier.c | 84 | ||||
-rw-r--r-- | kernel/trace/bpf_trace.c | 2 |
4 files changed, 147 insertions, 25 deletions
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index f7d5fab61535..d52054ec69c9 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -5799,6 +5799,11 @@ static u32 get_ctx_arg_idx(struct btf *btf, const struct btf_type *func_proto, return nr_args + 1; } +static bool prog_type_args_trusted(enum bpf_prog_type prog_type) +{ + return prog_type == BPF_PROG_TYPE_TRACING || prog_type == BPF_PROG_TYPE_STRUCT_OPS; +} + bool btf_ctx_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, struct bpf_insn_access_aux *info) @@ -5942,6 +5947,9 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type, } info->reg_type = PTR_TO_BTF_ID; + if (prog_type_args_trusted(prog->type)) + info->reg_type |= PTR_TRUSTED; + if (tgt_prog) { enum bpf_prog_type tgt_type; diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 212e791d7452..89a95f3d854c 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1824,6 +1824,63 @@ struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head) return __bpf_list_del(head, true); } +/** + * bpf_task_acquire - Acquire a reference to a task. A task acquired by this + * kfunc which is not stored in a map as a kptr, must be released by calling + * bpf_task_release(). + * @p: The task on which a reference is being acquired. + */ +struct task_struct *bpf_task_acquire(struct task_struct *p) +{ + refcount_inc(&p->rcu_users); + return p; +} + +/** + * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task + * kptr acquired by this kfunc which is not subsequently stored in a map, must + * be released by calling bpf_task_release(). + * @pp: A pointer to a task kptr on which a reference is being acquired. + */ +struct task_struct *bpf_task_kptr_get(struct task_struct **pp) +{ + struct task_struct *p; + + rcu_read_lock(); + p = READ_ONCE(*pp); + + /* Another context could remove the task from the map and release it at + * any time, including after we've done the lookup above. This is safe + * because we're in an RCU read region, so the task is guaranteed to + * remain valid until at least the rcu_read_unlock() below. + */ + if (p && !refcount_inc_not_zero(&p->rcu_users)) + /* If the task had been removed from the map and freed as + * described above, refcount_inc_not_zero() will return false. + * The task will be freed at some point after the current RCU + * gp has ended, so just return NULL to the user. + */ + p = NULL; + rcu_read_unlock(); + + return p; +} + +/** + * bpf_task_release - Release the reference acquired on a struct task_struct *. + * If this kfunc is invoked in an RCU read region, the task_struct is + * guaranteed to not be freed until the current grace period has ended, even if + * its refcount drops to 0. + * @p: The task on which a reference is being released. + */ +void bpf_task_release(struct task_struct *p) +{ + if (!p) + return; + + put_task_struct_rcu_user(p); +} + __diag_pop(); BTF_SET8_START(generic_btf_ids) @@ -1836,6 +1893,9 @@ BTF_ID_FLAGS(func, bpf_list_push_front) BTF_ID_FLAGS(func, bpf_list_push_back) BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE) BTF_SET8_END(generic_btf_ids) static const struct btf_kfunc_id_set generic_kfunc_set = { @@ -1843,14 +1903,26 @@ static const struct btf_kfunc_id_set generic_kfunc_set = { .set = &generic_btf_ids, }; +BTF_ID_LIST(generic_dtor_ids) +BTF_ID(struct, task_struct) +BTF_ID(func, bpf_task_release) + static int __init kfunc_init(void) { int ret; + const struct btf_id_dtor_kfunc generic_dtors[] = { + { + .btf_id = generic_dtor_ids[0], + .kfunc_btf_id = generic_dtor_ids[1] + }, + }; ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set); - if (ret) - return ret; - return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &generic_kfunc_set); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set); + return ret ?: register_btf_id_dtor_kfuncs(generic_dtors, + ARRAY_SIZE(generic_dtors), + THIS_MODULE); } late_initcall(kfunc_init); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 195d24316750..5bc9d84d7924 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -557,7 +557,7 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn) static const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type) { - char postfix[16] = {0}, prefix[32] = {0}; + char postfix[16] = {0}, prefix[64] = {0}; static const char * const str[] = { [NOT_INIT] = "?", [SCALAR_VALUE] = "scalar", @@ -589,16 +589,14 @@ static const char *reg_type_str(struct bpf_verifier_env *env, strncpy(postfix, "_or_null", 16); } - if (type & MEM_RDONLY) - strncpy(prefix, "rdonly_", 32); - if (type & MEM_RINGBUF) - strncpy(prefix, "ringbuf_", 32); - if (type & MEM_USER) - strncpy(prefix, "user_", 32); - if (type & MEM_PERCPU) - strncpy(prefix, "percpu_", 32); - if (type & PTR_UNTRUSTED) - strncpy(prefix, "untrusted_", 32); + snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s", + type & MEM_RDONLY ? "rdonly_" : "", + type & MEM_RINGBUF ? "ringbuf_" : "", + type & MEM_USER ? "user_" : "", + type & MEM_PERCPU ? "percpu_" : "", + type & PTR_UNTRUSTED ? "untrusted_" : "", + type & PTR_TRUSTED ? "trusted_" : "" + ); snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s", prefix, str[base_type(type)], postfix); @@ -3859,7 +3857,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno) { const char *targ_name = kernel_type_name(kptr_field->kptr.btf, kptr_field->kptr.btf_id); - int perm_flags = PTR_MAYBE_NULL; + int perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED; const char *reg_name = ""; /* Only unreferenced case accepts untrusted pointers */ @@ -4735,6 +4733,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env, if (type_flag(reg->type) & PTR_UNTRUSTED) flag |= PTR_UNTRUSTED; + /* Any pointer obtained from walking a trusted pointer is no longer trusted. */ + flag &= ~PTR_TRUSTED; + if (atype == BPF_READ && value_regno >= 0) mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag); @@ -5847,6 +5848,7 @@ static const struct bpf_reg_types btf_id_sock_common_types = { PTR_TO_TCP_SOCK, PTR_TO_XDP_SOCK, PTR_TO_BTF_ID, + PTR_TO_BTF_ID | PTR_TRUSTED, }, .btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON], }; @@ -5887,8 +5889,18 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } }; static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } }; static const struct bpf_reg_types ringbuf_mem_types = { .types = { PTR_TO_MEM | MEM_RINGBUF } }; static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } }; -static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } }; -static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } }; +static const struct bpf_reg_types btf_ptr_types = { + .types = { + PTR_TO_BTF_ID, + PTR_TO_BTF_ID | PTR_TRUSTED, + }, +}; +static const struct bpf_reg_types percpu_btf_ptr_types = { + .types = { + PTR_TO_BTF_ID | MEM_PERCPU, + PTR_TO_BTF_ID | MEM_PERCPU | PTR_TRUSTED, + } +}; static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } }; static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } }; static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } }; @@ -5976,7 +5988,7 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno, return -EACCES; found: - if (reg->type == PTR_TO_BTF_ID) { + if (reg->type == PTR_TO_BTF_ID || reg->type & PTR_TRUSTED) { /* For bpf_sk_release, it needs to match against first member * 'struct sock_common', hence make an exception for it. This * allows bpf_sk_release to work for multiple socket types. @@ -6058,6 +6070,8 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env, */ case PTR_TO_BTF_ID: case PTR_TO_BTF_ID | MEM_ALLOC: + case PTR_TO_BTF_ID | PTR_TRUSTED: + case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED: /* When referenced PTR_TO_BTF_ID is passed to release function, * it's fixed offset must be 0. In the other cases, fixed offset * can be non-zero. @@ -7942,6 +7956,25 @@ static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg) return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET); } +static bool is_trusted_reg(const struct bpf_reg_state *reg) +{ + /* A referenced register is always trusted. */ + if (reg->ref_obj_id) + return true; + + /* If a register is not referenced, it is trusted if it has either the + * MEM_ALLOC or PTR_TRUSTED type modifiers, and no others. Some of the + * other type modifiers may be safe, but we elect to take an opt-in + * approach here as some (e.g. PTR_UNTRUSTED and PTR_MAYBE_NULL) are + * not. + * + * Eventually, we should make PTR_TRUSTED the single source of truth + * for whether a register is trusted. + */ + return type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS && + !bpf_type_has_unsafe_modifiers(reg->type); +} + static bool __kfunc_param_match_suffix(const struct btf *btf, const struct btf_param *arg, const char *suffix) @@ -8223,7 +8256,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env, const char *reg_ref_tname; u32 reg_ref_id; - if (reg->type == PTR_TO_BTF_ID) { + if (base_type(reg->type) == PTR_TO_BTF_ID) { reg_btf = reg->btf; reg_ref_id = reg->btf_id; } else { @@ -8369,6 +8402,7 @@ static int check_reg_allocation_locked(struct bpf_verifier_env *env, struct bpf_ ptr = reg->map_ptr; break; case PTR_TO_BTF_ID | MEM_ALLOC: + case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED: ptr = reg->btf; break; default: @@ -8599,8 +8633,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_BTF_ID: if (!is_kfunc_trusted_args(meta)) break; - if (!reg->ref_obj_id) { - verbose(env, "R%d must be referenced\n", regno); + + if (!is_trusted_reg(reg)) { + verbose(env, "R%d must be referenced or trusted\n", regno); return -EINVAL; } fallthrough; @@ -8705,9 +8740,13 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ break; case KF_ARG_PTR_TO_BTF_ID: /* Only base_type is checked, further checks are done here */ - if (reg->type != PTR_TO_BTF_ID && - (!reg2btf_ids[base_type(reg->type)] || type_flag(reg->type))) { - verbose(env, "arg#%d expected pointer to btf or socket\n", i); + if ((base_type(reg->type) != PTR_TO_BTF_ID || + bpf_type_has_unsafe_modifiers(reg->type)) && + !reg2btf_ids[base_type(reg->type)]) { + verbose(env, "arg#%d is %s ", i, reg_type_str(env, reg->type)); + verbose(env, "expected %s or socket\n", + reg_type_str(env, base_type(reg->type) | + (type_flag(reg->type) & BPF_REG_TRUSTED_MODIFIERS))); return -EINVAL; } ret = process_kf_arg_ptr_to_btf_id(env, reg, ref_t, ref_tname, ref_id, meta, i); @@ -14716,6 +14755,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) break; case PTR_TO_BTF_ID: case PTR_TO_BTF_ID | PTR_UNTRUSTED: + case PTR_TO_BTF_ID | PTR_TRUSTED: /* PTR_TO_BTF_ID | MEM_ALLOC always has a valid lifetime, unlike * PTR_TO_BTF_ID, and an active ref_obj_id, but the same cannot * be said once it is marked PTR_UNTRUSTED, hence we must handle @@ -14723,6 +14763,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) * for this case. */ case PTR_TO_BTF_ID | MEM_ALLOC | PTR_UNTRUSTED: + case PTR_TO_BTF_ID | PTR_UNTRUSTED | PTR_TRUSTED: + case PTR_TO_BTF_ID | PTR_UNTRUSTED | MEM_ALLOC | PTR_TRUSTED: if (type == BPF_READ) { insn->code = BPF_LDX | BPF_PROBE_MEM | BPF_SIZE((insn)->code); diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index f2d8d070d024..5b9008bc597b 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -774,7 +774,7 @@ BPF_CALL_0(bpf_get_current_task_btf) const struct bpf_func_proto bpf_get_current_task_btf_proto = { .func = bpf_get_current_task_btf, .gpl_only = true, - .ret_type = RET_PTR_TO_BTF_ID, + .ret_type = RET_PTR_TO_BTF_ID_TRUSTED, .ret_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK], }; |