author    Alexei Starovoitov <ast@kernel.org>  2023-02-10 18:59:57 -0800
committer Alexei Starovoitov <ast@kernel.org>  2023-02-10 19:00:13 -0800
commit    ab86cf337a5b64f0456d2d0d01edc939ad5c20bb (patch)
tree      6c9048dbac70ddf85660493933287c979ac4921d /kernel/bpf/syscall.c
parent    7e2a9ebe8126206157056cca5dd30a656f3d8a81 (diff)
parent    bf3965082491601bf9cd6d9a0ce2d88cb219168a (diff)
Merge branch 'bpf, mm: introduce cgroup.memory=nobpf'
Yafang Shao says:

====================

The bpf memory accounting has some known problems in container environments:

- The container's memory usage is not consistent if there are pinned bpf programs.

  After a container restart, the leftover bpf programs are not accounted to the new generation, so the container's memory usage is not consistent. This could be resolved by introducing a selectable memcg, but we don't have agreement on that solution yet; see also the discussion at https://lwn.net/Articles/905150/.

- Leftover non-preallocated bpf maps can't be limited.

  A leftover bpf map is reparented, and thus limited by its parent rather than by the container itself. Furthermore, if the parent is destroyed, the map will be limited by the parent's parent, and so on. This, too, could be resolved by introducing a selectable memcg.

- Memory dynamically allocated in a bpf prog is charged to the root memcg only.

  Nowadays a bpf prog can dynamically allocate memory, for example via bpf_obj_new(), but it allocates only from the global bpf_mem_alloc pool, so the charge goes to the root memcg. That needs to be addressed by a new proposal.

So give container users an option to disable bpf memory accounting. The idea of "cgroup.memory=nobpf" is originally from Tejun [1].

[1] https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/

Changes, v1->v2:
- squash patches (Roman)
- improve the commit log of patch #2 (Johannes)
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
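The two helpers used in the hunks below, memcg_bpf_enabled() and bpf_memcg_flags(), are introduced by the merged series outside this file, so their definitions do not appear in this diff. A minimal sketch consistent with how they are used here, assuming the usual CONFIG_MEMCG_KMEM guards (illustrative, not the verbatim patch):

/* Sketch: conditionally request kmem accounting for an allocation.
 * memcg_bpf_enabled() is expected to return false when the kernel
 * was booted with cgroup.memory=nobpf.
 */
static inline gfp_t bpf_memcg_flags(gfp_t flags)
{
#ifdef CONFIG_MEMCG_KMEM
	if (memcg_bpf_enabled())
		return flags | __GFP_ACCOUNT;
#endif
	return flags;
}

With gating of this shape, __bpf_map_area_alloc() below keeps its old accounted behavior by default and drops __GFP_ACCOUNT under cgroup.memory=nobpf.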
Diffstat (limited to 'kernel/bpf/syscall.c')
-rw-r--r--  kernel/bpf/syscall.c  20
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bcc97613de768..cda8d00f37622 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -309,7 +309,7 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
* __GFP_RETRY_MAYFAIL to avoid such situations.
*/
- const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO | __GFP_ACCOUNT;
+ gfp_t gfp = bpf_memcg_flags(__GFP_NOWARN | __GFP_ZERO);
unsigned int flags = 0;
unsigned long align = 1;
void *area;
@@ -418,7 +418,8 @@ static void bpf_map_save_memcg(struct bpf_map *map)
* So we have to check map->objcg for being NULL each time it's
* being used.
*/
- map->objcg = get_obj_cgroup_from_current();
+ if (memcg_bpf_enabled())
+ map->objcg = get_obj_cgroup_from_current();
}
static void bpf_map_release_memcg(struct bpf_map *map)
@@ -464,6 +465,21 @@ void *bpf_map_kzalloc(const struct bpf_map *map, size_t size, gfp_t flags)
return ptr;
}
+void *bpf_map_kvcalloc(struct bpf_map *map, size_t n, size_t size,
+ gfp_t flags)
+{
+ struct mem_cgroup *memcg, *old_memcg;
+ void *ptr;
+
+ memcg = bpf_map_get_memcg(map);
+ old_memcg = set_active_memcg(memcg);
+ ptr = kvcalloc(n, size, flags | __GFP_ACCOUNT);
+ set_active_memcg(old_memcg);
+ mem_cgroup_put(memcg);
+
+ return ptr;
+}
+
void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
size_t align, gfp_t flags)
{