summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-03libbpf: Add base BTF accessorAndrii Nakryiko
Add ability to get base BTF. It can be also used to check if BTF is split BTF. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202065244.530571-3-andrii@kernel.org
2020-12-03tools/bpftool: Emit name <anon> for anonymous BTFsAndrii Nakryiko
For consistency of output, emit "name <anon>" for BTFs without the name. This keeps output more consistent and obvious. Suggested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202065244.530571-2-andrii@kernel.org
2020-12-03uapi: fix statx attribute value overlap for DAX & MOUNT_ROOTEric Sandeen
STATX_ATTR_MOUNT_ROOT and STATX_ATTR_DAX got merged with the same value, so one of them needs fixing. Move STATX_ATTR_DAX. While we're in here, clarify the value-matching scheme for some of the attributes, and explain why the value for DAX does not match. Fixes: 80340fe3605c ("statx: add mount_root") Fixes: 712b2698e4c0 ("fs/stat: Define DAX statx attribute") Link: https://lore.kernel.org/linux-fsdevel/7027520f-7c79-087e-1d00-743bdefa1a1e@redhat.com/ Link: https://lore.kernel.org/lkml/20201202214629.1563760-1-ira.weiny@intel.com/ Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: <stable@vger.kernel.org> # 5.8 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-03pwm: sl28cpld: fix getting driver data in pwm callbacksUwe Kleine-König
Currently .get_state() and .apply() use dev_get_drvdata() on the struct device related to the pwm chip. This only works after .probe() called platform_set_drvdata() which in this driver happens only after pwmchip_add() and so comes possibly too late. Instead of setting the driver data earlier use the traditional container_of approach as this way the driver data is conceptually and computational nearer. Fixes: 9db33d221efc ("pwm: Add support for sl28cpld PWM controller") Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-03lib/syscall: fix syscall registers retrieval on 32-bit platformsWilly Tarreau
Lilith >_> and Claudio Bozzato of Cisco Talos security team reported that collect_syscall() improperly casts the syscall registers to 64-bit values leaking the uninitialized last 24 bytes on 32-bit platforms, that are visible in /proc/self/syscall. The cause is that info->data.args are u64 while syscall_get_arguments() uses longs, as hinted by the bogus pointer cast in the function. Let's just proceed like the other call places, by retrieving the registers into an array of longs before assigning them to the caller's array. This was successfully tested on x86_64, i386 and ppc32. Reference: CVE-2020-28588, TALOS-2020-1211 Fixes: 631b7abacd02 ("ptrace: Remove maxargs from task_current_syscall()") Cc: Greg KH <greg@kroah.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> (ppc32) Signed-off-by: Willy Tarreau <w@1wt.eu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-03macvlan: Support for high multicast packet rateThomas Karlsson
Background: Broadcast and multicast packages are enqueued for later processing. This queue was previously hardcoded to 1000. This proved insufficient for handling very high packet rates. This resulted in packet drops for multicast. While at the same time unicast worked fine. The change: This patch make the queue length adjustable to accommodate for environments with very high multicast packet rate. But still keeps the default value of 1000 unless specified. The queue length is specified as a request per macvlan using the IFLA_MACVLAN_BC_QUEUE_LEN parameter. The actual used queue length will then be the maximum of any macvlan connected to the same port. The actual used queue length for the port can be retrieved (read only) by the IFLA_MACVLAN_BC_QUEUE_LEN_USED parameter for verification. This will be followed up by a patch to iproute2 in order to adjust the parameter from userspace. Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se> Link: https://lore.kernel.org/r/dd4673b2-7eab-edda-6815-85c67ce87f63@paneda.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-03rtw88: debug: Fix uninitialized memory in debugfs codeDan Carpenter
This code does not ensure that the whole buffer is initialized and none of the callers check for errors so potentially none of the buffer is initialized. Add a memset to eliminate this bug. Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/X8ilOfVz3pf0T5ec@mwanda
2020-12-02Merge branch 'switch to memcg-based memory accounting'Alexei Starovoitov
Roman Gushchin says: ==================== Currently bpf is using the memlock rlimit for the memory accounting. This approach has its downsides and over time has created a significant amount of problems: 1) The limit is per-user, but because most bpf operations are performed as root, the limit has a little value. 2) It's hard to come up with a specific maximum value. Especially because the counter is shared with non-bpf use cases (e.g. memlock()). Any specific value is either too low and creates false failures or is too high and useless. 3) Charging is not connected to the actual memory allocation. Bpf code should manually calculate the estimated cost and charge the counter, and then take care of uncharging, including all fail paths. It adds to the code complexity and makes it easy to leak a charge. 4) There is no simple way of getting the current value of the counter. We've used drgn for it, but it's far from being convenient. 5) Cryptic -EPERM is returned on exceeding the limit. Libbpf even had a function to "explain" this case for users. 6) rlimits are generally considered as (at least partially) obsolete. They do not provide a comprehensive system for the control of physical resources: memory, cpu, io etc. All resource control developments in the recent years were related to cgroups. In order to overcome these problems let's switch to the memory cgroup-based memory accounting of bpf objects. With the recent addition of the percpu memory accounting, now it's possible to provide a comprehensive accounting of the memory used by bpf programs and maps. This approach has the following advantages: 1) The limit is per-cgroup and hierarchical. It's way more flexible and allows a better control over memory usage by different workloads. 2) The actual memory consumption is taken into account. It happens automatically on the allocation time if __GFP_ACCOUNT flags is passed. Uncharging is also performed automatically on releasing the memory. So the code on the bpf side becomes simpler and safer. 3) There is a simple way to get the current value and statistics. Cgroup-based accounting adds new requirements: 1) The kernel config should have CONFIG_CGROUPS and CONFIG_MEMCG_KMEM enabled. These options are usually enabled, maybe excluding tiny builds for embedded devices. 2) The system should have a configured cgroup hierarchy, including reasonable memory limits and/or guarantees. Modern systems usually delegate this task to systemd or similar task managers. Without meeting these requirements there are no limits on how much memory bpf can use and a non-root user is able to hurt the system by allocating too much. But because per-user rlimits do not provide a functional system to protect and manage physical resources anyway, anyone who seriously depends on it, should use cgroups. When a bpf map is created, the memory cgroup of the process which creates the map is recorded. Subsequently all memory allocation related to the bpf map are charged to the same cgroup. It includes allocations made from interrupts and by any processes. Bpf program memory is charged to the memory cgroup of a process which loads the program. The patchset consists of the following parts: 1) 4 mm patches are required on the mm side, otherwise vmallocs cannot be mapped to userspace 2) memcg-based accounting for various bpf objects: progs and maps 3) removal of the rlimit-based accounting 4) removal of rlimit adjustments in userspace samples v9: - always charge the saved memory cgroup, by Daniel, Toke and Alexei - added bpf_map_kzalloc() - rebase and minor fixes v8: - extended the cover letter to be more clear on new requirements, by Daniel - an approximate value is provided by map memlock info, by Alexei v7: - introduced bpf_map_kmalloc_node() and bpf_map_alloc_percpu(), by Alexei - switched allocations made from an interrupt context to new helpers, by Daniel - rebase and minor fixes v6: - rebased to the latest version of the remote charging API - fixed signatures, added acks v5: - rebased to the latest version of the remote charging API - implemented kmem accounting from an interrupt context, by Shakeel - rebased to latest changes in mm allowed to map vmallocs to userspace - fixed a build issue in kselftests, by Alexei - fixed a use-after-free bug in bpf_map_free_deferred() - added bpf line info coverage, by Shakeel - split bpf map charging preparations into a separate patch v4: - covered allocations made from an interrupt context, by Daniel - added some clarifications to the cover letter v3: - droped the userspace part for further discussions/refinements, by Andrii and Song v2: - fixed build issue, caused by the remaining rlimit-based accounting for sockhash maps ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-12-02bpf: samples: Do not touch RLIMIT_MEMLOCKRoman Gushchin
Since bpf is not using rlimit memlock for the memory accounting and control, do not change the limit in sample applications. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-35-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for bpf progsRoman Gushchin
Do not use rlimit-based memory accounting for bpf progs. It has been replaced with memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-34-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting infra for bpf mapsRoman Gushchin
Remove rlimit-based accounting infrastructure code, which is not used anymore. To provide a backward compatibility, use an approximation of the bpf map memory footprint as a "memlock" value, available to a user via map info. The approximation is based on the maximal number of elements and key and value sizes. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-33-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for bpf local storage mapsRoman Gushchin
Do not use rlimit-based memory accounting for bpf local storage maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-32-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for xskmap mapsRoman Gushchin
Do not use rlimit-based memory accounting for xskmap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-31-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for stackmap mapsRoman Gushchin
Do not use rlimit-based memory accounting for stackmap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-30-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for sockmap and sockhash mapsRoman Gushchin
Do not use rlimit-based memory accounting for sockmap and sockhash maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-29-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for bpf ringbufferRoman Gushchin
Do not use rlimit-based memory accounting for bpf ringbuffer. It has been replaced with the memcg-based memory accounting. bpf_ringbuf_alloc() can't return anything except ERR_PTR(-ENOMEM) and a valid pointer, so to simplify the code make it return NULL in the first case. This allows to drop a couple of lines in ringbuf_map_alloc() and also makes it look similar to other memory allocating function like kmalloc(). Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-28-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for reuseport_array mapsRoman Gushchin
Do not use rlimit-based memory accounting for reuseport_array maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-27-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for queue_stack_maps mapsRoman Gushchin
Do not use rlimit-based memory accounting for queue_stack maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-26-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for lpm_trie mapsRoman Gushchin
Do not use rlimit-based memory accounting for lpm_trie maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-25-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for hashtab mapsRoman Gushchin
Do not use rlimit-based memory accounting for hashtab maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-24-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for devmap mapsRoman Gushchin
Do not use rlimit-based memory accounting for devmap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-23-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for cgroup storage mapsRoman Gushchin
Do not use rlimit-based memory accounting for cgroup storage maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-22-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for cpumap mapsRoman Gushchin
Do not use rlimit-based memory accounting for cpumap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-21-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for bpf_struct_ops mapsRoman Gushchin
Do not use rlimit-based memory accounting for bpf_struct_ops maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-20-guro@fb.com
2020-12-02bpf: Eliminate rlimit-based memory accounting for arraymap mapsRoman Gushchin
Do not use rlimit-based memory accounting for arraymap maps. It has been replaced with the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-19-guro@fb.com
2020-12-02bpf: Refine memcg-based memory accounting for xskmap mapsRoman Gushchin
Extend xskmap memory accounting to include the memory taken by the xsk_map_node structure. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-18-guro@fb.com
2020-12-02bpf: Refine memcg-based memory accounting for sockmap and sockhash mapsRoman Gushchin
Include internal metadata into the memcg-based memory accounting. Also include the memory allocated on updating an element. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-17-guro@fb.com
2020-12-02bpf: Memcg-based memory accounting for bpf local storage mapsRoman Gushchin
Account memory used by bpf local storage maps: per-socket, per-inode and per-task storages. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-16-guro@fb.com
2020-12-02bpf: Memcg-based memory accounting for bpf ringbufferRoman Gushchin
Enable the memcg-based memory accounting for the memory used by the bpf ringbuffer. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-15-guro@fb.com
2020-12-02bpf: Memcg-based memory accounting for lpm_trie mapsRoman Gushchin
Include lpm trie and lpm trie node objects into the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-14-guro@fb.com
2020-12-02bpf: Refine memcg-based memory accounting for hashtab mapsRoman Gushchin
Include percpu objects and the size of map metadata into the accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-13-guro@fb.com
2020-12-02bpf: Refine memcg-based memory accounting for devmap mapsRoman Gushchin
Include map metadata and the node size (struct bpf_dtab_netdev) into the accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-12-guro@fb.com
2020-12-02bpf: Memcg-based memory accounting for cgroup storage mapsRoman Gushchin
Account memory used by cgroup storage maps including metadata structures. Account the percpu memory for the percpu flavor of cgroup storage. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-11-guro@fb.com
2020-12-02bpf: Refine memcg-based memory accounting for cpumap mapsRoman Gushchin
Include metadata and percpu data into the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-10-guro@fb.com
2020-12-02bpf: Refine memcg-based memory accounting for arraymap mapsRoman Gushchin
Include percpu arrays and auxiliary data into the memcg-based memory accounting. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-9-guro@fb.com
2020-12-02bpf: Memcg-based memory accounting for bpf mapsRoman Gushchin
This patch enables memcg-based memory accounting for memory allocated by __bpf_map_area_alloc(), which is used by many types of bpf maps for large initial memory allocations. Please note, that __bpf_map_area_alloc() should not be used outside of map creation paths without setting the active memory cgroup to the map's memory cgroup. Following patches in the series will refine the accounting for some of the map types. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-8-guro@fb.com
2020-12-02bpf: Prepare for memcg-based memory accounting for bpf mapsRoman Gushchin
Bpf maps can be updated from an interrupt context and in such case there is no process which can be charged. It makes the memory accounting of bpf maps non-trivial. Fortunately, after commit 4127c6504f25 ("mm: kmem: enable kernel memcg accounting from interrupt contexts") and commit b87d8cefe43c ("mm, memcg: rework remote charging API to support nesting") it's finally possible. To make the ownership model simple and consistent, when the map is created, the memory cgroup of the current process is recorded. All subsequent allocations related to the bpf map are charged to the same memory cgroup. It includes allocations made by any processes (even if they do belong to a different cgroup) and from interrupts. This commit introduces 3 new helpers, which will be used by following commits to enable the accounting of bpf maps memory: - bpf_map_kmalloc_node() - bpf_map_kzalloc() - bpf_map_alloc_percpu() They are wrapping popular memory allocation functions. They set the active memory cgroup to the map's memory cgroup and add __GFP_ACCOUNT to the passed gfp flags. Then they call into the corresponding memory allocation function and restore the original active memory cgroup. These helpers are supposed to use everywhere except the map creation path. During the map creation when the map structure is allocated by itself, it cannot be passed to those helpers. In those cases default memory allocation function will be used with the __GFP_ACCOUNT flag. Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-7-guro@fb.com
2020-12-02bpf: Memcg-based memory accounting for bpf progsRoman Gushchin
Include memory used by bpf programs into the memcg-based accounting. This includes the memory used by programs itself, auxiliary data, statistics and bpf line info. A memory cgroup containing the process which loads the program is getting charged. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-6-guro@fb.com
2020-12-02mm: Convert page kmemcg type to a page memcg flagRoman Gushchin
PageKmemcg flag is currently defined as a page type (like buddy, offline, table and guard). Semantically it means that the page was accounted as a kernel memory by the page allocator and has to be uncharged on the release. As a side effect of defining the flag as a page type, the accounted page can't be mapped to userspace (look at page_has_type() and comments above). In particular, this blocks the accounting of vmalloc-backed memory used by some bpf maps, because these maps do map the memory to userspace. One option is to fix it by complicating the access to page->mapcount, which provides some free bits for page->page_type. But it's way better to move this flag into page->memcg_data flags. Indeed, the flag makes no sense without enabled memory cgroups and memory cgroup pointer set in particular. This commit replaces PageKmemcg() and __SetPageKmemcg() with PageMemcgKmem() and an open-coded OR operation setting the memcg pointer with the MEMCG_DATA_KMEM bit. __ClearPageKmemcg() can be simple deleted, as the whole memcg_data is zeroed at once. As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will be compiled out. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lkml.kernel.org/r/20201027001657.3398190-5-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-5-guro@fb.com
2020-12-02mm: Introduce page memcg flagsRoman Gushchin
The lowest bit in page->memcg_data is used to distinguish between struct memory_cgroup pointer and a pointer to a objcgs array. All checks and modifications of this bit are open-coded. Let's formalize it using page memcg flags, defined in enum page_memcg_data_flags. Additional flags might be added later. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lkml.kernel.org/r/20201027001657.3398190-4-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-4-guro@fb.com
2020-12-02mm: memcontrol/slab: Use helpers to access slab page's memcg_dataRoman Gushchin
To gather all direct accesses to struct page's memcg_data field in one place, let's introduce 3 new helpers to use in the slab accounting code: struct obj_cgroup **page_objcgs(struct page *page); struct obj_cgroup **page_objcgs_check(struct page *page); bool set_page_objcgs(struct page *page, struct obj_cgroup **objcgs); They are similar to the corresponding API for generic pages, except that the setter can return false, indicating that the value has been already set from a different thread. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lkml.kernel.org/r/20201027001657.3398190-3-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-3-guro@fb.com
2020-12-02mm: memcontrol: Use helpers to read page's memcg dataRoman Gushchin
Patch series "mm: allow mapping accounted kernel pages to userspace", v6. Currently a non-slab kernel page which has been charged to a memory cgroup can't be mapped to userspace. The underlying reason is simple: PageKmemcg flag is defined as a page type (like buddy, offline, etc), so it takes a bit from a page->mapped counter. Pages with a type set can't be mapped to userspace. But in general the kmemcg flag has nothing to do with mapping to userspace. It only means that the page has been accounted by the page allocator, so it has to be properly uncharged on release. Some bpf maps are mapping the vmalloc-based memory to userspace, and their memory can't be accounted because of this implementation detail. This patchset removes this limitation by moving the PageKmemcg flag into one of the free bits of the page->mem_cgroup pointer. Also it formalizes accesses to the page->mem_cgroup and page->obj_cgroups using new helpers, adds several checks and removes a couple of obsolete functions. As the result the code became more robust with fewer open-coded bit tricks. This patch (of 4): Currently there are many open-coded reads of the page->mem_cgroup pointer, as well as a couple of read helpers, which are barely used. It creates an obstacle on a way to reuse some bits of the pointer for storing additional bits of information. In fact, we already do this for slab pages, where the last bit indicates that a pointer has an attached vector of objcg pointers instead of a regular memcg pointer. This commits uses 2 existing helpers and introduces a new helper to converts all read sides to calls of these helpers: struct mem_cgroup *page_memcg(struct page *page); struct mem_cgroup *page_memcg_rcu(struct page *page); struct mem_cgroup *page_memcg_check(struct page *page); page_memcg_check() is intended to be used in cases when the page can be a slab page and have a memcg pointer pointing at objcg vector. It does check the lowest bit, and if set, returns NULL. page_memcg() contains a VM_BUG_ON_PAGE() check for the page not being a slab page. To make sure nobody uses a direct access, struct page's mem_cgroup/obj_cgroups is converted to unsigned long memcg_data. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
2020-12-02vxlan: fix error return code in __vxlan_dev_create()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1606903122-2098-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02net: pasemi: fix error return code in pasemi_mac_open()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 72b05b9940f0 ("pasemi_mac: RX/TX ring management cleanup") Fixes: 8d636d8bc5ff ("pasemi_mac: jumbo frame support") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1606903035-1838-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02cxgb3: fix error return code in t3_sge_alloc_qset()Zhang Changzhong
Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: b1fb1f280d09 ("cxgb3 - Fix dma mapping error path") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Link: https://lore.kernel.org/r/1606902965-1646-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02bareudp: constify device_type declarationJonas Bonn
device_type may be declared as const. Signed-off-by: Jonas Bonn <jonas@norrbonn.se> Link: https://lore.kernel.org/r/20201202122324.564918-1-jonas@norrbonn.se Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02Merge branch 'nfc-s3fwrn5-support-a-uart-interface'Jakub Kicinski
Bongsu Jeon says: ==================== nfc: s3fwrn5: Support a UART interface S3FWRN82 is the Samsung's NFC chip that supports the UART communication. Before adding the UART driver module, I did refactoring the s3fwrn5_i2c module to reuse the common blocks. ==================== Link: https://lore.kernel.org/r/1606909661-3814-1-git-send-email-bongsu.jeon@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02nfc: s3fwrn5: Support a UART interfaceBongsu Jeon
Since S3FWRN82 NFC Chip, The UART interface can be used. S3FWRN82 uses NCI protocol and supports I2C and UART interface. Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02nfc: s3fwrn5: extract the common phy blocksBongsu Jeon
Extract the common phy blocks to reuse it. The UART module will use the common blocks. Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-02nfc: s3fwrn5: reduce the EN_WAIT_TIMEBongsu Jeon
The delay of 20ms is enough to enable and wake up the Samsung's nfc chip. Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>