summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-01-25virtchnl: do structure hardeningJesse Brandeburg
The virtchnl interface can have a bunch of "soft" defined structures hardened by using explicit sizes for declarations, and then referring to the enum type that uses them in a comment. None of these changes should change any of the structure sizes. Also, remove a duplicate line in a switch statement and let two cases uses the same code. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-25virtchnl: update header and increase header clarityJesse Brandeburg
We already have the SPDX header, so just leave a copyright notice with an updated year and get rid of the boilerplate header (so 2002!). In addition, update a couple of comments to clarify how the various parts of the virtchannel header interaction work. No functional changes. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-25virtchnl: remove unused structure declarationJesse Brandeburg
Nothing uses virtchnl_msg, just remove it. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-01-25Merge branch 'Enable cpumasks to be used as kptrs'Alexei Starovoitov
David Vernet says: ==================== This is part 3 of https://lore.kernel.org/all/20230119235833.2948341-1-void@manifault.com/ Part 2: https://lore.kernel.org/bpf/20230120192523.3650503-1-void@manifault.com/ This series is based off of commit b613d335a743 ("bpf: Allow trusted args to walk struct when checking BTF IDs"). Changelog: ---------- v2 -> v3: - Rebase onto master (commit described above). Only conflict that required resolution was updating the task_kfunc selftest suite error message location. - Put copyright onto one line in kernel/bpf/cpumask.c. - Remove now-unneeded pid-checking logic from progs/nested_trust_success.c. - Fix a couple of small grammatical typos in documentation. v1 -> v2: - Put back 'static' keyword in bpf_find_btf_id() (kernel test robot <lkp@intel.com>) - Surround cpumask kfuncs in __diag() blocks to avoid no-prototype build warnings (kernel test robot <lkp@intel.com>) - Enable ___init suffixes to a type definition to signal that a type is a nocast alias of another type. That is, that when passed to a kfunc that expects one of the two types, the verifier will reject the other even if they're equivalent according to the C standard (Kumar and Alexei) - Reject NULL for all trusted args, not just PTR_TO_MEM (Kumar) - Reject both NULL and PTR_MAYBE_NULL for all trusted args (Kumar and Alexei ) - Improve examples given in cpumask documentation (Alexei) - Use __success macro for nested_trust test (Alexei) - Fix comment typo in struct bpf_cpumask comment header. - Fix another example in the bpf_cpumask doc examples. - Add documentation for ___init suffix change mentioned above. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25bpf/docs: Document the nocast aliasing behavior of ___initDavid Vernet
When comparing BTF IDs for pointers being passed to kfunc arguments, the verifier will allow pointer types that are equivalent according to the C standard. For example, for: struct bpf_cpumask { cpumask_t cpumask; refcount_t usage; }; The verifier will allow a struct bpf_cpumask * to be passed to a kfunc that takes a const struct cpumask * (cpumask_t is a typedef of struct cpumask). The exception to this rule is if a type is suffixed with ___init, such as: struct nf_conn___init { struct nf_conn ct; }; The verifier will _not_ allow a struct nf_conn___init * to be passed to a kfunc that expects a struct nf_conn *. This patch documents this behavior in the kfuncs documentation page. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125143816.721952-8-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25bpf/docs: Document how nested trusted fields may be definedDavid Vernet
A prior change defined a new BTF_TYPE_SAFE_NESTED macro in the verifier which allows developers to specify when a pointee field in a struct type should inherit its parent pointer's trusted status. This patch updates the kfuncs documentation to specify this macro and how it can be used. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125143816.721952-7-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25bpf/docs: Document cpumask kfuncs in a new fileDavid Vernet
Now that we've added a series of new cpumask kfuncs, we should document them so users can easily use them. This patch adds a new cpumasks.rst file to document them. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125143816.721952-6-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25selftests/bpf: Add selftest suite for cpumask kfuncsDavid Vernet
A recent patch added a new set of kfuncs for allocating, freeing, manipulating, and querying cpumasks. This patch adds a new 'cpumask' selftest suite which verifies their behavior. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125143816.721952-5-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25selftests/bpf: Add nested trust selftests suiteDavid Vernet
Now that defining trusted fields in a struct is supported, we should add selftests to verify the behavior. This patch adds a few such testcases. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125143816.721952-4-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25bpf: Enable cpumasks to be queried and used as kptrsDavid Vernet
Certain programs may wish to be able to query cpumasks. For example, if a program that is tracing percpu operations wishes to track which tasks end up running on which CPUs, it could be useful to associate that with the tasks' cpumasks. Similarly, programs tracking NUMA allocations, CPU scheduling domains, etc, could potentially benefit from being able to see which CPUs a task could be migrated to. This patch enables these types of use cases by introducing a series of bpf_cpumask_* kfuncs. Amongst these kfuncs, there are two separate "classes" of operations: 1. kfuncs which allow the caller to allocate and mutate their own cpumask kptrs in the form of a struct bpf_cpumask * object. Such kfuncs include e.g. bpf_cpumask_create() to allocate the cpumask, and bpf_cpumask_or() to mutate it. "Regular" cpumasks such as p->cpus_ptr may not be passed to these kfuncs, and the verifier will ensure this is the case by comparing BTF IDs. 2. Read-only operations which operate on const struct cpumask * arguments. For example, bpf_cpumask_test_cpu(), which tests whether a CPU is set in the cpumask. Any trusted struct cpumask * or struct bpf_cpumask * may be passed to these kfuncs. The verifier allows struct bpf_cpumask * even though the kfunc is defined with struct cpumask * because the first element of a struct bpf_cpumask is a cpumask_t, so it is safe to cast. A follow-on patch will add selftests which validate these kfuncs, and another will document them. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125143816.721952-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25bpf: Disallow NULLable pointers for trusted kfuncsDavid Vernet
KF_TRUSTED_ARGS kfuncs currently have a subtle and insidious bug in validating pointers to scalars. Say that you have a kfunc like the following, which takes an array as the first argument: bool bpf_cpumask_empty(const struct cpumask *cpumask) { return cpumask_empty(cpumask); } ... BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS) ... If a BPF program were to invoke the kfunc with a NULL argument, it would crash the kernel. The reason is that struct cpumask is defined as a bitmap, which is itself defined as an array, and is accessed as a memory address by bitmap operations. So when the verifier analyzes the register, it interprets it as a pointer to a scalar struct, which is an array of size 8. check_mem_reg() then sees that the register is NULL and returns 0, and the kfunc crashes when it passes it down to the cpumask wrappers. To fix this, this patch adds a check for KF_ARG_PTR_TO_MEM which verifies that the register doesn't contain a possibly-NULL pointer if the kfunc is KF_TRUSTED_ARGS. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230125143816.721952-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25Merge branch 'mptcp-fixes'David S. Miller
Jeremy Kerr says: ==================== net: mctp: struct sock lifetime fixes This series is a set of fixes for the sock lifetime handling in the AF_MCTP code, fixing a uaf reported by Noam Rathaus <noamr@ssd-disclosure.com>. The Fixes: tags indicate the original patches affected, but some tweaking to backport to those commits may be needed; I have a separate branch with backports to 5.15 if that helps with stable trees. Of course, any comments/queries most welcome. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: mctp: mark socks as dead on unhash, prevent re-addJeremy Kerr
Once a socket has been unhashed, we want to prevent it from being re-used in a sk_key entry as part of a routing operation. This change marks the sk as SOCK_DEAD on unhash, which prevents addition into the net's key list. We need to do this during the key add path, rather than key lookup, as we release the net keys_lock between those operations. Fixes: 4a992bbd3650 ("mctp: Implement message fragmentation & reassembly") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: mctp: hold key reference when looking up a general keyPaolo Abeni
Currently, we have a race where we look up a sock through a "general" (ie, not directly associated with the (src,dest,tag) tuple) key, then drop the key reference while still holding the key's sock. This change expands the key reference until we've finished using the sock, and hence the sock reference too. Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>. Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: mctp: move expiry timer delete to unhashJeremy Kerr
Currently, we delete the key expiry timer (in sk->close) before unhashing the sk. This means that another thread may find the sk through its presence on the key list, and re-queue the timer. This change moves the timer deletion to the unhash, after we have made the key no longer observable, so the timer cannot be re-queued. Fixes: 7b14e15ae6f4 ("mctp: Implement a timeout for tags") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: mctp: add an explicit reference from a mctp_sk_key to sockJeremy Kerr
Currently, we correlate the mctp_sk_key lifetime to the sock lifetime through the sock hash/unhash operations, but this is pretty tenuous, and there are cases where we may have a temporary reference to an unhashed sk. This change makes the reference more explicit, by adding a hold on the sock when it's associated with a mctp_sk_key, released on final key unref. Fixes: 73c618456dc5 ("mctp: locking, lifetime and validity changes for sk_keys") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25Merge branch 'ravb-fixes'David S. Miller
Yoshihiro Shimoda says: ==================== net: ravb: Fix potential issues Fix potentiall issues on the ravb driver. Changes from v2: https://lore.kernel.org/all/20230123131331.1425648-1-yoshihiro.shimoda.uh@renesas.com/ - Add Reviewed-by in the patch [2/2]. - Add a commit description in the patch [2/2]. Changes from v1: https://lore.kernel.org/all/20230119043920.875280-1-yoshihiro.shimoda.uh@renesas.com/ - Fix typo in the patch [1/2]. - Add Reviewed-by in the patch [1/2]. - Fix "Fixed" tag in the patch [2/2]. - Fix a comment indentation of the code in the patch [2/2]. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: ravb: Fix possible hang if RIS2_QFF1 happenYoshihiro Shimoda
Since this driver enables the interrupt by RIC2_QFE1, this driver should clear the interrupt flag if it happens. Otherwise, the interrupt causes to hang the system. Note that this also fix a minor coding style (a comment indentation) around the fixed code. Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: ravb: Fix lack of register setting after system resumed for Gen3Yoshihiro Shimoda
After system entered Suspend to RAM, registers setting of this hardware is reset because the SoC will be turned off. On R-Car Gen3 (info->ccc_gac), ravb_ptp_init() is called in ravb_probe() only. So, after system resumed, it lacks of the initial settings for ptp. So, add ravb_ptp_{init,stop}() into ravb_{resume,suspend}(). Fixes: f5d7837f96e5 ("ravb: ptp: Add CONFIG mode support") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: ethtool: fix NULL pointer dereference in pause_prepare_data()Vladimir Oltean
In the following call path: ethnl_default_dumpit -> ethnl_default_dump_one -> ctx->ops->prepare_data -> pause_prepare_data struct genl_info *info will be passed as NULL, and pause_prepare_data() dereferences it while getting the extended ack pointer. To avoid that, just set the extack to NULL if "info" is NULL, since the netlink extack handling messages know how to deal with that. The pattern "info ? info->extack : NULL" is present in quite a few other "prepare_data" implementations, so it's clear that it's a more general problem to be dealt with at a higher level, but the code should have at least adhered to the current conventions to avoid the NULL dereference. Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reported-by: syzbot+9d44aae2720fc40b8474@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net: ethtool: fix NULL pointer dereference in stats_prepare_data()Vladimir Oltean
In the following call path: ethnl_default_dumpit -> ethnl_default_dump_one -> ctx->ops->prepare_data -> stats_prepare_data struct genl_info *info will be passed as NULL, and stats_prepare_data() dereferences it while getting the extended ack pointer. To avoid that, just set the extack to NULL if "info" is NULL, since the netlink extack handling messages know how to deal with that. The pattern "info ? info->extack : NULL" is present in quite a few other "prepare_data" implementations, so it's clear that it's a more general problem to be dealt with at a higher level, but the code should have at least adhered to the current conventions to avoid the NULL dereference. Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/x25: Fix to not accept on connected socketHyunwoo Kim
When listen() and accept() are called on an x25 socket that connect() succeeds, accept() succeeds immediately. This is because x25_connect() queues the skb to sk->sk_receive_queue, and x25_accept() dequeues it. This creates a child socket with the sk of the parent x25 socket, which can cause confusion. Fix x25_listen() to return -EINVAL if the socket has already been successfully connect()ed to avoid this issue. Signed-off-by: Hyunwoo Kim <v4bel@theori.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25Merge branch 's390-ism-generalized-interface'David S. Miller
Jan Karcher says: ==================== drivers/s390/net/ism: Add generalized interface Previously, there was no clean separation between SMC-D code and the ISM device driver.This patch series addresses the situation to make ISM available for uses outside of SMC-D. In detail: SMC-D offers an interface via struct smcd_ops, which only the ISM module implements so far. However, there is no real separation between the smcd and ism modules, which starts right with the ISM device initialization, which calls directly into the SMC-D code. This patch series introduces a new API in the ISM module, which allows registration of arbitrary clients via include/linux/ism.h: struct ism_client. Furthermore, it introduces a "pure" struct ism_dev (i.e. getting rid of dependencies on SMC-D in the device structure), and adds a number of API calls for data transfers via ISM (see ism_register_dmb() & friends). Still, the ISM module implements the SMC-D API, and therefore has a number of internal helper functions for that matter. Note that the ISM API is consciously kept thin for now (as compared to the SMC-D API calls), as a number of API calls are only used with SMC-D and hardly have any meaningful usage beyond SMC-D, e.g. the VLAN-related calls. v1 -> v2: Removed s390x dependency which broke config for other archs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/smc: De-tangle ism and smc device initializationStefan Raspl
The struct device for ISM devices was part of struct smcd_dev. Move to struct ism_dev, provide a new API call in struct smcd_ops, and convert existing SMCD code accordingly. Furthermore, remove struct smcd_dev from struct ism_dev. This is the final part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25s390/ism: Consolidate SMC-D-related codeStefan Raspl
The ism module had SMC-D-specific code sprinkled across the entire module. We are now consolidating the SMC-D-specific parts into the latter parts of the module, so it becomes more clear what code is intended for use with ISM, and which parts are glue code for usage in the context of SMC-D. This is the fourth part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/smc: Separate SMC-D and ISM APIsStefan Raspl
We separate the code implementing the struct smcd_ops API in the ISM device driver from the functions that may be used by other exploiters of ISM devices. Note: We start out small, and don't offer the whole breadth of the ISM device for public use, as many functions are specific to or likely only ever used in the context of SMC-D. This is the third part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/smc: Register SMC-D as ISM clientStefan Raspl
Register the smc module with the new ism device driver API. This is the second part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/ism: Add new API for client registrationStefan Raspl
Add a new API that allows other drivers to concurrently access ISM devices. To do so, we introduce a new API that allows other modules to register for ISM device usage. Furthermore, we move the GID to struct ism, where it belongs conceptually, and rename and relocate struct smcd_event to struct ism_event. This is the first part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25s390/ism: Introduce struct ism_dmbStefan Raspl
Conceptually, a DMB is a structure that belongs to ISM devices. However, SMC currently 'owns' this structure. So future exploiters of ISM devices would be forced to include SMC headers to work - which is just weird. Therefore, we switch ISM to struct ism_dmb, introduce a new public header with the definition (will be populated with further API calls later on), and, add a thin wrapper to please SMC. Since structs smcd_dmb and ism_dmb are identical, we can simply convert between the two for now. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/ism: Add missing calls to disable bus-masteringStefan Raspl
Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/smc: Terminate connections prior to device removalStefan Raspl
Removing an ISM device prior to terminating its associated connections doesn't end well. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25virtio-net: Reduce debug name field size to 16 bytesParav Pandit
virtio queue index can be maximum of 65535. 16 bytes are enough to store the vq name with the existing string prefix. With this change, send queue struct saves 24 bytes and receive queue saves whole cache line worth 64 bytes per structure due to saving in alignment bytes. Pahole results before: pahole -s drivers/net/virtio_net.o | \ grep -e "send_queue" -e "receive_queue" send_queue 1112 0 receive_queue 1280 1 Pahole results after: pahole -s drivers/net/virtio_net.o | \ grep -e "send_queue" -e "receive_queue" send_queue 1088 0 receive_queue 1216 1 Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-24devlink: remove a dubious assumption in fmsg dumpingJakub Kicinski
Build bot detects that err may be returned uninitialized in devlink_fmsg_prepare_skb(). This is not really true because all fmsgs users should create at least one outer nest, and therefore fmsg can't be completely empty. That said the assumption is not trivial to confirm, so let's follow the bots advice, anyway. This code does not seem to have changed since its inception in commit 1db64e8733f6 ("devlink: Add devlink formatted message (fmsg) API") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230124035231.787381-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24bpf: Allow trusted args to walk struct when checking BTF IDsDavid Vernet
When validating BTF types for KF_TRUSTED_ARGS kfuncs, the verifier currently enforces that the top-level type must match when calling the kfunc. In other words, the verifier does not allow the BPF program to pass a bitwise equivalent struct, despite it being allowed according to the C standard. For example, if you have the following type: struct nf_conn___init { struct nf_conn ct; }; The C standard stipulates that it would be safe to pass a struct nf_conn___init to a kfunc expecting a struct nf_conn. The verifier currently disallows this, however, as semantically kfuncs may want to enforce that structs that have equivalent types according to the C standard, but have different BTF IDs, are not able to be passed to kfuncs expecting one or the other. For example, struct nf_conn___init may not be queried / looked up, as it is allocated but may not yet be fully initialized. On the other hand, being able to pass types that are equivalent according to the C standard will be useful for other types of kfunc / kptrs enabled by BPF. For example, in a follow-on patch, a series of kfuncs will be added which allow programs to do bitwise queries on cpumasks that are either allocated by the program (in which case they'll be a 'struct bpf_cpumask' type that wraps a cpumask_t as its first element), or a cpumask that was allocated by the main kernel (in which case it will just be a straight cpumask_t, as in task->cpus_ptr). Having the two types of cpumasks allows us to distinguish between the two for when a cpumask is read-only vs. mutatable. A struct bpf_cpumask can be mutated by e.g. bpf_cpumask_clear(), whereas a regular cpumask_t cannot be. On the other hand, a struct bpf_cpumask can of course be queried in the exact same manner as a cpumask_t, with e.g. bpf_cpumask_test_cpu(). If we were to enforce that top level types match, then a user that's passing a struct bpf_cpumask to a read-only cpumask_t argument would have to cast with something like bpf_cast_to_kern_ctx() (which itself would need to be updated to expect the alias, and currently it only accommodates a single alias per prog type). Additionally, not specifying KF_TRUSTED_ARGS is not an option, as some kfuncs take one argument as a struct bpf_cpumask *, and another as a struct cpumask * (i.e. cpumask_t). In order to enable this, this patch relaxes the constraint that a KF_TRUSTED_ARGS kfunc must have strict type matching, and instead only enforces strict type matching if a type is observed to be a "no-cast alias" (i.e., that the type names are equivalent, but one is suffixed with ___init). Additionally, in order to try and be conservative and match existing behavior / expectations, this patch also enforces strict type checking for acquire kfuncs. We were already enforcing it for release kfuncs, so this should also improve the consistency of the semantics for kfuncs. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230120192523.3650503-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-24bpf: Enable annotating trusted nested pointersDavid Vernet
In kfuncs, a "trusted" pointer is a pointer that the kfunc can assume is safe, and which the verifier will allow to be passed to a KF_TRUSTED_ARGS kfunc. Currently, a KF_TRUSTED_ARGS kfunc disallows any pointer to be passed at a nonzero offset, but sometimes this is in fact safe if the "nested" pointer's lifetime is inherited from its parent. For example, the const cpumask_t *cpus_ptr field in a struct task_struct will remain valid until the task itself is destroyed, and thus would also be safe to pass to a KF_TRUSTED_ARGS kfunc. While it would be conceptually simple to enable this by using BTF tags, gcc unfortunately does not yet support this. In the interim, this patch enables support for this by using a type-naming convention. A new BTF_TYPE_SAFE_NESTED macro is defined in verifier.c which allows a developer to specify the nested fields of a type which are considered trusted if its parent is also trusted. The verifier is also updated to account for this. A patch with selftests will be added in a follow-on change, along with documentation for this feature. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230120192523.3650503-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Perform SCTP vtag verification for ABORT/SHUTDOWN_COMPLETE according to RFC 9260, Sect 8.5.1. 2) Fix infinite loop if SCTP chunk size is zero in for_each_sctp_chunk(). And remove useless check in this macro too. 3) Revert DATA_SENT state in the SCTP tracker, this was applied in the previous merge window. Next patch in this series provides a more simple approach to multihoming support. 4) Unify HEARTBEAT_ACKED and ESTABLISHED states for SCTP multihoming support, use default ESTABLISHED of 210 seconds based on heartbeat timeout * maximum number of retransmission + round-trip timeout. Otherwise, SCTP conntrack entry that represents secondary paths remain stale in the table for up to 5 days. This is a slightly large batch with fixes for the SCTP connection tracking helper, all patches from Sriram Yagnaraman. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: conntrack: unify established states for SCTP paths Revert "netfilter: conntrack: add sctp DATA_SENT state" netfilter: conntrack: fix bug in for_each_sctp_chunk netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE ==================== Link: https://lore.kernel.org/r/20230124183933.4752-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24ice: move devlink port creation/deletionPaul M Stillwell Jr
Commit a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") moved the code to create and destroy the devlink PF port. This was fine, but created a corner case issue in the case of ice_register_netdev() failing. In that case, the driver would end up calling ice_devlink_destroy_pf_port() twice. Additionally, it makes no sense to tie creation of the devlink PF port to the creation of the netdev so separate out the code to create/destroy the devlink PF port from the netdev code. This makes it a cleaner interface. Fixes: a286ba738714 ("ice: reorder PF/representor devlink port register/unregister flows") Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24net: mscc: ocelot: fix incorrect verify_enabled reporting in ethtool get_mm()Vladimir Oltean
We don't read the verify_enabled variable from hardware in the MAC Merge layer state GET operation, instead we always leave it set to "false". The user may think something is wrong if they set verify_enabled to true, then read it back and see it's still false, even though the configuration took place. Fixes: 6505b6805655 ("net: mscc: ocelot: add MAC Merge layer support for VSC9959") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20230123184538.3420098-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24sctp: fail if no bound addresses can be used for a given scopeMarcelo Ricardo Leitner
Currently, if you bind the socket to something like: servaddr.sin6_family = AF_INET6; servaddr.sin6_port = htons(0); servaddr.sin6_scope_id = 0; inet_pton(AF_INET6, "::1", &servaddr.sin6_addr); And then request a connect to: connaddr.sin6_family = AF_INET6; connaddr.sin6_port = htons(20000); connaddr.sin6_scope_id = if_nametoindex("lo"); inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr); What the stack does is: - bind the socket - create a new asoc - to handle the connect - copy the addresses that can be used for the given scope - try to connect But the copy returns 0 addresses, and the effect is that it ends up trying to connect as if the socket wasn't bound, which is not the desired behavior. This unexpected behavior also allows KASLR leaks through SCTP diag interface. The fix here then is, if when trying to copy the addresses that can be used for the scope used in connect() it returns 0 addresses, bail out. This is what TCP does with a similar reproducer. Reported-by: Pietro Borrello <borrello@diag.uniroma1.it> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.1674496737.git.marcelo.leitner@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24Merge tag 'modules-6.2-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module fix from Luis Chamberlain: "Theis is a fix we have been delaying for v6.2 due to lack of early testing on linux-next. The commit has been sitting in linux-next since December and testing has also been now a bit extensive by a few developers. Since this is a fix which definitely will go to v6.3 it should also apply to v6.2 so if there are any issues we pick them up earlier rather than later. The fix fixes a regression since v5.3, prior to me helping with module maintenance, however, the issue is real in that in the worst case now can prevent boot. We've discussed all possible corner cases [0] and at last do feel this is ready for v6.2-rc6" Link https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/ [0] * tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: Don't wait for GOING modules
2023-01-24nfp: flower: change get/set_eeprom logic and enable for flower repsJames Hershaw
The changes in this patch are as follows: - Alter the logic of get/set_eeprom functions to use the helper function nfp_app_from_netdev() which handles differentiating between an nfp_net and a nfp_repr. This allows us to get an agnostic backpointer to the pdev. - Enable the various eeprom commands by adding the 'get_eeprom_len', 'get_eeprom', 'set_eeprom' callbacks to the nfp_port_ethtool_ops struct. This allows the eeprom commands to work on representor interfaces, similar to a previous patch which added it to the vnics. Currently these are being used to configure persistent MAC addresses for the physical ports on the nfp. Signed-off-by: James Hershaw <james.hershaw@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230123134135.293278-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24net/sched: sch_taprio: do not schedule in taprio_reset()Eric Dumazet
As reported by syzbot and hinted by Vinicius, I should not have added a qdisc_synchronize() call in taprio_reset() taprio_reset() can be called with qdisc spinlock held (and BH disabled) as shown in included syzbot report [1]. Only taprio_destroy() needed this synchronization, as explained in the blamed commit changelog. [1] BUG: scheduling while atomic: syz-executor150/5091/0x00000202 2 locks held by syz-executor150/5091: Modules linked in: Preemption disabled at: [<0000000000000000>] 0x0 Kernel panic - not syncing: scheduling while atomic: panic_on_warn set ... CPU: 1 PID: 5091 Comm: syz-executor150 Not tainted 6.2.0-rc3-syzkaller-00219-g010a74f52203 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106 panic+0x2cc/0x626 kernel/panic.c:318 check_panic_on_warn.cold+0x19/0x35 kernel/panic.c:238 __schedule_bug.cold+0xd5/0xfe kernel/sched/core.c:5836 schedule_debug kernel/sched/core.c:5865 [inline] __schedule+0x34e4/0x5450 kernel/sched/core.c:6500 schedule+0xde/0x1b0 kernel/sched/core.c:6682 schedule_timeout+0x14e/0x2a0 kernel/time/timer.c:2167 schedule_timeout_uninterruptible kernel/time/timer.c:2201 [inline] msleep+0xb6/0x100 kernel/time/timer.c:2322 qdisc_synchronize include/net/sch_generic.h:1295 [inline] taprio_reset+0x93/0x270 net/sched/sch_taprio.c:1703 qdisc_reset+0x10c/0x770 net/sched/sch_generic.c:1022 dev_reset_queue+0x92/0x130 net/sched/sch_generic.c:1285 netdev_for_each_tx_queue include/linux/netdevice.h:2464 [inline] dev_deactivate_many+0x36d/0x9f0 net/sched/sch_generic.c:1351 dev_deactivate+0xed/0x1b0 net/sched/sch_generic.c:1374 qdisc_graft+0xe4a/0x1380 net/sched/sch_api.c:1080 tc_modify_qdisc+0xb6b/0x19a0 net/sched/sch_api.c:1689 rtnetlink_rcv_msg+0x43e/0xca0 net/core/rtnetlink.c:6141 netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564 netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline] netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356 netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xd3/0x120 net/socket.c:734 ____sys_sendmsg+0x712/0x8c0 net/socket.c:2476 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2530 __sys_sendmsg+0xf7/0x1c0 net/socket.c:2559 do_syscall_x64 arch/x86/entry/common.c:50 [inline] Fixes: 3a415d59c1db ("net/sched: sch_taprio: fix possible use-after-free") Link: https://lore.kernel.org/netdev/167387581653.2747.13878941339893288655.git-patchwork-notify@kernel.org/T/ Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20230123084552.574396-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24ipv6: Make ip6_route_output_flags_noref() static.Guillaume Nault
This function is only used in net/ipv6/route.c and has no reason to be visible outside of it. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/50706db7f675e40b3594d62011d9363dce32b92e.1674495822.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24Merge tag 'rust-fixes-6.2' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull rust fix from Miguel Ojeda: - Avoid evaluating arguments in 'pr_*' macros in 'unsafe' blocks * tag 'rust-fixes-6.2' of https://github.com/Rust-for-Linux/linux: rust: print: avoid evaluating arguments in `pr_*` macros in `unsafe` blocks
2023-01-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM64: - Pass the correct address to mte_clear_page_tags() on initialising a tagged page - Plug a race against a GICv4.1 doorbell interrupt while saving the vgic-v3 pending state. x86: - A command line parsing fix and a clang compilation fix for selftests - A fix for a longstanding VMX issue, that surprisingly was only found now to affect real world guests" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Make reclaim_period_ms input always be positive KVM: x86/vmx: Do not skip segment attributes if unusable bit is set selftests: kvm: move declaration at the beginning of main() KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation KVM: arm64: Pass the actual page address to mte_clear_page_tags()
2023-01-24Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Six fixes, all in drivers. The biggest are the UFS devfreq fixes which address a lock inversion and the two iscsi_tcp fixes which try to prevent a use after free from userspace still accessing an area which the kernel has released (seen by KASAN)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: device_handler: alua: Remove a might_sleep() annotation scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress scsi: iscsi_tcp: Fix UAF during logout when accessing the shost ipaddress scsi: ufs: core: Fix devfreq deadlocks scsi: hpsa: Fix allocation size for scsi_host_alloc() scsi: target: core: Fix warning on RT kernels
2023-01-24netlink: fix spelling mistake in dump size assertJakub Kicinski
Commit 2c7bc10d0f7b ("netlink: add macro for checking dump ctx size") misspelled the name of the assert as asset, missing an R. Reported-by: Ido Schimmel <idosch@idosch.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230123222224.732338-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24Merge tag 'nfsd-6.2-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Nail another UAF in NFSD's filecache * tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't free files unconditionally in __nfsd_file_cache_purge
2023-01-24Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds
Pull fscrypt MAINTAINERS entry update from Eric Biggers: "Update the MAINTAINERS file entry for fscrypt" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: MAINTAINERS: update fscrypt git repo
2023-01-24module: Don't wait for GOING modulesPetr Pavlu
During a system boot, it can happen that the kernel receives a burst of requests to insert the same module but loading it eventually fails during its init call. For instance, udev can make a request to insert a frequency module for each individual CPU when another frequency module is already loaded which causes the init function of the new module to return an error. Since commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading"), the kernel waits for modules in MODULE_STATE_GOING state to finish unloading before making another attempt to load the same module. This creates unnecessary work in the described scenario and delays the boot. In the worst case, it can prevent udev from loading drivers for other devices and might cause timeouts of services waiting on them and subsequently a failed boot. This patch attempts a different solution for the problem 6e6de3dee51a was trying to solve. Rather than waiting for the unloading to complete, it returns a different error code (-EBUSY) for modules in the GOING state. This should avoid the error situation that was described in 6e6de3dee51a (user space attempting to load a dependent module because the -EEXIST error code would suggest to user space that the first module had been loaded successfully), while avoiding the delay situation too. This has been tested on linux-next since December 2022 and passes all kmod selftests except test 0009 with module compression enabled but it has been confirmed that this issue has existed and has gone unnoticed since prior to this commit and can also be reproduced without module compression with a simple usleep(5000000) on tools/modprobe.c [0]. These failures are caused by hitting the kernel mod_concurrent_max and can happen either due to a self inflicted kernel module auto-loead DoS somehow or on a system with large CPU count and each CPU count incorrectly triggering many module auto-loads. Both of those issues need to be fixed in-kernel. [0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/ Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading") Co-developed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Petr Mladek <pmladek@suse.com> [mcgrof: enhance commit log with testing and kmod test result interpretation ] Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>