summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-12-07Merge tag 'cgroup-for-6.7-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "Just one fix. Commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic") changed how freezing state is recorded which made cgroup_freezing() disagree with the actual state of the task while thawing triggering a warning. Fix it by updating cgroup_freezing()" * tag 'cgroup-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup_freezer: cgroup_freezing: Check if not frozen
2023-12-07ACPI: utils: Fix error path in acpi_evaluate_reference()Rafael J. Wysocki
If a pointer to an uninitialized struct acpi_handle_list is passed to acpi_evaluate_reference() and it decides to bail out early, either because acpi_evaluate_object() fails, or because it produces invalid data, the handles pointer from the struct acpi_handle_list will be passed to kfree() and if it is not NULL, the kernel will crash on an attempt to free unallocated memory. Address this by moving the "end" label in acpi_evaluate_reference() to the end of the function, which is sufficient, because no cleanup is needed in that case. Fixes: 2e57d10a6591 ("ACPI: utils: Dynamically determine acpi_handle_list size") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Woody Suwalski <terraluna977@gmail.com>
2023-12-07Merge tag 'wq-for-6.7-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "Just one patch to fix a bug which can crash the kernel if the housekeeping and wq_unbound_cpu cpumask configuration combination leaves the latter empty" * tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Make sure that wq_unbound_cpumask is never empty
2023-12-07Merge tag 'regmap-fix-v6.7-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "An incremental fix for the fix introduced during the merge window for caching of the selector for windowed register ranges. We were incorrectly leaking an error code in the case where the last selector accessed was for some reason not cached" * tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix bogus error on regcache_sync success
2023-12-07Merge tag 'devicetree-fixes-for-6.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes from Rob Herring: - Fix dt-extract-compatibles for builds with in tree build directory - Drop Xinlei Lee <xinlei.lee@mediatek.com> bouncing email - Fix the of_reconfig_get_state_change() return value documentation - Add missing #power-domain-cells property to QCom MPM - Fix warnings in i.MX LCDIF and adi,adv7533 * tag 'devicetree-fixes-for-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: display: adi,adv75xx: Document #sound-dai-cells dt-bindings: lcdif: Properly describe the i.MX23 interrupts dt-bindings: interrupt-controller: Allow #power-domain-cells of: dynamic: Fix of_reconfig_get_state_change() return value documentation dt-bindings: display: mediatek: dsi: remove Xinlei's mail dt: dt-extract-compatibles: Don't follow symlinks when walking tree
2023-12-07Merge tag 'platform-drivers-x86-v6.7-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Ilpo Järvinen: - Fix i8042 filter resource handling, input, and suspend issues in asus-wmi - Skip zero instance WMI blocks to avoid issues with some laptops - Differentiate dev/production keys in mlxbf-bootctl - Correct surface serdev related return value to avoid leaking errno into userspace - Error checking fixes * tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/mellanox: Check devm_hwmon_device_register_with_groups() return value platform/mellanox: Add null pointer checks for devm_kasprintf() mlxbf-bootctl: correctly identify secure boot with development keys platform/x86: wmi: Skip blocks with zero instances platform/surface: aggregator: fix recv_buf() return value platform/x86: asus-wmi: disable USB0 hub on ROG Ally before suspend platform/x86: asus-wmi: Filter Volume key presses if also reported via atkbd platform/x86: asus-wmi: Change q500a_i8042_filter() into a generic i8042-filter platform/x86: asus-wmi: Move i8042 filter install to shared asus-wmi code
2023-12-07Merge tag 'x86-int80-20231207' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 int80 fixes from Dave Hansen: "Avoid VMM misuse of 'int 0x80' handling in TDX and SEV guests. It also has the very nice side effect of getting rid of a bunch of assembly entry code" * tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tdx: Allow 32-bit emulation by default x86/entry: Do not allow external 0x80 interrupts x86/entry: Convert INT 0x80 emulation to IDTENTRY x86/coco: Disable 32-bit emulation by default on TDX and SEV
2023-12-07Merge tag 'md-fixes-20231207-1' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7 Pull MD fix from Song: "This change from Yu Kuai fixes a bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=218200" * tag 'md-fixes-20231207-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: split MD_RECOVERY_NEEDED out of mddev_resume
2023-12-07ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7Takashi Iwai
Lenovo Yoga Pro 7 14APH8 (PCI SSID 17aa:3882) seems requiring the similar workaround like Yoga 9 model for the bass speaker. Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/CAGGk=CRRQ1L9p771HsXTN_ebZP41Qj+3gw35Gezurn+nokRewg@mail.gmail.com Link: https://lore.kernel.org/r/20231207182035.30248-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-07md: split MD_RECOVERY_NEEDED out of mddev_resumeYu Kuai
New mddev_resume() calls are added to synchronize IO with array reconfiguration, however, this introduces a performance regression while adding it in md_start_sync(): 1) someone sets MD_RECOVERY_NEEDED first; 2) daemon thread grabs reconfig_mutex, then clears MD_RECOVERY_NEEDED and queues a new sync work; 3) daemon thread releases reconfig_mutex; 4) in md_start_sync a) check that there are spares that can be added/removed, then suspend the array; b) remove_and_add_spares may not be called, or called without really add/remove spares; c) resume the array, then set MD_RECOVERY_NEEDED again! Loop between 2 - 4, then mddev_suspend() will be called quite often, for consequence, normal IO will be quite slow. Fix this problem by don't set MD_RECOVERY_NEEDED again in md_start_sync(), hence the loop will be broken. Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration") Suggested-by: Song Liu <song@kernel.org> Reported-by: Janpieter Sollie <janpieter.sollie@edpnet.be> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218200 Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231207020724.2797445-1-yukuai1@huaweicloud.com
2023-12-07vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warningStefano Garzarella
After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") in CentOS Stream 9, CI reported the following error: In file included from ./include/linux/kernel.h:17, from ./include/linux/list.h:9, from ./include/linux/preempt.h:11, from ./include/linux/spinlock.h:56, from net/vmw_vsock/virtio_transport_common.c:9: net/vmw_vsock/virtio_transport_common.c: In function ‘virtio_transport_can_zcopy‘: ./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror] 20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) | ^~ ./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck‘ 26 | (__typecheck(x, y) && __no_side_effects(x, y)) | ^~~~~~~~~~~ ./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp‘ 36 | __builtin_choose_expr(__safe_cmp(x, y), \ | ^~~~~~~~~~ ./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp‘ 45 | #define min(x, y) __careful_cmp(x, y, <) | ^~~~~~~~~~~~~ net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro ‘min‘ 63 | int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); We could solve it by using min_t(), but this operation seems entirely unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(), which performs almost the same check, returning at most MAX_SKB_FRAGS elements. So, let's eliminate this unnecessary comparison. Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Cc: avkrasnov@salutedevices.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Link: https://lore.kernel.org/r/20231206164143.281107-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07net/smc: fix missing byte order conversion in CLC handshakeWen Gu
The byte order conversions of ISM GID and DMB token are missing in process of CLC accept and confirm. So fix it. Fixes: 3d9725a6a133 ("net/smc: common routine for CLC accept and confirm") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07net: dsa: microchip: provide a list of valid protocols for xmit handlerSean Nyekjaer
Provide a list of valid protocols for which the driver will provide it's deferred xmit handler. When using DSA_TAG_PROTO_KSZ8795 protocol, it does not provide a "connect" method, therefor ksz_connect() is not allocating ksz_tagger_data. This avoids the following null pointer dereference: ksz_connect_tag_protocol from dsa_register_switch+0x9ac/0xee0 dsa_register_switch from ksz_switch_register+0x65c/0x828 ksz_switch_register from ksz_spi_probe+0x11c/0x168 ksz_spi_probe from spi_probe+0x84/0xa8 spi_probe from really_probe+0xc8/0x2d8 Fixes: ab32f56a4100 ("net: dsa: microchip: ptp: add packet transmission timestamping") Signed-off-by: Sean Nyekjaer <sean@geanix.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20231206071655.1626479-1-sean@geanix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07Merge branch 'generic-netlink-multicast-fixes'Jakub Kicinski
Ido Schimmel says: ==================== Generic netlink multicast fixes Restrict two generic netlink multicast groups - in the "psample" and "NET_DM" families - to be root-only with the appropriate capabilities. See individual patches for more details. ==================== Link: https://lore.kernel.org/r/20231206213102.1824398-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" groupIdo Schimmel
The "NET_DM" generic netlink family notifies drop locations over the "events" multicast group. This is problematic since by default generic netlink allows non-root users to listen to these notifications. Fix by adding a new field to the generic netlink multicast group structure that when set prevents non-root users or root without the 'CAP_SYS_ADMIN' capability (in the user namespace owning the network namespace) from joining the group. Set this field for the "events" group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the nature of the information that is shared over this group. Note that the capability check in this case will always be performed against the initial user namespace since the family is not netns aware and only operates in the initial network namespace. A new field is added to the structure rather than using the "flags" field because the existing field uses uAPI flags and it is inappropriate to add a new uAPI flag for an internal kernel check. In net-next we can rework the "flags" field to use internal flags and fold the new field into it. But for now, in order to reduce the amount of changes, add a new field. Since the information can only be consumed by root, mark the control plane operations that start and stop the tracing as root-only using the 'GENL_ADMIN_PERM' flag. Tested using [1]. Before: # capsh -- -c ./dm_repo # capsh --drop=cap_sys_admin -- -c ./dm_repo After: # capsh -- -c ./dm_repo # capsh --drop=cap_sys_admin -- -c ./dm_repo Failed to join "events" multicast group [1] $ cat dm.c #include <stdio.h> #include <netlink/genl/ctrl.h> #include <netlink/genl/genl.h> #include <netlink/socket.h> int main(int argc, char **argv) { struct nl_sock *sk; int grp, err; sk = nl_socket_alloc(); if (!sk) { fprintf(stderr, "Failed to allocate socket\n"); return -1; } err = genl_connect(sk); if (err) { fprintf(stderr, "Failed to connect socket\n"); return err; } grp = genl_ctrl_resolve_grp(sk, "NET_DM", "events"); if (grp < 0) { fprintf(stderr, "Failed to resolve \"events\" multicast group\n"); return grp; } err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE); if (err) { fprintf(stderr, "Failed to join \"events\" multicast group\n"); return err; } return 0; } $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o dm_repo dm.c Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231206213102.1824398-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07psample: Require 'CAP_NET_ADMIN' when joining "packets" groupIdo Schimmel
The "psample" generic netlink family notifies sampled packets over the "packets" multicast group. This is problematic since by default generic netlink allows non-root users to listen to these notifications. Fix by marking the group with the 'GENL_UNS_ADMIN_PERM' flag. This will prevent non-root users or root without the 'CAP_NET_ADMIN' capability (in the user namespace owning the network namespace) from joining the group. Tested using [1]. Before: # capsh -- -c ./psample_repo # capsh --drop=cap_net_admin -- -c ./psample_repo After: # capsh -- -c ./psample_repo # capsh --drop=cap_net_admin -- -c ./psample_repo Failed to join "packets" multicast group [1] $ cat psample.c #include <stdio.h> #include <netlink/genl/ctrl.h> #include <netlink/genl/genl.h> #include <netlink/socket.h> int join_grp(struct nl_sock *sk, const char *grp_name) { int grp, err; grp = genl_ctrl_resolve_grp(sk, "psample", grp_name); if (grp < 0) { fprintf(stderr, "Failed to resolve \"%s\" multicast group\n", grp_name); return grp; } err = nl_socket_add_memberships(sk, grp, NFNLGRP_NONE); if (err) { fprintf(stderr, "Failed to join \"%s\" multicast group\n", grp_name); return err; } return 0; } int main(int argc, char **argv) { struct nl_sock *sk; int err; sk = nl_socket_alloc(); if (!sk) { fprintf(stderr, "Failed to allocate socket\n"); return -1; } err = genl_connect(sk); if (err) { fprintf(stderr, "Failed to connect socket\n"); return err; } err = join_grp(sk, "config"); if (err) return err; err = join_grp(sk, "packets"); if (err) return err; return 0; } $ gcc -I/usr/include/libnl3 -lnl-3 -lnl-genl-3 -o psample_repo psample.c Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling") Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231206213102.1824398-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07Merge branch 'fixes-for-ktls'Jakub Kicinski
John Fastabend says: ==================== Couple fixes for TLS and BPF interactions. ==================== Link: https://lore.kernel.org/r/20231206232706.374377-1-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07bpf: sockmap, updating the sg structure should also update currJohn Fastabend
Curr pointer should be updated when the sg structure is shifted. Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20231206232706.374377-3-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07net: tls, update curr on splice as wellJohn Fastabend
The curr pointer must also be updated on the splice similar to how we do this for other copy types. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07x86/tdx: Allow 32-bit emulation by defaultKirill A. Shutemov
32-bit emulation was disabled on TDX to prevent a possible attack by a VMM injecting an interrupt on vector 0x80. Now that int80_emulation() has a check for external interrupts the limitation can be lifted. To distinguish software interrupts from external ones, int80_emulation() checks the APIC ISR bit relevant to the 0x80 vector. For software interrupts, this bit will be 0. On TDX, the VAPIC state (including ISR) is protected and cannot be manipulated by the VMM. The ISR bit is set by the microcode flow during the handling of posted interrupts. [ dhansen: more changelog tweaks ] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07x86/entry: Do not allow external 0x80 interruptsThomas Gleixner
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The kernel expects to receive a software interrupt as a result of the INT 0x80 instruction. However, an external interrupt on the same vector also triggers the same codepath. An external interrupt on vector 0x80 will currently be interpreted as a 32-bit system call, and assuming that it was a user context. Panic on external interrupts on the vector. To distinguish software interrupts from external ones, the kernel checks the APIC ISR bit relevant to the 0x80 vector. For software interrupts, this bit will be 0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07x86/entry: Convert INT 0x80 emulation to IDTENTRYThomas Gleixner
There is no real reason to have a separate ASM entry point implementation for the legacy INT 0x80 syscall emulation on 64-bit. IDTENTRY provides all the functionality needed with the only difference that it does not: - save the syscall number (AX) into pt_regs::orig_ax - set pt_regs::ax to -ENOSYS Both can be done safely in the C code of an IDTENTRY before invoking any of the syscall related functions which depend on this convention. Aside of ASM code reduction this prepares for detecting and handling a local APIC injected vector 0x80. [ kirill.shutemov: More verbose comments ] Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07x86/coco: Disable 32-bit emulation by default on TDX and SEVKirill A. Shutemov
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The kernel expects to receive a software interrupt as a result of the INT 0x80 instruction. However, an external interrupt on the same vector triggers the same handler. The kernel interprets an external interrupt on vector 0x80 as a 32-bit system call that came from userspace. A VMM can inject external interrupts on any arbitrary vector at any time. This remains true even for TDX and SEV guests where the VMM is untrusted. Put together, this allows an untrusted VMM to trigger int80 syscall handling at any given point. The content of the guest register file at that moment defines what syscall is triggered and its arguments. It opens the guest OS to manipulation from the VMM side. Disable 32-bit emulation by default for TDX and SEV. User can override it with the ia32_emulation=y command line option. [ dhansen: reword the changelog ] Reported-by: Supraja Sridhara <supraja.sridhara@inf.ethz.ch> Reported-by: Benedict Schlüter <benedict.schlueter@inf.ethz.ch> Reported-by: Mark Kuhne <mark.kuhne@inf.ethz.ch> Reported-by: Andrin Bertschi <andrin.bertschi@inf.ethz.ch> Reported-by: Shweta Shinde <shweta.shinde@inf.ethz.ch> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> # v6.0+: 1da5c9b x86: Introduce ia32_enabled() Cc: <stable@vger.kernel.org> # v6.0+
2023-12-07Merge tag 'nf-23-12-06' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Incorrect nf_defrag registration for bpf link infra, from D. Wythe. 2) Skip inactive elements in pipapo set backend walk to avoid double deactivation, from Florian Westphal. 3) Fix NFT_*_F_PRESENT check with big endian arch, also from Florian. 4) Bail out if number of expressions in NFTA_DYNSET_EXPRESSIONS mismatch stateful expressions in set declaration. 5) Honor family in table lookup by handle. Broken since 4.16. 6) Use sk_callback_lock to protect access to sk->sk_socket in xt_owner. sock_orphan() might zap this pointer, from Phil Sutter. All of these fixes address broken stuff for several releases. * tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: xt_owner: Fix for unsafe access of sk->sk_socket netfilter: nf_tables: validate family when identifying table via handle netfilter: nf_tables: bail out on mismatching dynset and set expressions netfilter: nf_tables: fix 'exist' matching on bigendian arches netfilter: nft_set_pipapo: skip inactive elements during set walk netfilter: bpf: fix bad registration on nf_defrag ==================== Link: https://lore.kernel.org/r/20231206180357.959930-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07io_uring/af_unix: disable sending io_uring over socketsPavel Begunkov
File reference cycles have caused lots of problems for io_uring in the past, and it still doesn't work exactly right and races with unix_stream_read_generic(). The safest fix would be to completely disallow sending io_uring files via sockets via SCM_RIGHT, so there are no possible cycles invloving registered files and thus rendering SCM accounting on the io_uring side unnecessary. Cc: <stable@vger.kernel.org> Fixes: 0091bfc81741b ("io_uring/af_unix: defer registered files gc to io_uring release") Reported-and-suggested-by: Jann Horn <jannh@google.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c716c88321939156909cfa1bd8b0faaf1c804103.1701868795.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-07Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-12-06 We've added 4 non-merge commits during the last 6 day(s) which contain a total of 7 files changed, 185 insertions(+), 55 deletions(-). The main changes are: 1) Fix race found by syzkaller on prog_array_map_poke_run when a BPF program's kallsym symbols were still missing, from Jiri Olsa. 2) Fix BPF verifier's branch offset comparison for BPF_JMP32 | BPF_JA, from Yonghong Song. 3) Fix xsk's poll handling to only set mask on bound xsk sockets, from Yewon Choi. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add test for early update in prog_array_map_poke_run bpf: Fix prog_array_map_poke_run map poke update xsk: Skip polling event check for unbound socket bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4 ==================== Link: https://lore.kernel.org/r/20231206220528.12093-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07Merge tag 'nvme-6.7-2023-12-7' of git://git.infradead.org/nvme into block-6.7Jens Axboe
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.7 - Proper nvme ctrl state setting (Keith) - Passthrough command optimization (Keith) - Spectre fix (Nitesh) - Kconfig clarifications (Shin'ichiro) - Frozen state deadlock fix (Bitao) - Power setting quirk (Georg)" * tag 'nvme-6.7-2023-12-7' of git://git.infradead.org/nvme: nvme-pci: Add sleep quirk for Kingston drives nvme: fix deadlock between reset and scan nvme: prevent potential spectre v1 gadget nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptions nvme-ioctl: move capable() admin check to the end nvme: ensure reset state check ordering nvme: introduce helper function to get ctrl state
2023-12-07nvme-pci: Add sleep quirk for Kingston drivesGeorg Gottleuber
Some Kingston NV1 and A2000 are wasting a lot of power on specific TUXEDO platforms in s2idle sleep if 'Simple Suspend' is used. This patch applies a new quirk 'Force No Simple Suspend' to achieve a low power sleep without 'Simple Suspend'. Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com> Cc: <stable@vger.kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-12-07Merge tag 'imx-fixes-6.7' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 6.7: - A MAINTAINERS update to reinstate freescale ARM64 DT directory in i.MX entry. - A series from Alexander Stein to fix #pwm-cells for imx8-ss. - A series from Haibo Chen to fix GPIO node name for i.MX93 and i.MX8ULP. - Add parkmode-disable-ss-quirk for DWC3 on i.MX8MP and i.MX8MQ to fix an issue that the controller may hang when processing transactions under heavy USB traffic from multiple endpoints. - Fix mediamix block power on/off for i.MX93 by correcting the power domain clock to be 'nic_media'. - A couple of Ethernet PHY clock regression fixes for imx6ul-pico and imx6q-skov board. - Fix edma3 power domain for i.MX8QM to fix a panic during startup process. * tag 'imx-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx28-xea: Pass the 'model' property ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt MAINTAINERS: reinstate freescale ARM64 DT directory in i.MX entry arm64: dts: imx8-apalis: set wifi regulator to always-on ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init arm64: dts: imx8ulp: update gpio node name to align with register address arm64: dts: imx93: update gpio node name to align with register address arm64: dts: imx93: correct mediamix power arm64: dts: imx8qm: Add imx8qm's own pm to avoid panic during startup arm64: dts: freescale: imx8-ss-dma: Fix #pwm-cells arm64: dts: freescale: imx8-ss-lsio: Fix #pwm-cells dt-bindings: pwm: imx-pwm: Unify #pwm-cells for all compatibles ARM: dts: imx6ul-pico: Describe the Ethernet PHY clock arm64: dts: imx8mp: imx8mq: Add parkmode-disable-ss-quirk on DWC3 ARM: dts: imx6q: skov: fix ethernet clock regression arm64: dt: imx93: tqma9352-mba93xxla: Fix LPUART2 pad config Link: https://lore.kernel.org/r/20231207005202.GF270430@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-12-07nfp: flower: fix for take a mutex lock in soft irq context and rcu lockHui Zhou
The neighbour event callback call the function nfp_tun_write_neigh, this function will take a mutex lock and it is in soft irq context, change the work queue to process the neighbour event. Move the nfp_tun_write_neigh function out of range rcu_read_lock/unlock() in function nfp_tunnel_request_route_v4 and nfp_tunnel_request_route_v6. Fixes: abc210952af7 ("nfp: flower: tunnel neigh support bond offload") CC: stable@vger.kernel.org # 6.2+ Signed-off-by: Hui Zhou <hui.zhou@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-07ALSA: pcmtest: stop timer before buffer is releasedIvan Orlov
Stop timer in the 'trigger' and 'sync_stop' callbacks since we want the timer to be stopped before the DMA buffer is released. Otherwise, it could trigger a kernel panic in some circumstances, for instance when the DMA buffer is already released but the timer callback is still running. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20231206223211.12761-1-ivan.orlov0322@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-06Merge tag 'parisc-for-6.7-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "A single line patch for parisc which fixes the build in tinyconfig configurations: - Fix asm operand number out of range build error in bug table" * tag 'parisc-for-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix asm operand number out of range build error in bug table
2023-12-07ALSA: hda/realtek: Add Framework laptop 16 to quirksMario Limonciello
The Framework 16" laptop has the same controller as other Framework models. Apply the presence detection quirk. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231206193927.2996-1-mario.limonciello@amd.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-12-06Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-12-05 (ice, i40e, iavf) This series contains updates to ice, i40e and iavf drivers. Michal fixes incorrect usage of VF MSIX value and index calculation for ice. Marcin restores disabling of Rx VLAN filtering which was inadvertently removed for ice. Ivan Vecera corrects improper messaging of MFS port for i40e. Jake fixes incorrect checking of coalesce values on iavf. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: iavf: validate tx_coalesce_usecs even if rx_coalesce_usecs is zero i40e: Fix unexpected MFS warning message ice: Restore fix disabling RX VLAN filtering ice: change vfs.num_msix_per to vf->num_msix ==================== Link: https://lore.kernel.org/r/20231205211918.2123019-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06net: dsa: mv88e6xxx: Restore USXGMII support for 6393XTobias Waldekranz
In 4a56212774ac, USXGMII support was added for 6393X, but this was lost in the PCS conversion (the blamed commit), most likely because these efforts where more or less done in parallel. Restore this feature by porting Michal's patch to fit the new implementation. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Michal Smulski <michal.smulski@ooma.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Fixes: e5b732a275f5 ("net: dsa: mv88e6xxx: convert 88e639x to phylink_pcs") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20231205221359.3926018-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-06tcp: do not accept ACK of bytes we never sentEric Dumazet
This patch is based on a detailed report and ideas from Yepeng Pan and Christian Rossow. ACK seq validation is currently following RFC 5961 5.2 guidelines: The ACK value is considered acceptable only if it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT). All incoming segments whose ACK value doesn't satisfy the above condition MUST be discarded and an ACK sent back. It needs to be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a duplicate (SEG.ACK < SND.UNA), it can be ignored. If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an ACK, drop the segment, and return". The "ignored" above implies that the processing of the incoming data segment continues, which means the ACK value is treated as acceptable. This mitigation makes the ACK check more stringent since any ACK < SND.UNA wouldn't be accepted, instead only ACKs that are in the range ((SND.UNA - MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through. This can be refined for new (and possibly spoofed) flows, by not accepting ACK for bytes that were never sent. This greatly improves TCP security at a little cost. I added a Fixes: tag to make sure this patch will reach stable trees, even if the 'blamed' patch was adhering to the RFC. tp->bytes_acked was added in linux-4.2 Following packetdrill test (courtesy of Yepeng Pan) shows the issue at hand: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1024) = 0 // ---------------- Handshake ------------------- // // when window scale is set to 14 the window size can be extended to // 65535 * (2^14) = 1073725440. Linux would accept an ACK packet // with ack number in (Server_ISN+1-1073725440. Server_ISN+1) // ,though this ack number acknowledges some data never // sent by the server. +0 < S 0:0(0) win 65535 <mss 1400,nop,wscale 14> +0 > S. 0:0(0) ack 1 <...> +0 < . 1:1(0) ack 1 win 65535 +0 accept(3, ..., ...) = 4 // For the established connection, we send an ACK packet, // the ack packet uses ack number 1 - 1073725300 + 2^32, // where 2^32 is used to wrap around. // Note: we used 1073725300 instead of 1073725440 to avoid possible // edge cases. // 1 - 1073725300 + 2^32 = 3221241997 // Oops, old kernels happily accept this packet. +0 < . 1:1001(1000) ack 3221241997 win 65535 // After the kernel fix the following will be replaced by a challenge ACK, // and prior malicious frame would be dropped. +0 > . 1:1(0) ack 1001 Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yepeng Pan <yepeng.pan@cispa.de> Reported-by: Christian Rossow <rossow@cispa.de> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20231205161841.2702925-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-07drm/exynos: fix a wrong error checkingInki Dae
Fix a wrong error checking in exynos_drm_dma.c module. In the exynos_drm_register_dma function, both arm_iommu_create_mapping() and iommu_get_domain_for_dev() functions are expected to return NULL as an error. However, the error checking is performed using the statement if(IS_ERR(mapping)), which doesn't provide a suitable error value. So check if 'mapping' is NULL, and if it is, return -ENODEV. This issue[1] was reported by Dan. Changelog v1: - fix build warning. [1] https://lore.kernel.org/all/33e52277-1349-472b-a55b-ab5c3462bfcf@moroto.mountain/ Reported-by : Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2023-12-07drm/exynos: fix a potential error pointer dereferenceXiang Yang
Smatch reports the warning below: drivers/gpu/drm/exynos/exynos_hdmi.c:1864 hdmi_bind() error: 'crtc' dereferencing possible ERR_PTR() The return value of exynos_drm_crtc_get_by_type maybe ERR_PTR(-ENODEV), which can not be used directly. Fix this by checking the return value before using it. Signed-off-by: Xiang Yang <xiangyang3@huawei.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2023-12-07nvmem: Do not expect fixed layouts to grab a layout driverMiquel Raynal
Two series lived in parallel for some time, which led to this situation: - The nvmem-layout container is used for dynamic layouts - We now expect fixed layouts to also use the nvmem-layout container but this does not require any additional driver, the support is built-in the nvmem core. Ensure we don't refuse to probe for wrong reasons. Fixes: 27f699e578b1 ("nvmem: core: add support for fixed cells *layout*") Cc: stable@vger.kernel.org Reported-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Tested-by: Rafał Miłecki <rafal@milecki.pl> Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Link: https://lore.kernel.org/r/20231124193814.360552-1-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-07parport: Add support for Brainboxes IX/UC/PX parallel cardsCameron Williams
Adds support for Intashield IX-500/IX-550, UC-146/UC-157, PX-146/PX-157, PX-203 and PX-475 (LPT port) Cc: stable@vger.kernel.org Signed-off-by: Cameron Williams <cang1@live.co.uk> Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lore.kernel.org/r/AS4PR02MB790389C130410BD864C8DCC9C4A6A@AS4PR02MB7903.eurprd02.prod.outlook.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-07serial: 8250_dw: Add ACPI ID for Granite Rapids-D UARTAndy Shevchenko
Granite Rapids-D has an additional UART that is enumerated via ACPI. Add ACPI ID for it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/r/20231205195524.2705965-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-07serial: ma35d1: Validate console index before assignmentAndi Shyti
The console is immediately assigned to the ma35d1 port without checking its index. This oversight can lead to out-of-bounds errors when the index falls outside the valid '0' to MA35_UART_NR range. Such scenario trigges ran error like the following: UBSAN: array-index-out-of-bounds in drivers/tty/serial/ma35d1_serial.c:555:51 index -1 is out of range for type 'uart_ma35d1_port [17] Check the index before using it and bail out with a warning. Fixes: 930cbf92db01 ("tty: serial: Add Nuvoton ma35d1 serial driver support") Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Cc: Jacky Huang <ychuang3@nuvoton.com> Cc: <stable@vger.kernel.org> # v6.5+ Link: https://lore.kernel.org/r/20231204163804.1331415-2-andi.shyti@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-07usb: gadget: f_hid: fix report descriptor allocationKonstantin Aladyshev
The commit 89ff3dfac604 ("usb: gadget: f_hid: fix f_hidg lifetime vs cdev") has introduced a bug that leads to hid device corruption after the replug operation. Reverse device managed memory allocation for the report descriptor to fix the issue. Tested: This change was tested on the AMD EthanolX CRB server with the BMC based on the OpenBMC distribution. The BMC provides KVM functionality via the USB gadget device: - before: KVM page refresh results in a broken USB device, - after: KVM page refresh works without any issues. Fixes: 89ff3dfac604 ("usb: gadget: f_hid: fix f_hidg lifetime vs cdev") Cc: stable@vger.kernel.org Signed-off-by: Konstantin Aladyshev <aladyshev22@gmail.com> Link: https://lore.kernel.org/r/20231206080744.253-2-aladyshev22@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()Jiexun Wang
I conducted real-time testing and observed that madvise_cold_or_pageout_pte_range() causes significant latency under memory pressure, which can be effectively reduced by adding cond_resched() within the loop. I tested on the LicheePi 4A board using Cylictest for latency testing and Ftrace for latency tracing. The board uses TH1520 processor and has a memory size of 8GB. The kernel version is 6.5.0 with the PREEMPT_RT patch applied. The script I tested is as follows: echo wakeup_rt > /sys/kernel/tracing/current_tracer echo 1 > /sys/kernel/tracing/tracing_on echo 0 > /sys/kernel/tracing/tracing_max_latency stress-ng --vm 8 --vm-bytes 2G & cyclictest --mlockall --smp --priority=99 --distance=0 --duration=30m echo 0 > /sys/kernel/tracing/tracing_on cat /sys/kernel/tracing/trace The tracing results before modification are as follows: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 6.5.0-rt6-r1208-00003-g999d221864bf # -------------------------------------------------------------------- # latency: 2552 us, #6/6, CPU#3 | (M:preempt_rt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: cyclictest-196 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _--------=> CPU# # / _-------=> irqs-off/BH-disabled # | / _------=> need-resched # || / _-----=> need-resched-lazy # ||| / _----=> hardirq/softirq # |||| / _---=> preempt-depth # ||||| / _--=> preempt-lazy-depth # |||||| / _-=> migrate-disable # ||||||| / delay # cmd pid |||||||| time | caller # \ / |||||||| \ | / stress-n-206 3dn.h512 2us : 206:120:R + [003] 196: 0:R cyclictest stress-n-206 3dn.h512 7us : <stack trace> => __ftrace_trace_stack => __trace_stack => probe_wakeup => ttwu_do_activate => try_to_wake_up => wake_up_process => hrtimer_wakeup => __hrtimer_run_queues => hrtimer_interrupt => riscv_timer_interrupt => handle_percpu_devid_irq => generic_handle_domain_irq => riscv_intc_irq => handle_riscv_irq => do_irq stress-n-206 3dn.h512 9us#: 0 stress-n-206 3d...3.. 2544us : __schedule stress-n-206 3d...3.. 2545us : 206:120:R ==> [003] 196: 0:R cyclictest stress-n-206 3d...3.. 2551us : <stack trace> => __ftrace_trace_stack => __trace_stack => probe_wakeup_sched_switch => __schedule => preempt_schedule => migrate_enable => rt_spin_unlock => madvise_cold_or_pageout_pte_range => walk_pgd_range => __walk_page_range => walk_page_range => madvise_pageout => madvise_vma_behavior => do_madvise => sys_madvise => do_trap_ecall_u => ret_from_exception The tracing results after modification are as follows: # tracer: wakeup_rt # # wakeup_rt latency trace v1.1.5 on 6.5.0-rt6-r1208-00004-gca3876fc69a6-dirty # -------------------------------------------------------------------- # latency: 1689 us, #6/6, CPU#0 | (M:preempt_rt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: cyclictest-217 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _--------=> CPU# # / _-------=> irqs-off/BH-disabled # | / _------=> need-resched # || / _-----=> need-resched-lazy # ||| / _----=> hardirq/softirq # |||| / _---=> preempt-depth # ||||| / _--=> preempt-lazy-depth # |||||| / _-=> migrate-disable # ||||||| / delay # cmd pid |||||||| time | caller # \ / |||||||| \ | / stress-n-232 0dn.h413 1us+: 232:120:R + [000] 217: 0:R cyclictest stress-n-232 0dn.h413 12us : <stack trace> => __ftrace_trace_stack => __trace_stack => probe_wakeup => ttwu_do_activate => try_to_wake_up => wake_up_process => hrtimer_wakeup => __hrtimer_run_queues => hrtimer_interrupt => riscv_timer_interrupt => handle_percpu_devid_irq => generic_handle_domain_irq => riscv_intc_irq => handle_riscv_irq => do_irq stress-n-232 0dn.h413 19us#: 0 stress-n-232 0d...3.. 1671us : __schedule stress-n-232 0d...3.. 1676us+: 232:120:R ==> [000] 217: 0:R cyclictest stress-n-232 0d...3.. 1687us : <stack trace> => __ftrace_trace_stack => __trace_stack => probe_wakeup_sched_switch => __schedule => preempt_schedule => migrate_enable => free_unref_page_list => release_pages => free_pages_and_swap_cache => tlb_batch_pages_flush => tlb_flush_mmu => unmap_page_range => unmap_vmas => unmap_region => do_vmi_align_munmap.constprop.0 => do_vmi_munmap => __vm_munmap => sys_munmap => do_trap_ecall_u => ret_from_exception After the modification, the cause of maximum latency is no longer madvise_cold_or_pageout_pte_range(), so this modification can reduce the latency caused by madvise_cold_or_pageout_pte_range(). Currently the madvise_cold_or_pageout_pte_range() function exhibits significant latency under memory pressure, which can be effectively reduced by adding cond_resched() within the loop. When the batch_count reaches SWAP_CLUSTER_MAX, we reschedule the task to ensure fairness and avoid long lock holding times. Link: https://lkml.kernel.org/r/85363861af65fac66c7a98c251906afc0d9c8098.1695291046.git.wangjiexun@tinylab.org Signed-off-by: Jiexun Wang <wangjiexun@tinylab.org> Cc: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()Ryusuke Konishi
If nilfs2 reads a disk image with corrupted segment usage metadata, and its segment usage information is marked as an error for the segment at the write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs during log writing. Segments newly allocated for writing with nilfs_sufile_alloc() will not have this error flag set, but this unexpected situation will occur if the segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active segment) was marked in error. Fix this issue by inserting a sanity check to treat it as a file system corruption. Since error returns are not allowed during the execution phase where nilfs_sufile_set_segment_usage() is used, this inserts the sanity check into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the segment usage record to be updated and sets it up in a dirty state for writing. In addition, nilfs_sufile_set_segment_usage() is also called when canceling log writing and undoing segment usage update, so in order to avoid issuing the same kernel warning in that case, in case of cancellation, avoid checking the error flag in nilfs_sufile_set_segment_usage(). Link: https://lkml.kernel.org/r/20231205085947.4431-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+14e9f834f6ddecece094@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=14e9f834f6ddecece094 Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTISidhartha Kumar
After commit a08c7193e4f1 "mm/filemap: remove hugetlb special casing in filemap.c", hugetlb pages are stored in the page cache in base page sized indexes. This leads to multi index stores in the xarray which is only supporting through CONFIG_XARRAY_MULTI. The other page cache user of multi index stores ,THP, selects XARRAY_MULTI. Have CONFIG_HUGETLB_PAGE follow this behavior as well to avoid the BUG() with a CONFIG_HUGETLB_PAGE && !CONFIG_XARRAY_MULTI config. Link: https://lkml.kernel.org/r/20231204183234.348697-1-sidhartha.kumar@oracle.com Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06scripts/gdb: fix lx-device-list-bus and lx-device-list-classFlorian Fainelli
After the conversion to bus_to_subsys() and class_to_subsys(), the gdb scripts listing the system buses and classes respectively was broken, fix those by returning the subsys_priv pointer and have the various caller de-reference either the 'bus' or 'class' structure members accordingly. Link: https://lkml.kernel.org/r/20231130043317.174188-1-florian.fainelli@broadcom.com Fixes: 7b884b7f24b4 ("driver core: class.c: convert to only use class_to_subsys") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06MAINTAINERS: drop Antti PalosaariBagas Sanjaya
He is currently inactive (last message from him is two years ago [1]). His media tree [2] is also dormant (latest activity is 6 years ago), yet his site is still online [3]. Drop him from MAINTAINERS and add CREDITS entry for him. We thank him for maintaining various DVB drivers. [1]: https://lore.kernel.org/all/660772b3-0597-02db-ed94-c6a9be04e8e8@iki.fi/ [2]: https://git.linuxtv.org/anttip/media_tree.git/ [3]: https://palosaari.fi/linux/ Link: https://lkml.kernel.org/r/20231130083848.5396-1-bagasdotme@gmail.com Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Antti Palosaari <crope@iki.fi> Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06highmem: fix a memory copy problem in memcpy_from_folioSu Hui
Clang static checker complains that value stored to 'from' is never read. And memcpy_from_folio() only copy the last chunk memory from folio to destination. Use 'to += chunk' to replace 'from += chunk' to fix this typo problem. Link: https://lkml.kernel.org/r/20231130034017.1210429-1-suhui@nfschina.com Fixes: b23d03ef7af5 ("highmem: add memcpy_to_folio() and memcpy_from_folio()") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Tom Rix <trix@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-06nilfs2: fix missing error check for sb_set_blocksize callRyusuke Konishi
When mounting a filesystem image with a block size larger than the page size, nilfs2 repeatedly outputs long error messages with stack traces to the kernel log, such as the following: getblk(): invalid block size 8192 requested logical block size: 512 ... Call Trace: dump_stack_lvl+0x92/0xd4 dump_stack+0xd/0x10 bdev_getblk+0x33a/0x354 __breadahead+0x11/0x80 nilfs_search_super_root+0xe2/0x704 [nilfs2] load_nilfs+0x72/0x504 [nilfs2] nilfs_mount+0x30f/0x518 [nilfs2] legacy_get_tree+0x1b/0x40 vfs_get_tree+0x18/0xc4 path_mount+0x786/0xa88 __ia32_sys_mount+0x147/0x1a8 __do_fast_syscall_32+0x56/0xc8 do_fast_syscall_32+0x29/0x58 do_SYSENTER_32+0x15/0x18 entry_SYSENTER_32+0x98/0xf1 ... This overloads the system logger. And to make matters worse, it sometimes crashes the kernel with a memory access violation. This is because the return value of the sb_set_blocksize() call, which should be checked for errors, is not checked. The latter issue is due to out-of-buffer memory being accessed based on a large block size that caused sb_set_blocksize() to fail for buffers read with the initial minimum block size that remained unupdated in the super_block structure. Since nilfs2 mkfs tool does not accept block sizes larger than the system page size, this has been overlooked. However, it is possible to create this situation by intentionally modifying the tool or by passing a filesystem image created on a system with a large page size to a system with a smaller page size and mounting it. Fix this issue by inserting the expected error handling for the call to sb_set_blocksize(). Link: https://lkml.kernel.org/r/20231129141547.4726-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>