summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-28sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()Cong Wang
The following race condition could trigger a NULL pointer dereference: sock_map_link_detach(): sock_map_link_update_prog(): mutex_lock(&sockmap_mutex); ... sockmap_link->map = NULL; mutex_unlock(&sockmap_mutex); mutex_lock(&sockmap_mutex); ... sock_map_prog_link_lookup(sockmap_link->map); mutex_unlock(&sockmap_mutex); <continue> Fix it by adding a NULL pointer check. In this specific case, it makes no sense to update a link which is being released. Reported-by: Ruan Bonan <bonan.ruan@u.nus.edu> Fixes: 699c23f02c65 ("bpf: Add bpf_link support for sk_msg and sk_skb progs") Cc: Yonghong Song <yonghong.song@linux.dev> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20241026185522.338562-1-xiyou.wangcong@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-24Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix an out-of-bounds read in bpf_link_show_fdinfo for BPF sockmap link file descriptors (Hou Tao) - Fix BPF arm64 JIT's address emission with tag-based KASAN enabled reserving not enough size (Peter Collingbourne) - Fix BPF verifier do_misc_fixups patching for inlining of the bpf_get_branch_snapshot BPF helper (Andrii Nakryiko) - Fix a BPF verifier bug and reject BPF program write attempts into read-only marked BPF maps (Daniel Borkmann) - Fix perf_event_detach_bpf_prog error handling by removing an invalid check which would skip BPF program release (Jiri Olsa) - Fix memory leak when parsing mount options for the BPF filesystem (Hou Tao) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Check validity of link->type in bpf_link_show_fdinfo() bpf: Add the missing BPF_LINK_TYPE invocation for sockmap bpf: fix do_misc_fixups() for bpf_get_branch_snapshot() bpf,perf: Fix perf_event_detach_bpf_prog error handling selftests/bpf: Add test for passing in uninit mtu_len selftests/bpf: Add test for writes to .rodata bpf: Remove MEM_UNINIT from skb/xdp MTU helpers bpf: Fix overloading of MEM_UNINIT's meaning bpf: Add MEM_WRITE attribute bpf: Preserve param->string when parsing mount options bpf, arm64: Fix address emission with tag-based KASAN enabled
2024-10-24Merge tag 'net-6.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfiler, xfrm and bluetooth. Oddly this includes a fix for a posix clock regression; in our previous PR we included a change there as a pre-requisite for networking one. That fix proved to be buggy and requires the follow-up included here. Thomas suggested we should send it, given we sent the buggy patch. Current release - regressions: - posix-clock: Fix unbalanced locking in pc_clock_settime() - netfilter: fix typo causing some targets not to load on IPv6 Current release - new code bugs: - xfrm: policy: remove last remnants of pernet inexact list Previous releases - regressions: - core: fix races in netdev_tx_sent_queue()/dev_watchdog() - bluetooth: fix UAF on sco_sock_timeout - eth: hv_netvsc: fix VF namespace also in synthetic NIC NETDEV_REGISTER event - eth: usbnet: fix name regression - eth: be2net: fix potential memory leak in be_xmit() - eth: plip: fix transmit path breakage Previous releases - always broken: - sched: deny mismatched skip_sw/skip_hw flags for actions created by classifiers - netfilter: bpf: must hold reference on net namespace - eth: virtio_net: fix integer overflow in stats - eth: bnxt_en: replace ptp_lock with irqsave variant - eth: octeon_ep: add SKB allocation failures handling in __octep_oq_process_rx() Misc: - MAINTAINERS: add Simon as an official reviewer" * tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits) net: dsa: mv88e6xxx: support 4000ps cycle counter period net: dsa: mv88e6xxx: read cycle counter period from hardware net: dsa: mv88e6xxx: group cycle counter coefficients net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x Bluetooth: ISO: Fix UAF on iso_sock_timeout Bluetooth: SCO: Fix UAF on sco_sock_timeout Bluetooth: hci_core: Disable works on hci_unregister_dev posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime() r8169: avoid unsolicited interrupts net: sched: use RCU read-side critical section in taprio_dump() net: sched: fix use-after-free in taprio_change() net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers net: usb: usbnet: fix name regression mlxsw: spectrum_router: fix xa_store() error checking virtio_net: fix integer overflow in stats net: fix races in netdev_tx_sent_queue()/dev_watchdog() net: wwan: fix global oob in wwan_rtnl_policy netfilter: xtables: fix typo causing some targets not to load on IPv6 ...
2024-10-24Merge tag 'hid-for-linus-20241024' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Jiri Kosina: "Device-specific functionality quirks for Thinkpad X1 Gen3, Logitech Bolt and some Goodix touchpads (Bartłomiej Maryńczak, Hans de Goede and Kenneth Albanowski)" * tag 'hid-for-linus-20241024' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: lenovo: Add support for Thinkpad X1 Tablet Gen 3 keyboard HID: multitouch: Add quirk for Logitech Bolt receiver w/ Casa touchpad HID: i2c-hid: Delayed i2c resume wakeup for 0x0d42 Goodix touchpad
2024-10-24Merge tag 'loongarch-fixes-6.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Get correct cores_per_package for SMT systems, enable IRQ if do_ale() triggered in irq-enabled context, and fix some bugs about vDSO, memory managenent, hrtimer in KVM, etc" * tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Mark hrtimer to expire in hard interrupt context LoongArch: Make KASAN usable for variable cpu_vabits LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space LoongArch: Don't crash in stack_top() for tasks without vDSO LoongArch: Set correct size for vDSO code mapping LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context LoongArch: Get correct cores_per_package for SMT systems LoongArch: Use "Exception return address" to comment ERA
2024-10-24Merge tag 'probes-fixes-v6.12-rc4.2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - objpool: Fix choosing allocation for percpu slots Fixes to allocate objpool's percpu slots correctly according to the GFP flag. It checks whether "any bit" in GFP_ATOMIC is set to choose the vmalloc source, but it should check "all bits" in GFP_ATOMIC flag is set, because GFP_ATOMIC is a combined flag. - tracing/probes: Fix MAX_TRACE_ARGS limit handling If more than MAX_TRACE_ARGS are passed for creating a probe event, the entries over MAX_TRACE_ARG in trace_arg array are not initialized. Thus if the kernel accesses those entries, it crashes. This rejects creating event if the number of arguments is over MAX_TRACE_ARGS. - tracing: Consider the NUL character when validating the event length A strlen() is used when parsing the event name, and the original code does not consider the terminal null byte. Thus it can pass the name one byte longer than the buffer. This fixes to check it correctly. * tag 'probes-fixes-v6.12-rc4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Consider the NULL character when validating the event length tracing/probes: Fix MAX_TRACE_ARGS limit handling objpool: fix choosing allocation for percpu slots
2024-10-24Merge tag 'for-6.12-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - mount option fixes: - fix handling of compression mount options on remount - reject rw remount in case there are options that don't work in read-write mode (like rescue options) - fix zone accounting of unusable space - fix in-memory corruption when merging extent maps - fix delalloc range locking for sector < page - use more convenient default value of drop subtree threshold, clean more subvolumes without the fallback to marking quotas inconsistent - fix smatch warning about incorrect value passed to ERR_PTR * tag 'for-6.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item() btrfs: reject ro->rw reconfiguration if there are hard ro requirements btrfs: fix read corruption due to race with extent map merging btrfs: fix the delalloc range locking if sector size < page size btrfs: qgroup: set a more sane default value for subtree drop threshold btrfs: clear force-compress on remount when compress mount option is given btrfs: zoned: fix zone unusable accounting for freed reserved extent
2024-10-24Merge tag 'jfs-6.12-rc5' of github.com:kleikamp/linux-shaggyLinus Torvalds
Pull jfs fix from David Kleikamp: "Fix a regression introduced in 6.12-rc1" * tag 'jfs-6.12-rc5' of github.com:kleikamp/linux-shaggy: jfs: Fix sanity check in dbMount
2024-10-24Merge tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Lots of hotfixes: - transaction restart injection has been shaking out a few things - fix a data corruption in the buffered write path on -ENOSPC, found by xfstests generic/299 - Some small show_options fixes - Repair mismatches in inode hash type, seed: different snapshot versions of an inode must have the same hash/type seed, used for directory entries and xattrs. We were checking the hash seed, but not the type, and a user contributed a filesystem where the hash type on one inode had somehow been flipped; these fixes allow his filesystem to repair. Additionally, the hash type flip made some directory entries invisible, which were then recreated by userspace; so the hash check code now checks for duplicate non dangling dirents, and renames one of them if necessary. - Don't use wait_event_interruptible() in recovery: this fixes some filesystems failing to mount with -ERESTARTSYS - Workaround for kvmalloc not supporting > INT_MAX allocations, causing an -ENOMEM when allocating the sorted array of journal keys: this allows a 75 TB filesystem to mount - Make sure bch_inode_unpacked.bi_snapshot is set in the old inode compat path: this alllows Marcin's filesystem (in use since before 6.7) to repair and mount" * tag 'bcachefs-2024-10-22' of https://github.com/koverstreet/bcachefs: (26 commits) bcachefs: Set bch_inode_unpacked.bi_snapshot in old inode path bcachefs: Mark more errors as AUTOFIX bcachefs: Workaround for kvmalloc() not supporting > INT_MAX allocations bcachefs: Don't use wait_event_interruptible() in recovery bcachefs: Fix __bch2_fsck_err() warning bcachefs: fsck: Improve hash_check_key() bcachefs: bch2_hash_set_or_get_in_snapshot() bcachefs: Repair mismatches in inode hash seed, type bcachefs: Add hash seed, type to inode_to_text() bcachefs: INODE_STR_HASH() for bch_inode_unpacked bcachefs: Run in-kernel offline fsck without ratelimit errors bcachefs: skip mount option handle for empty string. bcachefs: fix incorrect show_options results bcachefs: Fix data corruption on -ENOSPC in buffered write path bcachefs: bch2_folio_reservation_get_partial() is now better behaved bcachefs: fix disk reservation accounting in bch2_folio_reservation_get() bcachefS: ec: fix data type on stripe deletion bcachefs: Don't use commit_do() unnecessarily bcachefs: handle restarts in bch2_bucket_io_time_reset() bcachefs: fix restart handling in __bch2_resume_logged_op_finsert() ...
2024-10-24Revert "9p: Enable multipage folios"Dominique Martinet
This reverts commit 1325e4a91a405f88f1b18626904d37860a4f9069. using multipage folios apparently break some madvise operations like MADV_PAGEOUT which do not reliably unload the specified page anymore, Revert the patch until that is figured out. Reported-by: Andrii Nakryiko <andrii@kernel.org> Fixes: 1325e4a91a40 ("9p: Enable multipage folios") Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-24Merge branch 'add-the-missing-bpf_link_type-invocation-for-sockmap'Andrii Nakryiko
Hou Tao says: ==================== Add the missing BPF_LINK_TYPE invocation for sockmap From: Hou Tao <houtao1@huawei.com> Hi, The tiny patch set fixes the out-of-bound read problem when reading the fdinfo of sock map link fd. And in order to spot such omission early for the newly-added link type in the future, it also checks the validity of the link->type and adds a WARN_ONCE() for missed invocation. Please see individual patches for more details. And comments are always welcome. v3: * patch #2: check and warn the validity of link->type instead of adding a static assertion for bpf_link_type_strs array. v2: http://lore.kernel.org/bpf/d49fa2f4-f743-c763-7579-c3cab4dd88cb@huaweicloud.com ==================== Link: https://lore.kernel.org/r/20241024013558.1135167-1-houtao@huaweicloud.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-10-24bpf: Check validity of link->type in bpf_link_show_fdinfo()Hou Tao
If a newly-added link type doesn't invoke BPF_LINK_TYPE(), accessing bpf_link_type_strs[link->type] may result in an out-of-bounds access. To spot such missed invocations early in the future, checking the validity of link->type in bpf_link_show_fdinfo() and emitting a warning when such invocations are missed. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241024013558.1135167-3-houtao@huaweicloud.com
2024-10-24bpf: Add the missing BPF_LINK_TYPE invocation for sockmapHou Tao
There is an out-of-bounds read in bpf_link_show_fdinfo() for the sockmap link fd. Fix it by adding the missing BPF_LINK_TYPE invocation for sockmap link Also add comments for bpf_link_type to prevent missing updates in the future. Fixes: 699c23f02c65 ("bpf: Add bpf_link support for sk_msg and sk_skb progs") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241024013558.1135167-2-houtao@huaweicloud.com
2024-10-24Merge branch 'net-dsa-mv88e6xxx-fix-mv88e6393x-phc-frequency-on-internal-clock'Paolo Abeni
Shenghao Yang says: ==================== net: dsa: mv88e6xxx: fix MV88E6393X PHC frequency on internal clock The MV88E6393X family of switches can additionally run their cycle counters using a 250MHz internal clock instead of the usual 125MHz external clock [1]. The driver currently assumes all designs utilize that external clock, but MikroTik's RB5009 uses the internal source - causing the PHC to be seen running at 2x real time in userspace, making synchronization with ptp4l impossible. This series adds support for reading off the cycle counter frequency known to the hardware in the TAI_CLOCK_PERIOD register and picking an appropriate set of scaling coefficients instead of using a fixed set for each switch family. Patch 1 groups those cycle counter coefficients into a new structure to make it easier to pass them around. Patch 2 modifies PTP initialization to probe TAI_CLOCK_PERIOD and use an appropriate set of coefficients. Patch 3 adds support for 4000ps cycle counter periods. Changes since v2 [2]: - Patch 1: "net: dsa: mv88e6xxx: group cycle counter coefficients" - Moved declaration of mv88e6xxx_cc_coeffs to avoid moving that in Patch 2. - Patch 2: "net: dsa: mv88e6xxx: read cycle counter period from hardware" - Removed move of mv88e6xxx_cc_coeffs declaration. - Patch 3: "net: dsa: mv88e6xxx: support 4000ps cycle counter periods" - No change. [1] https://lore.kernel.org/netdev/d6622575-bf1b-445a-b08f-2739e3642aae@lunn.ch/ [2] https://lore.kernel.org/netdev/20241006145951.719162-1-me@shenghaoyang.info/ ==================== Link: https://patch.msgid.link/20241020063833.5425-1-me@shenghaoyang.info Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24net: dsa: mv88e6xxx: support 4000ps cycle counter periodShenghao Yang
The MV88E6393X family of devices can run its cycle counter off an internal 250MHz clock instead of an external 125MHz one. Add support for this cycle counter period by adding another set of coefficients and lowering the periodic cycle counter read interval to compensate for faster overflows at the increased frequency. Otherwise, the PHC runs at 2x real time in userspace and cannot be synchronized. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Shenghao Yang <me@shenghaoyang.info> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24net: dsa: mv88e6xxx: read cycle counter period from hardwareShenghao Yang
Instead of relying on a fixed mapping of hardware family to cycle counter frequency, pull this information from the MV88E6XXX_TAI_CLOCK_PERIOD register. This lets us support switches whose cycle counter frequencies depend on board design. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Shenghao Yang <me@shenghaoyang.info> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24net: dsa: mv88e6xxx: group cycle counter coefficientsShenghao Yang
Instead of having them as individual fields in ptp_ops, wrap the coefficients in a separate struct so they can be referenced together. Fixes: de776d0d316f ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Shenghao Yang <me@shenghaoyang.info> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24net: usb: qmi_wwan: add Fibocom FG132 0x0112 compositionReinhard Speyerer
Add Fibocom FG132 0x0112 composition: T: Bus=03 Lev=02 Prnt=06 Port=01 Cnt=02 Dev#= 10 Spd=12 MxCh= 0 D: Ver= 2.01 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2cb7 ProdID=0112 Rev= 5.15 S: Manufacturer=Fibocom Wireless Inc. S: Product=Fibocom Module S: SerialNumber=xxxxxxxx C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=81(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=83(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Link: https://patch.msgid.link/ZxLKp5YZDy-OM0-e@arcor.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER eventHaiyang Zhang
The existing code moves VF to the same namespace as the synthetic NIC during netvsc_register_vf(). But, if the synthetic device is moved to a new namespace after the VF registration, the VF won't be moved together. To make the behavior more consistent, add a namespace check for synthetic NIC's NETDEV_REGISTER event (generated during its move), and move the VF if it is not in the same namespace. Cc: stable@vger.kernel.org Fixes: c0a41b887ce6 ("hv_netvsc: move VF to same namespace as netvsc device") Suggested-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1729275922-17595-1-git-send-email-haiyangz@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876xTim Harvey
The well-known errata regarding EEE not being functional on various KSZ switches has been refactored a few times. Recently the refactoring has excluded several switches that the errata should also apply to. Disable EEE for additional switches with this errata and provide additional comments referring to the public errata document. The original workaround for the errata was applied with a register write to manually disable the EEE feature in MMD 7:60 which was being applied for KSZ9477/KSZ9897/KSZ9567 switch ID's. Then came commit 26dd2974c5b5 ("net: phy: micrel: Move KSZ9477 errata fixes to PHY driver") and commit 6068e6d7ba50 ("net: dsa: microchip: remove KSZ9477 PHY errata handling") which moved the errata from the switch driver to the PHY driver but only for PHY_ID_KSZ9477 (PHY ID) however that PHY code was dead code because an entry was never added for PHY_ID_KSZ9477 via MODULE_DEVICE_TABLE. This was apparently realized much later and commit 54a4e5c16382 ("net: phy: micrel: add Microchip KSZ 9477 to the device table") added the PHY_ID_KSZ9477 to the PHY driver but as the errata was only being applied to PHY_ID_KSZ9477 it's not completely clear what switches that relates to. Later commit 6149db4997f5 ("net: phy: micrel: fix KSZ9477 PHY issues after suspend/resume") breaks this again for all but KSZ9897 by only applying the errata for that PHY ID. Following that this was affected with commit 08c6d8bae48c("net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)") which removes the blatant register write to MMD 7:60 and replaces it by setting phydev->eee_broken_modes = -1 so that the generic phy-c45 code disables EEE but this is only done for the KSZ9477_CHIP_ID (Switch ID). Lastly commit 0411f73c13af ("net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.") adds some additional switches that were missing to the errata due to the previous changes. This commit adds an additional set of switches. Fixes: 0411f73c13af ("net: dsa: microchip: disable EEE for KSZ8567/KSZ9567/KSZ9896/KSZ9897.") Signed-off-by: Tim Harvey <tharvey@gateworks.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20241018160658.781564-1-tharvey@gateworks.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24Merge tag 'for-net-2024-10-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_core: Disable works on hci_unregister_dev - SCO: Fix UAF on sco_sock_timeout - ISO: Fix UAF on iso_sock_timeout * tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: ISO: Fix UAF on iso_sock_timeout Bluetooth: SCO: Fix UAF on sco_sock_timeout Bluetooth: hci_core: Disable works on hci_unregister_dev ==================== Link: https://patch.msgid.link/20241023143005.2297694-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-24Merge tag 'ipsec-2024-10-22' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-10-22 1) Fix routing behavior that relies on L4 information for xfrm encapsulated packets. From Eyal Birger. 2) Remove leftovers of pernet policy_inexact lists. From Florian Westphal. 3) Validate new SA's prefixlen when the selector family is not set from userspace. From Sabrina Dubroca. 4) Fix a kernel-infoleak when dumping an auth algorithm. From Petr Vaganov. Please pull or let me know if there are problems. ipsec-2024-10-22 * tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix one more kernel-infoleak in algo dumping xfrm: validate new SA's prefixlen using SA family when sel.family is unset xfrm: policy: remove last remnants of pernet inexact list xfrm: respect ip protocols rules criteria when performing dst lookups xfrm: extract dst lookup parameters into a struct ==================== Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23bpf: fix do_misc_fixups() for bpf_get_branch_snapshot()Andrii Nakryiko
We need `goto next_insn;` at the end of patching instead of `continue;`. It currently works by accident by making verifier re-process patched instructions. Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Fixes: 314a53623cd4 ("bpf: inline bpf_get_branch_snapshot() helper") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Link: https://lore.kernel.org/r/20241023161916.2896274-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-23bpf,perf: Fix perf_event_detach_bpf_prog error handlingJiri Olsa
Peter reported that perf_event_detach_bpf_prog might skip to release the bpf program for -ENOENT error from bpf_prog_array_copy. This can't happen because bpf program is stored in perf event and is detached and released only when perf event is freed. Let's drop the -ENOENT check and make sure the bpf program is released in any case. Fixes: 170a7e3ea070 ("bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found") Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241023200352.3488610-1-jolsa@kernel.org Closes: https://lore.kernel.org/lkml/20241022111638.GC16066@noisy.programming.kicks-ass.net/
2024-10-23Bluetooth: ISO: Fix UAF on iso_sock_timeoutLuiz Augusto von Dentz
conn->sk maybe have been unlinked/freed while waiting for iso_conn_lock so this checks if the conn->sk is still valid by checking if it part of iso_sk_list. Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23Bluetooth: SCO: Fix UAF on sco_sock_timeoutLuiz Augusto von Dentz
conn->sk maybe have been unlinked/freed while waiting for sco_conn_lock so this checks if the conn->sk is still valid by checking if it part of sco_sk_list. Reported-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com Tested-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465 Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23Bluetooth: hci_core: Disable works on hci_unregister_devLuiz Augusto von Dentz
This make use of disable_work_* on hci_unregister_dev since the hci_dev is about to be freed new submissions are not disarable. Fixes: 0d151a103775 ("Bluetooth: hci_core: cancel all works upon hci_unregister_dev()") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-10-23LoongArch: KVM: Mark hrtimer to expire in hard interrupt contextHuacai Chen
Like commit 2c0d278f3293f ("KVM: LAPIC: Mark hrtimer to expire in hard interrupt context") and commit 9090825fa9974 ("KVM: arm/arm64: Let the timer expire in hardirq context on RT"), On PREEMPT_RT enabled kernels unmarked hrtimers are moved into soft interrupt expiry mode by default. Then the timers are canceled from an preempt-notifier which is invoked with disabled preemption which is not allowed on PREEMPT_RT. The timer callback is short so in could be invoked in hard-IRQ context. So let the timer expire on hard-IRQ context even on -RT. This fix a "scheduling while atomic" bug for PREEMPT_RT enabled kernels: BUG: scheduling while atomic: qemu-system-loo/1011/0x00000002 Modules linked in: amdgpu rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ns CPU: 1 UID: 0 PID: 1011 Comm: qemu-system-loo Tainted: G W 6.12.0-rc2+ #1774 Tainted: [W]=WARN Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 Stack : ffffffffffffffff 0000000000000000 9000000004e3ea38 9000000116744000 90000001167475a0 0000000000000000 90000001167475a8 9000000005644830 90000000058dc000 90000000058dbff8 9000000116747420 0000000000000001 0000000000000001 6a613fc938313980 000000000790c000 90000001001c1140 00000000000003fe 0000000000000001 000000000000000d 0000000000000003 0000000000000030 00000000000003f3 000000000790c000 9000000116747830 90000000057ef000 0000000000000000 9000000005644830 0000000000000004 0000000000000000 90000000057f4b58 0000000000000001 9000000116747868 900000000451b600 9000000005644830 9000000003a13998 0000000010000020 00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d ... Call Trace: [<9000000003a13998>] show_stack+0x38/0x180 [<9000000004e3ea34>] dump_stack_lvl+0x84/0xc0 [<9000000003a71708>] __schedule_bug+0x48/0x60 [<9000000004e45734>] __schedule+0x1114/0x1660 [<9000000004e46040>] schedule_rtlock+0x20/0x60 [<9000000004e4e330>] rtlock_slowlock_locked+0x3f0/0x10a0 [<9000000004e4f038>] rt_spin_lock+0x58/0x80 [<9000000003b02d68>] hrtimer_cancel_wait_running+0x68/0xc0 [<9000000003b02e30>] hrtimer_cancel+0x70/0x80 [<ffff80000235eb70>] kvm_restore_timer+0x50/0x1a0 [kvm] [<ffff8000023616c8>] kvm_arch_vcpu_load+0x68/0x2a0 [kvm] [<ffff80000234c2d4>] kvm_sched_in+0x34/0x60 [kvm] [<9000000003a749a0>] finish_task_switch.isra.0+0x140/0x2e0 [<9000000004e44a70>] __schedule+0x450/0x1660 [<9000000004e45cb0>] schedule+0x30/0x180 [<ffff800002354c70>] kvm_vcpu_block+0x70/0x120 [kvm] [<ffff800002354d80>] kvm_vcpu_halt+0x60/0x3e0 [kvm] [<ffff80000235b194>] kvm_handle_gspr+0x3f4/0x4e0 [kvm] [<ffff80000235f548>] kvm_handle_exit+0x1c8/0x260 [kvm] Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-10-23LoongArch: Make KASAN usable for variable cpu_vabitsHuacai Chen
Currently, KASAN on LoongArch assume the CPU VA bits is 48, which is true for Loongson-3 series, but not for Loongson-2 series (only 40 or lower), this patch fix that issue and make KASAN usable for variable cpu_vabits. Solution is very simple: Just define XRANGE_SHADOW_SHIFT which means valid address length from VA_BITS to min(cpu_vabits, VA_BITS). Cc: stable@vger.kernel.org Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-10-23posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()Jinjie Ruan
If get_clock_desc() succeeds, it calls fget() for the clockid's fd, and get the clk->rwsem read lock, so the error path should release the lock to make the lock balance and fput the clockid's fd to make the refcount balance and release the fd related resource. However the below commit left the error path locked behind resulting in unbalanced locking. Check timespec64_valid_strict() before get_clock_desc() to fix it, because the "ts" is not changed after that. Fixes: d8794ac20a29 ("posix-clock: Fix missing timespec64 check in pc_clock_settime()") Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Anna-Maria Behnsen <anna-maria@linutronix.de> [pabeni@redhat.com: fixed commit message typo] Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23r8169: avoid unsolicited interruptsHeiner Kallweit
It was reported that after resume from suspend a PCI error is logged and connectivity is broken. Error message is: PCI error (cmd = 0x0407, status_errs = 0x0000) The message seems to be a red herring as none of the error bits is set, and the PCI command register value also is normal. Exception handling for a PCI error includes a chip reset what apparently brakes connectivity here. The interrupt status bit triggering the PCI error handling isn't actually used on PCIe chip versions, so it's not clear why this bit is set by the chip. Fix this by ignoring this bit on PCIe chip versions. Fixes: 0e4851502f84 ("r8169: merge with version 8.001.00 of Realtek's r8168 driver") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219388 Tested-by: Atlas Yu <atlas.yu@canonical.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/78e2f535-438f-4212-ad94-a77637ac6c9c@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23net: sched: use RCU read-side critical section in taprio_dump()Dmitry Antipov
Fix possible use-after-free in 'taprio_dump()' by adding RCU read-side critical section there. Never seen on x86 but found on a KASAN-enabled arm64 system when investigating https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa: [T15862] BUG: KASAN: slab-use-after-free in taprio_dump+0xa0c/0xbb0 [T15862] Read of size 4 at addr ffff0000d4bb88f8 by task repro/15862 [T15862] [T15862] CPU: 0 UID: 0 PID: 15862 Comm: repro Not tainted 6.11.0-rc1-00293-gdefaf1a2113a-dirty #2 [T15862] Hardware name: QEMU QEMU Virtual Machine, BIOS edk2-20240524-5.fc40 05/24/2024 [T15862] Call trace: [T15862] dump_backtrace+0x20c/0x220 [T15862] show_stack+0x2c/0x40 [T15862] dump_stack_lvl+0xf8/0x174 [T15862] print_report+0x170/0x4d8 [T15862] kasan_report+0xb8/0x1d4 [T15862] __asan_report_load4_noabort+0x20/0x2c [T15862] taprio_dump+0xa0c/0xbb0 [T15862] tc_fill_qdisc+0x540/0x1020 [T15862] qdisc_notify.isra.0+0x330/0x3a0 [T15862] tc_modify_qdisc+0x7b8/0x1838 [T15862] rtnetlink_rcv_msg+0x3c8/0xc20 [T15862] netlink_rcv_skb+0x1f8/0x3d4 [T15862] rtnetlink_rcv+0x28/0x40 [T15862] netlink_unicast+0x51c/0x790 [T15862] netlink_sendmsg+0x79c/0xc20 [T15862] __sock_sendmsg+0xe0/0x1a0 [T15862] ____sys_sendmsg+0x6c0/0x840 [T15862] ___sys_sendmsg+0x1ac/0x1f0 [T15862] __sys_sendmsg+0x110/0x1d0 [T15862] __arm64_sys_sendmsg+0x74/0xb0 [T15862] invoke_syscall+0x88/0x2e0 [T15862] el0_svc_common.constprop.0+0xe4/0x2a0 [T15862] do_el0_svc+0x44/0x60 [T15862] el0_svc+0x50/0x184 [T15862] el0t_64_sync_handler+0x120/0x12c [T15862] el0t_64_sync+0x190/0x194 [T15862] [T15862] Allocated by task 15857: [T15862] kasan_save_stack+0x3c/0x70 [T15862] kasan_save_track+0x20/0x3c [T15862] kasan_save_alloc_info+0x40/0x60 [T15862] __kasan_kmalloc+0xd4/0xe0 [T15862] __kmalloc_cache_noprof+0x194/0x334 [T15862] taprio_change+0x45c/0x2fe0 [T15862] tc_modify_qdisc+0x6a8/0x1838 [T15862] rtnetlink_rcv_msg+0x3c8/0xc20 [T15862] netlink_rcv_skb+0x1f8/0x3d4 [T15862] rtnetlink_rcv+0x28/0x40 [T15862] netlink_unicast+0x51c/0x790 [T15862] netlink_sendmsg+0x79c/0xc20 [T15862] __sock_sendmsg+0xe0/0x1a0 [T15862] ____sys_sendmsg+0x6c0/0x840 [T15862] ___sys_sendmsg+0x1ac/0x1f0 [T15862] __sys_sendmsg+0x110/0x1d0 [T15862] __arm64_sys_sendmsg+0x74/0xb0 [T15862] invoke_syscall+0x88/0x2e0 [T15862] el0_svc_common.constprop.0+0xe4/0x2a0 [T15862] do_el0_svc+0x44/0x60 [T15862] el0_svc+0x50/0x184 [T15862] el0t_64_sync_handler+0x120/0x12c [T15862] el0t_64_sync+0x190/0x194 [T15862] [T15862] Freed by task 6192: [T15862] kasan_save_stack+0x3c/0x70 [T15862] kasan_save_track+0x20/0x3c [T15862] kasan_save_free_info+0x4c/0x80 [T15862] poison_slab_object+0x110/0x160 [T15862] __kasan_slab_free+0x3c/0x74 [T15862] kfree+0x134/0x3c0 [T15862] taprio_free_sched_cb+0x18c/0x220 [T15862] rcu_core+0x920/0x1b7c [T15862] rcu_core_si+0x10/0x1c [T15862] handle_softirqs+0x2e8/0xd64 [T15862] __do_softirq+0x14/0x20 Fixes: 18cdd2f0998a ("net/sched: taprio: taprio_dump and taprio_change are protected by rtnl_mutex") Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20241018051339.418890-2-dmantipov@yandex.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23net: sched: fix use-after-free in taprio_change()Dmitry Antipov
In 'taprio_change()', 'admin' pointer may become dangling due to sched switch / removal caused by 'advance_sched()', and critical section protected by 'q->current_entry_lock' is too small to prevent from such a scenario (which causes use-after-free detected by KASAN). Fix this by prefer 'rcu_replace_pointer()' over 'rcu_assign_pointer()' to update 'admin' immediately before an attempt to schedule freeing. Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reported-by: syzbot+b65e0af58423fc8a73aa@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b65e0af58423fc8a73aa Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Link: https://patch.msgid.link/20241018051339.418890-1-dmantipov@yandex.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions ↵Vladimir Oltean
created by classifiers tcf_action_init() has logic for checking mismatches between action and filter offload flags (skip_sw/skip_hw). AFAIU, this is intended to run on the transition between the new tc_act_bind(flags) returning true (aka now gets bound to classifier) and tc_act_bind(act->tcfa_flags) returning false (aka action was not bound to classifier before). Otherwise, the check is skipped. For the case where an action is not standalone, but rather it was created by a classifier and is bound to it, tcf_action_init() skips the check entirely, and this means it allows mismatched flags to occur. Taking the matchall classifier code path as an example (with mirred as an action), the reason is the following: 1 | mall_change() 2 | -> mall_replace_hw_filter() 3 | -> tcf_exts_validate_ex() 4 | -> flags |= TCA_ACT_FLAGS_BIND; 5 | -> tcf_action_init() 6 | -> tcf_action_init_1() 7 | -> a_o->init() 8 | -> tcf_mirred_init() 9 | -> tcf_idr_create_from_flags() 10 | -> tcf_idr_create() 11 | -> p->tcfa_flags = flags; 12 | -> tc_act_bind(flags)) 13 | -> tc_act_bind(act->tcfa_flags) When invoked from tcf_exts_validate_ex() like matchall does (but other classifiers validate their extensions as well), tcf_action_init() runs in a call path where "flags" always contains TCA_ACT_FLAGS_BIND (set by line 4). So line 12 is always true, and line 13 is always true as well. No transition ever takes place, and the check is skipped. The code was added in this form in commit c86e0209dc77 ("flow_offload: validate flags of filter and actions"), but I'm attributing the blame even earlier in that series, to when TCA_ACT_FLAGS_SKIP_HW and TCA_ACT_FLAGS_SKIP_SW were added to the UAPI. Following the development process of this change, the check did not always exist in this form. A change took place between v3 [1] and v4 [2], AFAIU due to review feedback that it doesn't make sense for action flags to be different than classifier flags. I think I agree with that feedback, but it was translated into code that omits enforcing this for "classic" actions created at the same time with the filters themselves. There are 3 more important cases to discuss. First there is this command: $ tc qdisc add dev eth0 clasct $ tc filter add dev eth0 ingress matchall skip_sw \ action mirred ingress mirror dev eth1 which should be allowed, because prior to the concept of dedicated action flags, it used to work and it used to mean the action inherited the skip_sw/skip_hw flags from the classifier. It's not a mismatch. Then we have this command: $ tc qdisc add dev eth0 clasct $ tc filter add dev eth0 ingress matchall skip_sw \ action mirred ingress mirror dev eth1 skip_hw where there is a mismatch and it should be rejected. Finally, we have: $ tc qdisc add dev eth0 clasct $ tc filter add dev eth0 ingress matchall skip_sw \ action mirred ingress mirror dev eth1 skip_sw where the offload flags coincide, and this should be treated the same as the first command based on inheritance, and accepted. [1]: https://lore.kernel.org/netdev/20211028110646.13791-9-simon.horman@corigine.com/ [2]: https://lore.kernel.org/netdev/20211118130805.23897-10-simon.horman@corigine.com/ Fixes: 7adc57651211 ("flow_offload: add skip_hw and skip_sw to control if offload the action") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20241017161049.3570037-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23tracing: Consider the NULL character when validating the event lengthLeo Yan
strlen() returns a string length excluding the null byte. If the string length equals to the maximum buffer length, the buffer will have no space for the NULL terminating character. This commit checks this condition and returns failure for it. Link: https://lore.kernel.org/all/20241007144724.920954-1-leo.yan@arm.com/ Fixes: dec65d79fd26 ("tracing/probe: Check event name length correctly") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-10-23tracing/probes: Fix MAX_TRACE_ARGS limit handlingMikel Rychliski
When creating a trace_probe we would set nr_args prior to truncating the arguments to MAX_TRACE_ARGS. However, we would only initialize arguments up to the limit. This caused invalid memory access when attempting to set up probes with more than 128 fetchargs. BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 UID: 0 PID: 1769 Comm: cat Not tainted 6.11.0-rc7+ #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 RIP: 0010:__set_print_fmt+0x134/0x330 Resolve the issue by applying the MAX_TRACE_ARGS limit earlier. Return an error when there are too many arguments instead of silently truncating. Link: https://lore.kernel.org/all/20240930202656.292869-1-mikel@mikelr.com/ Fixes: 035ba76014c0 ("tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init") Signed-off-by: Mikel Rychliski <mikel@mikelr.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-10-22selftests/bpf: Add test for passing in uninit mtu_lenDaniel Borkmann
Add a small test to pass an uninitialized mtu_len to the bpf_check_mtu() helper to probe whether the verifier rejects it under !CAP_PERFMON. # ./vmtest.sh -- ./test_progs -t verifier_mtu [...] ./test_progs -t verifier_mtu [ 1.414712] tsc: Refined TSC clocksource calibration: 3407.993 MHz [ 1.415327] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcd52370, max_idle_ns: 440795242006 ns [ 1.416463] clocksource: Switched to clocksource tsc [ 1.429842] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430283] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #510/1 verifier_mtu/uninit/mtu: write rejected:OK #510 verifier_mtu:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-5-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22selftests/bpf: Add test for writes to .rodataDaniel Borkmann
Add a small test to write a (verification-time) fixed vs unknown but bounded-sized buffer into .rodata BPF map and assert that both get rejected. # ./vmtest.sh -- ./test_progs -t verifier_const [...] ./test_progs -t verifier_const [ 1.418717] tsc: Refined TSC clocksource calibration: 3407.994 MHz [ 1.419113] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcde90a1, max_idle_ns: 440795222066 ns [ 1.419972] clocksource: Switched to clocksource tsc [ 1.449596] bpf_testmod: loading out-of-tree module taints kernel. [ 1.449958] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #475/1 verifier_const/rodata/strtol: write rejected:OK #475/2 verifier_const/bss/strtol: write accepted:OK #475/3 verifier_const/data/strtol: write accepted:OK #475/4 verifier_const/rodata/mtu: write rejected:OK #475/5 verifier_const/bss/mtu: write accepted:OK #475/6 verifier_const/data/mtu: write accepted:OK #475/7 verifier_const/rodata/mark: write with unknown reg rejected:OK #475/8 verifier_const/rodata/mark: write with unknown reg rejected:OK #475 verifier_const:OK #476/1 verifier_const_or/constant register |= constant should keep constant type:OK #476/2 verifier_const_or/constant register |= constant should not bypass stack boundary checks:OK #476/3 verifier_const_or/constant register |= constant register should keep constant type:OK #476/4 verifier_const_or/constant register |= constant register should not bypass stack boundary checks:OK #476 verifier_const_or:OK Summary: 2/12 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22bpf: Remove MEM_UNINIT from skb/xdp MTU helpersDaniel Borkmann
We can now undo parts of 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error") as discussed in [0]. Given the BPF helpers now have MEM_WRITE tag, the MEM_UNINIT can be cleared. The mtu_len is an input as well as output argument, meaning, the BPF program has to set it to something. It cannot be uninitialized. Therefore, allowing uninitialized memory and zeroing it on error would be odd. It was done as an interim step in 4b3786a6c539 as the desired behavior could not have been expressed before the introduction of MEM_WRITE tag. Fixes: 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/a86eb76d-f52f-dee4-e5d2-87e45de3e16f@iogearbox.net [0] Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22bpf: Fix overloading of MEM_UNINIT's meaningDaniel Borkmann
Lonial reported an issue in the BPF verifier where check_mem_size_reg() has the following code: if (!tnum_is_const(reg->var_off)) /* For unprivileged variable accesses, disable raw * mode so that the program is required to * initialize all the memory that the helper could * just partially fill up. */ meta = NULL; This means that writes are not checked when the register containing the size of the passed buffer has not a fixed size. Through this bug, a BPF program can write to a map which is marked as read-only, for example, .rodata global maps. The problem is that MEM_UNINIT's initial meaning that "the passed buffer to the BPF helper does not need to be initialized" which was added back in commit 435faee1aae9 ("bpf, verifier: add ARG_PTR_TO_RAW_STACK type") got overloaded over time with "the passed buffer is being written to". The problem however is that checks such as the above which were added later via 06c1c049721a ("bpf: allow helpers access to variable memory") set meta to NULL in order force the user to always initialize the passed buffer to the helper. Due to the current double meaning of MEM_UNINIT, this bypasses verifier write checks to the memory (not boundary checks though) and only assumes the latter memory is read instead. Fix this by reverting MEM_UNINIT back to its original meaning, and having MEM_WRITE as an annotation to BPF helpers in order to then trigger the BPF verifier checks for writing to memory. Some notes: check_arg_pair_ok() ensures that for ARG_CONST_SIZE{,_OR_ZERO} we can access fn->arg_type[arg - 1] since it must contain a preceding ARG_PTR_TO_MEM. For check_mem_reg() the meta argument can be removed altogether since we do check both BPF_READ and BPF_WRITE. Same for the equivalent check_kfunc_mem_size_reg(). Fixes: 7b3552d3f9f6 ("bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access") Fixes: 97e6d7dab1ca ("bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access") Fixes: 15baa55ff5b0 ("bpf/verifier: allow all functions to read user provided context") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22bpf: Add MEM_WRITE attributeDaniel Borkmann
Add a MEM_WRITE attribute for BPF helper functions which can be used in bpf_func_proto to annotate an argument type in order to let the verifier know that the helper writes into the memory passed as an argument. In the past MEM_UNINIT has been (ab)used for this function, but the latter merely tells the verifier that the passed memory can be uninitialized. There have been bugs with overloading the latter but aside from that there are also cases where the passed memory is read + written which currently cannot be expressed, see also 4b3786a6c539 ("bpf: Zero former ARG_PTR_TO_{LONG,INT} args in case of error"). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22bpf: Preserve param->string when parsing mount optionsHou Tao
In bpf_parse_param(), keep the value of param->string intact so it can be freed later. Otherwise, the kmalloc area pointed to by param->string will be leaked as shown below: unreferenced object 0xffff888118c46d20 (size 8): comm "new_name", pid 12109, jiffies 4295580214 hex dump (first 8 bytes): 61 6e 79 00 38 c9 5c 7e any.8.\~ backtrace (crc e1b7f876): [<00000000c6848ac7>] kmemleak_alloc+0x4b/0x80 [<00000000de9f7d00>] __kmalloc_node_track_caller_noprof+0x36e/0x4a0 [<000000003e29b886>] memdup_user+0x32/0xa0 [<0000000007248326>] strndup_user+0x46/0x60 [<0000000035b3dd29>] __x64_sys_fsconfig+0x368/0x3d0 [<0000000018657927>] x64_sys_call+0xff/0x9f0 [<00000000c0cabc95>] do_syscall_64+0x3b/0xc0 [<000000002f331597>] entry_SYSCALL_64_after_hwframe+0x4b/0x53 Fixes: 6c1752e0b6ca ("bpf: Support symbolic BPF FS delegation mount options") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20241022130133.3798232-1-houtao@huaweicloud.com
2024-10-22jfs: Fix sanity check in dbMountDave Kleikamp
MAXAG is a legitimate value for bmp->db_numag Fixes: e63866a47556 ("jfs: fix out-of-bounds in dbNextAG() and diAlloc()") Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-10-22btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item()Yue Haibing
The ret may be zero in btrfs_search_dir_index_item() and should not passed to ERR_PTR(). Now btrfs_unlink_subvol() is the only caller to this, reconstructed it to check ERR_PTR(-ENOENT) while ret >= 0. This fixes smatch warnings: fs/btrfs/dir-item.c:353 btrfs_search_dir_index_item() warn: passing zero to 'ERR_PTR' Fixes: 9dcbe16fccbb ("btrfs: use btrfs_for_each_slot in btrfs_search_dir_index_item") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-22btrfs: reject ro->rw reconfiguration if there are hard ro requirementsQu Wenruo
[BUG] Syzbot reports the following crash: BTRFS info (device loop0 state MCS): disabling free space tree BTRFS info (device loop0 state MCS): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1) BTRFS info (device loop0 state MCS): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2) Oops: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:backup_super_roots fs/btrfs/disk-io.c:1691 [inline] RIP: 0010:write_all_supers+0x97a/0x40f0 fs/btrfs/disk-io.c:4041 Call Trace: <TASK> btrfs_commit_transaction+0x1eae/0x3740 fs/btrfs/transaction.c:2530 btrfs_delete_free_space_tree+0x383/0x730 fs/btrfs/free-space-tree.c:1312 btrfs_start_pre_rw_mount+0xf28/0x1300 fs/btrfs/disk-io.c:3012 btrfs_remount_rw fs/btrfs/super.c:1309 [inline] btrfs_reconfigure+0xae6/0x2d40 fs/btrfs/super.c:1534 btrfs_reconfigure_for_mount fs/btrfs/super.c:2020 [inline] btrfs_get_tree_subvol fs/btrfs/super.c:2079 [inline] btrfs_get_tree+0x918/0x1920 fs/btrfs/super.c:2115 vfs_get_tree+0x90/0x2b0 fs/super.c:1800 do_new_mount+0x2be/0xb40 fs/namespace.c:3472 do_mount fs/namespace.c:3812 [inline] __do_sys_mount fs/namespace.c:4020 [inline] __se_sys_mount+0x2d6/0x3c0 fs/namespace.c:3997 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [CAUSE] To support mounting different subvolume with different RO/RW flags for the new mount APIs, btrfs introduced two workaround to support this feature: - Skip mount option/feature checks if we are mounting a different subvolume - Reconfigure the fs to RW if the initial mount is RO Combining these two, we can have the following sequence: - Mount the fs ro,rescue=all,clear_cache,space_cache=v1 rescue=all will mark the fs as hard read-only, so no v2 cache clearing will happen. - Mount a subvolume rw of the same fs. We go into btrfs_get_tree_subvol(), but fc_mount() returns EBUSY because our new fc is RW, different from the original fs. Now we enter btrfs_reconfigure_for_mount(), which switches the RO flag first so that we can grab the existing fs_info. Then we reconfigure the fs to RW. - During reconfiguration, option/features check is skipped This means we will restart the v2 cache clearing, and convert back to v1 cache. This will trigger fs writes, and since the original fs has "rescue=all" option, it skips the csum tree read. And eventually causing NULL pointer dereference in super block writeback. [FIX] For reconfiguration caused by different subvolume RO/RW flags, ensure we always run btrfs_check_options() to ensure we have proper hard RO requirements met. In fact the function btrfs_check_options() doesn't really do many complex checks, but hard RO requirement and some feature dependency checks, thus there is no special reason not to do the check for mount reconfiguration. Reported-by: syzbot+56360f93efa90ff15870@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/0000000000008c5d090621cb2770@google.com/ Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes") CC: stable@vger.kernel.org # 6.8+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-22btrfs: fix read corruption due to race with extent map mergingBoris Burkov
In debugging some corrupt squashfs files, we observed symptoms of corrupt page cache pages but correct on-disk contents. Further investigation revealed that the exact symptom was a correct page followed by an incorrect, duplicate, page. This got us thinking about extent maps. commit ac05ca913e9f ("Btrfs: fix race between using extent maps and merging them") enforces a reference count on the primary `em` extent_map being merged, as that one gets modified. However, since, commit 3d2ac9922465 ("btrfs: introduce new members for extent_map") both 'em' and 'merge' get modified, which started modifying 'merge' and thus introduced the same race. We were able to reproduce this by looping the affected squashfs workload in parallel on a bunch of separate btrfs-es while also dropping caches. We are still working on a simple enough reproducer to make into an fstest. The simplest fix is to stop modifying 'merge', which is not essential, as it is dropped immediately after the merge. This behavior is simply a consequence of the order of the two extent maps being important in computing the new values. Modify merge_ondisk_extents to take prev and next by const* and also take a third merged parameter that it puts the results in. Note that this introduces the rather odd behavior of passing 'em' to merge_ondisk_extents as a const * and as a regular ptr. Fixes: 3d2ac9922465 ("btrfs: introduce new members for extent_map") CC: stable@vger.kernel.org # 6.11+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-22btrfs: fix the delalloc range locking if sector size < page sizeQu Wenruo
Inside lock_delalloc_folios(), there are several problems related to sector size < page size handling: - Set the writer locks without checking if the folio is still valid We call btrfs_folio_start_writer_lock() just like it's folio_lock(). But since the folio may not even be the folio of the current mapping, we can easily screw up the folio->private. - The range is not clamped inside the page This means we can over write other bitmaps if the start/len is not properly handled, and trigger the btrfs_subpage_assert(). - @processed_end is always rounded up to page end If the delalloc range is not page aligned, and we need to retry (returning -EAGAIN), then we will unlock to the page end. Thankfully this is not a huge problem, as now btrfs_folio_end_writer_lock() can handle range larger than the locked range, and only unlock what is already locked. Fix all these problems by: - Lock and check the folio first, then call btrfs_folio_set_writer_lock() So that if we got a folio not belonging to the inode, we won't touch folio->private. - Properly truncate the range inside the page - Update @processed_end to the locked range end Fixes: 1e1de38792e0 ("btrfs: make process_one_page() to handle subpage locking") CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-22btrfs: qgroup: set a more sane default value for subtree drop thresholdQu Wenruo
Since commit 011b46c30476 ("btrfs: skip subtree scan if it's too high to avoid low stall in btrfs_commit_transaction()"), btrfs qgroup can automatically skip large subtree scan at the cost of marking qgroup inconsistent. It's designed to address the final performance problem of snapshot drop with qgroup enabled, but to be safe the default value is BTRFS_MAX_LEVEL, requiring a user space daemon to set a different value to make it work. I'd say it's not a good idea to rely on user space tool to set this default value, especially when some operations (snapshot dropping) can be triggered immediately after mount, leaving a very small window to that that sysfs interface. So instead of disabling this new feature by default, enable it with a low threshold (3), so that large subvolume tree drop at mount time won't cause huge qgroup workload. CC: stable@vger.kernel.org # 6.1 Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-22btrfs: clear force-compress on remount when compress mount option is givenFilipe Manana
After the migration to use fs context for processing mount options we had a slight change in the semantics for remounting a filesystem that was mounted with compress-force. Before we could clear compress-force by passing only "-o compress[=algo]" during a remount, but after that change that does not work anymore, force-compress is still present and one needs to pass "-o compress-force=no,compress[=algo]" to the mount command. Example, when running on a kernel 6.8+: $ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi $ mount | grep sdi /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/) $ mount -o remount,compress=zlib:5 /mnt/sdi $ mount | grep sdi /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/) On a 6.7 kernel (or older): $ mount -o compress-force=zlib:9 /dev/sdi /mnt/sdi $ mount | grep sdi /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress-force=zlib:9,discard=async,space_cache=v2,subvolid=5,subvol=/) $ mount -o remount,compress=zlib:5 /mnt/sdi $ mount | grep sdi /dev/sdi on /mnt/sdi type btrfs (rw,relatime,compress=zlib:5,discard=async,space_cache=v2,subvolid=5,subvol=/) So update btrfs_parse_param() to clear "compress-force" when "compress" is given, providing the same semantics as kernel 6.7 and older. Reported-by: Roman Mamedov <rm@romanrm.net> Link: https://lore.kernel.org/linux-btrfs/20241014182416.13d0f8b0@nvm/ CC: stable@vger.kernel.org # 6.8+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-22net: usb: usbnet: fix name regressionOliver Neukum
The fix for MAC addresses broke detection of the naming convention because it gave network devices no random MAC before bind() was called. This means that the check for the local assignment bit was always negative as the address was zeroed from allocation, instead of from overwriting the MAC with a unique hardware address. The correct check for whether bind() has altered the MAC is done with is_zero_ether_addr Signed-off-by: Oliver Neukum <oneukum@suse.com> Reported-by: Greg Thelen <gthelen@google.com> Diagnosed-by: John Sperbeck <jsperbeck@google.com> Fixes: bab8eb0dd4cb9 ("usbnet: modern method to get random MAC") Link: https://patch.msgid.link/20241017071849.389636-1-oneukum@suse.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>