linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-01-30	Merge tag 'nf-25-01-30' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains one Netfilter fix: 1) Reject mismatching sum of field_len with set key length which allows to create a set without inconsistent pipapo rule width and set key length. * tag 'nf-25-01-30' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: reject mismatching sum of field_len with set key length ==================== Link: https://patch.msgid.link/20250130113307.2327470-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30	MAINTAINERS: add Neal to TCP maintainers	Jakub Kicinski
	Neal Cardwell has been indispensable in TCP reviews and investigations, especially protocol-related. Neal is also the author of packetdrill. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250129191332.2526140-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30	net: revert RTNL changes in unregister_netdevice_many_notify()	Eric Dumazet
	This patch reverts following changes: 83419b61d187 net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 2) ae646f1a0bb9 net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 1) cfa579f66656 net: no longer hold RTNL while calling flush_all_backlogs() This caused issues in layers holding a private mutex: cleanup_net() rtnl_lock(); mutex_lock(subsystem_mutex); unregister_netdevice(); rtnl_unlock(); // LOCKDEP violation rtnl_lock(); I will revisit this in next cycle, opt-in for the new behavior from safe contexts only. Fixes: cfa579f66656 ("net: no longer hold RTNL while calling flush_all_backlogs()") Fixes: ae646f1a0bb9 ("net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 1)") Fixes: 83419b61d187 ("net: reduce RTNL hold duration in unregister_netdevice_many_notify() (part 2)") Reported-by: syzbot+5b9196ecf74447172a9a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6789d55f.050a0220.20d369.004e.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250129142726.747726-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30	net: hsr: fix fill_frame_info() regression vs VLAN packets	Eric Dumazet
	Stephan Wurm reported that my recent patch broke VLAN support. Apparently skb->mac_len is not correct for VLAN traffic as shown by debug traces [1]. Use instead pskb_may_pull() to make sure the expected header is present in skb->head. Many thanks to Stephan for his help. [1] kernel: skb len=170 headroom=2 headlen=170 tailroom=20 mac=(2,14) mac_len=14 net=(16,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 start=0 offset=0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 priority=0x0 mark=0x0 alloc_cpu=0 vlan_all=0x0 encapsulation=0 inner(proto=0x0000, mac=0, net=0, trans=0) kernel: dev name=prp0 feat=0x0000000000007000 kernel: sk family=17 type=3 proto=0 kernel: skb headroom: 00000000: 74 00 kernel: skb linear: 00000000: 01 0c cd 01 00 01 00 d0 93 53 9c cb 81 00 80 00 kernel: skb linear: 00000010: 88 b8 00 01 00 98 00 00 00 00 61 81 8d 80 16 52 kernel: skb linear: 00000020: 45 47 44 4e 43 54 52 4c 2f 4c 4c 4e 30 24 47 4f kernel: skb linear: 00000030: 24 47 6f 43 62 81 01 14 82 16 52 45 47 44 4e 43 kernel: skb linear: 00000040: 54 52 4c 2f 4c 4c 4e 30 24 44 73 47 6f 6f 73 65 kernel: skb linear: 00000050: 83 07 47 6f 49 64 65 6e 74 84 08 67 8d f5 93 7e kernel: skb linear: 00000060: 76 c8 00 85 01 01 86 01 00 87 01 00 88 01 01 89 kernel: skb linear: 00000070: 01 00 8a 01 02 ab 33 a2 15 83 01 00 84 03 03 00 kernel: skb linear: 00000080: 00 91 08 67 8d f5 92 77 4b c6 1f 83 01 00 a2 1a kernel: skb linear: 00000090: a2 06 85 01 00 83 01 00 84 03 03 00 00 91 08 67 kernel: skb linear: 000000a0: 8d f5 92 77 4b c6 1f 83 01 00 kernel: skb tailroom: 00000000: 80 18 02 00 fe 4e 00 00 01 01 08 0a 4f fd 5e d1 kernel: skb tailroom: 00000010: 4f fd 5e cd Fixes: b9653d19e556 ("net: hsr: avoid potential out-of-bound access in fill_frame_info()") Reported-by: Stephan Wurm <stephan.wurm@a-eberle.de> Tested-by: Stephan Wurm <stephan.wurm@a-eberle.de> Closes: https://lore.kernel.org/netdev/Z4o_UC0HweBHJ_cw@PC-LX-SteWu/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250129130007.644084-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-30	kbuild: Use -fzero-init-padding-bits=all	Kees Cook
	GCC 15 introduces a regression in "= { 0 }" style initialization of unions that Linux has depended on for eliminating uninitialized variable contents. GCC does not seem likely to fix it[1], instead suggesting[2] that affected projects start using -fzero-init-padding-bits=unions. To avoid future surprises beyond just the current situation with unions, enable -fzero-init-padding-bits=all when available (GCC 15+). This will correctly zero padding bits in unions and structs that might have been left uninitialized, and will make sure there is no immediate regression in union initializations. As seen in the stackinit KUnit selftest union cases, which were passing before, were failing under GCC 15: not ok 18 test_small_start_old_zero ok 29 test_small_start_dynamic_partial # SKIP XFAIL uninit bytes: 63 ok 32 test_small_start_assigned_dynamic_partial # SKIP XFAIL uninit bytes: 63 ok 67 test_small_start_static_partial # SKIP XFAIL uninit bytes: 63 ok 70 test_small_start_static_all # SKIP XFAIL uninit bytes: 56 ok 73 test_small_start_dynamic_all # SKIP XFAIL uninit bytes: 56 ok 82 test_small_start_assigned_static_partial # SKIP XFAIL uninit bytes: 63 ok 85 test_small_start_assigned_static_all # SKIP XFAIL uninit bytes: 56 ok 88 test_small_start_assigned_dynamic_all # SKIP XFAIL uninit bytes: 56 The above all now pass again with -fzero-init-padding-bits=all added. This also fixes the following cases for struct initialization that had been XFAIL until now because there was no compiler support beyond the larger "-ftrivial-auto-var-init=zero" option: ok 38 test_small_hole_static_all # SKIP XFAIL uninit bytes: 3 ok 39 test_big_hole_static_all # SKIP XFAIL uninit bytes: 124 ok 40 test_trailing_hole_static_all # SKIP XFAIL uninit bytes: 7 ok 42 test_small_hole_dynamic_all # SKIP XFAIL uninit bytes: 3 ok 43 test_big_hole_dynamic_all # SKIP XFAIL uninit bytes: 124 ok 44 test_trailing_hole_dynamic_all # SKIP XFAIL uninit bytes: 7 ok 58 test_small_hole_assigned_static_all # SKIP XFAIL uninit bytes: 3 ok 59 test_big_hole_assigned_static_all # SKIP XFAIL uninit bytes: 124 ok 60 test_trailing_hole_assigned_static_all # SKIP XFAIL uninit bytes: 7 ok 62 test_small_hole_assigned_dynamic_all # SKIP XFAIL uninit bytes: 3 ok 63 test_big_hole_assigned_dynamic_all # SKIP XFAIL uninit bytes: 124 ok 64 test_trailing_hole_assigned_dynamic_all # SKIP XFAIL uninit bytes: 7 All of the above now pass when built under GCC 15. Tests can be seen with: ./tools/testing/kunit/kunit.py run stackinit --arch=x86_64 \ --make_option CC=gcc-15 Clang continues to fully initialize these kinds of variables[3] without additional flags. Suggested-by: Jakub Jelinek <jakub@redhat.com> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118403 [1] Link: https://lore.kernel.org/linux-toolchains/Z0hRrrNU3Q+ro2T7@tucnak/ [2] Link: https://github.com/llvm/llvm-project/commit/7a086e1b2dc05f54afae3591614feede727601fa [3] Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20250127191031.245214-3-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-01-30	stackinit: Add union initialization to selftests	Kees Cook
	The stack initialization selftests were checking scalars, strings, and structs, but not unions. Add union tests (which are mostly identical setup to structs). This catches the recent union initialization behavioral changes seen in GCC 15. Before GCC 15, this new test passes: ok 18 test_small_start_old_zero With GCC 15, it fails: not ok 18 test_small_start_old_zero Specifically, a union with a larger member where a smaller member is initialized with the older "= { 0 }" syntax: union test_small_start { char one:1; char two; short three; unsigned long four; struct big_struct { unsigned long array[8]; } big; }; This is a regression in compiler behavior that Linux has depended on. GCC does not seem likely to fix it, instead suggesting that affected projects start using -fzero-init-padding-bits=unions: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118403 Link: https://lore.kernel.org/r/20250127191031.245214-2-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-01-30	stackinit: Add old-style zero-init syntax to struct tests	Kees Cook
	The deprecated way to do a full zero init of a structure is with "= { 0 }", but we weren't testing this style. Add it. Link: https://lore.kernel.org/r/20250127191031.245214-1-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-01-30	Merge tag 'ntfs3_for_6.14' of ↵	Linus Torvalds
	https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 fixes from Konstantin Komarov: - unify inode corruption marking and mark them as bad immediately upon detection of an error in attribute enumeration - folio cleanup * tag 'ntfs3_for_6.14' of https://github.com/Paragon-Software-Group/linux-ntfs3: fs/ntfs3: Unify inode corruption marking with _ntfs_bad_inode() fs/ntfs3: Mark inode as bad as soon as error detected in mi_enum_attr() ntfs3: Remove an access to page->index
2025-01-30	Merge tag 'bcachefs-2025-01-29' of git://evilpiepirate.org/bcachefs	Linus Torvalds
	Pull bcachefs fixes from Kent Overstreet: - second half of a fix for a bug that'd been causing oopses on filesystems using snapshots with memory pressure (key cache fills for snaphots btrees are tricky) - build fix for strange compiler configurations that double stack frame size - "journal stuck timeout" now takes into account device latency: this fixes some spurious warnings, and the main remaining source of SRCU lock hold time warnings (I'm no longer seeing this in my CI, so any users still seeing this should definitely ping me) - fix for slow/hanging unmounts (" Improve journal pin flushing") - some more tracepoint fixes/improvements, to chase down the "rebalance isn't making progress" issues * tag 'bcachefs-2025-01-29' of git://evilpiepirate.org/bcachefs: bcachefs: Improve trace_move_extent_finish bcachefs: Fix trace_copygc bcachefs: Journal writes are now IOPRIO_CLASS_RT bcachefs: Improve journal pin flushing bcachefs: fix bch2_btree_node_flags bcachefs: rebalance, copygc enabled are runtime opts bcachefs: Improve decompression error messages bcachefs: bset_blacklisted_journal_seq is now AUTOFIX bcachefs: "Journal stuck" timeout now takes into account device latency bcachefs: Reduce stack frame size of __bch2_str_hash_check_key() bcachefs: Fix btree_trans_peek_key_cache()
2025-01-30	io_uring/net: don't retry connect operation on EPOLLERR	Jens Axboe
	If a socket is shutdown before the connection completes, POLLERR is set in the poll mask. However, connect ignores this as it doesn't know, and attempts the connection again. This may lead to a bogus -ETIMEDOUT result, where it should have noticed the POLLERR and just returned -ECONNRESET instead. Have the poll logic check for whether or not POLLERR is set in the mask, and if so, mark the request as failed. Then connect can appropriately fail the request rather than retry it. Reported-by: Sergey Galas <ssgalas@cloud.ru> Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/discussions/1335 Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-30	Merge branch 'mptcp-blackhole-only-if-1st-syn-retrans-w-o-mpc-is-accepted'	Paolo Abeni
	Matthieu Baerts says: ==================== mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted Here are two small fixes for issues introduced in v6.12. - Patch 1: reset the mpc_drop mark for other SYN retransmits, to only consider an MPTCP blackhole when the first SYN retransmitted without the MPTCP options is accepted, as initially intended. - Patch 2: also mention in the doc that the blackhole_timeout sysctl knob is per-netns, like all the others. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> ==================== Link: https://patch.msgid.link/20250129-net-mptcp-blackhole-fix-v1-0-afe88e5a6d2c@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30	doc: mptcp: sysctl: blackhole_timeout is per-netns	Matthieu Baerts (NGI0)
	All other sysctl entries mention it, and it is a per-namespace sysctl. So mention it as well. Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30	mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted	Matthieu Baerts (NGI0)
	The Fixes commit mentioned this: > An MPTCP firewall blackhole can be detected if the following SYN > retransmission after a fallback to "plain" TCP is accepted. But in fact, this blackhole was detected if any following SYN retransmissions after a fallback to TCP was accepted. That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp' to 0, and 'mpc_drop' will never be reset to 0 after. This is an issue, because some not so unusual situations might cause the kernel to detect a false-positive blackhole, e.g. a client trying to connect to a server while the network is not ready yet, causing a few SYN retransmissions, before reaching the end server. Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30	ALSA: hda/realtek: Workaround for resume on Dell Venue 11 Pro 7130	Takashi Iwai
	It was reported that the headphone output on Dell Venue 11 Pro 7130 becomes mono after PM resume. The cause seems to be the BIOS setting up the codec COEF 0x0d bit 0x40 wrongly by some reason, and restoring the original value 0x2800 fixes the problem. This patch adds the quirk entry to perform the COEF restore. Cc: <stable@vger.kernel.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219697 Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1235686 Link: https://patch.msgid.link/20250130123301.8996-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-01-30	netfilter: nf_tables: reject mismatching sum of field_len with set key length	Pablo Neira Ayuso
	The field length description provides the length of each separated key field in the concatenation, each field gets rounded up to 32-bits to calculate the pipapo rule width from pipapo_init(). The set key length provides the total size of the key aligned to 32-bits. Register-based arithmetics still allows for combining mismatching set key length and field length description, eg. set key length 10 and field description [ 5, 4 ] leading to pipapo width of 12. Cc: stable@vger.kernel.org Fixes: 3ce67e3793f4 ("netfilter: nf_tables: do not allow mismatch field size and set key length") Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-30	Merge branch 'fix-missing-rtnl-lock-in-suspend-path'	Paolo Abeni
	Kory Maincent says: ==================== Fix missing rtnl lock in suspend path Fix the suspend path by ensuring the rtnl lock is held where required. Calls to open, close and WOL operations must be performed under the rtnl lock to prevent conflicts with ongoing ndo operations. Discussion about this issue can be found here: https://lore.kernel.org/netdev/20250120141926.1290763-1-kory.maincent@bootlin.com/ While working on the ravb fix, it was discovered that the sh_eth driver has the same issue. This patch series addresses both drivers. I do not have access to hardware for either of these MACs, so it would be great if maintainers or others with the relevant boards could test these fixes. v2: https://lore.kernel.org/r/20250123-fix_missing_rtnl_lock_phy_disconnect-v2-0-e6206f5508ba@bootlin.com v1: https://lore.kernel.org/r/20250122-fix_missing_rtnl_lock_phy_disconnect-v1-0-8cb9f6f88fd1@bootlin.com Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> ==================== Link: https://patch.msgid.link/20250129-fix_missing_rtnl_lock_phy_disconnect-v3-0-24c4ba185a92@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30	net: sh_eth: Fix missing rtnl lock in suspend/resume path	Kory Maincent
	Fix the suspend/resume path by ensuring the rtnl lock is held where required. Calls to sh_eth_close, sh_eth_open and wol operations must be performed under the rtnl lock to prevent conflicts with ongoing ndo operations. Fixes: b71af04676e9 ("sh_eth: add more PM methods") Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30	net: ravb: Fix missing rtnl lock in suspend/resume path	Kory Maincent
	Fix the suspend/resume path by ensuring the rtnl lock is held where required. Calls to ravb_open, ravb_close and wol operations must be performed under the rtnl lock to prevent conflicts with ongoing ndo operations. Without this fix, the following warning is triggered: [ 39.032969] ============================= [ 39.032983] WARNING: suspicious RCU usage [ 39.033019] ----------------------------- [ 39.033033] drivers/net/phy/phy_device.c:2004 suspicious rcu_dereference_protected() usage! ... [ 39.033597] stack backtrace: [ 39.033613] CPU: 0 UID: 0 PID: 174 Comm: python3 Not tainted 6.13.0-rc7-next-20250116-arm64-renesas-00002-g35245dfdc62c #7 [ 39.033623] Hardware name: Renesas SMARC EVK version 2 based on r9a08g045s33 (DT) [ 39.033628] Call trace: [ 39.033633] show_stack+0x14/0x1c (C) [ 39.033652] dump_stack_lvl+0xb4/0xc4 [ 39.033664] dump_stack+0x14/0x1c [ 39.033671] lockdep_rcu_suspicious+0x16c/0x22c [ 39.033682] phy_detach+0x160/0x190 [ 39.033694] phy_disconnect+0x40/0x54 [ 39.033703] ravb_close+0x6c/0x1cc [ 39.033714] ravb_suspend+0x48/0x120 [ 39.033721] dpm_run_callback+0x4c/0x14c [ 39.033731] device_suspend+0x11c/0x4dc [ 39.033740] dpm_suspend+0xdc/0x214 [ 39.033748] dpm_suspend_start+0x48/0x60 [ 39.033758] suspend_devices_and_enter+0x124/0x574 [ 39.033769] pm_suspend+0x1ac/0x274 [ 39.033778] state_store+0x88/0x124 [ 39.033788] kobj_attr_store+0x14/0x24 [ 39.033798] sysfs_kf_write+0x48/0x6c [ 39.033808] kernfs_fop_write_iter+0x118/0x1a8 [ 39.033817] vfs_write+0x27c/0x378 [ 39.033825] ksys_write+0x64/0xf4 [ 39.033833] __arm64_sys_write+0x18/0x20 [ 39.033841] invoke_syscall+0x44/0x104 [ 39.033852] el0_svc_common.constprop.0+0xb4/0xd4 [ 39.033862] do_el0_svc+0x18/0x20 [ 39.033870] el0_svc+0x3c/0xf0 [ 39.033880] el0t_64_sync_handler+0xc0/0xc4 [ 39.033888] el0t_64_sync+0x154/0x158 [ 39.041274] ravb 11c30000.ethernet eth0: Link is Down Reported-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Closes: https://lore.kernel.org/netdev/4c6419d8-c06b-495c-b987-d66c2e1ff848@tuxon.dev/ Fixes: 0184165b2f42 ("ravb: add sleep PM suspend/resume support") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30	Merge tag 'for-net-2025-01-29' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - btusb: mediatek: Add locks for usb_driver_claim_interface() - L2CAP: accept zero as a special value for MTU auto-selection - btusb: Fix possible infinite recursion of btusb_reset - Add ABI doc for sysfs reset - btnxpuart: Fix glitches seen in dual A2DP streaming * tag 'for-net-2025-01-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming Bluetooth: Add ABI doc for sysfs reset Bluetooth: Fix possible infinite recursion of btusb_reset Bluetooth: btusb: mediatek: Add locks for usb_driver_claim_interface() ==================== Link: https://patch.msgid.link/20250129210057.1318963-1-luiz.dentz@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-30	debugfs: Fix the missing initializations in __debugfs_file_get()	Al Viro
	both method table pointers in debugfs_fsdata need to be initialized, obviously, and calculating the bitmap of present methods would also go better if we start with initialized state. Fixes: 41a0ecc0997c ("debugfs: get rid of dynamically allocation proxy_ops") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20250129191937.GR1977892@ZenIV Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-29	selftests/net: Add test for loading devbound XDP program in generic mode	Toke Høiland-Jørgensen
	Add a test to bpf_offload.py for loading a devbound XDP program in generic mode, checking that it fails correctly. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250127131344.238147-2-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	net: xdp: Disallow attaching device-bound programs in generic mode	Toke Høiland-Jørgensen
	Device-bound programs are used to support RX metadata kfuncs. These kfuncs are driver-specific and rely on the driver context to read the metadata. This means they can't work in generic XDP mode. However, there is no check to disallow such programs from being attached in generic mode, in which case the metadata kfuncs will be called in an invalid context, leading to crashes. Fix this by adding a check to disallow attaching device-bound programs in generic mode. Fixes: 2b3486bc2d23 ("bpf: Introduce device-bound XDP programs") Reported-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Closes: https://lore.kernel.org/r/dae862ec-43b5-41a0-8edf-46c59071cdda@hetzner-cloud.de Tested-by: Marcus Wichelmann <marcus.wichelmann@hetzner-cloud.de> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250127131344.238147-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	tcp: correct handling of extreme memory squeeze	Jon Maloy
	Testing with iperf3 using the "pasta" protocol splicer has revealed a problem in the way tcp handles window advertising in extreme memory squeeze situations. Under memory pressure, a socket endpoint may temporarily advertise a zero-sized window, but this is not stored as part of the socket data. The reasoning behind this is that it is considered a temporary setting which shouldn't influence any further calculations. However, if we happen to stall at an unfortunate value of the current window size, the algorithm selecting a new value will consistently fail to advertise a non-zero window once we have freed up enough memory. This means that this side's notion of the current window size is different from the one last advertised to the peer, causing the latter to not send any data to resolve the sitution. The problem occurs on the iperf3 server side, and the socket in question is a completely regular socket with the default settings for the fedora40 kernel. We do not use SO_PEEK or SO_RCVBUF on the socket. The following excerpt of a logging session, with own comments added, shows more in detail what is happening: // tcp_v4_rcv(->) // tcp_rcv_established(->) [5201<->39222]: ==== Activating log @ net/ipv4/tcp_input.c/tcp_data_queue()/5257 ==== [5201<->39222]: tcp_data_queue(->) [5201<->39222]: DROPPING skb [265600160..265665640], reason: SKB_DROP_REASON_PROTO_MEM [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] [copied_seq 259909392->260034360 (124968), unread 5565800, qlen 85, ofoq 0] [OFO queue: gap: 65480, len: 0] [5201<->39222]: tcp_data_queue(<-) [5201<->39222]: __tcp_transmit_skb(->) [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] [5201<->39222]: tcp_select_window(->) [5201<->39222]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) ? --> TRUE [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] returning 0 [5201<->39222]: tcp_select_window(<-) [5201<->39222]: ADVERTISING WIN 0, ACK_SEQ: 265600160 [5201<->39222]: [__tcp_transmit_skb(<-) [5201<->39222]: tcp_rcv_established(<-) [5201<->39222]: tcp_v4_rcv(<-) // Receive queue is at 85 buffers and we are out of memory. // We drop the incoming buffer, although it is in sequence, and decide // to send an advertisement with a window of zero. // We don't update tp->rcv_wnd and tp->rcv_wup accordingly, which means // we unconditionally shrink the window. [5201<->39222]: tcp_recvmsg_locked(->) [5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160 [5201<->39222]: [new_win = 0, win_now = 131184, 2 * win_now = 262368] [5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0] [5201<->39222]: NOT calling tcp_send_ack() [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] [5201<->39222]: __tcp_cleanup_rbuf(<-) [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] [copied_seq 260040464->260040464 (0), unread 5559696, qlen 85, ofoq 0] returning 6104 bytes [5201<->39222]: tcp_recvmsg_locked(<-) // After each read, the algorithm for calculating the new receive // window in __tcp_cleanup_rbuf() finds it is too small to advertise // or to update tp->rcv_wnd. // Meanwhile, the peer thinks the window is zero, and will not send // any more data to trigger an update from the interrupt mode side. [5201<->39222]: tcp_recvmsg_locked(->) [5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160 [5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368] [5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0] [5201<->39222]: NOT calling tcp_send_ack() [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] [5201<->39222]: __tcp_cleanup_rbuf(<-) [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] [copied_seq 260099840->260171536 (71696), unread 5428624, qlen 83, ofoq 0] returning 131072 bytes [5201<->39222]: tcp_recvmsg_locked(<-) // The above pattern repeats again and again, since nothing changes // between the reads. [...] [5201<->39222]: tcp_recvmsg_locked(->) [5201<->39222]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160 [5201<->39222]: [new_win = 262144, win_now = 131184, 2 * win_now = 262368] [5201<->39222]: [new_win >= (2 * win_now) ? --> time_to_ack = 0] [5201<->39222]: NOT calling tcp_send_ack() [tp->rcv_wup: 265469200, tp->rcv_wnd: 262144, tp->rcv_nxt 265600160] [5201<->39222]: __tcp_cleanup_rbuf(<-) [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184] [copied_seq 265600160->265600160 (0), unread 0, qlen 0, ofoq 0] returning 54672 bytes [5201<->39222]: tcp_recvmsg_locked(<-) // The receive queue is empty, but no new advertisement has been sent. // The peer still thinks the receive window is zero, and sends nothing. // We have ended up in a deadlock situation. Note that well behaved endpoints will send win0 probes, so the problem will not occur. Furthermore, we have observed that in these situations this side may send out an updated 'th->ack_seq´ which is not stored in tp->rcv_wup as it should be. Backing ack_seq seems to be harmless, but is of course still wrong from a protocol viewpoint. We fix this by updating the socket state correctly when a packet has been dropped because of memory exhaustion and we have to advertize a zero window. Further testing shows that the connection recovers neatly from the squeeze situation, and traffic can continue indefinitely. Fixes: e2142825c120 ("net: tcp: send zero-window ACK when no memory") Cc: Menglong Dong <menglong8.dong@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20250127231304.1465565-1-jmaloy@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	bgmac: reduce max frame size to support just MTU 1500	Rafał Miłecki
	bgmac allocates new replacement buffer before handling each received frame. Allocating & DMA-preparing 9724 B each time consumes a lot of CPU time. Ideally bgmac should just respect currently set MTU but it isn't the case right now. For now just revert back to the old limited frame size. This change bumps NAT masquerade speed by ~95%. Since commit 8218f62c9c9b ("mm: page_frag: use initial zero offset for page_frag_alloc_align()"), the bgmac driver fails to open its network interface successfully and runs out of memory in the following call stack: bgmac_open -> bgmac_dma_init -> bgmac_dma_rx_skb_for_slot -> netdev_alloc_frag BGMAC_RX_ALLOC_SIZE = 10048 and PAGE_FRAG_CACHE_MAX_SIZE = 32768. Eventually we land into __page_frag_alloc_align() with the following parameters across multiple successive calls: __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=0 __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=10048 __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=20096 __page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=30144 So in that case we do indeed have offset + fragsz (40192) > size (32768) and so we would eventually return NULL. Reverting to the older 1500 bytes MTU allows the network driver to be usable again. Fixes: 8c7da63978f1 ("bgmac: configure MTU and add support for frames beyond 8192 byte size") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> [florian: expand commit message about recent commits] Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20250127175159.1788246-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	Merge branch 'vsock-transport-reassignment-and-error-handling-issues'	Jakub Kicinski
	Michal Luczaj says: ==================== vsock: Transport reassignment and error handling issues Series deals with two issues: - socket reference count imbalance due to an unforgiving transport release (triggered by transport reassignment); - unintentional API feature, a failing connect() making the socket impossible to use for any subsequent connect() attempts. v2: https://lore.kernel.org/20250121-vsock-transport-vs-autobind-v2-0-aad6069a4e8c@rbox.co v1: https://lore.kernel.org/20250117-vsock-transport-vs-autobind-v1-0-c802c803762d@rbox.co ==================== Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-0-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	vsock/test: Add test for connect() retries	Michal Luczaj
	Deliberately fail a connect() attempt; expect error. Then verify that subsequent attempt (using the same socket) can still succeed, rather than fail outright. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-6-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	vsock/test: Add test for UAF due to socket unbinding	Michal Luczaj
	Fail the autobind, then trigger a transport reassign. Socket might get unbound from unbound_sockets, which then leads to a reference count underflow. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-5-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	vsock/test: Introduce vsock_connect_fd()	Michal Luczaj
	Distill timeout-guarded vsock_connect_fd(). Adapt callers. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-4-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	vsock/test: Introduce vsock_bind()	Michal Luczaj
	Add a helper for socket()+bind(). Adapt callers. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-3-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	vsock: Allow retrying on connect() failure	Michal Luczaj
	sk_err is set when a (connectible) connect() fails. Effectively, this makes an otherwise still healthy SS_UNCONNECTED socket impossible to use for any subsequent connection attempts. Clear sk_err upon trying to establish a connection. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Luigi Leonardi <leonardi@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-2-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	vsock: Keep the binding until socket destruction	Michal Luczaj
	Preserve sockets bindings; this includes both resulting from an explicit bind() and those implicitly bound through autobind during connect(). Prevents socket unbinding during a transport reassignment, which fixes a use-after-free: 1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2) 2. transport->release() calls vsock_remove_bound() without checking if sk was bound and moved to bound list (refcnt=1) 3. vsock_bind() assumes sk is in unbound list and before __vsock_insert_bound(vsock_bound_sockets()) calls __vsock_remove_bound() which does: list_del_init(&vsk->bound_table); // nop sock_put(&vsk->sk); // refcnt=0 BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730 Read of size 4 at addr ffff88816b46a74c by task a.out/2057 dump_stack_lvl+0x68/0x90 print_report+0x174/0x4f6 kasan_report+0xb9/0x190 __vsock_bind+0x62e/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 2057: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x85/0x90 kmem_cache_alloc_noprof+0x131/0x450 sk_prot_alloc+0x5b/0x220 sk_alloc+0x2c/0x870 __vsock_create.constprop.0+0x2e/0xb60 vsock_create+0xe4/0x420 __sock_create+0x241/0x650 __sys_socket+0xf2/0x1a0 __x64_sys_socket+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 2057: kasan_save_stack+0x1e/0x40 kasan_save_track+0x10/0x30 kasan_save_free_info+0x37/0x60 __kasan_slab_free+0x4b/0x70 kmem_cache_free+0x1a1/0x590 __sk_destruct+0x388/0x5a0 __vsock_bind+0x5e1/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e refcount_t: addition on 0; use-after-free. WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150 RIP: 0010:refcount_warn_saturate+0xce/0x150 __vsock_bind+0x66d/0x730 vsock_bind+0x97/0xe0 __sys_bind+0x154/0x1f0 __x64_sys_bind+0x6e/0xb0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e refcount_t: underflow; use-after-free. WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150 RIP: 0010:refcount_warn_saturate+0xee/0x150 vsock_remove_bound+0x187/0x1e0 __vsock_release+0x383/0x4a0 vsock_release+0x90/0x120 __sock_release+0xa3/0x250 sock_close+0x14/0x20 __fput+0x359/0xa80 task_work_run+0x107/0x1d0 do_exit+0x847/0x2560 do_group_exit+0xb8/0x250 __x64_sys_exit_group+0x3a/0x50 x64_sys_call+0xfec/0x14f0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Link: https://patch.msgid.link/20250128-vsock-transport-vs-autobind-v3-1-1cf57065b770@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29	riscv: add a warning when physical memory address overflows	Yunhui Cui
	The part of physical memory that exceeds the size of the linear mapping will be discarded. When the system starts up normally, a warning message will be printed to prevent confusion caused by the mismatch between the system memory and the actual physical memory. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240814062625.19794-1-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-29	audit: Initialize lsmctx to avoid memory allocation error	Huacai Chen
	When audit is enabled in a kernel build, and there are no LSMs active that support LSM labeling, it is possible that local variable lsmctx in the AUDIT_SIGNAL_INFO handler in audit_receive_msg() could be used before it is properly initialize. Then kmalloc() will try to allocate a large amount of memory with the uninitialized length. This patch corrects this problem by initializing the lsmctx to a safe value when it is declared, which avoid errors like: WARNING: CPU: 2 PID: 443 at mm/page_alloc.c:4727 __alloc_pages_noprof ... ra: 9000000003059644 ___kmalloc_large_node+0x84/0x1e0 ERA: 900000000304d588 __alloc_pages_noprof+0x4c8/0x1040 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) PRMD: 00000004 (PPLV0 +PIE -PWE) EUEN: 00000007 (+FPE +SXE +ASXE -BTE) ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) ESTAT: 000c0000 [BRK] (IS= ECode=12 EsubCode=0) PRID: 0014c010 (Loongson-64bit, Loongson-3A5000) CPU: 2 UID: 0 PID: 443 Comm: auditd Not tainted 6.13.0-rc1+ #1899 ... Call Trace: [<9000000002def6a8>] show_stack+0x30/0x148 [<9000000002debf58>] dump_stack_lvl+0x68/0xa0 [<9000000002e0fe18>] __warn+0x80/0x108 [<900000000407486c>] report_bug+0x154/0x268 [<90000000040ad468>] do_bp+0x2a8/0x320 [<9000000002dedda0>] handle_bp+0x120/0x1c0 [<900000000304d588>] __alloc_pages_noprof+0x4c8/0x1040 [<9000000003059640>] ___kmalloc_large_node+0x80/0x1e0 [<9000000003061504>] __kmalloc_noprof+0x2c4/0x380 [<9000000002f0f7ac>] audit_receive_msg+0x764/0x1530 [<9000000002f1065c>] audit_receive+0xe4/0x1c0 [<9000000003e5abe8>] netlink_unicast+0x340/0x450 [<9000000003e5ae9c>] netlink_sendmsg+0x1a4/0x4a0 [<9000000003d9ffd0>] __sock_sendmsg+0x48/0x58 [<9000000003da32f0>] __sys_sendto+0x100/0x170 [<9000000003da3374>] sys_sendto+0x14/0x28 [<90000000040ad574>] do_syscall+0x94/0x138 [<9000000002ded318>] handle_syscall+0xb8/0x158 Fixes: 6fba89813ccf333d ("lsm: ensure the correct LSM context releaser") Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> [PM: resolved excessive line length in the backtrace] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-01-30	kconfig: fix memory leak in sym_warn_unmet_dep()	Masahiro Yamada
	The string allocated in sym_warn_unmet_dep() is never freed, leading to a memory leak when an unmet dependency is detected. Fixes: f8f69dc0b4e0 ("kconfig: make unmet dependency warnings readable") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Petr Vorel <pvorel@suse.cz>
2025-01-30	kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST	Masahiro Yamada
	Most 'make *config' commands use .config as the base configuration file. When .config does not exist, Kconfig tries to load a file listed in KCONFIG_DEFCONFIG_LIST instead. However, since commit b75b0a819af9 ("kconfig: change defconfig_list option to environment variable"), warning messages have displayed an incorrect file name in such cases. Below is a demonstration using Debian Trixie. While loading /boot/config-6.12.9-amd64, the warning messages incorrectly show .config as the file name. With this commit, the correct file name is displayed in warnings. [Before] $ rm -f .config $ make config # # using defaults found in /boot/config-6.12.9-amd64 # .config:6804:warning: symbol value 'm' invalid for FB_BACKLIGHT .config:9895:warning: symbol value 'm' invalid for ANDROID_BINDER_IPC [After] $ rm -f .config $ make config # # using defaults found in /boot/config-6.12.9-amd64 # /boot/config-6.12.9-amd64:6804:warning: symbol value 'm' invalid for FB_BACKLIGHT /boot/config-6.12.9-amd64:9895:warning: symbol value 'm' invalid for ANDROID_BINDER_IPC Fixes: b75b0a819af9 ("kconfig: change defconfig_list option to environment variable") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-29	cifs: Add mount option -o reparse=none	Pali Rohár
	This new mount option allows to completely disable creating new reparse points. When -o sfu or -o mfsymlinks or -o symlink= is not specified then creating any special file (fifo, socket, symlink, block and char) will fail with -EOPNOTSUPP error. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Add mount option -o symlink= for choosing symlink create type	Pali Rohár
	Currently Linux CIFS client creates a new symlink of the first flavor which is allowed by mount options, parsed in this order: -o (no)mfsymlinks, -o (no)sfu, -o (no)unix (+ its aliases) and -o reparse=[type]. Introduce a new mount option -o symlink= for explicitly choosing a symlink flavor. Possible options are: -o symlink=default - The default behavior, like before this change. -o symlink=none - Disallow creating a new symlinks -o symlink=native - Create as native SMB symlink reparse point -o symlink=unix - Create via SMB1 unix extension command -o symlink=mfsymlinks - Create as regular file of mfsymlinks format -o symlink=sfu - Create as regular system file of SFU format -o symlink=nfs - Create as NFS reparse point -o symlink=wsl - Create as WSL reparse point So for example specifying -o sfu,mfsymlinks,symlink=native will allow to parse symlinks also of SFU and mfsymlinks types (which are disabled by default unless mount option is explicitly specified), but new symlinks will be created under native SMB type (which parsing is always enabled). Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Fix creating and resolving absolute NT-style symlinks	Pali Rohár
	If the SMB symlink is stored on NT server in absolute form then it points to the NT object hierarchy, which is different from POSIX one and needs some conversion / mapping. To make interoperability with Windows SMB server and WSL subsystem, reuse its logic of mapping between NT paths and POSIX paths into Linux SMB client. WSL subsystem on Windows uses for -t drvfs mount option -o symlinkroot= which specifies the POSIX path where are expected to be mounted lowercase Windows drive letters (without colon). Do same for Linux SMB client and add a new mount option -o symlinkroot= which mimics the drvfs mount option of the same name. It specifies where in the Linux VFS hierarchy is the root of the DOS / Windows drive letters, and translates between absolute NT-style symlinks and absolute Linux VFS symlinks. Default value of symlinkroot is "/mnt", same what is using WSL. Note that DOS / Windows drive letter symlinks are just subset of all possible NT-style symlinks. Drive letters live in NT subtree \??\ and important details about NT paths and object hierarchy are in the comments in this change. When symlink target location from non-POSIX SMB server is in absolute form (indicated by absence of SYMLINK_FLAG_RELATIVE) then it is converted to Linux absolute symlink according to symlinkroot configuration. And when creating a new symlink on non-POSIX SMB server in absolute form then Linux absolute target is converted to NT-style according to symlinkroot configuration. When SMB server is POSIX, then this change does not affect neither reading target location of symlink, nor creating a new symlink. It is expected that POSIX SMB server works with POSIX paths where the absolute root is /. This change improves interoperability of absolute SMB symlinks with Windows SMB servers. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Simplify reparse point check in cifs_query_path_info() function	Pali Rohár
	For checking if path is reparse point and setting data->reparse_point member, it is enough to check if ATTR_REPARSE is present. It is not required to call CIFS_open() without OPEN_REPARSE_POINT and checking for -EOPNOTSUPP error code. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove symlink member from cifs_open_info_data union	Pali Rohár
	Member 'symlink' is part of the union in struct cifs_open_info_data. Its value is assigned on few places, but is always read through another union member 'reparse_point'. So to make code more readable, always use only 'reparse_point' member and drop whole union structure. No function change. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Update description about ACL permissions	Pali Rohár
	There are some incorrect information about individual SMB permission constants like WRITE_DAC can change ownership, or incomplete information to distinguish between ACL types (discretionary vs system) and there is completely missing information how permissions apply for directory objects and what is meaning of GENERIC_* bits. Also there is missing constant for MAXIMUM_ALLOWED permission. Fix and extend description of all SMB permission constants to match the reality, how the reference Windows SMB / NTFS implementation handles them. Links to official Microsoft documentation related to permissions: https://learn.microsoft.com/en-us/windows/win32/fileio/file-access-rights-constants https://learn.microsoft.com/en-us/windows/win32/secauthz/access-mask https://learn.microsoft.com/en-us/windows/win32/secauthz/standard-access-rights https://learn.microsoft.com/en-us/windows/win32/secauthz/generic-access-rights https://learn.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntcreatefile Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move ↵	Pali Rohár
	to common/smb2pdu.h Function parse_reparse_posix() parses NFS-style reparse points, which are used only by Windows NFS server since Windows Server 2012 version. This style is not understood by Microsoft POSIX/Interix/SFU/SUA subsystems. So make it clear that parse_reparse_posix() function and reparse_posix_data structure are not POSIX general, but rather NFS specific. All reparse buffer structures are defined in common/smb2pdu.h and have _buffer suffix. So move struct reparse_posix_data from client/cifspdu.h to common/smb2pdu.h and rename it to reparse_nfs_data_buffer for consistency. Note that also SMB specification in [MS-FSCC] document, section 2.1.2.6 defines it under name "Network File System (NFS) Reparse Data Buffer". So use this name for consistency. Having this structure in common/smb2pdu.h can be useful for ksmbd server code as NFS-style reparse points is the preferred way for implementing support for special files. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove struct reparse_posix_data from struct cifs_open_info_data	Pali Rohár
	Linux SMB client already supports more reparse point types but only the reparse_posix_data is defined in union of struct cifs_open_info_data. This union is currently used as implicit casting between point types. With this code style, it hides information that union is used for pointer casting, and just in mknod_nfs() and posix_reparse_to_fattr() functions. Other reparse point buffers do not use this kind of casting. So remove reparse_posix_data from reparse part of struct cifs_open_info_data and for all cases of reparse buffer use just struct reparse_data_buffer *buf. Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove unicode parameter from parse_reparse_point() function	Pali Rohár
	This parameter is always true, so remove it and also remove dead code which is never called (for all false code paths). Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Fix getting and setting SACLs over SMB1	Pali Rohár
	SMB1 callback get_cifs_acl_by_fid() currently ignores its last argument and therefore ignores request for SACL_SECINFO. Fix this issue by correctly propagating info argument from get_cifs_acl() and get_cifs_acl_by_fid() to CIFSSMBGetCIFSACL() function and pass SACL_SECINFO when requested. For accessing SACLs it is needed to open object with SYSTEM_SECURITY access. Pass this flag when trying to get or set SACLs. Same logic is in the SMB2+ code path. This change fixes getting and setting of "system.cifs_ntsd_full" and "system.smb3_ntsd_full" xattrs over SMB1 as currently it silentely ignored SACL part of passed xattr buffer. Fixes: 3970acf7ddb9 ("SMB3: Add support for getting and setting SACLs") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Remove intermediate object of failed create SFU call	Pali Rohár
	Check if the server honored ATTR_SYSTEM flag by CREATE_OPTION_SPECIAL option. If not then server does not support ATTR_SYSTEM and newly created file is not SFU compatible, which means that the call failed. If CREATE was successful but either setting ATTR_SYSTEM failed or writing type/data information failed then remove the intermediate object created by CREATE. Otherwise intermediate empty object stay on the server. This ensures that if the creating of SFU files with system attribute is unsupported by the server then no empty file stay on the server as a result of unsupported operation. This is for example case with Samba server and Linux tmpfs storage without enabled xattr support (where Samba stores ATTR_SYSTEM bit). Cc: stable@vger.kernel.org Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Validate EAs for WSL reparse points	Pali Rohár
	Major and minor numbers for char and block devices are mandatory for stat. So check that the WSL EA $LXDEV is present for WSL CHR and BLK reparse points. WSL reparse point tag determinate type of the file. But file type is present also in the WSL EA $LXMOD. So check that both file types are same. Fixes: 78e26bec4d6d ("smb: client: parse uid, gid, mode and dev from WSL reparse points") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM	Pali Rohár
	STATUS_PRIVILEGE_NOT_HELD indicates that user does not have privilege to issue some operation, for example to create symlink. Currently STATUS_PRIVILEGE_NOT_HELD is translated to -EIO. Change it to -EPERM which better describe this error code. Note that there is no ERR* code usable in ntstatus_to_dos_map[] table which can be used to -EPERM translation, so do explicit translation in map_smb_to_linux_error() function. Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Tom Talpey <tom@talpey.com> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-29	Merge tag 'soundwire-6.14-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire updates from Vinod Koul: - SoundWire multi lane support to use multiple lanes if supported - Stream handling of DEPREPARED state - AMD wake register programming for power off mode * tag 'soundwire-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: soundwire: amd: clear wake enable register for power off mode soundwire: generic_bandwidth_allocation: count the bandwidth of active streams only SoundWire: pass stream to compute_params() soundwire: generic_bandwidth_allocation: add lane in sdw_group_params soundwire: generic_bandwidth_allocation: select data lane soundwire: generic_bandwidth_allocation: check required freq accurately soundwire: generic_bandwidth_allocation: correct clk_freq check in sdw_select_row_col Soundwire: generic_bandwidth_allocation: set frame shape on fly Soundwire: stream: program BUSCLOCK_SCALE Soundwire: add sdw_slave_get_scale_index helper soundwire: generic_bandwidth_allocation: skip DEPREPARED streams soundwire: stream: set DEPREPARED state earlier soundwire: add lane_used_bandwidth in struct sdw_bus soundwire: mipi_disco: read lane mapping properties from ACPI soundwire: add lane field in sdw_port_runtime soundwire: bus: Move irq mapping cleanup into devres
2025-01-29	Merge tag 'phy-for-6.14' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy updates from Vinod Koul: "Lots of Qualcomm and Rockchip device support. New Support: - Qualcomm SAR2130P qmp usb, SAR2130P qmp pcie, QCS615 qusb2 and PCIe, IPQ5424 qmp pcie, IPQ5424 QUSB2 and USB3 PHY - Rockchip rk3576 combo phy support Updates: - Drop Shengyang for JH7110 maintainer - Freescale hdmi register calculation optimization - Rockchip pcie phy mutex and regmap updates" * tag 'phy-for-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (37 commits) dt-bindings: phy: qcom,qmp-pcie: document the SM8350 two lanes PCIe PHY phy: rockchip: phy-rockchip-typec: Fix Copyright description dt-bindings: phy: qcom,ipq8074-qmp-pcie: Document the IPQ5424 QMP PCIe PHYs phy: qcom-qusb2: Add support for QCS615 dt-bindings: usb: qcom,dwc3: Add QCS615 to USB DWC3 bindings phy: core: Simplify API of_phy_simple_xlate() implementation phy: sun4i-usb: Remove unused of_gpio.h phy: HiSilicon: Don't use "proxy" headers phy: samsung-ufs: switch back to syscon_regmap_lookup_by_phandle() phy: qualcomm: qmp-pcie: add support for SAR2130P phy: qualcomm: qmp-pcie: define several new registers phy: qualcomm: qmp-pcie: split PCS_LANE1 region phy: qualcomm: qmp-combo: add support for SAR2130P dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Add SAR2130P compatible dt-bindings: phy: qcom,sc8280xp-qmp-usb43dp: Add SAR2130P compatible phy: freescale: fsl-samsung-hdmi: Clean up fld_tg_code calculation phy: freescale: fsl-samsung-hdmi: Stop searching when exact match is found phy: freescale: fsl-samsung-hdmi: Expand Integer divider range phy: rockchip-naneng-combo: add rk3576 support dt-bindings: phy: rockchip: add rk3576 compatible ...