linux/linux-stable.git - Linux kernel stable tree

Age	Commit message (Collapse)	Author
2025-02-26	iomap: advance the iter on direct I/O	Brian Foster
	Update iomap direct I/O to advance the iter directly rather than via iter.processed. Update each mapping type helper to advance based on the amount of data processed and return success or failure. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250224144757.237706-3-bfoster@redhat.com Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26	iomap: advance the iter directly on buffered read	Brian Foster
	iomap buffered read advances the iter via iter.processed. To continue separating iter advance from return status, update iomap_readpage_iter() to advance the iter instead of returning the number of bytes processed. In turn, drop the offset parameter and sample the updated iter->pos at the start of the function. Update the callers to loop based on remaining length in the current iteration instead of number of bytes processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250224144757.237706-2-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	Merge patch series "iomap: incremental per-operation iter advance"	Christian Brauner
	Brian Foster <bfoster@redhat.com> says: This is a first pass at supporting more incremental, per-operation iomap_iter advancement. The motivation for this is folio_batch support for zero range, where the fs provides a batch of folios to process in certain situations. Since the batch may not be logically contiguous, processing loops require a bit more flexibility than the typical offset based iteration. The current iteration model basically has the operation _iter() handler lift the pos/length wrt to the current iomap out of the iomap_iter, process it locally, then return the result to be stored in iter.processed. The latter is overloaded with error status, so the handler must decide whether to return error or a partial completion (i.e. consider a short write). iomap_iter() then uses the result to advance the iter and look up the next iomap. The updated model proposed in this series is to allow an operation to advance the iter itself as subranges are processed and then return success or failure in iter.processed. Note that at least initially, this is implemented as an optional mode to minimize churn. This series converts operations that use iomap_write_begin(): buffered write, unshare, and zero range. The main advantage of this is that the future folio_batch work can be plumbed down into the folio get path more naturally, and the associated codepath can advance the iter itself when appropriate rather than require each operation to manage the gaps in the range being processed. Some secondary advantages are a little less boilerplate code for walking ranges and more clear semantics for partial completions in the event of errors, etc. * patches from https://lore.kernel.org/r/20250207143253.314068-1-bfoster@redhat.com: iomap: advance the iter directly on zero range iomap: advance the iter directly on unshare range iomap: advance the iter directly on buffered writes iomap: support incremental iomap_iter advances iomap: export iomap_iter_advance() and return remaining length iomap: lift iter termination logic from iomap_iter_advance() iomap: lift error code check out of iomap_iter_advance() iomap: refactor iomap_iter() length check and tracepoint iomap: split out iomap check and reset logic from iter advance iomap: factor out iomap length helper Link: https://lore.kernel.org/r/20250207143253.314068-1-bfoster@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: advance the iter directly on zero range	Brian Foster
	Modify zero range to advance the iter directly. Replace the local pos and length calculations with direct advances and loop based on iter state instead. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-11-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: advance the iter directly on unshare range	Brian Foster
	Modify unshare range to advance the iter directly. Replace the local pos and length calculations with direct advances and loop based on iter state instead. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-10-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: advance the iter directly on buffered writes	Brian Foster
	Modify the buffered write path to advance the iter directly. Replace the local pos and length calculations with direct advances and loop based on iter state instead. Also remove the -EAGAIN return hack as it is no longer necessary now that separate return channels exist for processing progress and error returns. For example, the existing write handler must return either a count of bytes written or error if the write is interrupted, but presumably wants to return -EAGAIN directly in order to break the higher level iomap_iter() loop. Since the current iteration may have made some progress, it unwinds the iter on the way out to return the error while ensuring that portion of the write can be retried. If -EAGAIN occurs at any point beyond the first iteration, iomap_file_buffered_write() will then observe progress based on iter->pos to return a short write. With incremental advances on the iomap_iter, iomap_write_iter() can simply return the error. iomap_iter() completes whatever progress was made based on iomap_iter position and still breaks out of the iter loop based on the error code in iter.processed. The end result of the write is similar in terms of being a short write if progress was made or error return otherwise. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-9-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: support incremental iomap_iter advances	Brian Foster
	The current iomap_iter iteration model reads the mapping from the filesystem, processes the subrange of the operation associated with the current mapping, and returns the number of bytes processed back to the iteration code. The latter advances the position and remaining length of the iter in preparation for the next iteration. At the _iter() handler level, this tends to produce a processing loop where the local code pulls the current position and remaining length out of the iter, iterates it locally based on file offset, and then breaks out when the associated range has been fully processed. This works well enough for current handlers, but upcoming enhancements require a bit more flexibility in certain situations. Enhancements for zero range will lead to a situation where the processing loop is no longer a pure ascending offset walk, but rather dictated by pagecache state and folio lookup. Since folio lookup and write preparation occur at different levels, it is more difficult to manage position and length outside of the iter. To provide more flexibility to certain iomap operations, introduce support for incremental iomap_iter advances from within the operation itself. This allows more granular advances for operations that might not use the typical file offset based walk. Note that the semantics for operations that use incremental advances is slightly different than traditional operations. Operations that advance the iter directly are expected to return success or failure (i.e. 0 or negative error code) in iter.processed rather than the number of bytes processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-8-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: export iomap_iter_advance() and return remaining length	Brian Foster
	As a final step for generic iter advance, export the helper and update it to return the remaining length of the current iteration after the advance. This will usually be 0 in the iomap_iter() case, but will be useful for the various operations that iterate on their own and will be updated to advance as they progress. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-7-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: lift iter termination logic from iomap_iter_advance()	Brian Foster
	The iter termination logic in iomap_iter_advance() is only needed by iomap_iter() to determine whether to proceed with the next mapping for an ongoing operation. The old logic sets ret to 1 and then terminates if the operation is complete (iter->len == 0) or the previous iteration performed no work and the mapping has not been marked stale. The stale check exists to allow operations to retry the current mapping if an inconsistency has been detected. To further genericize iomap_iter_advance(), lift the termination logic into iomap_iter() and update the former to return success (0) or an error code. iomap_iter() continues on successful advance and non-zero iter->len or otherwise terminates in the no progress (and not stale) or error cases. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-6-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: lift error code check out of iomap_iter_advance()	Brian Foster
	The error code is only used to check whether iomap_iter() should terminate due to an error returned in iter.processed. Lift the check out of iomap_iter_advance() in preparation to make it more generic. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-5-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: refactor iomap_iter() length check and tracepoint	Brian Foster
	iomap_iter() checks iomap.length to skip individual code blocks not appropriate for the initial case where there is no mapping in the iter. To prepare for upcoming changes, refactor the code to jump straight to the ->iomap_begin() handler in the initial case and move the tracepoint to the top of the function so it always executes. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-4-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: split out iomap check and reset logic from iter advance	Brian Foster
	In preparation for more granular iomap_iter advancing, break out some of the logic associated with higher level iteration from iomap_advance_iter(). Specifically, factor the iomap reset code into a separate helper and lift the iomap.length check into the calling code, similar to how ->iomap_end() calls are handled. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-3-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-10	iomap: factor out iomap length helper	Brian Foster
	In preparation to support more granular iomap iter advancing, factor the pos/len values as parameters to length calculation. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/r/20250207143253.314068-2-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	Merge patch series "iomap: allow the file system to submit the writeback bios"	Christian Brauner
	Christoph Hellwig <hch@lst.de> says: This series contains the iomap prep work to support zoned XFS. The biggest changes are: - an option to reuse the ioend code for direct writes in addition to the current use for buffered writeback, which allows the file system to track completions on a per-bio basis instead of the current end_io callback which operates on the entire I/O. Note that it might make sense to split the ioend code from buffered-io.c into its own file with this. Let me know what you think of that and I can include it in the next version - change of the writeback_ops so that the submit_bio call can be done by the file system. Note that btrfs will also need this eventually when it starts using iomap - helpers to split ioend to the zone append queue_limits that plug into the previous item above. - a new ANON_WRITE flags for writes that don't have a block number assigned to them at the iomap level, leaving the file system to do that work in the submission handler. Note that btrfs wants something similar also for compressed I/O, which should be able to reuse this, maybe with minor tweaks. - passing private data to a few more helper The XFS changes to use this will be posted to the xfs list only to not spam fsdevel too much. * patches from https://lore.kernel.org/r/20250206064035.2323428-2-hch@lst.de: iomap: pass private data to iomap_truncate_page iomap: pass private data to iomap_zero_range iomap: pass private data to iomap_page_mkwrite iomap: add a io_private field to struct iomap_ioend iomap: optionally use ioends for direct I/O iomap: factor out a iomap_dio_done helper iomap: move common ioend code to ioend.c iomap: split bios to zone append limits in the submission handlers iomap: add a IOMAP_F_ANON_WRITE flag iomap: simplify io_flags and io_type in struct iomap_ioend iomap: allow the file system to submit the writeback bios Link: https://lore.kernel.org/r/20250206064035.2323428-2-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: pass private data to iomap_truncate_page	Christoph Hellwig
	Allow the file system to pass private data which can be used by the iomap_begin and iomap_end methods through the private pointer in the iomap_iter structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-12-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: pass private data to iomap_zero_range	Christoph Hellwig
	Allow the file system to pass private data which can be used by the iomap_begin and iomap_end methods through the private pointer in the iomap_iter structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-11-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: pass private data to iomap_page_mkwrite	Christoph Hellwig
	Allow the file system to pass private data which can be used by the iomap_begin and iomap_end methods through the private pointer in the iomap_iter structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-10-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: add a io_private field to struct iomap_ioend	Christoph Hellwig
	Add a private data field to struct iomap_ioend so that the file system can attach information to it. Zoned XFS will use this for a pointer to the open zone. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-9-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: optionally use ioends for direct I/O	Christoph Hellwig
	struct iomap_ioend currently tracks outstanding buffered writes and has some really nice code in core iomap and XFS to merge contiguous I/Os an defer them to userspace for completion in a very efficient way. For zoned writes we'll also need a per-bio user context completion to record the written blocks, and the infrastructure for that would look basically like the ioend handling for buffered I/O. So instead of reinventing the wheel, reuse the existing infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-8-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: factor out a iomap_dio_done helper	Christoph Hellwig
	Split out the struct iomap-dio level final completion from iomap_dio_bio_end_io into a helper to clean up the code and make it reusable. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-7-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: move common ioend code to ioend.c	Christoph Hellwig
	This code will be reused for direct I/O soon, so split it out of buffered-io.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-6-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: split bios to zone append limits in the submission handlers	Christoph Hellwig
	Provide helpers for file systems to split bios in the direct I/O and writeback I/O submission handlers. The split ioends are chained to the parent ioend so that only the parent ioend originally generated by the iomap layer will be processed after all the chained off children have completed. This is based on the block layer bio chaining that has supported a similar mechanism for a long time. This Follows btrfs' lead and don't try to build bios to hardware limits for zone append commands, but instead build them as normal unconstrained bios and split them to the hardware limits in the I/O submission handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-5-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: add a IOMAP_F_ANON_WRITE flag	Christoph Hellwig
	Add a IOMAP_F_ANON_WRITE flag that indicates that the write I/O does not have a target block assigned to it yet at iomap time and the file system will do that in the bio submission handler, splitting the I/O as needed. This is used to implement Zone Append based I/O for zoned XFS, where splitting writes to the hardware limits and assigning a zone to them happens just before sending the I/O off to the block layer, but could also be useful for other things like compressed I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-4-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: simplify io_flags and io_type in struct iomap_ioend	Christoph Hellwig
	The ioend fields for distinct types of I/O are a bit complicated. Consolidate them into a single io_flag field with it's own flags decoupled from the iomap flags. This also prepares for adding a new flag that is unrelated to both of the iomap namespaces. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-3-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-06	iomap: allow the file system to submit the writeback bios	Christoph Hellwig
	Change ->prepare_ioend to ->submit_ioend and require file systems that implement it to submit the bio. This is needed for file systems that do their own work on the bios before submitting them to the block layer like btrfs or zoned xfs. To make this easier also pass the writeback context to the method. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250206064035.2323428-2-hch@lst.de Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-02	Linux 6.14-rc1v6.14-rc1	Linus Torvalds

2025-02-02	Merge tag 'turbostat-2025.02.02' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - Fix regression that affinitized forked child in one-shot mode. - Harden one-shot mode against hotplug online/offline - Enable RAPL SysWatt column by default - Add initial PTL, CWF platform support - Harden initial PMT code in response to early use - Enable first built-in PMT counter: CWF c1e residency - Refuse to run on unsupported platforms without --force, to encourage updating to a version that supports the system, and to avoid no-so-useful measurement results * tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits) tools/power turbostat: version 2025.02.02 tools/power turbostat: Add CPU%c1e BIC for CWF tools/power turbostat: Harden one-shot mode against cpu offline tools/power turbostat: Fix forked child affinity regression tools/power turbostat: Add tcore clock PMT type tools/power turbostat: version 2025.01.14 tools/power turbostat: Allow adding PMT counters directly by sysfs path tools/power turbostat: Allow mapping multiple PMT files with the same GUID tools/power turbostat: Add PMT directory iterator helper tools/power turbostat: Extend PMT identification with a sequence number tools/power turbostat: Return default value for unmapped PMT domains tools/power turbostat: Check for non-zero value when MSR probing tools/power turbostat: Enhance turbostat self-performance visibility tools/power turbostat: Add fixed RAPL PSYS divisor for SPR tools/power turbostat: Fix PMT mmaped file size rounding tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT tools/power turbostat: Add an NMI column tools/power turbostat: add Busy% to "show idle" tools/power turbostat: Introduce --force parameter tools/power turbostat: Improve --help output ...
2025-02-02	Merge tag 'sh-for-v6.14-tag1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: "Fixes and improvements for sh: - replace seq_printf() with the more efficient seq_put_decimal_ull_width() to increase performance when stress reading /proc/interrupts (David Wang) - migrate sh to the generic rule for built-in DTB to help avoid race conditions during parallel builds which can occur because Kbuild decends into arch//boot/dts twice (Masahiro Yamada) - replace select with imply in the board Kconfig for enabling hardware with complex dependencies. This addresses warnings which were reported by the kernel test robot (Geert Uytterhoeven)" tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: boards: Use imply to enable hardware with complex dependencies sh: Migrate to the generic rule for built-in DTB sh: irq: Use seq_put_decimal_ull_width() for decimal values
2025-02-02	tools/power turbostat: version 2025.02.02	Len Brown
	Summary of Changes since 2024.11.30: Fix regression in 2023.11.07 that affinitized forked child in one-shot mode. Harden one-shot mode against hotplug online/offline Enable RAPL SysWatt column by default. Add initial PTL, CWF platform support. Harden initial PMT code in response to early use. Enable first built-in PMT counter: CWF c1e residency Refuse to run on unsupported platforms without --force, to encourage updating to a version that supports the system, and to avoid no-so-useful measurement results. Signed-off-by: Len Brown <len.brown@intel.com>
2025-02-01	Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull misc vfs cleanups from Al Viro: "Two unrelated patches - one is a removal of long-obsolete include in overlayfs (it used to need fs/internal.h, but the extern it wanted has been moved back to include/linux/namei.h) and another introduces convenience helper constructing struct qstr by a NUL-terminated string" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add a string-to-qstr constructor fs/overlayfs/namei.c: get rid of include ../internal.h
2025-02-01	Merge tag 'mips_6.14_1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Revert commit breaking sysv ipc for o32 ABI" * tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: Revert "mips: fix shmctl/semctl/msgctl syscall for o32"
2025-02-01	Merge tag 'v6.14-rc-smb3-client-fixes-part2' of ↵	Linus Torvalds
	git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - various updates for special file handling: symlink handling, support for creating sockets, cleanups, new mount options (e.g. to allow disabling using reparse points for them, and to allow overriding the way symlinks are saved), and fixes to error paths - fix for kerberos mounts (allow IAKerb) - SMB1 fix for stat and for setting SACL (auditing) - fix an incorrect error code mapping - cleanups" * tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: (21 commits) cifs: Fix parsing native symlinks directory/file type cifs: update internal version number cifs: Add support for creating WSL-style symlinks smb3: add support for IAKerb cifs: Fix struct FILE_ALL_INFO cifs: Add support for creating NFS-style symlinks cifs: Add support for creating native Windows sockets cifs: Add mount option -o reparse=none cifs: Add mount option -o symlink= for choosing symlink create type cifs: Fix creating and resolving absolute NT-style symlinks cifs: Simplify reparse point check in cifs_query_path_info() function cifs: Remove symlink member from cifs_open_info_data union cifs: Update description about ACL permissions cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move to common/smb2pdu.h cifs: Remove struct reparse_posix_data from struct cifs_open_info_data cifs: Remove unicode parameter from parse_reparse_point() function cifs: Fix getting and setting SACLs over SMB1 cifs: Remove intermediate object of failed create SFU call cifs: Validate EAs for WSL reparse points cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM ...
2025-02-01	Merge tag 'driver-core-6.14-rc1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull debugfs fix from Greg KH: "Here is a single debugfs fix from Al to resolve a reported regression in the driver-core tree. It has been reported to fix the issue" * tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: debugfs: Fix the missing initializations in __debugfs_file_get()
2025-02-01	Merge tag 'mm-hotfixes-stable-2025-02-01-03-56' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes. 8 are cc:stable and the remainder address post-6.13 issues. 13 are for MM and 8 are for non-MM. All are singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) MAINTAINERS: include linux-mm for xarray maintenance revert "xarray: port tests to kunit" MAINTAINERS: add lib/test_xarray.c mailmap, MAINTAINERS, docs: update Carlos's email address mm/hugetlb: fix hugepage allocation for interleaved memory nodes mm: gup: fix infinite loop within __get_longterm_locked mm, swap: fix reclaim offset calculation error during allocation .mailmap: update email address for Christopher Obbard kfence: skip __GFP_THISNODE allocations on NUMA systems nilfs2: fix possible int overflows in nilfs_fiemap() mm: compaction: use the proper flag to determine watermarks kernel: be more careful about dup_mmap() failures and uprobe registering mm/fake-numa: handle cases with no SRAT info mm: kmemleak: fix upper boundary check for physical address objects mailmap: add an entry for Hamza Mahfooz MAINTAINERS: mailmap: update Yosry Ahmed's email address scripts/gdb: fix aarch64 userspace detection in get_current_task mm/vmscan: accumulate nr_demoted for accurate demotion statistics ocfs2: fix incorrect CPU endianness conversion causing mount failure mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc() ...
2025-02-01	Merge tag 'media/v6.14-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fix from Mauro Carvalho Chehab: "A revert for a regression in the uvcvideo driver" * tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: Revert "media: uvcvideo: Require entities to have a non-zero unique ID"
2025-02-01	MAINTAINERS: include linux-mm for xarray maintenance	Andrew Morton
	MM developers have an interest in the xarray code. Cc: David Gow <davidgow@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	revert "xarray: port tests to kunit"	Andrew Morton
	Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code. Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com Cc: David Gow <davidgow@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Tamir Duberstein <tamird@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	MAINTAINERS: add lib/test_xarray.c	Tamir Duberstein
	Ensure test-only changes are sent to the relevant maintainer. Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com Signed-off-by: Tamir Duberstein <tamird@gmail.com> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap, MAINTAINERS, docs: update Carlos's email address	Carlos Bilbao
	Update .mailmap to reflect my new (and final) primary email address, carlos.bilbao@kernel.org. Also update contact information in files Documentation/translations/sp_SP/index.rst and MAINTAINERS. Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Carlos Bilbao <bilbao@vt.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/hugetlb: fix hugepage allocation for interleaved memory nodes	Ritesh Harjani (IBM)
	gather_bootmem_prealloc() assumes the start nid as 0 and size as num_node_state(N_MEMORY). That means in case if memory attached numa nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail to scan few of these nodes. Since memory attached numa nodes can be interleaved in any fashion, hence ensure that the current code checks for all numa node ids (.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that it can distributes all nr_node_ids among the these many no. threads. e.g. qemu cmdline ======================== numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20" mem_cmd="-object memory-backend-ram,id=mem1,size=16G" w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): ========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 0 kB with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): =========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 2 HugePages_Free: 2 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 2097152 kB Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com> Suggested-by: Muchun Song <muchun.song@linux.dev> Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Gang Li <gang.li@linux.dev> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: gup: fix infinite loop within __get_longterm_locked	Zhaoyang Huang
	We can run into an infinite loop in __get_longterm_locked() when collect_longterm_unpinnable_folios() finds only folios that are isolated from the LRU or were never added to the LRU. This can happen when all folios to be pinned are never added to the LRU, for example when vm_ops->fault allocated pages using cma_alloc() and never added them to the LRU. Fix it by simply taking a look at the list in the single caller, to see if anything was added. [zhaoyang.huang@unisoc.com: move definition of local] Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aijun Sun <aijun.sun@unisoc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm, swap: fix reclaim offset calculation error during allocation	Kairui Song
	There is a code error that will cause the swap entry allocator to reclaim and check the whole cluster with an unexpected tail offset instead of the part that needs to be reclaimed. This may cause corruption of the swap map, so fix it. Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock") Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chris Li <chrisl@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	.mailmap: update email address for Christopher Obbard	Christopher Obbard
	Update my email address. Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kfence: skip __GFP_THISNODE allocations on NUMA systems	Marco Elver
	On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on a particular node, and failure to allocate on the desired node will result in a failed allocation. Skip __GFP_THISNODE allocations if we are running on a NUMA system, since KFENCE can't guarantee which node its pool pages are allocated on. Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chistoph Lameter <cl@linux.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	nilfs2: fix possible int overflows in nilfs_fiemap()	Nikita Zhandarovich
	Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result by being prepared to go through potentially maxblocks == INT_MAX blocks, the value in n may experience an overflow caused by left shift of blkbits. While it is extremely unlikely to occur, play it safe and cast right hand expression to wider type to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com Fixes: 622daaff0a89 ("nilfs2: fiemap support") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: compaction: use the proper flag to determine watermarks	yangge
	There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour. Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB of no-CMA memory on a NUMA node can be used as virtual machine memory. There is 16GB of free CMA memory on a NUMA node, which is sufficient to pass the order-0 watermark check, causing the __compaction_suitable() function to consistently return true. For costly allocations, if the __compaction_suitable() function always returns true, it causes the __alloc_pages_slowpath() function to fail to exit at the appropriate point. This prevents timely fallback to allocating memory on other nodes, ultimately resulting in excessively long virtual machine startup times. Call trace: __alloc_pages_slowpath if (compact_result == COMPACT_SKIPPED \|\| compact_result == COMPACT_DEFERRED) goto nopage; // should exit __alloc_pages_slowpath() from here We could use the real unmovable allocation context to have __zone_watermark_unusable_free() subtract CMA pages, and thus we won't pass the order-0 check anymore once the non-CMA part is exhausted. There is some risk that in some different scenario the compaction could in fact migrate pages from the exhausted non-CMA part of the zone to the CMA part and succeed, and we'll skip it instead. But only __GFP_NORETRY allocations should be affected in the immediate "goto nopage" when compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY anyway and won't fail without trying to compact-migrate the non-CMA pageblocks into CMA pageblocks first, so it should be fine. After this fix, it only takes a few tens of seconds to start a 32GB virtual machine with device passthrough functionality. Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/ Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com Signed-off-by: yangge <yangge1116@126.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kernel: be more careful about dup_mmap() failures and uprobe registering	Liam R. Howlett
	If a memory allocation fails during dup_mmap(), the maple tree can be left in an unsafe state for other iterators besides the exit path. All the locks are dropped before the exit_mmap() call (in mm/mmap.c), but the incomplete mm_struct can be reached through (at least) the rmap finding the vmas which have a pointer back to the mm_struct. Up to this point, there have been no issues with being able to find an mm_struct that was only partially initialised. Syzbot was able to make the incomplete mm_struct fail with recent forking changes, so it has been proven unsafe to use the mm_struct that hasn't been initialised, as referenced in the link below. Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to invalid mm") fixed the uprobe access, it does not completely remove the race. This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the oom side (even though this is extremely unlikely to be selected as an oom victim in the race window), and sets MMF_UNSTABLE to avoid other potential users from using a partially initialised mm_struct. When registering vmas for uprobe, skip the vmas in an mm that is marked unstable. Modifying a vma in an unstable mm may cause issues if the mm isn't fully initialised. Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/ Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/fake-numa: handle cases with no SRAT info	Bruno Faccini
	Handle more gracefully cases where no SRAT information is available, like in VMs with no Numa support, and allow fake-numa configuration to complete successfully in these cases Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com Fixes: 63db8170bf34 (“mm/fake-numa: allow later numa node hotplug”) Signed-off-by: Bruno Faccini <bfaccini@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Len Brown <lenb@kernel.org> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: kmemleak: fix upper boundary check for physical address objects	Catalin Marinas
	Memblock allocations are registered by kmemleak separately, based on their physical address. During the scanning stage, it checks whether an object is within the min_low_pfn and max_low_pfn boundaries and ignores it otherwise. With the recent addition of __percpu pointer leak detection (commit 6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak started reporting leaks in setup_zone_pageset() and setup_per_cpu_pageset(). These were caused by the node_data[0] object (initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn) boundary. The non-strict upper boundary check introduced by commit 84c326299191 ("mm: kmemleak: check physical address when scan") causes the pg_data_t object to be ignored (not scanned) and the __percpu pointers it contains to be reported as leaks. Make the max_low_pfn upper boundary check strict when deciding whether to ignore a physical address object and not scan it. Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Patrick Wang <patrick.wang.shcn@gmail.com> Cc: <stable@vger.kernel.org> [6.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap: add an entry for Hamza Mahfooz	Hamza Mahfooz
	Map my previous work email to my current one. Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hans verkuil <hverkuil@xs4all.nl> Cc: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>