Age | Commit message (Collapse) | Author |
|
As of now only device names are printed out over __xfs_printk().
The device names are not persistent across reboots which in case
of searching for origin of corruption brings another task to properly
identify the devices. This patch add XFS UUID upon every mount/umount
event which will make the identification much easier.
Signed-off-by: Lukas Herbolt <lukas@herbolt.com>
[sandeen: rebase onto current upstream kernel]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
When lazysbcount is enabled, fsstress and loop mount/unmount test report
the following problems:
XFS (loop0): SB summary counter sanity check failed
XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x13b/0x460,
xfs_sb block 0x0
XFS (loop0): Unmount and run xfs_repair
XFS (loop0): First 128 bytes of corrupted metadata buffer:
00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 28 00 00 XFSB.........(..
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000020: 69 fb 7c cd 5f dc 44 af 85 74 e0 cc d4 e3 34 5a i.|._.D..t....4Z
00000030: 00 00 00 00 00 20 00 06 00 00 00 00 00 00 00 80 ..... ..........
00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................
00000050: 00 00 00 01 00 0a 00 00 00 00 00 04 00 00 00 00 ................
00000060: 00 00 0a 00 b4 b5 02 00 02 00 00 08 00 00 00 00 ................
00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 14 00 00 19 ................
XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply
+0xe1e/0x10e0 (fs/xfs/xfs_buf.c:1580). Shutting down filesystem.
XFS (loop0): Please unmount the filesystem and rectify the problem(s)
XFS (loop0): log mount/recovery failed: error -117
XFS (loop0): log mount failed
This corruption will shutdown the file system and the file system will
no longer be mountable. The following script can reproduce the problem,
but it may take a long time.
#!/bin/bash
device=/dev/sda
testdir=/mnt/test
round=0
function fail()
{
echo "$*"
exit 1
}
mkdir -p $testdir
while [ $round -lt 10000 ]
do
echo "******* round $round ********"
mkfs.xfs -f $device
mount $device $testdir || fail "mount failed!"
fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null &
sleep 4
killall -w fsstress
umount $testdir
xfs_repair -e $device > /dev/null
if [ $? -eq 2 ];then
echo "ERR CODE 2: Dirty log exception during repair."
exit 1
fi
round=$(($round+1))
done
With lazysbcount is enabled, There is no additional lock protection for
reading m_ifree and m_icount in xfs_log_sb(), if other cpu modifies the
m_ifree, this will make the m_ifree greater than m_icount. For example,
consider the following sequence and ifreedelta is postive:
CPU0 CPU1
xfs_log_sb xfs_trans_unreserve_and_mod_sb
---------- ------------------------------
percpu_counter_sum(&mp->m_icount)
percpu_counter_add_batch(&mp->m_icount,
idelta, XFS_ICOUNT_BATCH)
percpu_counter_add(&mp->m_ifree, ifreedelta);
percpu_counter_sum(&mp->m_ifree)
After this, incorrect inode count (sb_ifree > sb_icount) will be writen to
the log. In the subsequent writing of sb, incorrect inode count (sb_ifree >
sb_icount) will fail to pass the boundary check in xfs_validate_sb_write()
that cause the file system shutdown.
When lazysbcount is enabled, we don't need to guarantee that Lazy sb
counters are completely correct, but we do need to guarantee that sb_ifree
<= sb_icount. On the other hand, the constraint that m_ifree <= m_icount
must be satisfied any time that there /cannot/ be other threads allocating
or freeing inode chunks. If the constraint is violated under these
circumstances, sb_i{count,free} (the ondisk superblock inode counters)
maybe incorrect and need to be marked sick at unmount, the count will
be rebuilt on the next mount.
Fixes: 8756a5af1819 ("libxfs: add more bounds checking to sb sanity checks")
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Clean up resources if resetting the dotdot entry doesn't succeed.
Observed through code inspection.
Fixes: 5838d0356bb3 ("xfs: reset child dir '..' entry when unlinking child")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
xfs: scrub inode core when checking metadata files
Running the online fsck QA fuzz tests, I noticed that we were
consistently missing fuzzed records in the inode cores of the realtime
freespace files and the quota files. This patch adds the ability to
check inode cores in xchk_metadata_inode_forks.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'scrub-check-metadata-inode-records-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: check inode core when scrubbing metadata files
xfs: don't warn about files that are exactly s_maxbytes long
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
xfs: strengthen file mapping scrub
This series strengthens the file extent mapping scrubber in various
ways, such as confirming that there are enough bmap records to match up
with the rmap records for this file, checking delalloc reservations,
checking for no unwritten extents in metadata files, invalid CoW fork
formats, and weird things like shared CoW fork extents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'scrub-bmap-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: teach scrub to flag non-extents format cow forks
xfs: check that CoW fork extents are not shared
xfs: check quota files for unwritten extents
xfs: block map scrub should handle incore delalloc reservations
xfs: teach scrub to check for adjacent bmaps when rmap larger than bmap
xfs: fix perag loop in xchk_bmap_check_rmaps
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
xfs: enhance fs summary counter scrubber
This series makes two changes to the fs summary counter scrubber: first,
we should mark the scrub incomplete when we can't read the AG headers.
Second, it fixes a functionality gap where we don't actually check the
free rt extent count.
v23.2: fix pointless inline
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'scrub-fscounters-enhancements-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: online checking of the free rt extent count
xfs: skip fscounters comparisons when the scan is incomplete
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
xfs: improve rt metadata use for scrub
This short series makes some small changes to the way we handle the
realtime metadata inodes. First, we now make sure that the bitmap and
summary file forks are always loaded at mount time so that every
scrubber won't have to call xfs_iread_extents. This won't be easy if
we're, say, cross-referencing realtime space allocations. The second
change makes the ILOCK annotations more consistent with the rest of XFS.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'scrub-fix-rtmeta-ilocking-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: make rtbitmap ILOCKing consistent when scanning the rt bitmap file
xfs: load rtbitmap and rtsummary extent mapping btrees at mount time
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
xfs: fix incorrect return values in online fsck
Here we fix a couple of problems with the errno values that we return to
userspace.
v23.2: fix vague wording of comment
v23.3: fix the commit message to discuss what's really going on in this
patch
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'scrub-fix-return-value-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: don't return -EFSCORRUPTED from repair when resources cannot be grabbed
xfs: don't retry repairs harder when EAGAIN is returned
xfs: fix return code when fatal signal encountered during dquot scrub
xfs: return EINTR when a fatal signal terminates scrub
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
xfs: clean up memory allocations in online fsck
This series standardizes the GFP_ flags that we use for memory
allocation in online scrub, and convert the callers away from the old
kmem_alloc code that was ported from Irix.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'scrub-cleanup-malloc-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: pivot online scrub away from kmem.[ch]
xfs: initialize the check_owner object fully
xfs: standardize GFP flags usage in online scrub
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.2-mergeA
xfs: fix handling of AG[IF] header buffers during scrub
While reading through the online fsck code, I noticed that the setup
code for AG metadata scrubs will attach the AGI, the AGF, and the AGFL
buffers to the transaction. It isn't necessary to hold the AGFL buffer,
since any code that wants to do anything with the AGFL will need to hold
the AGF to know which parts of the AGFL are active. Therefore, we only
need to hold the AGFL when scrubbing the AGFL itself.
The second bug fixed by this patchset is one that I observed while
testing online repair. When a buffer is held across a transaction roll,
its buffer log item will be detached if the bli was clean before the
roll. If we are holding the AG headers to maintain a lock on an AG, we
then need to set the buffer type on the new bli to avoid confusing the
logging code later.
There's also a bug fix for uninitialized memory in the directory scanner
that didn't fit anywhere else.
Ths patchset finishes off by teaching the AGFL repair function to look
for and discard crosslinked blocks instead of putting them back on the
AGFL.
v23.2: Log the buffers before rolling the transaction to keep the moving
forward in the log and avoid the bli falling off.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
* tag 'scrub-fix-ag-header-handling-6.2_2022-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: make AGFL repair function avoid crosslinked blocks
xfs: log the AGI/AGF buffers when rolling transactions during an AG repair
xfs: don't track the AGFL buffer in the scrub AG context
xfs: fully initialize xfs_da_args in xchk_directory_blocks
|
|
Metadata files (e.g. realtime bitmaps and quota files) do not show up in
the bulkstat output, which means that scrub-by-handle does not work;
they can only be checked through a specific scrub type. Therefore, each
scrub type calls xchk_metadata_inode_forks to check the metadata for
whatever's in the file.
Unfortunately, that function doesn't actually check the inode record
itself. Refactor the function a bit to make that happen.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
We can handle files that are exactly s_maxbytes bytes long; we just
can't handle anything larger than that.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
CoW forks only exist in memory, which means that they can only ever have
an incore extent tree. Hence they must always be FMT_EXTENTS, so check
this when we're scrubbing them.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Ensure that extents in an inode's CoW fork are not marked as shared in
the refcount btree.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Teach scrub to flag quota files containing unwritten extents.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Enhance the block map scrubber to check delayed allocation reservations.
Though there are no physical space allocations to check, we do need to
make sure that the range of file offsets being mapped are correct, and
to bump the lastoff cursor so that key order checking works correctly.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
When scrub is checking file fork mappings against rmap records and
the rmap record starts before or ends after the bmap record, check the
adjacent bmap records to make sure that they're adjacent to the one
we're checking. This helps us to detect cases where the rmaps cover
territory that the bmaps do not.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
sparse complains that we can return an uninitialized error from this
function and that pag could be uninitialized. We know that there are no
zero-AG filesystems and hence we had to call xchk_bmap_check_ag_rmaps at
least once, so this is not actually possible, but I'm too worn out from
automated complaints from unsophisticated AIs so let's just fix this and
move on to more interesting problems, eh?
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Teach the summary count checker to count the number of free realtime
extents and compare that to the superblock copy.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
xfs_rtalloc_query_range scans the realtime bitmap file in order of
increasing file offset, so this caller can take ILOCK_SHARED on the rt
bitmap inode instead of ILOCK_EXCL. This isn't going to yield any
practical benefits at mount time, but we'd like to make the locking
usage consistent around xfs_rtalloc_query_all calls. Make all the
places we do this use the same xfs_ilock lockflags for consistency.
Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
If we tried to repair something but the repair failed with -EDEADLOCK,
that means that the repair function couldn't grab some resource it
needed and wants us to try again. If we try again (with TRY_HARDER) but
still can't get all the resources we need, the repair fails and errors
remain on the filesystem.
Right now, repair returns the -EDEADLOCK to the caller as -EFSCORRUPTED,
which results in XFS_SCRUB_OFLAG_CORRUPT being passed out to userspace.
This is not correct because repair has not determined that anything is
corrupt. If the repair had been invoked on an object that could be
optimized but wasn't corrupt (OFLAG_PREEN), the inability to grab
resources will be reported to userspace as corrupt metadata, and users
will be unnecessarily alarmed that their suboptimal metadata turned into
a corruption.
Fix this by returning zero so that the results of the actual scrub will
be copied back out to userspace.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
If any part of the per-AG summary counter scan loop aborts without
collecting all of the data we need, the scrubber's observation data will
be invalid. Set the incomplete flag so that we abort the scrub without
reporting false corruptions. Document the data dependency here too.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
It turns out that GETFSMAP and online fsck have had a bug for years due
to their use of ILOCK_SHARED to coordinate their linear scans of the
realtime bitmap. If the bitmap file's data fork happens to be in BTREE
format and the scan occurs immediately after mounting, the incore bmbt
will not be populated, leading to ASSERTs tripping over the incorrect
inode state. Because the bitmap scans always lock bitmap buffers in
increasing order of file offset, it is appropriate for these two callers
to take a shared ILOCK to improve scalability.
To fix this problem, load both data and attr fork state into memory when
mounting the realtime inodes. Realtime metadata files aren't supposed
to have an attr fork so the second step is likely a nop.
On most filesystems this is unlikely since the rtbitmap data fork is
usually in extents format, but it's possible to craft a filesystem that
will by fragmenting the free space in the data section and growfsing the
rt section.
Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap")
Also-Fixes: 46d9bfb5e706 ("xfs: cross-reference the realtime bitmap")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Convert all the online scrub code to use the Linux slab allocator
functions directly instead of going through the kmem wrappers.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Repair functions will not return EAGAIN -- if they were not able to
obtain resources, they should return EDEADLOCK (like the rest of online
fsck) to signal that we need to grab all the resources and try again.
Hence we don't need to deal with this case except as a debugging
assertion.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Initialize the check_owner list head so that we don't corrupt the list.
Reduce the scope of the object pointer.
Fixes: 858333dcf021 ("xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
If the scrub process is sent a fatal signal while we're checking dquots,
the predicate for this will set the error code to -EINTR. Don't then
squash that into -ECANCELED, because the wrong errno turns up in the
trace output.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
If the program calling online fsck is terminated with a fatal signal,
bail out to userspace by returning EINTR, not EAGAIN. EAGAIN is used by
scrubbers to indicate that we should try again with more resources
locked, and not to indicate that the operation was cancelled. The
miswiring is mostly harmless, but it shows up in the trace data.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Teach the AGFL repair function to check each block of the proposed AGFL
against the rmap btree. If the rmapbt finds any mappings that are not
OWN_AG, strike that block from the list.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Memory allocation usage is the same throughout online fsck -- we want
kernel memory, we have to be able to back out if we can't allocate
memory, and we don't want to spray dmesg with memory allocation failure
reports. Standardize the GFP flag usage and document these requirements.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
Currently, the only way to lock an allocation group is to hold the AGI
and AGF buffers. If a repair needs to roll the transaction while
repairing some AG metadata, it maintains that lock by holding the two
buffers across the transaction roll and joins them afterwards.
However, repair is not like other parts of XFS that employ the bhold -
roll - bjoin sequence because it's possible that the AGI or AGF buffers
are not actually dirty before the roll. This presents two problems --
First, we need to redirty those buffers to keep them moving along in the
log to avoid pinning the log tail. Second, a clean buffer log item can
detach from the buffer. If this happens, the buffer type state is
discarded along with the bli and must be reattached before the next time
the buffer is logged. If it is not, the logging code will complain and
log recovery will not work properly.
An earlier version of this patch tried to fix the second problem by
re-setting the buffer type in the bli after joining the buffer to the
new transaction, but that looked weird and didn't solve the first
problem. Instead, solve both problems by logging the buffer before
rolling the transaction.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
While scrubbing an allocation group, we don't need to hold the AGFL
buffer as part of the scrub context. All that is necessary to lock an
AG is to hold the AGI and AGF buffers, so fix all the existing users of
the AGFL buffer to grab them only when necessary.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
While running the online fsck test suite, I noticed the following
assertion in the kernel log (edited for brevity):
XFS: Assertion failed: 0, file: fs/xfs/xfs_health.c, line: 571
------------[ cut here ]------------
WARNING: CPU: 3 PID: 11667 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 3 PID: 11667 Comm: xfs_scrub Tainted: G W 5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Call Trace:
<TASK>
xfs_dir2_isblock+0xcc/0xe0
xchk_directory_blocks+0xc7/0x420
xchk_directory+0x53/0xb0
xfs_scrub_metadata+0x2b6/0x6b0
xfs_scrubv_metadata+0x35e/0x4d0
xfs_ioc_scrubv_metadata+0x111/0x160
xfs_file_ioctl+0x4ec/0xef0
__x64_sys_ioctl+0x82/0xa0
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
This assertion triggers in xfs_dirattr_mark_sick when the caller passes
in a whichfork value that is neither of XFS_{DATA,ATTR}_FORK. The cause
of this is that xchk_directory_blocks only partially initializes the
xfs_da_args structure that is passed to xfs_dir2_isblock. If the data
fork is not correct, the XFS_IS_CORRUPT clause will trigger. My
development branch reports this failure to the health monitoring
subsystem, which accesses the uninitialized args->whichfork field,
leading the the assertion tripping. We really shouldn't be passing
random stack contents around, so the solution here is to force the
compiler to zero-initialize the struct.
Found by fuzzing u3.bmx[0].blockcount = middlebit on xfs/1554.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"Several fixes for CXL region creation crashes, leaks and failures.
This is mainly fallout from the original implementation of dynamic CXL
region creation (instantiate new physical memory pools) that arrived
in v6.0-rc1.
Given the theme of "failures in the presence of pass-through decoders"
this also includes new regression test infrastructure for that case.
Summary:
- Fix region creation crash with pass-through decoders
- Fix region creation crash when no decoder allocation fails
- Fix region creation crash when scanning regions to enforce the
increasing physical address order constraint that CXL mandates
- Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
1:1 memory-device-to-region associations.
- Fix a memory leak for cxl_region objects when regions with active
targets are deleted
- Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
emulated proximity domains.
- Fix region creation failure for switch attached devices downstream
of a single-port host-bridge
- Fix false positive memory leak of cxl_region objects by recycling
recently used region ids rather than freeing them
- Add regression test infrastructure for a pass-through decoder
configuration
- Fix some mailbox payload handling corner cases"
* tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Recycle region ids
cxl/region: Fix 'distance' calculation with passthrough ports
tools/testing/cxl: Add a single-port host-bridge regression config
tools/testing/cxl: Fix some error exits
cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
cxl/region: Fix cxl_region leak, cleanup targets at region delete
cxl/region: Fix region HPA ordering validation
cxl/pmem: Use size_add() against integer overflow
cxl/region: Fix decoder allocation crash
ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.
cxl/region: Fix null pointer dereference due to pass through decoder commit
cxl/mbox: Add a check on input payload size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix two regressions:
- Commit 54cc3dbfc10d ("hwmon: (pmbus) Add regulator supply into
macro") resulted in regulator undercount when disabling regulators.
Revert it.
- The thermal subsystem rework caused the scmi driver to no longer
register with the thermal subsystem because index values no longer
match. To fix the problem, the scmi driver now directly registers
with the thermal subsystem, no longer through the hwmon core"
* tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
Revert "hwmon: (pmbus) Add regulator supply into macro"
hwmon: (scmi) Register explicitly with Thermal Framework
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Add Cooper Lake's stepping to the PEBS guest/host events isolation
fixed microcode revisions checking quirk
- Update Icelake and Sapphire Rapids events constraints
- Use the standard energy unit for Sapphire Rapids in RAPL
- Fix the hw_breakpoint test to fail more graciously on !SMP configs
* tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
perf/x86/intel: Fix pebs event constraints for SPR
perf/x86/intel: Fix pebs event constraints for ICL
perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
perf/hw_breakpoint: test: Skip the test if dependencies unmet
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add new Intel CPU models
- Enforce that TDX guests are successfully loaded only on TDX hardware
where virtualization exception (#VE) delivery on kernel memory is
disabled because handling those in all possible cases is "essentially
impossible"
- Add the proper include to the syscall wrappers so that BTF can see
the real pt_regs definition and not only the forward declaration
* tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add several Intel server CPU model numbers
x86/tdx: Panic on bad configs that #VE on "private" memory access
x86/tdx: Prepare for using "INFO" call for a second purpose
x86/syscall: Include asm/ptrace.h in syscall_wrapper header
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Use POSIX-compatible grep options
- Document git-related tips for reproducible builds
- Fix a typo in the modpost rule
- Suppress SIGPIPE error message from gcc-ar and llvm-ar
- Fix segmentation fault in the menuconfig search
* tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix segmentation fault in menuconfig search
kbuild: fix SIGPIPE error message for AR=gcc-ar and AR=llvm-ar
kbuild: fix typo in modpost
Documentation: kbuild: Add description of git for reproducible builds
kbuild: use POSIX-compatible grep option
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the pKVM stage-1 walker erronously using the stage-2 accessor
- Correctly convert vcpu->kvm to a hyp pointer when generating an
exception in a nVHE+MTE configuration
- Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them
- Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
- Document the boot requirements for FGT when entering the kernel at
EL1
x86:
- Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
- Make argument order consistent for kvcalloc()
- Userspace API fixes for DEBUGCTL and LBRs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix a typo about the usage of kvcalloc()
KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
arm64: booting: Document our requirements for fine grained traps with SME
KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
KVM: arm64: Fix bad dereference on MTE-enabled systems
KVM: arm64: Use correct accessor to parse stage-1 PTEs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"One fix for silencing a smatch warning, and a small cleanup patch"
* tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: simplify sysenter and syscall setup
x86/xen: silence smatch warning in pmu_msr_chk_emulated()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of bugs, including some regressions, the most serious of
which was one which would cause online resizes to fail with file
systems with metadata checksums enabled.
Also fix a warning caused by the newly added fortify string checker,
plus some bugs that were found using fuzzed file systems"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix fortify warning in fs/ext4/fast_commit.c:1551
ext4: fix wrong return err in ext4_load_and_init_journal()
ext4: fix warning in 'ext4_da_release_space'
ext4: fix BUG_ON() when directory entry has invalid rec_len
ext4: update the backup superblock's at the end of the online resize
|
|
Pull cifs fixes from Steve French:
"One symlink handling fix and two fixes foir multichannel issues with
iterating channels, including for oplock breaks when leases are
disabled"
* tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix use-after-free on the link name
cifs: avoid unnecessary iteration of tcp sessions
cifs: always iterate smb sessions using primary channel
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull `lTracing fixes for 6.1-rc3:
- Fixed NULL pointer dereference in the ring buffer wait-waiters code
for machines that have less CPUs than what nr_cpu_ids returns.
The buffer array is of size nr_cpu_ids, but only the online CPUs get
initialized.
- Fixed use after free call in ftrace_shutdown.
- Fix accounting of if a kprobe is enabled
- Fix NULL pointer dereference on error path of fprobe rethook_alloc().
- Fix unregistering of fprobe_kprobe_handler
- Fix memory leak in kprobe test module
* tag 'trace-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
tracing/fprobe: Fix to check whether fprobe is registered correctly
fprobe: Check rethook_alloc() return in rethook initialization
kprobe: reverse kp->flags when arm_kprobe failed
ftrace: Fix use-after-free for dynamic ftrace_ops
ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
* Fix the pKVM stage-1 walker erronously using the stage-2 accessor
* Correctly convert vcpu->kvm to a hyp pointer when generating
an exception in a nVHE+MTE configuration
* Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them
* Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
* Document the boot requirements for FGT when entering the kernel
at EL1
|
|
x86:
* Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
* Make argument order consistent for kvcalloc()
* Userspace API fixes for DEBUGCTL and LBRs
|
|
With the new fortify string system, rework the memcpy to avoid this
warning:
memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4)
Cc: stable@kernel.org
Fixes: 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The return value is wrong in ext4_load_and_init_journal(). The local
variable 'err' need to be initialized before goto out. The original code
in __ext4_fill_super() is fine because it has two return values 'ret'
and 'err' and 'ret' is initialized as -EINVAL. After we factor out
ext4_load_and_init_journal(), this code is broken. So fix it by directly
returning -EINVAL in the error handler path.
Cc: stable@kernel.org
Fixes: 9c1dd22d7422 ("ext4: factor out ext4_load_and_init_journal()")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Syzkaller report issue as follows:
EXT4-fs (loop0): Free/Dirty block details
EXT4-fs (loop0): free_blocks=0
EXT4-fs (loop0): dirty_blocks=0
EXT4-fs (loop0): Block reservation details
EXT4-fs (loop0): i_reserved_data_blocks=0
EXT4-fs warning (device loop0): ext4_da_release_space:1527: ext4_da_release_space: ino 18, to_free 1 with only 0 reserved data blocks
------------[ cut here ]------------
WARNING: CPU: 0 PID: 92 at fs/ext4/inode.c:1528 ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1524
Modules linked in:
CPU: 0 PID: 92 Comm: kworker/u4:4 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:ext4_da_release_space+0x25e/0x370 fs/ext4/inode.c:1528
RSP: 0018:ffffc900015f6c90 EFLAGS: 00010296
RAX: 42215896cd52ea00 RBX: 0000000000000000 RCX: 42215896cd52ea00
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: 1ffff1100e907d96 R08: ffffffff816aa79d R09: fffff520002bece5
R10: fffff520002bece5 R11: 1ffff920002bece4 R12: ffff888021fd2000
R13: ffff88807483ecb0 R14: 0000000000000001 R15: ffff88807483e740
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555569ba628 CR3: 000000000c88e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ext4_es_remove_extent+0x1ab/0x260 fs/ext4/extents_status.c:1461
mpage_release_unused_pages+0x24d/0xef0 fs/ext4/inode.c:1589
ext4_writepages+0x12eb/0x3be0 fs/ext4/inode.c:2852
do_writepages+0x3c3/0x680 mm/page-writeback.c:2469
__writeback_single_inode+0xd1/0x670 fs/fs-writeback.c:1587
writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1870
wb_writeback+0x41f/0x7b0 fs/fs-writeback.c:2044
wb_do_writeback fs/fs-writeback.c:2187 [inline]
wb_workfn+0x3cb/0xef0 fs/fs-writeback.c:2227
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
Above issue may happens as follows:
ext4_da_write_begin
ext4_create_inline_data
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
__ext4_ioctl
ext4_ext_migrate -> will lead to eh->eh_entries not zero, and set extent flag
ext4_da_write_begin
ext4_da_convert_inline_data_to_extent
ext4_da_write_inline_data_begin
ext4_da_map_blocks
ext4_insert_delayed_block
if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk))
if (!ext4_es_scan_clu(inode, &ext4_es_is_mapped, lblk))
ext4_clu_mapped(inode, EXT4_B2C(sbi, lblk)); -> will return 1
allocated = true;
ext4_es_insert_delayed_block(inode, lblk, allocated);
ext4_writepages
mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); -> return -ENOSPC
mpage_release_unused_pages(&mpd, give_up_on_write); -> give_up_on_write == 1
ext4_es_remove_extent
ext4_da_release_space(inode, reserved);
if (unlikely(to_free > ei->i_reserved_data_blocks))
-> to_free == 1 but ei->i_reserved_data_blocks == 0
-> then trigger warning as above
To solve above issue, forbid inode do migrate which has inline data.
Cc: stable@kernel.org
Reported-by: syzbot+c740bb18df70ad00952e@syzkaller.appspotmail.com
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221018022701.683489-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The rec_len field in the directory entry has to be a multiple of 4. A
corrupted filesystem image can be used to hit a BUG() in
ext4_rec_len_to_disk(), called from make_indexed_dir().
------------[ cut here ]------------
kernel BUG at fs/ext4/ext4.h:2413!
...
RIP: 0010:make_indexed_dir+0x53f/0x5f0
...
Call Trace:
<TASK>
? add_dirent_to_buf+0x1b2/0x200
ext4_add_entry+0x36e/0x480
ext4_add_nondir+0x2b/0xc0
ext4_create+0x163/0x200
path_openat+0x635/0xe90
do_filp_open+0xb4/0x160
? __create_object.isra.0+0x1de/0x3b0
? _raw_spin_unlock+0x12/0x30
do_sys_openat2+0x91/0x150
__x64_sys_open+0x6c/0xa0
do_syscall_64+0x3c/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
The fix simply adds a call to ext4_check_dir_entry() to validate the
directory entry, returning -EFSCORRUPTED if the entry is invalid.
CC: stable@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216540
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Link: https://lore.kernel.org/r/20221012131330.32456-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|