Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Per-domain device-list locking fixes for the AMD IOMMU driver
- Fix incorrect use of smp_processor_id() in the NVidia-specific part
of the ARM-SMMU-v3 driver
- Intel IOMMU driver fixes:
- Remove cache tags before disabling ATS
- Avoid draining PRQ in sva mm release path
- Fix qi_batch NULL pointer with nested parent domain
* tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/vt-d: Avoid draining PRQ in sva mm release path
iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
iommu/vt-d: Remove cache tags before disabling ATS
iommu/amd: Add lockdep asserts for domain->dev_list
iommu/amd: Put list_add/del(dev_data) back under the domain->lock
iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fix from Damien Le Moal:
- Fix an OF node reference leak in the sata_highbank driver
* tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
i2c-host-fixes for v6.13-rc3
- Replaced jiffies with msec for timeout calculations.
- Added a sentinel to the 'of_device_id' array in Nomadik.
- Rounded up bus period calculation in RIIC.
|
|
Pull smb client fixes from Steve French:
- fix rmmod leak
- two minor cleanups
- fix for unlink/rename with pending i/o
* tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: destroy cfid_put_wq on module exit
cifs: Use str_yes_no() helper in cifs_ses_add_channel()
cifs: Fix rmdir failure due to ongoing I/O on deleted file
smb3: fix compiler warning in reparse code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few fairly small fixes for v6.13, the most substatial one being
disabling STIG mode for Cadence QSPI controllers on Altera SoCFPGA
platforms since it doesn't work"
* tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-cadence-qspi: Disable STIG mode for Altera SoCFPGA.
spi: rockchip: Fix PM runtime count on no-op cs
spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of additional changes, one ensuring we give AXP717 enough
time to stabilise after changing voltages which fixes serious
stability issues on some platforms and another documenting the DT
support required for the Qualcomm WCN6750"
* tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: axp20x: AXP717: set ramp_delay
regulator: dt-bindings: qcom,qca6390-pmu: document wcn6750-pmu
|
|
Pull drm fixes from Dave Airlie:
"This is the weekly fixes pull for drm. Just has i915, xe and amdgpu
changes in it. Nothing too major in here:
i915:
- Don't use indexed register writes needlessly [dsb]
- Stop using non-posted DSB writes for legacy LUT [color]
- Fix NULL pointer dereference in capture_engine
- Fix memory leak by correcting cache object name in error handler
xe:
- Fix a KUNIT test error message (Mirsad Todorovac)
- Fix an invalidation fence PM ref leak (Daniele)
- Fix a register pool UAF (Lucas)
amdgpu:
- ISP hw init fix
- SR-IOV fixes
- Fix contiguous VRAM mapping for UVD on older GPUs
- Fix some regressions due to drm scheduler changes
- Workload profile fixes
- Cleaner shader fix
amdkfd:
- Fix DMA map direction for migration
- Fix a potential null pointer dereference
- Cacheline size fixes
- Runtime PM fix"
* tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel:
drm/xe/reg_sr: Remove register pool
drm/xe: Call invalidation_fence_fini for PT inval fences in error state
drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt
drm/amdkfd: pause autosuspend when creating pdd
drm/amdgpu: fix when the cleaner shader is emitted
drm/amdgpu: Fix ISP HW init issue
drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12
drm/amdkfd: hard-code cacheline size for gfx11
drm/amdkfd: Dereference null return value
drm/i915: Fix memory leak by correcting cache object name in error handler
drm/i915: Fix NULL pointer dereference in capture_engine
drm/i915/color: Stop using non-posted DSB writes for legacy LUT
drm/i915/dsb: Don't use indexed register writes needlessly
drm/amdkfd: Correct the migration DMA map direction
drm/amd/pm: Set SMU v13.0.7 default workload type
drm/amd/pm: Initialize power profile mode
amdgpu/uvd: get ring reference from rq scheduler
drm/amdgpu: fix UVD contiguous CS mapping problem
drm/amdgpu: use sjt mec fw on gfx943 for sriov
Revert "drm/amdgpu: Fix ISP hw init issue"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management documentation fix from Rafael Wysocki:
"Fix a runtime PM documentation mistake that may mislead someone into
making a coding mistake (Paul Barker)"
* tag 'pm-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation: PM: Clarify pm_runtime_resume_and_get() return value
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two coding mistakes, one in the ACPI resources handling code
and one in ACPICA:
- Relocate the addr->info.mem.caching check in acpi_decode_space() to
only execute it if the resource is of the correct type (Ilpo
Järvinen)
- Don't release a context_mutex that was never acquired in
acpi_remove_address_space_handler() (Daniil Tatianin)"
* tag 'acpi-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
ACPI: resource: Fix memory resource type union access
|
|
Kumar Kartikeya Dwivedi says:
====================
Explicit raw_tp NULL arguments
This set reverts the raw_tp masking changes introduced in commit
cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") and
replaces it wwith an explicit list of tracepoints and their arguments
which need to be annotated as PTR_MAYBE_NULL. More context on the
fallout caused by the masking fix and subsequent discussions can be
found in [0].
To remedy this, we implement a solution of explicitly defined tracepoint
and define which args need to be marked NULL or scalar (for IS_ERR
case). The commit logs describes the details of this approach in detail.
We will follow up this solution an approach Eduard is working on to
perform automated analysis of NULL-ness of tracepoint arguments. The
current PoC is available here:
- LLVM branch with the analysis:
https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
- Python script for merging of analysis results:
https://gist.github.com/eddyz87/e47c164466a60e8d49e6911cff146f47
The idea is to infer a tri-state verdict for each tracepoint parameter:
definitely not null, can be null, unknown (in which case no assumptions
should be made).
Using this information, the verifier in most cases will be able to
precisely determine the state of the tracepoint parameter without any
human effort. At that point, the table maintained manually in this set
can be dropped and replace with this automated analysis tool's result.
This will be kept up to date with each kernel release.
[0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
Changelog:
----------
v2 -> v3:
v2: https://lore.kernel.org/bpf/20241213175127.2084759-1-memxor@gmail.com
* Address Eduard's nits, add Reviewed-by
v1 -> v2:
v1: https://lore.kernel.org/bpf/20241211020156.18966-1-memxor@gmail.com
* Address comments from Jiri
* Mark module tracepoints args NULL by default
* Add more sunrpc tracepoints
* Unify scalar or null handling
* Address comments from Alexei
* Use bitmask approach suggested in review
* Unify scalar or null handling
* Drop most tests that rely on CONFIG options
* Drop scripts to generate tests
====================
Link: https://patch.msgid.link/20241213221929.3495062-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add tests to ensure that arguments are correctly marked based on their
specified positions, and whether they get marked correctly as maybe
null. For modules, all tracepoint parameters should be marked
PTR_MAYBE_NULL by default.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL. However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments can
actually be NULL, and the verifier's knowledge, that they are never NULL,
causing explicit NULL check branch to be dead code eliminated.
A previous attempt [1], i.e. the second fixed commit, was made to
simulate symbolic execution as if in most accesses, the argument is a
non-NULL raw_tp, except for conditional jumps. This tried to suppress
branch prediction while preserving compatibility, but surfaced issues
with production programs that were difficult to solve without increasing
verifier complexity. A more complete discussion of issues and fixes is
available at [2].
Fix this by maintaining an explicit list of tracepoints where the
arguments are known to be NULL, and mark the positional arguments as
PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments
are known to be ERR_PTR, and mark these arguments as scalar values to
prevent potential dereference.
Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2),
shifted by the zero-indexed argument number x 4. This can be represented
as follows:
1st arg: 0x1
2nd arg: 0x10
3rd arg: 0x100
... and so on (likewise for ERR_PTR case).
In the future, an automated pass will be used to produce such a list, or
insert __nullable annotations automatically for tracepoints. Each
compilation unit will be analyzed and results will be collated to find
whether a tracepoint pointer is definitely not null, maybe null, or an
unknown state where verifier conservatively marks it PTR_MAYBE_NULL.
A proof of concept of this tool from Eduard is available at [3].
Note that in case we don't find a specification in the raw_tp_null_args
array and the tracepoint belongs to a kernel module, we will
conservatively mark the arguments as PTR_MAYBE_NULL. This is because
unlike for in-tree modules, out-of-tree module tracepoints may pass NULL
freely to the tracepoint. We don't protect against such tracepoints
passing ERR_PTR (which is uncommon anyway), lest we mark all such
arguments as SCALAR_VALUE.
While we are it, let's adjust the test raw_tp_null to not perform
dereference of the skb->mark, as that won't be allowed anymore, and make
it more robust by using inline assembly to test the dead code
elimination behavior, which should still stay the same.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
[1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
[2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
[3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch reverts commit
cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The
patch was well-intended and meant to be as a stop-gap fixing branch
prediction when the pointer may actually be NULL at runtime. Eventually,
it was supposed to be replaced by an automated script or compiler pass
detecting possibly NULL arguments and marking them accordingly.
However, it caused two main issues observed for production programs and
failed to preserve backwards compatibility. First, programs relied on
the verifier not exploring == NULL branch when pointer is not NULL, thus
they started failing with a 'dereference of scalar' error. Next,
allowing raw_tp arguments to be modified surfaced the warning in the
verifier that warns against reg->off when PTR_MAYBE_NULL is set.
More information, context, and discusson on both problems is available
in [0]. Overall, this approach had several shortcomings, and the fixes
would further complicate the verifier's logic, and the entire masking
scheme would have to be removed eventually anyway.
Hence, revert the patch in preparation of a better fix avoiding these
issues to replace this commit.
[0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
Reported-by: Manu Bretelle <chantra@meta.com>
Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull block fixes from Jens Axboe:
- Series from Damien fixing issues with the zoned write plugging
- Fix for a potential UAF in block cgroups
- Fix deadlock around queue freezing and the sysfs lock
- Various little cleanups and fixes
* tag 'block-6.13-20241213' of git://git.kernel.dk/linux:
block: Fix potential deadlock while freezing queue and acquiring sysfs_lock
block: Fix queue_iostats_passthrough_show()
blk-mq: Clean up blk_mq_requeue_work()
mq-deadline: Remove a local variable
blk-iocost: Avoid using clamp() on inuse in __propagate_weights()
block: Make bio_iov_bvec_set() accept pointer to const iov_iter
block: get wp_offset by bdev_offset_from_zone_start
blk-cgroup: Fix UAF in blkcg_unpin_online()
MAINTAINERS: update Coly Li's email address
block: Prevent potential deadlocks in zone write plug error recovery
dm: Fix dm-zoned-reclaim zone write pointer alignment
block: Ignore REQ_NOWAIT for zone reset and zone finish operations
block: Use a zone write plug BIO work for REQ_NOWAIT BIOs
|
|
Pull io_uring fix from Jens Axboe:
"A single fix for a regression introduced in the 6.13 merge window"
* tag 'io_uring-6.13-20241213' of git://git.kernel.dk/linux:
io_uring/rsrc: don't put/free empty buffers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Ira Weiny:
- sysbot fix for out of bounds access
* tag 'libnvdimm-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl
|
|
ARC GCC compiler is packaged starting from Fedora 39i and the GCC
variant of cross compile tools has arc-linux-gnu- prefix and not
arc-linux-. This is causing that CROSS_COMPILE variable is left unset.
This change allows builds without need to supply CROSS_COMPILE argument
if distro package is used.
Before this change:
$ make -j 128 ARCH=arc W=1 drivers/infiniband/hw/mlx4/
gcc: warning: ‘-mcpu=’ is deprecated; use ‘-mtune=’ or ‘-march=’ instead
gcc: error: unrecognized command-line option ‘-mmedium-calls’
gcc: error: unrecognized command-line option ‘-mlock’
gcc: error: unrecognized command-line option ‘-munaligned-access’
[1] https://packages.fedoraproject.org/pkgs/cross-gcc/gcc-arc-linux-gnu/index.html
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
|
|
Pull xfs fixes from Carlos Maiolino:
- Fixes for scrub subsystem
- Fix quota crashes
- Fix sb_spino_align checons on large fsblock sizes
- Fix discarded superblock updates
- Fix stuck unmount due to a locked inode
* tag 'xfs-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
xfs: port xfs_ioc_start_commit to multigrain timestamps
xfs: return from xfs_symlink_verify early on V4 filesystems
xfs: fix zero byte checking in the superblock scrubber
xfs: check pre-metadir fields correctly
xfs: don't crash on corrupt /quotas dirent
xfs: don't move nondir/nonreg temporary repair files to the metadir namespace
xfs: fix sb_spino_align checks for large fsblock sizes
xfs: convert quotacheck to attach dquot buffers
xfs: attach dquot buffer to dquot log item buffer
xfs: clean up log item accesses in xfs_qm_dqflush{,_done}
xfs: separate dquot buffer reads from xfs_dqflush
xfs: don't lose solo dquot update transactions
xfs: don't lose solo superblock counter update transactions
xfs: avoid nested calls to __xfs_trans_commit
xfs: only run precommits once per transaction object
xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
xfs: fix scrub tracepoints when inode-rooted btrees are involved
xfs: update btree keys correctly when _insrec splits an inode root block
xfs: fix error bailout in xfs_rtginode_create
xfs: fix null bno_hint handling in xfs_rtallocate_rtg
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- arm64 stacktrace: address some fallout from the recent changes to
unwinding across exception boundaries
- Ensure the arm64 signal delivery failure is recoverable - only
override the return registers after all the user accesses took place
- Fix the arm64 kselftest access to SVCR - only when SME is detected
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
kselftest/arm64: abi: fix SVCR detection
arm64: signal: Ensure signal delivery failure is recoverable
arm64: stacktrace: Don't WARN when unwinding other tasks
arm64: stacktrace: Skip reporting LR at exception boundaries
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- avoid taking a mutex while resolving jump_labels in the mutex
implementation
- avoid trying to resolve the early boot DT pointer via the linear map
- avoid trying to IPI kfence TLB flushes, as kfence might flush with
IRQs disabled
- avoid calling PMD destructors on PMDs that were never constructed
* tag 'riscv-for-linus-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: mm: Do not call pmd dtor on vmemmap page table teardown
riscv: Fix IPIs usage in kfence_protect_page()
riscv: Fix wrong usage of __pa() on a fixmap address
riscv: Fixup boot failure when CONFIG_DEBUG_RT_MUTEXES=y
|
|
Merge an ACPICA fix for 6.13-rc3:
- Don't release a context_mutex that was never acquired in
acpi_remove_address_space_handler() (Daniil Tatianin).
* acpica:
ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
|
|
into HEAD
KVM/riscv fixes for 6.13, take #1
- Replace csr_write() with csr_set() for HVIEN PMU overflow bit
|
|
Snapshot the output of CPUID.0xD.[1..n] during kvm.ko initiliaization to
avoid the overead of CPUID during runtime. The offset, size, and metadata
for CPUID.0xD.[1..n] sub-leaves does not depend on XCR0 or XSS values, i.e.
is constant for a given CPU, and thus can be cached during module load.
On Intel's Emerald Rapids, CPUID is *wildly* expensive, to the point where
recomputing XSAVE offsets and sizes results in a 4x increase in latency of
nested VM-Enter and VM-Exit (nested transitions can trigger
xstate_required_size() multiple times per transition), relative to using
cached values. The issue is easily visible by running `perf top` while
triggering nested transitions: kvm_update_cpuid_runtime() shows up at a
whopping 50%.
As measured via RDTSC from L2 (using KVM-Unit-Test's CPUID VM-Exit test
and a slightly modified L1 KVM to handle CPUID in the fastpath), a nested
roundtrip to emulate CPUID on Skylake (SKX), Icelake (ICX), and Emerald
Rapids (EMR) takes:
SKX 11650
ICX 22350
EMR 28850
Using cached values, the latency drops to:
SKX 6850
ICX 9000
EMR 7900
The underlying issue is that CPUID itself is slow on ICX, and comically
slow on EMR. The problem is exacerbated on CPUs which support XSAVES
and/or XSAVEC, as KVM invokes xstate_required_size() twice on each
runtime CPUID update, and because there are more supported XSAVE features
(CPUID for supported XSAVE feature sub-leafs is significantly slower).
SKX:
CPUID.0xD.2 = 348 cycles
CPUID.0xD.3 = 400 cycles
CPUID.0xD.4 = 276 cycles
CPUID.0xD.5 = 236 cycles
<other sub-leaves are similar>
EMR:
CPUID.0xD.2 = 1138 cycles
CPUID.0xD.3 = 1362 cycles
CPUID.0xD.4 = 1068 cycles
CPUID.0xD.5 = 910 cycles
CPUID.0xD.6 = 914 cycles
CPUID.0xD.7 = 1350 cycles
CPUID.0xD.8 = 734 cycles
CPUID.0xD.9 = 766 cycles
CPUID.0xD.10 = 732 cycles
CPUID.0xD.11 = 718 cycles
CPUID.0xD.12 = 734 cycles
CPUID.0xD.13 = 1700 cycles
CPUID.0xD.14 = 1126 cycles
CPUID.0xD.15 = 898 cycles
CPUID.0xD.16 = 716 cycles
CPUID.0xD.17 = 748 cycles
CPUID.0xD.18 = 776 cycles
Note, updating runtime CPUID information multiple times per nested
transition is itself a flaw, especially since CPUID is a mandotory
intercept on both Intel and AMD. E.g. KVM doesn't need to ensure emulated
CPUID state is up-to-date while running L2. That flaw will be fixed in a
future patch, as deferring runtime CPUID updates is more subtle than it
appears at first glance, the benefits aren't super critical to have once
the XSAVE issue is resolved, and caching CPUID output is desirable even if
KVM's updates are deferred.
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241211013302.1347853-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix several low-level issues in gpio-graniterapids
- fix an initialization order issue that manifests itself with
__counted_by() checks in gpio-ljca
- don't default to y for CONFIG_GPIO_MVEBU with COMPILE_TEST enabled
- move the DEFAULT_SYMBOL_NAMESPACE define before the export.h include
in gpio-idio-16
* tag 'gpio-fixes-for-v6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: idio-16: Actually make use of the GPIO_IDIO_16 symbol namespace
gpio: graniterapids: Fix GPIO Ack functionality
gpio: graniterapids: Check if GPIO line can be used for IRQs
gpio: graniterapids: Determine if GPIO pad can be used by driver
gpio: graniterapids: Fix invalid RXEVCFG register bitmask
gpio: graniterapids: Fix invalid GPI_IS register offset
gpio: graniterapids: Fix incorrect BAR assignment
gpio: graniterapids: Fix vGPIO driver crash
gpio: ljca: Initialize num before accessing item in ljca_gpio_config
gpio: GPIO_MVEBU should not default to y when compile-testing
|
|
For storing a value to a queue attribute, the queue_attr_store function
first freezes the queue (->q_usage_counter(io)) and then acquire
->sysfs_lock. This seems not correct as the usual ordering should be to
acquire ->sysfs_lock before freezing the queue. This incorrect ordering
causes the following lockdep splat which we are able to reproduce always
simply by accessing /sys/kernel/debug file using ls command:
[ 57.597146] WARNING: possible circular locking dependency detected
[ 57.597154] 6.12.0-10553-gb86545e02e8c #20 Tainted: G W
[ 57.597162] ------------------------------------------------------
[ 57.597168] ls/4605 is trying to acquire lock:
[ 57.597176] c00000003eb56710 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x58/0xc0
[ 57.597200]
but task is already holding lock:
[ 57.597207] c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
[ 57.597226]
which lock already depends on the new lock.
[ 57.597233]
the existing dependency chain (in reverse order) is:
[ 57.597241]
-> #5 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[ 57.597255] down_write+0x6c/0x18c
[ 57.597264] start_creating+0xb4/0x24c
[ 57.597274] debugfs_create_dir+0x2c/0x1e8
[ 57.597283] blk_register_queue+0xec/0x294
[ 57.597292] add_disk_fwnode+0x2e4/0x548
[ 57.597302] brd_alloc+0x2c8/0x338
[ 57.597309] brd_init+0x100/0x178
[ 57.597317] do_one_initcall+0x88/0x3e4
[ 57.597326] kernel_init_freeable+0x3cc/0x6e0
[ 57.597334] kernel_init+0x34/0x1cc
[ 57.597342] ret_from_kernel_user_thread+0x14/0x1c
[ 57.597350]
-> #4 (&q->debugfs_mutex){+.+.}-{4:4}:
[ 57.597362] __mutex_lock+0xfc/0x12a0
[ 57.597370] blk_register_queue+0xd4/0x294
[ 57.597379] add_disk_fwnode+0x2e4/0x548
[ 57.597388] brd_alloc+0x2c8/0x338
[ 57.597395] brd_init+0x100/0x178
[ 57.597402] do_one_initcall+0x88/0x3e4
[ 57.597410] kernel_init_freeable+0x3cc/0x6e0
[ 57.597418] kernel_init+0x34/0x1cc
[ 57.597426] ret_from_kernel_user_thread+0x14/0x1c
[ 57.597434]
-> #3 (&q->sysfs_lock){+.+.}-{4:4}:
[ 57.597446] __mutex_lock+0xfc/0x12a0
[ 57.597454] queue_attr_store+0x9c/0x110
[ 57.597462] sysfs_kf_write+0x70/0xb0
[ 57.597471] kernfs_fop_write_iter+0x1b0/0x2ac
[ 57.597480] vfs_write+0x3dc/0x6e8
[ 57.597488] ksys_write+0x84/0x140
[ 57.597495] system_call_exception+0x130/0x360
[ 57.597504] system_call_common+0x160/0x2c4
[ 57.597516]
-> #2 (&q->q_usage_counter(io)#21){++++}-{0:0}:
[ 57.597530] __submit_bio+0x5ec/0x828
[ 57.597538] submit_bio_noacct_nocheck+0x1e4/0x4f0
[ 57.597547] iomap_readahead+0x2a0/0x448
[ 57.597556] xfs_vm_readahead+0x28/0x3c
[ 57.597564] read_pages+0x88/0x41c
[ 57.597571] page_cache_ra_unbounded+0x1ac/0x2d8
[ 57.597580] filemap_get_pages+0x188/0x984
[ 57.597588] filemap_read+0x13c/0x4bc
[ 57.597596] xfs_file_buffered_read+0x88/0x17c
[ 57.597605] xfs_file_read_iter+0xac/0x158
[ 57.597614] vfs_read+0x2d4/0x3b4
[ 57.597622] ksys_read+0x84/0x144
[ 57.597629] system_call_exception+0x130/0x360
[ 57.597637] system_call_common+0x160/0x2c4
[ 57.597647]
-> #1 (mapping.invalidate_lock#2){++++}-{4:4}:
[ 57.597661] down_read+0x6c/0x220
[ 57.597669] filemap_fault+0x870/0x100c
[ 57.597677] xfs_filemap_fault+0xc4/0x18c
[ 57.597684] __do_fault+0x64/0x164
[ 57.597693] __handle_mm_fault+0x1274/0x1dac
[ 57.597702] handle_mm_fault+0x248/0x484
[ 57.597711] ___do_page_fault+0x428/0xc0c
[ 57.597719] hash__do_page_fault+0x30/0x68
[ 57.597727] do_hash_fault+0x90/0x35c
[ 57.597736] data_access_common_virt+0x210/0x220
[ 57.597745] _copy_from_user+0xf8/0x19c
[ 57.597754] sel_write_load+0x178/0xd54
[ 57.597762] vfs_write+0x108/0x6e8
[ 57.597769] ksys_write+0x84/0x140
[ 57.597777] system_call_exception+0x130/0x360
[ 57.597785] system_call_common+0x160/0x2c4
[ 57.597794]
-> #0 (&mm->mmap_lock){++++}-{4:4}:
[ 57.597806] __lock_acquire+0x17cc/0x2330
[ 57.597814] lock_acquire+0x138/0x400
[ 57.597822] __might_fault+0x7c/0xc0
[ 57.597830] filldir64+0xe8/0x390
[ 57.597839] dcache_readdir+0x80/0x2d4
[ 57.597846] iterate_dir+0xd8/0x1d4
[ 57.597855] sys_getdents64+0x88/0x2d4
[ 57.597864] system_call_exception+0x130/0x360
[ 57.597872] system_call_common+0x160/0x2c4
[ 57.597881]
other info that might help us debug this:
[ 57.597888] Chain exists of:
&mm->mmap_lock --> &q->debugfs_mutex --> &sb->s_type->i_mutex_key#3
[ 57.597905] Possible unsafe locking scenario:
[ 57.597911] CPU0 CPU1
[ 57.597917] ---- ----
[ 57.597922] rlock(&sb->s_type->i_mutex_key#3);
[ 57.597932] lock(&q->debugfs_mutex);
[ 57.597940] lock(&sb->s_type->i_mutex_key#3);
[ 57.597950] rlock(&mm->mmap_lock);
[ 57.597958]
*** DEADLOCK ***
[ 57.597965] 2 locks held by ls/4605:
[ 57.597971] #0: c0000000137c12f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0xcc/0x154
[ 57.597989] #1: c0000018e27c6810 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: iterate_dir+0x94/0x1d4
Prevent the above lockdep warning by acquiring ->sysfs_lock before
freezing the queue while storing a queue attribute in queue_attr_store
function. Later, we also found[1] another function __blk_mq_update_nr_
hw_queues where we first freeze queue and then acquire the ->sysfs_lock.
So we've also updated lock ordering in __blk_mq_update_nr_hw_queues
function and ensured that in all code paths we follow the correct lock
ordering i.e. acquire ->sysfs_lock before freezing the queue.
[1] https://lore.kernel.org/all/CAFj5m9Ke8+EHKQBs_Nk6hqd=LGXtk4mUxZUN5==ZcCjnZSBwHw@mail.gmail.com/
Reported-by: kjain@linux.ibm.com
Fixes: af2814149883 ("block: freeze the queue in queue_attr_store")
Tested-by: kjain@linux.ibm.com
Cc: hch@lst.de
Cc: axboe@kernel.dk
Cc: ritesh.list@gmail.com
Cc: ming.lei@redhat.com
Cc: gjoyce@linux.ibm.com
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241210144222.1066229-1-nilay@linux.ibm.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fixes; all look small, device-specific and boring"
* tag 'sound-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: Intel: sof_sdw: Add space for a terminator into DAIs array
ASoC: fsl_spdif: change IFACE_PCM to IFACE_MIXER
ASoC: fsl_xcvr: change IFACE_PCM to IFACE_MIXER
ASoC: tas2781: Fix calibration issue in stress test
ASoC: audio-graph-card: Call of_node_put() on correct node
ASoC: amd: yc: Fix the wrong return value
ALSA: control: Avoid WARN() for symlink errors
sound: usb: format: don't warn that raw DSD is unsupported
sound: usb: enable DSD output for ddHiFi TC44C
ALSA: hda/realtek: Add new alc2xx-fixup-headset-mic model
ALSA: hda/ca0132: Use standard HD-audio quirk matching helpers
ALSA: usb-audio: Add implicit feedback quirk for Yamaha THR5
ALSA: hda/realtek - Add support for ASUS Zen AIO 27 Z272SD_A272SD audio
ALSA: hda/realtek: Fix headset mic on Acer Nitro 5
ALSA: hda: cs35l56: Remove calls to cs35l56_force_sync_asp1_registers_from_cache()
|
|
Pull documentation fix from Jonathan Corbet:
"A single fix for a docs-build regression caused by the
EXPORT_SYMBOL_NS() mass change"
* tag 'docs-6.13-fix' of git://git.lwn.net/linux:
scripts/kernel-doc: Get -export option working again
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Fix for memcg unreclaimable slab stats drift when post-charging large
kmalloc allocations (Shakeel Butt)
* tag 'slab-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
memcg: slub: fix SUnreclaim for post charged objects
|
|
It appears that the relatively popular RK3399 SoC has been put together
using a large amount of illicit substances, as experiments reveal that its
integration of GIC500 exposes the *secure* programming interface to
non-secure.
This has some pretty bad effects on the way priorities are handled, and
results in a dead machine if booting with pseudo-NMI enabled
(irqchip.gicv3_pseudo_nmi=1) if the kernel contains 18fdb6348c480 ("arm64:
irqchip/gic-v3: Select priorities at boot time"), which relies on the
priorities being programmed using the NS view.
Let's restore some sanity by going one step further and disable security
altogether in this case. This is not any worse, and puts us in a mode where
priorities actually make some sense.
Huge thanks to Mark Kettenis who initially identified this issue on
OpenBSD, and to Chen-Yu Tsai who reported the problem in Linux.
Fixes: 18fdb6348c480 ("arm64: irqchip/gic-v3: Select priorities at boot time")
Reported-by: Mark Kettenis <mark.kettenis@xs4all.nl>
Reported-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen-Yu Tsai <wens@csie.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241213141037.3995049-1-maz@kernel.org
|
|
percpu_base is used in various percpu functions that expect variable in
__percpu address space. Correct the declaration of percpu_base to
void __iomem * __percpu *percpu_base;
to declare the variable as __percpu pointer.
The patch fixes several sparse warnings:
irq-gic.c:1172:44: warning: incorrect type in assignment (different address spaces)
irq-gic.c:1172:44: expected void [noderef] __percpu *[noderef] __iomem *percpu_base
irq-gic.c:1172:44: got void [noderef] __iomem *[noderef] __percpu *
...
irq-gic.c:1231:43: warning: incorrect type in argument 1 (different address spaces)
irq-gic.c:1231:43: expected void [noderef] __percpu *__pdata
irq-gic.c:1231:43: got void [noderef] __percpu *[noderef] __iomem *percpu_base
There were no changes in the resulting object files.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/all/20241213145809.2918-2-ubizjak@gmail.com
|
|
Make queue_iostats_passthrough_show() report 0/1 in sysfs instead of 0/4.
This patch fixes the following sparse warning:
block/blk-sysfs.c:266:31: warning: incorrect type in argument 1 (different base types)
block/blk-sysfs.c:266:31: expected unsigned long var
block/blk-sysfs.c:266:31: got restricted blk_flags_t
Cc: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Fixes: 110234da18ab ("block: enable passthrough command statistics")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241212212941.1268662-4-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Move a statement that occurs in both branches of an if-statement in front
of the if-statement. Fix a typo in a source code comment. No functionality
has been changed.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20241212212941.1268662-3-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since commit fde02699c242 ("block: mq-deadline: Remove support for zone
write locking"), the local variable 'insert_before' is assigned once and
is used once. Hence remove this local variable.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20241212212941.1268662-2-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When using svcr_in to check ZA and Streaming Mode, we should make sure
that the value in x2 is correct, otherwise it may trigger an Illegal
instruction if FEAT_SVE and !FEAT_SME.
Fixes: 43e3f85523e4 ("kselftest/arm64: Add SME support to syscall ABI test")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241211111639.12344-1-o451686892@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When a PASID is used for SVA by a device, it's possible that the PASID
entry is cleared before the device flushes all ongoing DMA requests and
removes the SVA domain. This can occur when an exception happens and the
process terminates before the device driver stops DMA and calls the
iommu driver to unbind the PASID.
There's no need to drain the PRQ in the mm release path. Instead, the PRQ
will be drained in the SVA unbind path.
Unfortunately, commit c43e1ccdebf2 ("iommu/vt-d: Drain PRQs when domain
removed from RID") changed this behavior by unconditionally draining the
PRQ in intel_pasid_tear_down_entry(). This can lead to a potential
sleeping-in-atomic-context issue.
Smatch static checker warning:
drivers/iommu/intel/prq.c:95 intel_iommu_drain_pasid_prq()
warn: sleeping in atomic context
To avoid this issue, prevent draining the PRQ in the SVA mm release path
and restore the previous behavior.
Fixes: c43e1ccdebf2 ("iommu/vt-d: Drain PRQs when domain removed from RID")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/linux-iommu/c5187676-2fa2-4e29-94e0-4a279dc88b49@stanley.mountain/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241212021529.1104745-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The qi_batch is allocated when assigning cache tag for a domain. While
for nested parent domain, it is missed. Hence, when trying to map pages
to the nested parent, NULL dereference occurred. Also, there is potential
memleak since there is no lock around domain->qi_batch allocation.
To solve it, add a helper for qi_batch allocation, and call it in both
the __cache_tag_assign_domain() and __cache_tag_assign_parent_domain().
BUG: kernel NULL pointer dereference, address: 0000000000000200
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8104795067 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
Call Trace:
? __die+0x24/0x70
? page_fault_oops+0x80/0x150
? do_user_addr_fault+0x63/0x7b0
? exc_page_fault+0x7c/0x220
? asm_exc_page_fault+0x26/0x30
? cache_tag_flush_range_np+0x13c/0x260
intel_iommu_iotlb_sync_map+0x1a/0x30
iommu_map+0x61/0xf0
batch_to_domain+0x188/0x250
iopt_area_fill_domains+0x125/0x320
? rcu_is_watching+0x11/0x50
iopt_map_pages+0x63/0x100
iopt_map_common.isra.0+0xa7/0x190
iopt_map_user_pages+0x6a/0x80
iommufd_ioas_map+0xcd/0x1d0
iommufd_fops_ioctl+0x118/0x1c0
__x64_sys_ioctl+0x93/0xc0
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
Cc: stable@vger.kernel.org
Co-developed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241210130322.17175-1-yi.l.liu@intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The current implementation removes cache tags after disabling ATS,
leading to potential memory leaks and kernel crashes. Specifically,
CACHE_TAG_DEVTLB type cache tags may still remain in the list even
after the domain is freed, causing a use-after-free condition.
This issue really shows up when multiple VFs from different PFs
passed through to a single user-space process via vfio-pci. In such
cases, the kernel may crash with kernel messages like:
BUG: kernel NULL pointer dereference, address: 0000000000000014
PGD 19036a067 P4D 1940a3067 PUD 136c9b067 PMD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 74 UID: 0 PID: 3183 Comm: testCli Not tainted 6.11.9 #2
RIP: 0010:cache_tag_flush_range+0x9b/0x250
Call Trace:
<TASK>
? __die+0x1f/0x60
? page_fault_oops+0x163/0x590
? exc_page_fault+0x72/0x190
? asm_exc_page_fault+0x22/0x30
? cache_tag_flush_range+0x9b/0x250
? cache_tag_flush_range+0x5d/0x250
intel_iommu_tlb_sync+0x29/0x40
intel_iommu_unmap_pages+0xfe/0x160
__iommu_unmap+0xd8/0x1a0
vfio_unmap_unpin+0x182/0x340 [vfio_iommu_type1]
vfio_remove_dma+0x2a/0xb0 [vfio_iommu_type1]
vfio_iommu_type1_ioctl+0xafa/0x18e0 [vfio_iommu_type1]
Move cache_tag_unassign_domain() before iommu_disable_pci_caps() to fix
it.
Fixes: 3b1d9e2b2d68 ("iommu/vt-d: Add cache tag assignment interface")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20241129020506.576413-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Commit eaf62ce1563b ("arm64/signal: Set up and restore the GCS
context for signal handlers") introduced a potential failure point
at the end of setup_return(). This is unfortunate as it is too late
to deliver a SIGSEGV: if that SIGSEGV is handled, the subsequent
sigreturn will end up returning to the original handler, which is
not the intention (since we failed to deliver that signal).
Make sure this does not happen by calling gcs_signal_entry()
at the very beginning of setup_return(), and add a comment just
after to discourage error cases being introduced from that point
onwards.
While at it, also take care of copy_siginfo_to_user(): since it may
fail, we shouldn't be calling it after setup_return() either. Call
it before setup_return() instead, and move the setting of X1/X2
inside setup_return() where it belongs (after the "point of no
failure").
Background: the first part of setup_rt_frame(), including
setup_sigframe(), has no impact on the execution of the interrupted
thread. The signal frame is written to the stack, but the stack
pointer remains unchanged. Failure at this stage can be recovered by
a SIGSEGV handler, and sigreturn will restore the original context,
at the point where the original signal occurred. On the other hand,
once setup_return() has updated registers including SP, the thread's
control flow has been modified and we must deliver the original
signal.
Fixes: eaf62ce1563b ("arm64/signal: Set up and restore the GCS context for signal handlers")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241210160940.2031997-1-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
dlserver time is accounted when:
- dlserver is active and the dlserver proxies the cfs task.
- dlserver is active but deferred and cfs task runs after being picked
through the normal fair class pick.
dl_server_update is called in two places to make sure that both the
above times are accounted for. But it doesn't check if dlserver is
active or not. Now that we have this dl_server_active flag, we can
consolidate dl_server_update into one place and all we need to check is
whether dlserver is active or not. When dlserver is active there is only
two possible conditions:
- dlserver is deferred.
- cfs task is running on behalf of dlserver.
Fixes: a110a81c52a9 ("sched/deadline: Deferrable dl server")
Signed-off-by: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # ROCK 5B
Link: https://lore.kernel.org/r/20241213032244.877029-2-vineeth@bitbyteword.org
|
|
dlserver can get dequeued during a dlserver pick_task due to the delayed
deueue feature and this can lead to issues with dlserver logic as it
still thinks that dlserver is on the runqueue. The dlserver throttling
and replenish logic gets confused and can lead to double enqueue of
dlserver.
Double enqueue of dlserver could happend due to couple of reasons:
Case 1
------
Delayed dequeue feature[1] can cause dlserver being stopped during a
pick initiated by dlserver:
__pick_next_task
pick_task_dl -> server_pick_task
pick_task_fair
pick_next_entity (if (sched_delayed))
dequeue_entities
dl_server_stop
server_pick_task goes ahead with update_curr_dl_se without knowing that
dlserver is dequeued and this confuses the logic and may lead to
unintended enqueue while the server is stopped.
Case 2
------
A race condition between a task dequeue on one cpu and same task's enqueue
on this cpu by a remote cpu while the lock is released causing dlserver
double enqueue.
One cpu would be in the schedule() and releasing RQ-lock:
current->state = TASK_INTERRUPTIBLE();
schedule();
deactivate_task()
dl_stop_server();
pick_next_task()
pick_next_task_fair()
sched_balance_newidle()
rq_unlock(this_rq)
at which point another CPU can take our RQ-lock and do:
try_to_wake_up()
ttwu_queue()
rq_lock()
...
activate_task()
dl_server_start() --> first enqueue
wakeup_preempt() := check_preempt_wakeup_fair()
update_curr()
update_curr_task()
if (current->dl_server)
dl_server_update()
enqueue_dl_entity() --> second enqueue
This bug was not apparent as the enqueue in dl_server_start doesn't
usually happen because of the defer logic. But as a side effect of the
first case(dequeue during dlserver pick), dl_throttled and dl_yield will
be set and this causes the time accounting of dlserver to messup and
then leading to a enqueue in dl_server_start.
Have an explicit flag representing the status of dlserver to avoid the
confusion. This is set in dl_server_start and reset in dlserver_stop.
Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Marcel Ziswiler <marcel.ziswiler@codethink.co.uk> # ROCK 5B
Link: https://lkml.kernel.org/r/20241213032244.877029-1-vineeth@bitbyteword.org
|
|
esre_attribute::store() is not needed since commit af97a77bc01c (efi:
Move some sysfs files to be read-only by root). Drop it.
Found by https://github.com/jirislaby/clang-struct.
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-efi@vger.kernel.org
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into next-rc
xfs: bug fixes for 6.13 [01/12]
Bug fixes for 6.13.
This has been running on the djcloud for months with no problems. Enjoy!
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
Take advantage of the multigrain timestamp APIs to ensure that nobody
can sneak in and write things to a file between starting a file update
operation and committing the results. This should have been part of the
multigrain timestamp merge, but I forgot to fling it at jlayton when he
resubmitted the patchset due to developer bandwidth problems.
Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 4e40eff0b5737c ("fs: add infrastructure for multigrain timestamps")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
|
V4 symlink blocks didn't have headers, so return early if this is a V4
filesystem.
Cc: <stable@vger.kernel.org> # v5.1
Fixes: 39708c20ab5133 ("xfs: miscellaneous verifier magic value fixups")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The logic to check that the region past the end of the superblock is all
zeroes is wrong -- we don't want to check only the bytes past the end of
the maximally sized ondisk superblock structure as currently defined in
xfs_format.h; we want to check the bytes beyond the end of the ondisk as
defined by the feature bits.
Port the superblock size logic from xfs_repair and then put it to use in
xfs_scrub.
Cc: <stable@vger.kernel.org> # v4.15
Fixes: 21fb4cb1981ef7 ("xfs: scrub the secondary superblocks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
The checks that were added to the superblock scrubber for metadata
directories aren't quite right -- the old inode pointers are now defined
to be zeroes until someone else reuses them. Also consolidate the new
metadir field checks to one place; they were inexplicably scattered
around.
Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 28d756d4d562dc ("xfs: update sb field checks when metadir is turned on")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
If the /quotas dirent points to an inode but the inode isn't loadable
(and hence mkdir returns -EEXIST), don't crash, just bail out.
Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: e80fbe1ad8eff7 ("xfs: use metadir for quota inodes")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Only directories or regular files are allowed in the metadata directory
tree. Don't move the repair tempfile to the metadir namespace if this
is not true; this will cause the inode verifiers to trip.
xrep_tempfile_adjust_directory_tree opportunistically moves sc->tempip
from the regular directory tree to the metadata directory tree if sc->ip
is part of the metadata directory tree. However, the scrub setup
functions grab sc->ip and create sc->tempip before we actually get
around to checking if the file mode is the right type for the scrubber.
IOWs, you can invoke the symlink scrubber with the file handle of a
subdirectory in the metadir. xrep_setup_symlink will create a temporary
symlink file, xrep_tempfile_adjust_directory_tree will foolishly try to
set the METADATA flag on the temp symlink, which trips the inode
verifier in the inode item precommit, which shuts down the filesystem
when expensive checks are turned on. If they're /not/ turned on, then
xchk_symlink will return ENOENT when it sees that it's been passed a
symlink, but the invalid inode could still get flushed to disk. We
don't want that.
Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 9dc31acb01a1c7 ("xfs: move repair temporary files to the metadata directory tree")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
For a sparse inodes filesystem, mkfs.xfs computes the values of
sb_spino_align and sb_inoalignmt with the following code:
int cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;
if (cfg->sb_feat.crcs_enabled)
cluster_size *= cfg->inodesize / XFS_DINODE_MIN_SIZE;
sbp->sb_spino_align = cluster_size >> cfg->blocklog;
sbp->sb_inoalignmt = XFS_INODES_PER_CHUNK *
cfg->inodesize >> cfg->blocklog;
On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results
in cluster_size = 8192 * (512 / 256) = 16384. As a result,
sb_spino_align and sb_inoalignmt are both set to zero. Unfortunately,
this trips the new sb_spino_align check that was just added to
xfs_validate_sb_common, and the mkfs fails:
# mkfs.xfs -f -b size=64k, /dev/sda
meta-data=/dev/sda isize=512 agcount=4, agsize=81136 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1 nrext64=1
= exchange=0 metadir=0
data = bsize=65536 blocks=324544, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=65536 ascii-ci=0, ftype=1, parent=0
log =internal log bsize=65536 blocks=5006, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=65536 blocks=0, rtextents=0
= rgcount=0 rgsize=0 extents
Discarding blocks...Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: writing AG headers failed, err=22
Prior to commit 59e43f5479cce1 this all worked fine, even if "sparse"
inodes are somewhat meaningless when everything fits in a single
fsblock. Adjust the checks to handle existing filesystems.
Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 59e43f5479cce1 ("xfs: sb_spino_align is not verified")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Now that we've converted the dquot logging machinery to attach the dquot
buffer to the li_buf pointer so that the AIL dqflush doesn't have to
allocate or read buffers in a reclaim path, do the same for the
quotacheck code so that the reclaim shrinker dqflush call doesn't have
to do that either.
Cc: <stable@vger.kernel.org> # v6.12
Fixes: 903edea6c53f09 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|