Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, the drivers one in ufs simply delays running a work
queue and the generic one in zoned storage switches to a more correct
API that tries the standard buddy allocator first (for small
allocations); this fixes an allocation problem with small allocations
seen under memory pressure"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Start the RTC update work later
scsi: sd_zbc: Use kvzalloc() to allocate REPORT ZONES buffer
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes, usual leaders in amdgpu and xe, then a panel quirk, and
some fixes to imagination and panthor drivers. Seems around the usual
level for this time and don't know of any big problems.
amdgpu:
- Brightness fix
- DC vbios parsing fix
- ACPI fix
- SMU 14.x fix
- Power workload profile fix
- GC partitioning fix
- Debugfs fixes
imagination:
- Track PVR context per file
- Break ref-counting cycle
panel-orientation-quirks:
- Fix matching Lenovo Yoga Tab 3 X90F
panthor:
- Lock VM array
- Be strict about I/O mapping flags
xe:
- Fix ccs_mode setting for Xe2 and later
- Synchronize ccs_mode setting with client creation
- Apply scheduling WA for LNL in additional places as needed
- Fix leak and lock handling in error paths of xe_exec ioctl
- Fix GGTT allocation leak leading to eventual crash in SR-IOV
- Move run_ticks update out of job handling to avoid synchronization
with reader"
* tag 'drm-fixes-2024-11-09' of https://gitlab.freedesktop.org/drm/kernel: (23 commits)
drm/panthor: Be stricter about IO mapping flags
drm/panthor: Lock XArray when getting entries for the VM
drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match less strict
drm/xe: Stop accumulating LRC timestamp on job_free
drm/xe/pf: Fix potential GGTT allocation leak
drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTL
drm/xe: Fix possible exec queue leak in exec IOCTL
drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
drm/amdgpu: Adjust debugfs eviction and IB access permissions
drm/amdgpu: Adjust debugfs register access permissions
drm/amdgpu: Fix DPX valid mode check on GC 9.4.3
drm/amd/pm: correct the workload setting
drm/amd/pm: always pick the pptable from IFWI
drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
drm/amd/display: parse umc_info or vram_info based on ASIC
drm/amd/display: Fix brightness level not retained over reboot
drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout
drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout
drm/xe: Move LNL scheduling WA to xe_device.h
drm/xe: Use the filelist from drm for ccs_mode change
...
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Fix ccs_mode setting for Xe2 and later (Balasubramani)
- Synchronize ccs_mode setting with client creation (Balasubramani)
- Apply scheduling WA for LNL in additional places as needed
(Nirmoy)
- Fix leak and lock handling in error paths of xe_exec ioctl
(Matthew Brost)
- Fix GGTT allocation leak leading to eventual crash in SR-IOV
(Michal Wajdeczko)
- Move run_ticks update out of job handling to avoid synchronization
with reader (Lucas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/4ffcebtluaaaohquxfyf5babpihmtscxwad3jjmt5nggwh2xpm@ztw67ucywttg
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
imagination:
- Track PVR context per file
- Break ref-counting cycle
panel-orientation-quirks:
- Fix matching Lenovo Yoga Tab 3 X90F
panthor:
- Lock VM array
- Be strict about I/O mapping flags
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241108085058.GA37468@linux.fritz.box
|
|
We silence btree errors in btree_node_scan, since it's probing and
errors are expected: add a fake pass so that btree_node_scan is no
longer recovery pass 0, and we don't think we're in btree node scan when
reading btree roots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we truncate a bset (due to it extending past the end of the btree
node), we can't skip the rest of the validation for e.g. the packed
format (if it's the first bset in the node).
Reported-by: syzbot+4d722d3c539d77c7bc82@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When the Tx FIFO is empty and the last command has no STOP bit
set, the master holds SCL low. If I2C_DYNAMIC_TAR_UPDATE is not
set, BIT(13) MST_ON_HOLD of IC_RAW_INTR_STAT is not enabled,
causing the __i2c_dw_disable() timeout. This is quite similar to
commit 2409205acd3c ("i2c: designware: fix __i2c_dw_disable() in
case master is holding SCL low"). Also check BIT(7)
MST_HOLD_TX_FIFO_EMPTY in IC_STATUS, which is available when
IC_STAT_FOR_CLK_STRETCH is set.
Fixes: 2409205acd3c ("i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low")
Co-developed-by: Xiaowu Ding <xiaowu.ding@jaguarmicro.com>
Signed-off-by: Xiaowu Ding <xiaowu.ding@jaguarmicro.com>
Co-developed-by: Angus Chen <angus.chen@jaguarmicro.com>
Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com>
Signed-off-by: Liu Peibao <loven.liu@jaguarmicro.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Still more changes floating than wished at this late stage, but all
are small device-specific fixes, and look less troublesome.
Including a few ASoC quirk / ID additoins, a series of ASoC STM fixes,
HD-audio conexant codec regression fix, and other various quirks and
device-specific fixes"
* tag 'sound-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: SOF: sof-client-probes-ipc4: Set param_size extension bits
ASoC: stm: Prevent potential division by zero in stm32_sai_get_clk_div()
ASoC: stm: Prevent potential division by zero in stm32_sai_mclk_round_rate()
ASoC: amd: yc: Support dmic on another model of Lenovo Thinkpad E14 Gen 6
ASoC: SOF: amd: Fix for incorrect DMA ch status register offset
ASoC: amd: yc: fix internal mic on Xiaomi Book Pro 14 2022
ASoC: stm32: spdifrx: fix dma channel release in stm32_spdifrx_remove
MAINTAINERS: Generic Sound Card section
ALSA: usb-audio: Add quirk for HP 320 FHD Webcam
ASoC: tas2781: Add new driver version for tas2563 & tas2781 qfn chip
ALSA: firewire-lib: fix return value on fail in amdtp_tscm_init()
ALSA: ump: Don't enumeration invalid groups for legacy rawmidi
Revert "ALSA: hda/conexant: Mute speakers at suspend / shutdown"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- dvb-core fixes for vb2 check and device registration
- v4l2-core: fix an issue with error handling for VIDIOC_G_CTRL
- vb2 core: fix an issue with vb plane copy logic
- videobuf2-core: copy vb planes unconditionally
- vivid: fix buffer overwrite when using > 32 buffers
- vivid: fix a potential division by zero due to an issue at v4l2-tpg
- some spectre vulnerability fixes
- several OOM access fixes
- some buffer overflow fixes
* tag 'media/v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: videobuf2-core: copy vb planes unconditionally
media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set
media: vivid: fix buffer overwrite when using > 32 buffers
media: pulse8-cec: fix data timestamp at pulse8_setup()
media: cec: extron-da-hd-4k-plus: don't use -1 as an error code
media: stb0899_algo: initialize cfr before using it
media: adv7604: prevent underflow condition when reporting colorspace
media: cx24116: prevent overflows on SNR calculus
media: ar0521: don't overflow when checking PLL values
media: s5p-jpeg: prevent buffer overflows
media: av7110: fix a spectre vulnerability
media: mgb4: protect driver against spectre
media: dvb_frontend: don't play tricks with underflow values
media: dvbdev: prevent the risk of out of memory access
media: v4l2-tpg: prevent the risk of a division by zero
media: v4l2-ctrls-api: fix error handling for v4l2_g_ctrl()
media: dvb-core: add missing buffer index check
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fix from Vlastimil Babka:
- Fix for duplicate caches in some arm64 configurations with
CONFIG_SLAB_BUCKETS (Koichiro Den)
* tag 'slab-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: fix warning caused by duplicate kmem_cache creation in kmem_buckets_create
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more one-liners that fix some user visible problems:
- use correct range when clearing qgroup reservations after COW
- properly reset freed delayed ref list head
- fix ro/rw subvolume mounts to be backward compatible with old and
new mount API"
* tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix the length of reserved qgroup to free
btrfs: reinitialize delayed ref list after deleting it from the list
btrfs: fix per-subvolume RO/RW flags with new mount API
|
|
Pull bcachefs fixes from Kent Overstreet:
"Some trivial syzbot fixes, two more serious btree fixes found by
looping single_devices.ktest small_nodes:
- Topology error on split after merge, where we accidentaly picked
the node being deleted for the pivot, resulting in an assertion pop
- New nodes being preallocated were left on the freedlist, unlocked,
resulting in them sometimes being accidentally freed: this dated
from pre-cycle detector, when we could leave them locked. This
should have resulted in more explosions and fireworks, but turned
out to be surprisingly hard to hit because the preallocated nodes
were being used right away.
The fix for this is bigger than we'd like - reworking btree list
handling was a bit invasive - but we've now got more assertions and
it's well tested.
- Also another mishandled transaction restart fix (in
btree_node_prefetch) - we're almost done with those"
* tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix UAF in __promote_alloc() error path
bcachefs: Change OPT_STR max to be 1 less than the size of choices array
bcachefs: btree_cache.freeable list fixes
bcachefs: check the invalid parameter for perf test
bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket
bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recovery
bcachefs: Fix topology errors on split after merge
bcachefs: Ancient versions with bad bkey_formats are no longer supported
bcachefs: Fix error handling in bch2_btree_node_prefetch()
bcachefs: Fix null ptr deref in bucket_gen_get()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here is a (hopefully) final round of arm64 fixes for 6.12 that address
some user-visible floating point register corruption. Both of the
Marks have been working on this for a couple of weeks and we've ended
up in a position where SVE is solid but SME still has enough pending
issues that the most pragmatic solution for the release and stable
backports is to disable the feature. Yes, it's a shame, but the
hardware is rare as hen's teeth at the moment and we're better off
getting back to a known good state before fixing it all properly.
We're also improving the selftests for 6.13 to help avoid merging
broken code in the future.
Anyway, the good news is that we're removing a lot more code than
we're adding.
Summary:
- Fix handling of SVE traps from userspace on preemptible kernels
when converting the saved floating point state into SVE state.
- Remove broken support for the SMCCCv1.3 "SVE discard hint"
optimisation.
- Disable SME support, as the current support code suffers from
numerous issues around signal delivery, ptrace access and
context-switch which can lead to user-visible corruption of the
register state"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Kconfig: Make SME depend on BROKEN for now
arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint
arm64/sve: Discard stale CPU state when handling SVE traps
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Madhavan Srinivasan:
- Fix spurious interrupts in Book3S HV Nested KVM
Thanks to Gautam Menghani.
* tag 'powerpc-6.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts
|
|
GCC and Clang both implement stack protector support based on Thread Local
Storage (TLS) variables, and this is used in the kernel to implement per-task
stack cookies, by copying a task's stack cookie into a per-CPU variable every
time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits the TLS
variable to be specified directly. This is useful because it will allow to
move away from using a fixed offset of 40 bytes into the per-CPU area on
x86_64, which requires a lot of special handling in the per-CPU code and the
runtime relocation code.
However, while GCC is rather lax in its implementation of this command line
option, Clang actually requires that the provided symbol name refers to a TLS
variable (i.e., one declared with __thread), although it also permits the
variable to be undeclared entirely, in which case it will use an implicit
declaration of the right type.
The upshot of this is that Clang will emit the correct references to the stack
cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in the
same compilation unit (which amounts to the whole of vmlinux if LTO is
enabled), it will drop the per-CPU prefix and emit a load from a bogus
address.
Work around this by using a symbol name that never occurs in C code, and emit
it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
|
|
Hide KVM's pt_mode module param behind CONFIG_BROKEN, i.e. disable support
for virtualizing Intel PT via guest/host mode unless BROKEN=y. There are
myriad bugs in the implementation, some of which are fatal to the guest,
and others which put the stability and health of the host at risk.
For guest fatalities, the most glaring issue is that KVM fails to ensure
tracing is disabled, and *stays* disabled prior to VM-Enter, which is
necessary as hardware disallows loading (the guest's) RTIT_CTL if tracing
is enabled (enforced via a VMX consistency check). Per the SDM:
If the logical processor is operating with Intel PT enabled (if
IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load
IA32_RTIT_CTL" VM-entry control must be 0.
On the host side, KVM doesn't validate the guest CPUID configuration
provided by userspace, and even worse, uses the guest configuration to
decide what MSRs to save/load at VM-Enter and VM-Exit. E.g. configuring
guest CPUID to enumerate more address ranges than are supported in hardware
will result in KVM trying to passthrough, save, and load non-existent MSRs,
which generates a variety of WARNs, ToPA ERRORs in the host, a potential
deadlock, etc.
Fixes: f99e3daf94ff ("KVM: x86: Add Intel PT virtualization work mode")
Cc: stable@vger.kernel.org
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Message-ID: <20241101185031.1799556-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Always set irr_pending (to true) when updating APICv status to fix a bug
where KVM fails to set irr_pending when userspace sets APIC state and
APICv is disabled, which ultimate results in KVM failing to inject the
pending interrupt(s) that userspace stuffed into the vIRR, until another
interrupt happens to be emulated by KVM.
Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
be true if APICv is enabled, because not all vIRR updates will be visible
to KVM.
Hit the bug with a big hammer, even though strictly speaking KVM can scan
the vIRR and set/clear irr_pending as appropriate for this specific case.
The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't
touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as
the shortlog suggests, deleted code that updated irr_pending.
Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with
with the crucial difference that kvm_apic_update_apicv() did the scan even
when APICv was being *disabled*, e.g. due to an AVIC inhibition.
struct kvm_lapic *apic = vcpu->arch.apic;
if (vcpu->arch.apicv_active) {
/* irr_pending is always true when apicv is activated. */
apic->irr_pending = true;
apic->isr_count = 1;
} else {
apic->irr_pending = (apic_search_irr(apic) != -1);
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78
("kvm: lapic: Introduce APICv update helper function"), prior to which KVM
unconditionally set irr_pending to true in kvm_apic_set_state(), i.e.
assumed that the new virtual APIC state could have a pending IRQ.
Furthermore, in addition to introducing this issue, commit 755c2bf87860
also papered over the underlying bug: KVM doesn't ensure CPUs and devices
see APICv as disabled prior to searching the IRR. Waiting until KVM
emulates an EOI to update irr_pending "works", but only because KVM won't
emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of
memory barriers in between. I.e. leaving irr_pending set is basically
hacking around bad ordering.
So, effectively revert to the pre-b26a695a1d78 behavior for state restore,
even though it's sub-optimal if no IRQs are pending, in order to provide a
minimal fix, but leave behind a FIXME to document the ugliness. With luck,
the ordering issue will be fixed and the mess will be cleaned up in the
not-too-distant future.
Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Reported-by: Yong He <zhuangel570@gmail.com>
Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20241106015135.2462147-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Ensure that snp gctx page allocation is adequately deallocated on
failure during snp_launch_start.
Fixes: 136d8bc931c8 ("KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command")
CC: Sean Christopherson <seanjc@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: Ashish Kalra <ashish.kalra@amd.com>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: John Allen <john.allen@amd.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: "David S. Miller" <davem@davemloft.net>
CC: Michael Roth <michael.roth@amd.com>
CC: Luis Chamberlain <mcgrof@kernel.org>
CC: Russ Weight <russ.weight@linux.dev>
CC: Danilo Krummrich <dakr@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: "Rafael J. Wysocki" <rafael@kernel.org>
CC: Tianfei zhang <tianfei.zhang@intel.com>
CC: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Message-ID: <20241105010558.1266699-2-dionnaglaze@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In 08a7d2525511 ("tools arch x86: Sync the msr-index.h copy with the
kernel sources"), VMX_BASIC_MEM_TYPE_WB was removed. Use X86_MEMTYPE_WB
instead.
Fixes: 08a7d2525511 ("tools arch x86: Sync the msr-index.h copy with the
kernel sources")
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Message-ID: <20241106034031.503291-1-jsperbeck@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM x86 and selftests fixes for 6.12:
- Increase the timeout for the memslot performance selftest to avoid false
failures on arm64 and nested x86 platforms.
- Fix a goof in the guest_memfd selftest where a for-loop initialized a
bit mask to zero instead of BIT(0).
- Disable strict aliasing when building KVM selftests to prevent the
compiler from treating things like "u64 *" to "uint64_t *" cases as
undefined behavior, which can lead to nasty, hard to debug failures.
- Force -march=x86-64-v2 for KVM x86 selftests if and only if the uarch
is supported by the compiler.
- When emulating a guest TLB flush for a nested guest, flush vpid01, not
vpid02, if L2 is active but VPID is disabled in vmcs12, i.e. if L2 and
L1 are sharing VPID '0' (from L1's perspective).
- Fix a bug in the SNP initialization flow where KVM would return '0' to
userspace instead of -errno on failure.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.12
A moderately large pile of small changes here, split fairly evenly
between fixes and ID additions/quirks and all of it driver specific.
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 6.12-rc7
Here's a fix for a long-standing use-after-free in an io_edgeport debug
printk and some new modem device ids.
All have been in linux-next with no reported issues.
* tag 'usb-serial-6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: qcserial: add support for Sierra Wireless EM86xx
USB: serial: io_edgeport: fix use after free in debug printk
USB: serial: option: add Quectel RG650V
USB: serial: option: add Fibocom FG132 0x0112 composition
|
|
This fixes an assertion pop where we try to navigate to the target of
the backpointer, and the path level isn't what we expect.
Reported-by: syzbot+b17df21b4d370f2dc330@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Needs to match the assert later when we resize...
Reported-by: syzbot+e8eff054face85d7ea41@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The write buffer needs to be specifically flushed when going RO: keys in
the journal that haven't yet been moved to the write buffer don't have a
journal pin yet.
This fixes numerous syzbot bugs, all with symptoms of still doing writes
after we've got RO.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.12-2024-11-07:
amdgpu:
- Brightness fix
- DC vbios parsing fix
- ACPI fix
- SMU 14.x fix
- Power workload profile fix
- GC partitioning fix
- Debugfs fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241107182722.14147-1-alexander.deucher@amd.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"An update for the maintainers of the AMD driver following some job
changes there"
* tag 'spi-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
MAINTAINERS: update AMD SPI maintainer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of small fixes for drivers, nothing particularly remarkable"
* tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rk808: Add apply_bit for BUCK3 on RK809
regulator: rtq2208: Fix uninitialized use of regulator_config
|
|
Map my previously used email address to my @linux.dev address.
Link: https://lkml.kernel.org/r/20241103234411.2522-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Alex Elder <elder@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geliang Tang <geliang@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Mathieu Othacehe <m.othacehe@gmail.com>
Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: Matt Ranostay <matt@ranostay.sg>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Cc: Quentin Monnet <qmo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove():
[ 57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12
[ 57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper. Leaking 1 clusters and removing the entry
[ 57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004
[...]
[ 57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[...]
[ 57.331328] Call Trace:
[ 57.331477] <TASK>
[...]
[ 57.333511] ? do_user_addr_fault+0x3e5/0x740
[ 57.333778] ? exc_page_fault+0x70/0x170
[ 57.334016] ? asm_exc_page_fault+0x2b/0x30
[ 57.334263] ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10
[ 57.334596] ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
[ 57.334913] ocfs2_xa_remove_entry+0x23/0xc0
[ 57.335164] ocfs2_xa_set+0x704/0xcf0
[ 57.335381] ? _raw_spin_unlock+0x1a/0x40
[ 57.335620] ? ocfs2_inode_cache_unlock+0x16/0x20
[ 57.335915] ? trace_preempt_on+0x1e/0x70
[ 57.336153] ? start_this_handle+0x16c/0x500
[ 57.336410] ? preempt_count_sub+0x50/0x80
[ 57.336656] ? _raw_read_unlock+0x20/0x40
[ 57.336906] ? start_this_handle+0x16c/0x500
[ 57.337162] ocfs2_xattr_block_set+0xa6/0x1e0
[ 57.337424] __ocfs2_xattr_set_handle+0x1fd/0x5d0
[ 57.337706] ? ocfs2_start_trans+0x13d/0x290
[ 57.337971] ocfs2_xattr_set+0xb13/0xfb0
[ 57.338207] ? dput+0x46/0x1c0
[ 57.338393] ocfs2_xattr_trusted_set+0x28/0x30
[ 57.338665] ? ocfs2_xattr_trusted_set+0x28/0x30
[ 57.338948] __vfs_removexattr+0x92/0xc0
[ 57.339182] __vfs_removexattr_locked+0xd5/0x190
[ 57.339456] ? preempt_count_sub+0x50/0x80
[ 57.339705] vfs_removexattr+0x5f/0x100
[...]
Reproducer uses faultinject facility to fail ocfs2_xa_remove() ->
ocfs2_xa_value_truncate() with -ENOMEM.
In this case the comment mentions that we can return 0 if
ocfs2_xa_cleanup_value_truncate() is going to wipe the entry
anyway. But the following 'rc' check is wrong and execution flow do
'ocfs2_xa_remove_entry(loc);' twice:
* 1st: in ocfs2_xa_cleanup_value_truncate();
* 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'.
Fix this by skipping the 2nd removal of the same entry and making
syzkaller repro happy.
Link: https://lkml.kernel.org/r/20241103193845.2940988-1-andrew.kanner@gmail.com
Fixes: 399ff3a748cf ("ocfs2: Handle errors while setting external xattr values.")
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
Reported-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/671e13ab.050a0220.2b8c0f.01d0.GAE@google.com/T/
Tested-by: syzbot+386ce9e60fa1b18aac5b@syzkaller.appspotmail.com
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
signals. However now it's enforced unconditionally, even if
override_rlimit is set. This behavior change caused production issues.
For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo.
This prevents the process from correctly identifying the fault address and
handling the error. From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'. This can lead to unpredictable behavior and
crashes, as we observed with java applications.
Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip
the comparison to max there if override_rlimit is set. This effectively
restores the old behavior.
Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Co-developed-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Alexey Gladkov <legion@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When build with !CONFIG_MMU, the variable 'vmcore_mmap_ops'
is defined but not used:
>> fs/proc/vmcore.c:458:42: warning: unused variable 'vmcore_mmap_ops'
458 | static const struct vm_operations_struct vmcore_mmap_ops = {
Fix this by only defining it when CONFIG_MMU is enabled.
Link: https://lkml.kernel.org/r/20241101034803.9298-1-xiqi2@huawei.com
Fixes: 9cb218131de1 ("vmcore: introduce remap_oldmem_pfn_range()")
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/lkml/202410301936.GcE8yUos-lkp@intel.com/
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The inc_rlimit_get_ucounts() increments the specified rlimit counter and
then checks its limit. If the value exceeds the limit, the function
returns an error without decrementing the counter.
Link: https://lkml.kernel.org/r/20241101191940.3211128-1-roman.gushchin@linux.dev
Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting")
Signed-off-by: Andrei Vagin <avagin@google.com>
Co-developed-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Tested-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Alexey Gladkov <legion@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Gladkov <legion@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test should be skipped if initial conditions aren't fulfilled in the
start instead of failing and outputting non-compliant TAP logs. This kind
of failure pollutes the results. The initial conditions are:
- The test should only execute if /tmp file can be allocated.
- The test should only execute if huge pages are free.
Before:
TAP version 13
1..4
Bail out! Error opening file
: Read-only file system (30)
# Planned tests != run tests (4 != 0)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
After:
TAP version 13
1..0 # SKIP Unable to allocate file: Read-only file system
Link: https://lkml.kernel.org/r/20241101141557.3159432-1-usama.anjum@collabora.com
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Fixes: 3a103b5315b7 ("selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()")
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If we add ``thp_anon=32,64K:always`` to the kernel command line, we
will see the following error:
[ 0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting
This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>```,
as [KMG] must follow each number to especify its unit. So, the correct
format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>```.
Therefore, adjust the documentation to reflect the correct format of the
parameter ``thp_anon=``.
Link: https://lkml.kernel.org/r/20241101165719.1074234-3-mcanal@igalia.com
Fixes: dd4d30d1cdbe ("mm: override mTHP "enabled" defaults at kernel cmdline")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Barry Song <baohua@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
damon_feed_loop_next_input() is inefficient and fragile to overflows.
Specifically, 'score_goal_diff_bp' calculation can overflow when 'score'
is high. The calculation is actually unnecessary at all because 'goal' is
a constant of value 10,000. Calculation of 'compensation' is again
fragile to overflow. Final calculation of return value for under-achiving
case is again fragile to overflow when the current score is
under-achieving the target.
Add two corner cases handling at the beginning of the function to make the
body easier to read, and rewrite the body of the function to avoid
overflows and the unnecessary bp value calcuation.
Link: https://lkml.kernel.org/r/20241031161203.47751-1-sj@kernel.org
Fixes: 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/944f3d5b-9177-48e7-8ec9-7f1331a3fea3@roeck-us.net
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: <stable@vger.kernel.org> [6.8.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
DAMON's logics to determine if this is the time to apply damos schemes
assumes next_apply_sis is always set larger than current
passed_sample_intervals. And therefore assume continuously incrementing
passed_sample_intervals will make it reaches to the next_apply_sis in
future. The logic hence does apply the scheme and update next_apply_sis
only if passed_sample_intervals is same to next_apply_sis.
If Schemes apply interval is set as zero, however, next_apply_sis is set
same to current passed_sample_intervals, respectively. And
passed_sample_intervals is incremented before doing the next_apply_sis
check. Hence, next_apply_sis becomes larger than next_apply_sis, and the
logic says it is not the time to apply schemes and update next_apply_sis.
In other words, DAMON stops applying schemes until passed_sample_intervals
overflows.
Based on the documents and the common sense, a reasonable behavior for
such inputs would be applying the schemes for every sampling interval.
Handle the case by removing the assumption.
Link: https://lkml.kernel.org/r/20241031183757.49610-3-sj@kernel.org
Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/damon/core: fix handling of zero non-sampling intervals".
DAMON's internal intervals accounting logic is not correctly handling
non-sampling intervals of zero values for a wrong assumption. This could
cause unexpected monitoring behavior, and even result in infinite hang of
DAMON sysfs interface user threads in case of zero aggregation interval.
Fix those by updating the intervals accounting logic. For details of the
root case and solutions, please refer to commit messages of fixes.
This patch (of 2):
DAMON's logics to determine if this is the time to do aggregation and ops
update assumes next_{aggregation,ops_update}_sis are always set larger
than current passed_sample_intervals. And therefore it further assumes
continuously incrementing passed_sample_intervals every sampling interval
will make it reaches to the next_{aggregation,ops_update}_sis in future.
The logic therefore make the action and update
next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is same
to the counts, respectively.
If Aggregation interval or Ops update interval are zero, however,
next_aggregation_sis or next_ops_update_sis are set same to current
passed_sample_intervals, respectively. And passed_sample_intervals is
incremented before doing the next_{aggregation,ops_update}_sis check.
Hence, passed_sample_intervals becomes larger than
next_{aggregation,ops_update}_sis, and the logic says it is not the time
to do the action and update next_{aggregation,ops_update}_sis forever,
until an overflow happens. In other words, DAMON stops doing aggregations
or ops updates effectively forever, and users cannot get monitoring
results.
Based on the documents and the common sense, a reasonable behavior for
such inputs is doing an aggregation and an ops update for every sampling
interval. Handle the case by removing the assumption.
Note that this could incur particular real issue for DAMON sysfs interface
users, in case of zero Aggregation interval. When user starts DAMON with
zero Aggregation interval and asks online DAMON parameter tuning via DAMON
sysfs interface, the request is handled by the aggregation callback.
Until the callback finishes the work, the user who requested the online
tuning just waits. Hence, the user will be stuck until the
passed_sample_intervals overflows.
Link: https://lkml.kernel.org/r/20241031183757.49610-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20241031183757.49610-2-sj@kernel.org
Fixes: 4472edf63d66 ("mm/damon/core: use number of passed access sampling as a timer")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [6.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After commit 94d7d9233951 ("mm: abstract the vma_merge()/split_vma()
pattern for mprotect() et al."), if vma_modify_flags() return error, the
vma is set to an error code. This will lead to an invalid prev be
returned.
Generally this shouldn't matter as the caller should treat an error as
indicating state is now invalidated, however unfortunately
apply_mlockall_flags() does not check for errors and assumes that
mlock_fixup() correctly maintains prev even if an error were to occur.
This patch fixes that assumption.
[lorenzo.stoakes@oracle.com: provide a better fix and rephrase the log]
Link: https://lkml.kernel.org/r/20241027123321.19511-1-richard.weiyang@gmail.com
Fixes: 94d7d9233951 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since gfp & GFP_ATOMIC == GFP_ATOMIC is true for GFP_KERNEL | GFP_HIGH, it
will use kmalloc if user specifies that combination. Here the reason why
combining the __vmalloc_node() and kmalloc_node() is that the vmalloc does
not support all GFP flag, especially GFP_ATOMIC. So we should check if
gfp & (GFP_ATOMIC | GFP_KERNEL) != GFP_ATOMIC for vmalloc first. This
ensures caller can sleep. And for the robustness, even if vmalloc fails,
it should retry with kmalloc to allocate it.
Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com
Fixes: aff1871bfc81 ("objpool: fix choosing allocation for percpu slots")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/
Cc: Leo Yan <leo.yan@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Wu <wuqiang.matt@bytedance.com>
Cc: Mikel Rychliski <mikel@mikelr.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
OOM kills due to vastly overestimated free highatomic reserves were
observed:
... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
The second line above shows that the OOM kill was due to the following
condition:
free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
And the third line shows there were no free pages in any
MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'.
Therefore __zone_watermark_unusable_free() underestimated the usable free
memory by over 1GB, which resulted in the unnecessary OOM kill above.
The comments in __zone_watermark_unusable_free() warns about the potential
risk, i.e.,
If the caller does not have rights to reserves below the min
watermark then subtract the high-atomic reserves. This will
over-estimate the size of the atomic reserve but it avoids a search.
However, it is possible to keep track of free pages in reserved highatomic
pageblocks with a new per-zone counter nr_free_highatomic protected by the
zone lock, to avoid a search when calculating the usable free memory. And
the cost would be minimal, i.e., simple arithmetics in the highatomic
alloc/free/move paths.
Note that since nr_free_highatomic can be relatively small, using a
per-cpu counter might cause too much drift and defeat its purpose, in
addition to the extra memory overhead.
Dependson e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") - see [1]
[akpm@linux-foundation.org: s/if/else if/, per Johannes, stealth whitespace tweak]
Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/0d0ddb33-fcdc-43e2-801f-0c1df2031afb@suse.cz [1]
Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Link Lin <linkl@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the error recovery path of mlx5_vdpa_dev_add(), the cleanup is
executed and at the end put_device() is called which ends up calling
mlx5_vdpa_free(). This function will execute the same cleanup all over
again. Most resources support being cleaned up twice, but the recent
mlx5_vdpa_destroy_mr_resources() doesn't.
This change drops the explicit cleanup from within the
mlx5_vdpa_dev_add() and lets mlx5_vdpa_free() do its work.
This issue was discovered while trying to add 2 vdpa devices with the
same name:
$> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.2
$> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.3
... yields the following dump:
BUG: kernel NULL pointer dereference, address: 00000000000000b8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 4 UID: 0 PID: 2811 Comm: vdpa Not tainted 6.12.0-rc6 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:destroy_workqueue+0xe/0x2a0
Code: ...
RSP: 0018:ffff88814920b9a8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff888105c10000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff888100400168 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff888100120c00 R09: ffffffff828578c0
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff888131fd99a0 R14: 0000000000000000 R15: ffff888105c10580
FS: 00007fdfa6b4f740(0000) GS:ffff88852ca00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000b8 CR3: 000000018db09006 CR4: 0000000000372eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die+0x20/0x60
? page_fault_oops+0x150/0x3e0
? exc_page_fault+0x74/0x130
? asm_exc_page_fault+0x22/0x30
? destroy_workqueue+0xe/0x2a0
mlx5_vdpa_destroy_mr_resources+0x2b/0x40 [mlx5_vdpa]
mlx5_vdpa_free+0x45/0x150 [mlx5_vdpa]
vdpa_release_dev+0x1e/0x50 [vdpa]
device_release+0x31/0x90
kobject_put+0x8d/0x230
mlx5_vdpa_dev_add+0x328/0x8b0 [mlx5_vdpa]
vdpa_nl_cmd_dev_add_set_doit+0x2b8/0x4c0 [vdpa]
genl_family_rcv_msg_doit+0xd0/0x120
genl_rcv_msg+0x180/0x2b0
? __vdpa_alloc_device+0x1b0/0x1b0 [vdpa]
? genl_family_rcv_msg_dumpit+0xf0/0xf0
netlink_rcv_skb+0x54/0x100
genl_rcv+0x24/0x40
netlink_unicast+0x1fc/0x2d0
netlink_sendmsg+0x1e4/0x410
__sock_sendmsg+0x38/0x60
? sockfd_lookup_light+0x12/0x60
__sys_sendto+0x105/0x160
? __count_memcg_events+0x53/0xe0
? handle_mm_fault+0x100/0x220
? do_user_addr_fault+0x40d/0x620
__x64_sys_sendto+0x20/0x30
do_syscall_64+0x4c/0x100
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fdfa6c66b57
Code: ...
RSP: 002b:00007ffeace22998 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000055a498608350 RCX: 00007fdfa6c66b57
RDX: 000000000000006c RSI: 000055a498608350 RDI: 0000000000000003
RBP: 00007ffeace229c0 R08: 00007fdfa6d35200 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000202 R12: 000055a4986082a0
R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeace233f3
</TASK>
Modules linked in: ...
CR2: 00000000000000b8
Fixes: 62111654481d ("vdpa/mlx5: Postpone MR deletion")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Message-Id: <20241105185101.1323272-2-dtatulea@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
|
|
If we error in data_update_init() after adding to the rhashtable of
outstanding promotes, kfree_rcu() is required.
Reported-by: Reed Riley <reed@riley.engineer>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Change OPT_STR max value to be 1 less than the "ARRAY_SIZE" of "_choices"
array. As a result, remove -1 from (opt->max-1) in bch2_opt_to_text.
The "_choices" array is a null-terminated array, so computing the maximum
using "ARRAY_SIZE" without subtracting 1 yields an incorrect result. Since
bch2_opt_validate don't subtract 1, as bch2_opt_to_text does, values
bigger than the actual maximum would pass through option validation.
Reported-by: syzbot+bee87a0c3291c06aa8c6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bee87a0c3291c06aa8c6
Fixes: 63c4b2545382 ("bcachefs: Better superblock opt validation")
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When allocating new btree nodes, we were leaving them on the freeable
list - unlocked - allowing them to be reclaimed: ouch.
Additionally, bch2_btree_node_free_never_used() ->
bch2_btree_node_hash_remove was putting it on the freelist, while
bch2_btree_node_free_never_used() was putting it back on the btree
update reserve list - ouch.
Originally, the code was written to always keep btree nodes on a list -
live or freeable - and this worked when new nodes were kept locked.
But now with the cycle detector, we can't keep nodes locked that aren't
tracked by the cycle detector; and this is fine as long as they're not
reachable.
We also have better and more robust leak detection now, with memory
allocation profiling, so the original justification no longer applies.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The perf_test does not check the number of iterations and threads
when it is zero. If nr_thread is 0, the perf test will keep
waiting for wakekup. If iteration is 0, it will cause exception
of division by zero. This can be reproduced by:
echo "rand_insert 0 1" > /sys/fs/bcachefs/${uuid}/perf_test
or
echo "rand_insert 1 0" > /sys/fs/bcachefs/${uuid}/perf_test
Fixes: 1c6fdbd8f246 ("bcachefs: Initial commit")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bio_kmalloc may return NULL, will cause NULL pointer dereference.
Add check NULL return for bio_kmalloc in journal_read_bucket.
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Fixes: ac10a9611d87 ("bcachefs: Some fixes for building in userspace")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If BCH_FS_may_go_rw is not yet set, it indicates to the transaction
commit path that updates should be done via the list of journal replay
keys.
This must be set before multithreaded use commences.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If a btree split picks a pivot that's being deleted by a btree node
merge, we're going to have problems.
Fix this by checking if the pivot is being deleted, the same as we check
for deletions in journal replay keys.
Found by single_devic.ktest small_nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Syzbot found an assertion pop, by generating an ancient filesystem
version with an invalid bkey_format (with fields that can overflow) as
well as packed keys that aren't representable unpacked.
This breaks key comparisons in all sorts of painful ways.
Filesystems have been automatically rewriting nodes with such invalid
formats for years; we can safely drop support for them.
Reported-by: syzbot+8a0109511de9d4b61217@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|