Age | Commit message (Collapse) | Author |
|
Spelling mistake (triple letters) in comment. Detected with the help of
Coccinelle.
Link: https://lkml.kernel.org/r/20220521111145.81697-80-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Spelling mistake (triple letters) in comment. Detected with the help of
Coccinelle.
Link: https://lkml.kernel.org/r/20220521111145.81697-94-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce process_mrelease syscall sanity tests which include tests
which expect to fail:
- process_mrelease with invalid pidfd and flags inputs
- process_mrelease on a live process with no pending signals
and valid process_mrelease usage which is expected to succeed. Because
process_mrelease has to be used against a process with a pending SIGKILL,
it's possible that the process exits before process_mrelease gets called.
In such cases we retry the test with a victim that allocates twice more
memory up to 1GB. This would require the victim process to spend more
time during exit and process_mrelease has a better chance of catching the
process before it exits and succeeding.
On success the test reports the amount of memory the child had to allocate
for reaping to succeed. Sample output:
$ mrelease_test
Success reaping a child with 1MB of memory allocations
On failure the test reports the failure. Sample outputs:
$ mrelease_test
All process_mrelease attempts failed!
$ mrelease_test
process_mrelease: Invalid argument
Link: https://lkml.kernel.org/r/20220518204316.13131-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit 3a235693d3930e1276c8d9cc0ca5807ef292cf0a.
Its premise was that cgroup reclaim cares about freeing memory inside the
cgroup, and demotion just moves them around within the cgroup limit.
Hence, pages from toptier nodes should be reclaimed directly.
However, with NUMA balancing now doing tier promotions, demotion is part
of the page aging process. Global reclaim demotes the coldest toptier
pages to secondary memory, where their life continues and from which they
have a chance to get promoted back. Essentially, tiered memory systems
have an LRU order that spans multiple nodes.
When cgroup reclaims pages coming off the toptier directly, there can be
colder pages on lower tier nodes that were demoted by global reclaim.
This is an aging inversion, not unlike if cgroups were to reclaim directly
from the active lists while there are inactive pages.
Proactive reclaim is another factor. The goal of that it is to offload
colder pages from expensive RAM to cheaper storage. When lower tier
memory is available as an intermediate layer, we want offloading to take
advantage of it instead of bypassing to storage.
Revert the patch so that cgroups respect the LRU order spanning the memory
hierarchy.
Of note is a specific undercommit scenario, where all cgroup limits in the
system add up to <= available toptier memory. In that case, shuffling
pages out to lower tiers first to reclaim them from there is inefficient.
This is something could be optimized/short-circuited later on (although
care must be taken not to accidentally recreate the aging inversion).
Let's ensure correctness first.
Link: https://lkml.kernel.org/r/20220518190911.82400-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
By printing information, we can friendly prompt the status change
information of kfence by dmesg and record by syslog.
Also, set kfence_enabled to false only when needed.
Link: https://lkml.kernel.org/r/20220518073105.3160335-1-liu.yun@linux.dev
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Co-developed-by: Marco Elver <elver@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
percpu_alloc_percpu event trace"
Fix sparse warning about incorrect gfp_t cast.
Link: https://lkml.kernel.org/r/001979f3-e978-0998-cbed-61a4a2ac87b8@openvz.org
Fixes: f67bed134a05 ("percpu: improve percpu_alloc_percpu event trace")
Signed-off-by: Vasily Averin <vvs@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
conversion"
Redefines __def_gfpflag_names array according to akpm@, willy@ and Joe
Perches recommendations.
Link: https://lkml.kernel.org/r/6f811e19-41c6-f3e8-fca6-23a19a62e313@openvz.org
Fixes: fe573327ffb1 ("tracing: incorrect gfp_t conversion")
Signed-off-by: Vasily Averin <vvs@openvz.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In isolate_single_pageblock() called by start_isolate_page_range(), there
are some pageblock isolation issues causing a potential infinite loop when
isolating a page range. This is reported by Qian Cai.
1. the pageblock was isolated by just changing pageblock migratetype
without checking unmovable pages. Calling set_migratetype_isolate() to
isolate pageblock properly.
2. an off-by-one error caused migrating pages unnecessarily, since the page
is not crossing pageblock boundary.
3. migrating a compound page across pageblock boundary then splitting the
free page later has a small race window that the free page might be
allocated again, so that the code will try again, causing an potential
infinite loop. Temporarily set the to-be-migrated page's pageblock to
MIGRATE_ISOLATE to prevent that and bail out early if no free page is
found after page migration.
An additional fix to split_free_page() aims to avoid crashing in
__free_one_page(). When the free page is split at the specified
split_pfn_offset, free_page_order should check both the first bit of
free_page_pfn and the last bit of split_pfn_offset and use the smaller
one. For example, if free_page_pfn=0x10000, split_pfn_offset=0xc000,
free_page_order should first be 0x8000 then 0x4000, instead of 0x4000 then
0x8000, which the original algorithm did.
[akpm@linux-foundation.org: suppress min() warning]
Link: https://lkml.kernel.org/r/20220524194756.1698351-1-zi.yan@sent.com
Fixes: b2c9e2fbba3253 ("mm: make alloc_contig_range work at pageblock granularity")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Ren <renzhengeek@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
I have been focusing on mm for the past two years. e.g. developing,
fixing bugs, reviewing related to HugeTLB system. I would like to help
Mike and other people working on HugeTLB by reviewing their work.
When I first introduced the vmemmmap reduction, I forgot to update
MAINTAINERS file. Let's update it as well. And rename "HUGETLB
FILESYSTEM" to "HUGETLB SUBSYSTEM" since some files are not only related
to filesystem but also memory management (the name of FILESYSTEM cannot
cover this area).
Link: https://lkml.kernel.org/r/20220521074103.79468-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
ZSMALLOC depends on MMU so ZRAM should also depend on MMU since 'select'
does not follow any dependency chains.
Fixes this Kconfig warning:
WARNING: unmet direct dependencies detected for ZSMALLOC
Depends on [n]: MMU [=n]
Selected by [y]:
- ZRAM [=y] && BLK_DEV [=y] && BLOCK [=y] && SYSFS [=y] && (CRYPTO_LZO [=y] || CRYPTO_ZSTD [=m] || CRYPTO_LZ4 [=m] || CRYPTO_LZ4HC [=n] || CRYPTO_842 [=n])
Link: https://lkml.kernel.org/r/20220522204027.22964-1-rdunlap@infradead.org
Fixes: b3fbd58fcbb10 ("mm: Kconfig: simplify zswap configuration")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Shmem swapoff makes no progress: the index to indices is not incremented.
But "ret" is no longer a return value, so use folio_batch_count() instead.
Link: https://lkml.kernel.org/r/c32bee8a-f0aa-245-f94e-24dd271924fa@google.com
Fixes: da08e9b79323 ("mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If the first goto is taken, 'fd' is not opened yet (and is un-initialized).
So a direct return is safer.
Link: https://lkml.kernel.org/r/628312312eb40e0e39463a2c06415fde5295c716.1653229120.git.christophe.jaillet@wanadoo.fr
Fixes: c1a31a2f7a9c ("cgroup: fix racy check in alloc_pagecache_max_30M() helper function")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Zefan Li <lizefan.x@bytedance.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: David Vernet <void@manifault.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Offload writing printk() messages on consoles to per-console
kthreads.
It prevents soft-lockups when an extensive amount of messages is
printed. It was observed, for example, during boot of large systems
with a lot of peripherals like disks or network interfaces.
It prevents live-lockups that were observed, for example, when
messages about allocation failures were reported and a CPU handled
consoles instead of reclaiming the memory. It was hard to solve even
with rate limiting because it would need to take into account the
amount of messages and the speed of all consoles.
It is a must to have for real time. Otherwise, any printk() might
break latency guarantees.
The per-console kthreads allow to handle each console on its own
speed. Slow consoles do not longer slow down faster ones. And
printk() does not longer unpredictably slows down various code paths.
There are situations when the kthreads are either not available or
not reliable, for example, early boot, suspend, or panic. In these
situations, printk() uses the legacy mode and tries to handle
consoles immediately.
- Add documentation for the printk index.
* tag 'printk-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk, tracing: fix console tracepoint
printk: remove @console_locked
printk: extend console_lock for per-console locking
printk: add kthread console printers
printk: add functions to prefer direct printing
printk: add pr_flush()
printk: move buffer definitions into console_emit_next_record() caller
printk: refactor and rework printing logic
printk: add con_printk() macro for console details
printk: call boot_delay_msec() in printk_delay()
printk: get caller_id/timestamp after migration disable
printk: wake waiters for safe and NMI contexts
printk: wake up all waiters
printk: add missing memory barrier to wake_up_klogd()
printk: cpu sync always disable interrupts
printk: rename cpulock functions
printk/index: Printk index feature documentation
MAINTAINERS: Add printk indexing maintainers on mention of printk_index
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- Conversion of slub_debug stack traces to stackdepot, allowing more
useful debugfs-based inspection for e.g. memory leak debugging.
Allocation and free debugfs info now includes full traces and is
sorted by the unique trace frequency.
The stackdepot conversion was already attempted last year but
reverted by ae14c63a9f20. The memory overhead (while not actually
enabled on boot) has been meanwhile solved by making the large
stackdepot allocation dynamic. The xfstest issues haven't been
reproduced on current kernel locally nor in -next, so the slab cache
layout changes that originally made that bug manifest were probably
not the root cause.
- Refactoring of dma-kmalloc caches creation.
- Trivial cleanups such as removal of unused parameters, fixes and
clarifications of comments.
- Hyeonggon Yoo joins as a reviewer.
* tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
MAINTAINERS: add myself as reviewer for slab
mm/slub: remove unused kmem_cache_order_objects max
mm: slab: fix comment for __assume_kmalloc_alignment
mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
mm/slub: remove unneeded return value of slab_pad_check
mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()
mm/slub: remove meaningless node check in ___slab_alloc()
mm/slub: remove duplicate flag in allocate_slab()
mm/slub: remove unused parameter in setup_object*()
mm/slab.c: fix comments
slab, documentation: add description of debugfs files for SLUB caches
mm/slub: sort debugfs output by frequency of stack traces
mm/slub: distinguish and print stack traces in debugfs files
mm/slub: use stackdepot to save stack trace in objects
mm/slub: move struct track init out of set_track()
lib/stackdepot: allow requesting early initialization dynamically
mm/slub, kunit: Make slub_kunit unaffected by user specified flags
mm/slab: remove some unused functions
|
|
Commit c724c866bb70 ("linux/types.h: remove unnecessary __bitwise__")
was right that there are no users of __bitwise__ in the kernel, but it
turns out there are user space users of it that do expect it.
It is, after all, in the uapi directory, so user space usage is to be
expected.
Instead of reverting the commit completely, let's just clarify the
situation so that it doesn't happen again, and have some in-code
explanations for why that "__bitwise__" still exists.
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Link: https://lore.kernel.org/all/b5c0a68d-8387-4909-beea-f70ab9e6e3d5@kernel.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit b2a90f4fcb14 ("media: lirc: remove unused lirc features") removed
feature flags which were never implemented, but they are still used by
the lirc daemon went built from source.
Reinstate these symbols in order not to break the lirc build.
Fixes: b2a90f4fcb14 ("media: lirc: remove unused lirc features")
Link: https://lore.kernel.org/all/a0470450-ecfd-2918-e04a-7b57c1fd7694@kernel.org/
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The IXP4xx Kconfig we ended up with for mach-ixp4xx creates
as kismet warning:
WARNING: unmet direct dependencies detected for GPIO_IXP4XX
Depends on [n]: GPIOLIB [=y] && HAS_IOMEM [=y] && ARCH_IXP4XX [=y] && OF [=n]
Selected by [y]:
- ARCH_IXP4XX [=y] && <choice>
This is because it is possible to select ARCH_IXP4XX witout
OF while that selects the GPIO driver that now depends on
OF.
Fix this by creating a single ARCH_IXP4XX kconfig that selects
USE_OF.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Imre Kaloz <kaloz@openwrt.org>
Cc: Krzysztof Halasa <khalasa@piap.pl>
Link: https://lore.kernel.org/r/20220522072356.34062-1-linus.walleij@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
When kernel handles the vm-exit caused by external interrupts and NMI,
it always sets kvm_intr_type to tell if it's dealing an IRQ or NMI. For
the PMI scenario, it could be IRQ or NMI.
However, intel_pt PMIs are only generated for HARDWARE perf events, and
HARDWARE events are always configured to generate NMIs. Use
kvm_handling_nmi_from_guest() to precisely identify if the intel_pt PMI
came from the guest; this avoids false positives if an intel_pt PMI/NMI
arrives while the host is handling an unrelated IRQ VM-Exit.
Fixes: db215756ae59 ("KVM: x86: More precisely identify NMI from guest when handling PMI")
Signed-off-by: Yanfei Xu <yanfei.xu@intel.com>
Message-Id: <20220523140821.1345605-1-yanfei.xu@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fixing side effect of the so-called opportunistic change in the commit.
Fixes: dc8a9febbab0 ("KVM: selftests: x86: Fix test failure on arch lbr capable platforms")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220518170118.66263-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit ddd7ed842627 ("x86/kvm: Alloc dummy async #PF token outside of
raw spinlock") leads to the following Smatch static checker warning:
arch/x86/kernel/kvm.c:212 kvm_async_pf_task_wake()
warn: sleeping in atomic context
arch/x86/kernel/kvm.c
202 raw_spin_lock(&b->lock);
203 n = _find_apf_task(b, token);
204 if (!n) {
205 /*
206 * Async #PF not yet handled, add a dummy entry for the token.
207 * Allocating the token must be down outside of the raw lock
208 * as the allocator is preemptible on PREEMPT_RT kernels.
209 */
210 if (!dummy) {
211 raw_spin_unlock(&b->lock);
--> 212 dummy = kzalloc(sizeof(*dummy), GFP_KERNEL);
^^^^^^^^^^
Smatch thinks the caller has preempt disabled. The `smdb.py preempt
kvm_async_pf_task_wake` output call tree is:
sysvec_kvm_asyncpf_interrupt() <- disables preempt
-> __sysvec_kvm_asyncpf_interrupt()
-> kvm_async_pf_task_wake()
The caller is this:
arch/x86/kernel/kvm.c
290 DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt)
291 {
292 struct pt_regs *old_regs = set_irq_regs(regs);
293 u32 token;
294
295 ack_APIC_irq();
296
297 inc_irq_stat(irq_hv_callback_count);
298
299 if (__this_cpu_read(apf_reason.enabled)) {
300 token = __this_cpu_read(apf_reason.token);
301 kvm_async_pf_task_wake(token);
302 __this_cpu_write(apf_reason.token, 0);
303 wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
304 }
305
306 set_irq_regs(old_regs);
307 }
The DEFINE_IDTENTRY_SYSVEC() is a wrapper that calls this function
from the call_on_irqstack_cond(). It's inside the call_on_irqstack_cond()
where preempt is disabled (unless it's already disabled). The
irq_enter/exit_rcu() functions disable/enable preempt.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The timer is disarmed when switching between TSC deadline and other modes;
however, the pending timer is still in-flight, so let's accurately remove
any traces of the previous mode.
Fixes: 4427593258 ("KVM: x86: thoroughly disarm LAPIC timer around TSC deadline switch")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Drop the raw spinlock in kvm_async_pf_task_wake() before allocating the
the dummy async #PF token, the allocator is preemptible on PREEMPT_RT
kernels and must not be called from truly atomic contexts.
Opportunistically document why it's ok to loop on allocation failure,
i.e. why the function won't get stuck in an infinite loop.
Reported-by: Yajun Deng <yajun.deng@linux.dev>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Whenever x86_decode_emulated_instruction() detects a breakpoint, it
returns the value that kvm_vcpu_check_breakpoint() writes into its
pass-by-reference second argument. Unfortunately this is completely
bogus because the expected outcome of x86_decode_emulated_instruction
is an EMULATION_* value.
Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to
a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK
and x86_emulate_instruction() is called without having decoded the
instruction. This causes various havoc from running with a stale
emulation context.
The fix is to move the call to kvm_vcpu_check_breakpoint() where it was
before commit 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction
emulation with decoding") introduced x86_decode_emulated_instruction().
The other caller of the function does not need breakpoint checks,
because it is invoked as part of a vmexit and the processor has already
checked those before executing the instruction that #GP'd.
This fixes CVE-2022-1852.
Reported-by: Qiuhao Li <qiuhao@sysec.org>
Reported-by: Gaoning Pan <pgn@zju.edu.cn>
Reported-by: Yongkang Jia <kangel@zju.edu.cn>
Fixes: 4aa2691dcbd3 ("KVM: x86: Factor out x86 instruction emulation with decoding")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220311032801.3467418-2-seanjc@google.com>
[Rewrote commit message according to Qiuhao's report, since a patch
already existed to fix the bug. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
For some sev ioctl interfaces, the length parameter that is passed maybe
less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data
that PSP firmware returns. In this case, kmalloc will allocate memory
that is the size of the input rather than the size of the data.
Since PSP firmware doesn't fully overwrite the allocated buffer, these
sev ioctl interface may return uninitialized kernel slab memory.
Reported-by: Andy Nguyen <theflow@google.com>
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Peter Gonda <pgonda@google.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: eaf78265a4ab3 ("KVM: SVM: Move SEV code to separate file")
Fixes: 2c07ded06427d ("KVM: SVM: add support for SEV attestation command")
Fixes: 4cfdd47d6d95a ("KVM: SVM: Add KVM_SEV SEND_START command")
Fixes: d3d1af85e2c75 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
Fixes: eba04b20e4861 ("KVM: x86: Account a variety of miscellaneous allocations")
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Set the starting uABI size of KVM's guest FPU to 'struct kvm_xsave',
i.e. to KVM's historical uABI size. When saving FPU state for usersapce,
KVM (well, now the FPU) sets the FP+SSE bits in the XSAVE header even if
the host doesn't support XSAVE. Setting the XSAVE header allows the VM
to be migrated to a host that does support XSAVE without the new host
having to handle FPU state that may or may not be compatible with XSAVE.
Setting the uABI size to the host's default size results in out-of-bounds
writes (setting the FP+SSE bits) and data corruption (that is thankfully
caught by KASAN) when running on hosts without XSAVE, e.g. on Core2 CPUs.
WARN if the default size is larger than KVM's historical uABI size; all
features that can push the FPU size beyond the historical size must be
opt-in.
==================================================================
BUG: KASAN: slab-out-of-bounds in fpu_copy_uabi_to_guest_fpstate+0x86/0x130
Read of size 8 at addr ffff888011e33a00 by task qemu-build/681
CPU: 1 PID: 681 Comm: qemu-build Not tainted 5.18.0-rc5-KASAN-amd64 #1
Hardware name: /DG35EC, BIOS ECG3510M.86A.0118.2010.0113.1426 01/13/2010
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x45
print_report.cold+0x45/0x575
kasan_report+0x9b/0xd0
fpu_copy_uabi_to_guest_fpstate+0x86/0x130
kvm_arch_vcpu_ioctl+0x72a/0x1c50 [kvm]
kvm_vcpu_ioctl+0x47f/0x7b0 [kvm]
__x64_sys_ioctl+0x5de/0xc90
do_syscall_64+0x31/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
Allocated by task 0:
(stack is not available)
The buggy address belongs to the object at ffff888011e33800
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 0 bytes to the right of
512-byte region [ffff888011e33800, ffff888011e33a00)
The buggy address belongs to the physical page:
page:0000000089cd4adb refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11e30
head:0000000089cd4adb order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 dead000000000100 dead000000000122 ffff888001041c80
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888011e33900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888011e33980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888011e33a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff888011e33a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888011e33b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
Fixes: be50b2065dfa ("kvm: x86: Add support for getting/setting expanded xstate buffer")
Fixes: c60427dd50ba ("x86/fpu: Add uabi_size to guest_fpu")
Reported-by: Zdenek Kaspar <zkaspar82@gmail.com>
Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Zdenek Kaspar <zkaspar82@gmail.com>
Message-Id: <20220504001219.983513-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fix and feature for 5.19
- ultravisor communication device driver
- fix TEID on terminating storage key ops
|
|
KVM/riscv changes for 5.19
- Added Sv57x4 support for G-stage page table
- Added range based local HFENCE functions
- Added remote HFENCE functions based on VCPU requests
- Added ISA extension registers in ONE_REG interface
- Updated KVM RISC-V maintainers entry to cover selftests support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 5.19
- Add support for the ARMv8.6 WFxT extension
- Guard pages for the EL2 stacks
- Trap and emulate AArch32 ID registers to hide unsupported features
- Ability to select and save/restore the set of hypercalls exposed
to the guest
- Support for PSCI-initiated suspend in collaboration with userspace
- GICv3 register-based LPI invalidation support
- Move host PMU event merging into the vcpu data structure
- GICv3 ITS save/restore fixes
- The usual set of small-scale cleanups and fixes
[Due to the conflict, KVM_SYSTEM_EVENT_SEV_TERM is relocated
from 4 to 6. - Paolo]
|
|
On Arch LBR capable platforms, LBR_FMT in perf capability msr is 0x3f,
so the last format test will fail. Use a true invalid format(0x30) for
the test if it's running on these platforms. Opportunistically change
the file name to reflect the tests actually carried out.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Message-Id: <20220512084046.105479-1-weijiang.yang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In commit ec0671d5684a ("KVM: LAPIC: Delay trace_kvm_wait_lapic_expire
tracepoint to after vmexit", 2019-06-04), trace_kvm_wait_lapic_expire
was moved after guest_exit_irqoff() because invoking tracepoints within
kvm_guest_enter/kvm_guest_exit caused a lockdep splat.
These days this is not necessary, because commit 87fa7f3e98a1 ("x86/kvm:
Move context tracking where it belongs", 2020-07-09) restricted
the RCU extended quiescent state to be closer to vmentry/vmexit.
Moving the tracepoint back to __kvm_wait_lapic_expire is more accurate,
because it will be reported even if vcpu_enter_guest causes multiple
vmentries via the IPI/Timer fast paths, and it allows the removal of
advance_expire_delta.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1650961551-38390-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull page cache updates from Matthew Wilcox:
- Appoint myself page cache maintainer
- Fix how scsicam uses the page cache
- Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS
- Remove the AOP flags entirely
- Remove pagecache_write_begin() and pagecache_write_end()
- Documentation updates
- Convert several address_space operations to use folios:
- is_dirty_writeback
- readpage becomes read_folio
- releasepage becomes release_folio
- freepage becomes free_folio
- Change filler_t to require a struct file pointer be the first
argument like ->read_folio
* tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits)
nilfs2: Fix some kernel-doc comments
Appoint myself page cache maintainer
fs: Remove aops->freepage
secretmem: Convert to free_folio
nfs: Convert to free_folio
orangefs: Convert to free_folio
fs: Add free_folio address space operation
fs: Convert drop_buffers() to use a folio
fs: Change try_to_free_buffers() to take a folio
jbd2: Convert release_buffer_page() to use a folio
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio
reiserfs: Convert release_buffer_page() to use a folio
fs: Remove last vestiges of releasepage
ubifs: Convert to release_folio
reiserfs: Convert to release_folio
orangefs: Convert to release_folio
ocfs2: Convert to release_folio
nilfs2: Remove comment about releasepage
nfs: Convert to release_folio
jfs: Convert to release_folio
...
|
|
Pull iomap updates from Darrick Wong:
"There's a couple of corrections sent in by Andreas for some accounting
errors.
The biggest change this time around is that writeback errors longer
clear pageuptodate nor does XFS invalidate the page cache anymore.
This brings XFS (and gfs2/zonefs) behavior in line with every other
Linux filesystem driver, and fixes some UAF bugs that only cropped up
after willy turned on multipage folios for XFS in 5.18-rc1.
Regrettably, it took all the way to the end of the 5.18 cycle to find
the source of these bugs and reach a consensus that XFS' writeback
failure behavior from 20 years ago is no longer necessary.
Summary:
- Fix a couple of accounting errors in the buffered io code.
- Discontinue the practice of marking folios !uptodate and
invalidating them when writeback fails.
This fixes some UAF bugs when multipage folios are enabled, and
brings the behavior of XFS/gfs/zonefs into alignment with the
behavior of all the other Linux filesystems"
* tag 'iomap-5.19-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: don't invalidate folios after writeback errors
iomap: iomap_write_end cleanup
iomap: iomap_write_failed fix
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This includes several large patches to improve endian handling and
remove sparse warnings. The code previously used in/out, in-place
endianness conversion functions.
Other code cleanup includes the list iterator changes.
Finally, a long standing bug was found and fixed, caused by missed
decrement on an lock struct ref count"
* tag 'dlm-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (28 commits)
dlm: use kref_put_lock in __put_lkb
dlm: use kref_put_lock in put_rsb
dlm: remove unnecessary error assign
dlm: fix missing lkb refcount handling
fs: dlm: cast resource pointer to uintptr_t
dlm: replace usage of found with dedicated list iterator variable
dlm: remove usage of list iterator for list_add() after the loop body
dlm: fix pending remove if msg allocation fails
dlm: fix wake_up() calls for pending remove
dlm: check required context while close
dlm: cleanup lock handling in dlm_master_lookup
dlm: remove found label in dlm_master_lookup
dlm: remove __user conversion warnings
dlm: move conversion to compile time
dlm: use __le types for dlm messages
dlm: use __le types for rcom messages
dlm: use __le types for dlm header
dlm: use __le types for options header
dlm: add __CHECKER__ for false positives
dlm: move global to static inits
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Various bug fixes and cleanups for ext4.
In particular, move the crypto related fucntions from fs/ext4/super.c
into a new fs/ext4/crypto.c, and fix a number of bugs found by fuzzers
and error injection tools"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
ext4: only allow test_dummy_encryption when supported
ext4: fix bug_on in __es_tree_search
ext4: avoid cycles in directory h-tree
ext4: verify dir block before splitting it
ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state
ext4: fix bug_on in ext4_writepages
ext4: refactor and move ext4_ioctl_get_encryption_pwsalt()
ext4: cleanup function defs from ext4.h into crypto.c
ext4: move ext4 crypto code to its own file crypto.c
ext4: fix memory leak in parse_apply_sb_mount_options()
ext4: reject the 'commit' option on ext2 filesystems
ext4: remove duplicated #include of dax.h in inode.c
ext4: fix race condition between ext4_write and ext4_convert_inline_data
ext4: convert symlink external data block mapping to bdev
ext4: add nowait mode for ext4_getblk()
ext4: fix journal_ioprio mount option handling
ext4: mark group as trimmed only if it was fully scanned
ext4: fix use-after-free in ext4_rename_dir_prepare
ext4: add unmount filesystem message
ext4: remove unnecessary conditionals
...
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 fixes for v5.19 merge window:
- Build, sparse, UB, and CFI fixes
- Variable scope fix
- Audio pipe logging fix
- ICL+ DSI NULL dereference fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87sfozuj44.fsf@intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Clean up the allocation of glocks that have an address space attached
- Quota locking fix and quota iomap conversion
- Fix the FITRIM error reporting
- Some list iterator cleanups
* tag 'gfs2-v5.18-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Convert function bh_get to use iomap
gfs2: use i_lock spin_lock for inode qadata
gfs2: Return more useful errors from gfs2_rgrp_send_discards()
gfs2: Use container_of() for gfs2_glock(aspace)
gfs2: Explain some direct I/O oddities
gfs2: replace 'found' with dedicated list iterator variable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Features:
- subpage:
- support for PAGE_SIZE > 4K (previously only 64K)
- make it work with raid56
- repair super block num_devices automatically if it does not match
the number of device items
- defrag can convert inline extents to regular extents, up to now
inline files were skipped but the setting of mount option
max_inline could affect the decision logic
- zoned:
- minimal accepted zone size is explicitly set to 4MiB
- make zone reclaim less aggressive and don't reclaim if there are
enough free zones
- add per-profile sysfs tunable of the reclaim threshold
- allow automatic block group reclaim for non-zoned filesystems, with
sysfs tunables
- tree-checker: new check, compare extent buffer owner against owner
rootid
Performance:
- avoid blocking on space reservation when doing nowait direct io
writes (+7% throughput for reads and writes)
- NOCOW write throughput improvement due to refined locking (+3%)
- send: reduce pressure to page cache by dropping extent pages right
after they're processed
Core:
- convert all radix trees to xarray
- add iterators for b-tree node items
- support printk message index
- user bulk page allocation for extent buffers
- switch to bio_alloc API, use on-stack bios where convenient, other
bio cleanups
- use rw lock for block groups to favor concurrent reads
- simplify workques, don't allocate high priority threads for all
normal queues as we need only one
- refactor scrub, process chunks based on their constraints and
similarity
- allocate direct io structures on stack and pass around only
pointers, avoids allocation and reduces potential error handling
Fixes:
- fix count of reserved transaction items for various inode
operations
- fix deadlock between concurrent dio writes when low on free data
space
- fix a few cases when zones need to be finished
VFS, iomap:
- add helper to check if sb write has started (usable for assertions)
- new helper iomap_dio_alloc_bio, export iomap_dio_bio_end_io"
* tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (173 commits)
btrfs: zoned: introduce a minimal zone size 4M and reject mount
btrfs: allow defrag to convert inline extents to regular extents
btrfs: add "0x" prefix for unsupported optional features
btrfs: do not account twice for inode ref when reserving metadata units
btrfs: zoned: fix comparison of alloc_offset vs meta_write_pointer
btrfs: send: avoid trashing the page cache
btrfs: send: keep the current inode open while processing it
btrfs: allocate the btrfs_dio_private as part of the iomap dio bio
btrfs: move struct btrfs_dio_private to inode.c
btrfs: remove the disk_bytenr in struct btrfs_dio_private
btrfs: allocate dio_data on stack
iomap: add per-iomap_iter private data
iomap: allow the file system to provide a bio_set for direct I/O
btrfs: add a btrfs_dio_rw wrapper
btrfs: zoned: zone finish unused block group
btrfs: zoned: properly finish block group on metadata write
btrfs: zoned: finish block group when there are no more allocatable bytes left
btrfs: zoned: consolidate zone finish functions
btrfs: zoned: introduce btrfs_zoned_bg_is_full
btrfs: improve error reporting in lookup_inline_extent_backref
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
"A single patch to fix zonefs_init_file_inode() return value"
* tag 'zonefs-5.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Fix zonefs_init_file_inode() return value
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs (and fscache) updates from Gao Xiang:
"After working on it on the mailing list for more than half a year, we
finally form 'erofs over fscache' feature into shape. Hopefully it
could bring more possibility to the communities.
The story mainly started from a new project what we called "RAFS v6" [1]
for Nydus image service almost a year ago, which enhances EROFS to be
a new form of one bootstrap (which includes metadata representing the
whole fs tree) + several data-deduplicated content addressable blobs
(actually treated as multiple devices). Each blob can represent one
container image layer but not quite exactly since all new data can be
fully existed in the previous blobs so no need to introduce another
new blob.
It is actually not a new idea (at least on my side it's much like a
simpilied casync [2] for now) and has many benefits over per-file
blobs or some other exist ways since typically each RAFS v6 image only
has dozens of device blobs instead of thousands of per-file blobs.
It's easy to be signed with user keys as a golden image, transfered
untouchedly with minimal overhead over the network, kept in some type
of storage conveniently, and run with (optional) runtime verification
but without involving too many irrelevant features crossing the system
beyond EROFS itself. At least it's our final goal and we're keeping
working on it. There was also a good summary of this approach from the
casync author [3].
Regardless further optimizations, this work is almost done in the
previous Linux release cycles. In this round, we'd like to introduce
on-demand load for EROFS with the fscache/cachefiles infrastructure,
considering the following advantages:
- Introduce new file-based backend to EROFS. Although each image only
contains dozens of blobs but in densely-deployed runC host for
example, there could still be massive blobs on a machine, which is
messy if each blob is treated as a device. In contrast, fscache and
cachefiles are really great interfaces for us to make them work.
- Introduce on-demand load to fscache and EROFS. Previously, fscache
is mainly used to caching network-likewise filesystems, now it can
support on-demand downloading for local fses too with the exact
localfs on-disk format. It has many advantages which we're been
described in the latest patchset cover letter [4]. In addition to
that, most importantly, the cached data is still stored in the
original local fs on-disk format so that it's still the one signed
with private keys but only could be partially available. Users can
fully trust it during running. Later, users can also back up
cachefiles easily to another machine.
- More reliable on-demand approach in principle. After data is all
available locally, user daemon can be no longer online in some use
cases, which helps daemon crash recovery (filesystems can still in
service) and hot-upgrade (user daemon can be upgraded more
frequently due to new features or protocols introduced.)
- Other format can also be converted to EROFS filesystem format over
the internet on the fly with the new on-demand load feature and
mounted. That is entirely possible with on-demand load feature as
long as such archive format metadata can be fetched in advance like
stargz.
In addition, although currently our target user is Nydus image service [5],
but laterly, it can be used for other use cases like on-demand system
booting, etc. As for the fscache on-demand load feature itself,
strictly it can be used for other local fses too. Laterly we could
promote most code to the iomap infrastructure and also enhance it in
the read-write way if other local fses are interested.
Thanks David Howells for taking so much time and patience on this
these months, many thanks with great respect here again! Thanks Jeffle
for working on this feature and Xin Yin from Bytedance for
asynchronous I/O implementation as well as Zichen Tian, Jia Zhu, and
Yan Song for testing, much appeciated. We're also exploring more
possibly over fscache cache management over FSDAX for secure
containers and working on more improvements and useful features for
fscache, cachefiles, and on-demand load.
In addition to "erofs over fscache", NFS export and idmapped mount are
also completed in this cycle for container use cases as well.
Summary:
- Add erofs on-demand load support over fscache
- Support NFS export for erofs
- Support idmapped mounts for erofs
- Don't prompt for risk any more when using big pcluster
- Fix buffer copy overflow of ztailpacking feature
- Several minor cleanups"
[1] https://lore.kernel.org/r/20210730194625.93856-1-hsiangkao@linux.alibaba.com
[2] https://github.com/systemd/casync
[3] http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html
[4] https://lore.kernel.org/r/20220509074028.74954-1-jefflexu@linux.alibaba.com
[5] https://github.com/dragonflyoss/image-service
* tag 'erofs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: (29 commits)
erofs: scan devices from device table
erofs: change to use asynchronous io for fscache readpage/readahead
erofs: add 'fsid' mount option
erofs: implement fscache-based data readahead
erofs: implement fscache-based data read for inline layout
erofs: implement fscache-based data read for non-inline layout
erofs: implement fscache-based metadata read
erofs: register fscache context for extra data blobs
erofs: register fscache context for primary data blob
erofs: add erofs_fscache_read_folios() helper
erofs: add anonymous inode caching metadata for data blobs
erofs: add fscache context helper functions
erofs: register fscache volume
erofs: add fscache mode check helper
erofs: make erofs_map_blocks() generally available
cachefiles: document on-demand read mode
cachefiles: add tracepoints for on-demand read mode
cachefiles: enable on-demand read mode
cachefiles: implement on-demand read
cachefiles: notify the user daemon when withdrawing cookie
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- fix referencing wrong parent directory information during rename
- introduce a sys_tz mount option to use system timezone
- improve performance while zeroing a cluster with dirsync mount option
- fix slab-out-bounds in exat_clear_bitmap() reported from syzbot
* tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: check if cluster num is valid
exfat: reduce block requests when zeroing a cluster
block: add sync_blockdev_range()
exfat: introduce mount option 'sys_tz'
exfat: fix referencing wrong parent directory information after renaming
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull fs idmapping updates from Christian Brauner:
"This contains two minor updates:
- An update to the idmapping documentation by Rodrigo making it
easier to understand that we first introduce several use-cases that
fail without idmapped mounts simply to explain how they can be
handled with idmapped mounts.
- When changing a mount's idmapping we now hold writers to make it
more robust.
This is similar to turning a mount ro with the difference that in
contrast to turning a mount ro changing the idmapping can only ever
be done once while a mount can transition between ro and rw as much
as it wants.
The vfs layer itself takes care to retrieve the idmapping of a
mount once ensuring that the idmapping used for vfs permission
checking is identical to the idmapping passed down to the
filesystem. All filesystems with FS_ALLOW_IDMAP raised take the
same precautions as the vfs in code-paths that are outside of
direct control of the vfs such as ioctl()s.
However, holding writers makes this more robust and predictable for
both the kernel and userspace.
This is a minor user-visible change. But it is extremely unlikely
to matter. The caller must've created a detached mount via
OPEN_TREE_CLONE and then handed that O_PATH fd to another process
or thread which then must've gotten a writable fd for that mount
and started creating files in there while the caller is still
changing mount properties. While not impossible it will be an
extremely rare corner-case and should in general be considered a
bug in the application. Consider making a mount MOUNT_ATTR_NOEXEC
or MOUNT_ATTR_NODEV while allowing someone else to perform lookups
or exec'ing in parallel by handing them a copy of the
OPEN_TREE_CLONE fd or another fd beneath that mount.
I've pinged all major users of idmapped mounts pointing out this
change and none of them have active writers on a mount while still
changing mount properties. It would've been strange if they did.
The rest and majority of the work will be coming through the overlayfs
tree this cycle. In addition to overlayfs this cycle should also see
support for idmapped mounts on erofs as I've acked a patch to this
effect a little while ago"
* tag 'fs.idmapped.v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
fs: hold writers when changing mount's idmapping
docs: Add small intro to idmap examples
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- dvb-usb drivers entries got reworked to avoid usage of magic numbers
to refer to data position inside tables
- vcodec driver has gained support for MT8186 and for vp8 and vp9
stateless codecs
- hantro has gained support for Hantro G1 on RK366x
- Added more h264 levels on coda960
- ccs gained support for MIPI CSI-2 28 bits per pixel raw data type
- venus driver gained support for Qualcomm custom compressed pixel
formats
- lots of driver fixes and updates
* tag 'media/v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (308 commits)
media: hantro: Enable HOLD_CAPTURE_BUF for H.264
media: hantro: Add H.264 field decoding support
media: hantro: h264: Make dpb entry management more robust
media: hantro: Stop using H.264 parameter pic_num
media: rkvdec: Enable capture buffer holding for H264
media: rkvdec-h264: Add field decoding support
media: rkvdec: Ensure decoded resolution fit coded resolution
media: rkvdec: h264: Fix reference frame_num wrap for second field
media: rkvdec: h264: Validate and use pic width and height in mbs
media: rkvdec: Move H264 SPS validation in rkvdec-h264
media: rkvdec: h264: Fix bit depth wrap in pps packet
media: rkvdec: h264: Fix dpb_valid implementation
media: rkvdec: Stop overclocking the decoder
media: v4l2: Reorder field reflist
media: h264: Sort p/b reflist using frame_num
media: v4l2: Trace calculated p/b0/b1 initial reflist
media: h264: Store all fields into the unordered list
media: h264: Store current picture fields
media: h264: Increase reference lists size to 32
media: h264: Use v4l2_h264_reference for reflist
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework updates from Rafael Wysocki:
"These mostly extend the device property API and make it easier to use
in some cases.
Specifics:
- Allow error pointer to be passed to fwnode APIs (Andy Shevchenko).
- Introduce fwnode_for_each_parent_node() (Andy Shevchenko, Douglas
Anderson).
- Advertise fwnode and device property count API calls (Andy
Shevchenko).
- Clean up fwnode_is_ancestor_of() (Andy Shevchenko).
- Convert device_{dma_supported,get_dma_attr} to fwnode (Sakari
Ailus).
- Release subnode properties with data nodes (Sakari Ailus).
- Add ->iomap() and ->irq_get() to fwnode operations (Sakari Ailus)"
* tag 'devprop-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: Advertise fwnode and device property count API calls
device property: Fix recent breakage of fwnode_get_next_parent_dev()
device property: Drop 'test' prefix in parameters of fwnode_is_ancestor_of()
device property: Introduce fwnode_for_each_parent_node()
device property: Allow error pointer to be passed to fwnode APIs
ACPI: property: Release subnode properties with data nodes
device property: Add irq_get to fwnode operation
device property: Add iomap to fwnode operations
ACPI: property: Move acpi_fwnode_device_get_match_data() up
device property: Convert device_{dma_supported,get_dma_attr} to fwnode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add a thermal library and thermal tools to wrap the netlink
interface into event-based callbacks, improve overheat condition
handling during suspend-to-idle on Intel SoCs, add some new hardware
support, fix bugs and clean up code.
Specifics:
- Add thermal library and thermal tools to encapsulate the netlink
into event based callbacks (Daniel Lezcano, Jiapeng Chong).
- Improve overheat condition handling during suspend-to-idle in the
Intel PCH thermal driver (Zhang Rui).
- Use local ops instead of global ops in devfreq_cooling (Kant Fan).
- Clean up _OSC handling in int340x (Davidlohr Bueso).
- Switch hisi_termal from CONFIG_PM_SLEEP guards to pm_sleep_ptr()
(Hesham Almatary).
- Add new k3 j72xx bangdap driver and the corresponding bindings
(Keerthy).
- Fix missing of_node_put() in the SC iMX driver at probe time
(Miaoqian Lin).
- Fix memory leak in __thermal_cooling_device_register()
when device_register() fails by calling
thermal_cooling_device_destroy_sysfs() (Yang Yingliang).
- Add sc8180x and sc8280xp compatible string in the DT bindings and
lMH support for QCom tsens driver (Bjorn Andersson).
- Fix OTP Calibration Register values conforming to the documentation
on RZ/G2L and bindings documentation for RZ/G2UL (Biju Das).
- Fix type in kerneldoc description for __thermal_bind_params
(Corentin Labbe).
- Fix potential NULL dereference in sr_thermal_probe() on Broadcom
platform (Zheng Yongjun).
- Add change mode ops to the thermal-of sensor (Manaf Meethalavalappu
Pallikunhi).
- Fix non-negative value support by preventing the value to be clamp
to zero (Stefan Wahren).
- Add compatible string and DT bindings for MSM8960 tsens driver
(Dmitry Baryshkov).
- Add hwmon support for K3 driver (Massimiliano Minella).
- Refactor and add multiple generations support for QCom ADC driver
(Jishnu Prakash).
- Use platform_get_irq_optional() to get the interrupt on RCar driver
and document Document RZ/V2L bindings (Lad Prabhakar).
- Remove NULL check after container_of() call from the Intel HFI
thermal driver (Haowen Bai)"
* tag 'thermal-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (38 commits)
thermal: intel: pch: improve the cooling delay log
thermal: intel: pch: enhance overheat handling
thermal: intel: pch: move cooling delay to suspend_noirq phase
PM: wakeup: expose pm_wakeup_pending to modules
thermal: k3_j72xx_bandgap: Add the bandgap driver support
dt-bindings: thermal: k3-j72xx: Add VTM bindings documentation
thermal/drivers/imx_sc_thermal: Fix refcount leak in imx_sc_thermal_probe
thermal/core: Fix memory leak in __thermal_cooling_device_register()
dt-bindings: thermal: tsens: Add sc8280xp compatible
dt-bindings: thermal: lmh: Add Qualcomm sc8180x compatible
thermal/drivers/qcom/lmh: Add sc8180x compatible
thermal/drivers/rz2gl: Fix OTP Calibration Register values
dt-bindings: thermal: rzg2l-thermal: Document RZ/G2UL bindings
thermal: thermal_of: fix typo on __thermal_bind_params
tools/thermal: remove unneeded semicolon
tools/lib/thermal: remove unneeded semicolon
thermal/drivers/broadcom: Fix potential NULL dereference in sr_thermal_probe
tools/thermal: Add thermal daemon skeleton
tools/thermal: Add a temperature capture tool
tools/thermal: Add util library
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add support for 'artificial' Energy Models in which power
numbers for different entities may be in different scales, add support
for some new hardware, fix bugs and clean up code in multiple places.
Specifics:
- Update the Energy Model support code to allow the Energy Model to
be artificial, which means that the power values may not be on a
uniform scale with other devices providing power information, and
update the cpufreq_cooling and devfreq_cooling thermal drivers to
support artificial Energy Models (Lukasz Luba).
- Make DTPM check the Energy Model type (Lukasz Luba).
- Fix policy counter decrementation in cpufreq if Energy Model is in
use (Pierre Gondois).
- Add CPU-based scaling support to passive devfreq governor (Saravana
Kannan, Chanwoo Choi).
- Update the rk3399_dmc devfreq driver (Brian Norris).
- Export dev_pm_ops instead of suspend() and resume() in the IIO
chemical scd30 driver (Jonathan Cameron).
- Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
PM-runtime counterparts (Jonathan Cameron).
- Move symbol exports in the IIO chemical scd30 driver into the
IIO_SCD30 namespace (Jonathan Cameron).
- Avoid device PM-runtime usage count underflows (Rafael Wysocki).
- Allow dynamic debug to control printing of PM messages (David
Cohen).
- Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
Bai).
- Preserve ACPI-table override during hibernation (Amadeusz
Sławiński).
- Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).
- Make Intel RAPL power capping driver support the RaptorLake and
AlderLake N processors (Zhang Rui, Sumeet Pawnikar).
- Remove redundant store to value after multiply in the RAPL power
capping driver (Colin Ian King).
- Add AlderLake processor support to the intel_idle driver (Zhang
Rui).
- Fix regression leading to no genpd governor in the PSCI cpuidle
driver and fix the riscv-sbi cpuidle driver to allow a genpd
governor to be used (Ulf Hansson).
- Fix cpufreq governor clean up code to avoid using kfree() directly
to free kobject-based items (Kevin Hao).
- Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe
Leroy).
- Make intel_pstate notify frequency invariance code when no_turbo is
turned on and off (Chen Yu).
- Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
Pandruvada).
- Make cpufreq avoid unnecessary frequency updates due to mismatch
between hardware and the frequency table (Viresh Kumar).
- Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
code (Viresh Kumar).
- Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
calling convention for some driver callbacks consistent (Rafael
Wysocki).
- Avoid accessing half-initialized cpufreq policies from the show()
and store() sysfs functions (Schspa Shi).
- Rearrange cpufreq_offline() to make the calling convention for some
driver callbacks consistent (Schspa Shi).
- Update CPPC handling in cpufreq (Pierre Gondois).
- Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).
- Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
Hansson).
- Improve the way genpd deals with its governors (Ulf Hansson).
- Update the turbostat utility to version 2022.04.16 (Len Brown, Dan
Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)"
* tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits)
PM: domains: Trust domain-idle-states from DT to be correct by genpd
PM: domains: Measure power-on/off latencies in genpd based on a governor
PM: domains: Allocate governor data dynamically based on a genpd governor
PM: domains: Clean up some code in pm_genpd_init() and genpd_remove()
PM: domains: Fix initialization of genpd's next_wakeup
PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd
PM: domains: Measure suspend/resume latencies in genpd based on governor
PM: domains: Move the next_wakeup variable into the struct gpd_timing_data
PM: domains: Allocate gpd_timing_data dynamically based on governor
PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain()
PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd
PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd
PM: domains: Drop redundant code for genpd always-on governor
PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor
powercap: intel_rapl: remove redundant store to value after multiply
cpufreq: CPPC: Enable dvfs_possible_from_any_cpu
cpufreq: CPPC: Enable fast_switch
ACPI: CPPC: Assume no transition latency if no PCCT
ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
ACPI: CPPC: Check _OSC for flexible address space
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA kernel code to upstream revision 20220331,
improve handling of PCI devices that are in D3cold during system
initialization, add support for a few features, fix bugs and clean up
code.
Specifics:
- Update ACPICA code in the kernel to upstream revision 20220331
including the following changes:
- Add support for the Windows 11 _OSI string (Mario Limonciello)
- Add the CFMWS subtable to the CEDT table (Lawrence Hileman).
- iASL: NHLT: Treat Terminator as specific_config (Piotr
Maziarz).
- iASL: NHLT: Fix parsing undocumented bytes at the end of
Endpoint Descriptor (Piotr Maziarz).
- iASL: NHLT: Rename linux specific strucures to device_info
(Piotr Maziarz).
- Add new ACPI 6.4 semantics to Load() and LoadTable() (Bob
Moore).
- Clean up double word in comment (Tom Rix).
- Update copyright notices to the year 2022 (Bob Moore).
- Remove some tabs and // comments - automated cleanup (Bob
Moore).
- Replace zero-length array with flexible-array member (Gustavo
A. R. Silva).
- Interpreter: Add units to time variable names (Paul Menzel).
- Add support for ARM Performance Monitoring Unit Table (Besar
Wicaksono).
- Inform users about ACPI spec violation related to sleep length
(Paul Menzel).
- iASL/MADT: Add OEM-defined subtable (Bob Moore).
- Interpreter: Fix some typo mistakes (Selvarasu Ganesan).
- Updates for revision E.d of IORT (Shameer Kolothum).
- Use ACPI_FORMAT_UINT64 for 64-bit output (Bob Moore).
- Improve debug messages in the ACPI device PM code (Rafael Wysocki).
- Block ASUS B1400CEAE from suspend to idle by default (Mario
Limonciello).
- Improve handling of PCI devices that are in D3cold during system
initialization (Rafael Wysocki).
- Fix BERT error region memory mapping (Lorenzo Pieralisi).
- Add support for NVIDIA 16550-compatible port subtype to the SPCR
parsing code (Jeff Brasen).
- Use static for BGRT_SHOW kobj_attribute defines (Tom Rix).
- Fix missing prototype warning for acpi_agdi_init() (Ilkka
Koskinen).
- Fix missing ERST record ID in the APEI code (Liu Xinpeng).
- Make APEI error injection to refuse to inject into the zero page
(Tony Luck).
- Correct description of INT3407 / INT3532 DPTF attributes in sysfs
(Sumeet Pawnikar).
- Add support for high frequency impedance notification to the DPTF
driver (Sumeet Pawnikar).
- Make mp_config_acpi_gsi() a void function (Li kunyu).
- Unify Package () representation for properties in the ACPI device
properties documentation (Andy Shevchenko).
- Include UUID in _DSM evaluation warning (Michael Niewöhner)"
* tag 'acpi-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (41 commits)
Revert "ACPICA: executer/exsystem: Warn about sleeps greater than 10 ms"
ACPI: utils: include UUID in _DSM evaluation warning
ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default
x86: ACPI: Make mp_config_acpi_gsi() a void function
ACPI: DPTF: Add support for high frequency impedance notification
ACPI: AGDI: Fix missing prototype warning for acpi_agdi_init()
ACPI: bus: Avoid non-ACPI device objects in walks over children
ACPI: DPTF: Correct description of INT3407 / INT3532 attributes
ACPI: BGRT: use static for BGRT_SHOW kobj_attribute defines
ACPI, APEI, EINJ: Refuse to inject into the zero page
ACPI: PM: Always print final debug message in acpi_device_set_power()
ACPI: SPCR: Add support for NVIDIA 16550-compatible port subtype
ACPI: docs: enumeration: Unify Package () for properties (part 2)
ACPI: APEI: Fix missing ERST record id
ACPICA: Update version to 20220331
ACPICA: exsystem.c: Use ACPI_FORMAT_UINT64 for 64-bit output
ACPICA: IORT: Updates for revision E.d
ACPICA: executer/exsystem: Fix some typo mistakes
ACPICA: iASL/MADT: Add OEM-defined subtable
ACPICA: executer/exsystem: Warn about sleeps greater than 10 ms
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- support for pens with 3 buttons with Wacom driver (Joshua Dickens)
- support for HID_DG_SCANTIME to report the timestamp for pen and touch
events in Wacom driver (Joshua Dickens)
- support for sensor discovery in amd-sfh driver (Basavaraj Natikar)
- support for wider variety of Huion tablets ported from DIGImend
project (José Expósito, Nikolai Kondrashov)
- new device IDs and other assorted small code cleanups
* tag 'for-linus-2022052401' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (44 commits)
HID: apple: Properly handle function keys on Keychron keyboards
HID: uclogic: Switch to Digitizer usage for styluses
HID: uclogic: Add pen support for XP-PEN Star 06
HID: uclogic: Differentiate touch ring and touch strip
HID: uclogic: Always shift touch reports to zero
HID: uclogic: Do not focus on touch ring only
HID: uclogic: Return raw parameters from v2 pen init
HID: uclogic: Move param printing to a function
HID: core: Display "SENSOR HUB" for sensor hub bus string in hid_info
HID: amd_sfh: Move bus declaration outside of amd-sfh
HID: amd_sfh: Add physical location to HID device
HID: amd_sfh: Modify the hid name
HID: amd_sfh: Modify the bus name
HID: amd_sfh: Add sensor name by index for debug info
HID: amd_sfh: Add support for sensor discovery
HID: bigben: fix slab-out-of-bounds Write in bigben_probe
Hid: wacom: Fix kernel test robot warning
HID: uclogic: Disable pen usage for Huion keyboard interfaces
HID: uclogic: Support disabling pen usage
HID: uclogic: Pass keyboard reports as is
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi updates from Mark Brown:
"This is quite a quiet release but some new drivers mean that the
diffstat is fairly large. The new drivers include the aspeed driver
which is migrated from MTD as part of the ongoing move of controllers
with specialised support for SPI flashes into the SPI subsystem.
- Support for devices which flip CPHA during recieve only transfers
(eg, if MOSI and MISO have inverted polarity).
- Overhaul of the i.MX driver, including the addition of PIO support
for better performance on small transfers.
- Migration of the Aspeed driver from MTD.
- Support for Aspeed AST2400, Ingenic JZ4775 and X1/2000 and MediaTek
IPM and SFI"
* tag 'spi-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (84 commits)
spi: spi-au1550: replace ternary operator with min()
mtd: spi-nor: aspeed: set the decoding size to at least 2MB for AST2600
spi: aspeed: Calibrate read timings
spi: aspeed: Add support for the AST2400 SPI controller
spi: aspeed: Workaround AST2500 limitations
spi: aspeed: Adjust direct mapping to device size
spi: aspeed: Add support for direct mapping
spi: spi-mem: Convert Aspeed SMC driver to spi-mem
spi: Convert the Aspeed SMC controllers device tree binding
spi: spi-cadence: Update ISR status variable type to irqreturn_t
spi: Doc fix - Describe add_lock and dma_map_dev in spi_controller
spi: cadence-quadspi: Handle spi_unregister_master() in remove()
spi: stm32-qspi: Remove SR_BUSY bit check before sending command
spi: stm32-qspi: Always check SR_TCF flags in stm32_qspi_wait_cmd()
spi: stm32-qspi: Fix wait_cmd timeout in APM mode
spi: cadence-quadspi: remove unnecessary (void *) casts
spi: cadence-quadspi: Add missing blank line in cqspi_request_mmap_dma()
spi: spi-imx: mx51_ecspi_prepare_message(): skip writing MX51_ECSPI_CONFIG register if unchanged
spi: spi-imx: add PIO polling support
spi: spi-imx: replace struct spi_imx_data::bitbang by pointer to struct spi_controller
...
|