2025-03-19  x86/mm: Enable broadcast TLB invalidation for multi-threaded processes  (Rik van Riel)
There is not enough room in the 12-bit ASID address space to hand out broadcast ASIDs to every process. Only hand out broadcast ASIDs to processes when they are observed to be simultaneously running on 4 or more CPUs. This also allows single-threaded processes to continue using the cheaper, local TLB invalidation instructions like INVLPG.

Due to the structure of flush_tlb_mm_range(), the INVLPGB flushing is done in a generically named broadcast_tlb_flush() function which can later also be used for Intel RAR.

Combined with the removal of unnecessary lru_add_drain() calls (see https://lore.kernel.org/r/20241219153253.3da9e8aa@fangorn) this results in a nice performance boost for the will-it-scale tlb_flush2_threads test on an AMD Milan system with 36 cores:

  - vanilla kernel:            527k loops/second
  - lru_add_drain removal:     731k loops/second
  - only INVLPGB:              527k loops/second
  - lru_add_drain + INVLPGB:  1157k loops/second

Profiling with only the INVLPGB changes showed that while TLB invalidation went down from 40% of the total CPU time to only around 4% of CPU time, the contention simply moved to the LRU lock. Fixing both at the same time roughly doubles the number of iterations per second in this case.

Comparing will-it-scale tlb_flush2_threads with several different numbers of threads on a 72 CPU AMD Milan shows similar results. The number represents the total number of loops per second across all the threads:

  threads   tip     INVLPGB
  1         315k    304k
  2         423k    424k
  4         644k    1032k
  8         652k    1267k
  16        737k    1368k
  32        759k    1199k
  64        636k    1094k
  72        609k    993k

1 and 2 thread performance is similar with and without INVLPGB, because INVLPGB is only used on processes using 4 or more CPUs simultaneously. The number is the median across 5 runs.

Some numbers closer to real world performance can be found at Phoronix, thanks to Michael: https://www.phoronix.com/news/AMD-INVLPGB-Linux-Benefits

[ bp:
  - Massage
  - :%s/\<static_cpu_has\>/cpu_feature_enabled/cgi
  - :%s/\<clear_asid_transition\>/mm_clear_asid_transition/cgi
  - Fold in a 0day bot fix: https://lore.kernel.org/oe-kbuild-all/202503040000.GtiWUsBm-lkp@intel.com ]

Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nadav Amit <nadav.amit@gmail.com> Link: https://lore.kernel.org/r/20250226030129.530345-11-riel@surriel.com
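For illustration of the 4-CPU heuristic described above, here is a minimal sketch; the threshold constant and helper name are hypothetical, only use_global_asid() is named elsewhere in this series, and the exact bookkeeping in the merged code differs:

  /*
   * Hypothetical sketch: broadcast ASIDs are scarce (12-bit space), so
   * only promote an mm once it has been observed running on 4 or more
   * CPUs at the same time.
   */
  #define BROADCAST_ASID_MIN_CPUS 4

  static void consider_global_asid_sketch(struct mm_struct *mm)
  {
          /* How many CPUs currently have this mm loaded? */
          if (cpumask_weight(mm_cpumask(mm)) >= BROADCAST_ASID_MIN_CPUS)
                  use_global_asid(mm);    /* switch to INVLPGB-based flushing */
  }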
2025-03-19  x86/mm: Add global ASID process exit helpers  (Rik van Riel)
A global ASID is allocated for the lifetime of a process. Free the global ASID at process exit time. [ bp: Massage, create helpers, hide details inside them. ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-10-riel@surriel.com
2025-03-19  x86/mm: Handle global ASID context switch and TLB flush  (Rik van Riel)
Add context switch and TLB flush support for processes that use a global ASID and PCID across all CPUs. At both context switch time and TLB flush time, check whether a task is switching to a global ASID and, if so, reload the TLB with the new ASID as appropriate. In both code paths, the TLB flush is avoided if a global ASID is used, because the global ASIDs are always kept up to date across CPUs, even when the process is not running on a CPU. [ bp: - Massage - :%s/\<static_cpu_has\>/cpu_feature_enabled/cgi ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-9-riel@surriel.com
2025-03-19  x86/mm: Add global ASID allocation helper functions  (Rik van Riel)
Add functions to manage global ASID space. Multithreaded processes that are simultaneously active on 4 or more CPUs can get a global ASID, resulting in the same PCID being used for that process on every CPU. This in turn will allow the kernel to use hardware-assisted TLB flushing through AMD INVLPGB or Intel RAR for these processes. [ bp: - Extend use_global_asid() comment - s/X86_BROADCAST_TLB_FLUSH/BROADCAST_TLB_FLUSH/g - other touchups ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-8-riel@surriel.com
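To make the idea of a shared global ASID space concrete, here is an allocation sketch; the constants and function are invented for illustration and the real helpers in the x86 mm code differ in detail:

  /*
   * Illustrative only: hand out ASIDs from one global bitmap so the same
   * PCID ends up being used for a given process on every CPU.
   */
  #define GLOBAL_ASID_FIRST       8       /* leave room for per-CPU dynamic ASIDs */
  #define GLOBAL_ASID_MAX         4096    /* 12-bit ASID space */

  static DEFINE_RAW_SPINLOCK(global_asid_lock);
  static DECLARE_BITMAP(global_asid_used, GLOBAL_ASID_MAX);

  static int global_asid_alloc_sketch(void)
  {
          int asid;

          raw_spin_lock(&global_asid_lock);
          asid = find_next_zero_bit(global_asid_used, GLOBAL_ASID_MAX,
                                    GLOBAL_ASID_FIRST);
          if (asid < GLOBAL_ASID_MAX)
                  __set_bit(asid, global_asid_used);
          raw_spin_unlock(&global_asid_lock);

          return asid < GLOBAL_ASID_MAX ? asid : -ENOSPC;
  }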
2025-03-19  x86/mm: Use broadcast TLB flushing in page reclaim  (Rik van Riel)
Page reclaim tracks only the CPU(s) where the TLB needs to be flushed, rather than all the individual mappings that may be getting invalidated. Use broadcast TLB flushing when that is available. [ bp: Massage commit message. ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-7-riel@surriel.com
2025-03-19  x86/mm: Use INVLPGB for kernel TLB flushes  (Rik van Riel)
Use broadcast TLB invalidation for kernel addresses when available. Remove the need to send IPIs for kernel TLB flushes. [ bp: Integrate dhansen's comments additions, merge the flush_tlb_all() change into this one too. ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-5-riel@surriel.com
2025-03-19  x86/mm: Add INVLPGB support code  (Rik van Riel)
Add helper functions and definitions needed to use broadcast TLB invalidation on AMD CPUs. [ bp: - Cleanup commit message - Improve and expand comments - push the preemption guards inside the invlpgb* helpers - merge improvements from dhansen - add !CONFIG_BROADCAST_TLB_FLUSH function stubs because Clang can't do DCE properly yet and looks at the inline asm and complains about it getting a u64 argument on 32-bit code ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-4-riel@surriel.com
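For orientation, the instruction interface can be wrapped roughly as below. This is a sketch based on my reading of the AMD APM, not this patch's exact helpers: raw opcodes are used because older assemblers may not know the mnemonics, rAX carries the (page-aligned) address plus flag bits, EDX the PCID/ASID selectors, and ECX the extra-page count and stride bit.

  /* Rough INVLPGB/TLBSYNC sketch; kernel integer types assumed. */
  static inline void invlpgb_sketch(unsigned long addr, unsigned int flags,
                                    u16 asid, u16 pcid,
                                    u16 extra_pages, bool pmd_stride)
  {
          u64 rax = addr | flags;                 /* address plus flag bits */
          u32 edx = ((u32)pcid << 16) | asid;     /* which translations to target */
          u32 ecx = ((u32)pmd_stride << 31) | extra_pages;

          /* INVLPGB: broadcast the invalidation to all CPUs. */
          asm volatile(".byte 0x0f, 0x01, 0xfe"
                       : : "a" (rax), "c" (ecx), "d" (edx) : "memory");
  }

  static inline void tlbsync_sketch(void)
  {
          /* TLBSYNC: wait until all broadcast invalidations have completed. */
          asm volatile(".byte 0x0f, 0x01, 0xff" ::: "memory");
  }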
2025-03-19  x86/mm: Add INVLPGB feature and Kconfig entry  (Rik van Riel)
In addition, the CPU advertises the maximum number of pages that can be shot down with one INVLPGB instruction in CPUID. Save that information for later use. [ bp: use cpu_has(), typos, massage. ] Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250226030129.530345-3-riel@surriel.com
2025-03-19  x86/mm: Consolidate full flush threshold decision  (Rik van Riel)
Reduce code duplication by consolidating the decision point for whether to do individual invalidations or a full flush inside get_flush_tlb_info(). Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Dave Hansen <dave.hansen@intel.com> Link: https://lore.kernel.org/r/20250226030129.530345-2-riel@surriel.com
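A sketch of what consolidating the decision means in practice; the helper name is invented, while tlb_single_page_flush_ceiling and TLB_FLUSH_ALL are the existing knobs this logic revolves around:

  /*
   * Sketch: decide once, in the get_flush_tlb_info() path, whether a range
   * is worth flushing page by page or should degrade to a full TLB flush,
   * instead of every caller duplicating the check.
   */
  static bool wants_full_flush_sketch(unsigned long start, unsigned long end,
                                      unsigned int stride_shift)
  {
          return end == TLB_FLUSH_ALL ||
                 ((end - start) >> stride_shift) > tlb_single_page_flush_ceiling;
  }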
2025-03-19  x86/mm: Check return value from memblock_phys_alloc_range()  (Philip Redkin)
At least with CONFIG_PHYSICAL_START=0x100000, if there is < 4 MiB of contiguous free memory available at this point, the kernel will crash and burn because memblock_phys_alloc_range() returns 0 on failure, which leads memblock_phys_free() to throw the first 4 MiB of physical memory to the wolves. At a minimum it should fail gracefully with a meaningful diagnostic, but in fact everything seems to work fine without the weird reserve allocation. Signed-off-by: Philip Redkin <me@rarity.fan> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/94b3e98f-96a7-3560-1f76-349eb95ccf7f@rarity.fan
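A hedged sketch of the failure handling being asked for; the wrapper, its parameters and the message are illustrative, only the memblock calls themselves are real API:

  /*
   * memblock_phys_alloc_range() returns 0 on failure; treat that as an
   * error instead of handing address 0 on to memblock_phys_free().
   */
  static void __init reserve_region_sketch(phys_addr_t size, phys_addr_t align,
                                           phys_addr_t limit)
  {
          phys_addr_t addr = memblock_phys_alloc_range(size, align, 0, limit);

          if (!addr) {
                  pr_warn("reserve allocation of %pa bytes failed, skipping\n", &size);
                  return;
          }

          /* ... use the reservation ... */
          memblock_phys_free(addr, size);
  }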
2025-03-19  Merge tag 'v6.14-rc7' into x86/core, to pick up fixes  (Ingo Molnar)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-19  Merge branch 'xa_alloc_cyclic-checks'  (David S. Miller)
Michal Swiatkowski says:

====================
fix xa_alloc_cyclic() return checks

Pierre Riteau <pierre@stackhpc.com> found suspicious handling of an error from xa_alloc_cyclic() in scheduler code [1]. The same is done in a few other places.

v1 --> v2: [2]
 * add fixes tags
 * also fix the same usage in dpll and phy

[1] https://lore.kernel.org/netdev/20250213223610.320278-1-pierre@stackhpc.com/
[2] https://lore.kernel.org/netdev/20250214132453.4108-1-michal.swiatkowski@linux.intel.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-19  phy: fix xa_alloc_cyclic() error handling  (Michal Swiatkowski)
xa_alloc_cyclic() can return 1, which isn't an error. To prevent the caller of this function from treating that as an error, check only for negative values here. Fixes: 384968786909 ("net: phy: Introduce ethernet link topology representation") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
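The pattern shared by the three fixes in this series, shown as a minimal sketch (the wrapper and its arguments are placeholders):

  #include <linux/xarray.h>

  /* xa_alloc_cyclic() returns 1 on success when the index wrapped around. */
  static int example_add(struct xarray *xa, void *entry, u32 *id, u32 *next)
  {
          int err = xa_alloc_cyclic(xa, id, entry, xa_limit_32b, next, GFP_KERNEL);

          if (err < 0)            /* not "if (err)": a return of 1 is not an error */
                  return err;

          return 0;
  }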
2025-03-19  dpll: fix xa_alloc_cyclic() error handling  (Michal Swiatkowski)
If xa_alloc_cyclic() returns 1 (wrapping), ERR_PTR(1) will be returned, which causes IS_ERR() to be false. That can lead to dereferencing an unallocated pointer (pin). Fix it by checking whether err is less than zero. This wasn't found in a real use case, only noticed. Credit to Pierre. Fixes: 97f265ef7f5b ("dpll: allocate pin ids in cycle") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-19  devlink: fix xa_alloc_cyclic() error handling  (Michal Swiatkowski)
If xa_alloc_cyclic() returns 1 (wrapping), ERR_PTR(1) will be returned, which causes IS_ERR() to be false. That can lead to dereferencing an unallocated pointer (rel). Fix it by checking whether err is less than zero. This wasn't found in a real use case, only noticed. Credit to Pierre. Fixes: c137743bce02 ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-19  Merge patch series "netfs: Miscellaneous fixes"  (Christian Brauner)
David Howells <dhowells@redhat.com> says:

Here are some miscellaneous fixes and changes for netfslib:

 (1) Fix the collection of results during a pause in transmission.
 (2) Call ->invalidate_cache() only if provided.
 (3) Fix the rolling buffer to not hammer atomic bit clears when loading from readahead.
 (4) Fix netfs_unbuffered_read() to return ssize_t.

* patches from https://lore.kernel.org/r/20250314164201.1993231-1-dhowells@redhat.com:
  netfs: Fix netfs_unbuffered_read() to return ssize_t rather than int
  netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits
  netfs: Call `invalidate_cache` only if implemented
  netfs: Fix collection of results during pause when collection offloaded

Link: https://lore.kernel.org/r/20250314164201.1993231-1-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-19  netfs: Fix netfs_unbuffered_read() to return ssize_t rather than int  (David Howells)
Fix netfs_unbuffered_read() to return an ssize_t rather than an int as netfs_wait_for_read() returns ssize_t and this gets implicitly truncated. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250314164201.1993231-5-dhowells@redhat.com Acked-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: Viacheslav Dubeyko <slava@dubeyko.com> cc: Alex Markuze <amarkuze@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: ceph-devel@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-19  netfs: Fix rolling_buffer_load_from_ra() to not clear mark bits  (David Howells)
rolling_buffer_load_from_ra() looms large in the perf report because it loops around doing an atomic clear for each of the three mark bits per folio. However, this is both inefficient (it would be better to build a mask and atomically AND them out) and unnecessary as they shouldn't be set. Fix this by removing the loop. Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250314164201.1993231-4-dhowells@redhat.com Acked-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-19  netfs: Call `invalidate_cache` only if implemented  (Max Kellermann)
Many filesystems such as NFS and Ceph do not implement the `invalidate_cache` method. On those filesystems, if writing to the cache (`NETFS_WRITE_TO_CACHE`) fails for some reason, the kernel crashes like this:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0010) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0010 [#1] SMP PTI
  CPU: 9 UID: 0 PID: 3380 Comm: kworker/u193:11 Not tainted 6.13.3-cm4all1-hp #437
  Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
  Workqueue: events_unbound netfs_write_collection_worker
  RIP: 0010:0x0
  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
  RSP: 0018:ffff9b86e2ca7dc0 EFLAGS: 00010202
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 7fffffffffffffff
  RDX: 0000000000000001 RSI: ffff89259d576a18 RDI: ffff89259d576900
  RBP: ffff89259d5769b0 R08: ffff9b86e2ca7d28 R09: 0000000000000002
  R10: ffff89258ceaca80 R11: 0000000000000001 R12: 0000000000000020
  R13: ffff893d158b9338 R14: ffff89259d576900 R15: ffff89259d5769b0
  FS:  0000000000000000(0000) GS:ffff893c9fa40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffffffffd6 CR3: 000000054442e003 CR4: 00000000001706f0
  Call Trace:
   <TASK>
   ? __die+0x1f/0x60
   ? page_fault_oops+0x15c/0x460
   ? try_to_wake_up+0x2d2/0x530
   ? exc_page_fault+0x5e/0x100
   ? asm_exc_page_fault+0x22/0x30
   netfs_write_collection_worker+0xe9f/0x12b0
   ? xs_poll_check_readable+0x3f/0x80
   ? xs_stream_data_receive_workfn+0x8d/0x110
   process_one_work+0x134/0x2d0
   worker_thread+0x299/0x3a0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xba/0xe0
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x50
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>
  Modules linked in:
  CR2: 0000000000000000

This patch adds the missing `NULL` check.

Fixes: 0e0f2dfe880f ("netfs: Dispatch write requests to process a writeback slice") Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250314164201.1993231-3-dhowells@redhat.com Acked-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: netfs@lists.linux.dev cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
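The shape of the added check, as a sketch; the field names follow the netfs naming seen in the trace above, but the exact call site in the patch differs:

  /* Only invalidate the cache if the filesystem implements the hook. */
  static void netfs_invalidate_cache_sketch(struct netfs_io_request *wreq)
  {
          if (wreq->netfs_ops->invalidate_cache)
                  wreq->netfs_ops->invalidate_cache(wreq);
  }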
2025-03-19  netfs: Fix collection of results during pause when collection offloaded  (David Howells)
A netfs read request can run in one of two modes: for synchronous reads, the app thread does the collection of results, and for asynchronous reads, this is offloaded to a worker thread. This is controlled by the NETFS_RREQ_OFFLOAD_COLLECTION flag. Now, if a subrequest incurs an error, the NETFS_RREQ_PAUSE flag is set to temporarily stop the issuing loop from issuing more subrequests until a retry is successful or the request is abandoned. When the issuing loop sees NETFS_RREQ_PAUSE, it jumps to netfs_wait_for_pause(), which will wait for the PAUSE flag to be cleared - and whilst it is waiting, it will call out to the collector as more results accrue... But this is the wrong thing to do if OFFLOAD_COLLECTION is set, as we can then end up with both the app thread and the work item collecting results simultaneously. This manifests itself occasionally when running the generic/323 xfstest against multichannel cifs as an oops that's a bit random but frequently involves io_submit() (the test does lots of simultaneous async DIO reads). Fix this by only doing the collection in netfs_wait_for_pause() if NETFS_RREQ_OFFLOAD_COLLECTION is not set. Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Reported-by: Steve French <stfrench@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20250314164201.1993231-2-dhowells@redhat.com Acked-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
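Illustrated as a fragment, under the assumption that it sits inside the netfs_wait_for_pause() wait loop; collect_read_results() is a stand-in name for the collector entry point, not a real netfs function:

  /* Inside the netfs_wait_for_pause() wait loop (sketch): */
  if (!test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags))
          collect_read_results(rreq);     /* hypothetical collector entry point */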
2025-03-19  fs: load the ->i_sb pointer once in inode_sb_list_{add,del}  (Mateusz Guzik)
While this may sound like a pedantic clean up, it does in fact impact code generation -- the patched add routine is slightly smaller. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250319004635.1820589-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
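The pattern in question, sketched against the familiar inode fields; close to, but not necessarily identical with, the patched helper:

  static void inode_sb_list_add_sketch(struct inode *inode)
  {
          struct super_block *sb = inode->i_sb;   /* load ->i_sb once */

          spin_lock(&sb->s_inode_list_lock);
          list_add(&inode->i_sb_list, &sb->s_inodes);
          spin_unlock(&sb->s_inode_list_lock);
  }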
2025-03-19  fuse: fix uring race condition for null dereference of fc  (Joanne Koong)
There is a race condition leading to a kernel crash from a null dereference when attempting to access fc->lock in fuse_uring_create_queue(). fc may be NULL in the case where another thread is creating the uring in fuse_uring_create() and has set fc->ring but has not yet set ring->fc when fuse_uring_create_queue() reads ring->fc. There is another race condition as well where in fuse_uring_register(), ring->nr_queues may still be 0 and not yet set to the new value when we compare qid against it. This fix sets fc->ring only after ring->fc and ring->nr_queues have been set, which guarantees now that ring->fc is a proper pointer when any queues are created and ring->nr_queues reflects the right number of queues if ring is not NULL. We must use smp_store_release() and smp_load_acquire() semantics to ensure the ordering will remain correct where fc->ring is assigned only after ring->fc and ring->nr_queues have been assigned. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20250318003028.3330599-1-joannelkoong@gmail.com Fixes: 24fe962c86f5 ("fuse: {io-uring} Handle SQEs - register commands") Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
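The publish/consume ordering being described, as a generic sketch; field names follow the commit message, the helpers are invented and the surrounding locking is omitted:

  /* Creator: initialise the ring fully, then publish it. */
  static void fuse_ring_publish_sketch(struct fuse_conn *fc,
                                       struct fuse_ring *ring, int nr_queues)
  {
          ring->fc = fc;
          ring->nr_queues = nr_queues;
          /* Pairs with the smp_load_acquire() below. */
          smp_store_release(&fc->ring, ring);
  }

  /* Reader: only trust ring->fc / ring->nr_queues after the acquire. */
  static struct fuse_ring *fuse_ring_get_sketch(struct fuse_conn *fc)
  {
          return smp_load_acquire(&fc->ring);
  }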
2025-03-19  afs: Fix afs_atcell_get_link() to check if ws_cell is unset first  (David Howells)
Fix afs_atcell_get_link() to check if the workstation cell is unset before doing the RCU pathwalk bit where we dereference that. Fixes: 823869e1e616 ("afs: Fix afs_atcell_get_link() to handle RCU pathwalk") Reported-by: syzbot+76a6f18e3af82e84f264@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2481796.1742296819@warthog.procyon.org.uk Tested-by: syzbot+76a6f18e3af82e84f264@syzkaller.appspotmail.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-19  umount: Allow superblock owners to force umount  (Trond Myklebust)
Loosen the permission check on forced umount to allow users holding CAP_SYS_ADMIN privileges in namespaces that are privileged with respect to the userns that originally mounted the filesystem. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Link: https://lore.kernel.org/r/12f212d4ef983714d065a6bb372fbb378753bf4c.1742315194.git.trond.myklebust@hammerspace.com Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
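Conceptually, the loosened check looks something like the sketch below (ns_capable() against the superblock's owning user namespace); the real patch may place and phrase the check differently:

  /*
   * Allow a forced umount if the caller has CAP_SYS_ADMIN in a namespace
   * that is privileged over the userns that mounted the filesystem.
   */
  static bool may_force_umount_sketch(const struct super_block *sb)
  {
          return ns_capable(sb->s_user_ns, CAP_SYS_ADMIN);
  }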
2025-03-19  docs/zh_CN: fix spelling mistake  (Peng Jiang)
The word watermark was misspelled as "watemark". Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Signed-off-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/20250317100811126QvOaWRPxSgm2ttU5faitl@zte.com.cn
2025-03-19  docs/Chinese: change the disclaimer words  (Alex Shi)
The 2-paragraph warning and note take up a bit more space; merge them together and point readers to the other maintainers and reviewers. Cc: Yanteng Si <si.yanteng@linux.dev> Cc: Dongliang Mu <dzm91@hust.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Reviewed-by: Tang Yizhou <yizhou.tang@shopee.com> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Signed-off-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/20250305025101.27717-1-alexs@kernel.org
2025-03-19  docs/zh_CN: Add snp-tdx-threat-model index Chinese translation  (Yuxian Mao)
Translate .../security/snp-tdx-threat-model.rst into Chinese. The translation is updated through commit cdae7e8a69c3 ("docs/MAINTAINERS: Update my email address"). A pdfdocs warning was fixed by Alex Shi. Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Yuxian Mao <maoyuxian@cqsoftware.com.cn> Signed-off-by: Alex Shi <alexs@kernel.org> Link: https://lore.kernel.org/r/20250304071401.117780-1-maoyuxian@cqsoftware.com.cn
2025-03-18  drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2  (Tomasz Pakuła)
Currently, it seems the code was carried over from RDNA3, because it assumes two possible values to set. RDNA4, instead of having:

  0: min SCLK
  1: max SCLK

only has:

  0: SCLK offset

This change makes it so the driver only reports the current offset value instead of showing possible min/max values and their indices. Moreover, it now only accepts the offset as a value, without the index. Additionally, the lower bound was printed as %u by mistake.

Old:
  OD_SCLK_OFFSET:
  0: -500Mhz
  1: 1000Mhz
  OD_MCLK:
  0: 97Mhz
  1: 1259MHz
  OD_VDDGFX_OFFSET:
  0mV
  OD_RANGE:
  SCLK_OFFSET: -500Mhz 1000Mhz
  MCLK: 97Mhz 1500Mhz
  VDDGFX_OFFSET: -200mv 0mv

New:
  OD_SCLK_OFFSET:
  0Mhz
  OD_MCLK:
  0: 97Mhz
  1: 1259MHz
  OD_VDDGFX_OFFSET:
  0mV
  OD_RANGE:
  SCLK_OFFSET: -500Mhz 1000Mhz
  MCLK: 97Mhz 1500Mhz
  VDDGFX_OFFSET: -200mv 0mv

Setting this offset:
  Old: "s 1 <offset>"
  New: "s <offset>"

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4036 Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Tomasz Pakuła <tomasz.pakula.oficjalny@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1cfeb60e6e8837b1de5eb4e17df7cf31f4442144) Cc: stable@vger.kernel.org # 6.12.x
2025-03-18  drm/amd/display: Fix incorrect fw_state address in dmub_srv  (Lo-an Chen)
[WHY] The fw_state in dmub_srv was assigned the wrong address, which pointed into the firmware region. [HOW] Fix the firmware state by using DMUB_DEBUG_FW_STATE_OFFSET in dmub_cmd.h. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Lo-an Chen <lo-an.chen@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f57b38ac85a01bf03020cc0a9761d63e5c0ce197)
2025-03-18  drm/amd/display: Use HW lock mgr for PSR1 when only one eDP  (Mario Limonciello)
[WHY] DMUB locking is important to make sure that registers aren't accessed while in PSR. Previously it was enabled but caused a deadlock in situations with multiple eDP panels. [HOW] Detect if multiple eDP panels are in use to decide whether to use the lock. Refactor the function so that it first checks whether PSR-SU and then replay are in use, to avoid having to look up the number of eDP panels for those configurations. Fixes: f245b400a223 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965 Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ed569e1279a3045d6b974226c814e071fa0193a6) Cc: stable@vger.kernel.org
2025-03-18  drm/amd/display: Fix message for support_edp0_on_dp1  (Yilin Chen)
[WHY] The info message was wrong when support_edp0_on_dp1 is enabled [HOW] Use correct info message for support_edp0_on_dp1 Fixes: f6d17270d18a ("drm/amd/display: add a quirk to enable eDP0 on DP1") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Yilin Chen <Yilin.Chen@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 79538e6365c99d7b1c3e560d1ea8d11ef8313465) Cc: stable@vger.kernel.org
2025-03-18  drm/amdkfd: Fix user queue validation on Gfx7/8  (Philip Yang)
To work around a queue-full h/w issue on Gfx7/8, when an application creates an AQL queue, the ring buffer BO is allocated with size queue_size/2 and the queue_size ring buffer is mapped to the GPU in 2 pieces using 2 attachments, each with map size queue_size/2 and the same ring_bo backing memory. For Gfx7/8, user queue buffer validation should therefore use queue_size/2 to verify the ring_bo allocation and mapping size. Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers") Suggested-by: Tomáš Trnka <trnka@scm.com> Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e7a477735f1771b9a9346a5fbd09d7ff0641723a) Cc: stable@vger.kernel.org
2025-03-18  drm/amdgpu: Restore uncached behaviour on GFX12  (David Belanger)
Always use MTYPE_UC if UNCACHED flag is specified. This makes kernarg region uncached and it restores usermode cache disable debug flag functionality. Do not set MTYPE_UC for COHERENT flag, on GFX12 coherence is handled by shader code. Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit eb6cdfb807d038d9b9986b5c87188f28a4071eae) Cc: stable@vger.kernel.org # 6.12.x
2025-03-18  drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()  (Wentao Liang)
In gfx_v12_0_cp_gfx_load_me_microcode_rs64(), gfx_v12_0_pfp_fini() is incorrectly used to free the 'me' field of 'gfx', since gfx_v12_0_pfp_fini() can only release the 'pfp' field of 'gfx'. The release function for the 'me' field should be gfx_v12_0_me_fini(). Fixes: 52cb80c12e8a ("drm/amdgpu: Add gfx v12_0 ip block support (v6)") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ebdc52607a46cda08972888178c6aa9cd6965141) Cc: stable@vger.kernel.org # 6.12.x
2025-03-18  drm/amdkfd: Fix instruction hazard in gfx12 trap handler  (Jay Cornwall)
VALU instructions with SGPR source need wait states to avoid hazard with SALU using different SGPR. v2: Eliminate some hazards to reduce code explosion Signed-off-by: Jay Cornwall <jay.cornwall@amd.com> Reviewed-by: Lancelot Six <lancelot.six@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7e0459d453b911435673edd7a86eadc600c63238) Cc: stable@vger.kernel.org # 6.12.x
2025-03-18  drm/amdgpu/pm: wire up hwmon fan speed for smu 14.0.2  (Alex Deucher)
Add callbacks for fan speed fetching. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4034 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 90df6db62fa78a8ab0b705ec38db99c7973b95d6) Cc: stable@vger.kernel.org # 6.12.x
2025-03-18  drm/amd/pm: add unique_id for gfx12  (Harish Kasiviswanathan)
Expose unique_id for gfx12 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 16fbc18cb07470cd33fb5f37ad181b51583e6dc0) Cc: stable@vger.kernel.org # 6.12.x
2025-03-18  drm/amdgpu: Remove JPEG from vega and carrizo video caps  (David Rosca)
JPEG is only supported for VCN1+. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0a6e7b06bdbead2e43d56a2274b7e0c9c86d536e) Cc: stable@vger.kernel.org
2025-03-18  drm/amdgpu: Fix JPEG video caps max size for navi1x and raven  (David Rosca)
8192x8192 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6e0d2fde3ae8fdb5b47e10389f23ed2cb4daec5d) Cc: stable@vger.kernel.org
2025-03-18  drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size  (David Rosca)
1920x1088 is the maximum supported resolution. Signed-off-by: David Rosca <david.rosca@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1a0807feb97082bff2b1342dbbe55a2a9a8bdb88) Cc: stable@vger.kernel.org
2025-03-18  drm/radeon: fix uninitialized size issue in radeon_vce_cs_parse()  (Nikita Zhandarovich)
On the off chance that the command stream passed from userspace via an ioctl() call to radeon_vce_cs_parse() is weirdly crafted and the first command to execute is to encode (case 0x03000001), the function in question will attempt to call radeon_vce_cs_reloc() with a size argument that has not been properly initialized. Specifically, 'size' will point to the 'tmp' variable before the latter has had a chance to be assigned any value. Play it safe and init 'tmp' with 0, thus ensuring that radeon_vce_cs_reloc() will catch an early error in cases like these. Found by Linux Verification Center (linuxtesting.org) with the static analysis tool SVACE. Fixes: 2fc5703abda2 ("drm/radeon: check VCE relocation buffer range v3") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2d52de55f9ee7aaee0e09ac443f77855989c6b68) Cc: stable@vger.kernel.org
2025-03-18  Merge tag 'pmdomain-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm  (Linus Torvalds)
Pull pmdomain fix from Ulf Hansson:

 - Fix amlogic T7 ISP secpower

* tag 'pmdomain-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
  pmdomain: amlogic: fix T7 ISP secpower
2025-03-18  fs: drop the lock trip around I_NEW wake up in evict()  (Mateusz Guzik)
The unhashed state check in __wait_on_freeing_inode() performed with ->i_lock held against remove_hash_inode() also holding the lock makes another lock acquire in evict() completely spurious -- all potential sleepers already dropped the lock before remove_hash_inode() acquired it or they found the inode to be unhashed and aborted. Note there is no trickery here: the usual cost of both sides taking locks is still being paid, it just stops being paid twice. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250317160707.1694135-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-18  fs: use wq_has_sleeper() in end_dir_add()  (Mateusz Guzik)
The routine is used a lot, while the wakeup almost never has anyone to deal with. wake_up_all() takes an irq-protected spinlock, wq_has_sleeper() "only" contains a full fence -- not free by any means, but still cheaper.

Sample result tracing waiters using a custom probe during -j 20 kernel build (0 - no waiters, 1 - waiters):

  @[
      wakeprobe+5
      __wake_up_common+63
      __wake_up+54
      __d_add+234
      d_splice_alias+146
      ext4_lookup+439
      path_openat+1746
      do_filp_open+195
      do_sys_openat2+153
      __x64_sys_openat+86
      do_syscall_64+82
      entry_SYSCALL_64_after_hwframe+118
  ]:
  [0, 1)   13999 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
  [1, ...)     1 |                                                    |

Only 1 call out of 14000 with this backtrace had waiters.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250316232421.1642758-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
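The resulting pattern, as a minimal sketch:

  /*
   * wake_up_all() always takes an irq-protected spinlock; wq_has_sleeper()
   * is a full barrier plus an emptiness check, so skip the wakeup when
   * nobody is waiting (the overwhelmingly common case above).
   */
  static void end_dir_add_sketch(struct wait_queue_head *wq)
  {
          if (wq_has_sleeper(wq))
                  wake_up_all(wq);
  }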
2025-03-18  VFS/autofs: try_lookup_one_len() does not need any locks  (NeilBrown)
try_lookup_one_len() is identical to lookup_one_unlocked() except that it doesn't include the call to lookup_slow(). The latter doesn't need the inode to be locked, so the former cannot either. So fix the documentation, remove the WARN_ON and fix the only caller to not take the lock. Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/174190517441.9342.5956460781380903128@noble.neil.brown.name Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-18  fs: dedup handling of struct filename init and refcounts bumps  (Mateusz Guzik)
No functional changes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250313142744.1323281-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-18  fs: consistently deref the files table with rcu_dereference_raw()  (Mateusz Guzik)
... except when the table is known to be only used by one thread. A file pointer can get installed at any moment despite the ->file_lock being held since the following: 8a81252b774b53e6 ("fs/file.c: don't acquire files->file_lock in fd_install()") Accesses subject to such a race can in principle suffer load tearing. While here redo the comment in dup_fd -- it only covered a race against files showing up, still assuming fd_install() takes the lock. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20250313135725.1320914-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-18  MAINTAINERS: append initramfs files to the VFS section  (David Disseldorp)
At the moment it's a little unclear where initramfs patches should be sent. This should see them end up on the linux-fsdevel mailing list. Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250318040711.20683-1-ddiss@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-18  pinctrl: spacemit: PINCTRL_SPACEMIT_K1 should not default to y unconditionally  (Geert Uytterhoeven)
Merely enabling compile-testing should not enable additional functionality. Fixes: 7ff4faba63571c51 ("pinctrl: spacemit: enable config option") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Yixun Lan <dlan@gentoo.org> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Alex Elder <elder@riscstar.com> Reviewed-by: Alex Elder <elder@riscstar.com> Link: https://lore.kernel.org/6881b8d1ad74ac780af8a974e604b5ef3f5d4aad.1742198691.git.geert+renesas@glider.be Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-03-19  ipv6: Set errno after ip_fib_metrics_init() in ip6_route_info_create().  (Kuniyuki Iwashima)
While creating a new IPv6 route, we could get a weird -ENOMEM when RTA_NH_ID is set and either of the conditions below is true:

  1) CONFIG_IPV6_SUBTREES is enabled and rtm_src_len is specified
  2) nexthop_get() fails

e.g.)

  # strace ip -6 route add fe80::dead:beef:dead:beef nhid 1 from ::
  recvmsg(3, {msg_iov=[{iov_base=[...[ {error=-ENOMEM, msg=[... [...]]}, [{nla_len=49, nla_type=NLMSGERR_ATTR_MSG}, "Nexthops can not be used with so"...] ]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148

Let's set err explicitly after ip_fib_metrics_init() in ip6_route_info_create().

Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250312013854.61125-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>