Age  Commit message  Author
2023-09-09Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "Mostly small stragglers that missed the initial merge. Driver updates are qla2xxx and smartpqi (mpt3sas has a high diffstat due to the volatile qualifier removal, fnic due to unused function removal and sd.c has a lot of code shuffling to remove forward declarations)" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (38 commits) scsi: ufs: core: No need to update UPIU.header.flags and lun in advanced RPMB handler scsi: ufs: core: Add advanced RPMB support where UFSHCI 4.0 does not support EHS length in UTRD scsi: mpt3sas: Remove volatile qualifier scsi: mpt3sas: Perform additional retries if doorbell read returns 0 scsi: libsas: Simplify sas_queue_reset() and remove unused code scsi: ufs: Fix the build for the old ARM OABI scsi: qla2xxx: Fix unused variable warning in qla2xxx_process_purls_pkt() scsi: fnic: Remove unused functions fnic_scsi_host_start/end_tag() scsi: qla2xxx: Fix spelling mistake "tranport" -> "transport" scsi: fnic: Replace sgreset tag with max_tag_id scsi: qla2xxx: Remove unused variables in qla24xx_build_scsi_type_6_iocbs() scsi: qla2xxx: Fix nvme_fc_rcv_ls_req() undefined error scsi: smartpqi: Change driver version to 2.1.24-046 scsi: smartpqi: Enhance error messages scsi: smartpqi: Enhance controller offline notification scsi: smartpqi: Enhance shutdown notification scsi: smartpqi: Simplify lun_number assignment scsi: smartpqi: Rename pciinfo to pci_info scsi: smartpqi: Rename MACRO to clarify purpose scsi: smartpqi: Add abort handler ...
2023-09-09Merge tag 'driver-core-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds
Pull driver symbol lookup fix from Greg KH: "Here is one last fixup for your tree for 6.6-rc1. It resolves a problem with the way that symbol_get was changed in the module tree merge in your tree to fix up the DVB drivers which rely on this old api to attach new devices. As the changelog comment says: In commit 9011e49d54dc ("modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly restricted to GPL-only marked symbols. This interacts oddly with the DVB logic which only uses dvb_attach() to load the dvb driver which then uses symbol_get(). Fix this up by properly marking all of the dvb_attach attach symbols as EXPORT_SYMBOL_GPL(). This has been acked by Hans from the V4L driver side, Luis from the module side, Mauro on the media side, and Christoph said it was the correct solution, and was tested by the original reporter of the issue. It has passed 0-day testing, but has not been in linux-next due to it only being sent yesterday" * tag 'driver-core-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: media: dvb: symbol fixup for dvb_attach()
2023-09-09Merge tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: - move a dma-debug call that prints a message out from a lock that's causing problems with the lock order in serial drivers (Sergey Senozhatsky) - fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right dependency and not default to y (Christoph Hellwig) - move an ifdef a bit to remove a __maybe_unused that seems to trip up some sensitivities (Christoph Hellwig) - revert a bogus check in the CMA allocator (Zhenhua Huang) * tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping: Revert "dma-contiguous: check for memory region overlap" dma-pool: remove a __maybe_unused label in atomic_pool_expand dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
2023-09-09Merge tag 'pci-v6.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pciLinus Torvalds
Pull PCI fixes from Bjorn Helgaas: - Add PCI_DYNAMIC_OF_NODES dependency on OF_IRQ to fix sparc64 build error (Lizhi Hou) - After coalescing host bridge resources, free any released resources to avoid a leak (Ross Lagerwall) - Revert a quirk that prevented NVIDIA T4 GPUs from using Secondary Bus Reset. The quirk worked around an issue that we now think is related to the Root Port, not the GPU (Bjorn Helgaas) * tag 'pci-v6.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: Revert "PCI: Mark NVIDIA T4 GPUs to avoid bus reset" PCI: Free released resource after coalescing PCI: Fix CONFIG_PCI_DYNAMIC_OF_NODES kconfig dependencies
2023-09-09Merge tag 'ntb-6.6' of https://github.com/jonmason/ntbLinus Torvalds
Pull NTB updates from Jon Mason: "Link toggling fixes and debugfs error path fixes" [ And for everybody like me who always have to remind themselves what the TLA of the day is, and what NTB stands for - it's a PCIe "Non-Transparent Bridge" thing - Linus ] * tag 'ntb-6.6' of https://github.com/jonmason/ntb: ntb: Check tx descriptors outstanding instead of head/tail for tx queue ntb: Fix calculation ntb_transport_tx_free_entry() ntb: Drop packets when qp link is down ntb: Clean up tx tail index on link down ntb: amd: Drop unnecessary error check for debugfs_create_dir NTB: ntb_tool: Switch to memdup_user_nul() helper dtivers: ntb: fix parameter check in perf_setup_dbgfs() ntb: Remove error checking for debugfs_create_dir()
2023-09-09nfsd: fix change_info in NFSv4 RENAME repliesJeff Layton
nfsd sends the transposed directory change info in the RENAME reply. The source directory is in save_fh and the target is in current_fh. Reported-by: Zhi Li <yieli@redhat.com> Reported-by: Benjamin Coddington <bcodding@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2218844 Signed-off-by: Jeff Layton <jlayton@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-09-09spnego: add missing OID to oid registrySteve French
Add missing OID to the registry. Some servers and clients (including Windows) now request "NEGOEX - SPNEGO Extended Negotiation Security". See https://datatracker.ietf.org/doc/html/draft-zhu-negoex-02 Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-09media: dvb: symbol fixup for dvb_attach()Greg Kroah-Hartman
In commit 9011e49d54dc ("modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly restricted to GPL-only marked symbols. This interacts oddly with the DVB logic which only uses dvb_attach() to load the dvb driver which then uses symbol_get(). Fix this up by properly marking all of the dvb_attach attach symbols as EXPORT_SYMBOL_GPL(). Fixes: 9011e49d54dc ("modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules") Cc: stable <stable@kernel.org> Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: linux-media@vger.kernel.org Cc: linux-modules@vger.kernel.org Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Link: https://lore.kernel.org/r/20230908092035.3815268-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
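The interaction described in this changelog can be illustrated with a small user-space model. This is only a sketch, not kernel code: `toy_symbol`, `toy_exports`, and `toy_symbol_get()` are hypothetical names, and the real `symbol_get()` machinery is considerably more involved. The model assumes only what the changelog states, namely that a lookup restricted to GPL-only exports will not hand out an attach function that was exported without the GPL marking.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of one entry in an export table. */
struct toy_symbol {
	const char *name;
	int gpl_only;	/* 1 if exported the way EXPORT_SYMBOL_GPL() would mark it */
	void *addr;
};

/* Stand-ins for two drivers' attach functions (illustrative only). */
int toy_frontend_attach_impl;
int toy_other_attach_impl;

const struct toy_symbol toy_exports[] = {
	{ "toy_frontend_attach", 1, &toy_frontend_attach_impl },
	{ "toy_other_attach",    0, &toy_other_attach_impl },
};

/*
 * Look up a symbol by name. Only entries marked GPL-only are returned,
 * mirroring the restriction the changelog describes; anything else
 * (including an unknown name) yields NULL.
 */
void *toy_symbol_get(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(toy_exports) / sizeof(toy_exports[0]); i++) {
		if (strcmp(toy_exports[i].name, name) == 0)
			return toy_exports[i].gpl_only ? toy_exports[i].addr : NULL;
	}
	return NULL;
}
```

In this model, marking the second entry as GPL-only is the analogue of the fix: the lookup then succeeds for it as well.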
2023-09-08Merge tag '6.6-rc-ksmbd' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server update from Steve French: "After two years, many fixes and much testing, ksmbd is no longer experimental" * tag '6.6-rc-ksmbd' of git://git.samba.org/ksmbd: ksmbd: remove experimental warning
2023-09-08Merge tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarrayLinus Torvalds
Pull xarray fixes from Matthew Wilcox: - Fix a bug encountered by people using bittorrent where they'd get NULL pointer dereferences on page cache lookups when using XFS - Two documentation fixes * tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarray: idr: fix param name in idr_alloc_cyclic() doc xarray: Document necessary flag in alloc functions XArray: Do not return sibling entries from xa_load()
2023-09-08Merge tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fix null_blk polled IO timeout handling (Chengming) - Regression fix for swapped arguments in drbd bvec_set_page() (Christoph) - String length handling fix for s390 dasd (Heiko) - Fixes for blk-throttle accounting (Yu) - Fix page pinning issue for same page segments (Christoph) - Remove redundant file_remove_privs() call (Christoph) - Fix a regression in partition handling for devices not supporting partitions (Li) * tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux: drbd: swap bvec_set_page len and offset block: fix pin count management when merging same-page segments null_blk: fix poll request timeout handling s390/dasd: fix string length handling block: don't add or resize partition on the disk with GENHD_FL_NO_PART block: remove the call to file_remove_privs in blkdev_write_iter blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice() blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice() blk-throttle: fix wrong comparation while 'carryover_ios/bytes' is negative blk-throttle: print signed value 'carryover_bytes/ios' for user
2023-09-08Merge tag 'io_uring-6.6-2023-09-08' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: "A few fixes that should go into the 6.6-rc merge window: - Fix for a regression this merge window caused by the SQPOLL affinity patch, where we can race with SQPOLL thread shutdown and cause an oops when trying to set affinity (Gabriel) - Fix for a regression this merge window where fdinfo reading for a ring set up with IORING_SETUP_NO_SQARRAY will attempt to dereference the nonexistent SQ ring array (me) - Add the patch that allows more fine-grained control over who can use io_uring (Matteo) - Locking fix for a regression added this merge window for IOPOLL overflow (Pavel) - IOPOLL fix for stable, breaking our loop if helper threads are exiting (Pavel) Also had a fix for unreaped iopoll requests from io-wq from Ming, but we found an issue with that and hence it got reverted. Will get this sorted for a future rc" * tag 'io_uring-6.6-2023-09-08' of git://git.kernel.dk/linux: Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()" io_uring: fix unprotected iopoll overflow io_uring: break out of iowq iopoll on teardown io_uring: add a sysctl to disable io_uring system-wide io_uring/fdinfo: only print ->sq_array[] if it's there io_uring: fix IO hang in io_wq_put_and_exit from do_exit() io_uring: Don't set affinity on a dying sqpoll thread
2023-09-08selftests/ftrace: Fix dependencies for some of the synthetic event testsNaveen N Rao
Commit b81a3a100cca1b ("tracing/histogram: Add simple tests for stacktrace usage of synthetic events") changed the output text in tracefs README, but missed updating some of the dependencies specified in selftests. This causes some of the tests to exit as unsupported. Fix this by changing the grep pattern. Since we want these tests to work on older kernels, match only against the common last part of the pattern. Link: https://lore.kernel.org/linux-trace-kernel/20230614091046.2178539-1-naveen@kernel.org Cc: <linux-kselftest@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Fixes: b81a3a100cca ("tracing/histogram: Add simple tests for stacktrace usage of synthetic events") Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-08tracing: Remove unused trace_event_file dir fieldSteven Rostedt (Google)
Now that the eventfs structure is used to create the events directory via the eventfs dynamic allocation code, the "dir" field of the trace_event_file structure is no longer used. Remove it. Link: https://lkml.kernel.org/r/20230908022001.580400115@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-08tracing: Use the new eventfs descriptor for print triggerSteven Rostedt (Google)
The check to create the print event "trigger" was using the obsolete "dir" value of the trace_event_file to determine if it should create the trigger or not. But that value will now be NULL because it uses the event file descriptor. Change it to test the "ef" field of the trace_event_file structure so that the trace_marker "trigger" file appears again. Link: https://lkml.kernel.org/r/20230908022001.371815239@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ajay Kaher <akaher@vmware.com> Fixes: 27152bceea1df ("eventfs: Move tracing/events to eventfs") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-08ring-buffer: Do not attempt to read past "commit"Steven Rostedt (Google)
When iterating over the ring buffer while the ring buffer is active, the writer can corrupt the reader. There are barriers to help detect and handle this, but that code missed the case where the last event is at the very end of the page and has only 4 bytes left. The checks that detect corruption of reads by the writer need to see the length of the event. If the length in the first 4 bytes is zero, then the length is stored in the second 4 bytes. But if the writer is in the process of updating that code, there is a small window where the length in the first 4 bytes could be zero even though the length is only 4 bytes. That will cause rb_event_length() to read the next 4 bytes, which could happen to be off the allocated page. To protect against this, fail immediately if the next event pointer is less than 8 bytes from the end of the commit (last byte of data), as all events must be a minimum of 8 bytes anyway. Link: https://lore.kernel.org/all/20230905141245.26470-1-Tze-nan.Wu@mediatek.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230907122820.0899019c@gandalf.local.home Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Reported-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
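The guard described in this changelog can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel's implementation: `toy_read_event_length()` and `TOY_MIN_EVENT_SIZE` are hypothetical names, and the event layout is reduced to a 4-byte length word optionally followed by a second 4-byte length word, which is not the real ring buffer event header encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The changelog states that every event is at least 8 bytes long. */
#define TOY_MIN_EVENT_SIZE 8u

/*
 * Hypothetical reader helper. 'commit' is the number of valid data bytes
 * on the page and 'offset' is where the next event would start.
 * Returns 1 and stores the length in *len when the event header can be
 * inspected safely, or 0 when the read must be refused.
 */
int toy_read_event_length(const uint8_t *page, size_t commit,
			  size_t offset, uint32_t *len)
{
	uint32_t first, second;

	/* Fail immediately if fewer than 8 bytes remain before the commit. */
	if (offset > commit || commit - offset < TOY_MIN_EVENT_SIZE)
		return 0;

	memcpy(&first, page + offset, sizeof(first));
	if (first != 0) {
		*len = first;
		return 1;
	}

	/*
	 * A zero first word means the length is in the second 4 bytes.
	 * The check above guarantees those bytes lie within the commit.
	 */
	memcpy(&second, page + offset + 4, sizeof(second));
	*len = second;
	return 1;
}
```

The point of checking before reading is that a transiently zero first word can no longer steer the reader into bytes beyond the valid data.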
2023-09-08tracefs/eventfs: Free top level files on removalSteven Rostedt (Google)
When an instance is removed, the top level files of the eventfs directory are not cleaned up. Call eventfs_remove() on each of the entries to free them.

This was found via kmemleak:

unreferenced object 0xffff8881047c1280 (size 96):
  comm "mkdir", pid 924, jiffies 4294906489 (age 2013.077s)
  hex dump (first 32 bytes):
    18 31 ed 03 81 88 ff ff 00 31 09 24 81 88 ff ff  .1.......1.$....
    00 00 00 00 00 00 00 00 98 19 7c 04 81 88 ff ff  ..........|.....
  backtrace:
    [<000000000fa46b4d>] kmalloc_trace+0x2a/0xa0
    [<00000000e729cd0c>] eventfs_prepare_ef.constprop.0+0x3a/0x160
    [<000000009032e6a8>] eventfs_add_events_file+0xa0/0x160
    [<00000000fe968442>] create_event_toplevel_files+0x6f/0x130
    [<00000000e364d173>] event_trace_add_tracer+0x14/0x140
    [<00000000411840fa>] trace_array_create_dir+0x52/0xf0
    [<00000000967804fa>] trace_array_create+0x208/0x370
    [<00000000da505565>] instance_mkdir+0x6b/0xb0
    [<00000000dc1215af>] tracefs_syscall_mkdir+0x5b/0x90
    [<00000000a8aca289>] vfs_mkdir+0x272/0x380
    [<000000007709b242>] do_mkdirat+0xfc/0x1d0
    [<00000000c0b6d219>] __x64_sys_mkdir+0x78/0xa0
    [<0000000097b5dd4b>] do_syscall_64+0x3f/0x90
    [<00000000a3f00cfa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
unreferenced object 0xffff888103ed3118 (size 8):
  comm "mkdir", pid 924, jiffies 4294906489 (age 2013.077s)
  hex dump (first 8 bytes):
    65 6e 61 62 6c 65 00 00                          enable..
  backtrace:
    [<0000000010f75127>] __kmalloc_node_track_caller+0x51/0x160
    [<000000004b3eca91>] kstrdup+0x34/0x60
    [<0000000050074d7a>] eventfs_prepare_ef.constprop.0+0x53/0x160
    [<000000009032e6a8>] eventfs_add_events_file+0xa0/0x160
    [<00000000fe968442>] create_event_toplevel_files+0x6f/0x130
    [<00000000e364d173>] event_trace_add_tracer+0x14/0x140
    [<00000000411840fa>] trace_array_create_dir+0x52/0xf0
    [<00000000967804fa>] trace_array_create+0x208/0x370
    [<00000000da505565>] instance_mkdir+0x6b/0xb0
    [<00000000dc1215af>] tracefs_syscall_mkdir+0x5b/0x90
    [<00000000a8aca289>] vfs_mkdir+0x272/0x380
    [<000000007709b242>] do_mkdirat+0xfc/0x1d0
    [<00000000c0b6d219>] __x64_sys_mkdir+0x78/0xa0
    [<0000000097b5dd4b>] do_syscall_64+0x3f/0x90
    [<00000000a3f00cfa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Link: https://lore.kernel.org/linux-trace-kernel/20230907175859.6fedbaa2@gandalf.local.home
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ajay Kaher <akaher@vmware.com>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-09-08smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTUSteve French
There was a minor typo in the define for SMB2_GLOBAL_CAP_LARGE_MTU: 0X00000004 instead of 0x00000004. Make it consistent. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-08Merge tag 'thermal-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds
Pull more thermal control updates from Rafael Wysocki: "Eliminate an obsolete thermal zone registration function" * tag 'thermal-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: core: Drop thermal_zone_device_register() thermal: Use thermal_tripless_zone_device_register() thermal: core: Add function for registering tripless thermal zones thermal: core: Clean up headers of thermal zone registration functions
2023-09-08md: fix warning for holder mismatch from export_rdev()Yu Kuai
Commit a1d767191096 ("md: use mddev->external to select holder in export_rdev()") fixed the problem that 'claim_rdev' is used for blkdev_get_by_dev() while 'rdev' is used for blkdev_put().

However, if mddev->external is changed from 0 to 1, then 'rdev' is used for blkdev_get_by_dev() while 'claim_rdev' is used for blkdev_put(). This problem can be reproduced reliably as follows:

New file: mdadm/tests/23rdev-lifetime

  devname=${dev0##*/}
  devt=`cat /sys/block/$devname/dev`
  pid=""
  runtime=2

  clean_up_test() {
          kill -9 $pid
          echo clear > /sys/block/md0/md/array_state
  }

  trap 'clean_up_test' EXIT

  add_by_sysfs() {
          while true; do
                  echo $devt > /sys/block/md0/md/new_dev
          done
  }

  remove_by_sysfs(){
          while true; do
                  echo remove > /sys/block/md0/md/dev-${devname}/state
          done
  }

  echo md0 > /sys/module/md_mod/parameters/new_array || die "create md0 failed"

  add_by_sysfs &
  pid="$pid $!"

  remove_by_sysfs &
  pid="$pid $!"

  sleep $runtime
  exit 0

Test cmd:

  ./test --save-logs --logdir=/tmp/ --keep-going --dev=loop --tests=23rdev-lifetime

Test result:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 960 at block/bdev.c:618 blkdev_put+0x27c/0x330
  Modules linked in: multipath md_mod loop
  CPU: 0 PID: 960 Comm: test Not tainted 6.5.0-rc2-00121-g01e55c376936-dirty #50
  RIP: 0010:blkdev_put+0x27c/0x330
  Call Trace:
   <TASK>
   export_rdev.isra.23+0x50/0xa0 [md_mod]
   mddev_unlock+0x19d/0x300 [md_mod]
   rdev_attr_store+0xec/0x190 [md_mod]
   sysfs_kf_write+0x52/0x70
   kernfs_fop_write_iter+0x19a/0x2a0
   vfs_write+0x3b5/0x770
   ksys_write+0x74/0x150
   __x64_sys_write+0x22/0x30
   do_syscall_64+0x40/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fix the problem by recording if 'rdev' is used as holder.

Fixes: a1d767191096 ("md: use mddev->external to select holder in export_rdev()")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825025532.1523008-3-yukuai1@huaweicloud.com
2023-09-08md: don't dereference mddev after export_rdev()Yu Kuai
Except for the initial reference, mddev->kobject is referenced by rdev->kobject, and if the last rdev is freed, there is no guarantee that mddev is still valid. Hence mddev should not be used anymore after export_rdev().

This problem can be triggered, at a very low rate, by the following mdadm test:

New file: mdadm/tests/23rdev-lifetime

  devname=${dev0##*/}
  devt=`cat /sys/block/$devname/dev`
  pid=""
  runtime=2

  clean_up_test() {
          kill -9 $pid
          echo clear > /sys/block/md0/md/array_state
  }

  trap 'clean_up_test' EXIT

  add_by_sysfs() {
          while true; do
                  echo $devt > /sys/block/md0/md/new_dev
          done
  }

  remove_by_sysfs(){
          while true; do
                  echo remove > /sys/block/md0/md/dev-${devname}/state
          done
  }

  echo md0 > /sys/module/md_mod/parameters/new_array || die "create md0 failed"

  add_by_sysfs &
  pid="$pid $!"

  remove_by_sysfs &
  pid="$pid $!"

  sleep $runtime
  exit 0

Test cmd:

  ./test --save-logs --logdir=/tmp/ --keep-going --dev=loop --tests=23rdev-lifetime

Test result:

  general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bcb: 0000 [#4] PREEMPT SMP
  CPU: 0 PID: 1292 Comm: test Tainted: G D W 6.5.0-rc2-00121-g01e55c376936 #562
  RIP: 0010:md_wakeup_thread+0x9e/0x320 [md_mod]
  Call Trace:
   <TASK>
   mddev_unlock+0x1b6/0x310 [md_mod]
   rdev_attr_store+0xec/0x190 [md_mod]
   sysfs_kf_write+0x52/0x70
   kernfs_fop_write_iter+0x19a/0x2a0
   vfs_write+0x3b5/0x770
   ksys_write+0x74/0x150
   __x64_sys_write+0x22/0x30
   do_syscall_64+0x40/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fix this problem by not dereferencing mddev after export_rdev().

Fixes: 3ce94ce5d05a ("md: fix duplicate filename for rdev")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230825025532.1523008-2-yukuai1@huaweicloud.com
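The lifetime rule stated in this changelog follows a general pattern: read whatever is needed from the parent object before dropping the reference that keeps it alive. The user-space sketch below illustrates that pattern only. All names (`toy_mddev`, `toy_rdev`, `toy_export_rdev()`, and so on) are hypothetical, the reference counting is deliberately naive, and it assumes the initial reference has already been dropped so that the rdev holds the last one. It is not the md driver's actual code or the actual patch.

```c
#include <stdlib.h>

/* Hypothetical parent object kept alive only by references from children. */
struct toy_mddev {
	int refs;
	int needs_wakeup;
};

/* Hypothetical child object holding one reference on its parent. */
struct toy_rdev {
	struct toy_mddev *mddev;
};

/* Set when the parent has been freed, so the behavior can be observed. */
int toy_mddev_freed;

/* Create a child whose parent has exactly one reference (the child's). */
struct toy_rdev *toy_rdev_new(int needs_wakeup)
{
	struct toy_mddev *m = malloc(sizeof(*m));
	struct toy_rdev *r = malloc(sizeof(*r));

	if (!m || !r) {
		free(m);
		free(r);
		return NULL;
	}
	m->refs = 1;
	m->needs_wakeup = needs_wakeup;
	r->mddev = m;
	return r;
}

/* Free the child and drop its reference; the parent may be freed here. */
void toy_export_rdev(struct toy_rdev *r)
{
	struct toy_mddev *m = r->mddev;

	free(r);
	if (--m->refs == 0) {
		toy_mddev_freed = 1;
		free(m);
	}
}

/* Remove a child and report whether a wakeup was wanted. */
int toy_remove_rdev(struct toy_rdev *r)
{
	/* Read from the parent while it is still guaranteed to be valid. */
	int wake = r->mddev->needs_wakeup;

	toy_export_rdev(r);
	/* The parent must not be touched from here on. */
	return wake;
}
```

Reading `needs_wakeup` after the call to `toy_export_rdev()` instead would be the analogue of the use-after-free shown in the trace.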
2023-09-08Merge tag 'pm-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds
Pull power management fix from Rafael Wysocki: "Fix an Intel RAPL power capping driver regression introduced during the 6.5 development cycle (Srinivas Pandruvada)" * tag 'pm-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: powercap: intel_rapl: Fix invalid setting of Power Limit 4
2023-09-08Merge tag 'gpio-fixes-for-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linuxLinus Torvalds
Pull gpio fix from Bartosz Golaszewski: - fix a regression in irqchip setup in gpio-zynq * tag 'gpio-fixes-for-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: zynq: restore zynq_gpio_irq_reqres/zynq_gpio_irq_relres callbacks
2023-09-08Revert "PCI: Mark NVIDIA T4 GPUs to avoid bus reset"Bjorn Helgaas
This reverts commit d5af729dc2071273f14cbb94abbc60608142fd83. d5af729dc207 ("PCI: Mark NVIDIA T4 GPUs to avoid bus reset") avoided Secondary Bus Reset on the T4 because the reset seemed to not work when the T4 was directly attached to a Root Port. But NVIDIA thinks the issue is probably related to some issue with the Root Port, not with the T4. The T4 provides neither PM nor FLR reset, so masking bus reset compromises this device for assignment scenarios. Revert d5af729dc207 as requested by Wu Zongyong. This will leave SBR broken in the specific configuration Wu tested, as it was in v6.5, so Wu will debug that further. Link: https://lore.kernel.org/r/ZPqMCDWvITlOLHgJ@wuzongyong-alibaba Link: https://lore.kernel.org/r/20230908201104.GA305023@bhelgaas Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2023-09-08Merge tag 'sound-fix-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/soundLinus Torvalds
Pull sound fixes from Takashi Iwai: "A collection of fixes for 6.6-rc1. All small and easy ones. - The corrections of the previous PCM iov_iter transitions - Regression fixes in MIDI 2.0 / USB changes - Various ASoC codec fixes for Cirrus, Realtek, WCD - ASoC AMD quirks and ASoC Intel AVS driver workaround" * tag 'sound-fix-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits) ALSA: hda/realtek - ALC287 I2S speaker platform support ASoC: amd: yc: Fix a non-functional mic on Lenovo 82TL ASoC: Intel: avs: Provide support for fallback topology ALSA: seq: Fix snd_seq_expand_var_event() call to user-space ALSA: usb-audio: Fix potential memory leaks at error path for UMP open ALSA: hda/cirrus: Fix broken audio on hardware with two CS42L42 codecs. ASoC: rt5645: NULL pointer access when removing jack ASoC: amd: yc: Add DMI entries to support Victus by HP Gaming Laptop 15-fb0xxx (8A3E) MAINTAINERS: Update the MAINTAINERS enties for TEXAS INSTRUMENTS ASoC DRIVERS ALSA: sb: Fix wrong argument in commented code ALSA: pcm: Fix error checks of default read/write copy ops ASoC: Name iov_iter argument as iterator instead of buffer ASoC: dmaengine: Drop unused iov_iter for process callback ALSA: hda/tas2781: Use standard clamp() macro ASoC: cs35l56: Waiting for firmware to boot must be tolerant of I/O errors ASoC: dt-bindings: fsl_easrc: Add support for imx8mp-easrc ASoC: cs42l43: Fix missing error code in cs42l43_codec_probe() ASoC: cs35l45: Rename DACPCM1 Source control ASoC: cs35l45: Fix "Dead assigment" warning ASoC: cs35l45: Add support for Chip ID 0x35A460 ...
2023-09-08Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds
Pull arm64 fixes from Will Deacon: "The main one is a fix for a broken strscpy() conversion that landed in the merge window and broke early parsing of the kernel command line. - Fix an incorrect mask in the CXL PMU driver - Fix a regression in early parsing of the kernel command line - Fix an IP checksum OoB access reported by syzbot" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: csum: Fix OoB access in IP checksum code for negative lengths arm64/sysreg: Fix broken strncpy() -> strscpy() conversion perf: CXL: fix mismatched number of counters mask
2023-09-08Merge tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongsonLinus Torvalds
Pull LoongArch updates from Huacai Chen: - Allow usage of LSX/LASX in the kernel, and use them for SIMD-optimized RAID5/RAID6 routines - Add Loongson Binary Translation (LBT) extension support - Add basic KGDB & KDB support - Add building with kcov coverage - Add KFENCE (Kernel Electric-Fence) support - Add KASAN (Kernel Address Sanitizer) support - Some bug fixes and other small changes - Update the default config file * tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits) LoongArch: Update Loongson-3 default config file LoongArch: Add KASAN (Kernel Address Sanitizer) support LoongArch: Simplify the processing of jumping new kernel for KASLR kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping LoongArch: Add KFENCE (Kernel Electric-Fence) support LoongArch: Get partial stack information when providing regs parameter LoongArch: mm: Add page table mapped mode support for virt_to_page() kfence: Defer the assignment of the local variable addr LoongArch: Allow building with kcov coverage LoongArch: Provide kaslr_offset() to get kernel offset LoongArch: Add basic KGDB & KDB support LoongArch: Add Loongson Binary Translation (LBT) extension support raid6: Add LoongArch SIMD recovery implementation raid6: Add LoongArch SIMD syndrome calculation LoongArch: Add SIMD-optimized XOR routines LoongArch: Allow usage of LSX/LASX in the kernel LoongArch: Define symbol 'fault' as a local label in fpu.S LoongArch: Adjust {copy, clear}_user exception handler behavior LoongArch: Use static defined zero page rather than allocated ...
2023-09-08Merge tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linuxLinus Torvalds
Pull printk fix from Petr Mladek: - Revert exporting symbols needed for dumping the raw printk buffer in panic(). I pushed the export prematurely before the user was ready for merging into the mainline. * tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: Revert "printk: export symbols for debug modules"
2023-09-08Merge tag 'landlock-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linuxLinus Torvalds
Pull landlock updates from Mickaël Salaün: "One test fix and a __counted_by annotation" * tag 'landlock-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: selftests/landlock: Fix a resource leak landlock: Annotate struct landlock_rule with __counted_by
2023-09-08soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are metLad Prabhakar
To prevent randconfig build issues when enabling the RZ/Five SoC, consider selecting specific configurations only when their dependencies are satisfied. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308311610.ec6bm2G8-lkp@intel.com/ Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Fixes: 484861e09f3e ("soc: renesas: Kconfig: Select the required configs for RZ/Five SoC") Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20230901110936.313171-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES configLad Prabhakar
Andes errata uses sbi_ecall(), which is only available if RISCV_SBI is enabled. So add a dependency on RISCV_SBI in the ERRATA_ANDES config to avoid any build failures. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202308311610.ec6bm2G8-lkp@intel.com/ Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20230901110320.312674-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO configLad Prabhakar
Now that RISCV_DMA_NONCOHERENT conditionally selects DMA_DIRECT_REMAP, i.e. only if MMU is enabled, we no longer need the MMU dependency in the ERRATA_ANDES_CMO config. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20230901105858.311745-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabledLad Prabhakar
kernel/dma/mapping.c has its use of pgprot_dmacoherent() inside an #ifdef CONFIG_MMU block. kernel/dma/pool.c has its use of pgprot_dmacoherent() inside an #ifdef CONFIG_DMA_DIRECT_REMAP block. So select DMA_DIRECT_REMAP only if MMU is enabled for the RISCV_DMA_NONCOHERENT config. This way users do not have to select MMU explicitly. Suggested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://lore.kernel.org/r/20230901105111.311200-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "bpf, riscv: use BPF prog pack allocator in BPF JIT"Palmer Dabbelt
Puranjay Mohan <puranjay12@gmail.com> says: Here is some data to prove the V2 fixes the problem: Without this series: root@rv-selftester:~/src/kselftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m47.562s user 0m24.145s sys 6m37.064s With this series applied: root@rv-selftester:~/src/selftest/bpf# time ./test_tag test_tag: OK (40945 tests) real 7m29.472s user 0m25.865s sys 6m18.401s BPF programs currently consume a page each on RISCV. For systems with many BPF programs, this adds significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system. Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue. It packs multiple BPF programs into a single huge page. It is currently only enabled for the x86_64 BPF JIT. I enabled this allocator on the ARM64 BPF JIT[2]. It is being reviewed now. This patch series enables the BPF prog pack allocator for the RISCV BPF JIT. ====================================================== Performance Analysis of prog pack allocator on RISCV64 ====================================================== Test setup: =========== Host machine: Debian GNU/Linux 11 (bullseye) Qemu Version: QEMU emulator version 8.0.3 (Debian 1:8.0.3+dfsg-1) u-boot-qemu Version: 2023.07+dfsg-1 opensbi Version: 1.3-1 To test the performance of the BPF prog pack allocator on RV, a stresser tool[4] linked below was built. This tool loads 8 BPF programs on the system and triggers 5 of them in an infinite loop by doing system calls. The runner script starts 20 instances of the above which loads 8*20=160 BPF programs on the system, 5*20=100 of which are being constantly triggered. The script is passed a command which would be run in the above environment. 
The script was run with the following perf command: ./run.sh "perf stat -a \ -e iTLB-load-misses \ -e dTLB-load-misses \ -e dTLB-store-misses \ -e instructions \ --timeout 60000" The output of the above command is discussed below before and after enabling the BPF prog pack allocator. The tests were run on qemu-system-riscv64 with 8 cpus, 16G memory. The rootfs was created using Bjorn's riscv-cross-builder[5] docker container linked below. Results ======= Before enabling prog pack allocator: ------------------------------------ Performance counter stats for 'system wide': 4939048 iTLB-load-misses 5468689 dTLB-load-misses 465234 dTLB-store-misses 1441082097998 instructions 60.045791200 seconds time elapsed After enabling prog pack allocator: ----------------------------------- Performance counter stats for 'system wide': 3430035 iTLB-load-misses 5008745 dTLB-load-misses 409944 dTLB-store-misses 1441535637988 instructions 60.046296600 seconds time elapsed Improvements in metrics ======================= It was expected that the iTLB-load-misses would decrease as now a single huge page is used to keep all the BPF programs compared to a single page for each program earlier. -------------------------------------------- The improvement in iTLB-load-misses: -30.5 % -------------------------------------------- I repeated this experiment more than 100 times in different setups and the improvement was always greater than 30%. This patch series is boot tested on the Starfive VisionFive 2 board[6]. The performance analysis was not done on the board because it doesn't expose iTLB-load-misses, etc. 
The stresser program was run on the board to test the loading and unloading of BPF programs [1] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/ [2] https://lore.kernel.org/all/20230626085811.3192402-1-puranjay12@gmail.com/ [3] https://lore.kernel.org/all/20230626085811.3192402-2-puranjay12@gmail.com/ [4] https://github.com/puranjaymohan/BPF-Allocator-Bench [5] https://github.com/bjoto/riscv-cross-builder [6] https://www.starfivetech.com/en/site/boards * b4-shazam-merge: bpf, riscv: use prog pack allocator in the BPF JIT riscv: implement a memset like function for text riscv: extend patch_text_nosync() for multiple pages bpf: make bpf_prog_pack allocator portable Link: https://lore.kernel.org/r/20230831131229.497941-1-puranjay12@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
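The reported iTLB improvement can be re-derived from the two perf counter values quoted above. The following is a minimal sketch of that arithmetic only; the function name percent_change() is an illustrative assumption and is not part of the patch series.

```c
#include <assert.h>

/*
 * Relative change from `before` to `after`, in percent.
 * A negative result means the counter went down.
 * Hypothetical helper, for illustration only.
 */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

Calling percent_change(4939048.0, 3430035.0) with the iTLB-load-misses values from the results gives a value of roughly -30.5, in line with the figure stated in the cover letter.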
2023-09-08Merge patch series "riscv: Introduce KASLR"Palmer Dabbelt
Alexandre Ghiti <alexghiti@rivosinc.com> says: The following KASLR implementation allows randomizing the kernel mapping: - virtually: we expect the bootloader to provide a seed in the device-tree - physically: only implemented in the EFI stub, it relies on the firmware to provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation, hence patch 3 factors out the KASLR-related functions so that riscv can reuse them. The new virtual kernel location is limited by the early page table that only has one PUD and with the PMD alignment constraint, the kernel can only take < 512 positions. * b4-shazam-merge: riscv: libstub: Implement KASLR by using generic functions libstub: Fix compilation warning for rv32 arm64: libstub: Move KASLR handling functions to kaslr.c riscv: Dump out kernel offset information on panic riscv: Introduce virtual kernel mapping KASLR Link: https://lore.kernel.org/r/20230722123850.634544-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
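The "< 512 positions" limit follows from dividing one PUD-sized window into PMD-aligned slots. Below is a rough sketch of that arithmetic, assuming a 1 GiB PUD and a 2 MiB PMD (the usual sizes with 4 KiB pages). The SKETCH_* constants and both function names are assumptions made for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for a 4 KiB page configuration; not taken from the series. */
#define SKETCH_PMD_SIZE (UINT64_C(2) << 20)  /* 2 MiB */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)  /* 1 GiB */

/* Number of PMD-aligned slots left in one PUD once the kernel image fits. */
uint64_t kaslr_nr_positions(uint64_t kernel_size)
{
    assert(kernel_size > 0 && kernel_size <= SKETCH_PUD_SIZE);
    return (SKETCH_PUD_SIZE - kernel_size) / SKETCH_PMD_SIZE;
}

/* Map a seed onto one of those slots; returns a PMD-aligned virtual offset. */
uint64_t kaslr_offset(uint64_t seed, uint64_t kernel_size)
{
    uint64_t nr = kaslr_nr_positions(kernel_size);

    if (nr == 0)
        return 0;  /* no room to move: keep the default location */
    return (seed % nr) * SKETCH_PMD_SIZE;
}
```

Because the kernel image itself occupies part of the 1 GiB window, the slot count is always strictly below 1 GiB / 2 MiB = 512, which matches the constraint described in the cover letter.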
2023-09-08Merge patch "RISC-V: Add ptrace support for vectors"Palmer Dabbelt
This resurrects the vector ptrace() support that was removed for 6.5 due to some bugs cropping up as part of the GDB review process. * b4-shazam-merge: RISC-V: Add ptrace support for vectors Link: https://lore.kernel.org/r/20230825050248.32681-1-andy.chiu@sifive.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "Add non-coherent DMA support for AX45MP"Palmer Dabbelt
Prabhakar <prabhakar.csengg@gmail.com> says: From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> non-coherent DMA support for AX45MP ==================================== On the Andes AX45MP core, cache coherency is a specification option so it may not be supported. In this case DMA will fail. To work around this issue this patch series does the below: 1] Andes alternative ports is implemented as errata which checks if the IOCP is missing and only then applies the CMO errata. One vendor specific SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of errata. Below are the configs which Andes port provides (and are selected by RZ/Five): - ERRATA_ANDES - ERRATA_ANDES_CMO OpenSBI patch supporting ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI is now part of the v1.3 release. 2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA) block that allows dynamic adjustment of memory attributes at runtime. It contains a configurable amount of PMA entries implemented as CSR registers to control the attributes of memory locations of interest. OpenSBI configures the PMA regions as required and creates a reserved memory node and propagates it to the higher boot stack. Currently OpenSBI (upstream) configures the required PMA region and passes it as a shared DMA pool to Linux. reserved-memory { #address-cells = <2>; #size-cells = <2>; ranges; pma_resv0@58000000 { compatible = "shared-dma-pool"; reg = <0x0 0x58000000 0x0 0x08000000>; no-map; linux,dma-default; }; }; The above shared DMA pool gets appended to Linux DTB so the DMA memory requests go through this region. 3] We provide callbacks to synchronize specific content between memory and cache. 
4] RZ/Five SoC selects the below configs - AX45MP_L2_CACHE - DMA_GLOBAL_POOL - ERRATA_ANDES - ERRATA_ANDES_CMO ----------x---------------------x--------------------x---------------x---- * b4-shazam-merge: soc: renesas: Kconfig: Select the required configs for RZ/Five SoC cache: Add L2 cache management for Andes AX45MP RISC-V core dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller riscv: mm: dma-noncoherent: nonstandard cache operations support riscv: errata: Add Andes alternative ports riscv: asm: vendorid_list: Add Andes Technology to the vendors list Link: https://lore.kernel.org/r/20230818135723.80612-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "riscv: dma-mapping: unify support for cache flushes"Palmer Dabbelt
Prabhakar <prabhakar.csengg@gmail.com> says: From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> This patch series is a subset from Arnd's original series [0]. I've just picked up the bits required for RISC-V unification of cache flushing. Remaining patches from the series [0] will be taken care of by Arnd soon. * b4-shazam-merge: riscv: dma-mapping: switch over to generic implementation riscv: dma-mapping: skip invalidation before bidirectional DMA riscv: dma-mapping: only invalidate after DMA, not flush Link: https://lore.kernel.org/r/20230816232336.164413-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-09-08Merge patch series "RISC-V: Probe for misaligned access speed"Palmer Dabbelt
Evan Green <evan@rivosinc.com> says: The current setting for the hwprobe bit indicating misaligned access speed is controlled by a vendor-specific feature probe function. This is essentially a per-SoC table we have to maintain on behalf of each vendor going forward. Let's convert that instead to something we detect at runtime. We have two assembly routines at the heart of our probe: one that does a bunch of word-sized accesses (without aligning its input buffer), and the other that does byte accesses. If we can move a larger number of bytes using misaligned word accesses than we can with the same amount of time doing byte accesses, then we can declare misaligned accesses as "fast". The tradeoff of reducing this maintenance burden is boot time. We spend 4-6 jiffies per core doing this measurement (0-2 on jiffie edge alignment, and 4 on measurement). The timing loop was based on raid6_choose_gen(), which uses (16+1)*N jiffies (where N is the number of algorithms). By taking only the fastest iteration out of all attempts for use in the comparison, variance between runs is very low. On my THead C906, it looks like this: [ 0.047563] cpu0: Ratio of byte access time to unaligned word access is 4.34, unaligned accesses are fast Several others have chimed in with results on slow machines with the older algorithm, which took all runs into account, including noise like interrupts. Even with this variation, results indicate that in all cases (fast, slow, and emulated) the measured numbers are nowhere near each other (always multiple factors away). * b4-shazam-merge: RISC-V: alternative: Remove feature_probe_func RISC-V: Probe for unaligned access speed Link: https://lore.kernel.org/r/20230818194136.4084400-1-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
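The decision rule described above (take the fastest run of each routine, then compare) can be sketched in user-space C as follows. This is only an illustration of the comparison step: the timing loops and assembly routines are omitted, and the names fastest_run(), byte_to_word_ratio_x100() and misaligned_is_fast() are assumptions, not the functions added by the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keep only the fastest sample so that interrupts and other noise drop out. */
uint64_t fastest_run(const uint64_t *samples, int n)
{
    assert(n > 0);
    uint64_t best = samples[0];

    for (int i = 1; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Byte-access time divided by misaligned word-access time, in hundredths. */
uint64_t byte_to_word_ratio_x100(uint64_t word_time, uint64_t byte_time)
{
    assert(word_time > 0);
    return byte_time * 100 / word_time;
}

/* "Fast" means misaligned word accesses move the same data in less time. */
bool misaligned_is_fast(const uint64_t *word_samples,
                        const uint64_t *byte_samples, int n)
{
    return fastest_run(word_samples, n) < fastest_run(byte_samples, n);
}
```

With hypothetical best times of 100 units for word accesses and 434 units for byte accesses, the ratio helper returns 434, which would be reported as 4.34 in the style of the boot message quoted above.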
2023-09-08selftests: Keep symlinks, when possibleBjörn Töpel
When kselftest is built/installed with the 'gen_tar' target, rsync is used for the installation step to copy files. Extra care is needed for tests that have symlinks. Commit ae108c48b5d2 ("selftests: net: Fix cross-tree inclusion of scripts") added '-L' (transform symlink into referent file/dir) to rsync, to fix dangling links. However, that broke some tests where the symlink (being a symlink) is part of the test (e.g. exec:execveat). Use rsync's '--copy-unsafe-links', which does the right thing. Fixes: ae108c48b5d2 ("selftests: net: Fix cross-tree inclusion of scripts") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-09-08selftests: fix dependency checker scriptRicardo B. Marliere
This patch fixes inconsistencies in the parsing rules of levels 1 and 2 of kselftest_deps.sh. Levels 4 and 5 were added to account for a few edge cases that are present in some tests, and some minor indentation styling has been fixed (s/ /\t/g). Signed-off-by: Ricardo B. Marliere <rbmarliere@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-09-08kselftest/runner.sh: Propagate SIGTERM to runner childBjörn Töpel
Timeouts in kselftest are done using the "timeout" command with the "--foreground" option. Without the "foreground" option, it is not possible for a user to cancel the runner using SIGINT, because the signal is not propagated to timeout which is running in a different process group. The "foreground" option places the timeout in the same process group as its parent, but only sends the SIGTERM (on timeout) signal to the forked process. Unfortunately, this does not play nice with all kselftests, e.g. "net:fcnal-test.sh", where the child processes will linger because timeout does not send SIGTERM to the group. Some users have noted these hangs [1]. Fix this by nesting the timeout with an additional timeout without the foreground option. Link: https://lore.kernel.org/all/7650b2eb-0aee-a2b0-2e64-c9bc63210f67@alu.unizg.hr/ # [1] Fixes: 651e0d881461 ("kselftest/runner: allow to properly deliver signals to tests") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2023-09-08MAINTAINERS: remove links to obsolete btrfs.wiki.kernel.orgBhaskar Chowdhury
The wiki has been archived and is not updated anymore. Remove or replace the links in files that contain it (MAINTAINERS, Kconfig, docs). Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-08btrfs: assert delayed node locked when removing delayed itemFilipe Manana
When removing a delayed item, or releasing it, which will remove it as well, we will modify one of the delayed node's rbtrees and item counter if the delayed item is in one of the rbtrees. This requires having the delayed node's mutex locked, otherwise we will race with other tasks modifying the rbtrees and the counter. This is motivated by a previous version of another patch actually calling btrfs_release_delayed_item() after unlocking the delayed node's mutex and against a delayed item that is in a rbtree. So assert at __btrfs_remove_delayed_item() that the delayed node's mutex is locked. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-08btrfs: remove BUG() after failure to insert delayed dir index itemFilipe Manana
Instead of calling BUG() when we fail to insert a delayed dir index item into the delayed node's tree, we can just release all the resources we have allocated/acquired before and return the error to the caller. This is fine because all existing call chains undo anything they have done before calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending snapshots in the transaction commit path). So remove the BUG() call and do proper error handling. This relates to a syzbot report linked below, but does not fix it because it only prevents hitting a BUG(), it does not fix the issue where somehow we attempt to use twice the same index number for different index items. Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/ CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-08btrfs: improve error message after failure to add delayed dir index itemFilipe Manana
If we fail to add a delayed dir index item because there's already another item with the same index number, we print an error message (and then BUG). However that message isn't very helpful to debug anything because we don't know what's the index number and what are the values of index counters in the inode and its delayed inode (index_cnt fields of struct btrfs_inode and struct btrfs_delayed_node). So update the error message to include the index number and counters. We actually had a recent case where this issue was hit by a syzbot report (see the link below). Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-09-08btrfs: fix a compilation error if DEBUG is defined in btree_dirty_folioQu Wenruo
[BUG] After commit 72a69cd03082 ("btrfs: subpage: pack all subpage bitmaps into a larger bitmap"), the DEBUG section of btree_dirty_folio() would no longer compile. [CAUSE] If DEBUG is defined, we would do extra checks for btree_dirty_folio(), mostly to make sure the range we marked dirty has an extent buffer and that extent buffer is dirty. For subpage, we need to iterate through all the extent buffers covered by that page range, and make sure they all match the criteria. However commit 72a69cd03082 ("btrfs: subpage: pack all subpage bitmaps into a larger bitmap") changed how we store the bitmaps: we pack all the 16-bit bitmaps into a larger bitmap, which saves some space. This means we no longer have btrfs_subpage::dirty_bitmap, instead the dirty bitmap starts at btrfs_subpage_info::dirty_offset, and has a length of btrfs_subpage_info::bitmap_nr_bits. [FIX] Although I'm not sure if it still makes sense to maintain such code, at least let it compile. This patch would let us test the bits one by one through the bitmaps. CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
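Testing one sector's dirty bit inside a packed bitmap, as the [FIX] paragraph describes, might look like the user-space sketch below. The field names dirty_offset and bitmap_nr_bits come from the commit message; the struct layout, the sketch_* helper names and the word-based bit operations are assumptions for illustration and do not reproduce the kernel's bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the layout description of the packed bitmap. */
struct sketch_subpage_info {
    unsigned int bitmap_nr_bits;  /* bits per sub-bitmap (one per sector) */
    unsigned int dirty_offset;    /* first bit of the dirty sub-bitmap */
};

void sketch_set_bit(unsigned long *bitmaps, unsigned int nr)
{
    bitmaps[nr / SKETCH_BITS_PER_WORD] |= 1UL << (nr % SKETCH_BITS_PER_WORD);
}

bool sketch_test_bit(const unsigned long *bitmaps, unsigned int nr)
{
    return (bitmaps[nr / SKETCH_BITS_PER_WORD] >>
            (nr % SKETCH_BITS_PER_WORD)) & 1UL;
}

/* Test a single sector's dirty bit by offsetting into the packed bitmap. */
bool sketch_sector_dirty(const struct sketch_subpage_info *info,
                         const unsigned long *bitmaps, unsigned int sector)
{
    assert(sector < info->bitmap_nr_bits);
    return sketch_test_bit(bitmaps, info->dirty_offset + sector);
}
```

A debug check would then loop over sectors 0 to bitmap_nr_bits - 1 and call sketch_sector_dirty() for each, rather than reading a dedicated per-type bitmap field.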
2023-09-08btrfs: check for BTRFS_FS_ERROR in pending ordered assertJosef Bacik
If we do fast tree logging we increment a counter on the current transaction for every ordered extent we need to wait for. This means we expect the transaction to still be there when we clear pending on the ordered extent. However if we happen to abort the transaction and clean it up, there could be no running transaction, and thus we'll trip the "ASSERT(trans)" check. This is obviously incorrect, and the code properly deals with the case that the transaction doesn't exist. Fix this ASSERT() to only fire if there's no trans and we don't have BTRFS_FS_ERROR() set on the file system. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
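The relaxed assertion can be modelled as a simple predicate: a missing transaction is only a bug when the filesystem is not already in an error state. The sketch below uses hypothetical stand-in types (sketch_fs_info, sketch_transaction) and a hypothetical function name; it mirrors the condition described in the message, not the kernel source itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the filesystem and transaction objects. */
struct sketch_fs_info {
    bool fs_error;  /* models BTRFS_FS_ERROR() being set */
};

struct sketch_transaction {
    int pending_ordered;
};

/*
 * Returns true when the state is acceptable: either a running transaction
 * exists, or it is missing because the filesystem has already hit an error
 * (for example after a transaction abort and cleanup).
 */
bool pending_ordered_assert_ok(const struct sketch_transaction *trans,
                               const struct sketch_fs_info *fs_info)
{
    return trans != NULL || fs_info->fs_error;
}
```

Only the combination "no transaction and no filesystem error" makes the predicate false, which is the one case where the assertion should still fire.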
2023-09-08btrfs: fix lockdep splat and potential deadlock after failure running delayed itemsFilipe Manana
When running delayed items we are holding a delayed node's mutex and then we will attempt to modify a subvolume btree to insert/update/delete the delayed items. However if we have an error during the insertions for example, btrfs_insert_delayed_items() may return with a path that has locked extent buffers (a leaf at the very least), and then we attempt to release the delayed node at __btrfs_run_delayed_items(), which requires taking the delayed node's mutex, causing an ABBA type of deadlock. This was reported by syzbot and the lockdep splat is the following: WARNING: possible circular locking dependency detected 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Not tainted ------------------------------------------------------ syz-executor.2/13257 is trying to acquire lock: ffff88801835c0c0 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 but task is already holding lock: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #1 (btrfs-tree-00){++++}-{3:3}: __lock_release kernel/locking/lockdep.c:5475 [inline] lock_release+0x36f/0x9d0 kernel/locking/lockdep.c:5781 up_write+0x79/0x580 kernel/locking/rwsem.c:1625 btrfs_tree_unlock_rw fs/btrfs/locking.h:189 [inline] btrfs_unlock_up_safe+0x179/0x3b0 fs/btrfs/locking.c:239 search_leaf fs/btrfs/ctree.c:1986 [inline] btrfs_search_slot+0x2511/0x2f80 fs/btrfs/ctree.c:2230 btrfs_insert_empty_items+0x9c/0x180 fs/btrfs/ctree.c:4376 btrfs_insert_delayed_item fs/btrfs/delayed-inode.c:746 [inline] btrfs_insert_delayed_items fs/btrfs/delayed-inode.c:824 [inline] __btrfs_commit_inode_delayed_items+0xd24/0x2410 fs/btrfs/delayed-inode.c:1111 __btrfs_run_delayed_items+0x1db/0x430 fs/btrfs/delayed-inode.c:1153 flush_space+0x269/0xe70 fs/btrfs/space-info.c:723 btrfs_async_reclaim_metadata_space+0x106/0x350 fs/btrfs/space-info.c:1078 process_one_work+0x92c/0x12c0 kernel/workqueue.c:2600 worker_thread+0xa63/0x1210 kernel/workqueue.c:2751 kthread+0x2b8/0x350 kernel/kthread.c:389 ret_from_fork+0x2e/0x60 arch/x86/kernel/process.c:145 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 __mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603 __mutex_lock kernel/locking/mutex.c:747 [inline] mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799 __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline] __btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156 btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276 btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988 vfs_fsync_range fs/sync.c:188 
[inline] vfs_fsync fs/sync.c:202 [inline] do_fsync fs/sync.c:212 [inline] __do_sys_fsync fs/sync.c:220 [inline] __se_sys_fsync fs/sync.c:218 [inline] __x64_sys_fsync+0x196/0x1e0 fs/sync.c:218 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-tree-00); lock(&delayed_node->mutex); lock(btrfs-tree-00); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by syz-executor.2/13257: #0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: spin_unlock include/linux/spinlock.h:391 [inline] #0: ffff88802c1ee370 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0xb87/0xe00 fs/btrfs/transaction.c:287 #1: ffff88802c1ee398 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0xbb2/0xe00 fs/btrfs/transaction.c:288 #2: ffff88802a5ab8e8 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_lock+0x3c/0x2a0 fs/btrfs/locking.c:198 stack backtrace: CPU: 0 PID: 13257 Comm: syz-executor.2 Not tainted 6.5.0-rc7-syzkaller-00024-g93f5de5f648d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 check_noncircular+0x375/0x4a0 kernel/locking/lockdep.c:2195 check_prev_add kernel/locking/lockdep.c:3142 [inline] check_prevs_add kernel/locking/lockdep.c:3261 [inline] validate_chain kernel/locking/lockdep.c:3876 [inline] __lock_acquire+0x39ff/0x7f70 kernel/locking/lockdep.c:5144 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5761 __mutex_lock_common+0x1d8/0x2530 kernel/locking/mutex.c:603 __mutex_lock kernel/locking/mutex.c:747 [inline] mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799 __btrfs_release_delayed_node+0x9a/0xaa0 fs/btrfs/delayed-inode.c:256 btrfs_release_delayed_node fs/btrfs/delayed-inode.c:281 [inline] 
__btrfs_run_delayed_items+0x2b5/0x430 fs/btrfs/delayed-inode.c:1156 btrfs_commit_transaction+0x859/0x2ff0 fs/btrfs/transaction.c:2276 btrfs_sync_file+0xf56/0x1330 fs/btrfs/file.c:1988 vfs_fsync_range fs/sync.c:188 [inline] vfs_fsync fs/sync.c:202 [inline] do_fsync fs/sync.c:212 [inline] __do_sys_fsync fs/sync.c:220 [inline] __se_sys_fsync fs/sync.c:218 [inline] __x64_sys_fsync+0x196/0x1e0 fs/sync.c:218 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f3ad047cae9 Code: 28 00 00 00 75 (...) RSP: 002b:00007f3ad12510c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 00007f3ad059bf80 RCX: 00007f3ad047cae9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005 RBP: 00007f3ad04c847a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f3ad059bf80 R15: 00007ffe56af92f8 </TASK> ------------[ cut here ]------------ Fix this by releasing the path before releasing the delayed node in the error path at __btrfs_run_delayed_items(). Reported-by: syzbot+a379155f07c134ea9879@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/000000000000abba27060403b5bd@google.com/ CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
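The ordering rule behind the fix (drop the tree locks held by the path before doing anything that takes the delayed node's mutex) can be illustrated with a small mock. Everything here is hypothetical: the sketch_* types and functions only record whether the mutex would have been taken while a tree lock was still held, and they are not the btrfs functions named in the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock path: remembers whether it still holds a locked extent buffer. */
struct sketch_path {
    bool holds_tree_lock;
};

/* Mock delayed node: records a lock-order violation if one happens. */
struct sketch_delayed_node {
    int refs;
    bool mutex_taken_with_tree_lock;
};

void sketch_release_path(struct sketch_path *path)
{
    path->holds_tree_lock = false;  /* unlocks all extent buffers */
}

void sketch_release_delayed_node(struct sketch_delayed_node *node,
                                 const struct sketch_path *path)
{
    /* Releasing the node takes its mutex; flag the bad ordering if seen. */
    if (path->holds_tree_lock)
        node->mutex_taken_with_tree_lock = true;
    node->refs--;
}

/* Error path in the fixed order: path first, then the delayed node. */
int sketch_error_path(struct sketch_delayed_node *node,
                      struct sketch_path *path, int err)
{
    sketch_release_path(path);
    sketch_release_delayed_node(node, path);
    return err;
}
```

Swapping the two calls in sketch_error_path() would set mutex_taken_with_tree_lock, which corresponds to the tree-lock-then-mutex ordering that lockdep reported in the splat above.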
2023-09-08btrfs: do not block starts waiting on previous transaction commitJosef Bacik
Internally I got a report of very long stalls on normal operations like creating a new file when auto relocation was running. The reporter used the 'bpf offcputime' tracer to show that we would get stuck in start_transaction for 5 to 30 seconds, and were always being woken up by the transaction commit. Using my timing-everything script, which times how long a function takes and what percentage of that total time is taken up by its children, I saw several traces like this: 1083 took 32812902424 ns 29929002926 ns 91.2110% wait_for_commit_duration 25568 ns 7.7920e-05% commit_fs_roots_duration 1007751 ns 0.00307% commit_cowonly_roots_duration 446855602 ns 1.36182% btrfs_run_delayed_refs_duration 271980 ns 0.00082% btrfs_run_delayed_items_duration 2008 ns 6.1195e-06% btrfs_apply_pending_changes_duration 9656 ns 2.9427e-05% switch_commit_roots_duration 1598 ns 4.8700e-06% btrfs_commit_device_sizes_duration 4314 ns 1.3147e-05% btrfs_free_log_root_tree_duration Here I was only tracing functions that happen where we are between START_COMMIT and UNBLOCKED in order to see what would be keeping us blocked for so long. The wait_for_commit() we do is where we wait for a previous transaction that hasn't completed its commit. This can include all of the unpin work and other cleanups, which tends to be the longest part of our transaction commit. There is no reason we should be blocking new things from entering the transaction at this point, it just adds to random latency spikes for no reason. Fix this by adding a PREP stage. This allows us to properly deal with multiple committers coming in at the same time: we retain the behavior that the winner waits on the previous transaction and the losers all wait for this transaction commit to occur. Nothing else is blocked during the PREP stage, and then once the wait is complete we switch to COMMIT_START and all of the same behavior as before is maintained. 
Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
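The effect of the new PREP stage can be pictured as a small state ordering in which new transaction starts are blocked only from COMMIT_START up to, but not including, UNBLOCKED. The enum values and the function below are a hypothetical model based on the state names mentioned in the message; they are not copied from the btrfs sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the commit states, in the order they are entered. */
enum sketch_trans_state {
    SKETCH_RUNNING,
    SKETCH_COMMIT_PREP,   /* winner waits for the previous commit here */
    SKETCH_COMMIT_START,
    SKETCH_COMMIT_DOING,
    SKETCH_UNBLOCKED,
    SKETCH_COMPLETED,
};

/* New transaction starts are blocked only between COMMIT_START and UNBLOCKED. */
bool sketch_blocks_new_starts(enum sketch_trans_state state)
{
    return state >= SKETCH_COMMIT_START && state < SKETCH_UNBLOCKED;
}
```

In this model the long wait on the previous transaction happens while the state is SKETCH_COMMIT_PREP, so sketch_blocks_new_starts() is false for its whole duration and other tasks can keep joining the transaction.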