summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-09-04Merge tag 'mfd-next-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull NFD updates from Lee Jones: "New Drivers: - Add support for the Cirrus Logic CS42L43 Audio CODEC Fix-ups: - Make use of specific printk() format tags for various optimisations - Kconfig / module modifications / tweaking - Simplify obtaining resources (memory, device data) using unified API helpers - Bunch of Device Tree additions, conversions and adaptions - Convert a bunch of Regmap configurations to use the Maple Tree cache - Ensure correct includes are present and remove some that are not required - Remove superfluous code - Reduce amount of cycles spent in critical sections - Omit the use of redundant casts and if relevant replace with better ones - Swap out raw_spin_{un}lock_irq{save,restore}() for spin_{un}lock_irq{save,restore}() Bug Fixes: - Repair theoretical deadlock situation - Fix some link-time dependencies - Use more appropriate datatype when casting" * tag 'mfd-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (70 commits) mfd: mc13xxx: Simplify device data fetching in probe() mfd: rz-mtu3: Replace raw_spin_lock->spin_lock() mfd: rz-mtu3: Reduce critical sections mfd: mxs-lradc: Fix Wvoid-pointer-to-enum-cast warning mfd: wm31x: Fix Wvoid-pointer-to-enum-cast warning mfd: wm8994: Fix Wvoid-pointer-to-enum-cast warning mfd: tc3589: Fix Wvoid-pointer-to-enum-cast warning mfd: lp87565: Fix Wvoid-pointer-to-enum-cast warning mfd: hi6421-pmic: Fix Wvoid-pointer-to-enum-cast warning mfd: max77541: Fix Wvoid-pointer-to-enum-cast warning mfd: max14577: Fix Wvoid-pointer-to-enum-cast warning mfd: stmpe: Fix Wvoid-pointer-to-enum-cast warning mfd: rn5t618: Remove redundant of_match_ptr() mfd: lochnagar-i2c: Remove redundant of_match_ptr() mfd: stpmic1: Remove redundant of_match_ptr() mfd: act8945a: Remove redundant of_match_ptr() mfd: rsmu_spi: Remove redundant of_match_ptr() mfd: altera-a10sr: Remove redundant of_match_ptr() mfd: rsmu_i2c: Remove redundant of_match_ptr() mfd: tc3589x: Remove redundant of_match_ptr() ...
2023-09-04Merge tag 'i2c-for-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c updates from Wolfram Sang: "I2C has mainly cleanups this time and a few driver improvements. Because a lot of developers were on holidays (including myself) it was a good timing to apply lots of cleanups which would normally cause merge conflicts with other floating patches. Extra thanks go to Andi Shyti who backed me up when I was on a four week hiatus. This is also the reason that some patches were commited later than ideal" * tag 'i2c-for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (67 commits) i2c: at91: Use dev_err_probe() instead of dev_err() I2C: ali15x3: Do PCI error checks on own line i2c: Make return value check more accurate and explicit for devm_pinctrl_get() i2c: designware: Add support for recovery when GPIO need pinctrl i2c: mlxcpld: Add support for extended transaction length i2c: mlxcpld: Allow driver to run on ARM64 architecture i2c: nforce2: Do PCI error check on own line i2c: sis5595: Do PCI error checks on own line i2c: qcom-cci: Fix error checking in cci_probe() i2c: muxes: pca954x: Add regulator support i2c: muxes: pca954x: Add MAX735x/MAX736x support dt-bindings: i2c: Add Maxim MAX735x/MAX736x variants dt-bindings: i2c: pca954x: Correct interrupt support i2c: pnx: Use devm_platform_get_and_ioremap_resource() i2c: pxa: Use devm_platform_get_and_ioremap_resource() i2c: s3c2410: Use devm_platform_get_and_ioremap_resource() i2c: sh_mobile: Use devm_platform_get_and_ioremap_resource() i2c: st: Use devm_platform_get_and_ioremap_resource() i2c: qcom-geni: Convert to devm_platform_ioremap_resource() i2c: stm32f4: Use devm_platform_get_and_ioremap_resource() ...
2023-09-04Merge tag 'printk-for-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Do not try to get the console lock when it is not need or useful in panic() - Replace the global console_suspended state by a per-console flag - Export symbols needed for dumping the raw printk buffer in panic() - Fix documentation of printf formats for integer types - Moved Sergey Senozhatsky to the reviewer role - Misc cleanups * tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: export symbols for debug modules lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix() printk: ringbuffer: Fix truncating buffer size min_t cast printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic() printk: Add per-console suspended state printk: Consolidate console deferred printing printk: Do not take console lock for console_flush_on_panic() printk: Keep non-panic-CPUs out of console lock printk: Reduce console_unblank() usage in unsafe scenarios kdb: Do not assume write() callback available docs: printk-formats: Treat char as always unsigned docs: printk-formats: Fix hex printing of signed values MAINTAINERS: adjust printk/vsprintf entries
2023-09-04Merge tag 'timers-core-2023-09-04-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull clocksource/clockevent driver updates from Thomas Gleixner: - Remove the OXNAS driver instead of adding a new one! - A set of boring fixes, cleanups and improvements * tag 'timers-core-2023-09-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Explicitly include correct DT includes clocksource/drivers/sun5i: Convert to platform device driver clocksource/drivers/sun5i: Remove pointless struct clocksource/drivers/sun5i: Remove duplication of code and data clocksource/drivers/loongson1: Set variable ls1x_timer_lock storage-class-specifier to static clocksource/drivers/arm_arch_timer: Disable timer before programming CVAL dt-bindings: timer: oxsemi,rps-timer: remove obsolete bindings clocksource/drivers/timer-oxnas-rps: Remove obsolete timer driver
2023-09-04tpm: Enable hwrng only for Pluton on AMD CPUsJarkko Sakkinen
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the reported systems the TPM doesn't reply at bootup and returns back the command code. This makes the TPM fail probe on Lenovo Legion Y540 laptop. Since only Microsoft Pluton is the only known combination of AMD CPU and fTPM from other vendor, disable hwrng otherwise. In order to make sysadmin aware of this, print also info message to the klog. Cc: stable@vger.kernel.org Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs") Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804 Reported-by: Patrick Steinhardt <ps@pks.im> Reported-by: Raymond Jay Golo <rjgolo@gmail.com> Reported-by: Ronan Pigott <ronan@rjp.ie> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-09-04tpm_crb: Fix an error handling path in crb_acpi_add()Christophe JAILLET
Some error paths don't call acpi_put_table() before returning. Branch to the correct place instead of doing some direct return. Fixes: 4d2732882703 ("tpm_crb: Add support for CRB devices based on Pluton") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Matthew Garrett <mgarrett@aurora.tech> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-09-04Merge tag 'm68knommu-for-v6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: "Two changes, one a trivial white space clean up, the other removes the unnecessary local pcibios_setup() code" * tag 'm68knommu-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: coldfire: dma_timer: ERROR: "foo __init bar" should be "foo __init bar" m68k/pci: Drop useless pcibios_setup()
2023-09-04Merge tag 'uml-for-linus-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Drop 32-bit checksum implementation and re-use it from arch/x86 - String function cleanup - Fixes for -Wmissing-variable-declarations and -Wmissing-prototypes builds * tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: virt-pci: fix missing declaration warning um: Refactor deprecated strncpy to memcpy um: fix 3 instances of -Wmissing-prototypes um: port_kern: fix -Wmissing-variable-declarations uml: audio: fix -Wmissing-variable-declarations um: vector: refactor deprecated strncpy um: use obj-y to descend into arch/um/*/ um: Hard-code the result of 'uname -s' um: Use the x86 checksum implementation on 32-bit asm-generic: current: Don't include thread-info.h if building asm um: Remove unsued extern declaration ldt_host_info() um: Fix hostaudio build errors um: Remove strlcpy usage
2023-09-04Merge tag 'hyperv-next-signed-20230902' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for SEV-SNP guests on Hyper-V (Tianyu Lan) - Support for TDX guests on Hyper-V (Dexuan Cui) - Use SBRM API in Hyper-V balloon driver (Mitchell Levy) - Avoid dereferencing ACPI root object handle in VMBus driver (Maciej Szmigiero) - A few misecllaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh Sengar) * tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: Remove duplicate include x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's x86/hyperv: Remove hv_isolation_type_en_snp x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor x86/hyperv: Introduce a global variable hyperv_paravisor_present Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests Drivers: hv: vmbus: Support fully enlightened TDX guests x86/hyperv: Support hypercalls for fully enlightened TDX guests x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub hv: hyperv.h: Replace one-element array with flexible-array member Drivers: hv: vmbus: Don't dereference ACPI root object handle x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES x86/hyperv: Add smp support for SEV-SNP guest clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest ...
2023-09-04Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "A small pull request this time around, mostly because the vduse network got postponed to next relase so we can be sure we got the security store right" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_ring: fix avail_wrap_counter in virtqueue_add_packed virtio_vdpa: build affinity masks conditionally virtio_net: merge dma operations when filling mergeable buffers virtio_ring: introduce dma sync api for virtqueue virtio_ring: introduce dma map api for virtqueue virtio_ring: introduce virtqueue_reset() virtio_ring: separate the logic of reset/enable from virtqueue_resize virtio_ring: correct the expression of the description of virtqueue_resize() virtio_ring: skip unmap for premapped virtio_ring: introduce virtqueue_dma_dev() virtio_ring: support add premapped buf virtio_ring: introduce virtqueue_set_dma_premapped() virtio_ring: put mapping error check in vring_map_one_sg virtio_ring: check use_dma_api before unmap desc for indirect vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK vdpa: add get_backend_features vdpa operation vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag vdpa/mlx5: Remove unused function declarations
2023-09-04Merge tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1Linus Torvalds
Pull tomoyo updates from Tetsuo Handa: "Three cleanup patches, no behavior changes" * tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1: tomoyo: remove unused function declaration tomoyo: refactor deprecated strncpy tomoyo: add format attributes to functions
2023-09-04Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge additional cpufreq updates for 6.6-rc1: - Add support for per-policy performance boost (Jie Zhan). - Fix assorted issues in the cpufreq core, common governor code and in the pcc cpufreq driver (Liao Chang). * pm-cpufreq: cpufreq: Support per-policy performance boost cpufreq: pcc: Fix the potentinal scheduling delays in target_index() cpufreq: governor: Free dbs_data directly when gov->init() fails cpufreq: Fix the race condition while updating the transition_task of policy cpufreq: Avoid printing kernel addresses in cpufreq_resume()
2023-09-04ALSA: hda/cirrus: Fix broken audio on hardware with two CS42L42 codecs.Vitaly Rodionov
Recently in v6.3-rc1 there was a change affecting behaviour of hrtimers (commit 0c52310f260014d95c1310364379772cb74cf82d) and causing few issues on platforms with two CS42L42 codecs. Canonical/Dell has reported an issue with Vostro-3910. We need to increase this value by 15ms. Link: https://bugs.launchpad.net/somerville/+bug/2031060 Fixes: 9fb9fa18fb50 ("ALSA: hda/cirrus: Add extra 10 ms delay to allow PLL settle and lock.") Signed-off-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com> Link: https://lore.kernel.org/r/20230904160033.908135-1-vitalyr@opensource.cirrus.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-09-04x86/smp: Don't send INIT to non-present and non-booted CPUsThomas Gleixner
Vasant reported that kexec() can hang or reset the machine when it tries to park CPUs via INIT. This happens when the kernel is using extended APIC, but the present mask has APIC IDs >= 0x100 enumerated. As extended APIC can only handle 8 bit of APIC ID sending INIT to APIC ID 0x100 sends INIT to APIC ID 0x0. That's the boot CPU which is special on x86 and INIT causes the system to hang or resets the machine. Prevent this by sending INIT only to those CPUs which have been booted once. Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible") Reported-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vasant Hegde <vasant.hegde@amd.com> Link: https://lore.kernel.org/r/87cyzwjbff.ffs@tglx
2023-09-04spi: sun6i: fix race between DMA RX transfer completion and RX FIFO drainTobias Schramm
Previously the transfer complete IRQ immediately drained to RX FIFO to read any data remaining in FIFO to the RX buffer. This behaviour is correct when dealing with SPI in interrupt mode. However in DMA mode the transfer complete interrupt still fires as soon as all bytes to be transferred have been stored in the FIFO. At that point data in the FIFO still needs to be picked up by the DMA engine. Thus the drain procedure and DMA engine end up racing to read from RX FIFO, corrupting any data read. Additionally the RX buffer pointer is never adjusted according to DMA progress in DMA mode, thus calling the RX FIFO drain procedure in DMA mode is a bug. Fix corruptions in DMA RX mode by draining RX FIFO only in interrupt mode. Also wait for completion of RX DMA when in DMA mode before returning to ensure all data has been copied to the supplied memory buffer. Signed-off-by: Tobias Schramm <t.schramm@manjaro.org> Link: https://lore.kernel.org/r/20230827152558.5368-3-t.schramm@manjaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2023-09-04spi: sun6i: reduce DMA RX transfer width to single byteTobias Schramm
Through empirical testing it has been determined that sometimes RX SPI transfers with DMA enabled return corrupted data. This is down to single or even multiple bytes lost during DMA transfer from SPI peripheral to memory. It seems the RX FIFO within the SPI peripheral can become confused when performing bus read accesses wider than a single byte to it during an active SPI transfer. This patch reduces the width of individual DMA read accesses to the RX FIFO to a single byte to mitigate that issue. Signed-off-by: Tobias Schramm <t.schramm@manjaro.org> Link: https://lore.kernel.org/r/20230827152558.5368-2-t.schramm@manjaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
2023-09-04ASoC: rt5645: NULL pointer access when removing jackBrent Lu
Machine driver calls snd_soc_component_set_jack() function with NULL jack and data parameters when removing jack in codec exit function. Do not access data when jack is NULL. Signed-off-by: Brent Lu <brent.lu@intel.com> Link: https://lore.kernel.org/r/20230904104046.4150208-1-brent.lu@intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-09-04ASoC: amd: yc: Add DMI entries to support Victus by HP Gaming Laptop ↵Shubh
15-fb0xxx (8A3E) This model requires an additional detection quirk to enable the internal microphone. Signed-off-by: Shubh <shubhisroking@gmail.com> Link: https://lore.kernel.org/r/20230902150807.133523-1-shubhisroking@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-09-04MAINTAINERS: Update the MAINTAINERS enties for TEXAS INSTRUMENTS ASoC DRIVERSKevin-Lu
Update the MAINTAINERS email for TEXAS INSTRUMENTS ASoC DRIVERS. Signed-off-by: Kevin-Lu <kevin-lu@ti.com> Link: https://lore.kernel.org/r/20230903161439.85-1-kevin-lu@ti.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-09-04Merge branch 'af_unix-data-races'David S. Miller
Kuniyuki Iwashima says: ==================== af_unix: Fix four data-races. While running syzkaller, KCSAN reported 3 data-races with systemd-coredump using AF_UNIX sockets. This series fixes the three and another one inspiered by one of the reports. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04af_unix: Fix data race around sk->sk_err.Kuniyuki Iwashima
As with sk->sk_shutdown shown in the previous patch, sk->sk_err can be read locklessly by unix_dgram_sendmsg(). Let's use READ_ONCE() for sk_err as well. Note that the writer side is marked by commit cc04410af7de ("af_unix: annotate lockless accesses to sk->sk_err"). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04af_unix: Fix data-races around sk->sk_shutdown.Kuniyuki Iwashima
sk->sk_shutdown is changed under unix_state_lock(sk), but unix_dgram_sendmsg() calls two functions to read sk_shutdown locklessly. sock_alloc_send_pskb `- sock_wait_for_wmem Let's use READ_ONCE() there. Note that the writer side was marked by commit e1d09c2c2f57 ("af_unix: Fix data races around sk->sk_shutdown."). BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock write (marked) to 0xffff8880069af12c of 1 bytes by task 1 on cpu 1: unix_release_sock+0x75c/0x910 net/unix/af_unix.c:631 unix_release+0x59/0x80 net/unix/af_unix.c:1053 __sock_release+0x7d/0x170 net/socket.c:654 sock_close+0x19/0x30 net/socket.c:1386 __fput+0x2a3/0x680 fs/file_table.c:384 ____fput+0x15/0x20 fs/file_table.c:412 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 read to 0xffff8880069af12c of 1 bytes by task 28650 on cpu 0: sock_alloc_send_pskb+0xd2/0x620 net/core/sock.c:2767 unix_dgram_sendmsg+0x2f8/0x14f0 net/unix/af_unix.c:1944 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline] unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x148/0x160 net/socket.c:748 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548 __sys_sendmsg+0x94/0x140 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 value changed: 0x00 -> 0x03 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 28650 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04af_unix: Fix data-race around unix_tot_inflight.Kuniyuki Iwashima
unix_tot_inflight is changed under spin_lock(unix_gc_lock), but unix_release_sock() reads it locklessly. Let's use READ_ONCE() for unix_tot_inflight. Note that the writer side was marked by commit 9d6d7f1cb67c ("af_unix: annote lockless accesses to unix_tot_inflight & gc_in_progress") BUG: KCSAN: data-race in unix_inflight / unix_release_sock write (marked) to 0xffffffff871852b8 of 4 bytes by task 123 on cpu 1: unix_inflight+0x130/0x180 net/unix/scm.c:64 unix_attach_fds+0x137/0x1b0 net/unix/scm.c:123 unix_scm_to_skb net/unix/af_unix.c:1832 [inline] unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1955 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x148/0x160 net/socket.c:747 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2493 ___sys_sendmsg+0xc6/0x140 net/socket.c:2547 __sys_sendmsg+0x94/0x140 net/socket.c:2576 __do_sys_sendmsg net/socket.c:2585 [inline] __se_sys_sendmsg net/socket.c:2583 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2583 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc read to 0xffffffff871852b8 of 4 bytes by task 4891 on cpu 0: unix_release_sock+0x608/0x910 net/unix/af_unix.c:671 unix_release+0x59/0x80 net/unix/af_unix.c:1058 __sock_release+0x7d/0x170 net/socket.c:653 sock_close+0x19/0x30 net/socket.c:1385 __fput+0x179/0x5e0 fs/file_table.c:321 ____fput+0x15/0x20 fs/file_table.c:349 task_work_run+0x116/0x1a0 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] exit_to_user_mode_prepare+0x174/0x180 kernel/entry/common.c:204 __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline] syscall_exit_to_user_mode+0x1a/0x30 kernel/entry/common.c:297 do_syscall_64+0x4b/0x90 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x72/0xdc value changed: 0x00000000 -> 0x00000001 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 4891 Comm: systemd-coredum Not tainted 6.4.0-rc5-01219-gfa0e21fa4443 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: 9305cfa4443d ("[AF_UNIX]: Make unix_tot_inflight counter non-atomic") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04af_unix: Fix data-races around user->unix_inflight.Kuniyuki Iwashima
user->unix_inflight is changed under spin_lock(unix_gc_lock), but too_many_unix_fds() reads it locklessly. Let's annotate the write/read accesses to user->unix_inflight. BUG: KCSAN: data-race in unix_attach_fds / unix_inflight write to 0xffffffff8546f2d0 of 8 bytes by task 44798 on cpu 1: unix_inflight+0x157/0x180 net/unix/scm.c:66 unix_attach_fds+0x147/0x1e0 net/unix/scm.c:123 unix_scm_to_skb net/unix/af_unix.c:1827 [inline] unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline] unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x148/0x160 net/socket.c:748 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548 __sys_sendmsg+0x94/0x140 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 read to 0xffffffff8546f2d0 of 8 bytes by task 44814 on cpu 0: too_many_unix_fds net/unix/scm.c:101 [inline] unix_attach_fds+0x54/0x1e0 net/unix/scm.c:110 unix_scm_to_skb net/unix/af_unix.c:1827 [inline] unix_dgram_sendmsg+0x46a/0x14f0 net/unix/af_unix.c:1950 unix_seqpacket_sendmsg net/unix/af_unix.c:2308 [inline] unix_seqpacket_sendmsg+0xba/0x130 net/unix/af_unix.c:2292 sock_sendmsg_nosec net/socket.c:725 [inline] sock_sendmsg+0x148/0x160 net/socket.c:748 ____sys_sendmsg+0x4e4/0x610 net/socket.c:2494 ___sys_sendmsg+0xc6/0x140 net/socket.c:2548 __sys_sendmsg+0x94/0x140 net/socket.c:2577 __do_sys_sendmsg net/socket.c:2586 [inline] __se_sys_sendmsg net/socket.c:2584 [inline] __x64_sys_sendmsg+0x45/0x50 net/socket.c:2584 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 value changed: 0x000000000000000c -> 0x000000000000000d Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 44814 Comm: systemd-coredum Not tainted 6.4.0-11989-g6843306689af #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Fixes: 712f4aad406b ("unix: properly account for FDs passed over unix sockets") Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Willy Tarreau <w@1wt.eu> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04af_unix: Fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT.Kuniyuki Iwashima
Heiko Carstens reported that SCM_PIDFD does not work with MSG_CMSG_COMPAT because scm_pidfd_recv() always checks msg_controllen against sizeof(struct cmsghdr). We need to use sizeof(struct compat_cmsghdr) for the compat case. Fixes: 5e2ff6704a27 ("scm: add SO_PASSPIDFD and SCM_PIDFD") Reported-by: Heiko Carstens <hca@linux.ibm.com> Closes: https://lore.kernel.org/netdev/20230901200517.8742-A-hca@linux.ibm.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Tested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04docs: netdev: update the netdev infra URLsJakub Kicinski
Some corporate proxies block our current NIPA URLs because they use a free / shady DNS domain. As suggested by Jesse we got a new DNS entry from Konstantin - netdev.bots.linux.dev, use it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04Merge branch 'rework/misc-cleanups' into for-linusPetr Mladek
2023-09-04Merge branch 'for-6.6-vsprintf-doc' into for-linusPetr Mladek
2023-09-04docs: netdev: document patchwork patch statesJakub Kicinski
The patchwork states are largely self-explanatory but small ambiguities may still come up. Document how we interpret the states in networking. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04bpf, sockmap: Fix skb refcnt race after locking changesJohn Fastabend
There is a race where skb's from the sk_psock_backlog can be referenced after userspace side has already skb_consumed() the sk_buff and its refcnt dropped to zer0 causing use after free. The flow is the following: while ((skb = skb_peek(&psock->ingress_skb)) sk_psock_handle_Skb(psock, skb, ..., ingress) if (!ingress) ... sk_psock_skb_ingress sk_psock_skb_ingress_enqueue(skb) msg->skb = skb sk_psock_queue_msg(psock, msg) skb_dequeue(&psock->ingress_skb) The sk_psock_queue_msg() puts the msg on the ingress_msg queue. This is what the application reads when recvmsg() is called. An application can read this anytime after the msg is placed on the queue. The recvmsg hook will also read msg->skb and then after user space reads the msg will call consume_skb(skb) on it effectively free'ing it. But, the race is in above where backlog queue still has a reference to the skb and calls skb_dequeue(). If the skb_dequeue happens after the user reads and free's the skb we have a use after free. The !ingress case does not suffer from this problem because it uses sendmsg_*(sk, msg) which does not pass the sk_buff further down the stack. The following splat was observed with 'test_progs -t sockmap_listen': [ 1022.710250][ T2556] general protection fault, ... [...] [ 1022.712830][ T2556] Workqueue: events sk_psock_backlog [ 1022.713262][ T2556] RIP: 0010:skb_dequeue+0x4c/0x80 [ 1022.713653][ T2556] Code: ... [...] [ 1022.720699][ T2556] Call Trace: [ 1022.720984][ T2556] <TASK> [ 1022.721254][ T2556] ? die_addr+0x32/0x80^M [ 1022.721589][ T2556] ? exc_general_protection+0x25a/0x4b0 [ 1022.722026][ T2556] ? asm_exc_general_protection+0x22/0x30 [ 1022.722489][ T2556] ? skb_dequeue+0x4c/0x80 [ 1022.722854][ T2556] sk_psock_backlog+0x27a/0x300 [ 1022.723243][ T2556] process_one_work+0x2a7/0x5b0 [ 1022.723633][ T2556] worker_thread+0x4f/0x3a0 [ 1022.723998][ T2556] ? __pfx_worker_thread+0x10/0x10 [ 1022.724386][ T2556] kthread+0xfd/0x130 [ 1022.724709][ T2556] ? __pfx_kthread+0x10/0x10 [ 1022.725066][ T2556] ret_from_fork+0x2d/0x50 [ 1022.725409][ T2556] ? __pfx_kthread+0x10/0x10 [ 1022.725799][ T2556] ret_from_fork_asm+0x1b/0x30 [ 1022.726201][ T2556] </TASK> To fix we add an skb_get() before passing the skb to be enqueued in the engress queue. This bumps the skb->users refcnt so that consume_skb() and kfree_skb will not immediately free the sk_buff. With this we can be sure the skb is still around when we do the dequeue. Then we just need to decrement the refcnt or free the skb in the backlog case which we do by calling kfree_skb() on the ingress case as well as the sendmsg case. Before locking change from fixes tag we had the sock locked so we couldn't race with user and there was no issue here. Fixes: 799aa7f98d53e ("skmsg: Avoid lock_sock() in sk_psock_backlog()") Reported-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Xu Kuohai <xukuohai@huawei.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230901202137.214666-1-john.fastabend@gmail.com
2023-09-04net: phy: micrel: Correct bit assignments for phy_device flagsOleksij Rempel
Previously, the defines for phy_device flags in the Micrel driver were ambiguous in their representation. They were intended to be bit masks but were mistakenly defined as bit positions. This led to the following issues: - MICREL_KSZ8_P1_ERRATA, designated for KSZ88xx switches, overlapped with MICREL_PHY_FXEN and MICREL_PHY_50MHZ_CLK. - Due to this overlap, the code path for MICREL_PHY_FXEN, tailored for the KSZ8041 PHY, was not executed for KSZ88xx PHYs. - Similarly, the code associated with MICREL_PHY_50MHZ_CLK wasn't triggered for KSZ88xx. To rectify this, all three flags have now been explicitly converted to use the `BIT()` macro, ensuring they are defined as bit masks and preventing potential overlaps in the future. Fixes: 49011e0c1555 ("net: phy: micrel: ksz886x/ksz8081: add cabletest support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddrAlex Henrie
The existing code incorrectly casted a negative value (the result of a subtraction) to an unsigned value without checking. For example, if /proc/sys/net/ipv6/conf/*/temp_prefered_lft was set to 1, the preferred lifetime would jump to 4 billion seconds. On my machine and network the shortest lifetime that avoided underflow was 3 seconds. Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04veth: Fixing transmit return status for dropped packetsLiang Chen
The veth_xmit function returns NETDEV_TX_OK even when packets are dropped. This behavior leads to incorrect calculations of statistics counts, as well as things like txq->trans_start updates. Fixes: e314dbdc1c0d ("[NET]: Virtual ethernet device driver.") Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04gve: fix frag_list chainingEric Dumazet
gve_rx_append_frags() is able to build skbs chained with frag_list, like GRO engine. Problem is that shinfo->frag_list should only be used for the head of the chain. All other links should use skb->next pointer. Otherwise, built skbs are not valid and can cause crashes. Equivalent code in GRO (skb_gro_receive()) is: if (NAPI_GRO_CB(p)->last == p) skb_shinfo(p)->frag_list = skb; else NAPI_GRO_CB(p)->last->next = skb; NAPI_GRO_CB(p)->last = skb; Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Bailey Forrest <bcf@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Catherine Sullivan <csully@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-04net: deal with integer overflows in kmalloc_reserve()Eric Dumazet
Blamed commit changed: ptr = kmalloc(size); if (ptr) size = ksize(ptr); to: size = kmalloc_size_roundup(size); ptr = kmalloc(size); This allowed various crash as reported by syzbot [1] and Kyle Zeng. Problem is that if @size is bigger than 0x80000001, kmalloc_size_roundup(size) returns 2^32. kmalloc_reserve() uses a 32bit variable (obj_size), so 2^32 is truncated to 0. kmalloc(0) returns ZERO_SIZE_PTR which is not handled by skb allocations. Following trace can be triggered if a netdev->mtu is set close to 0x7fffffff We might in the future limit netdev->mtu to more sensible limit (like KMALLOC_MAX_SIZE). This patch is based on a syzbot report, and also a report and tentative fix from Kyle Zeng. [1] BUG: KASAN: user-memory-access in __build_skb_around net/core/skbuff.c:294 [inline] BUG: KASAN: user-memory-access in __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527 Write of size 32 at addr 00000000fffffd10 by task syz-executor.4/22554 CPU: 1 PID: 22554 Comm: syz-executor.4 Not tainted 6.1.39-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023 Call trace: dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:279 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:286 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x120/0x1a0 lib/dump_stack.c:106 print_report+0xe4/0x4b4 mm/kasan/report.c:398 kasan_report+0x150/0x1ac mm/kasan/report.c:495 kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:189 memset+0x40/0x70 mm/kasan/shadow.c:44 __build_skb_around net/core/skbuff.c:294 [inline] __alloc_skb+0x3c4/0x6e8 net/core/skbuff.c:527 alloc_skb include/linux/skbuff.h:1316 [inline] igmpv3_newpack+0x104/0x1088 net/ipv4/igmp.c:359 add_grec+0x81c/0x1124 net/ipv4/igmp.c:534 igmpv3_send_cr net/ipv4/igmp.c:667 [inline] igmp_ifc_timer_expire+0x1b0/0x1008 net/ipv4/igmp.c:810 call_timer_fn+0x1c0/0x9f0 kernel/time/timer.c:1474 expire_timers kernel/time/timer.c:1519 [inline] __run_timers+0x54c/0x710 kernel/time/timer.c:1790 run_timer_softirq+0x28/0x4c kernel/time/timer.c:1803 _stext+0x380/0xfbc ____do_softirq+0x14/0x20 arch/arm64/kernel/irq.c:79 call_on_irq_stack+0x24/0x4c arch/arm64/kernel/entry.S:891 do_softirq_own_stack+0x20/0x2c arch/arm64/kernel/irq.c:84 invoke_softirq kernel/softirq.c:437 [inline] __irq_exit_rcu+0x1c0/0x4cc kernel/softirq.c:683 irq_exit_rcu+0x14/0x78 kernel/softirq.c:695 el0_interrupt+0x7c/0x2e0 arch/arm64/kernel/entry-common.c:717 __el0_irq_handler_common+0x18/0x24 arch/arm64/kernel/entry-common.c:724 el0t_64_irq_handler+0x10/0x1c arch/arm64/kernel/entry-common.c:729 el0t_64_irq+0x1a0/0x1a4 arch/arm64/kernel/entry.S:584 Fixes: 12d6c1d3a2ad ("skbuff: Proactively round up to kmalloc bucket size") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-03ksmbd: remove experimental warningSteve French
ksmbd has made significant improvements over the past two years and is regularly tested and used. Remove the experimental warning. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-09-03virtio_ring: fix avail_wrap_counter in virtqueue_add_packedYuan Yao
In current packed virtqueue implementation, the avail_wrap_counter won't flip, in the case when the driver supplies a descriptor chain with a length equals to the queue size; total_sg == vq->packed.vring.num. Let’s assume the following situation: vq->packed.vring.num=4 vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 0 Then the driver adds a descriptor chain containing 4 descriptors. We expect the following result with avail_wrap_counter flipped: vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 1 But, the current implementation gives the following result: vq->packed.next_avail_idx: 1 vq->packed.avail_wrap_counter: 0 To reproduce the bug, you can set a packed queue size as small as possible, so that the driver is more likely to provide a descriptor chain with a length equal to the packed queue size. For example, in qemu run following commands: sudo qemu-system-x86_64 \ -enable-kvm \ -nographic \ -kernel "path/to/kernel_image" \ -m 1G \ -drive file="path/to/rootfs",if=none,id=disk \ -device virtio-blk,drive=disk \ -drive file="path/to/disk_image",if=none,id=rwdisk \ -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\ indirect_desc=off \ -append "console=ttyS0 root=/dev/vda rw init=/bin/bash" Inside the VM, create a directory and mount the rwdisk device on it. The rwdisk will hang and mount operation will not complete. This commit fixes the wrap counter error by flipping the packed.avail_wrap_counter, when start of descriptor chain equals to the end of descriptor chain (head == i). Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support") Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org> Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_vdpa: build affinity masks conditionallyJason Wang
We try to build affinity mask via create_affinity_masks() unconditionally which may lead several issues: - the affinity mask is not used for parent without affinity support (only VDUSE support the affinity now) - the logic of create_affinity_masks() might not work for devices other than block. For example it's not rare in the networking device where the number of queues could exceed the number of CPUs. Such case breaks the current affinity logic which is based on group_cpus_evenly() who assumes the number of CPUs are not less than the number of groups. This can trigger a warning[1]: if (ret >= 0) WARN_ON(nr_present + nr_others < numgrps); Fixing this by only build the affinity masks only when - Driver passes affinity descriptor, driver like virtio-blk can make sure to limit the number of queues when it exceeds the number of CPUs - Parent support affinity setting config ops This help to avoid the warning. More optimizations could be done on top. [1] [ 682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0 [ 682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79 [ 682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014 [ 682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0 [ 682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc [ 682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293 [ 682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000 [ 682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030 [ 682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0 [ 682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800 [ 682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041 [ 682.146692] FS: 00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000 [ 682.146695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0 [ 682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 682.146701] Call Trace: [ 682.146703] <TASK> [ 682.146705] ? __warn+0x7b/0x130 [ 682.146709] ? group_cpus_evenly+0x1aa/0x1c0 [ 682.146712] ? report_bug+0x1c8/0x1e0 [ 682.146717] ? handle_bug+0x3c/0x70 [ 682.146721] ? exc_invalid_op+0x14/0x70 [ 682.146723] ? asm_exc_invalid_op+0x16/0x20 [ 682.146727] ? group_cpus_evenly+0x1aa/0x1c0 [ 682.146729] ? group_cpus_evenly+0x15c/0x1c0 [ 682.146731] create_affinity_masks+0xaf/0x1a0 [ 682.146735] virtio_vdpa_find_vqs+0x83/0x1d0 [ 682.146738] ? __pfx_default_calc_sets+0x10/0x10 [ 682.146742] virtnet_find_vqs+0x1f0/0x370 [ 682.146747] virtnet_probe+0x501/0xcd0 [ 682.146749] ? vp_modern_get_status+0x12/0x20 [ 682.146751] ? get_cap_addr.isra.0+0x10/0xc0 [ 682.146754] virtio_dev_probe+0x1af/0x260 [ 682.146759] really_probe+0x1a5/0x410 Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230811091539.1359865-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_net: merge dma operations when filling mergeable buffersXuan Zhuo
Currently, the virtio core will perform a dma operation for each buffer. Although, the same page may be operated multiple times. This patch, the driver does the dma operation and manages the dma address based the feature premapped of virtio core. This way, we can perform only one dma operation for the pages of the alloc frag. This is beneficial for the iommu device. kernel command line: intel_iommu=on iommu.passthrough=0 | strict=0 | strict=1 Before | 775496pps | 428614pps After | 1109316pps | 742853pps Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce dma sync api for virtqueueXuan Zhuo
These API has been introduced: * virtqueue_dma_need_sync * virtqueue_dma_sync_single_range_for_cpu * virtqueue_dma_sync_single_range_for_device These APIs can be used together with the premapped mechanism to sync the DMA address. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce dma map api for virtqueueXuan Zhuo
Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in advance. The purpose is to keep memory mapped across multiple add/get buf operations. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-11-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce virtqueue_reset()Xuan Zhuo
Introduce virtqueue_reset() to release all buffer inside vq. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-10-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: separate the logic of reset/enable from virtqueue_resizeXuan Zhuo
The subsequent reset function will reuse these logic. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-9-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: correct the expression of the description of virtqueue_resize()Xuan Zhuo
Modify the "useless" to a more accurate "unused". Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-8-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: skip unmap for premappedXuan Zhuo
Now we add a case where we skip dma unmap, the vq->premapped is true. We can't just rely on use_dma_api to determine whether to skip the dma operation. For convenience, I introduced the "do_unmap". By default, it is the same as use_dma_api. If the driver is configured with premapped, then do_unmap is false. So as long as do_unmap is false, for addr of desc, we should skip dma unmap operation. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-7-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce virtqueue_dma_dev()Xuan Zhuo
Added virtqueue_dma_dev() to get DMA device for virtio. Then the caller can do dma operation in advance. The purpose is to keep memory mapped across multiple add/get buf operations. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: support add premapped bufXuan Zhuo
If the vq is the premapped mode, use the sg_dma_address() directly. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-5-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: introduce virtqueue_set_dma_premapped()Xuan Zhuo
This helper allows the driver change the dma mode to premapped mode. Under the premapped mode, the virtio core do not do dma mapping internally. This just work when the use_dma_api is true. If the use_dma_api is false, the dma options is not through the DMA APIs, that is not the standard way of the linux kernel. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20230810123057.43407-4-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: put mapping error check in vring_map_one_sgXuan Zhuo
This patch put the dma addr error check in vring_map_one_sg(). The benefits of doing this: 1. reduce one judgment of vq->use_dma_api. 2. make vring_map_one_sg more simple, without calling vring_mapping_error to check the return value. simplifies subsequent code Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-3-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-03virtio_ring: check use_dma_api before unmap desc for indirectXuan Zhuo
Inside detach_buf_split(), if use_dma_api is false, vring_unmap_one_split_indirect will be called many times, but actually nothing is done. So this patch check use_dma_api firstly. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230810123057.43407-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>