summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-05-22Merge tag 'stable/vduse-virtio-net' into vhostMichael S. Tsirkin
This adds support for virtio-net to vduse. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio: delete vq in vp_find_vqs_msix() when request_irq() failsJiri Pirko
When request_irq() fails, error path calls vp_del_vqs(). There, as vq is present in the list, free_irq() is called for the same vector. That causes following splat: [ 0.414355] Trying to free already-free IRQ 27 [ 0.414403] WARNING: CPU: 1 PID: 1 at kernel/irq/manage.c:1899 free_irq+0x1a1/0x2d0 [ 0.414510] Modules linked in: [ 0.414540] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc4+ #27 [ 0.414540] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 [ 0.414540] RIP: 0010:free_irq+0x1a1/0x2d0 [ 0.414540] Code: 1e 00 48 83 c4 08 48 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 90 8b 74 24 04 48 c7 c7 98 80 6c b1 e8 00 c9 f7 ff 90 <0f> 0b 90 90 48 89 ee 4c 89 ef e8 e0 20 b8 00 49 8b 47 40 48 8b 40 [ 0.414540] RSP: 0000:ffffb71480013ae0 EFLAGS: 00010086 [ 0.414540] RAX: 0000000000000000 RBX: ffffa099c2722000 RCX: 0000000000000000 [ 0.414540] RDX: 0000000000000000 RSI: ffffb71480013998 RDI: 0000000000000001 [ 0.414540] RBP: 0000000000000246 R08: 00000000ffffdfff R09: 0000000000000001 [ 0.414540] R10: 00000000ffffdfff R11: ffffffffb18729c0 R12: ffffa099c1c91760 [ 0.414540] R13: ffffa099c1c916a4 R14: ffffa099c1d2f200 R15: ffffa099c1c91600 [ 0.414540] FS: 0000000000000000(0000) GS:ffffa099fec40000(0000) knlGS:0000000000000000 [ 0.414540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.414540] CR2: 0000000000000000 CR3: 0000000008e3e001 CR4: 0000000000370ef0 [ 0.414540] Call Trace: [ 0.414540] <TASK> [ 0.414540] ? __warn+0x80/0x120 [ 0.414540] ? free_irq+0x1a1/0x2d0 [ 0.414540] ? report_bug+0x164/0x190 [ 0.414540] ? handle_bug+0x3b/0x70 [ 0.414540] ? exc_invalid_op+0x17/0x70 [ 0.414540] ? asm_exc_invalid_op+0x1a/0x20 [ 0.414540] ? free_irq+0x1a1/0x2d0 [ 0.414540] vp_del_vqs+0xc1/0x220 [ 0.414540] vp_find_vqs_msix+0x305/0x470 [ 0.414540] vp_find_vqs+0x3e/0x1a0 [ 0.414540] vp_modern_find_vqs+0x1b/0x70 [ 0.414540] init_vqs+0x387/0x600 [ 0.414540] virtnet_probe+0x50a/0xc80 [ 0.414540] virtio_dev_probe+0x1e0/0x2b0 [ 0.414540] really_probe+0xc0/0x2c0 [ 0.414540] ? __pfx___driver_attach+0x10/0x10 [ 0.414540] __driver_probe_device+0x73/0x120 [ 0.414540] driver_probe_device+0x1f/0xe0 [ 0.414540] __driver_attach+0x88/0x180 [ 0.414540] bus_for_each_dev+0x85/0xd0 [ 0.414540] bus_add_driver+0xec/0x1f0 [ 0.414540] driver_register+0x59/0x100 [ 0.414540] ? __pfx_virtio_net_driver_init+0x10/0x10 [ 0.414540] virtio_net_driver_init+0x90/0xb0 [ 0.414540] do_one_initcall+0x58/0x230 [ 0.414540] kernel_init_freeable+0x1a3/0x2d0 [ 0.414540] ? __pfx_kernel_init+0x10/0x10 [ 0.414540] kernel_init+0x1a/0x1c0 [ 0.414540] ret_from_fork+0x31/0x50 [ 0.414540] ? __pfx_kernel_init+0x10/0x10 [ 0.414540] ret_from_fork_asm+0x1a/0x30 [ 0.414540] </TASK> Fix this by calling deleting the current vq when request_irq() fails. Fixes: 0b0f9dc52ed0 ("Revert "virtio_pci: use shared interrupts for virtqueues"") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240426150845.3999481-1-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22MAINTAINERS: add Eugenio Pérez as reviewerEugenio Pérez
Add myself as a reviewer of some VirtIO areas I'm interested. Until this point I've been scanning manually the list looking for series that touches this area. Adding myself to make this task easier. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20240213182450.106796-1-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost-vdpa: Remove usage of the deprecated ida_simple_xx() APIChristophe JAILLET
ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). Note that the upper limit of ida_simple_get() is exclusive, but the one of ida_alloc_max() is inclusive. So a -1 has been added when needed. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Simon Horman <horms@kernel.org> Message-Id: <67c2edf49788c27d5f7a49fc701520b9fcf739b5.1713088999.git.christophe.jaillet@wanadoo.fr> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2024-05-22vp_vdpa: don't allocate unused msix vectorsYuxue Liu
When there is a ctlq and it doesn't require interrupt callbacks,the original method of calculating vectors wastes hardware msi or msix resources as well as system IRQ resources. When conducting performance testing using testpmd in the guest os, it was found that the performance was lower compared to directly using vfio-pci to passthrough the device In scenarios where the virtio device in the guest os does not utilize interrupts, the vdpa driver still configures the hardware's msix vector. Therefore, the hardware still sends interrupts to the host os. Because of this unnecessary action by the hardware, hardware performance decreases, and it also affects the performance of the host os. Before modification:(interrupt mode) 32: 0 0 0 0 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-0 33: 0 0 0 0 PCI-MSI 32769-edge vp-vdpa[0000:00:02.0]-1 34: 0 0 0 0 PCI-MSI 32770-edge vp-vdpa[0000:00:02.0]-2 35: 0 0 0 0 PCI-MSI 32771-edge vp-vdpa[0000:00:02.0]-config After modification:(interrupt mode) 32: 0 0 1 7 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-0 33: 36 0 3 0 PCI-MSI 32769-edge vp-vdpa[0000:00:02.0]-1 34: 0 0 0 0 PCI-MSI 32770-edge vp-vdpa[0000:00:02.0]-config Before modification:(virtio pmd mode for guest os) 32: 0 0 0 0 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-0 33: 0 0 0 0 PCI-MSI 32769-edge vp-vdpa[0000:00:02.0]-1 34: 0 0 0 0 PCI-MSI 32770-edge vp-vdpa[0000:00:02.0]-2 35: 0 0 0 0 PCI-MSI 32771-edge vp-vdpa[0000:00:02.0]-config After modification:(virtio pmd mode for guest os) 32: 0 0 0 0 PCI-MSI 32768-edge vp-vdpa[0000:00:02.0]-config To verify the use of the virtio PMD mode in the guest operating system, the following patch needs to be applied to QEMU: https://lore.kernel.org/all/20240408073311.2049-1-yuxue.liu@jaguarmicro.com Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Message-Id: <20240410033020.1310-1-yuxue.liu@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22sound: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-25-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Anton Yakovlev <anton.yakovlev@opensynergy.com>
2024-05-22fuse: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-24-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-22scsi: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-23-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
2024-05-22rpmsg: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-22-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22nvdimm: virtio_pmem: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-21-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
2024-05-22wifi: mac80211_hwsim: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-20-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vsock/virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-19-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-22net: 9p: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-18-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22net: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-17-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22net: caif: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-16-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22misc: nsm: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-15-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alexander Graf <graf@amazon.com>
2024-05-22iommu: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-14-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22drm/virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-13-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22gpio: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-12-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Linus Walleij <linus.walleij@linaro.org>
2024-05-22firmware: arm_scmi: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-11-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com>
2024-05-22crypto: virtio - drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-10-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-05-22virtio_console: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-9-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22hwrng: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-8-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22bluetooth: virtio: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-7-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio_blk: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-6-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-05-22um: virt-pci: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-5-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio: mem: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-4-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio: input: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-3-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio: balloon: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-2-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio_balloon: Treat stats requests as wakeup eventsDavid Stevens
Treat stats requests as wakeup events to ensure that the driver responds to device requests in a timely manner. Signed-off-by: David Stevens <stevensd@chromium.org> Acked-by: David Hildenbrand <david@redhat.com> Message-Id: <20240321012445.1593685-3-stevensd@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio_balloon: Give the balloon its own wakeup sourceDavid Stevens
Wakeup sources don't support nesting multiple events, so sharing a single object between multiple drivers can result in one driver overriding the wakeup event processing period specified by another driver. Have the virtio balloon driver use the wakeup source of the device it is bound to rather than the wakeup source of the parent device, to avoid conflicts with the transport layer. Note that although the virtio balloon's virtio_device itself isn't what actually wakes up the device, it is responsible for processing wakeup events. In the same way that EPOLLWAKEUP uses a dedicated wakeup_source to prevent suspend when userspace is processing wakeup events, a dedicated wakeup_source is necessary when processing wakeup events in a higher layer in the kernel. Fixes: b12fbc3f787e ("virtio_balloon: stay awake while adjusting balloon") Signed-off-by: David Stevens <stevensd@chromium.org> Acked-by: David Hildenbrand <david@redhat.com> Message-Id: <20240321012445.1593685-2-stevensd@google.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22virtio-mem: support suspend+resumeDavid Hildenbrand
With virtio-mem, primarily hibernation is problematic: as the machine shuts down, the virtio-mem device loses its state. Powering the machine back up is like losing a bunch of DIMMs. While there would be ways to add limited support, suspend+resume is more commonly used for VMs and "easier" to support cleanly. s2idle can be supported without any device dependencies. Similarly, one would expect suspend-to-ram (i.e., S3) to work out of the box. However, QEMU currently unplugs all device memory when resuming the VM, using a cold reset on the "wakeup" path. In order to support S3, we need a feature flag for the device to tell us if memory remains plugged when waking up. In the future, QEMU will implement this feature. So let's always support s2idle and support S3 with plugged memory only if the device indicates support. Block hibernation early using the PM notifier. Trying to hibernate now fails early: # echo disk > /sys/power/state [ 26.455369] PM: hibernation: hibernation entry [ 26.458271] virtio_mem virtio0: hibernation is not supported. [ 26.462498] PM: hibernation: hibernation exit -bash: echo: write error: Operation not permitted s2idle works even without the new feature bit: # echo s2idle > /sys/power/mem_sleep # echo mem > /sys/power/state [ 52.083725] PM: suspend entry (s2idle) [ 52.095950] Filesystems sync: 0.010 seconds [ 52.101493] Freezing user space processes [ 52.104213] Freezing user space processes completed (elapsed 0.001 seconds) [ 52.106520] OOM killer disabled. [ 52.107655] Freezing remaining freezable tasks [ 52.110880] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 52.113296] printk: Suspending console(s) (use no_console_suspend to debug) S3 does not work without the feature bit when memory is plugged: # echo deep > /sys/power/mem_sleep # echo mem > /sys/power/state [ 32.788281] PM: suspend entry (deep) [ 32.816630] Filesystems sync: 0.027 seconds [ 32.820029] Freezing user space processes [ 32.823870] Freezing user space processes completed (elapsed 0.001 seconds) [ 32.827756] OOM killer disabled. [ 32.829608] Freezing remaining freezable tasks [ 32.833842] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 32.837953] printk: Suspending console(s) (use no_console_suspend to debug) [ 32.916172] virtio_mem virtio0: suspend+resume with plugged memory is not supported [ 32.916181] virtio-pci 0000:00:02.0: PM: pci_pm_suspend(): virtio_pci_freeze+0x0/0x50 returns -1 [ 32.916197] virtio-pci 0000:00:02.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -1 [ 32.916210] virtio-pci 0000:00:02.0: PM: failed to suspend async: error -1 But S3 works with the new feature bit when memory is plugged (patched QEMU): # echo deep > /sys/power/mem_sleep # echo mem > /sys/power/state [ 33.983694] PM: suspend entry (deep) [ 34.009828] Filesystems sync: 0.024 seconds [ 34.013589] Freezing user space processes [ 34.016722] Freezing user space processes completed (elapsed 0.001 seconds) [ 34.019092] OOM killer disabled. [ 34.020291] Freezing remaining freezable tasks [ 34.023549] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 34.026090] printk: Suspending console(s) (use no_console_suspend to debug) Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240318120645.105664-1-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22kernel: Remove signal hacks for vhost_tasksMike Christie
This removes the signal/coredump hacks added for vhost_tasks in: Commit f9010dbdce91 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") When that patch was added vhost_tasks did not handle SIGKILL and would try to ignore/clear the signal and continue on until the device's close function was called. In the previous patches vhost_tasks and the vhost drivers were converted to support SIGKILL by cleaning themselves up and exiting. The hacks are no longer needed so this removes them. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-10-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost_task: Handle SIGKILL by flushing work and exitingMike Christie
Instead of lingering until the device is closed, this has us handle SIGKILL by: 1. marking the worker as killed so we no longer try to use it with new virtqueues and new flush operations. 2. setting the virtqueue to worker mapping so no new works are queued. 3. running all the exiting works. Suggested-by: Edward Adam Davis <eadavis@qq.com> Reported-and-tested-by: syzbot+98edc2df894917b3431f@syzkaller.appspotmail.com Message-Id: <tencent_546DA49414E876EEBECF2C78D26D242EE50A@qq.com> Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-9-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost: Release worker mutex during flushesMike Christie
In the next patches where the worker can be killed while in use, we need to be able to take the worker mutex and kill queued works for new IO and flushes, and set some new flags to prevent new __vhost_vq_attach_worker calls from swapping in/out killed workers. If we are holding the worker mutex during a flush and the flush's work is still in the queue, the worker code that will handle the SIGKILL cleanup won't be able to take the mutex and perform it's cleanup. So this patch has us drop the worker mutex while waiting for the flush to complete. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-8-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost: Use virtqueue mutex for swapping workerMike Christie
__vhost_vq_attach_worker uses the vhost_dev mutex to serialize the swapping of a virtqueue's worker. This was done for simplicity because we are already holding that mutex. In the next patches where the worker can be killed while in use, we need finer grained locking because some drivers will hold the vhost_dev mutex while flushing. However in the SIGKILL handler in the next patches, we will need to be able to swap workers (set current one to NULL), kill queued works and stop new flushes while flushes are in progress. To prepare us, this has us use the virtqueue mutex for swapping workers instead of the vhost_dev one. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-7-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost_scsi: Handle vhost_vq_work_queue failures for TMFsMike Christie
vhost_vq_work_queue will never fail when queueing the TMF's response handling because a guest can only send us TMFs when the device is fully setup so there is always a worker at that time. In the next patches we will modify the worker code so it handles SIGKILL by exiting before outstanding commands/TMFs have sent their responses. In that case vhost_vq_work_queue can fail when we try to send a response. This has us just free the TMF's resources since at this time the guest won't be able to get a response even if we could send it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-6-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost: Remove vhost_vq_flushMike Christie
vhost_vq_flush is no longer used so remove it. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-5-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost-scsi: Use system wq to flush dev for TMFsMike Christie
We flush all the workers that are not also used by the ctl vq to make sure that responses queued by LIO before the TMF response are sent before the TMF response. This requires a special vhost_vq_flush function which, in the next patches where we handle SIGKILL killing workers while in use, will require extra locking/complexity. To avoid that, this patch has us flush the entire device from the system work queue, then queue up sending the response from there. This is a little less optimal since we now flush all workers but this will be ok since commands have already timed out and perf is not a concern. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-4-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost-scsi: Handle vhost_vq_work_queue failures for cmdsMike Christie
In the next patches we will support the vhost_task being killed while in use. The problem for vhost-scsi is that we can't free some structs until we get responses for commands we have submitted to the target layer and we currently process the responses from the vhost_task. This has just drop the responses and free the command's resources. When all commands have completed then operations like flush will be woken up and we can complete device release and endpoint cleanup. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vhost-scsi: Handle vhost_vq_work_queue failures for eventsMike Christie
Currently, we can try to queue an event's work before the vhost_task is created. When this happens we just drop it in vhost_scsi_do_plug before even calling vhost_vq_work_queue. During a device shutdown we do the same thing after vhost_scsi_clear_endpoint has cleared the backends. In the next patches we will be able to kill the vhost_task before we have cleared the endpoint. In that case, vhost_vq_work_queue can fail and we will leak the event's memory. This has handle the failure by just freeing the event. This is safe to do, because vhost_vq_work_queue will only return failure for us when the vhost_task is killed and so userspace will not be able to handle events if we sent them. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20240316004707.45557-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vdpa: Convert sprintf/snprintf to sysfs_emitLi Zhijian
Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). sprintf() will be converted as weel if they have. Generally, this patch is generated by make coccicheck M=<path/to/file> MODE=patch \ COCCI=scripts/coccinelle/api/device_attr_show.cocci No functional change intended CC: "Michael S. Tsirkin" <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com> CC: virtualization@lists.linux.dev Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Message-Id: <20240314095853.1326111-1-lizhijian@fujitsu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vp_vdpa: Fix return value check vp_vdpa_request_irqYuxue Liu
In the vp_vdpa_set_status function, when setting the device status to VIRTIO_CONFIG_S_DRIVER_OK, the vp_vdpa_request_irq function may fail. In such cases, the device status should not be set to DRIVER_OK. Add exception printing to remind the user. Signed-off-by: Yuxue Liu <yuxue.liu@jaguarmicro.com> Message-Id: <20240325105448.235-1-gavin.liu@jaguarmicro.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22arm64/fpsimd: Avoid erroneous elide of user state reloadArd Biesheuvel
TIF_FOREIGN_FPSTATE is a 'convenience' flag that should reflect whether the current CPU holds the most recent user mode FP/SIMD state of the current task. It combines two conditions: - whether the current CPU's FP/SIMD state belongs to the task; - whether that state is the most recent associated with the task (as a task may have executed on other CPUs as well). When a task is scheduled in and TIF_KERNEL_FPSTATE is set, it means the task was in a kernel mode NEON section when it was scheduled out, and so the kernel mode FP/SIMD state is restored. Since this implies that the current CPU is *not* holding the most recent user mode FP/SIMD state of the current task, the TIF_FOREIGN_FPSTATE flag is set too, so that the user mode FP/SIMD state is reloaded from memory when returning to userland. However, the task may be scheduled out after completing the kernel mode NEON section, but before returning to userland. When this happens, the TIF_FOREIGN_FPSTATE flag will not be preserved, but will be set as usual the next time the task is scheduled in, and will be based on the above conditions. This means that, rather than setting TIF_FOREIGN_FPSTATE when scheduling in a task with TIF_KERNEL_FPSTATE set, the underlying state should be updated so that TIF_FOREIGN_FPSTATE will assume the expected value as a result. So instead, call fpsimd_flush_cpu_state(), which takes care of this. Closes: https://lore.kernel.org/all/cb8822182231850108fa43e0446a4c7f@kernel.org Reported-by: Johannes Nixdorf <mixi@shadowice.org> Fixes: aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch") Cc: Mark Brown <broonie@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Janne Grunau <j@jannau.net> Cc: stable@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Janne Grunau <j@jannau.net> Tested-by: Johannes Nixdorf <mixi@shadowice.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240522091335.335346-2-ardb+git@google.com Signed-off-by: Will Deacon <will@kernel.org>
2024-05-22Reapply "arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD"Will Deacon
This reverts commit b8995a18417088bb53f87c49d200ec72a9dd4ec1. Ard managed to reproduce the dm-crypt corruption problem and got to the bottom of it, so re-apply the problematic patch in preparation for fixing things properly. Cc: stable@vger.kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-05-22net: mana: Fix the extra HZ in mana_hwc_send_requestSouradeep Chakrabarti
Commit 62c1bff593b7 added an extra HZ along with msecs_to_jiffies. This patch fixes that. Cc: stable@vger.kernel.org Fixes: 62c1bff593b7 ("net: mana: Configure hwc timeout from hardware") Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1716185104-31658-1-git-send-email-schakrabarti@linux.microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-22net: lan966x: Remove ptp traps in case the ptp is not enabled.Horatiu Vultur
Lan966x is adding ptp traps to redirect the ptp frames to the CPU such that the HW will not forward these frames anywhere. The issue is that in case ptp is not enabled and the timestamping source is et to HWTSTAMP_SOURCE_NETDEV then these traps would not be removed on the error path. Fix this by removing the traps in this case as they are not needed. Fixes: 54e1ed69c40a ("net: lan966x: convert to ndo_hwtstamp_get() and ndo_hwtstamp_set()") Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20240517135808.3025435-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-21rv: Update rv_en(dis)able_monitor doc to match kernel-docYang Li
The patch updates the function documentation comment for rv_en(dis)able_monitor to adhere to the kernel-doc specification. Link: https://lore.kernel.org/linux-trace-kernel/20240520054239.61784-1-yang.lee@linux.alibaba.com Fixes: 102227b970a15 ("rv: Add Runtime Verification (RV) interface") Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-21tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_testJeff Johnson
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/trace/preemptirq_delay_test.o Link: https://lore.kernel.org/linux-trace-kernel/20240518-md-preemptirq_delay_test-v1-1-387d11b30d85@quicinc.com Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: f96e8577da10 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-05-21ring-buffer: Fix a race between readers and resize checksPetr Pavlu
The reader code in rb_get_reader_page() swaps a new reader page into the ring buffer by doing cmpxchg on old->list.prev->next to point it to the new page. Following that, if the operation is successful, old->list.next->prev gets updated too. This means the underlying doubly-linked list is temporarily inconsistent, page->prev->next or page->next->prev might not be equal back to page for some page in the ring buffer. The resize operation in ring_buffer_resize() can be invoked in parallel. It calls rb_check_pages() which can detect the described inconsistency and stop further tracing: [ 190.271762] ------------[ cut here ]------------ [ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0 [ 190.271789] Modules linked in: [...] [ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1 [ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f [ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014 [ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0 [ 190.272023] Code: [...] [ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206 [ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80 [ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700 [ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000 [ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720 [ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000 [ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000 [ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0 [ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 190.272077] Call Trace: [ 190.272098] <TASK> [ 190.272189] ring_buffer_resize+0x2ab/0x460 [ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0 [ 190.272206] tracing_resize_ring_buffer+0x65/0x90 [ 190.272216] tracing_entries_write+0x74/0xc0 [ 190.272225] vfs_write+0xf5/0x420 [ 190.272248] ksys_write+0x67/0xe0 [ 190.272256] do_syscall_64+0x82/0x170 [ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 190.272373] RIP: 0033:0x7f1bd657d263 [ 190.272381] Code: [...] [ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263 [ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001 [ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000 [ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500 [ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002 [ 190.272412] </TASK> [ 190.272414] ---[ end trace 0000000000000000 ]--- Note that ring_buffer_resize() calls rb_check_pages() only if the parent trace_buffer has recording disabled. Recent commit d78ab792705c ("tracing: Stop current tracer when resizing buffer") causes that it is now always the case which makes it more likely to experience this issue. The window to hit this race is nonetheless very small. To help reproducing it, one can add a delay loop in rb_get_reader_page(): ret = rb_head_page_replace(reader, cpu_buffer->reader_page); if (!ret) goto spin; for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */ __asm__ __volatile__ ("" : : : "memory"); rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list; .. and then run the following commands on the target system: echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable while true; do echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1 done & while true; do for i in /sys/kernel/tracing/per_cpu/*; do timeout 0.1 cat $i/trace_pipe; sleep 0.2 done done To fix the problem, make sure ring_buffer_resize() doesn't invoke rb_check_pages() concurrently with a reader operating on the same ring_buffer_per_cpu by taking its cpu_buffer->reader_lock. Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavlu@suse.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> [ Fixed whitespace ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>