Age  Commit message  Author
2022-08-11  virtio_ring: introduce virtqueue_resize()  Xuan Zhuo
Introduce virtqueue_resize() to implement resizing of the vring. With it, a driver can dynamically adjust the size of the vring, for example via ethtool -G. virtqueue_resize() implements the resize on top of the vq reset machinery. If allocating a new vring fails, the resize is abandoned and the original vring is kept. If re-enabling the reset vq fails during this process, the vq can no longer be used, although the probability of that is low. The recycle parameter is used to recycle buffers that are no longer used. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-25-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
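A minimal driver-side sketch of how an ethtool -G style path might use this API (kernel context assumed; my_recycle_buf() and my_set_ring_size() are hypothetical names, and the virtqueue_resize() prototype used here is inferred from the description above, so treat it as an approximation rather than the authoritative one):

    /* Hypothetical driver callback: invoked for each buffer still queued in
     * the old ring so the driver can free or repost it. */
    static void my_recycle_buf(struct virtqueue *vq, void *buf)
    {
            kfree(buf);             /* assumes the buffers were kmalloc'ed */
    }

    static int my_set_ring_size(struct virtqueue *vq, u32 new_num)
    {
            int err;

            /* On allocation failure the original (or re-initialized) vring is
             * kept, per the description above, so the queue stays usable. */
            err = virtqueue_resize(vq, new_num, my_recycle_buf);
            if (err)
                    return err;

            /* The driver would now repost receive buffers into the new ring. */
            return 0;
    }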
2022-08-11  virtio_ring: packed: introduce virtqueue_resize_packed()  Xuan Zhuo
The packed virtio ring supports resize. The old vring is released only after the new vring has been successfully allocated based on the new num. If an error is returned, the vring still points to the old vring. In the error case, the virtqueue is re-initialized (by virtqueue_reinit_packed()) to ensure that the vring can still be used. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-24-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
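The allocate-new-then-fall-back error handling described here can be sketched roughly as follows; vring_alloc_packed(), vring_free_packed() and recycle_unused_bufs() are stand-ins for helpers in this series, and the structure contents are simplified:

    /* Rough sketch of the resize fallback pattern described above; not the
     * actual kernel code. */
    static int resize_packed_sketch(struct vring_virtqueue *vq, u32 num)
    {
            struct vring_virtqueue_packed new_ring = {};
            int err;

            err = vring_alloc_packed(&new_ring, num);       /* stand-in */
            if (err)
                    goto err_reinit;        /* old ring untouched, keep using it */

            recycle_unused_bufs(vq);        /* hand old buffers back to the driver */
            vring_free_packed(&vq->packed); /* stand-in */
            vq->packed = new_ring;          /* attach the new ring */
            return 0;

    err_reinit:
            /* Re-initialize the old ring so the vq stays usable after failure. */
            virtqueue_reinit_packed(vq);
            return err;
    }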
2022-08-11  virtio_ring: packed: introduce virtqueue_reinit_packed()  Xuan Zhuo
Introduce a function to initialize a vq without allocating a new ring, desc_state or desc_extra. Subsequent patches will call this function after resetting the vq in order to reinitialize it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-23-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: packed: extract the logic of attach vring  Xuan Zhuo
Separate out the logic for attaching the vring; the subsequent patch will call it separately. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-22-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: packed: extract the logic of vring init  Xuan Zhuo
Separate out the logic for initializing the vring; subsequent patches will call it separately. This function completes the variable initialization of the packed vring. Together with the attach logic, it constitutes the initialization of the vring. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-21-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: packed: extract the logic of alloc state and extra  Xuan Zhuo
Separate out the logic for allocating desc_state and desc_extra, which will be called separately by subsequent patches. Use struct vring_packed to pass desc_state and desc_extra. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-20-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: packed: extract the logic of alloc queue  Xuan Zhuo
Separate out the packed logic for creating the vring queue. This is required for the subsequent virtqueue vring reset. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-19-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: packed: introduce vring_free_packed  Xuan Zhuo
Free the structure struct vring_virtqueue_packed. Subsequent patches require it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-18-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: introduce virtqueue_resize_split()  Xuan Zhuo
The split virtio ring supports resize. The old vring is released only after the new vring has been successfully allocated based on the new num. If an error is returned, the vring still points to the old vring. In the error case, the virtqueue is re-initialized (by virtqueue_reinit_split()) to ensure that the vring can still be used. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-17-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: reserve vring_align, may_reduce_num  Xuan Zhuo
In vring_alloc_queue_split(), save vring_align and may_reduce_num in struct vring_virtqueue_split. They are used to create a new vring when implementing resize. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-16-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: introduce virtqueue_reinit_split()  Xuan Zhuo
Introduce a function to initialize a vq without allocating a new ring, desc_state or desc_extra. Subsequent patches will call this function after resetting the vq in order to reinitialize it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-15-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: extract the logic of attach vring  Xuan Zhuo
Separate out the logic for attaching the vring; subsequent patches will call it separately. virtqueue_vring_init_split() completes the initialization of the other split vring variables, so the attach itself can be done directly with vq->split = *vring_split. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-14-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: extract the logic of vring init  Xuan Zhuo
Separate out the logic for initializing the vring; subsequent patches will call it separately. This function completes the variable initialization of the split vring. Together with the attach logic, it constitutes the initialization of the vring. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-13-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: extract the logic of alloc state and extra  Xuan Zhuo
Separate out the logic for creating desc_state and desc_extra; subsequent patches will call it independently. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-12-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: extract the logic of alloc queue  Xuan Zhuo
Separate out the split logic for creating the vring queue. This is required for the subsequent virtqueue vring reset. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-11-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: introduce vring_free_split()  Xuan Zhuo
Free the structure struct vring_virtqueue_split. Subsequent patches require it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-10-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: __vring_new_virtqueue() accept struct vring_virtqueue_split  Xuan Zhuo
Make __vring_new_virtqueue() accept struct vring_virtqueue_split instead. The purpose of this is to pass more information into __vring_new_virtqueue(), making the code simpler and the structure cleaner. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-9-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split: stop __vring_new_virtqueue as export symbol  Xuan Zhuo
There is currently only one place outside the virtio core that references __vring_new_virtqueue() directly, and vring_new_virtqueue() can be used there instead. Subsequent patches will modify __vring_new_virtqueue(), so stop exporting it for now. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-8-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: introduce virtqueue_init()  Xuan Zhuo
Separate out the logic of virtqueue initialization: these variables must be reset when the queue is reset, and this logic can then be called independently when implementing resize/reset later. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-7-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: split vring_virtqueue  Xuan Zhuo
Separate the two inline structures (split and packed) out of struct vring_virtqueue. This way, we can use these two structures later to pass parameters and to hold temporary variables. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
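A hedged sketch of the data-structure change this describes; the field lists are placeholders and the real kernel layout differs in detail:

    /* The split and packed state become named structures that can be passed
     * around and filled in on their own; fields below are placeholders. */
    struct vring_virtqueue_split_sketch {
            void *queue;    /* ring memory, desc_state, desc_extra, ... */
    };

    struct vring_virtqueue_packed_sketch {
            void *queue;    /* ring memory, desc_state, desc_extra, ... */
    };

    struct vring_virtqueue_sketch {
            /* common fields (embedded struct virtqueue, flags, counters, ...) */
            union {
                    struct vring_virtqueue_split_sketch split;
                    struct vring_virtqueue_packed_sketch packed;
            };
    };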
2022-08-11  virtio_ring: extract the logic of freeing vring  Xuan Zhuo
Introduce vring_free() to free the vring of a vq. Subsequent patches will use vring_free() on its own. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-5-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset  Xuan Zhuo
Added documentation for virtqueue_detach_unused_buf, allowing it to be called on queue reset. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-4-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio: struct virtio_config_ops add callbacks for queue_reset  Xuan Zhuo
A queue reset can be divided into the following four steps (example):
1. transport: notify the device to reset the queue
2. vring: recycle the submitted buffers
3. vring: reset/resize the vring (may re-allocate)
4. transport: map the vring to the device, and enable the queue
In order to support queue reset, add two callbacks to struct virtio_config_ops to implement steps 1 and 4. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-3-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
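A hedged sketch of how a caller might thread these four steps together; it assumes the two new callbacks end up named along the lines of disable_vq_and_reset/enable_vq_after_reset (as in this series), and recycle_bufs()/resize_vring() are placeholders for the vring-level work:

    /* Sketch of the four-step queue reset flow described above. */
    static int reset_and_resize_sketch(struct virtio_device *vdev,
                                       struct virtqueue *vq, u32 new_num)
    {
            int err;

            /* 1. transport: tell the device to reset this queue */
            err = vdev->config->disable_vq_and_reset(vq);
            if (err)
                    return err;

            /* 2. vring: give buffers that were in flight back to the driver */
            recycle_bufs(vq);                       /* placeholder */

            /* 3. vring: reset/resize the ring (may re-allocate memory) */
            err = resize_vring(vq, new_num);        /* placeholder */
            if (err)
                    return err;

            /* 4. transport: re-map the vring and re-enable the queue */
            return vdev->config->enable_vq_after_reset(vq);
    }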
2022-08-11  virtio: record the maximum queue num supported by the device.  Xuan Zhuo
virtio-net can display the maximum ring size supported by the hardware via ethtool -g eth0. When a subsequent patch implements vring reset, it can use this value to check whether the ring size passed by the driver is valid. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
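An illustrative validation only, sketched against a hypothetical accessor for the recorded maximum (not the real in-kernel API name); the power-of-two rule applies to split rings specifically:

    static int check_ring_size_sketch(struct virtqueue *vq, u32 requested,
                                      bool is_split)
    {
            u32 max = max_supported(vq);    /* hypothetical accessor */

            if (requested == 0 || requested > max)
                    return -EINVAL;         /* beyond what the device supports */
            if (is_split && !is_power_of_2(requested))
                    return -EINVAL;         /* split rings need a power of two */
            return 0;
    }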
2022-08-11  drivers/virtio: Clarify CONFIG_VIRTIO_MEM for unsupported architectures  David Hildenbrand
Let's make it clearer that simply unlocking CONFIG_VIRTIO_MEM on an architecture is most probably not sufficient to have it working as expected. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20220610094737.65254-1-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  virtio_mmio: add support to set IRQ of a virtio device as wakeup source  Minghao Xue
Based on the virtio_mmio wakeup flag in the device tree, set the device's IRQ as a wakeup source during virtqueue initialization. Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com> Message-Id: <1654851507-13891-3-git-send-email-quic_mingxue@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  dt-bindings: virtio: mmio: add optional wakeup-source property  Minghao Xue
Some systems want to set the interrupt of a virtio_mmio device as a wakeup source. On such systems, the existence of the "wakeup-source" property is used to signal that requirement. Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <1654851507-13891-2-git-send-email-quic_mingxue@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  vdpa: Use device_iommu_capable()  Robin Murphy
Use the new interface to check the capability for our device specifically. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Message-Id: <548e316fa282ce513fabb991a4c4d92258062eb5.1654688822.git.robin.murphy@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11  virtio: VIRTIO_HARDEN_NOTIFICATION is broken  Michael S. Tsirkin
This option doesn't really work and breaks too many drivers. It's not yet clear what the right thing to do is; for now, let's make sure randconfig isn't broken by this. Fixes: c346dae4f3fb ("virtio: disable notification hardening by default") Cc: "Jason Wang" <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11  virtio_pmem: set device ready in probe()  Jason Wang
The NVDIMM region could become available before the virtio_device_ready() that is called by virtio_dev_probe(). This means the driver may try to use the device before DRIVER_OK, which violates the spec. Fix this by setting the device ready before nvdimm_pmem_region_create(). Note that this means virtio_pmem_host_ack() could be triggered before the creation of the nd region; this is safe since the pmem_lock has been initialized, and virtio_pmem_host_ack() validates whether any available buffer was added beforehand. Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver") Acked-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220628083430.61856-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
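A hedged sketch of the probe ordering this fix implies; create_region() stands in for the nvdimm region creation, and everything else is simplified:

    /* Only the ordering matters here: DRIVER_OK before the region exists,
     * because flush requests may arrive as soon as the region is created. */
    static int virtio_pmem_probe_sketch(struct virtio_device *vdev)
    {
            /* ... find the request vq, init the pmem lock, fill the desc ... */

            virtio_device_ready(vdev);      /* device usable before any request */

            if (!create_region(vdev)) {     /* placeholder for region creation */
                    virtio_reset_device(vdev);      /* undo DRIVER_OK on failure */
                    return -ENXIO;
            }
            return 0;
    }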
2022-08-11  virtio_pmem: initialize provider_data through nd_region_desc  Jason Wang
We used to initialize the provider_data manually after nvdimm_pemm_region_create(). This seems to be racy if the flush is issued before the initialization of provider_data[1]. Fixing this by initializing the provider_data through nd_region_desc to make sure the provider_data is ready after the pmem is created. [1]: [ 80.152281] nd_pmem namespace0.0: unable to guarantee persistence of writes [ 92.393956] BUG: kernel NULL pointer dereference, address: 0000000000000318 [ 92.394551] #PF: supervisor read access in kernel mode [ 92.394955] #PF: error_code(0x0000) - not-present page [ 92.395365] PGD 0 P4D 0 [ 92.395566] Oops: 0000 [#1] PREEMPT SMP PTI [ 92.395867] CPU: 2 PID: 506 Comm: mkfs.ext4 Not tainted 5.19.0-rc1+ #453 [ 92.396365] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 92.397178] RIP: 0010:virtio_pmem_flush+0x2f/0x1f0 [ 92.397521] Code: 55 41 54 55 53 48 81 ec a0 00 00 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 98 00 00 00 31 c0 48 8b 87 78 03 00 00 48 89 04 24 <48> 8b 98 18 03 00 00 e8 85 bf 6b 00 ba 58 00 00 00 be c0 0c 00 00 [ 92.398982] RSP: 0018:ffff9a7380aefc88 EFLAGS: 00010246 [ 92.399349] RAX: 0000000000000000 RBX: ffff8e77c3f86f00 RCX: 0000000000000000 [ 92.399833] RDX: ffffffffad4ea720 RSI: ffff8e77c41e39c0 RDI: ffff8e77c41c5c00 [ 92.400388] RBP: ffff8e77c41e39c0 R08: ffff8e77c19f0600 R09: 0000000000000000 [ 92.400874] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e77c0814e28 [ 92.401364] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8e77c41e39c0 [ 92.401849] FS: 00007f3cd75b2780(0000) GS:ffff8e7937d00000(0000) knlGS:0000000000000000 [ 92.402423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 92.402821] CR2: 0000000000000318 CR3: 0000000103c80002 CR4: 0000000000370ee0 [ 92.403307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 92.403793] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 92.404278] Call Trace: [ 92.404481] <TASK> [ 92.404654] ? mempool_alloc+0x5d/0x160 [ 92.404939] ? terminate_walk+0x5f/0xf0 [ 92.405226] ? bio_alloc_bioset+0xbb/0x3f0 [ 92.405525] async_pmem_flush+0x17/0x80 [ 92.405806] nvdimm_flush+0x11/0x30 [ 92.406067] pmem_submit_bio+0x1e9/0x200 [ 92.406354] __submit_bio+0x80/0x120 [ 92.406621] submit_bio_noacct_nocheck+0xdc/0x2a0 [ 92.406958] submit_bio_wait+0x4e/0x80 [ 92.407234] blkdev_issue_flush+0x31/0x50 [ 92.407526] ? punt_bios_to_rescuer+0x230/0x230 [ 92.407852] blkdev_fsync+0x1e/0x30 [ 92.408112] do_fsync+0x33/0x70 [ 92.408354] __x64_sys_fsync+0xb/0x10 [ 92.408625] do_syscall_64+0x43/0x90 [ 92.408895] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 92.409257] RIP: 0033:0x7f3cd76c6c44 Fixes 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver") Acked-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220628083430.61856-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  vringh: iterate on iotlb_translate to handle large translations  Stefano Garzarella
iotlb_translate() can return -ENOBUFS if the bio_vec is not big enough to contain all the ranges for translation. This can happen, for example, if the VMM maps a large bounce buffer, without using hugepages, that requires more than 16 ranges to translate the addresses. To handle this case, extend iotlb_translate() to also return the number of bytes successfully translated, and make copy_from_iotlb()/copy_to_iotlb() loop, calling iotlb_translate() several times until the translation is complete. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220624075656.13997-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
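The looping idea can be sketched as below; translate_chunk() and copy_chunk() are stand-ins for iotlb_translate() with the extended return convention and for the actual copy, and the real vringh code differs in structure:

    static ssize_t copy_from_iotlb_sketch(void *dst, u64 src, size_t len)
    {
            size_t done = 0;

            while (done < len) {
                    size_t translated = 0;
                    ssize_t ret;

                    /* May cover only part of the range if it needs more
                     * bio_vec entries than are available. */
                    ret = translate_chunk(src + done, len - done, &translated);
                    if (ret < 0 && ret != -ENOBUFS)
                            return ret;
                    if (!translated)
                            return -EIO;    /* defensive: avoid spinning forever */

                    copy_chunk(dst + done, src + done, translated);
                    done += translated;

                    if (ret >= 0)           /* the whole range was translated */
                            break;
            }

            return done;
    }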
2022-08-11  virtio_ring: remove the arg vq of vring_alloc_desc_extra()  Xuan Zhuo
The vq parameter of vring_alloc_desc_extra() is unused, so this patch removes it. This also lets subsequent patches call the function without passing a useless argument. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220624025621.128843-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11  remoteproc: rename len of rproc_vring to num  Xuan Zhuo
Rename the member len of struct rproc_vring to num, and remove 'in bytes' from its comment, which is misleading: the field actually refers to the size of the virtio vring to be created, and the unit is not bytes. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20220624025621.128843-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-10  bpf: Shut up kern_sys_bpf warning.  Alexei Starovoitov
Shut up this warning: kernel/bpf/syscall.c:5089:5: warning: no previous prototype for function 'kern_sys_bpf' [-Wmissing-prototypes] int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-11  KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table  Bagas Sanjaya
There is an unexpected warning on the KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table, which causes the table to be rendered as paragraph text instead. The warning is due to missing colons after the capability name and the returns keyword, as well as improper alignment of the multi-line returns field. Fix the warning by adding the missing colons and aligning the field. Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/ Fixes: 084cc29f8bbb03 ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Matlack <dmatlack@google.com> Cc: Ben Gardon <bgardon@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-next@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Message-Id: <20220627095151.19339-3-bagasdotme@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-11  Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline  Bagas Sanjaya
Extend heading underline for KVM_CAP_VM_DISABLE_NX_HUGE_PAGE to match the heading text length. Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/ Fixes: 084cc29f8bbb03 ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: David Matlack <dmatlack@google.com> Cc: Ben Gardon <bgardon@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-next@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Message-Id: <20220627095151.19339-2-bagasdotme@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-10  net/tls: Use RCU API to access tls_ctx->netdev  Maxim Mikityanskiy
Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload") Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
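A generic sketch of the WRITE_ONCE/READ_ONCE-to-RCU change described above, shown on a simplified context structure rather than the real tls_context:

    struct my_offload_ctx {
            struct net_device __rcu *netdev;        /* annotated for sparse/RCU */
    };

    static struct net_device *ctx_netdev_get(struct my_offload_ctx *ctx)
    {
            struct net_device *dev;

            rcu_read_lock();
            dev = rcu_dereference(ctx->netdev);     /* reader side, under RCU */
            if (dev)
                    dev_hold(dev);  /* keep the device past the read section */
            rcu_read_unlock();

            return dev;
    }

    static void ctx_netdev_clear(struct my_offload_ctx *ctx)
    {
            rcu_assign_pointer(ctx->netdev, NULL);  /* writer side */
            synchronize_net();      /* wait for in-flight RCU readers */
    }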
2022-08-10  tls: rx: device: don't try to copy too much on detach  Jakub Kicinski
Another device offload bug: we use the length of the output skb as an indication of how much data to copy. But that skb is sized to offset + record length, and we start from offset, so we end up double-counting the offset, which leads to skb_copy_bits() returning -EFAULT. Reported-by: Tariq Toukan <tariqt@nvidia.com> Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Ran Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10  tls: rx: device: bound the frag walk  Jakub Kicinski
We can't do skb_walk_frags() on the input skbs, because the input skb is really just a pointer to the tcp read queue. We need to bound the "is decrypted" check by the amount of data in the message. Note that the walk in tls_device_reencrypt() happens after a CoW, so the skb there is safe to walk. Actually, in the current implementation it can't have frags at all, but whatever, maybe one day it will. Reported-by: Tariq Toukan <tariqt@nvidia.com> Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser") Tested-by: Ran Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10  net_sched: cls_route: remove from list when handle is 0  Thadeu Lima de Souza Cascardo
When a route filter is replaced and the old filter has a 0 handle, the old one won't be removed from the hashtable, while it will still be freed. The test was there since before commit 1109c00547fc ("net: sched: RCU cls_route"), when a new filter was not allocated when there was an old one. The old filter was reused and the reinserting would only be necessary if an old filter was replaced. That was still wrong for the same case where the old handle was 0. Remove the old filter from the list independently from its handle value. This fixes CVE-2022-2588, also reported as ZDI-CAN-17440. Reported-by: Zhenpeng Lin <zplin@u.northwestern.edu> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Kamal Mostafa <kamal@canonical.com> Cc: <stable@vger.kernel.org> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-11  ALSA: hda: Fix crash due to jack poll in suspend  Mohan Kumar
With the jackpoll_in_suspend flag set, there is a possibility that the jack poll worker thread will run even after system suspend has completed. Any register access after the system pm callback flow will result in a kernel crash, as the jack poll worker thread still tries to access registers. To fix the crash, cancel the jack poll worker thread in the system pm prepare callback; also cancel it at the start of the runtime suspend callback and re-schedule it only at the end, to avoid any unwarranted register access by the worker thread during the suspend flow. Signed-off-by: Mohan Kumar <mkumard@nvidia.com> Fixes: b33115bd05af ("ALSA: hda: Jack detection poll in suspend state") Link: https://lore.kernel.org/r/20220811052704.2944-1-mkumard@nvidia.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
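A hedged sketch of the cancel/re-schedule points described above, using a generic delayed_work; the field and function names are placeholders for the real driver's:

    struct my_codec {
            struct delayed_work jackpoll_work;
            unsigned long jackpoll_interval;        /* 0 = polling disabled */
    };

    static int my_pm_prepare(struct device *dev)
    {
            struct my_codec *codec = dev_get_drvdata(dev);

            /* System suspend: make sure the poll worker cannot run afterwards. */
            cancel_delayed_work_sync(&codec->jackpoll_work);
            return 0;
    }

    static int my_runtime_suspend(struct device *dev)
    {
            struct my_codec *codec = dev_get_drvdata(dev);

            /* Stop polling before touching the hardware ... */
            cancel_delayed_work_sync(&codec->jackpoll_work);

            /* ... do the actual suspend work here ... */

            /* ... and only re-arm the poll at the very end, if still wanted. */
            if (codec->jackpoll_interval)
                    schedule_delayed_work(&codec->jackpoll_work,
                                          codec->jackpoll_interval);
            return 0;
    }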
2022-08-11  ALSA: hda/cirrus - support for iMac 12,1 model  Allen Ballway
The 12,1 model requires the same configuration as the 12,2 model to enable headphones, but has a different codec SSID. Add the 12,1 SSID to the matching quirk. [ re-sorted in SSID order by tiwai ] Signed-off-by: Allen Ballway <ballway@chromium.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220810152701.1.I902c2e591bbf8de9acb649d1322fa1f291849266@changeid Signed-off-by: Takashi Iwai <tiwai@suse.de>
2022-08-10  selftests: forwarding: Fix failing tests with old libnet  Ido Schimmel
The custom multipath hash tests use mausezahn in order to test how changes in various packet fields affect the packet distribution across the available nexthops. The tool uses the libnet library for various low-level packet construction and injection. The library started using the "SO_BINDTODEVICE" socket option for IPv6 sockets in version 1.1.6 and for IPv4 sockets in version 1.2. When the option is not set, packets are not routed according to the table associated with the VRF master device and tests fail. Fix this by prefixing the command with "ip vrf exec", which will cause the route lookup to occur in the VRF routing table. This makes the tests pass regardless of the libnet library version. Fixes: 511e8db54036 ("selftests: forwarding: Add test for custom multipath hash") Fixes: 185b0c190bb6 ("selftests: forwarding: Add test for custom multipath hash with IPv4 GRE") Fixes: b7715acba4d3 ("selftests: forwarding: Add test for custom multipath hash with IPv6 GRE") Reported-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Link: https://lore.kernel.org/r/20220809113320.751413-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10  Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  Jakub Kicinski
Daniel Borkmann says: ==================== bpf 2022-08-10 We've added 23 non-merge commits during the last 7 day(s) which contain a total of 19 files changed, 424 insertions(+), 35 deletions(-). The main changes are: 1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao. 2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia. 3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu. 4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev. 5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney. 6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi. 7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa. 8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai. 9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address offset buffer, from Aijun Sun. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator bpf: Check the validity of max_rdwr_access for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator bpf: Acquire map uref in .init_seq_private for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for hash map iterator bpf: Acquire map uref in .init_seq_private for array map iterator bpf: Disallow bpf programs call prog_run command. bpf, arm64: Fix bpf trampoline instruction endianness selftests/bpf: Add test for prealloc_lru_pop bug bpf: Don't reinit map value in prealloc_lru_pop bpf: Allow calling bpf_prog_test kfuncs in tracing programs bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf bpf: Use proper target btf when exporting attach_btf_obj_id mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled bpf: Cleanup ftrace hash in bpf_trampoline_put BPF: Fix potential bad pointer dereference in bpf_sys_bpf() ... ==================== Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10  Merge branch 'net-enhancements-to-sk_user_data-field'  Jakub Kicinski
Hawkins Jiawei says: ==================== net: enhancements to sk_user_data field This patchset fixes refcount bug by adding SK_USER_DATA_PSOCK flag bit in sk_user_data field. The bug cause following info: WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0 <TASK> __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> To improve code maintainability, this patchset refactors sk_user_data flags code to be more generic. ==================== Link: https://lore.kernel.org/r/cover.1659676823.git.yin31149@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10  net: refactor bpf_sk_reuseport_detach()  Hawkins Jiawei
Refactor the sk_user_data dereference to use the more generic function __rcu_dereference_sk_user_data_with_flags(), which improves maintainability. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10  net: fix refcount bug in sk_psock_get (2)  Hawkins Jiawei
Syzkaller reports a refcount bug as follows: ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0 <TASK> __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> During the SMC fallback process in the connect syscall, the kernel replaces TCP with SMC. In order to forward wakeups to the smc socket waitqueue after fallback, the kernel sets clcsk->sk_user_data to the original smc socket in smc_fback_replace_callbacks(). Later, in the shutdown syscall, the kernel calls sk_psock_get(), which treats clcsk->sk_user_data as a psock, triggering the refcount warning. So the root cause is that smc and psock both use the sk_user_data field, and they can easily mismatch on it. This patch solves it by using another bit (defined as SK_USER_DATA_PSOCK) in PTRMASK to mark whether sk_user_data points to a psock object or not. This patch depends on the PTRMASK introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). Since there may be more flags in the sk_user_data field in the future, this patch also refactors the sk_user_data flags code to be more generic and improve its maintainability. Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
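The pointer-tagging idea behind SK_USER_DATA_PSOCK can be illustrated in a standalone way; the flag values and helpers below are a sketch, not the kernel's definitions:

    #include <stdint.h>
    #include <stdio.h>

    /* Low-order bits of an aligned pointer are reused as type/ownership flags. */
    #define SK_USER_DATA_NOCOPY     1UL     /* sketch values only */
    #define SK_USER_DATA_BPF        2UL
    #define SK_USER_DATA_PSOCK      4UL
    #define SK_USER_DATA_PTRMASK    (~(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF | \
                                       SK_USER_DATA_PSOCK))

    static uintptr_t set_user_data(void *ptr, uintptr_t flags)
    {
            /* Assumes ptr is aligned enough that the flag bits are free. */
            return (uintptr_t)ptr | flags;
    }

    static void *get_user_data(uintptr_t word, uintptr_t required_flags)
    {
            if ((word & required_flags) != required_flags)
                    return NULL;    /* not the kind of object the caller expects */
            return (void *)(word & SK_USER_DATA_PTRMASK);
    }

    int main(void)
    {
            static _Alignas(8) int psock_obj = 42;
            uintptr_t word = set_user_data(&psock_obj, SK_USER_DATA_PSOCK);

            /* Only callers asking for a psock get the pointer back. */
            printf("%p %p\n", get_user_data(word, SK_USER_DATA_PSOCK),
                   get_user_data(word, SK_USER_DATA_BPF));
            return 0;
    }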
2022-08-10  riscv: implement Zicbom-based CMO instructions + the t-head variant  Palmer Dabbelt
This series is based on the alternatives changes done in my svpbmt series and thus also depends on Atish's isa-extension parsing series. It implements using the cache-management instructions from the Zicbom extension to handle cache flush, etc. actions on platforms needing them. SoCs using cpu cores from T-Head, like the Allwinner D1, implement a different set of cache instructions; but while the instructions are different, they provide the same functionality, so a variant can easily hook into the existing alternatives mechanism on those. [Palmer: Some minor fixups, including a RISCV_ISA_ZICBOM dependency on MMU that's probably not strictly necessary. The Zicbom support will trip up sparse for users that have new toolchains, I just sent a patch.] Link: https://lore.kernel.org/all/20220706231536.2041855-1-heiko@sntech.de/ Link: https://lore.kernel.org/linux-sparse/20220811033138.20676-1-palmer@rivosinc.com/T/#u
* palmer/riscv-zicbom:
  riscv: implement cache-management errata for T-Head SoCs
  riscv: Add support for non-coherent devices using zicbom extension
  dt-bindings: riscv: document cbom-block-size
  of: also handle dma-noncoherent in of_dma_is_coherent()
2022-08-10  cifs: Remove {cifs,nfs}_fscache_release_page()  David Howells
Remove {cifs,nfs}_fscache_release_page() from fs/cifs/fscache.h. This functionality got built directly into cifs_release_folio() and will hopefully be replaced with netfs_release_folio() at some point. The "nfs_" version is a copy and paste error and should've been altered to read "cifs_". That can also be removed. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> cc: Steve French <smfrench@gmail.com> cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>