Age | Commit message (Collapse) | Author |
|
|
|
|
|
In current packed virtqueue implementation, the avail_wrap_counter won't
flip, in the case when the driver supplies a descriptor chain with a
length equals to the queue size; total_sg == vq->packed.vring.num.
Let’s assume the following situation:
vq->packed.vring.num=4
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
Then the driver adds a descriptor chain containing 4 descriptors.
We expect the following result with avail_wrap_counter flipped:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 1
But, the current implementation gives the following result:
vq->packed.next_avail_idx: 1
vq->packed.avail_wrap_counter: 0
To reproduce the bug, you can set a packed queue size as small as
possible, so that the driver is more likely to provide a descriptor
chain with a length equal to the packed queue size. For example, in
qemu run following commands:
sudo qemu-system-x86_64 \
-enable-kvm \
-nographic \
-kernel "path/to/kernel_image" \
-m 1G \
-drive file="path/to/rootfs",if=none,id=disk \
-device virtio-blk,drive=disk \
-drive file="path/to/disk_image",if=none,id=rwdisk \
-device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
indirect_desc=off \
-append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
Inside the VM, create a directory and mount the rwdisk device on it. The
rwdisk will hang and mount operation will not complete.
This commit fixes the wrap counter error by flipping the
packed.avail_wrap_counter, when start of descriptor chain equals to the
end of descriptor chain (head == i).
Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org>
Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We try to build affinity mask via create_affinity_masks()
unconditionally which may lead several issues:
- the affinity mask is not used for parent without affinity support
(only VDUSE support the affinity now)
- the logic of create_affinity_masks() might not work for devices
other than block. For example it's not rare in the networking device
where the number of queues could exceed the number of CPUs. Such
case breaks the current affinity logic which is based on
group_cpus_evenly() who assumes the number of CPUs are not less than
the number of groups. This can trigger a warning[1]:
if (ret >= 0)
WARN_ON(nr_present + nr_others < numgrps);
Fixing this by only build the affinity masks only when
- Driver passes affinity descriptor, driver like virtio-blk can make
sure to limit the number of queues when it exceeds the number of CPUs
- Parent support affinity setting config ops
This help to avoid the warning. More optimizations could be done on
top.
[1]
[ 682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
[ 682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
[ 682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
[ 682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
[ 682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293
[ 682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000
[ 682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030
[ 682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0
[ 682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800
[ 682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041
[ 682.146692] FS: 00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000
[ 682.146695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0
[ 682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 682.146701] Call Trace:
[ 682.146703] <TASK>
[ 682.146705] ? __warn+0x7b/0x130
[ 682.146709] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146712] ? report_bug+0x1c8/0x1e0
[ 682.146717] ? handle_bug+0x3c/0x70
[ 682.146721] ? exc_invalid_op+0x14/0x70
[ 682.146723] ? asm_exc_invalid_op+0x16/0x20
[ 682.146727] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146729] ? group_cpus_evenly+0x15c/0x1c0
[ 682.146731] create_affinity_masks+0xaf/0x1a0
[ 682.146735] virtio_vdpa_find_vqs+0x83/0x1d0
[ 682.146738] ? __pfx_default_calc_sets+0x10/0x10
[ 682.146742] virtnet_find_vqs+0x1f0/0x370
[ 682.146747] virtnet_probe+0x501/0xcd0
[ 682.146749] ? vp_modern_get_status+0x12/0x20
[ 682.146751] ? get_cap_addr.isra.0+0x10/0xc0
[ 682.146754] virtio_dev_probe+0x1af/0x260
[ 682.146759] really_probe+0x1a5/0x410
Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230811091539.1359865-1-jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Currently, the virtio core will perform a dma operation for each
buffer. Although, the same page may be operated multiple times.
This patch, the driver does the dma operation and manages the dma
address based the feature premapped of virtio core.
This way, we can perform only one dma operation for the pages of the
alloc frag. This is beneficial for the iommu device.
kernel command line: intel_iommu=on iommu.passthrough=0
| strict=0 | strict=1
Before | 775496pps | 428614pps
After | 1109316pps | 742853pps
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
These API has been introduced:
* virtqueue_dma_need_sync
* virtqueue_dma_sync_single_range_for_cpu
* virtqueue_dma_sync_single_range_for_device
These APIs can be used together with the premapped mechanism to sync the
DMA address.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in
advance. The purpose is to keep memory mapped across multiple add/get
buf operations.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-11-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Introduce virtqueue_reset() to release all buffer inside vq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-10-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The subsequent reset function will reuse these logic.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-9-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Modify the "useless" to a more accurate "unused".
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-8-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Now we add a case where we skip dma unmap, the vq->premapped is true.
We can't just rely on use_dma_api to determine whether to skip the dma
operation. For convenience, I introduced the "do_unmap". By default, it
is the same as use_dma_api. If the driver is configured with premapped,
then do_unmap is false.
So as long as do_unmap is false, for addr of desc, we should skip dma
unmap operation.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-7-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Added virtqueue_dma_dev() to get DMA device for virtio. Then the
caller can do dma operation in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-6-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
If the vq is the premapped mode, use the sg_dma_address() directly.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-5-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This helper allows the driver change the dma mode to premapped mode.
Under the premapped mode, the virtio core do not do dma mapping
internally.
This just work when the use_dma_api is true. If the use_dma_api is false,
the dma options is not through the DMA APIs, that is not the standard
way of the linux kernel.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20230810123057.43407-4-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This patch put the dma addr error check in vring_map_one_sg().
The benefits of doing this:
1. reduce one judgment of vq->use_dma_api.
2. make vring_map_one_sg more simple, without calling
vring_mapping_error to check the return value. simplifies subsequent
code
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-3-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Inside detach_buf_split(), if use_dma_api is false,
vring_unmap_one_split_indirect will be called many times, but actually
nothing is done. So this patch check use_dma_api firstly.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230810123057.43407-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Start offering the feature in the simulator. Other parent drivers can
follow this code to offer it too.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-5-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This operation allow vdpa parent to expose its own backend feature bits.
Next patches introduce a feature not compatible with all parent drivers:
the ability to enable vq after driver_ok. Each parent must declare if
it allows it or not.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-4-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Accepting VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature if
userland sets it.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-3-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This feature flag allows the driver enabling virtqueues both before and
after DRIVER_OK.
This is needed for software assisted live migration, so userland can
restore the device status in devices with control virtqueue before the
dataplane is enabled.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Message-Id: <20230609092127.170673-2-eperezma@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Commit 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation")
declared but never implemented these.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Message-Id: <20230803143041.23388-1-yuehaibing@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"New controller support and updates to drivers.
New support:
- Qualcomm SM6115 and QCM2290 dmaengine support
- at_xdma support for microchip,sam9x7 controller
Updates:
- idxd updates for wq simplification and ats knob updates
- fsl edma updates for v3 support
- Xilinx AXI4-Stream control support
- Yaml conversion for bcm dma binding"
* tag 'dmaengine-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (53 commits)
dmaengine: fsl-edma: integrate v3 support
dt-bindings: fsl-dma: fsl-edma: add edma3 compatible string
dmaengine: fsl-edma: move tcd into struct fsl_dma_chan
dmaengine: fsl-edma: refactor chan_name setup and safety
dmaengine: fsl-edma: move clearing of register interrupt into setup_irq function
dmaengine: fsl-edma: refactor using devm_clk_get_enabled
dmaengine: fsl-edma: simply ATTR_DSIZE and ATTR_SSIZE by using ffs()
dmaengine: fsl-edma: move common IRQ handler to common.c
dmaengine: fsl-edma: Remove enum edma_version
dmaengine: fsl-edma: transition from bool fields to bitmask flags in drvdata
dmaengine: fsl-edma: clean up EXPORT_SYMBOL_GPL in fsl-edma-common.c
dmaengine: fsl-edma: fix build error when arch is s390
dmaengine: idxd: Fix issues with PRS disable sysfs knob
dmaengine: idxd: Allow ATS disable update only for configurable devices
dmaengine: xilinx_dma: Program interrupt delay timeout
dmaengine: xilinx_dma: Use tasklet_hi_schedule for timing critical usecase
dmaengine: xilinx_dma: Freeup active list based on descriptor completion bit
dmaengine: xilinx_dma: Increase AXI DMA transaction segment count
dmaengine: xilinx_dma: Pass AXI4-Stream control words to dma client
dt-bindings: dmaengine: xilinx_dma: Add xlnx,irq-delay property
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
Pull phy updates from Vinod Koul:
"As usual a couple of new drivers, a bunch of new device support and
few updates to existing drivers
New Support:
- Starfive dphy rx, JH7110 usb and pcie support
- Rockchip rv1126 inno-dsi phy, rk3588 usb and pcie support
- Qualcomm sa8775p PCIe support, M31 USB PHY driver
- Samsung Exynos850 usb support
Updates:
- Mediatek dsi driver clock updates
- Qualcomm sm8150 combo phy with reworking of qmp pcie driver
- Xilinx zynqmp runtime PM support"
* tag 'phy-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (83 commits)
phy: exynos5-usbdrd: Add Exynos850 support
phy: exynos5-usbdrd: Add 26MHz ref clk support
phy: exynos5-usbdrd: Make it possible to pass custom phy ops
dt-bindings: phy: samsung,usb3-drd-phy: Add Exynos850 support
phy: qcom-qmp-combo: fix clock probing
phy: qcom-qmp-pcie: support SM8150 PCIe QMP PHYs
phy: qcom-qmp-pcie: populate offsets configuration
phy: qcom-qmp-pcie: simplify clock handling
phy: qcom-qmp-pcie: keep offset tables sorted
phy: qcom-qmp-pcie: drop ln_shrd from v5_20 config
dt-bindings: phy: qcom,qmp-pcie: describe SM8150 PCIe PHYs
dt-bindings: phy: migrate QMP PCIe PHY bindings to qcom,sc8280xp-qmp-pcie-phy.yaml
phy: fsl-imx8mq-usb: add dev_err_probe if getting vbus failed
phy: qcom: Introduce M31 USB PHY driver
dt-bindings: phy: qcom,m31: Document qcom,m31 USB phy
phy: rockchip: inno-dsidphy: Add rv1126 support
dt-bindings: phy: rockchip-inno-dsidphy: Document rv1126
dt-bindings: phy: mediatek,tphy: allow simple nodename pattern
phy: amlogic: meson-g12a-usb2: fix Wvoid-pointer-to-enum-cast warning
phy: marvell pxa-usb: fix Wvoid-pointer-to-enum-cast warning
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
"Device numbering and intel driver changes are main features:
- Core support for soundwire device number allocation
- intel driver updates for adding hw_params for DAI ops, hybrid
number allocation and power managemnt callback updates
- DT header include changes for subsystem"
* tag 'soundwire-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire:
soundwire: intel_ace2x: add DAI hw_params/prepare/hw_free callbacks
soundwire: intel_auxdevice: add hybrid IDA-based device_number allocation
soundwire: bus: add callbacks for device_number allocation
soundwire: extend parameters of new_peripheral_assigned() callback
soundWire: intel_auxdevice: resume 'sdw-master' on startup and system resume
soundwire: intel_auxdevice: enable pm_runtime earlier on startup
soundwire: Explicitly include correct DT includes
|
|
Currently the Kconfig fragments in kernel/configs and arch/*/configs
that aren't used internally aren't discoverable through "make help",
which consists of hard-coded lists of config fragments. Instead, list
all the fragment targets that have a "# Help: " comment prefix so the
targets can be generated dynamically.
Add logic to the Makefile to search for and display the fragment and
comment. Add comments to fragments that are intended to be direct targets.
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD updates from Miquel Raynal:
"Core MTD changes:
- Use refcount to prevent corruption
- Call external _get and _put in right order
- Fix use-after-free in mtd release
- Explicitly include correct DT includes
- Clean refcounting with MTD_PARTITIONED_MASTER
- mtdblock: make warning messages ratelimited
- dt-bindings: Add SEAMA partition bindings
Device driver changes:
- Use devm helper functions
- Fix questionable cast, remove pointless ones.
- error handling fixes
- add support for new chip versions
- update DT bindings
- misc cleanups - fix typos, whitespace, indentation"
* tag 'mtd/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (105 commits)
dt-bindings: mtd: amlogic,meson-nand: drop unneeded quotes
mtd: spear_smi: Use helper function devm_clk_get_enabled()
mtd: rawnand: orion: Use helper function devm_clk_get_optional_enabled()
mtd: rawnand: vf610_nfc: Use helper function devm_clk_get_enabled()
mtd: rawnand: sunxi: Use helper function devm_clk_get_enabled()
mtd: rawnand: stm32_fmc2: Use helper function devm_clk_get_enabled()
mtd: rawnand: mtk: Use helper function devm_clk_get_enabled()
mtd: rawnand: mpc5121: Use helper function devm_clk_get_enabled()
mtd: rawnand: lpc32xx_slc: Use helper function devm_clk_get_enabled()
mtd: rawnand: intel: Use helper function devm_clk_get_enabled()
mtd: rawnand: fsmc: Use helper function devm_clk_get_enabled()
mtd: rawnand: arasan: Use helper function devm_clk_get_enabled()
mtd: rawnand: qcom: Add read/read_start ops in exec_op path
mtd: rawnand: qcom: Clear buf_count and buf_start in raw read
mtd: maps: fix -Wvoid-pointer-to-enum-cast warning
mtd: rawnand: fix -Wvoid-pointer-to-enum-cast warning
mtd: rawnand: fsmc: handle clk prepare error in fsmc_nand_resume()
mtd: rawnand: Propagate error and simplify ternary operators for brcmstb_nand_wait_for_completion()
mtd: rawnand: qcom: Sort includes alphabetically
mtd: rawnand: qcom: Do not override the error no of submit_descs()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we don't have a highlighted feature enhancement, but
mostly have fixed issues mainly in two parts: 1) zoned block device,
and 2) compression support.
For zoned block device, we've tried to improve the power-off recovery
flow as much as possible. For compression, we found some corner cases
caused by wrong compression policy and logics. Other than them, there
were some reverts and stat corrections.
Bug fixes:
- use finish zone command when closing a zone
- check zone type before sending async reset zone command
- fix to assign compress_level for lz4 correctly
- fix error path of f2fs_submit_page_read()
- don't {,de}compress non-full cluster
- send small discard commands during checkpoint back
- flush inode if atomic file is aborted
- correct to account gc/cp stats
And, there are minor bug fixes, avoiding false lockdep warning, and
clean-ups"
* tag 'f2fs-for-6-6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (25 commits)
f2fs: use finish zone command when closing a zone
f2fs: compress: fix to assign compress_level for lz4 correctly
f2fs: fix error path of f2fs_submit_page_read()
f2fs: clean up error handling in sanity_check_{compress_,}inode()
f2fs: avoid false alarm of circular locking
Revert "f2fs: do not issue small discard commands during checkpoint"
f2fs: doc: fix description of max_small_discards
f2fs: should update REQ_TIME for direct write
f2fs: fix to account cp stats correctly
f2fs: fix to account gc stats correctly
f2fs: remove unneeded check condition in __f2fs_setxattr()
f2fs: fix to update i_ctime in __f2fs_setxattr()
Revert "f2fs: fix to do sanity check on extent cache correctly"
f2fs: increase usage of folio_next_index() helper
f2fs: Only lfs mode is allowed with zoned block device feature
f2fs: check zone type before sending async reset zone command
f2fs: compress: don't {,de}compress non-full cluster
f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs: fix to avoid mmap vs set_compress_option case
...
|
|
Commit bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()")
added a cond_resched() call to the struct page scanning loop to prevent
soft lockup from happening. However, soft lockup can still happen in that
loop in some corner cases when the pages that satisfy the "!(pfn & 63)"
check are skipped for some reasons.
Fix this corner case by moving up the cond_resched() check so that it will
be called every 64 pages unconditionally.
Link: https://lkml.kernel.org/r/20230825164947.1317981-1-longman@redhat.com
Fixes: bde5f6bc68db ("kmemleak: add scheduling point to kmemleak_scan()")
Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the past, movable allocations could be disallowed from CMA through
PF_MEMALLOC_PIN. As CMA pages are funneled through the MOVABLE pcplist,
this required filtering that cornercase during allocations, such that
pinnable allocations wouldn't accidentally get a CMA page.
However, since 8e3560d963d2 ("mm: honor PF_MEMALLOC_PIN for all movable
pages"), PF_MEMALLOC_PIN automatically excludes __GFP_MOVABLE. Once
again, MOVABLE implies CMA is allowed.
Remove the stale filtering code. Also remove a stale comment that was
introduced as part of the filtering code, because the filtering let
order-0 pages fall through to the buddy allocator. See 1d91df85f399
("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore}
APIs") for context. The comment's been obsolete since the introduction of
the explicit ALLOC_HIGHATOMIC flag in eb2e2b425c69 ("mm/page_alloc:
explicitly record high-order atomic allocations in alloc_flags").
Link: https://lkml.kernel.org/r/20230824153821.243148-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: David Hildenbrand <david@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Make it easier to figure out where to send patches for this file.
Link: https://lkml.kernel.org/r/efbc7689d35a48ff402644d696aa9a8d8bb6333a.1692877089.git.baruch@tkos.co.il
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
anon_vma_link() is unused since commit 5beb49305251 ("mm: change anon_vma
linking to fix multi-process server scalability issue").
Link: https://lkml.kernel.org/r/cdce9b00c9ab15f6d02eddf40dcad537d3e9676f.1692877089.git.baruch@tkos.co.il
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
With madvise and prctl KSM can be enabled for different VMA's. Once it is
enabled we can query how effective KSM is overall. However we cannot
easily query if an individual VMA benefits from KSM.
This commit adds a KSM section to the /prod/<pid>/smaps file. It reports
how many of the pages are KSM pages. Note that KSM-placed zeropages are
not included, only actual KSM pages.
Here is a typical output:
7f420a000000-7f421a000000 rw-p 00000000 00:00 0
Size: 262144 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 51212 kB
Pss: 8276 kB
Shared_Clean: 172 kB
Shared_Dirty: 42996 kB
Private_Clean: 196 kB
Private_Dirty: 7848 kB
Referenced: 15388 kB
Anonymous: 51212 kB
KSM: 41376 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 202016 kB
SwapPss: 3882 kB
Locked: 0 kB
THPeligible: 0
ProtectionKey: 0
ksm_state: 0
ksm_skip_base: 0
ksm_skip_count: 0
VmFlags: rd wr mr mw me nr mg anon
This information also helps with the following workflow:
- First enable KSM for all the VMA's of a process with prctl.
- Then analyze with the above smaps report which VMA's benefit the most
- Change the application (if possible) to add the corresponding madvise
calls for the VMA's that benefit the most
[shr@devkernel.io: v5]
Link: https://lkml.kernel.org/r/20230823170107.1457915-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20230822180539.1424843-1-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the discussion of "Improve hugetlbfs read on HWPOISON hugepages" [1],
Matthew Wilcox suggests hwp is a bad abbreviation of hwpoison, as hwp is
already used as "an acronym by acpi, intel_pstate, some clock drivers, an
ethernet driver, and a scsi driver"[1].
So rename hwp_walk and hwp_walk_ops to hwpoison_walk and
hwpoison_walk_ops respectively.
raw_hwp_(page|list), *_raw_hwp, and raw_hwp_unreliable flag are other
major appearances of "hwp". However, given the "raw" hint in the name, it
is easy to differentiate them from other "hwp" acronyms. Since renaming
them is not as straightforward as renaming hwp_walk*, they are not covered
by this commit.
[1] https://lore.kernel.org/lkml/20230707201904.953262-5-jiaqiyan@google.com/T/#me6fecb8ce1ad4d5769199c9e162a44bc88f7bdec
Link: https://lkml.kernel.org/r/20230713235553.4121855-1-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Memory failure is not interested in logically offlined pages. Skip this
type of page.
Link: https://lkml.kernel.org/r/20230727115643.639741-5-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr, libsas) and
the usual minor updates and bug fixes but no significant core changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (116 commits)
scsi: storvsc: Handle additional SRB status values
scsi: libsas: Delete sas_ata_task.retry_count
scsi: libsas: Delete sas_ata_task.stp_affil_pol
scsi: libsas: Delete sas_ata_task.set_affil_pol
scsi: libsas: Delete sas_ssp_task.task_prio
scsi: libsas: Delete sas_ssp_task.enable_first_burst
scsi: libsas: Delete sas_ssp_task.retry_count
scsi: libsas: Delete struct scsi_core
scsi: libsas: Delete enum sas_phy_type
scsi: libsas: Delete enum sas_class
scsi: libsas: Delete sas_ha_struct.lldd_module
scsi: target: Fix write perf due to unneeded throttling
scsi: lpfc: Do not abuse UUID APIs and LPFC_COMPRESS_VMID_SIZE
scsi: pm8001: Remove unused declarations
scsi: fcoe: Fix potential deadlock on &fip->ctlr_lock
scsi: elx: sli4: Remove code duplication
scsi: bfa: Replace one-element array with flexible-array member in struct fc_rscn_pl_s
scsi: qla2xxx: Remove unused declarations
scsi: pmcraid: Use pci_dev_id() to simplify the code
scsi: pm80xx: Set RETFIS when requested by libsas
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:
- kprobes: use struct_size() for variable size kretprobe_instance data
structure.
- eprobe: Simplify trace_eprobe list iteration.
- probe events: Data structure field access support on BTF argument.
- Update BTF argument support on the functions in the kernel
loadable modules (only loaded modules are supported).
- Move generic BTF access function (search function prototype and
get function parameters) to a separated file.
- Add a function to search a member of data structure in BTF.
- Support accessing BTF data structure member from probe args by
C-like arrow('->') and dot('.') operators. e.g.
't sched_switch next=next->pid vruntime=next->se.vruntime'
- Support accessing BTF data structure member from $retval. e.g.
'f getname_flags%return +0($retval->name):string'
- Add string type checking if BTF type info is available. This will
reject if user specify ":string" type for non "char pointer"
type.
- Automatically assume the fprobe event as a function return event
if $retval is used.
- selftests/ftrace: Add BTF data field access test cases.
- Documentation: Update fprobe event example with BTF data field.
* tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation: tracing: Update fprobe event example with BTF field
selftests/ftrace: Add BTF fields access testcases
tracing/fprobe-event: Assume fprobe is a return event by $retval
tracing/probes: Add string type check with BTF
tracing/probes: Support BTF field access from $retval
tracing/probes: Support BTF based data structure field access
tracing/probes: Add a function to search a member of a struct/union
tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
tracing/probes: Support BTF argument on module functions
tracing/eprobe: Iterate trace_eprobe directly
kernel: kprobes: Use struct_size()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
"Tracing fixes and clean ups:
- Replace strlcpy() with strscpy()
- Initialize the pipe cpumask to zero on allocation
- Use within_module() instead of open coding it
- Remove extra space in hwlat_detectory/mode output
- Use LIST_HEAD() instead of open coding it
- A bunch of clean ups and fixes for the cpumask filter
- Set local da_mon_##name to static
- Fix race in snapshot buffer between cpu write and swap"
* tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/filters: Fix coding style issues
tracing/filters: Change parse_pred() cpulist ternary into an if block
tracing/filters: Fix double-free of struct filter_pred.mask
tracing/filters: Fix error-handling of cpulist parsing buffer
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
ftrace: Use LIST_HEAD to initialize clear_hash
ftrace: Use within_module to check rec->ip within specified module.
tracing: Replace strlcpy with strscpy in trace/events/task.h
tracing: Fix race issue between cpu buffer write and swap
tracing: Remove extra space at the end of hwlat_detector/mode
rv: Set variable 'da_mon_##name' to static
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fix from Kees Cook:
- Adjust sizes of buffers just avoid uncompress failures (Ard
Biesheuvel)
* tag 'pstore-v6.6-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Base compression input buffer size on estimated compressed size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 selftest fix from Ingo Molnar:
"Fix the __NR_map_shadow_stack syscall-renumbering fallout in the x86
self-test code.
[ Arguably the existing code was unnecessarily fragile, and tooling
should have picked up the new syscall number, and a wider fix is
being worked on - but meanwhile, let's not have the old syscall
number in the kernel tree. ]"
* tag 'x86-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86: Update map_shadow_stack syscall nr
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
"Fix false positive 'softirq work is pending' messages on -rt kernels,
caused by a buggy factoring-out of existing code"
* tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/rcu: Fix false positive "softirq work is pending" messages
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:
"Fix a CPU hotplug related deadlock between the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer"
* tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Prevent self deadlock on CPU hot-unplug
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Miscellaneous scheduler fixes: a reporting fix, a static symbol fix,
and a kernel-doc fix"
* tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE
sched/fair: Make update_entity_lag() static
sched/core: Add kernel-doc for set_cpus_allowed_ptr()
|
|
Mikhail reports early-6.6-based Fedora Rawhide not booting: "rcu_preempt
detected expedited stalls", minutes wait, and then hung_task splat while
kworker trying to synchronize_rcu_expedited(). Nothing logged to disk.
He bisected to my 6.6 a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and
rcu_read_unlock()s"): but the one to blame is my 6.5 commit to fix the
espfix "bad pmd" warnings when booting x86_64 with CONFIG_EFI_PGT_DUMP=y.
Gaah, that added an "addr >= TASK_SIZE" check to avoid pte_offset_map(),
but failed to add the equivalent check when choosing to pte_unmap().
It's not a problem on 6.5 (for different reasons, it's harmless on both
64-bit and 32-bit), but becomes a bootstopper on 6.6 with the unbalanced
rcu_read_unlock() - RCU has a WARN_ON_ONCE for that, but it would have
scrolled off Mikhail's console too quickly.
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Closes: https://lore.kernel.org/linux-mm/CABXGCsNi8Tiv5zUPNXr6UJw6qV1VdaBEfGqEAMkkXE3QPvZuAQ@mail.gmail.com/
Fixes: 8b1cb4a2e819 ("mm/pagewalk: fix EFI_PGT_DUMP of espfix area")
Fixes: a349d72fd9ef ("mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s")
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Sudip Mukherjee reports that the mips sb1250_swarm_defconfig build fails
with the current kernel. It isn't actually MIPS-specific, it's just
that that defconfig does not have CGROUP_SCHED enabled like most configs
do, and as such shows this error:
kernel/cgroup/cgroup.c: In function 'cgroup_local_stat_show':
kernel/cgroup/cgroup.c:3699:15: error: implicit declaration of function 'cgroup_tryget_css'; did you mean 'cgroup_tryget'? [-Werror=implicit-function-declaration]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^~~~~~~~~~~~~~~~~
| cgroup_tryget
kernel/cgroup/cgroup.c:3699:13: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^
because cgroup_tryget_css() only exists when CGROUP_SCHED is enabled,
and the cgroup_local_stat_show() function should similarly be guarded by
that config option.
Move things around a bit to fix this all.
Fixes: d1d4ff5d11a5 ("cgroup: put cgroup_tryget_css() inside CONFIG_CGROUP_SCHED")
Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the typo which resulted in the driver using FB_DEFAULT_IOMEM_HELPERS
instead of FB_DEFAULT_IOMEM_OPS as the fbdev I/O helpers.
Fixes: 501126083855 ("fbdev/g364fb: Use fbdev I/O helpers")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
Micron 1100 drives lock up when encountering queued TRIM command. It is
a quite old hardware series, for past years we have been running our
machines with these drives using libata.force=noncqtrim.
[Damien] Move the "Crucial_CT*M500*" entry to keep Micron and Crucial
entries together.
Signed-off-by: Pawel Zmarzly <pzmarzly@meta.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Elkhart Lake is the successor of Apollo Lake and Gemini Lake. These
CPUs and their PCHs are used in mobile and embedded environments.
With this patch I suggest that Elkhart Lake SATA controllers [1] should
use the default LPM policy for mobile chipsets.
The disadvantage of missing hot-plug support with this setting should
not be an issue, as those CPUs are used in embedded environments and
not in servers with hot-plug backplanes.
We discovered that the Elkhart Lake SATA controllers have been missing
in ahci.c after a customer reported the throttling of his SATA SSD
after a short period of higher I/O. We determined the high temperature
of the SSD controller in idle mode as the root cause for that.
Depending on the used SSD, we have seen up to 1.8 Watt lower system
idle power usage and up to 30°C lower SSD controller temperatures in
our tests, when we set med_power_with_dipm manually. I have provided a
table showing seven different SATA SSDs from ATP, Intel/Solidigm and
Samsung [2].
Intel lists a total of 3 SATA controller IDs (4B60, 4B62, 4B63) in [1]
for those mobile PCHs.
This commit just adds 0x4b63 as I do not have test systems with 0x4b60
and 0x4b62 SATA controllers.
I have tested this patch with a system which uses 0x4b63 as SATA
controller.
[1] https://sata-io.org/product/8803
[2] https://www.thomas-krenn.com/en/wiki/SATA_Link_Power_Management#Example_LES_v4
Signed-off-by: Werner Fischer <devlists@wefi.net>
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Recent commits have introduced some coding style issues, fix those up.
Link: https://lkml.kernel.org/r/20230901151039.125186-5-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Review comments noted that an if block would be clearer than a ternary, so
swap it out.
No change in behaviour intended
Link: https://lkml.kernel.org/r/20230901151039.125186-4-vschneid@redhat.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|