Age | Commit message (Collapse) | Author |
|
struct bpf_iter_scx_dsq *it maybe not initialized.
If we didn't call scx_bpf_dsq_move_set_vtime and scx_bpf_dsq_move_set_slice
before scx_bpf_dsq_move, it would cause unexpected behaviors:
1. Assign a huge slice into p->scx.slice
2. Assign a invalid vtime into p->scx.dsq_vtime
Signed-off-by: Henry Huang <henry.hj@antgroup.com>
Fixes: 6462dd53a260 ("sched_ext: Compact struct bpf_iter_scx_dsq_kern")
Cc: stable@vger.kernel.org # v6.12
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Fix a compiler warning with W=1:
kernel/workqueue.c: error:
function ‘__alloc_workqueue’ might be a candidate for ‘gnu_printf’
format attribute[-Werror=suggest-attribute=format]
5657 | name_len = vsnprintf(wq->name, sizeof(wq->name), fmt, args);
| ^~~~~~~~
Fixes: 9b59a85a84dc ("workqueue: Don't call va_start / va_end twice")
Signed-off-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Four minor fixes for NAND controller drivers (cleanup path, double
actions, and W=1 warning) as well as a cast to avoid overflows in an
mtd device driver"
* tag 'mtd/fixes-for-6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: omap2: Fix build warnings with W=1
mtd: rawnand: arasan: Fix missing de-registration of NAND
mtd: rawnand: arasan: Fix double assertion of chip-select
mtd: diskonchip: Cast an operand to prevent potential overflow
mtd: rawnand: fix double free in atmel_pmecc_create_user()
|
|
The driver no longer builds when regulator support is unavailable:
arm-linux-gnueabi-ld: drivers/phy/mediatek/phy-mtk-hdmi.o: in function `mtk_hdmi_phy_register_regulators':
phy-mtk-hdmi.c:(.text.unlikely+0x3e): undefined reference to `devm_regulator_register'
arm-linux-gnueabi-ld: drivers/phy/mediatek/phy-mtk-hdmi-mt8195.o: in function `mtk_hdmi_phy_pwr5v_is_enabled':
phy-mtk-hdmi-mt8195.c:(.text+0x326): undefined reference to `rdev_get_drvdata'
arm-linux-gnueabi-ld: drivers/phy/mediatek/phy-mtk-hdmi-mt8195.o: in function `mtk_hdmi_phy_pwr5v_disable':
phy-mtk-hdmi-mt8195.c:(.text+0x346): undefined reference to `rdev_get_drvdata'
arm-linux-gnueabi-ld: drivers/phy/mediatek/phy-mtk-hdmi-mt8195.o: in function `mtk_hdmi_phy_pwr5v_enable':
Fixes: 49393b2da1cd ("phy: mediatek: phy-mtk-hdmi: Register PHY provided regulator")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20241213083056.2596499-1-arnd@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
The Kernel test robot returns the following warning:
do_div() does a 64-by-32 division, please consider using div64_ul instead.
To prevent the 64-by-32 divsion, consolidate both the multiplication
and the do_div into one line which explicitly uses u64 sizes.
Fixes: 1951dbb41d1d ("phy: freescale: fsl-samsung-hdmi: Support dynamic integer")
Signed-off-by: Adam Ford <aford173@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412091243.fSObwwPi-lkp@intel.com/
Link: https://lore.kernel.org/r/20241215220555.99113-1-aford173@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
For macro for_each_child_of_node(parent, child), refcount of @child has
been increased before entering its loop body, so normally needs to call
of_node_put(@child) before returning from the loop body to avoid refcount
leakage.
of_phy_provider_lookup() has such usage but does not call of_node_put()
before returning, so cause leakage of the OF node refcount.
Fix by simply calling of_node_put() before returning from the loop body.
The APIs affected by this issue are shown below since they indirectly
invoke problematic of_phy_provider_lookup().
phy_get()
of_phy_get()
devm_phy_get()
devm_of_phy_get()
devm_of_phy_get_by_index()
Fixes: 2a4c37016ca9 ("phy: core: Fix of_phy_provider_lookup to return PHY provider for sub node")
Cc: stable@vger.kernel.org
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-5-40ae28f5015a@quicinc.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
_of_phy_get() will directly return when suffers of_device_is_compatible()
error, but it forgets to decrease refcount of OF node @args.np before error
return, the refcount was increased by previous of_parse_phandle_with_args()
so causes the OF node's refcount leakage.
Fix by decreasing the refcount via of_node_put() before the error return.
Fixes: b7563e2796f8 ("phy: work around 'phys' references to usb-nop-xceiv devices")
Cc: stable@vger.kernel.org
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-4-40ae28f5015a@quicinc.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
For devm_phy_destroy(), its comment says it needs to invoke phy_destroy()
to destroy the phy, but it will not actually invoke the function since
devres_destroy() does not call devm_phy_consume(), and the missing
phy_destroy() call will cause that the phy fails to be destroyed.
Fortunately, the faulty API has not been used by current kernel tree.
Fix by using devres_release() instead of devres_destroy() within the API.
Fixes: ff764963479a ("drivers: phy: add generic PHY framework")
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-3-40ae28f5015a@quicinc.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
unregister the phy provider
For devm_of_phy_provider_unregister(), its comment says it needs to invoke
of_phy_provider_unregister() to unregister the phy provider, but it will
not actually invoke the function since devres_destroy() does not call
devm_phy_provider_release(), and the missing of_phy_provider_unregister()
call will cause:
- The phy provider fails to be unregistered.
- Leak both memory and the OF node refcount.
Fortunately, the faulty API has not been used by current kernel tree.
Fix by using devres_release() instead of devres_destroy() within the API.
Fixes: ff764963479a ("drivers: phy: add generic PHY framework")
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/stable/20241213-phy_core_fix-v6-2-40ae28f5015a%40quicinc.com
Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-2-40ae28f5015a@quicinc.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
For devm_phy_put(), its comment says it needs to invoke phy_put() to
release the phy, but it will not actually invoke the function since
devres_destroy() does not call devm_phy_release(), and the missing
phy_put() call will cause:
- The phy fails to be released.
- devm_phy_put() can not fully undo what API devm_phy_get() does.
- Leak refcount of both the module and device for below typical usage:
devm_phy_get(); // or its variant
...
err = do_something();
if (err)
goto err_out;
...
err_out:
devm_phy_put(); // leak refcount here
The file(s) affected by this issue are shown below since they have such
typical usage.
drivers/pci/controller/cadence/pcie-cadence.c
drivers/net/ethernet/ti/am65-cpsw-nuss.c
Fix by using devres_release() instead of devres_destroy() within the API.
Fixes: ff764963479a ("drivers: phy: add generic PHY framework")
Cc: stable@vger.kernel.org
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Krzysztof Wilczyński <kw@linux.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-1-40ae28f5015a@quicinc.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Currently, the driver does not return the correct DMA status when a DMA
pause is issued by the client drivers. This causes GPCDMA users to
assume that DMA is still running, while in reality, the DMA is paused.
Return DMA_PAUSED for tx_status() if the channel is paused in the middle
of a transfer.
Fixes: ee17028009d4 ("dmaengine: tegra: Add tegra gpcdma driver")
Cc: stable@vger.kernel.org
Signed-off-by: Akhil R <akhilrajeev@nvidia.com>
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Link: https://lore.kernel.org/r/20241212124412.5650-1-kkartik@nvidia.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
The for_each_child_of_node() loop requires explicit calls to
of_node_put() to decrement the child's refcount upon early exits (break,
goto, return).
Add the missing calls in the two early exits before the goto
instructions.
Cc: stable@vger.kernel.org
Fixes: f7d12ef53ddf ("dma: mv_xor: add Device Tree binding")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://lore.kernel.org/r/20241011-dma_mv_xor_of_node_put-v1-1-3c2de819f463@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
sub-pipe PG is not present on DG1. Setting these bits can disable
other power gates and cause GPU hangs on video playbacks.
VLK: 16314, 4304
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13381
Fixes: 85a12d7eb8fe ("drm/i915/tgl: Fix Media power gate sequence.")
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241219210019.70532-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit de7061947b4ed4be857d452c60d5fb795831d79e)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
|
|
According to spec VDR_CUSTOM_WIDTH register gets programmed after pll
specific VDR registers and TX Lane programming registers are done.
Moreover we only program into C10_VDR_CONTROL1 to update config and
setup master lane once all VDR registers are written into.
Bspec: 67636
Fixes: 51390cc0e00a ("drm/i915/mtl: Add Support for C10 PHY message bus and pll programming")
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241216181554.2861381-1-suraj.kandpal@intel.com
(cherry picked from commit f9d418552ba1e3a0e92487ff82eb515dab7516c0)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
|
|
The similar patch in siw is in the link:
https://git.kernel.org/rdma/rdma/c/16b87037b48889
This problem also occurred in RXE. The following analyze this problem.
In the following Call Traces:
"
BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
Read of size 4 at addr ffff8880554640b0 by task kworker/1:4/5295
CPU: 1 UID: 0 PID: 5295 Comm: kworker/1:4 Not tainted
6.12.0-rc3-syzkaller-00399-g9197b73fd7bb #0
Hardware name: Google Compute Engine/Google Compute Engine,
BIOS Google 09/13/2024
Workqueue: infiniband ib_cache_event_task
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
rxe_query_port+0x12d/0x260 drivers/infiniband/sw/rxe/rxe_verbs.c:60
__ib_query_port drivers/infiniband/core/device.c:2111 [inline]
ib_query_port+0x168/0x7d0 drivers/infiniband/core/device.c:2143
ib_cache_update+0x1a9/0xb80 drivers/infiniband/core/cache.c:1494
ib_cache_event_task+0xf3/0x1e0 drivers/infiniband/core/cache.c:1568
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f2/0x390 kernel/kthread.c:389
ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
"
1). In the link [1],
"
infiniband syz2: set down
"
This means that on 839.350575, the event ib_cache_event_task was sent andi
queued in ib_wq.
2). In the link [1],
"
team0 (unregistering): Port device team_slave_0 removed
"
It indicates that before 843.251853, the net device should be freed.
3). In the link [1],
"
BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0
"
This means that on 850.559070, this slab-use-after-free problem occurred.
In all, on 839.350575, the event ib_cache_event_task was sent and queued
in ib_wq,
before 843.251853, the net device veth was freed.
on 850.559070, this event was executed, and the mentioned freed net device
was called. Thus, the above call trace occurred.
[1] https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000
Reported-by: syzbot+4b87489410b4efd181bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4b87489410b4efd181bf
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20241220222325.2487767-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Current implementation of fsl_edma3_attach_pd() does not provide a
cleanup path, resulting in a memory leak. For example,
dev_pm_domain_detach() is not called after dev_pm_domain_attach_by_id(),
and the device link created with the DL_FLAG_STATELESS is not released
explicitly.
Therefore, provide a cleanup function fsl_edma3_detach_pd() and call it
upon failure. Also add a devm_add_action_or_reset() call with this
function after a successful fsl_edma3_attach_pd().
Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support")
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20241221075712.3297200-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
If a large count is provided, it will trigger a warning in bitmap_parse_user.
Also check zero for it.
Cc: stable@vger.kernel.org
Fixes: 9e01c1b74c953 ("cpumask: convert kernel trace functions")
Link: https://lore.kernel.org/20241216073238.2573704-1-lizhi.xu@windriver.com
Reported-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0aecfd34fb878546f3fd
Tested-by: syzbot+0aecfd34fb878546f3fd@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The name member of the struct trace_event_call is assigned with
generated string literals; declare them pointer to read-only.
Reported by clang:
security/landlock/syscalls.c:179:1: warning: initializing 'char *' with an expression of type 'const char[34]' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
179 | SYSCALL_DEFINE3(landlock_create_ruleset,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
180 | const struct landlock_ruleset_attr __user *const, attr,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
181 | const size_t, size, const __u32, flags)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/syscalls.h:226:36: note: expanded from macro 'SYSCALL_DEFINE3'
226 | #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/syscalls.h:234:2: note: expanded from macro 'SYSCALL_DEFINEx'
234 | SYSCALL_METADATA(sname, x, __VA_ARGS__) \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/syscalls.h:184:2: note: expanded from macro 'SYSCALL_METADATA'
184 | SYSCALL_TRACE_ENTER_EVENT(sname); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./include/linux/syscalls.h:151:30: note: expanded from macro 'SYSCALL_TRACE_ENTER_EVENT'
151 | .name = "sys_enter"#sname, \
| ^~~~~~~~~~~~~~~~~
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mickaël Salaün <mic@digikod.net>
Cc: Günther Noack <gnoack@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/20241125105028.42807-1-cgoettsche@seltendoof.de
Fixes: b77e38aa240c3 ("tracing: add event trace infrastructure")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull modules fix from Petr Pavlu:
"A single fix is present to correct the module vermagic for PREEMPT_RT"
* tag 'modules-6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
preempt: Move PREEMPT_RT before PREEMPT in vermagic.
|
|
The following sysfs entries are reading super block member directly,
which can have a different endian and cause wrong values:
- sys/fs/btrfs/<uuid>/nodesize
- sys/fs/btrfs/<uuid>/sectorsize
- sys/fs/btrfs/<uuid>/clone_alignment
Thankfully those values (nodesize and sectorsize) are always aligned
inside the btrfs_super_block, so it won't trigger unaligned read errors,
just endian problems.
Fix them by using the native cached members instead.
Fixes: df93589a1737 ("btrfs: export more from FS_INFO to sysfs")
CC: stable@vger.kernel.org
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Set squota incompat bit before committing the transaction that enables
the feature.
With the config CONFIG_BTRFS_ASSERT enabled, an assertion
failure occurs regarding the simple quota feature.
[5.596534] assertion failed: btrfs_fs_incompat(fs_info, SIMPLE_QUOTA), in fs/btrfs/qgroup.c:365
[5.597098] ------------[ cut here ]------------
[5.597371] kernel BUG at fs/btrfs/qgroup.c:365!
[5.597946] CPU: 1 UID: 0 PID: 268 Comm: mount Not tainted 6.13.0-rc2-00031-gf92f4749861b #146
[5.598450] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[5.599008] RIP: 0010:btrfs_read_qgroup_config+0x74d/0x7a0
[5.604303] <TASK>
[5.605230] ? btrfs_read_qgroup_config+0x74d/0x7a0
[5.605538] ? exc_invalid_op+0x56/0x70
[5.605775] ? btrfs_read_qgroup_config+0x74d/0x7a0
[5.606066] ? asm_exc_invalid_op+0x1f/0x30
[5.606441] ? btrfs_read_qgroup_config+0x74d/0x7a0
[5.606741] ? btrfs_read_qgroup_config+0x74d/0x7a0
[5.607038] ? try_to_wake_up+0x317/0x760
[5.607286] open_ctree+0xd9c/0x1710
[5.607509] btrfs_get_tree+0x58a/0x7e0
[5.608002] vfs_get_tree+0x2e/0x100
[5.608224] fc_mount+0x16/0x60
[5.608420] btrfs_get_tree+0x2f8/0x7e0
[5.608897] vfs_get_tree+0x2e/0x100
[5.609121] path_mount+0x4c8/0xbc0
[5.609538] __x64_sys_mount+0x10d/0x150
The issue can be easily reproduced using the following reproducer:
root@q:linux# cat repro.sh
set -e
mkfs.btrfs -q -f /dev/sdb
mount /dev/sdb /mnt/btrfs
btrfs quota enable -s /mnt/btrfs
umount /mnt/btrfs
mount /dev/sdb /mnt/btrfs
The issue is that when enabling quotas, at btrfs_quota_enable(), we set
BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE at fs_info->qgroup_flags and persist
it in the quota root in the item with the key BTRFS_QGROUP_STATUS_KEY, but
we only set the incompat bit BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA after we
commit the transaction used to enable simple quotas.
This means that if after that transaction commit we unmount the filesystem
without starting and committing any other transaction, or we have a power
failure, the next time we mount the filesystem we will find the flag
BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE set in the item with the key
BTRFS_QGROUP_STATUS_KEY but we will not find the incompat bit
BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA set in the superblock, triggering an
assertion failure at:
btrfs_read_qgroup_config() -> qgroup_read_enable_gen()
To fix this issue, set the BTRFS_FEATURE_INCOMPAT_SIMPLE_QUOTA flag
immediately after setting the BTRFS_QGROUP_STATUS_FLAG_SIMPLE_MODE.
This ensures that both flags are flushed to disk within the same
transaction.
Fixes: 182940f4f4db ("btrfs: qgroup: add new quota mode for simple quotas")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Julian Sun <sunjunchao2870@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
During swap activation we iterate over the extents of a file and we can
have many thousands of them, so we can end up in a busy loop monopolizing
a core. Avoid this by doing a voluntary reschedule after processing each
extent.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
During swap activation we iterate over the extents of a file, then do
several checks for each extent, some of which may take some significant
time such as checking if an extent is shared. Since a file can have
many thousands of extents, this can be a very slow operation and it's
currently not interruptible. I had a bug during development of a previous
patch that resulted in an infinite loop when iterating the extents, so
a core was busy looping and I couldn't cancel the operation, which is very
annoying and requires a reboot. So make the loop interruptible by checking
for fatal signals at the end of each iteration and stopping immediately if
there is one.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When activating a swap file, to determine if an extent is shared we use
can_nocow_extent(), which ends up at btrfs_cross_ref_exist(). That helper
is meant to be quick because it's used in the NOCOW write path, when
flushing delalloc and when doing a direct IO write, however it does return
some false positives, meaning it may indicate that an extent is shared
even if it's no longer the case. For the write path this is fine, we just
do a unnecessary COW operation instead of doing a more rigorous check
which would be too heavy (calling btrfs_is_data_extent_shared()).
However when activating a swap file, the false positives simply result
in a failure, which is confusing for users/applications. One particular
case where this happens is when a data extent only has 1 reference but
that reference is not inlined in the extent item located in the extent
tree - this happens when we create more than 33 references for an extent
and then delete those 33 references plus every other non-inline reference
except one. The function check_committed_ref() assumes that if the size
of an extent item doesn't match the size of struct btrfs_extent_item
plus the size of an inline reference (plus an owner reference in case
simple quotas are enabled), then the extent is shared - that is not the
case however, we can have a single reference but it's not inlined - the
reason we do this is to be fast and avoid inspecting non-inline references
which may be located in another leaf of the extent tree, slowing down
write paths.
The following test script reproduces the bug:
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
NUM_CLONES=50
umount $DEV &> /dev/null
run_test()
{
local sync_after_add_reflinks=$1
local sync_after_remove_reflinks=$2
mkfs.btrfs -f $DEV > /dev/null
#mkfs.xfs -f $DEV > /dev/null
mount $DEV $MNT
touch $MNT/foo
chmod 0600 $MNT/foo
# On btrfs the file must be NOCOW.
chattr +C $MNT/foo &> /dev/null
xfs_io -s -c "pwrite -b 1M 0 1M" $MNT/foo
mkswap $MNT/foo
for ((i = 1; i <= $NUM_CLONES; i++)); do
touch $MNT/foo_clone_$i
chmod 0600 $MNT/foo_clone_$i
# On btrfs the file must be NOCOW.
chattr +C $MNT/foo_clone_$i &> /dev/null
cp --reflink=always $MNT/foo $MNT/foo_clone_$i
done
if [ $sync_after_add_reflinks -ne 0 ]; then
# Flush delayed refs and commit current transaction.
sync -f $MNT
fi
# Remove the original file and all clones except the last.
rm -f $MNT/foo
for ((i = 1; i < $NUM_CLONES; i++)); do
rm -f $MNT/foo_clone_$i
done
if [ $sync_after_remove_reflinks -ne 0 ]; then
# Flush delayed refs and commit current transaction.
sync -f $MNT
fi
# Now use the last clone as a swap file. It should work since
# its extent are not shared anymore.
swapon $MNT/foo_clone_${NUM_CLONES}
swapoff $MNT/foo_clone_${NUM_CLONES}
umount $MNT
}
echo -e "\nTest without sync after creating and removing clones"
run_test 0 0
echo -e "\nTest with sync after creating clones"
run_test 1 0
echo -e "\nTest with sync after removing clones"
run_test 0 1
echo -e "\nTest with sync after creating and removing clones"
run_test 1 1
Running the test:
$ ./test.sh
Test without sync after creating and removing clones
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0017 sec (556.793 MiB/sec and 556.7929 ops/sec)
Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
no label, UUID=a6b9c29e-5ef4-4689-a8ac-bc199c750f02
swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument
Test with sync after creating clones
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0036 sec (271.739 MiB/sec and 271.7391 ops/sec)
Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
no label, UUID=5e9008d6-1f7a-4948-a1b4-3f30aba20a33
swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument
Test with sync after removing clones
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0103 sec (96.665 MiB/sec and 96.6651 ops/sec)
Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
no label, UUID=916c2740-fa9f-4385-9f06-29c3f89e4764
Test with sync after creating and removing clones
wrote 1048576/1048576 bytes at offset 0
1 MiB, 1 ops; 0.0031 sec (314.268 MiB/sec and 314.2678 ops/sec)
Setting up swapspace version 1, size = 1020 KiB (1044480 bytes)
no label, UUID=06aab1dd-4d90-49c0-bd9f-3a8db4e2f912
swapon: /mnt/sdi/foo_clone_50: swapon failed: Invalid argument
swapoff: /mnt/sdi/foo_clone_50: swapoff failed: Invalid argument
Fix this by reworking btrfs_swap_activate() to instead of using extent
maps and checking for shared extents with can_nocow_extent(), iterate
over the inode's file extent items and use the accurate
btrfs_is_data_extent_shared().
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When activating the swap file we flush all delalloc and wait for ordered
extent completion, so that we don't miss any delalloc and extents before
we check that the file's extent layout is usable for a swap file and
activate the swap file. We are called with the inode's VFS lock acquired,
so we won't race with buffered and direct IO writes, however we can still
race with memory mapped writes since they don't acquire the inode's VFS
lock. The race window is between flushing all delalloc and locking the
whole file's extent range, since memory mapped writes lock an extent range
with the length of a page.
Fix this by acquiring the inode's mmap lock before we flush delalloc.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When we call btrfs_read_folio() we get an unlocked folio, so it is possible
for a different thread to concurrently modify folio->mapping. We must
check that this hasn't happened once we do have the lock.
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When we call btrfs_read_folio() to bring a folio uptodate, we unlock the
folio. The result of that is that a different thread can modify the
mapping (like remove it with invalidate) before we call folio_lock().
This results in an invalid page and we need to try again.
In particular, if we are relocating concurrently with aborting a
transaction, this can result in a crash like the following:
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 76 PID: 1411631 Comm: kworker/u322:5
Workqueue: events_unbound btrfs_reclaim_bgs_work
RIP: 0010:set_page_extent_mapped+0x20/0xb0
RSP: 0018:ffffc900516a7be8 EFLAGS: 00010246
RAX: ffffea009e851d08 RBX: ffffea009e0b1880 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffc900516a7b90 RDI: ffffea009e0b1880
RBP: 0000000003573000 R08: 0000000000000001 R09: ffff88c07fd2f3f0
R10: 0000000000000000 R11: 0000194754b575be R12: 0000000003572000
R13: 0000000003572fff R14: 0000000000100cca R15: 0000000005582fff
FS: 0000000000000000(0000) GS:ffff88c07fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000407d00f002 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x78/0xc0
? page_fault_oops+0x2a8/0x3a0
? __switch_to+0x133/0x530
? wq_worker_running+0xa/0x40
? exc_page_fault+0x63/0x130
? asm_exc_page_fault+0x22/0x30
? set_page_extent_mapped+0x20/0xb0
relocate_file_extent_cluster+0x1a7/0x940
relocate_data_extent+0xaf/0x120
relocate_block_group+0x20f/0x480
btrfs_relocate_block_group+0x152/0x320
btrfs_relocate_chunk+0x3d/0x120
btrfs_reclaim_bgs_work+0x2ae/0x4e0
process_scheduled_works+0x184/0x370
worker_thread+0xc6/0x3e0
? blk_add_timer+0xb0/0xb0
kthread+0xae/0xe0
? flush_tlb_kernel_range+0x90/0x90
ret_from_fork+0x2f/0x40
? flush_tlb_kernel_range+0x90/0x90
ret_from_fork_asm+0x11/0x20
</TASK>
This occurs because cleanup_one_transaction() calls
destroy_delalloc_inodes() which calls invalidate_inode_pages2() which
takes the folio_lock before setting mapping to NULL. We fail to check
this, and subsequently call set_extent_mapping(), which assumes that
mapping != NULL (in fact it asserts that in debug mode)
Note that the "fixes" patch here is not the one that introduced the
race (the very first iteration of this code from 2009) but a more recent
change that made this particular crash happen in practice.
Fixes: e7f1326cc24e ("btrfs: set page extent mapped after read_folio in relocate_one_page")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When a COWing a tree block, at btrfs_cow_block(), and we have the
tracepoint trace_btrfs_cow_block() enabled and preemption is also enabled
(CONFIG_PREEMPT=y), we can trigger a use-after-free in the COWed extent
buffer while inside the tracepoint code. This is because in some paths
that call btrfs_cow_block(), such as btrfs_search_slot(), we are holding
the last reference on the extent buffer @buf so btrfs_force_cow_block()
drops the last reference on the @buf extent buffer when it calls
free_extent_buffer_stale(buf), which schedules the release of the extent
buffer with RCU. This means that if we are on a kernel with preemption,
the current task may be preempted before calling trace_btrfs_cow_block()
and the extent buffer already released by the time trace_btrfs_cow_block()
is called, resulting in a use-after-free.
Fix this by moving the trace_btrfs_cow_block() from btrfs_cow_block() to
btrfs_force_cow_block() before the COWed extent buffer is freed.
This also has a side effect of invoking the tracepoint in the tree defrag
code, at defrag.c:btrfs_realloc_node(), since btrfs_force_cow_block() is
called there, but this is fine and it was actually missing there.
Reported-by: syzbot+8517da8635307182c8a5@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/6759a9b9.050a0220.1ac542.000d.GAE@google.com/
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Fix a use-after-free in the I/O completion path for encoded reads by
using a completion instead of a wait_queue for synchronizing the
destruction of 'struct btrfs_encoded_read_private'.
Fixes: 1881fba89bd5 ("btrfs: add BTRFS_IOC_ENCODED_READ ioctl")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever::
- Revert one v6.13 fix at the author's request (to be done differently)
- Fix a minor problem with recent NFSv4.2 COPY enhancements
- Fix an NFSv4.0 callback bug introduced in the v6.13 merge window
* tag 'nfsd-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: restore callback functionality for NFSv4.0
NFSD: fix management of pending async copies
nfsd: Revert "nfsd: release svc_expkey/svc_export with rcu_work"
|
|
Daniel Machon says:
====================
net: lan969x: add RGMII support
== Description:
This series is the fourth of a multi-part series, that prepares and adds
support for the new lan969x switch driver.
The upstreaming efforts is split into multiple series (might change a
bit as we go along):
1) Prepare the Sparx5 driver for lan969x (merged)
2) Add support for lan969x (same basic features as Sparx5
provides excl. FDMA and VCAP, merged).
3) Add lan969x VCAP functionality (merged).
--> 4) Add RGMII support.
5) Add FDMA support.
== RGMII support:
The lan969x switch device includes two RGMII port interfaces (port 28
and 29) supporting data speeds of 1 Gbps, 100 Mbps and 10 Mbps.
== Patch breakdown:
Patch #1 does some preparation work.
Patch #2 adds new function: is_port_rgmii() to the match data ops.
Patch #3 uses the is_port_rgmii() in a number of places.
Patch #4 makes sure that we do not configure an RGMII device as a
low-speed device, when doing a port config.
Patch #5 makes sure we only return the PCS if the port mode requires
it.
Patch #6 adds checks for RGMII PHY modes in sparx5_verify_speeds().
Patch #7 adds registers required to configure RGMII.
Patch #8 adds RGMII implementation.
Patch #9 documents RGMII delays in the dt-bindings.
Details are in the commit description of the individual patches
v4: https://lore.kernel.org/20241213-sparx5-lan969x-switch-driver-4-v4-0-d1a72c9c4714@microchip.com
v3: https://lore.kernel.org/20241118-sparx5-lan969x-switch-driver-4-v3-0-3cefee5e7e3a@microchip.com
v2: https://lore.kernel.org/20241113-sparx5-lan969x-switch-driver-4-v2-0-0db98ac096d1@microchip.com
v1: https://lore.kernel.org/20241106-sparx5-lan969x-switch-driver-4-v1-0-f7f7316436bd@microchip.com
====================
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-0-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The lan969x switch device supports two RGMII port interfaces that can be
configured for MAC level rx and tx delays. Document two new properties
{rx,tx}-internal-delay-ps in the bindings, used to select these delays.
Tested-by: Robert Marko <robert.marko@sartura.hr>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-9-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The lan969x switch device includes two RGMII port interfaces (port 28
and 29) supporting data speeds of 1 Gbps, 100 Mbps and 10 Mbps. MAC
level delays are configurable through the HSIO_WRAP target, by choosing
a phase shift selector, corresponding to a certain time delay in nano
seconds.
Add new file: lan969x_rgmii.c that contains the implementation for
configuring the RGMII port devices. MAC level delays are configured
using the "{rx,tx}-internal-delay-ps" properties. These properties must
be specified independently of the phy-mode. If missing, or set to zero,
the MAC will not apply any delay.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-8-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Configuration of RGMII is done by configuring the GPIO and clock
settings in the HSIOWRAP target, and configuring the RGMII port devices
in the DEVRGMII target. Both targets contain registers replicated for
the number of RGMII port devices, which is two.
Add said targets and register macros required to configure RGMII.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-7-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When doing a port config, we verify the port speed against the PHY mode
and supported speeds of that PHY mode. Add checks for the four RGMII phy
modes: RGMII, RGMII_ID, RGMII_TXID and RGMII_RXID.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-6-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RGMII ports have no PCS to configure. Make sure we only return the
PCS for port modes that require it.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-5-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When doing a port config, we configure low-speed port devices, among
other things. We have a check to ensure, that the device is indeed a
low-speed device, an not a high-speed device. Add an additional check,
to ensure that the device is not an RGMII device.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-4-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that we can check if a given port is an RGMII port, use it in the
following cases:
- To set RGMII PHY modes for RGMII port devices.
- To avoid checking for a SerDes node in the devicetree, when the port
is an RGMII port.
- To bail out of sparx5_port_init() when the common configuration is
done.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-3-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The lan969x device contains two RGMII port interfaces, sitting at port
28 and 29. Add function: is_port_rgmii() to the match data ops, that
checks if a given port is an RGMII port or not. For Sparx5, this
function always returns false.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-2-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The sparx5_port_init() does initial configuration of a variety of
different features and options for each port. Some are shared for all
types of devices, some are not. As it is now, common configuration is
done after configuration of low-speed devices. This will not work when
adding RGMII support in a subsequent patch.
In preparation for lan969x RGMII support, move a block of code, that
configures 2g5 devices, down. This ensures that the configuration common
to all devices is done before configuration of 2g5, 5g, 10g and 25g
devices.
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Tested-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Link: https://patch.msgid.link/20241220-sparx5-lan969x-switch-driver-4-v5-1-fa8ba5dff732@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Tariq Toukan says:
====================
mlx5 misc fixes 2024-12-20
This small patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.
====================
Link: https://patch.msgid.link/20241220081505.1286093-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the cited commit, when changing from switchdev to legacy mode,
uplink representor's netdev is kept, and its profile is replaced with
nic profile, so netdev is detached from old profile, then attach to
new profile.
During profile change, the hardware resources allocated by the old
profile will be cleaned up. However, the cleanup is relying on the
related kernel modules. And they may need to flush themselves first,
which is triggered by netdev events, for example, NETDEV_UNREGISTER.
However, netdev is kept, or netdev_register is called after the
cleanup, which may cause troubles because the resources are still
referred by kernel modules.
The same process applies to all the caes when uplink is leaving
switchdev mode, including devlink eswitch mode set legacy, driver
unload and devlink reload. For the first one, it can be blocked and
returns failure to users, whenever possible. But it's hard for the
others. Besides, the attachment to nic profile is unnecessary as the
netdev will be unregistered anyway for such cases.
So in this patch, the original behavior is kept only for devlink
eswitch set mode legacy. For the others, moves netdev unregistration
before the profile change.
Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241220081505.1286093-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During driver unload, unregister_netdev is called after unloading
vport rep. So, the mlx5e_rep_priv is already freed while trying to get
rpriv->netdev, or walk rpriv->tc_ht, which results in use-after-free.
So add the checking to make sure access the data of vport rep which is
still loaded.
Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241220081505.1286093-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In MACsec, it is possible to create multiple active TX SAs on a SC,
but only one such SA can be used at a time for transmission. This SA
is selected through the encoding_sa link parameter.
When there are 2 or more active TX SAs configured (encoding_sa=0):
ip macsec add macsec0 tx sa 0 pn 1 on key 00 <KEY1>
ip macsec add macsec0 tx sa 1 pn 1 on key 00 <KEY2>
... the traffic should be still sent via TX SA 0 as the encoding_sa was
not changed. However, the driver ignores the encoding_sa and overrides
it to SA 1 by installing the flow steering id of the newly created TX SA
into the SCI -> flow steering id hash map. The future packet tx
descriptors will point to the incorrect flow steering rule (SA 1).
This patch fixes the issue by avoiding the creation of the flow steering
rule for an active TX SA that is not the encoding_sa. The driver side
tx_sa object and the FW side macsec object are still created. When the
encoding_sa link parameter is changed to another active TX SA, only the
new flow steering rule will be created in the mlx5e_macsec_upd_txsa()
handler.
Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241220081505.1286093-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When creating a software steering completion queue (CQ), an arbitrary
MSIX vector n is selected. This results in the CQ sharing the same
Ethernet traffic channel n associated with the chosen vector. However,
the value of n is often unpredictable, which can introduce complications
for interrupt monitoring and verification tools.
Moreover, SW steering uses polling rather than event-driven interrupts.
Therefore, there is no need to select any MSIX vector other than the
existing vector 0 for CQ creation.
In light of these factors, and to enhance predictability, we modify the
code to consistently select MSIX vector 0 for CQ creation.
Fixes: 297cccebdc5a ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241220081505.1286093-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ixgbe, ixgbevf: Add support for Intel(R) E610 device
Piotr Kwapulinski says:
Add initial support for Intel(R) E610 Series of network devices. The E610
is based on X550 but adds firmware managed link, enhanced security
capabilities and support for updated server manageability.
* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ixgbevf: Add support for Intel(R) E610 device
PCI: Add PCI_VDEVICE_SUB helper macro
ixgbe: Enable link management in E610 device
ixgbe: Clean up the E610 link management related code
ixgbe: Add ixgbe_x540 multiple header inclusion protection
ixgbe: Add support for EEPROM dump in E610 device
ixgbe: Add support for NVM handling in E610 device
ixgbe: Add link management support for E610 device
ixgbe: Add support for E610 device capabilities detection
ixgbe: Add support for E610 FW Admin Command Interface
====================
Link: https://patch.msgid.link/20241220201521.3363985-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An issue was present in the initial driver implementation. The driver
read the power status of all channels before toggling the bit of the
desired one. Using the power status register as a base value introduced
a problem, because only the bit corresponding to the concerned channel ID
should be set in the write-only power enable register. This led to cases
where disabling power for one channel also powered off other channels.
This patch removes the power status read and ensures the value is
limited to the bit matching the channel index of the PI.
Fixes: 20e6d190ffe1 ("net: pse-pd: Add TI TPS23881 PSE controller driver")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20241220170400.291705-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The __ethtool_get_ts_info function can be called with or without the
rtnl lock held. When the rtnl lock is not held, using rtnl_dereference()
triggers a warning due to the lack of lock context.
Add an rcu_read_lock() to ensure the lock is acquired and to maintain
synchronization.
Reported-by: syzbot+a344326c05c98ba19682@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/676147f8.050a0220.37aaf.0154.GAE@google.com/
Fixes: b9e3f7dc9ed9 ("net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topology")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241220083741.175329-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Host Port (i.e. CPU facing port) of CPSW receives traffic from Linux
via TX DMA Channels which are Hardware Queues consisting of traffic
categorized according to their priority. The Host Port is configured to
dequeue traffic from these Hardware Queues on the basis of priority i.e.
as long as traffic exists on a Hardware Queue of a higher priority, the
traffic on Hardware Queues of lower priority isn't dequeued. An alternate
operation is also supported wherein traffic can be dequeued by the Host
Port in a Round-Robin manner.
Until commit under Fixes, the am65-cpsw driver enabled a single TX DMA
Channel, due to which, unless modified by user via "ethtool", all traffic
from Linux is transmitted on DMA Channel 0. Therefore, configuring
the Host Port for priority based dequeuing or Round-Robin operation
is identical since there is a single DMA Channel.
Since commit under Fixes, all 8 TX DMA Channels are enabled by default.
Additionally, the default "tc mapping" doesn't take into account
the possibility of different traffic profiles which various users
might have. This results in traffic starvation at the Host Port
due to the priority based dequeuing which has been enabled by default
since the inception of the driver. The traffic starvation triggers
NETDEV WATCHDOG timeout for all TX DMA Channels that haven't been serviced
due to the presence of traffic on the higher priority TX DMA Channels.
Fix this by defaulting to Round-Robin dequeuing at the Host Port, which
shall ensure that traffic is dequeued from all TX DMA Channels irrespective
of the traffic profile. This will address the NETDEV WATCHDOG timeouts.
At the same time, users can still switch from Round-Robin to Priority
based dequeuing at the Host Port with the help of the "p0-rx-ptype-rrobin"
private flag of "ethtool". Users are expected to setup an appropriate
"tc mapping" that suits their traffic profile when switching to priority
based dequeuing at the Host Port.
Fixes: be397ea3473d ("net: ethernet: am65-cpsw: Set default TX channels to maximum")
Cc: <stable@vger.kernel.org>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://patch.msgid.link/20241220075618.228202-1-s-vadapalli@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
eth: fbnic: support basic RSS config and setting channel count
Add support for basic RSS config (indirection table, key get and set),
and changing the number of channels.
# ./ksft-net-drv/run_kselftest.sh -t drivers/net/hw:rss_ctx.py
TAP version 13
1..1
# timeout set to 0
# selftests: drivers/net/hw: rss_ctx.py
# KTAP version 1
# 1..15
# ok 1 rss_ctx.test_rss_key_indir
# ok 2 rss_ctx.test_rss_queue_reconfigure
# ok 3 rss_ctx.test_rss_resize
# ok 4 rss_ctx.test_hitless_key_update
.. the rest of the tests are for additional contexts so they
get skipped..
The slicing of the patches (and bugs) are mine, but I'm keeping
Alex as the author on the patches where he wrote 100% of the code.
====================
Link: https://patch.msgid.link/20241220025241.1522781-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|