summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-07-05nfp: clean mc addresses in application firmware when closing portYinjun Zhang
When moving devices from one namespace to another, mc addresses are cleaned in software while not removed from application firmware. Thus the mc addresses are remained and will cause resource leak. Now use `__dev_mc_unsync` to clean mc addresses when closing port. Fixes: e20aa071cd95 ("nfp: fix schedule in atomic context when sync mc address") Cc: stable@vger.kernel.org Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com> Acked-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Message-ID: <20230705052818.7122-1-louis.peens@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-05Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-07-05 We've added 2 non-merge commits during the last 1 day(s) which contain a total of 3 files changed, 16 insertions(+), 4 deletions(-). The main changes are: 1) Fix BTF to warn but not returning an error for a NULL BTF to still be able to load modules under CONFIG_DEBUG_INFO_BTF, from SeongJae Park. 2) Fix xsk sockets to honor SO_BINDTODEVICE in bind(), from Ilya Maximets. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: xsk: Honor SO_BINDTODEVICE on bind bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set() ==================== Link: https://lore.kernel.org/r/20230705171716.6494-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-05net/mlx5e: RX, Fix page_pool page fragment tracking for XDPDragos Tatulea
Currently mlx5e releases pages directly to the page_pool for XDP_TX and does page fragment counting for XDP_REDIRECT. RX pages from the page_pool are leaking on XDP_REDIRECT because the xdp core will release only one fragment out of MLX5E_PAGECNT_BIAS_MAX and subsequently the page is marked as "skip release" which avoids the driver release. A fix would be to take an extra fragment for XDP_REDIRECT and not set the "skip release" bit so that the release on the driver side can handle the remaining bias fragments. But this would be a shortsighted solution. Instead, this patch converges the two XDP paths (XDP_TX and XDP_REDIRECT) to always do fragment tracking. The "skip release" bit is no longer necessary for XDP. Fixes: 6f5742846053 ("net/mlx5e: RX, Enable skb page recycling through the page_pool") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5: Query hca_cap_2 only when supportedMaher Sanalla
On vport enable, where fw's hca caps are queried, the driver queries hca_caps_2 without checking if fw truly supports them, causing a false failure of vfs vport load and blocking SRIOV enablement on old devices such as CX4 where hca_caps_2 support is missing. Thus, add a check for the said caps support before accessing them. Fixes: e5b9642a33be ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5e: TC, CT: Offload ct clear only onceYevgeny Kliteynik
Non-clear CT action causes a flow rule split, while CT clear action doesn't and is just a header-rewrite to the current flow rule. But ct offload is done in post_parse and is per ct action instance, so ct clear offload is parsed multiple times, while its deleted once. Fix this by post_parsing the ct action only once per flow attribute (which is per flow rule) by using a offloaded ct_attr flag. Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5e: Check for NOT_READY flag state after lockingVlad Buslov
Currently the check for NOT_READY flag is performed before obtaining the necessary lock. This opens a possibility for race condition when the flow is concurrently removed from unready_flows list by the workqueue task, which causes a double-removal from the list and a crash[0]. Fix the issue by moving the flag check inside the section protected by uplink_priv->unready_flows_lock mutex. [0]: [44376.389654] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP [44376.391665] CPU: 7 PID: 59123 Comm: tc Not tainted 6.4.0-rc4+ #1 [44376.392984] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [44376.395342] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.396857] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06 [44376.399167] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246 [44376.399680] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00 [44376.400337] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0 [44376.401001] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001 [44376.401663] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000 [44376.402342] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000 [44376.402999] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000 [44376.403787] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44376.404343] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0 [44376.405004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44376.405665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [44376.406339] Call Trace: [44376.406651] <TASK> [44376.406939] ? die_addr+0x33/0x90 [44376.407311] ? exc_general_protection+0x192/0x390 [44376.407795] ? asm_exc_general_protection+0x22/0x30 [44376.408292] ? mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.408876] __mlx5e_tc_del_fdb_peer_flow+0xbc/0xe0 [mlx5_core] [44376.409482] mlx5e_tc_del_flow+0x42/0x210 [mlx5_core] [44376.410055] mlx5e_flow_put+0x25/0x50 [mlx5_core] [44376.410529] mlx5e_delete_flower+0x24b/0x350 [mlx5_core] [44376.411043] tc_setup_cb_reoffload+0x22/0x80 [44376.411462] fl_reoffload+0x261/0x2f0 [cls_flower] [44376.411907] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core] [44376.412481] ? mlx5e_rep_indr_setup_ft_cb+0x160/0x160 [mlx5_core] [44376.413044] tcf_block_playback_offloads+0x76/0x170 [44376.413497] tcf_block_unbind+0x7b/0xd0 [44376.413881] tcf_block_setup+0x17d/0x1c0 [44376.414269] tcf_block_offload_cmd.isra.0+0xf1/0x130 [44376.414725] tcf_block_offload_unbind+0x43/0x70 [44376.415153] __tcf_block_put+0x82/0x150 [44376.415532] ingress_destroy+0x22/0x30 [sch_ingress] [44376.415986] qdisc_destroy+0x3b/0xd0 [44376.416343] qdisc_graft+0x4d0/0x620 [44376.416706] tc_get_qdisc+0x1c9/0x3b0 [44376.417074] rtnetlink_rcv_msg+0x29c/0x390 [44376.419978] ? rep_movs_alternative+0x3a/0xa0 [44376.420399] ? rtnl_calcit.isra.0+0x120/0x120 [44376.420813] netlink_rcv_skb+0x54/0x100 [44376.421192] netlink_unicast+0x1f6/0x2c0 [44376.421573] netlink_sendmsg+0x232/0x4a0 [44376.421980] sock_sendmsg+0x38/0x60 [44376.422328] ____sys_sendmsg+0x1d0/0x1e0 [44376.422709] ? copy_msghdr_from_user+0x6d/0xa0 [44376.423127] ___sys_sendmsg+0x80/0xc0 [44376.423495] ? ___sys_recvmsg+0x8b/0xc0 [44376.423869] __sys_sendmsg+0x51/0x90 [44376.424226] do_syscall_64+0x3d/0x90 [44376.424587] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [44376.425046] RIP: 0033:0x7f045134f887 [44376.425403] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [44376.426914] RSP: 002b:00007ffd63a82b98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [44376.427592] RAX: ffffffffffffffda RBX: 000000006481955f RCX: 00007f045134f887 [44376.428195] RDX: 0000000000000000 RSI: 00007ffd63a82c00 RDI: 0000000000000003 [44376.428796] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [44376.429404] R10: 00007f0451208708 R11: 0000000000000246 R12: 0000000000000001 [44376.430039] R13: 0000000000409980 R14: 000000000047e538 R15: 0000000000485400 [44376.430644] </TASK> [44376.430907] Modules linked in: mlx5_ib mlx5_core act_mirred act_tunnel_key cls_flower vxlan dummy sch_ingress openvswitch nsh rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_g ss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] [44376.433936] ---[ end trace 0000000000000000 ]--- [44376.434373] RIP: 0010:mlx5e_tc_del_fdb_flow+0xb3/0x340 [mlx5_core] [44376.434951] Code: 00 48 8b b8 68 ce 02 00 e8 8a 4d 02 00 4c 8d a8 a8 01 00 00 4c 89 ef e8 8b 79 88 e1 48 8b 83 98 06 00 00 48 8b 93 90 06 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 90 06 [44376.436452] RSP: 0018:ffff88812cc97570 EFLAGS: 00010246 [44376.436924] RAX: dead000000000122 RBX: ffff8881088e3800 RCX: ffff8881881bac00 [44376.437530] RDX: dead000000000100 RSI: ffff88812cc97500 RDI: ffff8881242f71b0 [44376.438179] RBP: ffff88811cbb0940 R08: 0000000000000400 R09: 0000000000000001 [44376.438786] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88812c944000 [44376.439393] R13: ffff8881242f71a8 R14: ffff8881222b4000 R15: 0000000000000000 [44376.439998] FS: 00007f0451104800(0000) GS:ffff88852cb80000(0000) knlGS:0000000000000000 [44376.440714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [44376.441225] CR2: 0000000000489108 CR3: 0000000123a79003 CR4: 0000000000370ea0 [44376.441843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [44376.442471] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: ad86755b18d5 ("net/mlx5e: Protect unready flows with dedicated lock") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5: Register a unique thermal zone per deviceSaeed Mahameed
Prior to this patch only one "mlx5" thermal zone could have been registered regardless of the number of individual mlx5 devices in the system. To fix this setup a unique name per device to register its own thermal zone. In order to not register a thermal zone for a virtual device (VF/SF) add a check for PF device type. The new name is a concatenation between "mlx5_" and "<PCI_DEV_BDF>", which will also help associating a thermal zone with its PCI device. $ lspci | grep ConnectX 00:04.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] 00:05.0 Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx] $ cat /sys/devices/virtual/thermal/thermal_zone0/type mlx5_0000:00:04.0 $ cat /sys/devices/virtual/thermal/thermal_zone1/type mlx5_0000:00:05.0 Fixes: c1fef618d611 ("net/mlx5: Implement thermal zone") CC: Sandipan Patra <spatra@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5e: RX, Fix flush and close release flow of regular rq for legacy rqDragos Tatulea
Regular (non-XSK) RQs get flushed on XSK setup and re-activated on XSK close. If the same regular RQ is closed (a config change for example) soon after the XSK close, a double release occurs because the missing wqes get released a second time. Fixes: 3f93f82988bc ("net/mlx5e: RX, Defer page release in legacy rq for better recycling") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5e: fix memory leak in mlx5e_ptp_openZhengchao Shao
When kvzalloc_node or kvzalloc failed in mlx5e_ptp_open, the memory pointed by "c" or "cparams" is not freed, which can lead to a memory leak. Fix by freeing the array in the error path. Fixes: 145e5637d941 ("net/mlx5e: Add TX PTP port object support") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_createZhengchao Shao
The memory pointed to by the fs->any pointer is not freed in the error path of mlx5e_fs_tt_redirect_any_create, which can lead to a memory leak. Fix by freeing the memory in the error path, thereby making the error path identical to mlx5e_fs_tt_redirect_any_destroy(). Fixes: 0f575c20bf06 ("net/mlx5e: Introduce Flow Steering ANY API") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05net/mlx5e: fix double free in mlx5e_destroy_flow_tableZhengchao Shao
In function accel_fs_tcp_create_groups(), when the ft->g memory is successfully allocated but the 'in' memory fails to be allocated, the memory pointed to by ft->g is released once. And in function accel_fs_tcp_create_table, mlx5e_destroy_flow_table is called to release the memory pointed to by ft->g again. This will cause double free problem. Fixes: c062d52ac24c ("net/mlx5e: Receive flow steering framework for accelerated TCP flows") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-07-05Merge tag 'soundwire-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire updates from Vinod Koul: - Stream handling and slave alert handling - Qualcomm Soundwire v2.0.0 controller support - Intel ACE2.x initial support and code reorganization * tag 'soundwire-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (55 commits) soundwire: stream: Make master_list ordered to prevent deadlocks soundwire: bus: Prevent lockdep asserts when stream has multiple buses soundwire: qcom: fix storing port config out-of-bounds soundwire: intel_ace2x: fix SND_SOC_SOF_HDA_MLINK dependency soundwire: debugfs: Add missing SCP registers soundwire: stream: Remove unnecessary gotos soundwire: stream: Invert logic on runtime alloc flags soundwire: stream: Remove unneeded checks for NULL bus soundwire: bandwidth allocation: Remove pointless variable soundwire: cadence: revisit parity injection soundwire: intel/cadence: update hardware reset sequence soundwire: intel_bus_common: enable interrupts last soundwire: intel_bus_common: update error log soundwire: amd: Improve error message in remove callback soundwire: debugfs: fix unbalanced pm_runtime_put() soundwire: qcom: fix unbalanced pm_runtime_put() soundwire: qcom: set clk stop need reset flag at runtime soundwire: qcom: add software workaround for bus clash interrupt assertion soundwire: qcom: wait for fifo to be empty before suspend soundwire: qcom: drop unused struct qcom_swrm_ctrl members ...
2023-07-05Merge tag 'media/v6.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - Lots of improvement at atomisp driver, which is starting to look in good shape - Mediatek vcodec driver has gained support for av1 and hevc stateless codecs - New sensor driver: ov01a10 - verisilicon driver has gained AV1 entropy helpers - tegra-video has gained support for Tegra20 parallel input - dvb core has gained an extra property to better support DVB-S2X - as usual, lots of cleanups, fixes and improvements on media drivers * tag 'media/v6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (253 commits) media: wl128x: fix a clang warning media: dvb: mb86a20s: get rid of a clang-15 warning media: cec: i2c: ch7322: also select REGMAP media: add HAS_IOPORT dependencies media: tc358746: select CONFIG_GENERIC_PHY media: mediatek: vcodec: Add dbgfs help function media: mediatek: vcodec: Add encode to support dbgfs media: mediatek: vcodec: Change dbgfs interface to support encode media: mediatek: vcodec: Get each instance format type media: mediatek: vcodec: Get each context resolution information media: mediatek: vcodec: Add a debugfs file to get different useful information media: mediatek: vcodec: Add debug params to control different log level media: mediatek: vcodec: Add debugfs interface to get debug information media: mediatek: vcodec: support stateless AV1 decoder media: verisilicon: Conditionally ignore native formats media: verisilicon: Enable AV1 decoder on rk3588 media: verisilicon: Add film grain feature to AV1 driver media: verisilicon: Add Rockchip AV1 decoder media: verisilicon: Add AV1 entropy helpers media: verisilicon: Compute motion vectors size for AV1 frames ...
2023-07-05igc: Fix TX Hang issue when QBV Gate is closedMuhammad Husaini Zulkifli
If a user schedules a Gate Control List (GCL) to close one of the QBV gates while also transmitting a packet to that closed gate, TX Hang will be happen. HW would not drop any packet when the gate is closed and keep queuing up in HW TX FIFO until the gate is re-opened. This patch implements the solution to drop the packet for the closed gate. This patch will also reset the adapter to perform SW initialization for each 1st Gate Control List (GCL) to avoid hang. This is due to the HW design, where changing to TSN transmit mode requires SW initialization. Intel Discrete I225/6 transmit mode cannot be changed when in dynamic mode according to Software User Manual Section 7.5.2.1. Subsequent Gate Control List (GCL) operations will proceed without a reset, as they already are in TSN Mode. Step to reproduce: DUT: 1) Configure GCL List with certain gate close. BASE=$(date +%s%N) tc qdisc replace dev $IFACE parent root handle 100 taprio \ num_tc 4 \ map 0 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 \ base-time $BASE \ sched-entry S 0x8 500000 \ sched-entry S 0x4 500000 \ flags 0x2 2) Transmit the packet to closed gate. You may use udp_tai application to transmit UDP packet to any of the closed gate. ./udp_tai -i <interface> -P 100000 -p 90 -c 1 -t <0/1> -u 30004 Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Co-developed-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Tested-by: Chwee Lin Choong <chwee.lin.choong@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05Merge tag 'trace-tools-v6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tooling updates from Steven Rostedt: - Add cgroup support for rtla via the -C option - Add --house-keeping option that tells rtla where to place the housekeeping threads - Have rtla/timerlat have its own tracing instance instead of using the top level tracing instance that is the default for other tracing users to use - Add auto analysis to timerlat_hist - Have rtla start the tracers after creating the instances - Reduce rtla hwnoise down to 75% from 100% as it runs with preemption disabled and can cause system instability at 100% - Add support to run timerlat_top and timerlat_hist threads in user-space instead of just using the kernel tasks - Some minor clean ups and documentation changes * tag 'trace-tools-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: Add tools/rtla timerlat -u option documentation rtla/timerlat_hist: Add timerlat user-space support rtla/timerlat_top: Add timerlat user-space support rtla/hwnoise: Reduce runtime to 75% rtla: Start the tracers after creating all instances rtla/timerlat_hist: Add auto-analysis support rtla/timerlat: Give timerlat auto analysis its own instance rtla: Automatically move rtla to a house-keeping cpu rtla: Change monitored_cpus from char * to cpu_set_t rtla: Add --house-keeping option rtla: Add -C cgroup support
2023-07-05Merge tag 'parisc-for-6.5-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull more parisc architecture updates from Helge Deller: - Fix all compiler warnings in arch/parisc and drivers/parisc when compiled with W=1 * tag 'parisc-for-6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: syscalls: Avoid compiler warnings with W=1 parisc: math-emu: Avoid compiler warnings with W=1 parisc: Raise minimal GCC version to 12.0.0 parisc: unwind: Avoid missing prototype warning for handle_interruption() parisc: smp: Add declaration for start_cpu_itimer() parisc: pdt: Get prototype for arch_report_meminfo()
2023-07-05igc: Remove delay during TX ring configurationMuhammad Husaini Zulkifli
Remove unnecessary delay during the TX ring configuration. This will cause delay, especially during link down and link up activity. Furthermore, old SKUs like as I225 will call the reset_adapter to reset the controller during TSN mode Gate Control List (GCL) setting. This will add more time to the configuration of the real-time use case. It doesn't mentioned about this delay in the Software User Manual. It might have been ported from legacy code I210 in the past. Fixes: 13b5b7fd6a4a ("igc: Add support for Tx/Rx rings") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05igc: Add condition for qbv_config_change_errors counterMuhammad Husaini Zulkifli
Add condition to increase the qbv counter during taprio qbv configuration only. There might be a case when TC already been setup then user configure the ETF/CBS qdisc and this counter will increase if no condition above. Fixes: ae4fe4698300 ("igc: Add qbv_config_change_errors counter") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05sh: Provide unxlate_dev_mem_ptr() in asm/io.hGuenter Roeck
The unxlate_dev_mem_ptr() function has no prototype on the sh architecture which does not include asm-generic/io.h. This results in the following build failure: drivers/char/mem.c: In function 'read_mem': drivers/char/mem.c:164:25: error: implicit declaration of function 'unxlate_dev_mem_ptr' This compile error is now seen because commit 99b619b37ae1 ("mips: provide unxlate_dev_mem_ptr() in asm/io.h") removed the weak function which was previously in place to handle this problem. Add a trivial macro to the sh header to provide the now missing dummy function. Fixes: 99b619b37ae1 ("mips: provide unxlate_dev_mem_ptr() in asm/io.h") Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230704190144.2888679-1-linux@roeck-us.net Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05sh: dma: Correct the number of DMA channels for SH7709Artur Rojek
According to the hardware manual [1], the DMAC found in the SH7709 SoC features only 4 channels. While at it, also sort the existing targets. [1] https://www.renesas.com/us/en/document/mah/sh7709s-group-hardware-manual (p. 373) Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230527164452.64797-4-contact@artur-rojek.eu Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05sh: dma: Drop incorrect SH_DMAC_BASE1 definition for SH4Artur Rojek
None of the supported SH4 family SoCs features a second DMAC module. As this definition negatively impacts DMA channel calculation for the above targets, remove it from the code. Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230527164452.64797-3-contact@artur-rojek.eu Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05sh: dma: Fix DMA channel offset calculationArtur Rojek
Various SoCs of the SH3, SH4 and SH4A family, which use this driver, feature a differing number of DMA channels, which can be distributed between up to two DMAC modules. The existing implementation fails to correctly accommodate for all those variations, resulting in wrong channel offset calculations and leading to kernel panics. Rewrite dma_base_addr() in order to properly calculate channel offsets in a DMAC module. Fix dmaor_read_reg() and dmaor_write_reg(), so that the correct DMAC module base is selected for the DMAOR register. Fixes: 7f47c7189b3e8f19 ("sh: dma: More legacy cpu dma chainsawing.") Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230527164452.64797-2-contact@artur-rojek.eu Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05sh: Remove compiler flag duplicationMasahiro Yamada
Every compiler flag added by arch/sh/Makefile is passed to the compiler twice: $(KBUILD_CPPFLAGS) + $(KBUILD_CFLAGS) is used for compiling *.c $(KBUILD_CPPFLAGS) + $(KBUILD_AFLAGS) is used for compiling *.S Given the above, adding $(cflags-y) to all of KBUILD_{CPP/C/A}FLAGS ends up with duplication. Add -I options to $(KBUILD_CPPFLAGS), and the rest of $(cflags-y) to KBUILD_{C,A}FLAGS. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230219141555.2308306-4-masahiroy@kernel.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05sh: Refactor header include path additionMasahiro Yamada
Shorten the code. No functional change intended. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230219141555.2308306-3-masahiroy@kernel.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05sh: Move build rule for cchips/hd6446x/ to arch/sh/KbuildMasahiro Yamada
This is the last user of core-y in arch/sh. Use the standard obj-y syntax. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230219141555.2308306-2-masahiroy@kernel.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05sh: Fix -Wmissing-include-dirs warnings for various platformsMasahiro Yamada
The 0day bot reports a lot of warnings (or errors due to CONFIG_WERROR) like this: cc1: error: arch/sh/include/mach-hp6xx: No such file or directory [-Werror=missing-include-dirs] Indeed, arch/sh/include/mach-hp6xx does not exist. While -Wmissing-include-dirs is only a W=1 warning, it may be annoying when CONFIG_BTRFS_FS is enabled because fs/btrfs/Makefile unconditionally adds this warning option. arch/sh/Makefile defines machdir-y for two purposes: - Build platform code in arch/sh/boards/mach-*/ - Add arch/sh/include/mach-*/ to the header search path For the latter, some platforms use arch/sh/include/mach-common/ instead of having its own arch/sh/include/mach-*/. Drop unneeded machdir-y to omit non-existing include directories. To build arch/sh/boards/mach-*/, use the standard obj-y syntax in arch/sh/boards/Makefile. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202302190641.30VVXnPb-lkp@intel.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230219141555.2308306-1-masahiroy@kernel.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-05gup: make the stack expansion warning a bit more targetedLinus Torvalds
I added a warning about about GUP no longer expanding the stack in commit a425ac5365f6 ("gup: add warning if some caller would seem to want stack expansion"), but didn't really expect anybody to hit it. And it's true that nobody seems to have hit a _real_ case yet, but we certainly have a number of reports of false positives. Which not only causes extra noise in itself, but might also end up hiding any real cases if they do exist. So let's tighten up the warning condition, and replace the simplistic vma = find_vma(mm, start); if (vma && (start < vma->vm_start)) { WARN_ON_ONCE(vma->vm_flags & VM_GROWSDOWN); with a vma = gup_vma_lookup(mm, start); helper function which works otherwise like just "vma_lookup()", but with some heuristics for when to warn about gup no longer causing stack expansion. In particular, don't just warn for "below the stack", but warn if it's _just_ below the stack (with "just below" arbitrarily defined as 64kB, because why not?). And rate-limit it to at most once per hour, which means that any false positives shouldn't completely hide subsequent reports, but we won't be flooding the logs about it either. The previous code triggered when some GUP user (chromium crashpad) accessing past the end of the previous vma, for example. That has never expanded the stack, it just causes GUP to return early, and as such we shouldn't be warning about it. This is still going trigger the randomized testers, but to mitigate the noise from that, use "dump_stack()" instead of "WARN_ON_ONCE()" to get the kernel call chain. We'll get the relevant information, but syzbot shouldn't get too upset about it. Also, don't even bother with the GROWSUP case, which would be using different heuristics entirely, but only happens on parisc. Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: John Hubbard <jhubbard@nvidia.com> Reported-by: syzbot+6cf44e127903fdf9d929@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-05ice: Fix tx queue rate limit when TCs are configuredSridhar Samudrala
Configuring tx_maxrate via sysfs interface /sys/class/net/eth0/queues/tx-1/tx_maxrate was not working when TCs are configured because always main VSI was being used. Fix by using correct VSI in ice_set_tx_maxrate when TCs are configured. Fixes: 1ddef455f4a8 ("ice: Add NDO callback to set the maximum per-queue bitrate") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05ice: Fix max_rate check while configuring TX rate limitsSridhar Samudrala
Remove incorrect check in ice_validate_mqprio_opt() that limits filter configuration when sum of max_rates of all TCs exceeds the link speed. The max rate of each TC is unrelated to value used by other TCs and is valid as long as it is less than link speed. Fixes: fbc7b27af0f9 ("ice: enable ndo_setup_tc support for mqprio_qdisc") Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-07-05dt-bindings: soc: qcom: stats: Update maintainer emailMaulik Shah
Replace my email. Cc: devicetree@vger.kernel.org Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230703092026.4923-1-quic_mkshah@quicinc.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-07-05dt-bindings: cleanup DTS example whitespacesKrzysztof Kozlowski
The DTS code coding style expects spaces around '=' sign. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> #display/msm Acked-by: Neil Armstrong <neil.armstrong@linaro.org> Acked-by: Mike Leach <mike.leach@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Vinod Koul <vkoul@kernel.org> Link: https://lore.kernel.org/r/20230702182308.7583-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-07-05tracing/boot: Test strscpy() against less than zero for errorSteven Rostedt (Google)
Instead of checking for -E2BIG, it is better to just check for less than zero of strscpy() for error. Testing for -E2BIG is not very robust, and the calling code does not really care about the error code, just that there was an error. One of the updates to convert strlcpy() to strscpy() had a v2 version that changed the test from testing against -E2BIG to less than zero, but I took the v1 version that still tested for -E2BIG. Link: https://lore.kernel.org/linux-trace-kernel/20230615180420.400769-1-azeemshaikh38@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20230704100807.707d1605@rorschach.local.home Cc: Mark Rutland <mark.rutland@arm.com> Cc: Azeem Shaikh <azeemshaikh38@gmail.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-05risc-v: Fix order of IPI enablement vs RCU startupMarc Zyngier
Conor reports that risc-v tries to enable IPIs before telling the core code to enable RCU. With the introduction of the mapple tree as a backing store for the irq descriptors, this results in a very shouty boot sequence, as RCU is legitimately upset. Restore some sanity by moving the risc_ipi_enable() call after notify_cpu_starting(), which explicitly enables RCU on the calling CPU. Fixes: 832f15f42646 ("RISC-V: Treat IPIs as normal Linux IRQs") Reported-by: Conor Dooley <conor@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230703-dupe-frying-79ae2ccf94eb@spud Cc: Anup Patel <apatel@ventanamicro.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230703183126.1567625-1-maz@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-05mm: riscv: fix an unsafe pte read in huge_pte_alloc()John Hubbard
The WARN_ON_ONCE() statement in riscv's huge_pte_alloc() is susceptible to false positives, because the pte is read twice at the C language level, locklessly, within the same conditional statement. Depending on compiler behavior, this can lead to generated machine code that actually reads the pte just once, or twice. Reading twice will expose the code to changing pte values and cause incorrect behavior. In [1], similar code actually caused a kernel crash on 64-bit x86, when using clang to build the kernel, but only after the conversion from *pte reads, to ptep_get(pte). The latter uses READ_ONCE(), which forced a double read of *pte. Rather than waiting for the upcoming ptep_get() conversion, just convert this part of the code now, but in a way that avoids the above problem: take a single snapshot of the pte before using it in the WARN conditional. As expected, this preparatory step does not actually change the generated code ("make mm/hugetlbpage.s"), on riscv64, when using a gcc 12.2 cross compiler. [1] https://lore.kernel.org/20230630013203.1955064-1-jhubbard@nvidia.com Suggested-by: James Houghton <jthoughton@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20230703190044.311730-1-jhubbard@nvidia.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-05dt-bindings: riscv: deprecate riscv,isaConor Dooley
intro ===== When the RISC-V dt-bindings were accepted upstream in Linux, the base ISA etc had yet to be ratified. By the ratification of the base ISA, incompatible changes had snuck into the specifications - for example the Zicsr and Zifencei extensions were spun out of the base ISA. Fast forward to today, and the reason for this patch. Currently the riscv,isa dt property permits only a specific subset of the ISA string - in particular it excludes version numbering. With the current constraints, it is not possible to discern whether "rv64i" means that the hart supports the fence.i instruction, for example. Future systems may choose to implement their own instruction fencing, perhaps using a vendor extension, or they may not implement the optional counter extensions. Software needs a way to determine this. versioning schemes ================== "Use the extension versions that are described in the ISA manual" you may say, and it's not like this has not been considered. Firstly, software that parses the riscv,isa property at runtime will need to contain a lookup table of some sort that maps arbitrary versions to versions it understands. There is not a consistent application of version number applied to extensions, with a higgledy-piggledy collection of tags, "bare" and versioned documents awaiting the reader on the "recently ratified extensions" page: https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions As an aside, and this is reflected in the patch too, since many extensions have yet to appear in a release of the ISA specs, they are defined by commits in their respective "working draft" repositories. Secondly, there is an issue of backwards compatibility, whereby allowing numbers in the ISA string, some parsers may be broken. This would require an additional property to be created to even use the versions in this manner. ~boolean properties~ string array property ========================================== If a new property is needed, the whole approach may as well be looked at from the bottom up. A string with limited character choices etc is hardly the best approach for communicating extension information to software. Switching to using properties that are defined on a per extension basis, allows us to define explicit meanings for the DT representation of each extension - rather than the current situation where different operating systems or other bits of software may impart different meanings to characters in the string. Clearly the best source of meanings is the specifications themselves, this just provides us the ability to choose at what point in time the meaning is set. If an extension changes incompatibility in the future, a new property will be required. Off-list, some of the RVI folks have committed to shoring up the wording in either the ISA specifications, the riscv-isa-manual or so that in the future, modifications to and additions or removals of features will require a new extension. Codifying that assertion somewhere would make it quite unlikely that compatibility would be broken, but we have the tools required to deal with it, if & when it crops up. It is in our collective interest, as consumers of extension meanings, to define a scheme that enforces compatibility. The use of individual elements, rather than a single string, will also permit validation that the properties have a meaning, as well as potentially reject mutually exclusive combinations, or enforce dependencies between extensions. That would not have be possible with the current dt-schema infrastructure for arbitrary strings, as we would need to add a riscv,isa parser to dt-validate! That's not implemented in this patch, but rather left as future work (for the brave, or the foolish). parser simplicity ================= Many systems that parse DT at runtime already implement an function that can check for the presence of a string in an array of string, as it is similar to the process for parsing a list of compatible strings, so a bunch of new, custom, DT parsing should not be needed. Getting rid of "riscv,isa" parsing would be a nice simplification, but unfortunately for backwards compatibility with old dtbs, existing parsers may not be removable - which may greatly simplify dt parsing code. In Linux, for example, checking for whether a hart supports an extension becomes as simple as: of_property_match_string(node, "riscv,isa-extensions", "zicbom") vendor extensions ================= Compared to riscv,isa, this proposed scheme promotes vendor extensions, oft touted as the strength of RISC-V, to first-class citizens. At present, extensions are defined as meaning what the RISC-V ISA specifications say they do. There is no realistic way of using that interface to provide cross-platform definitions for what vendor extensions mean. Vendor extensions may also have even less consistency than RVI do in terms of versioning, or no care about backwards compatibility. The new property allows us to assign explicit meanings on a per vendor extension basis, backed up by a description of their meanings. fin === Create a new file to store the extension meanings and a new riscv,isa-base property to replace the aspect of riscv,isa that is not represented by the new property - the base ISA implemented by a hart. As a starting point, add properties for extensions currently used in Linux. Finally, mark riscv,isa as deprecated, as removing support for it in existing programs would be an ABI break. CC: Palmer Dabbelt <palmer@dabbelt.com> CC: Paul Walmsley <paul.walmsley@sifive.com> CC: Rob Herring <robh+dt@kernel.org> CC: Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org> CC: Alistair Francis <alistair.francis@wdc.com> CC: Andrew Jones <ajones@ventanamicro.com> CC: Anup Patel <apatel@ventanamicro.com> CC: Atish Patra <atishp@atishpatra.org> CC: Jessica Clarke <jrtc27@jrtc27.com> CC: Rick Chen <rick@andestech.com> CC: Leo <ycliang@andestech.com> CC: Oleksii <oleksii.kurochko@gmail.com> CC: linux-riscv@lists.infradead.org CC: qemu-riscv@nongnu.org CC: u-boot@lists.denx.de CC: devicetree@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20230702-eats-scorebook-c951f170d29f@spud Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-05arm64: ftrace: fix build error with CONFIG_FUNCTION_GRAPH_TRACER=nArnd Bergmann
It appears that a merge conflict ended up hiding a newly added constant in some configurations: arch/arm64/kernel/entry-ftrace.S: Assembler messages: arch/arm64/kernel/entry-ftrace.S:59: Error: undefined symbol FTRACE_OPS_DIRECT_CALL used as an immediate value FTRACE_OPS_DIRECT_CALL is still used when CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is enabled, even if CONFIG_FUNCTION_GRAPH_TRACER is disabled, so change the ifdef accordingly. Link: https://lkml.kernel.org/r/20230623152204.2216297-1-arnd@kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Donglin Peng <pengdonglin@sangfor.com.cn> Fixes: 3646970322464 ("arm64: ftrace: Enable HAVE_FUNCTION_GRAPH_RETVAL") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Florent Revest <revest@chromium.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-05tracing: Fix null pointer dereference in tracing_err_log_open()Mateusz Stachyra
Fix an issue in function 'tracing_err_log_open'. The function doesn't call 'seq_open' if the file is opened only with write permissions, which results in 'file->private_data' being left as null. If we then use 'lseek' on that opened file, 'seq_lseek' dereferences 'file->private_data' in 'mutex_lock(&m->lock)', resulting in a kernel panic. Writing to this node requires root privileges, therefore this bug has very little security impact. Tracefs node: /sys/kernel/tracing/error_log Example Kernel panic: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038 Call trace: mutex_lock+0x30/0x110 seq_lseek+0x34/0xb8 __arm64_sys_lseek+0x6c/0xb8 invoke_syscall+0x58/0x13c el0_svc_common+0xc4/0x10c do_el0_svc+0x24/0x98 el0_svc+0x24/0x88 el0t_64_sync_handler+0x84/0xe4 el0t_64_sync+0x1b4/0x1b8 Code: d503201f aa0803e0 aa1f03e1 aa0103e9 (c8e97d02) ---[ end trace 561d1b49c12cf8a5 ]--- Kernel panic - not syncing: Oops: Fatal exception Link: https://lore.kernel.org/linux-trace-kernel/20230703155237eucms1p4dfb6a19caa14c79eb6c823d127b39024@eucms1p4 Link: https://lore.kernel.org/linux-trace-kernel/20230704102706eucms1p30d7ecdcc287f46ad67679fc8491b2e0f@eucms1p3 Cc: stable@vger.kernel.org Fixes: 8a062902be725 ("tracing: Add tracing error log") Signed-off-by: Mateusz Stachyra <m.stachyra@samsung.com> Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-07-05netfilter: nf_tables: do not ignore genmask when looking up chain by idThadeu Lima de Souza Cascardo
When adding a rule to a chain referring to its ID, if that chain had been deleted on the same batch, the rule might end up referring to a deleted chain. This will lead to a WARNING like following: [ 33.098431] ------------[ cut here ]------------ [ 33.098678] WARNING: CPU: 5 PID: 69 at net/netfilter/nf_tables_api.c:2037 nf_tables_chain_destroy+0x23d/0x260 [ 33.099217] Modules linked in: [ 33.099388] CPU: 5 PID: 69 Comm: kworker/5:1 Not tainted 6.4.0+ #409 [ 33.099726] Workqueue: events nf_tables_trans_destroy_work [ 33.100018] RIP: 0010:nf_tables_chain_destroy+0x23d/0x260 [ 33.100306] Code: 8b 7c 24 68 e8 64 9c ed fe 4c 89 e7 e8 5c 9c ed fe 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7 c3 cc cc cc cc <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 89 c6 89 c7 [ 33.101271] RSP: 0018:ffffc900004ffc48 EFLAGS: 00010202 [ 33.101546] RAX: 0000000000000001 RBX: ffff888006fc0a28 RCX: 0000000000000000 [ 33.101920] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 33.102649] RBP: ffffc900004ffc78 R08: 0000000000000000 R09: 0000000000000000 [ 33.103018] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880135ef500 [ 33.103385] R13: 0000000000000000 R14: dead000000000122 R15: ffff888006fc0a10 [ 33.103762] FS: 0000000000000000(0000) GS:ffff888024c80000(0000) knlGS:0000000000000000 [ 33.104184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 33.104493] CR2: 00007fe863b56a50 CR3: 00000000124b0001 CR4: 0000000000770ee0 [ 33.104872] PKRU: 55555554 [ 33.104999] Call Trace: [ 33.105113] <TASK> [ 33.105214] ? show_regs+0x72/0x90 [ 33.105371] ? __warn+0xa5/0x210 [ 33.105520] ? nf_tables_chain_destroy+0x23d/0x260 [ 33.105732] ? report_bug+0x1f2/0x200 [ 33.105902] ? handle_bug+0x46/0x90 [ 33.106546] ? exc_invalid_op+0x19/0x50 [ 33.106762] ? asm_exc_invalid_op+0x1b/0x20 [ 33.106995] ? nf_tables_chain_destroy+0x23d/0x260 [ 33.107249] ? nf_tables_chain_destroy+0x30/0x260 [ 33.107506] nf_tables_trans_destroy_work+0x669/0x680 [ 33.107782] ? mark_held_locks+0x28/0xa0 [ 33.107996] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [ 33.108294] ? _raw_spin_unlock_irq+0x28/0x70 [ 33.108538] process_one_work+0x68c/0xb70 [ 33.108755] ? lock_acquire+0x17f/0x420 [ 33.108977] ? __pfx_process_one_work+0x10/0x10 [ 33.109218] ? do_raw_spin_lock+0x128/0x1d0 [ 33.109435] ? _raw_spin_lock_irq+0x71/0x80 [ 33.109634] worker_thread+0x2bd/0x700 [ 33.109817] ? __pfx_worker_thread+0x10/0x10 [ 33.110254] kthread+0x18b/0x1d0 [ 33.110410] ? __pfx_kthread+0x10/0x10 [ 33.110581] ret_from_fork+0x29/0x50 [ 33.110757] </TASK> [ 33.110866] irq event stamp: 1651 [ 33.111017] hardirqs last enabled at (1659): [<ffffffffa206a209>] __up_console_sem+0x79/0xa0 [ 33.111379] hardirqs last disabled at (1666): [<ffffffffa206a1ee>] __up_console_sem+0x5e/0xa0 [ 33.111740] softirqs last enabled at (1616): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0 [ 33.112094] softirqs last disabled at (1367): [<ffffffffa1f5d40e>] __irq_exit_rcu+0x9e/0xe0 [ 33.112453] ---[ end trace 0000000000000000 ]--- This is due to the nft_chain_lookup_byid ignoring the genmask. After this change, adding the new rule will fail as it will not find the chain. Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute") Cc: stable@vger.kernel.org Reported-by: Mingi Cho of Theori working with ZDI Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05netfilter: conntrack: don't fold port numbers into addresses before hashingFlorian Westphal
Originally this used jhash2() over tuple and folded the zone id, the pernet hash value, destination port and l4 protocol number into the 32bit seed value. When the switch to siphash was done, I used an on-stack temporary buffer to build a suitable key to be hashed via siphash(). But this showed up as performance regression, so I got rid of the temporary copy and collected to-be-hashed data in 4 u64 variables. This makes it easy to build tuples that produce the same hash, which isn't desirable even though chain lengths are limited. Switch back to plain siphash, but just like with jhash2(), take advantage of the fact that most of to-be-hashed data is already in a suitable order. Use an empty struct as annotation in 'struct nf_conntrack_tuple' to mark last member that can be used as hash input. The only remaining data that isn't present in the tuple structure are the zone identifier and the pernet hash: fold those into the key. Fixes: d2c806abcf0b ("netfilter: conntrack: use siphash_4u64") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05netfilter: conntrack: Avoid nf_ct_helper_hash uses after freeFlorent Revest
If nf_conntrack_init_start() fails (for example due to a register_nf_conntrack_bpf() failure), the nf_conntrack_helper_fini() clean-up path frees the nf_ct_helper_hash map. When built with NF_CONNTRACK=y, further netfilter modules (e.g: netfilter_conntrack_ftp) can still be loaded and call nf_conntrack_helpers_register(), independently of whether nf_conntrack initialized correctly. This accesses the nf_ct_helper_hash dangling pointer and causes a uaf, possibly leading to random memory corruption. This patch guards nf_conntrack_helper_register() from accessing a freed or uninitialized nf_ct_helper_hash pointer and fixes possible uses-after-free when loading a conntrack module. Cc: stable@vger.kernel.org Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Florent Revest <revest@chromium.org> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05netfilter: conntrack: gre: don't set assured flag for clash entriesFlorian Westphal
Now that conntrack core is allowd to insert clashing entries, make sure GRE won't set assured flag on NAT_CLASH entries, just like UDP. Doing so prevents early_drop logic for these entries. Fixes: d671fd82eaa9 ("netfilter: conntrack: allow insertion clash of gre protocol") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05netfilter: nf_tables: report use refcount overflowPablo Neira Ayuso
Overflow use refcount checks are not complete. Add helper function to deal with object reference counter tracking. Report -EMFILE in case UINT_MAX is reached. nft_use_dec() splats in case that reference counter underflows, which should not ever happen. Add nft_use_inc_restore() and nft_use_dec_restore() which are used to restore reference counter from error and abort paths. Use u32 in nft_flowtable and nft_object since helper functions cannot work on bitfields. Remove the few early incomplete checks now that the helper functions are in place and used to check for refcount overflow. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-07-05accel/ivpu: Clear specific interrupt status bits on C0Karol Wachowski
MTL C0 stepping fixed issue related to butrress interrupt status clearing, to clear an interrupt status it is required to write 1 to specific status bit field. This allows to execute read, modify and write routine. Writing 0 will not clear the interrupt and will cause interrupt storm. Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU") Cc: stable@vger.kernel.org # 6.3.x Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230703080725.2065635-2-stanislaw.gruszka@linux.intel.com
2023-07-05accel/ivpu: Fix VPU register access in irq disableKarol Wachowski
Incorrect REGB_WR32() macro was used to access VPUIP register. Use correct REGV_WR32(). Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU") Cc: stable@vger.kernel.org # 6.3.x Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com> Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230703080725.2065635-1-stanislaw.gruszka@linux.intel.com
2023-07-05Merge branch 'mptcp-fixes'David S. Miller
Matthieu Baerts says: ==================== mptcp: fixes for v6.5 Here is a first batch of fixes for v6.5 and older. The fixes are not linked to each others. Patch 1 ensures subflows are unhashed before cleaning the backlog to avoid races. This fixes another recent fix from v6.4. Patch 2 does not rely on implicit state check in mptcp_listen() to avoid races when receiving an MP_FASTCLOSE. A regression from v5.17. The rest fixes issues in the selftests. Patch 3 makes sure errors when setting up the environment are no longer ignored. For v5.17+. Patch 4 uses 'iptables-legacy' if available to be able to run on older kernels. A fix for v5.13 and newer. Patch 5 catches errors when issues are detected with packet marks. Also for v5.13+. Patch 6 uses the correct variable instead of an undefined one. Even if there was no visible impact, it can help to find regressions later. An issue visible in v5.19+. Patch 7 makes sure errors with some sub-tests are reported to have the selftest marked as failed as expected. Also for v5.19+. Patch 8 adds a kernel config that is required to execute MPTCP selftests. It is valid for v5.9+. Patch 9 fixes issues when validating the userspace path-manager with 32-bit arch, an issue affecting v5.19+. ==================== Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
2023-07-05selftests: mptcp: pm_nl_ctl: fix 32-bit supportMatthieu Baerts
When using pm_nl_ctl to validate userspace path-manager's behaviours, it was failing on 32-bit architectures ~half of the time. pm_nl_ctl was not reporting any error but the command was not doing what it was expected to do. As a result, the expected linked event was not triggered after and the test failed. This is due to the fact the token given in argument to the application was parsed as an integer with atoi(): in a 32-bit arch, if the number was bigger than INT_MAX, 2147483647 was used instead. This can simply be fixed by using strtoul() instead of atoi(). The errors have been seen "by chance" when manually looking at the results from LKFT. Fixes: 9a0b36509df0 ("selftests: mptcp: support MPTCP_PM_CMD_ANNOUNCE") Cc: stable@vger.kernel.org Fixes: ecd2a77d672f ("selftests: mptcp: support MPTCP_PM_CMD_REMOVE") Fixes: cf8d0a6dfd64 ("selftests: mptcp: support MPTCP_PM_CMD_SUBFLOW_CREATE") Fixes: 57cc361b8d38 ("selftests: mptcp: support MPTCP_PM_CMD_SUBFLOW_DESTROY") Fixes: ca188a25d43f ("selftests: mptcp: userspace PM support for MP_PRIO signals") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-05selftests: mptcp: depend on SYN_COOKIESMatthieu Baerts
MPTCP selftests are using TCP SYN Cookies for quite a while now, since v5.9. Some CIs don't have this config option enabled and this is causing issues in the tests: # ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 167ms) sysctl: cannot stat /proc/sys/net/ipv4/tcp_syncookies: No such file or directory # [ OK ]./mptcp_connect.sh: line 554: [: -eq: unary operator expected There is no impact in the results but the test is not doing what it is supposed to do. Fixes: fed61c4b584c ("selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-05selftests: mptcp: userspace_pm: report errors with 'remove' testsMatthieu Baerts
A message was mentioning an issue with the "remove" tests but the selftest was not marked as failed. Directly exit with an error like it is done everywhere else in this selftest. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: 259a834fadda ("selftests: mptcp: functional tests for the userspace PM type") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-05selftests: mptcp: userspace_pm: use correct server portMatthieu Baerts
"server4_port" variable is not set but "app4_port" is the server port in v4 and the correct variable name to use. The port is optional so there was no visible impact. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: ca188a25d43f ("selftests: mptcp: userspace PM support for MP_PRIO signals") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-05selftests: mptcp: sockopt: return error if wrong markMatthieu Baerts
When an error was detected when checking the marks, a message was correctly printed mentioning the error but followed by another one saying everything was OK and the selftest was not marked as failed as expected. Now the 'ret' variable is directly set to 1 in order to make sure the exit is done with an error, similar to what is done in other functions. While at it, the error is correctly propagated to the caller. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Fixes: dc65fe82fb07 ("selftests: mptcp: add packet mark test case") Cc: stable@vger.kernel.org Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>