Age | Commit message (Collapse) | Author |
|
XFRM state which is changed to be XFRM_STATE_EXPIRED doesn't really
need to hold lock while modifying flow steering rules to drop traffic.
That state can be deleted only and as such mlx5e_ipsec_handle_tx_limit()
work will be canceled anyway and won't run in parallel.
Fixes: b2f7b01d36a9 ("net/mlx5e: Simulate missing IPsec TX limits hardware functionality")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Previously during mlx5e_ipsec_handle_event the driver tried to execute
an operation that could sleep, while holding a spinlock, which caused
the kernel panic mentioned below.
Move the function call that can sleep outside of the spinlock context.
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x6c
__schedule_bug.cold+0x42/0x4e
schedule_debug.constprop.0+0xe0/0x118
__schedule+0x59/0x58a
? __mod_timer+0x2a1/0x3ef
schedule+0x5e/0xd4
schedule_timeout+0x99/0x164
? __pfx_process_timeout+0x10/0x10
__wait_for_common+0x90/0x1da
? __pfx_schedule_timeout+0x10/0x10
wait_func+0x34/0x142 [mlx5_core]
mlx5_cmd_invoke+0x1f3/0x313 [mlx5_core]
cmd_exec+0x1fe/0x325 [mlx5_core]
mlx5_cmd_do+0x22/0x50 [mlx5_core]
mlx5_cmd_exec+0x1c/0x40 [mlx5_core]
mlx5_modify_ipsec_obj+0xb2/0x17f [mlx5_core]
mlx5e_ipsec_update_esn_state+0x69/0xf0 [mlx5_core]
? wake_affine+0x62/0x1f8
mlx5e_ipsec_handle_event+0xb1/0xc0 [mlx5_core]
process_one_work+0x1e2/0x3e6
? __pfx_worker_thread+0x10/0x10
worker_thread+0x54/0x3ad
? __pfx_worker_thread+0x10/0x10
kthread+0xda/0x101
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x37
</TASK>
BUG: workqueue leaked lock or atomic: kworker/u256:4/0x7fffffff/189754#012 last function: mlx5e_ipsec_handle_event [mlx5_core]
CPU: 66 PID: 189754 Comm: kworker/u256:4 Kdump: loaded Tainted: G W 6.2.0-2596.20230309201517_5.el8uek.rc1.x86_64 #2
Hardware name: Oracle Corporation ORACLE SERVER X9-2/ASMMBX9-2, BIOS 61070300 08/17/2022
Workqueue: mlx5e_ipsec: eth%d mlx5e_ipsec_handle_event [mlx5_core]
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x6c
process_one_work.cold+0x2b/0x3c
? __pfx_worker_thread+0x10/0x10
worker_thread+0x54/0x3ad
? __pfx_worker_thread+0x10/0x10
kthread+0xda/0x101
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x37
</TASK>
BUG: scheduling while atomic: kworker/u256:4/189754/0x00000000
Fixes: cee137a63431 ("net/mlx5e: Handle ESN update events")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
XFRM core provides two callbacks to release resources, one is .xdo_dev_policy_delete()
and another is .xdo_dev_policy_free(). This separation allows delayed release so
"ip xfrm policy free" commands won't starve. Unfortunately, mlx5 command interface
can't run in .xdo_dev_policy_free() callbacks as the latter runs in ATOMIC context.
BUG: scheduling while atomic: swapper/7/0/0x00000100
Modules linked in: act_mirred act_tunnel_key cls_flower sch_ingress vxlan mlx5_vdpa vringh vhost_iotlb vdpa rpcrdma rdma_ucm ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.3.0+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x33/0x50
__schedule_bug+0x4e/0x60
__schedule+0x5d5/0x780
? __mod_timer+0x286/0x3d0
schedule+0x50/0x90
schedule_timeout+0x7c/0xf0
? __bpf_trace_tick_stop+0x10/0x10
__wait_for_common+0x88/0x190
? usleep_range_state+0x90/0x90
cmd_exec+0x42e/0xb40 [mlx5_core]
mlx5_cmd_do+0x1e/0x40 [mlx5_core]
mlx5_cmd_exec+0x18/0x30 [mlx5_core]
mlx5_cmd_delete_fte+0xa8/0xd0 [mlx5_core]
del_hw_fte+0x60/0x120 [mlx5_core]
mlx5_del_flow_rules+0xec/0x270 [mlx5_core]
? default_send_IPI_single_phys+0x26/0x30
mlx5e_accel_ipsec_fs_del_pol+0x1a/0x60 [mlx5_core]
mlx5e_xfrm_free_policy+0x15/0x20 [mlx5_core]
xfrm_policy_destroy+0x5a/0xb0
xfrm4_dst_destroy+0x7b/0x100
dst_destroy+0x37/0x120
rcu_core+0x2d6/0x540
__do_softirq+0xcd/0x273
irq_exit_rcu+0x82/0xb0
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:default_idle+0x13/0x20
Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 7a 4d ee 00 85 c0 7e 07 0f 00 2d 2f 98 2e 00 fb f4 <fa> c3 66 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 40 b4 02 00
RSP: 0018:ffff888100843ee0 EFLAGS: 00000242
RAX: 0000000000000001 RBX: ffff888100812b00 RCX: 4000000000000000
RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000002d2ec
RBP: 0000000000000007 R08: 00000021daeded59 R09: 0000000000000001
R10: 0000000000000000 R11: 000000000000000f R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
default_idle_call+0x30/0xb0
do_idle+0x1c1/0x1d0
cpu_startup_entry+0x19/0x20
start_secondary+0xfe/0x120
secondary_startup_64_no_verify+0xf3/0xfb
</TASK>
bad: scheduling from the idle thread!
Fixes: a5b8ca9471d3 ("net/mlx5e: Add XFRM policy offload logic")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The kernel IRQ system needs the irq affinity notifier to be clear
before attempting to free the irq, see WARN_ON log below.
On a normal driver unload we don't have this issue since we do the
complete cleanup of the irq resources.
To fix this, put the important resources cleanup in a helper function
and use it in both normal driver unload and shutdown flows.
[ 4497.498434] ------------[ cut here ]------------
[ 4497.498726] WARNING: CPU: 0 PID: 9 at kernel/irq/manage.c:2034 free_irq+0x295/0x340
[ 4497.499193] Modules linked in:
[ 4497.499386] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.4.0-rc4+ #10
[ 4497.499876] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 4497.500518] Workqueue: events do_poweroff
[ 4497.500849] RIP: 0010:free_irq+0x295/0x340
[ 4497.501132] Code: 85 c0 0f 84 1d ff ff ff 48 89 ef ff d0 0f 1f 00 e9 10 ff ff ff 0f 0b e9 72 ff ff ff 49 8d 7f 28 ff d0 0f 1f 00 e9 df fd ff ff <0f> 0b 48 c7 80 c0 008
[ 4497.502269] RSP: 0018:ffffc90000053da0 EFLAGS: 00010282
[ 4497.502589] RAX: ffff888100949600 RBX: ffff88810330b948 RCX: 0000000000000000
[ 4497.503035] RDX: ffff888100949600 RSI: ffff888100400490 RDI: 0000000000000023
[ 4497.503472] RBP: ffff88810330c7e0 R08: ffff8881004005d0 R09: ffffffff8273a260
[ 4497.503923] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881009ae000
[ 4497.504359] R13: ffff8881009ae148 R14: 0000000000000000 R15: ffff888100949600
[ 4497.504804] FS: 0000000000000000(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000
[ 4497.505302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4497.505671] CR2: 00007fce98806298 CR3: 000000000262e005 CR4: 0000000000370ef0
[ 4497.506104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4497.506540] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4497.507002] Call Trace:
[ 4497.507158] <TASK>
[ 4497.507299] ? free_irq+0x295/0x340
[ 4497.507522] ? __warn+0x7c/0x130
[ 4497.507740] ? free_irq+0x295/0x340
[ 4497.507963] ? report_bug+0x171/0x1a0
[ 4497.508197] ? handle_bug+0x3c/0x70
[ 4497.508417] ? exc_invalid_op+0x17/0x70
[ 4497.508662] ? asm_exc_invalid_op+0x1a/0x20
[ 4497.508926] ? free_irq+0x295/0x340
[ 4497.509146] mlx5_irq_pool_free_irqs+0x48/0x90
[ 4497.509421] mlx5_irq_table_free_irqs+0x38/0x50
[ 4497.509714] mlx5_core_eq_free_irqs+0x27/0x40
[ 4497.509984] shutdown+0x7b/0x100
[ 4497.510184] pci_device_shutdown+0x30/0x60
[ 4497.510440] device_shutdown+0x14d/0x240
[ 4497.510698] kernel_power_off+0x30/0x70
[ 4497.510938] process_one_work+0x1e6/0x3e0
[ 4497.511183] worker_thread+0x49/0x3b0
[ 4497.511407] ? __pfx_worker_thread+0x10/0x10
[ 4497.511679] kthread+0xe0/0x110
[ 4497.511879] ? __pfx_kthread+0x10/0x10
[ 4497.512114] ret_from_fork+0x29/0x50
[ 4497.512342] </TASK>
Fixes: 9c2d08010963 ("net/mlx5: Free irqs only on shutdown callback")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
|
|
When TUNNEL_L3_TO_L2 decap action was created, a pointer to a local
variable was passed as its HW action data, resulting in attempt to
free invalid address:
BUG: KASAN: invalid-free in mlx5dr_action_destroy+0x318/0x410 [mlx5_core]
Fixes: 4781df92f4da ("net/mlx5: DR, Move STEv0 modify header logic")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In some cases, steering might need to use SW-created action in
FW table, which results in wrong packet reformat being used:
mlx5_core 0000:81:00.1: mlx5_cmd_check:756:(pid 1154):
SET_FLOW_TABLE_ENTRY(0×936) op_mod(0×0) failed,
status bad resource(0×5), syndrome (0xf2ff71)
This patch adds support for usage of SW-created packet reformat (encap)
actions in FW tables, and adds clear error flow for attempt to use
SW-created modify header on FW tables.
Fixes: 6a48faeeca10 ("net/mlx5: Add direct rule fs_cmd implementation")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit removes special handling of CT action. But it
removes too much. Pre ct/ct_nat tables and some other resources
are not destroyed due to the cited commit.
Fix it by adding it back.
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commits add hardware miss support to tc action. But if
the rules can't be offloaded, the pointers are null and system
will panic when accessing them.
Fix it by checking null pointer.
Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action")
Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When a PCI device has just one msix vector available, we want to share
this vector between async and completion events. Current code fails to
do that assuming it will always have at least one dedicated vector for
completion events. Fix this by detecting when the pool contains just a
single vector.
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commit missed setting napi_id on XSK RQs, it only affected
regular RQs. Add the missing part to support socket busy polling on XSK
RQs.
Fixes: a2740f529da2 ("net/mlx5e: xsk: Set napi_id to support busy polling")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The cited commits missed passing frag_size to __xdp_rxq_info_reg, which
is required by bpf_xdp_adjust_tail to support growing the tail pointer
in fragmented packets. Pass the missing parameter when the current RQ
mode allows XDP multi buffer.
Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Fixes: 9cb9482ef10e ("net/mlx5e: Use fragments of the same size in non-linear legacy RQ with XDP")
Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two fixes for NOCOW files, a regression fix in scrub and an assertion
fix:
- NOCOW fixes:
- keep length of iomap direct io request in case of a failure
- properly pass mode of extent reference checking, this can break
some cases for swapfile
- fix error value confusion when scrubbing a stripe
- convert assertion to a proper error handling when loading global
roots, reported by syzbot"
* tag 'for-6.4-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: fix a return value overwrite in scrub_stripe()
btrfs: do not ASSERT() on duplicated global roots
btrfs: can_nocow_file_extent should pass down args->strict from callers
btrfs: fix iomap_begin length for nocow writes
|
|
Pull block fix from Jens Axboe:
"Just a single fix for blk-cg stats flushing"
* tag 'block-6.4-2023-06-15' of git://git.kernel.dk/linux:
blk-cgroup: Flush stats before releasing blkcg_gq
|
|
Pull io_uring fixes from Jens Axboe:
"A fix for sendmsg with CMSG, and the followup fix discussed for
avoiding touching task->worker_private after the worker has started
exiting"
* tag 'io_uring-6.4-2023-06-15' of git://git.kernel.dk/linux:
io_uring/io-wq: clear current->worker_private on exit
io_uring/net: save msghdr->msg_control for retries
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few small fixes. The only change to the core code is for a
minor race in ALSA OSS sequencer, and the rest are all device-specific
fixes (regression fixes and a usual quirk)"
* tag 'sound-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add quirk flag for HEM devices to enable native DSD playback
ALSA: usb-audio: Fix broken resume due to UAC3 power state
ALSA: seq: oss: Fix racy open/close of MIDI devices
ASoC: tegra: Fix Master Volume Control
ALSA: hda/realtek: Add a quirk for Compaq N14JP6
firmware: cs_dsp: Log correct region name in bin error messages
|
|
"ecpu" field in struct mlx5_sf_table is not used anywhere. Remove it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Separate the event API defined in the generic mlx5.h header into
a dedicated header. And remove the TODO comment in commit
69c1280b1f3b ("net/mlx5: Device events, Use async events chain").
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
This change is needed to use EC VFs with metadata based steering.
There was an assumption that vport was equal to function ID. That's
not the case for EC VF functions. Adjust to function ID and set the
ec_vf_function bit accordingly.
Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The last value is not set correctly. This results in representors not
being created for all EC VFs when the base value is higher than 0.
Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add counter for number of unicast, multicast and broadcast packets/
octets that were loopback.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Add needed HW bits for querying local loopback counter and the
HCA capability for it.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The msglvl support was implemented using the mlx5e_dbg() macro which is
rarely used in the driver, and is not very useful when you can just use
dynamic debug instead.
Remove mlx5e_dbg() and convert its usages to netdev_dbg().
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
These else statement blocks are redundant since the if block already
jumps to the function abort label.
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
|
|
For debugging purposes expose offloaded FDB state (flags, counters, etc.)
via debugfs inside 'esw' root directory. Example debugfs file output:
$ cat mlx5/0000\:08\:00.0/esw/bridge/bridge1/fdb
DEV MAC VLAN PACKETS BYTES LASTUSE FLAGS
enp8s0f0_1 e4:0a:05:08:00:06 2 2 204 4295567112 0x0
enp8s0f0_0 e4:0a:05:08:00:03 2 3 278 4295567112 0x0
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Following patch requires access to additional data in bridge net_device.
Pass the whole structure down the stack instead of adding necessary fields
as function arguments one-by-one.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Following patch in series uses the new directory for bridge FDB debugfs.
The new directory is intended for all future eswitch-specific debugfs
files.
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Added a new event handler to firmware sync reset, which is used to
support firmware sync reset flow on smart NIC. Adding this new stage to
the flow enables the firmware to ensure host PFs unload before ECPFs
unload, to avoid race of PFs recovery.
If firmware sends sync_reset_unload event to driver the driver should
unload and close all HW resources of the function. Once the driver
finishes unloading part, it can't get any more events from firmware as
event queues are closed, so it polls the reset state field to know when
to continue to next stage of the sync reset flow.
Added capability bit for supporting sync_reset_unload event.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The Default Timeout Register (DTOR) provides timeout values to driver
for flows that are device dependent. Zero value for DTOR entry is not
valid and should not be used. In case of reading zero value from DTOR,
the driver should use the hard coded SW default value instead.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Expose new timoueout in Default Timeouts Register to be used on sync
reset flow running on smart NIC. In this flow the driver should know how
much time to wait from getting unload request till firmware will ask the
PF to continue to next stage of the flow.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Verify at reset_request stage that PF is capable to do reset_now. In
case PF is not capable, notify the firmware that the sync reset can not
happen and so firmware will abort the sync reset at early stage and will
not send reset_now event to any PF.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The tick period is aligned very early while the first clock_event_device is
registered. At that point the system runs in periodic mode and switches
later to one-shot mode if possible.
The next wake-up event is programmed based on the aligned value
(tick_next_period) but the delta value, that is used to program the
clock_event_device, is computed based on ktime_get().
With the subtracted offset, the device fires earlier than the exact time
frame. With a large enough offset the system programs the timer for the
next wake-up and the remaining time left is too small to make any boot
progress. The system hangs.
Move the alignment later to the setup of tick_sched timer. At this point
the system switches to oneshot mode and a high resolution clocksource is
available. At this point it is safe to align tick_next_period because
ktime_get() will now return accurate (not jiffies based) time.
[bigeasy: Patch description + testing].
Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
Reported-by: Mathias Krause <minipli@grsecurity.net>
Reported-by: "Bhatnagar, Rishabh" <risbhat@amazon.com>
Suggested-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/5a56290d-806e-b9a5-f37c-f21958b5a8c0@grsecurity.net
Link: https://lore.kernel.org/12c6f9a3-d087-b824-0d05-0d18c9bc1bf3@amazon.com
Link: https://lore.kernel.org/r/20230615091830.RxMV2xf_@linutronix.de
|
|
Splicing to SOCK_RAW sockets may set MSG_SPLICE_PAGES, but in such a case,
__ip_append_data() will call skb_splice_from_iter() to access the 'from'
data, assuming it to point to a msghdr struct with an iter, instead of
using the provided getfrag function to access it.
In the case of raw_sendmsg(), however, this is not the case and 'from' will
point to a raw_frag_vec struct and raw_getfrag() will be the frag-getting
function. A similar issue may occur with rawv6_sendmsg().
Fix this by ignoring MSG_SPLICE_PAGES if getfrag != ip_generic_getfrag as
ip_generic_getfrag() expects "from" to be a msghdr*, but the other getfrags
don't. Note that this will prevent MSG_SPLICE_PAGES from being effective
for udplite.
This likely affects ping sockets too. udplite looks like it should be okay
as it expects "from" to be a msghdr.
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000ae4cbf05fdeb8349@google.com/
Fixes: 2dc334f1a63a ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()")
Tested-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1410156.1686729856@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This fixes a spinlock-initialization regression in SRCU that causes
the SRCU notifier to fail.
The fix simply adds the initialization, but introduces a #ifdef
because there is no spinlock to initialize for the Tiny SRCU used in
!SMP builds.
Yes, it would be nice to abstract this somehow in order to hide it in
SRCU, but I still don't see a good way of doing this"
* tag 'urgent-rcu.2023.06.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
notifier: Initialize new struct srcu_usage field
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
- A documentation patch describing how we use patchwork
* tag 'riscv-for-linus-6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
Documentation: RISC-V: patch-acceptance: mention patchwork's role
|
|
With offloading enabled, esp_xmit() gets invoked very late, from within
validate_xmit_xfrm() which is after validate_xmit_skb() validates and
linearizes the skb if the underlying device does not support fragments.
esp_output_tail() may add a fragment to the skb while adding the auth
tag/ IV. Devices without the proper support will then send skb->data
points to with the correct length so the packet will have garbage at the
end. A pcap sniffer will claim that the proper data has been sent since
it parses the skb properly.
It is not affected with INET_ESP_OFFLOAD disabled.
Linearize the skb after offloading if the sending hardware requires it.
It was tested on v4, v6 has been adopted.
Fixes: 7785bba299a8d ("esp: Add a software GRO codepath")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Functions efx_tc_netdev_event and efx_tc_netevent_event do not exist
in that case as object files tc_bindings.o and tc_encap_actions.o
are not built, so the calls to them from ef100_netdev_event and
ef100_netevent_event cause link errors.
Wrap the corresponding header files (tc_bindings.h, tc_encap_actions.h)
with #if IS_ENABLED(CONFIG_SFC_SRIOV), and add an #else with static
inline stubs for these two functions.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306102026.ISK5JfUQ-lkp@intel.com/
Fixes: 7e5e7d800011 ("sfc: neighbour lookup for TC encap action offload")
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_ETHERNET=m or CONFIG_FDDI=m, lcs.s has build errors or
warnings:
../drivers/s390/net/lcs.c:40:2: error: #error Cannot compile lcs.c without some net devices switched on.
40 | #error Cannot compile lcs.c without some net devices switched on.
../drivers/s390/net/lcs.c: In function 'lcs_startlan_auto':
../drivers/s390/net/lcs.c:1601:13: warning: unused variable 'rc' [-Wunused-variable]
1601 | int rc;
Solve this by using IS_ENABLED(CONFIG_symbol) instead of ifdef
CONFIG_symbol. The latter only works for builtin (=y) values
while IS_ENABLED() works for builtin or modular values.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexandra Winter <wintera@linux.ibm.com>
Cc: Wenjia Zhang <wenjia@linux.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.4
A couple more fixes for v6.4, one fixing a misleading error log and
another stopping us seeing spurious failures setting the master volume
on some Tegra systems introduced by a change to how we calculate delay
times.
|
|
This commit adds new DEVICE_FLG with QUIRK_FLAG_DSD_RAW and Vendor Id for
HEM devices which supports native DSD. Prior to this change Linux kernel
was not enabling native DSD playback for HEM devices, and as a result,
DSD audio was being converted to PCM "on the fly". HEM devices,
when connected to the system, would only play audio in PCM format,
even if the source material was in DSD format. With the addition of new
VENDOR_FLG in the quircks.c file, the devices are now correctly
recognized, and raw DSD data is transmitted to the device,
allowing for native DSD playback.
Signed-off-by: Lukasz Tyl <ltyl@hem-e.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20230614122524.30271-1-ltyl@hem-e.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
As reported in the bugzilla below, the PM resume of a UAC3 device may
fail due to the incomplete power state change, stuck at D1. The
reason is that the driver expects the full D0 power state change only
at hw_params, while the normal PCM resume procedure doesn't call
hw_params.
For fixing the bug, we add the same power state update to D0 at the
prepare callback, which is certainly called by the resume procedure.
Note that, with this change, the power state change in the hw_params
becomes almost redundant, since snd_usb_hw_params() doesn't touch the
parameters (at least it tires so). But dropping it is still a bit
risky (e.g. we have the media-driver binding), so I leave the D0 power
state change in snd_usb_hw_params() as is for now.
Fixes: a0a4959eb4e9 ("ALSA: usb-audio: Operate UAC3 Power Domains in PCM callbacks")
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217539
Link: https://lore.kernel.org/r/20230612132818.29486-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Although snd_seq_oss_midi_open() and snd_seq_oss_midi_close() can be
called concurrently from different code paths, we have no proper data
protection against races. Introduce open_mutex to each seq_oss_midi
object for avoiding the races.
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Closes: https://lore.kernel.org/r/7DC9AF71-F481-4ABA-955F-76C535661E33@purdue.edu
Link: https://lore.kernel.org/r/20230612125533.27461-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Implement 64 bit per cpu stats to fix the overflow of netdev->stats
on 32 bit platforms. To simplify the code, we use net core
pcpu_sw_netstats infrastructure. One small drawback is some memory
overhead because litex uses just one queue, but we allocate the
counters per cpu.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Gabriel Somlo <gsomlo@gmail.com>
Link: https://lore.kernel.org/r/20230614162035.300-1-jszhang@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Piotr Gardocki says:
====================
optimize procedure of changing MAC address on interface
The first patch adds an if statement in core to skip early when
the MAC address is not being changes.
The remaining patches remove such checks from Intel drivers
as they're redundant at this point.
====================
Link: https://lore.kernel.org/r/20230614145302.902301-1-piotrx.gardocki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The check has been moved to core. The ndo_set_mac_address callback
is not being called with new MAC address equal to the old one anymore.
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The check has been moved to core. The ndo_set_mac_address callback
is not being called with new MAC address equal to the old one anymore.
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In some cases it is possible for kernel to come with request
to change primary MAC address to the address that is already
set on the given interface.
Add proper check to return fast from the function in these cases.
An example of such case is adding an interface to bonding
channel in balance-alb mode:
modprobe bonding mode=balance-alb miimon=100 max_bonds=1
ip link set bond0 up
ifenslave bond0 <eth>
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Randy reported that linux-next build warns on PowerPC:
drivers/net/ethernet/freescale/fs_enet/mii-fec.c: In function 'fs_enet_mdio_probe':
drivers/net/ethernet/freescale/fs_enet/mii-fec.c:130:50: warning: format '%x' expects argument of type 'unsigned int', but argument 4 has type 'resource_size_t' {aka 'long long unsigned int'} [-Wformat=]
130 | snprintf(new_bus->id, MII_BUS_ID_SIZE, "%x", res.start);
| ~^ ~~~~~~~~~
| | |
| | resource_size_t {aka long long unsigned int}
| unsigned int
| %llx
Use the right print format.
Link: https://lore.kernel.org/all/8f9f8d38-d9c7-9f1b-feb0-103d76902d14@infradead.org/
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20230615035231.2184880-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
splice_to_socket() assumes that a pipe_buffer won't hold more than a single
page of data - but this assumption can be violated by skb_splice_bits()
when it splices from a socket into a pipe.
The problem is that splice_to_socket() doesn't advance the pipe_buffer
length and offset when transcribing from the pipe buf into a bio_vec, so if
the buf is >PAGE_SIZE, it keeps repeating the same initial chunk and
doesn't advance the tail index. It then subtracts this from "remain" and
overcounts the amount of data to be sent.
The cleanup phase then tries to overclean the pipe, hits an unused pipe buf
and a NULL-pointer dereference occurs.
Fix this by not restricting the bio_vec size to PAGE_SIZE and instead
transcribing the entirety of each pipe_buffer into a single bio_vec and
advancing the tail index if remain hasn't hit zero yet.
Large bio_vecs will then be split up by iterator functions such as
iov_iter_extract_pages().
This resulted in a KASAN report looking like:
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
...
RIP: 0010:pipe_buf_release include/linux/pipe_fs_i.h:203 [inline]
RIP: 0010:splice_to_socket+0xa91/0xe30 fs/splice.c:933
Fixes: 2dc334f1a63a ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()")
Reported-by: syzbot+f9e28a23426ac3b24f20@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/0000000000000900e905fdeb8e39@google.com/
Tested-by: syzbot+f9e28a23426ac3b24f20@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christian Brauner <brauner@kernel.org>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/1428985.1686737388@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After merging the net-next tree, today's linux-next build (sparc64
defconfig) failed like this:
drivers/net/ethernet/sun/sunvnet_common.c: In function 'vnet_handle_offloads':
drivers/net/ethernet/sun/sunvnet_common.c:1277:16: error: implicit declaration of function 'skb_gso_segment'; did you mean 'skb_gso_reset'? [-Werror=implicit-function-declaration]
1277 | segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
| ^~~~~~~~~~~~~~~
| skb_gso_reset
drivers/net/ethernet/sun/sunvnet_common.c:1277:14: warning: assignment to 'struct sk_buff *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
1277 | segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
| ^
Fixes: d457a0e329b0 ("net: move gso declarations and functions to their own files")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230613164639.164b2991@canb.auug.org.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current implementation allocates page-sized rx buffers.
As traffic may consist of different types and sizes of packets,
in various cases, buffers are not fully used.
This change (Dynamic RX Buffers - DRB) uses part of the allocated rx
page needed for the incoming packet, and returns the rest of the
unused page to be used again as an rx buffer for future packets.
A threshold of 2K for unused space has been set in order to declare
whether the remainder of the page can be reused again as an rx buffer.
As a page may be reused, dma_sync_single_for_cpu() is added in order
to sync the memory to the CPU side after it was owned by the HW.
In addition, when the rx page can no longer be reused, it is being
unmapped using dma_page_unmap(), which implicitly syncs and then
unmaps the entire page. In case the kernel still handles the skbs
pointing to the previous buffers from that rx page, it may access
garbage pointers, caused by the implicit sync overwriting them.
The implicit dma sync is removed by replacing dma_page_unmap() with
dma_unmap_page_attrs() with DMA_ATTR_SKIP_CPU_SYNC flag.
The functionality is disabled for XDP traffic to avoid handling
several descriptors per packet.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Link: https://lore.kernel.org/r/20230612121448.28829-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|