Age | Commit message (Collapse) | Author |
|
Build checks have pointed out that 'hb' can theoretically
be used before set, so let's initialize it and get rid
of the compiler complaint.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make sure the NIC drops packets that are larger than the
specified MTU.
The front end of the NIC will accept packets larger than MTU and
will copy all the data it can to fill up the driver's posted
buffers - if the buffers are not long enough the packet will
then get dropped. With the Rx SG buffers allocagted as full
pages, we are currently setting up more space than MTU size
available and end up receiving some packets that are larger
than MTU, up to the size of buffers posted. To be sure the
NIC doesn't waste our time with oversized packets we need to
lie a little in the SG descriptor about how long is the last
SG element.
At dealloc time, we know the allocation was a page, so the
deallocation doesn't care about what length we put in the
descriptor.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a counter for packets dropped by the driver, typically
for bad size or a receive error seen by the device.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The subdevice concept is not being used in the driver, so
drop the references to it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until now it was possible to pass a packet to a single destination such
as vport or flow table. With the new support if multiple vports or multiple
tables are provided as destinations, fs_dr will create a multiple
destination table action, this action should replace other destination
actions provided to mlx5dr_create_rule.
Each vport destination can be provided with a reformat actions which
will be done before forwarding the packet to the vport.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
A multiple destination table action allows HW packet duplication
to multiple destinations, this is useful for multicast or mirroring
traffic for debug. Duplicating is done using a FW flow table with
multiple destinations.
The new action creation function, mlx5dr_action_create_mult_dest_tbl
will allow creating a single table to iterate over several dr actions.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Function prefix was changed to be similar to other action APIs.
In order to support other FW tables the mlx5_flow_table struct was
replaced with table id and type.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
We need to have the flow-table flags when creation sw-steering tables,
this parameter exists in the layer between fs_core to sw_steering, this
patch gives it to the creation function.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently SW steering doesn't have the means to access HW iterators to
support multi-destination (FTEs) flow table entries.
In order to support multi-destination FTEs for port-mirroring, SW
steering will create a dedicated multi-destination FW managed flow table
and FTEs via direct FW commands that we introduced in the previous patch.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Implement the FW command to setup a FTE (Flow Table Entry) into the FW
managed flow tables.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Instead of using multiple variables use a simple struct. The
number of passed argument was too high after adding the encap
decap support bits arguments used for multiple destination reformat.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use helper routines to setup and teardown multiple EQs and reuse the
code in setup, cleanup and error unwinding flows.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In below sequence, a EQE entry arrives for a CQ which is on the path of
being destroyed.
cpu-0 cpu-1
------ -----
mlx5_core_destroy_cq() mlx5_eq_comp_int()
mlx5_eq_del_cq() [..]
radix_tree_delete() [..]
[..] mlx5_eq_cq_get() /* Didn't find CQ is
* a valid case.
*/
/* destroy CQ in hw */
mlx5_cmd_exec()
This is still a valid scenario and correct delete CQ sequence, as
mirror of the CQ create sequence.
Hence, suppress the non harmful debug message from warn to debug level.
Keep the debug log message rate limited because user application can
trigger it repeatedly.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently the max number of channels is limited to 64, which is half of
the indirection table size to allow some flexibility. But on servers
with more than 64 cores, users may want to utilize more queues.
This patch increases the advertised max number of channels to 128 by
changing the ratio between channels and indirection table slots to 1:1.
At the same time, the driver still enable no more than 64 channels at
loading. Users can change it by ethtool afterwards.
Signed-off-by: Fan Li <fanl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In one case, we may forward packets from one vport
to others, but only one packets flow will be accepted,
which destination ip was assign to VF.
+-----+ +-----+ +-----+
| VFn | | VF1 | | VF0 | accept
+--+--+ +--+--+ hairpin +--^--+
| | <--------------- |
| | |
+--+-----------v-+ +--+-------------+
| eswitch PF1 | | eswitch PF0 |
+----------------+ +----------------+
tc filter add dev $PF0 protocol all parent ffff: prio 1 handle 1 \
flower skip_sw action mirred egress redirect dev $VF0_REP
tc filter add dev $VF0 protocol ip parent ffff: prio 1 handle 1 \
flower skip_sw dst_ip $VF0_IP action pass
tc filter add dev $VF0 protocol all parent ffff: prio 2 handle 2 \
flower skip_sw action mirred egress redirect dev $VF1
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In some configurations, gcc tries too hard to optimize this code:
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c: In function 'mlx5e_grp_sw_update_stats':
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:302:1: error: the frame size of 1336 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
As was stated in the bug report, the reason is that gcc runs into a corner
case in the register allocator that is rather hard to fix in a good way.
As there is an easy way to work around it, just add a comment and the
barrier that stops gcc from trying to overoptimize the function.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92657
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
The function mlx5_buf_alloc_node is only used by the function in the
local scope. So it is appropriate to limit this function in the local
scope.
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2020-01-06
This series contains updates to igc to add basic support for
timestamping.
Vinicius adds basic support for timestamping and enables ptp4l/phc2sys
to work with i225 devices. Initially, adds the ability to read and
adjust the PHC clock. Patches 2 & 3 enable and retrieve hardware
timestamps. Patch 4 implements the ethtool ioctl that ptp4l uses to
check what timestamping methods are supported. Lastly, added support to
do timestamping using the "Start of Packet" signal from the PHY, which
is now supported in i225 devices.
While i225 does support multiple PTP domains, with multiple timestamping
registers, we currently only support one PTP domain and use only one of
the timestamping registers for implementation purposes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Andrew Lunn says:
====================
Unique mv88e6xxx IRQ names
There are a few boards which have multiple mv88e6xxx switches. With
such boards, it can be hard to determine which interrupts belong to
which switches. Make the interrupt names unique by including the
device name in the interrupt name. For the SERDES interrupt, also
include the port number. As a result of these patches ZII devel C
looks like:
50: 0 gpio-vf610 27 Level mv88e6xxx-0.1:00
54: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-0.1:00-g1-atu-prob
56: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-0.1:00-g1-vtu-prob
58: 0 mv88e6xxx-g1 7 Edge mv88e6xxx-0.1:00-g2
61: 0 mv88e6xxx-g2 1 Edge !mdio-mux!mdio@1!switch@0!mdio:01
62: 0 mv88e6xxx-g2 2 Edge !mdio-mux!mdio@1!switch@0!mdio:02
63: 0 mv88e6xxx-g2 3 Edge !mdio-mux!mdio@1!switch@0!mdio:03
64: 0 mv88e6xxx-g2 4 Edge !mdio-mux!mdio@1!switch@0!mdio:04
70: 0 mv88e6xxx-g2 10 Edge mv88e6xxx-0.1:00-serdes-10
75: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-0.1:00-watchdog
76: 5 gpio-vf610 26 Level mv88e6xxx-0.2:00
80: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-0.2:00-g1-atu-prob
82: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-0.2:00-g1-vtu-prob
84: 4 mv88e6xxx-g1 7 Edge mv88e6xxx-0.2:00-g2
87: 2 mv88e6xxx-g2 1 Edge !mdio-mux!mdio@2!switch@0!mdio:01
88: 0 mv88e6xxx-g2 2 Edge !mdio-mux!mdio@2!switch@0!mdio:02
89: 0 mv88e6xxx-g2 3 Edge !mdio-mux!mdio@2!switch@0!mdio:03
90: 0 mv88e6xxx-g2 4 Edge !mdio-mux!mdio@2!switch@0!mdio:04
95: 3 mv88e6xxx-g2 9 Edge mv88e6xxx-0.2:00-serdes-9
96: 0 mv88e6xxx-g2 10 Edge mv88e6xxx-0.2:00-serdes-10
101: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-0.2:00-watchdog
Interrupt names like !mdio-mux!mdio@2!switch@0!mdio:01 are created by
phylib for the integrated PHYs. The mv88e6xxx driver does not
determine these names.
====================
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dynamically generate a unique interrupt name for the VTU and ATU,
based on the device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dynamically generate a unique g2 interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dynamically generate a unique watchdog interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dynamically generate a unique SERDES interrupt name, based on the
device name and the port the SERDES is for. For example:
95: 3 mv88e6xxx-g2 9 Edge mv88e6xxx-0.2:00-serdes-9
96: 0 mv88e6xxx-g2 10 Edge mv88e6xxx-0.2:00-serdes-10
The 0.2:00 indicates the switch and -9 indicates port 9.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Dynamically generate a unique switch interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2020-01-06
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v5.3
('net/mlx5: Move devlink registration before interfaces load')
For -stable v5.4
('net/mlx5e: Fix hairpin RSS table size')
('net/mlx5: DR, Init lists that are used in rule's member')
('net/mlx5e: Always print health reporter message to dmesg')
('net/mlx5: DR, No need for atomic refcount for internal SW steering resources')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various tracing fixes:
- kbuild found missing define of MCOUNT_INSN_SIZE for various build
configs
- Initialize variable to zero as gcc thinks it is used undefined (it
really isn't but the code is subtle enough that this doesn't hurt)
- Convert from do_div() to div64_ull() to prevent potential divide by
zero
- Unregister a trace point on error path in sched_wakeup tracer
- Use signed offset for archs that can have stext not be first
- A simple indentation fix (whitespace error)"
* tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix indentation issue
kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
tracing: Change offset type to s32 in preempt/irq tracepoints
ftrace: Avoid potential division by zero in function profiler
tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls
tracing: Initialize val to zero in parse_entry of inject code
|
|
Whenever adding new member of rule object we attach it to 2 lists,
These 2 lists should be initialized first.
Fixes: 41d07074154c ("net/mlx5: DR, Expose steering rule functionality")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Set hairpin table size to the corret size, based on the groups that
would be created in it. Groups are laid out on the table such that a
group occupies a range of entries in the table. This implies that the
group ranges should have correspondence to the table they are laid upon.
The patch cited below made group 1's size to grow hence causing
overflow of group range laid on the table.
Fixes: a795d8db2a6d ("net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
No need for an atomic refcounter for the STE and hashtables.
These are internal SW steering resources and they are always
under domain mutex.
This also fixes the following refcount error:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 9 PID: 3527 at lib/refcount.c:25 refcount_warn_saturate+0x81/0xe0
Call Trace:
dr_table_init_nic+0x10d/0x110 [mlx5_core]
mlx5dr_table_create+0xb4/0x230 [mlx5_core]
mlx5_cmd_dr_create_flow_table+0x39/0x120 [mlx5_core]
__mlx5_create_flow_table+0x221/0x5f0 [mlx5_core]
esw_create_offloads_fdb_tables+0x180/0x5a0 [mlx5_core]
...
Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This reverts commit 7dee607ed0e04500459db53001d8e02f8831f084.
During cleanup path, FTE's parent node group is removed which is
referenced by the FTE while freeing the FTE.
Hence FTE's lockless read lookup optimization done in cited commit is
not possible at the moment.
Hence, revert the commit.
This avoid below KAZAN call trace.
[ 110.390896] BUG: KASAN: use-after-free in find_root.isra.14+0x56/0x60
[mlx5_core]
[ 110.391048] Read of size 4 at addr ffff888c19e6d220 by task
swapper/12/0
[ 110.391219] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 5.5.0-rc1+
[ 110.391222] Hardware name: HP ProLiant DL380p Gen8, BIOS P70
08/02/2014
[ 110.391225] Call Trace:
[ 110.391229] <IRQ>
[ 110.391246] dump_stack+0x95/0xd5
[ 110.391307] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391320] print_address_description.constprop.5+0x20/0x320
[ 110.391379] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391435] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391441] __kasan_report+0x149/0x18c
[ 110.391499] ? find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391504] kasan_report+0x12/0x20
[ 110.391511] __asan_report_load4_noabort+0x14/0x20
[ 110.391567] find_root.isra.14+0x56/0x60 [mlx5_core]
[ 110.391625] del_sw_fte_rcu+0x4a/0x100 [mlx5_core]
[ 110.391633] rcu_core+0x404/0x1950
[ 110.391640] ? rcu_accelerate_cbs_unlocked+0x100/0x100
[ 110.391649] ? run_rebalance_domains+0x201/0x280
[ 110.391654] rcu_core_si+0xe/0x10
[ 110.391661] __do_softirq+0x181/0x66c
[ 110.391670] irq_exit+0x12c/0x150
[ 110.391675] smp_apic_timer_interrupt+0xf0/0x370
[ 110.391681] apic_timer_interrupt+0xf/0x20
[ 110.391684] </IRQ>
[ 110.391695] RIP: 0010:cpuidle_enter_state+0xfa/0xba0
[ 110.391703] Code: 3d c3 9b b5 50 e8 56 75 6e fe 48 89 45 c8 0f 1f 44
00 00 31 ff e8 a6 94 6e fe 45 84 ff 0f 85 f6 02 00 00 fb 66 0f 1f 44 00
00 <45> 85 f6 0f 88 db 06 00 00 4d 63 fe 4b 8d 04 7f 49 8d 04 87 49 8d
[ 110.391706] RSP: 0018:ffff888c23a6fce8 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffff13
[ 110.391712] RAX: dffffc0000000000 RBX: ffffe8ffff7002f8 RCX:
000000000000001f
[ 110.391715] RDX: 1ffff11184ee6cb5 RSI: 0000000040277d83 RDI:
ffff888c277365a8
[ 110.391718] RBP: ffff888c23a6fd40 R08: 0000000000000002 R09:
0000000000035280
[ 110.391721] R10: ffff888c23a6fc80 R11: ffffed11847485d0 R12:
ffffffffb1017740
[ 110.391723] R13: 0000000000000003 R14: 0000000000000003 R15:
0000000000000000
[ 110.391732] ? cpuidle_enter_state+0xea/0xba0
[ 110.391738] cpuidle_enter+0x4f/0xa0
[ 110.391747] call_cpuidle+0x6d/0xc0
[ 110.391752] do_idle+0x360/0x430
[ 110.391758] ? arch_cpu_idle_exit+0x40/0x40
[ 110.391765] ? complete+0x67/0x80
[ 110.391771] cpu_startup_entry+0x1d/0x20
[ 110.391779] start_secondary+0x2f3/0x3c0
[ 110.391784] ? set_cpu_sibling_map+0x2500/0x2500
[ 110.391795] secondary_startup_64+0xa4/0xb0
[ 110.391841] Allocated by task 290:
[ 110.391917] save_stack+0x21/0x90
[ 110.391921] __kasan_kmalloc.constprop.8+0xa7/0xd0
[ 110.391925] kasan_kmalloc+0x9/0x10
[ 110.391929] kmem_cache_alloc_trace+0xf6/0x270
[ 110.391987] create_root_ns.isra.36+0x58/0x260 [mlx5_core]
[ 110.392044] mlx5_init_fs+0x5fd/0x1ee0 [mlx5_core]
[ 110.392092] mlx5_load_one+0xc7a/0x3860 [mlx5_core]
[ 110.392139] init_one+0x6ff/0xf90 [mlx5_core]
[ 110.392145] local_pci_probe+0xde/0x190
[ 110.392150] work_for_cpu_fn+0x56/0xa0
[ 110.392153] process_one_work+0x678/0x1140
[ 110.392157] worker_thread+0x573/0xba0
[ 110.392162] kthread+0x341/0x400
[ 110.392166] ret_from_fork+0x1f/0x40
[ 110.392218] Freed by task 2742:
[ 110.392288] save_stack+0x21/0x90
[ 110.392292] __kasan_slab_free+0x137/0x190
[ 110.392296] kasan_slab_free+0xe/0x10
[ 110.392299] kfree+0x94/0x250
[ 110.392357] tree_put_node+0x257/0x360 [mlx5_core]
[ 110.392413] tree_remove_node+0x63/0xb0 [mlx5_core]
[ 110.392469] clean_tree+0x199/0x240 [mlx5_core]
[ 110.392525] mlx5_cleanup_fs+0x76/0x580 [mlx5_core]
[ 110.392572] mlx5_unload+0x22/0xc0 [mlx5_core]
[ 110.392619] mlx5_unload_one+0x99/0x260 [mlx5_core]
[ 110.392666] remove_one+0x61/0x160 [mlx5_core]
[ 110.392671] pci_device_remove+0x10b/0x2c0
[ 110.392677] device_release_driver_internal+0x1e4/0x490
[ 110.392681] device_driver_detach+0x36/0x40
[ 110.392685] unbind_store+0x147/0x200
[ 110.392688] drv_attr_store+0x6f/0xb0
[ 110.392693] sysfs_kf_write+0x127/0x1d0
[ 110.392697] kernfs_fop_write+0x296/0x420
[ 110.392702] __vfs_write+0x66/0x110
[ 110.392707] vfs_write+0x1a0/0x500
[ 110.392711] ksys_write+0x164/0x250
[ 110.392715] __x64_sys_write+0x73/0xb0
[ 110.392720] do_syscall_64+0x9f/0x3a0
[ 110.392725] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 7dee607ed0e0 ("net/mlx5: Support lockless FTE read lookups")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Register devlink before interfaces are added.
This will allow interfaces to use devlink while initalizing. For example,
call mlx5_is_roce_enabled.
Fixes: aba25279c100 ("net/mlx5e: Add TX reporter support")
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In case a reporter exists, error message is logged only to the devlink
tracer. The devlink tracer is a visibility utility only, which user can
choose not to monitor.
After cited patch, 3rd party monitoring tools that tracks these error
message will no longer find them in dmesg, causing a regression.
With this patch, error messages are also logged into the dmesg.
Fixes: c50de4af1d63 ("net/mlx5e: Generalize tx reporter's functionality")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Following scenario easily break driver logic and crash the kernel:
1. Add rule with mirred actions to same device.
2. Delete this rule.
In described scenario rule is not added to database and on deletion
driver access invalid entry.
Example:
$ tc filter add dev ens1f0_0 ingress protocol ip prio 1 \
flower skip_sw \
action mirred egress mirror dev ens1f0_1 pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter del dev ens1f0_0 ingress protocol ip prio 1
Dmesg output:
[ 376.634396] mlx5_core 0000:82:00.0: mlx5_cmd_check:756:(pid 3439): DESTROY_FLOW_GROUP(0x934) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x563e2f)
[ 376.654983] mlx5_core 0000:82:00.0: del_hw_flow_group:567:(pid 3439): flow steering can't destroy fg 89 of ft 3145728
[ 376.673433] kasan: CONFIG_KASAN_INLINE enabled
[ 376.683769] kasan: GPF could be caused by NULL-ptr deref or user memory access
[ 376.695229] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
[ 376.705069] CPU: 7 PID: 3439 Comm: tc Not tainted 5.4.0-rc5+ #76
[ 376.714959] Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0a 08/12/2016
[ 376.726371] RIP: 0010:mlx5_del_flow_rules+0x105/0x960 [mlx5_core]
[ 376.735817] Code: 01 00 00 00 48 83 eb 08 e8 28 d9 ff ff 4c 39 e3 75 d8 4c 8d bd c0 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 84 04 00 00 48 8d 7d 28 8b 9 d
[ 376.761261] RSP: 0018:ffff888847c56db8 EFLAGS: 00010202
[ 376.770054] RAX: dffffc0000000000 RBX: ffff8888582a6da0 RCX: ffff888847c56d60
[ 376.780743] RDX: 0000000000000058 RSI: 0000000000000008 RDI: 0000000000000282
[ 376.791328] RBP: 0000000000000000 R08: fffffbfff0c60ea6 R09: fffffbfff0c60ea6
[ 376.802050] R10: fffffbfff0c60ea5 R11: ffffffff8630752f R12: ffff8888582a6da0
[ 376.812798] R13: dffffc0000000000 R14: ffff8888582a6da0 R15: 00000000000002c0
[ 376.823445] FS: 00007f675f9a8840(0000) GS:ffff88886d200000(0000) knlGS:0000000000000000
[ 376.834971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 376.844179] CR2: 00000000007d9640 CR3: 00000007d3f26003 CR4: 00000000001606e0
[ 376.854843] Call Trace:
[ 376.868542] __mlx5_eswitch_del_rule+0x49/0x300 [mlx5_core]
[ 376.877735] mlx5e_tc_del_fdb_flow+0x6ec/0x9e0 [mlx5_core]
[ 376.921549] mlx5e_flow_put+0x2b/0x50 [mlx5_core]
[ 376.929813] mlx5e_delete_flower+0x5b6/0xbd0 [mlx5_core]
[ 376.973030] tc_setup_cb_reoffload+0x29/0xc0
[ 376.980619] fl_reoffload+0x50a/0x770 [cls_flower]
[ 377.015087] tcf_block_playback_offloads+0xbd/0x250
[ 377.033400] tcf_block_setup+0x1b2/0xc60
[ 377.057247] tcf_block_offload_cmd+0x195/0x240
[ 377.098826] tcf_block_offload_unbind+0xe7/0x180
[ 377.107056] __tcf_block_put+0xe5/0x400
[ 377.114528] ingress_destroy+0x3d/0x60 [sch_ingress]
[ 377.122894] qdisc_destroy+0xf1/0x5a0
[ 377.129993] qdisc_graft+0xa3d/0xe50
[ 377.151227] tc_get_qdisc+0x48e/0xa20
[ 377.165167] rtnetlink_rcv_msg+0x35d/0x8d0
[ 377.199528] netlink_rcv_skb+0x11e/0x340
[ 377.219638] netlink_unicast+0x408/0x5b0
[ 377.239913] netlink_sendmsg+0x71b/0xb30
[ 377.267505] sock_sendmsg+0xb1/0xf0
[ 377.273801] ___sys_sendmsg+0x635/0x900
[ 377.312784] __sys_sendmsg+0xd3/0x170
[ 377.338693] do_syscall_64+0x95/0x460
[ 377.344833] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 377.352321] RIP: 0033:0x7f675e58e090
To avoid this, for every mirred action check if output device was
already processed. If so - drop rule with EOPNOTSUPP error.
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
For better accuracy, i225 is able to do timestamping using the Start of
Packet signal from the PHY.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
This command allows igc to report what types of timestamping are
supported. ptp4l uses this to detect if the hardware supports
timestamping.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Pull tpmd fixes from Jarkko Sakkinen:
"There has been a bunch of reports (e.g. [*]) reporting that when
commit 5b359c7c4372 ("tpm_tis_core: Turn on the TPM before probing
IRQ's") and subsequent fixes are applied it causes boot freezes on
some machines.
Unfortunately hardware where this causes a failure is not widely
available (only one I'm aware is Lenovo T490), which means we cannot
predict yet how long it will take to properly fix tpm_tis interrupt
probing.
Thus, the least worst short term action is to revert the code to the
state before this commit. In long term we need fix the tpm_tis probing
code to work on machines that Stefan's patches were supposed to fix.
With these patches reverted nothing fatal happens, TPM is fallbacked
to be used in polling mode (which is not in the end too bad because
there are no high throughput workloads for TPM).
[*] https://bugzilla.kernel.org/show_bug.cgi?id=205935"
* tag 'tpmdd-next-20200106' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"
tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
tpm: Revert "tpm_tis: reserve chip for duration of tpm_tis_core_init"
|
|
This adds support for timestamping packets being transmitted.
Based on the code from i210. The basic differences is that i225 has 4
registers to store the transmit timestamps (i210 has one). Right now,
we only support retrieving from one register, support for using the
other registers will be added later.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Anatoly has been fuzzing with kBdysch harness and reported a KASAN
slab oob in one of the outcomes:
[...]
[ 77.359642] BUG: KASAN: slab-out-of-bounds in bpf_skb_load_helper_8_no_cache+0x71/0x130
[ 77.360463] Read of size 4 at addr ffff8880679bac68 by task bpf/406
[ 77.361119]
[ 77.361289] CPU: 2 PID: 406 Comm: bpf Not tainted 5.5.0-rc2-xfstests-00157-g2187f215eba #1
[ 77.362134] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 77.362984] Call Trace:
[ 77.363249] dump_stack+0x97/0xe0
[ 77.363603] print_address_description.constprop.0+0x1d/0x220
[ 77.364251] ? bpf_skb_load_helper_8_no_cache+0x71/0x130
[ 77.365030] ? bpf_skb_load_helper_8_no_cache+0x71/0x130
[ 77.365860] __kasan_report.cold+0x37/0x7b
[ 77.366365] ? bpf_skb_load_helper_8_no_cache+0x71/0x130
[ 77.366940] kasan_report+0xe/0x20
[ 77.367295] bpf_skb_load_helper_8_no_cache+0x71/0x130
[ 77.367821] ? bpf_skb_load_helper_8+0xf0/0xf0
[ 77.368278] ? mark_lock+0xa3/0x9b0
[ 77.368641] ? kvm_sched_clock_read+0x14/0x30
[ 77.369096] ? sched_clock+0x5/0x10
[ 77.369460] ? sched_clock_cpu+0x18/0x110
[ 77.369876] ? bpf_skb_load_helper_8+0xf0/0xf0
[ 77.370330] ___bpf_prog_run+0x16c0/0x28f0
[ 77.370755] __bpf_prog_run32+0x83/0xc0
[ 77.371153] ? __bpf_prog_run64+0xc0/0xc0
[ 77.371568] ? match_held_lock+0x1b/0x230
[ 77.371984] ? rcu_read_lock_held+0xa1/0xb0
[ 77.372416] ? rcu_is_watching+0x34/0x50
[ 77.372826] sk_filter_trim_cap+0x17c/0x4d0
[ 77.373259] ? sock_kzfree_s+0x40/0x40
[ 77.373648] ? __get_filter+0x150/0x150
[ 77.374059] ? skb_copy_datagram_from_iter+0x80/0x280
[ 77.374581] ? do_raw_spin_unlock+0xa5/0x140
[ 77.375025] unix_dgram_sendmsg+0x33a/0xa70
[ 77.375459] ? do_raw_spin_lock+0x1d0/0x1d0
[ 77.375893] ? unix_peer_get+0xa0/0xa0
[ 77.376287] ? __fget_light+0xa4/0xf0
[ 77.376670] __sys_sendto+0x265/0x280
[ 77.377056] ? __ia32_sys_getpeername+0x50/0x50
[ 77.377523] ? lock_downgrade+0x350/0x350
[ 77.377940] ? __sys_setsockopt+0x2a6/0x2c0
[ 77.378374] ? sock_read_iter+0x240/0x240
[ 77.378789] ? __sys_socketpair+0x22a/0x300
[ 77.379221] ? __ia32_sys_socket+0x50/0x50
[ 77.379649] ? mark_held_locks+0x1d/0x90
[ 77.380059] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 77.380536] __x64_sys_sendto+0x74/0x90
[ 77.380938] do_syscall_64+0x68/0x2a0
[ 77.381324] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 77.381878] RIP: 0033:0x44c070
[...]
After further debugging, turns out while in case of other helper functions
we disallow passing modified ctx, the special case of ld/abs/ind instruction
which has similar semantics (except r6 being the ctx argument) is missing
such check. Modified ctx is impossible here as bpf_skb_load_helper_8_no_cache()
and others are expecting skb fields in original position, hence, add
check_ctx_reg() to reject any modified ctx. Issue was first introduced back
in f1174f77b50c ("bpf/verifier: rework value tracking").
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200106215157.3553-1-daniel@iogearbox.net
|
|
This adds support for timestamping received packets.
It is based on the i210, as many features of i225 work the same way.
The main difference from i210 is that i225 has support for choosing
the timer register to use when timestamping packets. Right now, we
only support using timer 0. The other difference is that i225 stores
two timestamps in the receive descriptor, right now, we only retrieve
one.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- fix module aliases
- fix potential build errors
- fix missing conversion of imx7ulp_wdt_enable()
- fix platform_get_irq() complaints
- fix NCT6116D support
* tag 'linux-watchdog-5.5-fixes' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: orion: fix platform_get_irq() complaints
watchdog: rn5t618_wdt: fix module aliases
watchdog: tqmx86_wdt: Fix build error
watchdog: max77620_wdt: fix potential build errors
watchdog: imx7ulp: Fix missing conversion of imx7ulp_wdt_enable()
watchdog: w83627hf_wdt: Fix support NCT6116D
|
|
Igor Russkikh says:
====================
Aquantia/Marvell atlantic bugfixes 2020/01
Here is a set of recently discovered bugfixes,
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Function entries were duplicated accidentally, removing the dups.
Fixes: ea4b4d7fc106 ("net: atlantic: loopback tests via private flags")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Initial loopback configuration should be called earlier, before
starting traffic on HW blocks. Otherwise depending on race conditions
it could be kept disabled.
Fixes: ea4b4d7fc106 ("net: atlantic: loopback tests via private flags")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Last code/checkpatch cleanup did a copy paste error where code from
firmware 3 API logic was moved to firmware 1 logic.
This resulted in FW1.x users would never see the link state as active.
Fixes: 7b0c342f1f67 ("net: atlantic: code style cleanup")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
cgroup bpf structures were released with
corresponding cgroup structures. It guaranteed the hierarchical order
of destruction: children were always first. It preserved attached
programs from being released before their propagated copies.
But with cgroup auto-detachment there are no such guarantees anymore:
cgroup bpf is released as soon as the cgroup is offline and there are
no live associated sockets. It means that an attached program can be
detached and released, while its propagated copy is still living
in the cgroup subtree. This will obviously lead to an use-after-free
bug.
To reproduce the issue the following script can be used:
#!/bin/bash
CGROOT=/sys/fs/cgroup
mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
sleep 1
./test_cgrp2_attach ${CGROOT}/A egress &
A_PID=$!
./test_cgrp2_attach ${CGROOT}/B egress &
B_PID=$!
echo $$ > ${CGROOT}/A/C/cgroup.procs
iperf -s &
S_PID=$!
iperf -c localhost -t 100 &
C_PID=$!
sleep 1
echo $$ > ${CGROOT}/B/cgroup.procs
echo ${S_PID} > ${CGROOT}/B/cgroup.procs
echo ${C_PID} > ${CGROOT}/B/cgroup.procs
sleep 1
rmdir ${CGROOT}/A/C
rmdir ${CGROOT}/A
sleep 1
kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}
On the unpatched kernel the following stacktrace can be obtained:
[ 33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
[ 33.620677] #PF: supervisor read access in kernel mode
[ 33.621293] #PF: error_code(0x0000) - not-present page
[ 33.622754] Oops: 0000 [#1] SMP NOPTI
[ 33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
[ 33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
[ 33.635809] Call Trace:
[ 33.636118] ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
[ 33.636728] ? __switch_to_asm+0x40/0x70
[ 33.637196] ip_finish_output+0x68/0xa0
[ 33.637654] ip_output+0x76/0xf0
[ 33.638046] ? __ip_finish_output+0x1c0/0x1c0
[ 33.638576] __ip_queue_xmit+0x157/0x410
[ 33.639049] __tcp_transmit_skb+0x535/0xaf0
[ 33.639557] tcp_write_xmit+0x378/0x1190
[ 33.640049] ? _copy_from_iter_full+0x8d/0x260
[ 33.640592] tcp_sendmsg_locked+0x2a2/0xdc0
[ 33.641098] ? sock_has_perm+0x10/0xa0
[ 33.641574] tcp_sendmsg+0x28/0x40
[ 33.641985] sock_sendmsg+0x57/0x60
[ 33.642411] sock_write_iter+0x97/0x100
[ 33.642876] new_sync_write+0x1b6/0x1d0
[ 33.643339] vfs_write+0xb6/0x1a0
[ 33.643752] ksys_write+0xa7/0xe0
[ 33.644156] do_syscall_64+0x5b/0x1b0
[ 33.644605] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by grabbing a reference to the bpf structure of each ancestor
on the initialization of the cgroup bpf structure, and dropping the
reference at the end of releasing the cgroup bpf structure.
This will restore the hierarchical order of cgroup bpf releasing,
without adding any operations on hot paths.
Thanks to Josef Bacik for the debugging and the initial analysis of
the problem.
Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Michal Kubecek says:
====================
ethtool: allow nesting of begin() and complete() callbacks
The ethtool ioctl interface used to guarantee that ethtool_ops callbacks
were always called in a block between calls to ->begin() and ->complete()
(if these are defined) and that this whole block was executed with RTNL
lock held:
rtnl_lock();
ops->begin();
/* other ethtool_ops calls */
ops->complete();
rtnl_unlock();
This prevented any nesting or crossing of the begin-complete blocks.
However, this is no longer guaranteed even for ioctl interface as at least
ethtool_phys_id() releases RTNL lock while waiting for a timer. With the
introduction of netlink ethtool interface, the begin-complete pairs are
naturally nested e.g. when a request triggers a netlink notification.
Fortunately, only minority of networking drivers implements begin() and
complete() callbacks and most of those that do, fall into three groups:
- wrappers for pm_runtime_get_sync() and pm_runtime_put()
- wrappers for clk_prepare_enable() and clk_disable_unprepare()
- begin() checks netif_running() (fails if false), no complete()
First two have their own refcounting, third is safe w.r.t. nesting of the
blocks.
Only three in-tree networking drivers need an update to deal with nesting
of begin() and complete() calls: via-velocity and epic100 perform resume
and suspend on their own and wil6210 completely serializes the calls using
its own mutex (which would lead to a deadlock if a request request
triggered a netlink notification). The series addresses these problems.
changes between v1 and v2:
- fix inverted condition in epic100 ethtool_begin() (thanks to Andrew
Lunn)
====================
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unlike most networking drivers using begin() and complete() ethtool_ops
callbacks to resume a device which is down and suspend it again when done,
epic100 does not use standard refcounted infrastructure but sets device
sleep state directly.
With the introduction of netlink ethtool interface, we may have nested
begin-complete blocks so that inner complete() would put the device back to
sleep for the rest of the outer block.
To avoid rewriting an old and not very actively developed driver, just add
a nesting counter and only perform resume and suspend on the outermost
level.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unlike most networking drivers using begin() and complete() ethtool_ops
callbacks to resume a device which is down and suspend it again when done,
via-velocity does not use standard refcounted infrastructure but sets
device sleep state directly.
With the introduction of netlink ethtool interface, we may have nested
begin-complete blocks so that inner complete() would put the device back to
sleep for the rest of the outer block.
To avoid rewriting an old and not very actively developed driver, just add
a nesting counter and only perform resume and suspend on the outermost
level.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The wil6210 driver locks a mutex in begin() ethtool_ops callback and
unlocks it in complete() so that all ethtool requests are serialized. This
is not going to work correctly with netlink interface; e.g. when ioctl
triggers a netlink notification, netlink code would call begin() again
while the mutex taken by ioctl code is still held by the same task.
Let's get rid of the begin() and complete() callbacks and move the mutex
locking into the remaining ethtool_ops handlers except get_drvinfo which
only copies strings that are not changing so that there is no need for
serialization.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix calling multiple tee_client_close_context in case of shm allocation
fails.
Fixes: 246880958ac9 (“firmware: broadcom: add OP-TEE based BNXT f/w manager”)
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|