Age | Commit message (Collapse) | Author |
|
Add the definitions for the commands that do per-instance dump
and re-generate the related code.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-8-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to easily set NLM_F_DUMP_FILTERED for partial dumps, pass the
flags as an arg of dump_one() callback. Currently, it is always
NLM_F_MULTI.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-7-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce dumpit callbacks for generated split ops. Have them
as a thin wrapper around iteration function and allow to pass dump_one()
function pointer directly without need to store in devlink_cmd structs.
Note that the function prototypes are temporary until the generated ones
will replace them in a follow-up patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-6-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Rename netlink doit callback functions for the commands that do
implement per-instance dump to match the generated names that are going
to be introduce in the follow-up patch.
Note that the function prototypes are temporary until the generated ones
will replace them in a follow-up patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define port handling helpers what don't rely on internal_flags.
Have __devlink_nl_pre_doit() to accept the flags as a function arg and
make devlink_nl_pre_doit() a wrapper helper function calling it.
Introduce new helpers devlink_nl_pre_doit_port() and
devlink_nl_pre_doit_port_optional() to be used by split ops in follow-up
patch.
Note that the function prototypes are temporary until the generated ones
will replace them in a follow-up patch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-4-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No need to give the rate any special treatment in netlink attributes
parsing, as unlike for ports, there is only a couple of commands
benefiting from that.
Remove DEVLINK_NL_FLAG_NEED_RATE*, make pre_doit() callback simpler
by moving the rate attributes parsing to rate_*_doit() ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-3-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
No need to give the linecards any special treatment in netlink attribute
parsing, as unlike for ports, there is only a couple of commands
benefiting from that.
Remove DEVLINK_NL_FLAG_NEED_LINECARD, make pre_doit() callback simpler
by moving the linecard attribute parsing to linecard_[gs]et_doit() ops.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230811155714.1736405-2-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Running the bench_rename test script, the following error occurs:
# ./benchs/run_bench_rename.sh
base : 0.819 ± 0.012M/s
kprobe : 0.538 ± 0.009M/s
kretprobe : 0.503 ± 0.004M/s
rawtp : 0.779 ± 0.020M/s
fentry : 0.726 ± 0.007M/s
fexit : 0.691 ± 0.007M/s
benchmark 'rename-fmodret' not found
The bench_rename_fmodret has been removed in commit b000def2e052
("selftests: Remove fmod_ret from test_overhead"), thus remove it
from the runners in the test script.
Fixes: b000def2e052 ("selftests: Remove fmod_ret from test_overhead")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230814030727.3010390-1-zouyipeng@huawei.com
|
|
There is no way where topts.repeat can be set to 1 when tc_test fails.
Fix the typo where the break statement slipped by one line.
Fixes: fb66223a244f ("selftests/bpf: add test for accessing ctx from syscall program type")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Link: https://lore.kernel.org/bpf/20230814031434.3077944-1-zouyipeng@huawei.com
|
|
This code was only used in the past for the sysfs interface. But since
this was replace with netlink, it was never executed. The function pointer
was only checked to figure out whether the limit 255 (B.A.T.M.A.N. IV) or
2**32-1 (B.A.T.M.A.N. V) should be used as limit.
So instead of keeping the function pointer, just store the limits directly
in struct batadv_algo_gw_ops.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The batadv_netlink_notify_*() functions are not used by any other source
file. Just keep them local to netlink.c to get informed by the compiler
when they are not used anymore.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
This function is no longer used since the sysfs support was removed from
batman-adv.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The current display probe is unable to differentiate between IVB Q and
IVB D GT2 server, as they both have the same device id, but different
subvendor and subdevice. This leads to the latter being misidentified as
the former, and should just end up not having a display. However, the no
display case returns a NULL as the display device info, and promptly
oopses.
As the IVB Q case is rare, and we're anyway moving towards GMD ID,
handle the identification requiring subvendor and subdevice as a special
case first, instead of unnecessarily growing the intel_display_ids[]
array with subvendor and subdevice.
[ 5.425298] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 5.426059] #PF: supervisor read access in kernel mode
[ 5.426810] #PF: error_code(0x0000) - not-present page
[ 5.427570] PGD 0 P4D 0
[ 5.428285] Oops: 0000 [#1] PREEMPT SMP PTI
[ 5.429035] CPU: 0 PID: 137 Comm: (udev-worker) Not tainted 6.4.0-1-amd64 #1 Debian 6.4.4-1
[ 5.429759] Hardware name: HP HP Z220 SFF Workstation/HP Z220 SFF Workstation, BIOS 4.19-218-gb184e6e0a1 02/02/2023
[ 5.430485] RIP: 0010:intel_device_info_driver_create+0xf1/0x120 [i915]
[ 5.431338] Code: 48 8b 97 80 1b 00 00 89 8f c0 1b 00 00 48 89 b7 b0 1b 00 00 48 89 97 b8 1b 00 00 0f b7 fd e8 76 e8 14 00 48 89 83 50 1b 00 00 <48> 8b 08 48 89 8b c4 1b 00 00 48 8b 48 08 48 89 8b cc 1b 00 00 8b
[ 5.432920] RSP: 0018:ffffb8254044fb98 EFLAGS: 00010206
[ 5.433707] RAX: 0000000000000000 RBX: ffff923076e80000 RCX: 0000000000000000
[ 5.434494] RDX: 0000000000000260 RSI: 0000000100001000 RDI: 000000000000016a
[ 5.435277] RBP: 000000000000016a R08: ffffb8254044fb00 R09: 0000000000000000
[ 5.436055] R10: ffff922d02761de8 R11: 00657361656c6572 R12: ffffffffc0e5d140
[ 5.436867] R13: ffff922d00b720d0 R14: 0000000076e80000 R15: ffff923078c0cae8
[ 5.437646] FS: 00007febd19a18c0(0000) GS:ffff92307c000000(0000) knlGS:0000000000000000
[ 5.438434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.439218] CR2: 0000000000000000 CR3: 000000010256e002 CR4: 00000000001706f0
[ 5.440009] Call Trace:
[ 5.440824] <TASK>
[ 5.441611] ? __die+0x23/0x70
[ 5.442394] ? page_fault_oops+0x17d/0x4c0
[ 5.443173] ? exc_page_fault+0x7f/0x180
[ 5.443949] ? asm_exc_page_fault+0x26/0x30
[ 5.444756] ? intel_device_info_driver_create+0xf1/0x120 [i915]
[ 5.445652] ? intel_device_info_driver_create+0xea/0x120 [i915]
[ 5.446545] i915_driver_probe+0x7f/0xb60 [i915]
[ 5.447431] ? drm_privacy_screen_get+0x15c/0x1a0 [drm]
[ 5.448240] local_pci_probe+0x45/0xa0
[ 5.449013] pci_device_probe+0xc7/0x240
[ 5.449748] really_probe+0x19e/0x3e0
[ 5.450464] ? __pfx___driver_attach+0x10/0x10
[ 5.451172] __driver_probe_device+0x78/0x160
[ 5.451870] driver_probe_device+0x1f/0x90
[ 5.452601] __driver_attach+0xd2/0x1c0
[ 5.453293] bus_for_each_dev+0x88/0xd0
[ 5.453989] bus_add_driver+0x116/0x220
[ 5.454672] driver_register+0x59/0x100
[ 5.455336] i915_init+0x25/0xc0 [i915]
[ 5.456104] ? __pfx_i915_init+0x10/0x10 [i915]
[ 5.456882] do_one_initcall+0x5d/0x240
[ 5.457511] do_init_module+0x60/0x250
[ 5.458126] __do_sys_finit_module+0xac/0x120
[ 5.458721] do_syscall_64+0x60/0xc0
[ 5.459314] ? syscall_exit_to_user_mode+0x1b/0x40
[ 5.459897] ? do_syscall_64+0x6c/0xc0
[ 5.460510] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 5.461082] RIP: 0033:0x7febd20b0eb9
[ 5.461648] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2f 1f 0d 00 f7 d8 64 89 01 48
[ 5.462905] RSP: 002b:00007fffabb1ba78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 5.463554] RAX: ffffffffffffffda RBX: 0000561e6304f410 RCX: 00007febd20b0eb9
[ 5.464201] RDX: 0000000000000000 RSI: 00007febd2244f0d RDI: 0000000000000015
[ 5.464869] RBP: 00007febd2244f0d R08: 0000000000000000 R09: 000000000000000a
[ 5.465512] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000020000
[ 5.466124] R13: 0000000000000000 R14: 0000561e63032b60 R15: 000000000000000a
[ 5.466700] </TASK>
[ 5.467271] Modules linked in: i915(+) drm_buddy video crc32_pclmul sr_mod hid_generic wmi crc32c_intel i2c_algo_bit sd_mod cdrom drm_display_helper cec usbhid rc_core ghash_clmulni_intel hid sha512_ssse3 ttm sha512_generic xhci_pci ehci_pci xhci_hcd ehci_hcd nvme ahci drm_kms_helper nvme_core libahci t10_pi libata psmouse aesni_intel scsi_mod crypto_simd i2c_i801 scsi_common crc64_rocksoft_generic cryptd i2c_smbus drm lpc_ich crc64_rocksoft crc_t10dif e1000e usbcore crct10dif_generic usb_common crct10dif_pclmul crc64 crct10dif_common button
[ 5.469750] CR2: 0000000000000000
[ 5.470364] ---[ end trace 0000000000000000 ]---
[ 5.470971] RIP: 0010:intel_device_info_driver_create+0xf1/0x120 [i915]
[ 5.471699] Code: 48 8b 97 80 1b 00 00 89 8f c0 1b 00 00 48 89 b7 b0 1b 00 00 48 89 97 b8 1b 00 00 0f b7 fd e8 76 e8 14 00 48 89 83 50 1b 00 00 <48> 8b 08 48 89 8b c4 1b 00 00 48 8b 48 08 48 89 8b cc 1b 00 00 8b
[ 5.473034] RSP: 0018:ffffb8254044fb98 EFLAGS: 00010206
[ 5.473698] RAX: 0000000000000000 RBX: ffff923076e80000 RCX: 0000000000000000
[ 5.474371] RDX: 0000000000000260 RSI: 0000000100001000 RDI: 000000000000016a
[ 5.475045] RBP: 000000000000016a R08: ffffb8254044fb00 R09: 0000000000000000
[ 5.475725] R10: ffff922d02761de8 R11: 00657361656c6572 R12: ffffffffc0e5d140
[ 5.476405] R13: ffff922d00b720d0 R14: 0000000076e80000 R15: ffff923078c0cae8
[ 5.477124] FS: 00007febd19a18c0(0000) GS:ffff92307c000000(0000) knlGS:0000000000000000
[ 5.477811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5.478499] CR2: 0000000000000000 CR3: 000000010256e002 CR4: 00000000001706f0
Fixes: 69d439818fe5 ("drm/i915/display: Make display responsible for probing its own IP")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8991
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230804084600.1005818-1-jani.nikula@intel.com
(cherry picked from commit 1435188307d128671f677eb908e165666dd83652)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Commit 3f9ffce5765d ("drm/i915: Do panel VBT init early if the VBT
declares an explicit panel type") started using -1 as the value for
unset panel_type. It gets initialized in intel_panel_init_alloc(), but
the SDVO code never calls it.
Call intel_panel_init_alloc() to initialize the panel, including the
panel_type.
Reported-by: Tomi Leppänen <tomi@tomin.site>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8896
Fixes: 3f9ffce5765d ("drm/i915: Do panel VBT init early if the VBT declares an explicit panel type")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.1+
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Tested-by: Tomi Leppänen <tomi@tomin.site>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230803122706.838721-1-jani.nikula@intel.com
(cherry picked from commit 26e60294e8eacedc8ebb33405b2c375fd80e0900)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
This should be done before the soft min/max frequencies are restored.
When we disable the "Ignore efficient frequency" flag, GuC does not
actually bring the requested freq down to RPn.
Specifically, this scenario-
- ignore efficient freq set to true
- reduce min to RPn (from efficient)
- suspend
- resume (includes GuC load, restore soft min/max, restore efficient freq)
- validate min freq has been resored to RPn
This will fail if we didn't first restore(disable, in this case) efficient
freq flag before setting the soft min frequency.
v2: Bring the min freq down to RPn when we disable efficient freq (Rodrigo)
Also made the change to set the min softlimit to RPn at init. Otherwise, we
were storing RPe there.
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8736
Fixes: 55f9720dbf23 ("drm/i915/guc/slpc: Provide sysfs for efficient freq")
Fixes: 95ccf312a1e4 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency")
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230726010044.3280402-1-vinay.belgaumkar@intel.com
(cherry picked from commit 28e671114fb0f28f334fac8d0a6b9c395c7b0498)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
|
Enable the close-on-exec flag when using gzopen. This is especially important
for multithreaded programs making use of libbpf, where a fork + exec could
race with libbpf library calls, potentially resulting in a file descriptor
leaked to the new process. This got missed in 59842c5451fe ("libbpf: Ensure
libbpf always opens files with O_CLOEXEC").
Fixes: 59842c5451fe ("libbpf: Ensure libbpf always opens files with O_CLOEXEC")
Signed-off-by: Marco Vedovati <marco.vedovati@crowdstrike.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230810214350.106301-1-martin.kelly@crowdstrike.com
|
|
We recently had problems where a network namespace was deleted
causing hard to debug reconnect problems. To help deal with
configuration issues like this it is useful to dump the network
namespace to better debug what happened.
So add this to information displayed in /proc/fs/cifs/DebugData for
the server (and channels if mounted with multichannel). For example:
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 0 Net namespace: 4026531840
This can be easily compared with what is displayed for the
processes on the system. For example /proc/1/ns/net in this case
showed the same thing (see below), and we can see that the namespace
is still valid in this example.
'net:[4026531840]'
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Under the current code, when cifs_readpage_worker is called, the call
contract is that the callee should unlock the page. This is documented
in the read_folio section of Documentation/filesystems/vfs.rst as:
> The filesystem should unlock the folio once the read has completed,
> whether it was successful or not.
Without this change, when fscache is in use and cache hit occurs during
a read, the page lock is leaked, producing the following stack on
subsequent reads (via mmap) to the page:
$ cat /proc/3890/task/12864/stack
[<0>] folio_wait_bit_common+0x124/0x350
[<0>] filemap_read_folio+0xad/0xf0
[<0>] filemap_fault+0x8b1/0xab0
[<0>] __do_fault+0x39/0x150
[<0>] do_fault+0x25c/0x3e0
[<0>] __handle_mm_fault+0x6ca/0xc70
[<0>] handle_mm_fault+0xe9/0x350
[<0>] do_user_addr_fault+0x225/0x6c0
[<0>] exc_page_fault+0x84/0x1b0
[<0>] asm_exc_page_fault+0x27/0x30
This requires a reboot to resolve; it is a deadlock.
Note however that the call to cifs_readpage_from_fscache does mark the
page clean, but does not free the folio lock. This happens in
__cifs_readpage_from_fscache on success. Releasing the lock at that
point however is not appropriate as cifs_readahead also calls
cifs_readpage_from_fscache and *does* unconditionally release the lock
after its return. This change therefore effectively makes
cifs_readpage_worker work like cifs_readahead.
Signed-off-by: Russell Harmon <russ@har.mn>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Commit 03e909acd95a ("drm/panel: simple: Add support for AUO G121EAN01.4
panel") added support for this panel model, but the timings it implements
are very different from what the datasheet describes. I checked both the
G121EAN01.0 datasheet from [0] and the G121EAN01.4 one from [1] and they
all have the same timings: for example the LVDS clock typical value is 74.4
MHz, not 66.7 MHz as implemented.
Replace the timings with the ones from the documentation. These timings
have been tested and the clock frequencies verified with an oscilloscope to
ensure they are correct.
Also use struct display_timing instead of struct drm_display_mode in order
to also specify the minimum and maximum values.
[0] https://embedded.avnet.com/product/g121ean01-0/
[1] https://embedded.avnet.com/product/g121ean01-4/
Fixes: 03e909acd95a ("drm/panel: simple: Add support for AUO G121EAN01.4 panel")
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230804151239.835216-1-luca.ceresoli@bootlin.com
|
|
Use the dev_err_probe() helper to simplify error handling during probe.
This also handle scenario, when EDEFER is returned and useless error is printed.
Fixes error:
panel-jdi-lt070me05000 4700000.dsi.0: cannot get enable-gpio -517
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230812185239.378582-1-david@ixit.cz
|
|
This test verifies whether the encapsulated packets have the correct
configured TTL. It does so by sending ICMP packets through the test
topology and mirroring them to a gretap netdevice. On a busy host
however, more than just the test ICMP packets may end up flowing
through the topology, get mirrored, and counted. This leads to
potential spurious failures as the test observes much more mirrored
packets than the sent test packets, and assumes a bug.
Fix this by tightening up the mirror action match. Change it from
matchall to a flower classifier matching on ICMP packets specifically.
Fixes: 45315673e0c5 ("selftests: forwarding: Test changes in mirror-to-gretap")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For the TLB_PTLOCK checks we used an optimization to store the spc
register into the spinlock to unlock it. This optimization works as
long as the lightweight spinlock checks (CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK)
aren't enabled, because they really check if the lock word is zero or
__ARCH_SPIN_LOCK_UNLOCKED_VAL and abort with a kernel crash
("Spinlock was trashed") otherwise.
Drop that optimization to make it possible to activate both checks
at the same time.
Noticed-by: Sam James <sam@gentoo.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Sam James <sam@gentoo.org>
Cc: stable@vger.kernel.org # v6.4+
Fixes: 15e64ef6520e ("parisc: Add lightweight spinlock checks")
|
|
This reverts commit 718cb09aaa6fa78cc8124e9517efbc6c92665384.
The commit triggers multiple syzbot issues, probably due to possibility of
manually creating VLAN 0 on netdevice which will cause the code to delete
it since it can't distinguish such VLAN from implicit VLAN 0 automatically
created for devices with NETIF_F_HW_VLAN_CTAG_FILTER feature.
Reported-by: syzbot+662f783a5cdf3add2719@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/00000000000090196d0602a6167d@google.com/
Reported-by: syzbot+4b4f06495414e92701d5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/00000000000096ae870602a61602@google.com/
Reported-by: syzbot+d810d3cd45ed1848c3f7@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000009f0f9c0602a616ce@google.com/
Fixes: 718cb09aaa6f ("vlan: Fix VLAN 0 memory leak")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PSGMII interface is similar to QSGMII. The main difference
is that the PSGMII interface combines five SGMII lines into a
single link while in QSGMII only four lines are combined.
Similarly to the QSGMII, this interface mode might also needs
special handling within the MAC driver.
It is commonly used by Qualcomm with their QCA807x PHY series and
modern WiSoC-s.
Add definitions for the PHY layer to allow to express this type
of connection between the MAC and PHY.
Signed-off-by: Gabor Juhos <j4g8y7@gmail.com>
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new PSGMII mode which is similar to QSGMII with the difference being
that it combines 5 SGMII lines into a single link compared to 4 on QSGMII.
It is commonly used by Qualcomm on their QCA807x PHY series.
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Petr Machata says:
====================
mlxsw: Support traffic redirection from a locked bridge port
Ido Schimmel writes:
It is possible to add a filter that redirects traffic from the ingress
of a bridge port that is locked (i.e., performs security / SMAC lookup)
and has learning enabled. For example:
# ip link add name br0 type bridge
# ip link set dev swp1 master br0
# bridge link set dev swp1 learning on locked on mab on
# tc qdisc add dev swp1 clsact
# tc filter add dev swp1 ingress pref 1 proto ip flower skip_sw src_ip 192.0.2.1 action mirred egress redirect dev swp2
In the kernel's Rx path, this filter is evaluated before the Rx handler
of the bridge, which means that redirected traffic should not be
affected by bridge port configuration such as learning.
However, the hardware data path is a bit different and the redirect
action (FORWARDING_ACTION in hardware) merely attaches a pointer to the
packet, which is later used by the L2 lookup stage to understand how to
forward the packet. Between both stages - ingress ACL and L2 lookup -
learning and security lookup are performed, which means that redirected
traffic is affected by bridge port configuration, unlike in the kernel's
data path.
The learning discrepancy was handled in commit 577fa14d2100 ("mlxsw:
spectrum: Do not process learned records with a dummy FID") by simply
ignoring learning notifications generated by the redirected traffic. A
similar solution is not possible for the security / SMAC lookup since
- unlike learning - the CPU is not involved and packets that failed the
lookup are dropped by the device.
Instead, solve this by prepending the ignore action to the redirect
action and use it to instruct the device to disable both learning and
the security / SMAC lookup for redirected traffic.
Patch #1 adds the ignore action.
Patch #2 prepends the action to the redirect action in flower offload
code.
Patch #3 removes the workaround in commit 577fa14d2100 ("mlxsw:
spectrum: Do not process learned records with a dummy FID") since it is
no longer needed.
Patch #4 adds a test case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Check that traffic can be redirected from a locked bridge port and that
it does not create locked FDB entries.
Cc: Hans J. Schultz <netdev@kapio-technology.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As explained in the previous patch, with the ignore action prepended to
the redirect action, it is not longer possible for redirected traffic to
generate learning notifications.
Therefore, remove the workaround that was added in commit 577fa14d2100
("mlxsw: spectrum: Do not process learned records with a dummy FID") as
it is no longer needed.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is possible to add a filter that redirects traffic from the ingress
of a bridge port that is locked (i.e., performs security / SMAC lookup)
and has learning enabled. For example:
# ip link add name br0 type bridge
# ip link set dev swp1 master br0
# bridge link set dev swp1 learning on locked on mab on
# tc qdisc add dev swp1 clsact
# tc filter add dev swp1 ingress pref 1 proto ip flower skip_sw src_ip 192.0.2.1 action mirred egress redirect dev swp2
In the kernel's Rx path, this filter is evaluated before the Rx handler
of the bridge, which means that redirected traffic should not be
affected by bridge port configuration such as learning.
However, the hardware data path is a bit different and the redirect
action (FORWARDING_ACTION in hardware) merely attaches a pointer to the
packet, which is later used by the L2 lookup stage to understand how to
forward the packet. Between both stages - ingress ACL and L2 lookup -
learning and security lookup are performed, which means that redirected
traffic is affected by bridge port configuration, unlike in the kernel's
data path.
The learning discrepancy was handled in commit 577fa14d2100 ("mlxsw:
spectrum: Do not process learned records with a dummy FID") by simply
ignoring learning notifications generated by the redirected traffic. A
similar solution is not possible for the security / SMAC lookup since
- unlike learning - the CPU is not involved and packets that failed the
lookup are dropped by the device.
Instead, solve this by prepending the ignore action to the redirect
action and use it to instruct the device to disable both learning and
the security / SMAC lookup for redirected traffic.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the IGNORE_ACTION which is used to ignore basic switching functions
such as learning on a per-packet basis.
The action will be prepended to the FORWARDING_ACTION in subsequent
patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
1. Show TSSTSSEL(Timestamp System Time Source),
ADDMACADRSEL(additional MAC addresses), SMASEL(SMA/MDIO Interface),
HDSEL(Half-duplex Support) in debugfs.
2. Show exact number of additional MAC address registers for XGMAC2 core.
3. XGMAC2 core does not have different IP checksum offload types, so just
show rx_coe instead of rx_coe_type1 or rx_coe_type2.
4. XGMAC2 core does not have rxfifo_over_2048 definition, skip it.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Li Zetao says:
====================
Use helper functions to update stats
The patch set uses the helper functions dev_sw_netstats_rx_add() and
dev_sw_netstats_tx_add() to update stats, which is the same as
implementing the function separately.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the helper functions dev_sw_netstats_rx_add() and
dev_sw_netstats_tx_add() to update stats, which helps to
provide code readability.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the helper functions dev_sw_netstats_rx_add() and
dev_sw_netstats_tx_add() to update stats, which helps to
provide code readability.
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.
Background:
The vmxnet3 rx consists of three rings: ring0, ring1, and dataring.
For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma
mapped to the ring's descriptor. If LRO is enabled and packet size larger
than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to mapped the rest of
the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is
allocated using alloc_page. So for LRO packets, the payload will be in one
buffer from r0 and multiple from r1, for non-LRO packets, only one
descriptor in r0 is used for packet size less than 3k.
When receiving a packet, the first descriptor will have the sop (start of
packet) bit set, and the last descriptor will have the eop (end of packet)
bit set. Non-LRO packets will have only one descriptor with both sop and
eop set.
Other than r0 and r1, vmxnet3 dataring is specifically designed for
handling packets with small size, usually 128 bytes, defined in
VMXNET3_DEF_RXDATA_DESC_SIZE, by simply copying the packet from the backend
driver in ESXi to the ring's memory region at front-end vmxnet3 driver, in
order to avoid memory mapping/unmapping overhead. In summary, packet size:
A. < 128B: use dataring
B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB)
C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE)
As a result, the patch adds XDP support for packets using dataring
and r0 (case A and B), not the large packet size when LRO is enabled.
XDP Implementation:
When user loads and XDP prog, vmxnet3 driver checks configurations, such
as mtu, lro, and re-allocate the rx buffer size for reserving the extra
headroom, XDP_PACKET_HEADROOM, for XDP frame. The XDP prog will then be
associated with every rx queue of the device. Note that when using dataring
for small packet size, vmxnet3 (front-end driver) doesn't control the
buffer allocation, as a result we allocate a new page and copy packet
from the dataring to XDP frame.
The receive side of XDP is implemented for case A and B, by invoking the
bpf program at vmxnet3_rq_rx_complete and handle its returned action.
The vmxnet3_process_xdp(), vmxnet3_process_xdp_small() function handles
the ring0 and dataring case separately, and decides the next journey of
the packet afterward.
For TX, vmxnet3 has split header design. Outgoing packets are parsed
first and protocol headers (L2/L3/L4) are copied to the backend. The
rest of the payload are dma mapped. Since XDP_TX does not parse the
packet protocol, the entire XDP frame is dma mapped for transmission
and transmitted in a batch. Later on, the frame is freed and recycled
back to the memory pool.
Performance:
Tested using two VMs inside one ESXi vSphere 7.0 machine, using single
core on each vmxnet3 device, sender using DPDK testpmd tx-mode attached
to vmxnet3 device, sending 64B or 512B UDP packet.
VM1 txgen:
$ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
--forward-mode=txonly --eth-peer=0,<mac addr of vm2>
option: add "--txonly-multi-flow"
option: use --txpkts=512 or 64 byte
VM2 running XDP:
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options>
options: XDP_DROP, XDP_PASS, XDP_TX
To test REDIRECT to cpu 0, use
$ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
Single core performance comparison with skb-mode.
64B: skb-mode -> native-mode
XDP_DROP: 1.6Mpps -> 2.4Mpps
XDP_PASS: 338Kpps -> 367Kpps
XDP_TX: 1.1Mpps -> 2.3Mpps
REDIRECT-drop: 1.3Mpps -> 2.3Mpps
512B: skb-mode -> native-mode
XDP_DROP: 863Kpps -> 1.3Mpps
XDP_PASS: 275Kpps -> 376Kpps
XDP_TX: 554Kpps -> 1.2Mpps
REDIRECT-drop: 659Kpps -> 1.2Mpps
Demo: https://youtu.be/4lm1CSCi78Q
Future work:
- XDP frag support
- use napi_consume_skb() instead of dev_kfree_skb_any at unmap
- stats using u64_stats_t
- using bitfield macro BIT()
- optimization for DMA synchronization using actual frame length,
instead of always max_len
Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adrian Moreno says:
====================
openvswitch: add drop reasons
There is currently a gap in drop visibility in the openvswitch module.
This series tries to improve this by adding a new drop reason subsystem
for OVS.
Apart from adding a new drop reasson subsystem and some common drop
reasons, this series takes Eric's preliminary work [1] on adding an
explicit drop action and integrates it into the same subsystem.
A limitation of this series is that it does not report upcall errors.
The reason is that there could be many sources of upcall drops and the
most common one, which is the netlink buffer overflow, cannot be
reported via kfree_skb() because the skb is freed in the netlink layer
(see [2]). Therefore, using a reason for the rare events and not the
common one would be even more misleading. I'd propose we add (in a
follow up patch) a tracepoint to better report upcall errors.
[1] https://lore.kernel.org/netdev/202306300609.tdRdZscy-lkp@intel.com/T/
[2] commit 1100248a5c5c ("openvswitch: Fix double reporting of drops in dropwatch")
---
v4 -> v5:
- Rebased
- Added a helper function to explicitly convert drop reason enum types
v3 -> v4:
- Changed names of errors following Ilya's suggestions
- Moved the ovs-dpctl.py changes from patch 7/7 to 3/7
- Added a test to ensure actions following a drop are rejected
rfc2 -> v3:
- Rebased on top of latest net-next
rfc1 -> rfc2:
- Fail when an explicit drop is not the last
- Added a drop reason for action errors
- Added braces around macros
- Dropped patch that added support for masks in ovs-dpctl.py as it's now
included in Aaron's series [2].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test explicit drops generate the right drop reason. Also, verify that
the kernel rejects flows with actions following an explicit drop.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Test if the correct drop reason is reported when OVS drops a packet due
to an explicit flow.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use drop reasons from include/net/dropreason-core.h when a reasonable
candidate exists.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
By using an independent drop reason it makes it easy to distinguish
between QoS-triggered or flow-triggered drop.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
From: Eric Garver <eric@garver.life>
This adds an explicit drop action. This is used by OVS to drop packets
for which it cannot determine what to do. An explicit action in the
kernel allows passing the reason _why_ the packet is being dropped or
zero to indicate no particular error happened (i.e: OVS intentionally
dropped the packet).
Since the error codes coming from userspace mean nothing for the kernel,
we squash all of them into only two drop reasons:
- OVS_DROP_EXPLICIT_WITH_ERROR to indicate a non-zero value was passed
- OVS_DROP_EXPLICIT to indicate a zero value was passed (no error)
e.g. trace all OVS dropped skbs
# perf trace -e skb:kfree_skb --filter="reason >= 0x30000"
[..]
106.023 ping/2465 skb:kfree_skb(skbaddr: 0xffffa0e8765f2000, \
location:0xffffffffc0d9b462, protocol: 2048, reason: 196611)
reason: 196611 --> 0x30003 (OVS_DROP_EXPLICIT)
Also, this patch allows ovs-dpctl.py to add explicit drop actions as:
"drop" -> implicit empty-action drop
"drop(0)" -> explicit non-error action drop
"drop(42)" -> explicit error action drop
Signed-off-by: Eric Garver <eric@garver.life>
Co-developed-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a drop reason for packets that are dropped because an action
returns a non-zero error code.
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Create a new drop reason subsystem for openvswitch and add the first
drop reason to represent last-action drops.
Last-action drops happen when a flow has an empty action list or there
is no action that consumes the packet (output, userspace, recirc, etc).
It is the most common way in which OVS drops packets.
Implementation-wise, most of these skb-consuming actions already call
"consume_skb" internally and return directly from within the
do_execute_actions() loop so with minimal changes we can assume that
any skb that exits the loop normally is a packet drop.
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Matthieu Baerts says:
====================
mptcp: get rid of msk->subflow
The MPTCP protocol maintains an additional struct socket per connection,
mainly to be able to easily use tcp-level struct socket operations.
This leads to several side effects, beyond the quite unfortunate /
confusing 'subflow' field name:
- active and passive sockets behaviour is inconsistent: only active ones
have a not NULL msk->subflow, leading to different error handling and
different error code returned to the user-space in several places.
- active sockets uses an unneeded, larger amount of memory
- passive sockets can't successfully go through accept(), disconnect(),
accept() sequence, see [1] for more details.
The 13 first patches of this series are from Paolo and address all the
above, finally getting rid of the blamed field:
- The first patch is a minor clean-up.
- In the next 11 patches, msk->subflow usage is systematically removed
from the MPTCP protocol, replacing it with direct msk->first usage,
eventually introducing new core helpers when needed.
- The 13th patch finally disposes the field, and it's the only patch in
the series intended to produce functional changes.
The last and 14th patch is from Kuniyuki and it is not linked to the
previous ones: it is a small clean-up to get rid of an unnecessary check
in mptcp_init_sock().
[1] https://github.com/multipath-tcp/mptcp_net-next/issues/290
====================
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
|
|
__mptcp_init_sock() always returns 0 because mptcp_init_sock() used
to return the value directly.
But after commit 18b683bff89d ("mptcp: queue data for mptcp level
retransmission"), __mptcp_init_sock() need not return value anymore.
Let's remove the unnecessary test for __mptcp_init_sock() and make
it return void.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Such field is now unused just as a flag to control the first subflow
deletion at close() time. Introduce a new bit flag for that and finally
drop the mentioned field.
As an intended side effect, now the first subflow sock is not freed
before close() even for passive sockets. The msk has no open/active
subflows if the first one is closed and the subflow list is singular,
update accordingly the state check in mptcp_stream_accept().
Among other benefits, the subflow removal, reduces the amount of memory
used on the client side for each mptcp connection, allows passive sockets
to go through successful accept()/disconnect()/connect() and makes return
error code consistent for failing both passive and active sockets.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/290
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After the previous patch the __mptcp_nmpc_socket helper is used
only to ensure that the MPTCP socket is a suitable status - that
is, the mptcp capable handshake is not started yet.
Change the return value to the relevant subflow sock, to finally
remove the last references to first subflow socket in the MPTCP stack.
As a bonus, we can get rid of a few local variables in different
functions.
No functional change intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is one of the few remaining spots actually manipulating the
first subflow socket. We can leverage the recently introduced
inet helpers to get rid of ssock there.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mptcp sockopt infrastructure unneedly uses the first subflow
socket struct in a few spots. We are going to remove such field
soon, so use directly the first subflow sock instead.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to remove the first subflow socket soon, so avoid
the additional indirection at accept() time. Instead access
directly the first subflow sock, and update mptcp_accept() to
operate on it. This allows dropping a duplicated check in
mptcp_accept().
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|