Age | Commit message (Collapse) | Author |
|
Alexander Lobakin says:
====================
page_pool: allow direct bulk recycling
Previously, there was no reliable way to check whether it's safe to use
direct PP cache. The drivers were passing @allow_direct to the PP
recycling functions and that was it. Bulk recycling is used by
xdp_return_frame_bulk() on .ndo_xdp_xmit() frames completion where
the page origin is unknown, thus the direct recycling has never been
tried.
Now that we have at least 2 ways of checking if we're allowed to perform
direct recycling -- pool->p.napi (Jakub) and pool->cpuid (Lorenzo), we
can use them when doing bulk recycling as well. Just move that logic
from the skb core to the PP core and call it before
__page_pool_put_page() every time @allow_direct is false.
Under high .ndo_xdp_xmit() traffic load, the win is 2-3% Pps assuming
the sending driver uses xdp_return_frame_bulk() on Tx completion.
====================
Link: https://lore.kernel.org/r/20240329165507.3240110-1-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that the checks for direct recycling possibility live inside the
Page Pool core, reuse them when performing bulk recycling.
page_pool_put_page_bulk() can be called from process context as well,
page_pool_napi_local() takes care of this at the very beginning.
Under high .ndo_xdp_xmit() traffic load, the win is 2-3% Pps assuming
the sending driver uses xdp_return_frame_bulk() on Tx completion.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20240329165507.3240110-3-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since we have pool->p.napi (Jakub) and pool->cpuid (Lorenzo) to check
whether it's safe to use direct recycling, we can use both globally for
each page instead of relying solely on @allow_direct argument.
Let's assume that @allow_direct means "I'm sure it's local, don't waste
time rechecking this" and when it's false, try the mentioned params to
still recycle the page directly. If neither is true, we'll lose some
CPU cycles, but then it surely won't be hotpath. On the other hand,
paths where it's possible to use direct cache, but not possible to
safely set @allow_direct, will benefit from this move.
The whole propagation of @napi_safe through a dozen of skb freeing
functions can now go away, which saves us some stack space.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20240329165507.3240110-2-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On some boards with this chip version the BIOS is buggy and misses
to reset the PHY page selector. This results in the PHY ID read
accessing registers on a different page, returning a more or
less random value. Fix this by resetting the page selector first.
Fixes: f1e911d5d0df ("r8169: add basic phylib support")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/64f2055e-98b8-45ec-8568-665e3d54d4e6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change "a" to "an" according to the usual rules, fix an "if" that
was mistyped as "in", improve grammar in "considerable slow" ->
"considerably slower".
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20240329-misc-rhashtable-v1-1-5862383ff798@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Checking if dump is empty requires a couple of casts.
Add a convenient wrapper.
Add an example use in the netdev sample, loopback is always
present so an empty dump is an error.
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20240329181651.319326-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed to queue it in the virtqueue.
Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.
Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: stable@vge.kernel.org
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.
There is currently an object (`tl`), at the beginning of multiple
structures, that contains a flexible structure (`struct nfp_dump_tl`),
for example:
struct nfp_dumpspec_csr {
struct nfp_dump_tl tl;
...
__be32 register_width; /* in bits */
};
So, in order to avoid ending up with flexible-array members in the
middle of multiple other structs, we use the `struct_group_tagged()`
helper to separate the flexible array from the rest of the members
in the flexible structure:
struct nfp_dump_tl {
struct_group_tagged(nfp_dump_tl_hdr, hdr,
... the rest of members
);
char data[];
};
With the change described above, we now declare objects of the type of
the tagged struct, in this case `struct nfp_dump_tl_hdr`, without
embedding flexible arrays in the middle of another struct:
struct nfp_dumpspec_csr {
struct nfp_dump_tl_hdr tl;
...
__be32 register_width; /* in bits */
};
Also, use `container_of()` whenever we need to retrieve a pointer to
the flexible structure, through which we can access the flexible
array if needed.
So, with these changes, fix 33 of the following warnings:
drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:58:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:64:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:70:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:78:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:87:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
drivers/net/ethernet/netronome/nfp/nfp_net_debugdump.c:92:28: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]
Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZgYWlkxdrrieDYIu@neat
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for AQR114C PHY ID. This PHY advertise 10G speed:
SPEED(0x04): 0x6031
capabilities: -400g +5g +2.5g -200g -25g -10g-xr -100g -40g -10g/1g -10
+100 +1000 -10-ts -2-tl +10g
EXTABLE(0x0B): 0x40fc
capabilities: -10g-cx4 -10g-lrm +10g-t +10g-kx4 +10g-kr +1000-t +1000-kx
+100-tx -10-t -p2mp -40g/100g -1000/100-t1 -25g -200g/400g
+2.5g/5g -1000-h
but supports only up to 5G speed (as with AQR111/111B0).
AQR111 init config is used to set max speed 5G.
Signed-off-by: Paweł Owoc <frut3k7@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240401145114.1699451-1-frut3k7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.
If check_alloc_info hasn't run yet we just want to bail out, otherwise
it's a filesystem inconsistent error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Print out the mode as a string, and also print out the btree and
watermark.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull documentation fixes from Jonathan Corbet:
"Four small documentation fixes"
* tag 'docs-6.9-fixes' of git://git.lwn.net/linux:
docs: zswap: fix shell command format
tracing: Fix documentation on tp_printk cmdline option
docs: Fix bitfield handling in kernel-doc
Documentation: dev-tools: Add link to RV docs
|
|
Pull bcachefs fixes from Kent Overstreet:
"Lots of fixes for situations with extreme filesystem damage.
One fix ("Fix journal pins in btree write buffer") applicable to
normal usage; also a dio performance fix.
New repair/construction code is in the final stages, should be ready
in about a week. Anyone that lost btree interior nodes (or a variety
of other damage) as a result of the splitbrain bug will be able to
repair then"
* tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits)
bcachefs: On emergency shutdown, print out current journal sequence number
bcachefs: Fix overlapping extent repair
bcachefs: Fix remove_dirent()
bcachefs: Logged op errors should be ignored
bcachefs: Improve -o norecovery; opts.recovery_pass_limit
bcachefs: bch2_run_explicit_recovery_pass_persistent()
bcachefs: Ensure bch_sb_field_ext always exists
bcachefs: Flush journal immediately after replay if we did early repair
bcachefs: Resume logged ops after fsck
bcachefs: Add error messages to logged ops fns
bcachefs: Split out recovery_passes.c
bcachefs: fix backpointer for missing alloc key msg
bcachefs: Fix bch2_btree_increase_depth()
bcachefs: Kill bch2_bkey_ptr_data_type()
bcachefs: Fix use after free in check_root_trans()
bcachefs: Fix repair path for missing indirect extents
bcachefs: Fix use after free in bch2_check_fix_ptrs()
bcachefs: Fix btree node keys accounting in topology repair path
bcachefs: Check btree ptr min_key in .invalid
bcachefs: add REQ_SYNC and REQ_IDLE in write dio
...
|
|
mean_and_variance_test_2 and mean_and_variance_test_4 always fail.
The input parameters to those tests are identical to the input parameters
to tests 1 and 3, yet the expected result for tests 2 and 4 is different
for the mean and stddev tests. That will always fail.
Expected mean_and_variance_get_mean(mv) == mean[i], but
mean_and_variance_get_mean(mv) == 22 (0x16)
mean[i] == 10 (0xa)
Drop the bad tests.
Fixes: 65bc41090720 ("mean and variance: More tests")
Closes: https://lore.kernel.org/lkml/065b94eb-6a24-4248-b7d7-d3212efb4787@roeck-us.net/
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath
ath.git patches for v6.10
ath drivers now have no remaining sparse warnings, otherwise smaller
fixes and some refactoring.
ath11k
* P2P support for QCA6390, WCN6855 and QCA2066
|
|
HEAD
KVM/riscv fixes for 6.9, take #1
- Fix spelling mistake in arch_timer selftest
- Remove redundant semicolon in num_isa_ext_regs()
- Fix APLIC setipnum_le/be write emulation
- Fix APLIC in_clrip[x] read emulation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.9, part #1
- Ensure perf events programmed to count during guest execution
are actually enabled before entering the guest in the nVHE
configuration.
- Restore out-of-range handler for stage-2 translation faults.
- Several fixes to stage-2 TLB invalidations to avoid stale
translations, possibly including partial walk caches.
- Fix early handling of architectural VHE-only systems to ensure E2H is
appropriately set.
- Correct a format specifier warning in the arch_timer selftest.
- Make the KVM banner message correctly handle all of the possible
configurations.
|
|
syzkaller started using corpuses where a BPF tracing program deletes
elements from a sockmap/sockhash map. Because BPF tracing programs can be
invoked from any interrupt context, locks taken during a map_delete_elem
operation must be hardirq-safe. Otherwise a deadlock due to lock inversion
is possible, as reported by lockdep:
CPU0 CPU1
---- ----
lock(&htab->buckets[i].lock);
local_irq_disable();
lock(&host->lock);
lock(&htab->buckets[i].lock);
<Interrupt>
lock(&host->lock);
Locks in sockmap are hardirq-unsafe by design. We expects elements to be
deleted from sockmap/sockhash only in task (normal) context with interrupts
enabled, or in softirq context.
Detect when map_delete_elem operation is invoked from a context which is
_not_ hardirq-unsafe, that is interrupts are disabled, and bail out with an
error.
Note that map updates are not affected by this issue. BPF verifier does not
allow updating sockmap/sockhash from a BPF tracing program today.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+bc922f476bd65abbd466@syzkaller.appspotmail.com
Reported-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+d4066896495db380182e@syzkaller.appspotmail.com
Acked-by: John Fastabend <john.fastabend@gmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=d4066896495db380182e
Closes: https://syzkaller.appspot.com/bug?extid=bc922f476bd65abbd466
Link: https://lore.kernel.org/bpf/20240402104621.1050319-1-jakub@cloudflare.com
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
No longer hold RTNL while calling inet6_dump_fib().
Also change return value for a completed dump,
so that NLMSG_DONE can be appended to current skb,
saving one recvmsg() system call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240329183053.644630-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Kicinski says:
====================
genetlink: remove linux/genetlink.h
There are two genetlink headers net/genetlink.h and linux/genetlink.h
This is similar to netlink.h, but for netlink.h both contain good
amount of code. For genetlink.h the linux/ version is leftover
from before uAPI headers were split out, it has 10 lines of code.
Move those 10 lines into other appropriate headers and delete
linux/genetlink.h.
I occasionally open the wrong header in the editor when coding,
I guess I'm not the only one.
v2: https://lore.kernel.org/all/20240325173716.2390605-1-kuba@kernel.org/
v1: https://lore.kernel.org/all/20240309183458.3014713-1-kuba@kernel.org
====================
Link: https://lore.kernel.org/r/20240329175710.291749-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
genetlink.h is a shell of what used to be a combined uAPI
and kernel header over a decade ago. It has fewer than
10 lines of code. Merge it into net/genetlink.h.
In some ways it'd be better to keep the combined header
under linux/ but it would make looking through git history
harder.
Acked-by: Sven Eckelmann <sven@narfation.org>
Link: https://lore.kernel.org/r/20240329175710.291749-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The only legit reason I could think of for net/genetlink.h
and linux/genetlink.h to be separate would be if one was
included by other headers and we wanted to keep it lightweight.
That is not the case, net/openvswitch/meter.h includes
linux/genetlink.h but for no apparent reason (for struct genl_family
perhaps? it's not necessary, types of externs do not need
to be known).
Link: https://lore.kernel.org/r/20240329175710.291749-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are things in linux/genetlink.h which are only used
under net/netlink/. Move them to a new local header.
A new header with just 2 externs isn't great, but alternative
would be to include af_netlink.h in genetlink.c which feels
even worse.
Link: https://lore.kernel.org/r/20240329175710.291749-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-03-29 (net: intel)
This series contains updates to most Intel drivers.
Jesse moves declaration of pci_driver struct to remove need for forward
declarations in igb and converts Intel drivers to user newer power
management ops.
Sasha reworks power management flow on igc to avoid using rtnl_lock()
during those flows.
Maciej reorganizes i40e_nvm file to avoid forward declarations.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: avoid forward declarations in i40e_nvm.c
igc: Refactor runtime power management flow
net: intel: implement modern PM ops declarations
igb: simplify pci ops declaration
====================
Link: https://lore.kernel.org/r/20240329175632.211340-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove
administratively set MAC") fixed an issue where untrusted VF was
allowed to remove its own MAC address although this was assigned
administratively from PF. Unfortunately the introduced check
is wrong because it causes that MAC filters for other MAC addresses
including multi-cast ones are not removed.
<snip>
if (ether_addr_equal(addr, vf->default_lan_addr.addr) &&
i40e_can_vf_change_mac(vf))
was_unimac_deleted = true;
else
continue;
if (i40e_del_mac_filter(vsi, al->list[i].addr)) {
...
</snip>
The else path with `continue` effectively skips any MAC filter
removal except one for primary MAC addr when VF is allowed to do so.
Fix the check condition so the `continue` is only done for primary
MAC address.
Fixes: 73d9629e1c8c ("i40e: Do not allow untrusted VF to remove administratively set MAC")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240329180638.211412-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We lost ability to unload ipv6 module a long time ago.
Instead of calling expensive inet_twsk_purge() twice,
we can handle all families in one round.
Also remove an extra line added in my prior patch,
per Kuniyuki Iwashima feedback.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/20240327192934.6843-1-kuniyu@amazon.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240329153203.345203-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can change inet_csk() to propagate its argument const qualifier,
thanks to container_of_const().
We have to fix few places that had mistakes, like tcp_bound_rto().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240329144931.295800-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Donald Hunter says:
====================
doc: netlink: Add hyperlinks to generated docs
Extend ynl-gen-rst to generate hyperlinks to definitions, attribute sets
and sub-messages from all the places that reference them.
====================
Link: https://lore.kernel.org/r/20240329135021.52534-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The tc spec referenced tc-u32-mark and tc-act-police-attrs but did not
define them. The missing definitions were discovered when building the
docs with generated hyperlinks because the hyperlink target labels were
missing.
Add definitions for tc-u32-mark and tc-act-police-attrs.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240329135021.52534-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update ynl-gen-rst to generate hyperlinks to definitions, attribute
sets and sub-messages from all the places that reference them.
Note that there is a single label namespace for all of the kernel docs.
Hyperlinks within a single netlink doc need to be qualified by the
family name to avoid collisions.
The label format is 'family-type-name' which gives, for example,
'rt-link-attribute-set-link-attrs' as the link id.
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240329135021.52534-3-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The tables of contents in the generated Netlink docs include individual
attribute definitions. This can make the contents exceedingly long and
repeats a lot of what is on the rest of the pages. See for example:
https://docs.kernel.org/networking/netlink_spec/tc.html
Add a depth limit to the contents directive in generated .rst files to
limit the contents depth to 3 levels. This reduces the contents to:
- Family
- Summary
- Operations
- op-one
- op-two
- ...
- Definitions
- struct-one
- struct-two
- enum-one
- ...
- Attribute sets
- attrs-one
- attrs-two
- ...
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240329135021.52534-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: fix fallback MIB counter and wrong var in selftests
Here are two fixes related to MPTCP.
The first patch fixes when the MPTcpExtMPCapableFallbackACK MIB counter
is modified: it should only be incremented when a connection was using
MPTCP options, but then a fallback to TCP has been done. This patch also
checks the counter is not incremented by mistake during the connect
selftests. This counter was wrongly incremented since its introduction
in v5.7.
The second patch fixes a wrong parsing of the 'dev' endpoint options in
the selftests: the wrong variable was used. This option was not used
before, but it is going to be soon. This issue is visible since v5.18.
====================
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-0-324a8981da48@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There's a bug in pm_nl_check_endpoint(), 'dev' didn't be parsed correctly.
If calling it in the 2nd test of endpoint_tests() too, it fails with an
error like this:
creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \
found '10.0.2.2 id 2 subflow dev ns2eth2'
The reason is '$2' should be set to 'dev', not '$1'. This patch fixes it.
Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
Cc: stable@vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-324a8981da48@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Alexei reported the following splat:
WARNING: CPU: 32 PID: 3276 at net/mptcp/subflow.c:1430 subflow_data_ready+0x147/0x1c0
Modules linked in: dummy bpf_testmod(O) [last unloaded: bpf_test_no_cfi(O)]
CPU: 32 PID: 3276 Comm: test_progs Tainted: GO 6.8.0-12873-g2c43c33bfd23
Call Trace:
<TASK>
mptcp_set_rcvlowat+0x79/0x1d0
sk_setsockopt+0x6c0/0x1540
__bpf_setsockopt+0x6f/0x90
bpf_sock_ops_setsockopt+0x3c/0x90
bpf_prog_509ce5db2c7f9981_bpf_test_sockopt_int+0xb4/0x11b
bpf_prog_dce07e362d941d2b_bpf_test_socket_sockopt+0x12b/0x132
bpf_prog_348c9b5faaf10092_skops_sockopt+0x954/0xe86
__cgroup_bpf_run_filter_sock_ops+0xbc/0x250
tcp_connect+0x879/0x1160
tcp_v6_connect+0x50c/0x870
mptcp_connect+0x129/0x280
__inet_stream_connect+0xce/0x370
inet_stream_connect+0x36/0x50
bpf_trampoline_6442491565+0x49/0xef
inet_stream_connect+0x5/0x50
__sys_connect+0x63/0x90
__x64_sys_connect+0x14/0x20
The root cause of the issue is that bpf allows accessing mptcp-level
proto_ops from a tcp subflow scope.
Fix the issue detecting the problematic call and preventing any action.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/482
Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/d8cb7d8476d66cb0812a6e29cd1e626869d9d53e.1711738080.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The netdev CI runs in a VM and captures serial, so stdout and
stderr get combined. Because there's a missing new line in
stderr the test ends up corrupting KTAP:
# Successok 1 selftests: net: reuseaddr_conflict
which should have been:
# Success
ok 1 selftests: net: reuseaddr_conflict
Fixes: 422d8dc6fd3a ("selftest: add a reuseaddr test")
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Link: https://lore.kernel.org/r/20240329160559.249476-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In lan8814_get_sig_rx() and lan8814_get_sig_tx() ptp_parse_header() may
return NULL as ptp_header due to abnormal packet type or corrupted packet.
Fix this bug by adding ptp_header check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240329061631.33199-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Joan Bruguera Micó says:
====================
x86/bpf: Fixes for the BPF JIT with retbleed=stuff
From: Joan Bruguera Micó <joanbrugueram@gmail.com>
Fixes two issues that cause kernels panic when using the BPF JIT with
the call depth tracking / stuffing mitigation for Skylake processors
(`retbleed=stuff`). Both issues can be triggered by running simple
BPF programs (e.g. running the test suite should trigger both).
The first (resubmit) fixes a trivial issue related to calculating the
destination IP for call instructions with call depth tracking.
The second is related to using the correct IP for relocations, related
to the recently introduced %rip-relative addressing for PER_CPU_VAR.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
v2:
Simplify calculation of "ip".
Add more details to the commit message.
Joan Bruguera Micó (1):
x86/bpf: Fix IP for relocating call depth accounting
====================
Link: https://lore.kernel.org/r/20240401185821.224068-1-ubizjak@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The commit:
59bec00ace28 ("x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR()")
made PER_CPU_VAR() to use rip-relative addressing, hence
INCREMENT_CALL_DEPTH macro and skl_call_thunk_template got rip-relative
asm code inside of it. A follow up commit:
17bce3b2ae2d ("x86/callthunks: Handle %rip-relative relocations in call thunk template")
changed x86_call_depth_emit_accounting() to use apply_relocation(),
but mistakenly assumed that the code is being patched in-place (where
the destination of the relocation matches the address of the code),
using *pprog as the destination ip. This is not true for the call depth
accounting, emitted by the BPF JIT, so the calculated address was wrong,
JIT-ed BPF progs on kernels with call depth tracking got broken and
usually caused a page fault.
Pass the destination IP when the BPF JIT emits call depth accounting.
Fixes: 17bce3b2ae2d ("x86/callthunks: Handle %rip-relative relocations in call thunk template")
Signed-off-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240401185821.224068-3-ubizjak@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Adjust the IP passed to `emit_patch` so it calculates the correct offset
for the CALL instruction if `x86_call_depth_emit_accounting` emits code.
Otherwise we will skip some instructions and most likely crash.
Fixes: b2e9dfe54be4 ("x86/bpf: Emit call depth accounting if required")
Link: https://lore.kernel.org/lkml/20230105214922.250473-1-joanbrugueram@gmail.com/
Co-developed-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Signed-off-by: Joan Bruguera Micó <joanbrugueram@gmail.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20240401185821.224068-2-ubizjak@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In case kern_mount() fails and returns an error pointer return in the
error branch instead of continuing and dereferencing the error pointer.
While on it drop the never read static variable selinuxfs_mount.
Cc: stable@vger.kernel.org
Fixes: 0619f0f5e36f ("selinux: wrap selinuxfs state")
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
This adds a new watermark, higher priority than BCH_WATERMARK_reclaim,
for interior btree updates. We've seen a deadlock where journal replay
triggers a ton of btree node merges, and these use up all available open
buckets and then interior updates get stuck.
One cause of this is that we're currently lacking btree node merging on
write buffer btrees - that needs to be fixed as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Sign error when checking the watermark - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux
Pull pwm fix from Uwe Kleine-König:
"This fixes a regression intoduced by an off-by-one in v6.9-rc1 making
the pwm-pxa and the pwm driver in ti-sn65dsi86 unusable for most
consumer drivers because the default period wasn't set"
* tag 'pwm/for-6.9-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
pwm: Fix setting period with #pwm-cells = <1> and of_pwm_single_xlate()
|
|
Simplify devlink lock code in driver by taking it for whole init/cleanup
path. Instead of calling devlink functions that taking lock call the
lockless versions.
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Keep devlink related code in a separate file. More devlink port code and
handlers will be added here for other port operations.
Remove no longer needed include of our devlink.h in ice_lib.c.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Only moving whole files, fixing Makefile and bunch of includes.
Some changes to ice_devlink file was done even in representor part (Tx
topology), so keep it as final patch to not mess up with rebasing.
After moving to devlink folder there is no need to have such long name
for these files. Rename them to simple devlink.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|