Age | Commit message (Collapse) | Author |
|
The argument is getting quite large, so use a struct internally to pass
around the information.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.831ab8a87b6f.I3bcc83d90f41d6f8a47b39528575dad0a9ec3564@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The element should never be inherited, so always exclude it.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214435.c0e17989b4ed.I7cecb5ab7cd6919e61839b50ce5156904b41d7d8@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The helper functions to retrieve the EML capabilities and medium
synchronization delay both assume that the type is correct. Instead of
assuming the length is correct and still checking the type, add a new
helper to check both and don't do any verification.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214435.1b50e7a3b3cf.I9385514d8eb6d6d3c82479a6fa732ef65313e554@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Include the Multi-Link elements found in beacon frames
in the CRC calculation, as these elements are intended
to reflect changes in the AP MLD state.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214435.ae8246b93d85.Ia64b45198de90ff7f70abcc997841157f148ea40@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since regulatory disconnect was added, OCB and NAN interface
types were added, which made it completely unusable for any
driver that allowed OCB/NAN. Add OCB/NAN (though NAN doesn't
do anything, we don't have any info) and also remove all the
logic that opts out, so it won't be broken again if/when new
interface types are added.
Fixes: 6e0bd6c35b02 ("cfg80211: 802.11p OCB mode handling")
Fixes: cb3b7d87652a ("cfg80211: add start / stop NAN commands")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230616222844.2794d1625a26.I8e78a3789a29e6149447b3139df724a6f1b46fc3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The multi-link loop here broke disconnect when multi-link
operation (MLO) isn't active for a given interface, since
in that case valid_links is 0 (indicating no links, i.e.
no MLO.)
Fix this by taking that into account properly and skipping
the link only if there are valid_links in the first place.
Cc: stable@vger.kernel.org
Fixes: 7b0a0e3c3a88 ("wifi: cfg80211: do some rework towards MLO link APIs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230616222844.eb073d650c75.I72739923ef80919889ea9b50de9e4ba4baa836ae@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Rename it to ieee80211_mle_basic_sta_prof_size_ok() as it
validates the size of the station profile included in
Basic Multi-Link element.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.9bdfd263974f.I7bebd26894f33716e93cc7da576ef3215e0ba727@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Parse Reconfiguration Multi Link IE.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.6eeb6c9a4a6e.I1cb137da9b3c712fc7c7949a6dec9e314b5d7f63@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
As a preparation to support Reconfiguration Multi Link
element, rename 'multi_link' and 'multi_link_len' fields
in 'struct ieee802_11_elems' to 'ml_basic' and 'ml_basic_len'.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.b11370d3066a.I34280ae3728597056a6a2f313063962206c0d581@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Use the shared functionality rather than copying it into mac80211.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.7dcbf82baade.Ic68d1f547cb75d66037abdbb0f066db20ff41ba3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is already needed within mac80211 and support is also needed by
cfg80211 to parse ML elements.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.29c3ebeed10d.I009c049289dd0162c2e858ed8b68d2875a672ed6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The removed code ran for any BSS that was not included in the MBSSID
element in order to update it. However, instead of using the correct
inheritance rules, it would simply copy the elements from the
transmitting AP. The result is that we would report incorrect elements
in this case.
After some discussions, it seems that there are likely not even APs
actually using this feature. Either way, removing the code decreases
complexity and makes the cfg80211 behaviour more correct.
Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.cfd6d8db1f26.Ia1044902b86cd7d366400a4bfb93691b8f05d68c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The cfg80211_gen_new_ie function merges the IEs using inheritance rules.
Rewrite this function to fix issues around inheritance rules. In
particular, vendor elements do not require any special handling, as they
are either all inherited or overridden by the subprofile.
Also, add fragmentation handling as this may be needed in some cases.
This also changes the function to not require making a copy. The new
version could be optimized a bit by explicitly tracking which IEs have
been handled already rather than looking that up again every time.
Note that a small behavioural change is the removal of the SSID special
handling. This should be fine for the MBSSID element, as the SSID must
be included in the subelement.
Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.bc6152e146db.I2b5f3bc45085e1901e5b5192a674436adaf94748@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The TBTT information field type must be zero. This is only changed in
the 802.11be draft specification where the value 1 is used to indicate
that only the MLD parameters are included.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.7865606ffe94.I7ff28afb875d1b4c39acd497df8490a7d3628e3f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Doing this simplifies the code somewhat, as iteration over the
nontransmitted BSSs is not required anymore. Also, mac80211 should
not be iterating over the nontrans_list as it should only be accessed
while the bss_lock is held.
It also simplifies parsing of the IEs somewhat, as cfg80211 already
extracts the IEs and passes them to the callback.
Note that the only user left requiring parsing a specific BSS is the
association code if a beacon is required by the hardware.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.39ebfe2f9e59.Ia012b08e0feed8ec431b666888b459f6366f7bd1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This new function is called from within the inform_bss(_frame)_data
functions in order for the driver to update data that it is tracking.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.8d7781b0f965.I80041183072b75c081996a1a5a230b34aff5c668@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
It is reasonable to hold bss_lock for a little bit longer after
cfg80211_bss_update is done. Right now, this does not make any big
difference, but doing so in preparation for the next patch which adds
a call to the driver.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.61701884ff0d.I3358228209eb6766202aff04d1bae0b8fdff611f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
These calls do not require any locking, so move them in preparation for
the next patches.
A minor change/bugfix is to not hint a beacon for nontransmitted BSSes
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.a5bf3558eae9.I33c7465d983c8bef19deb7a533ee475a16f91774@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
For now, fix this only in station mode. We'll need to fix
the AP mode later.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.41e662ba1d68.I8faae5acb45c58cfeeb6bc6247aedbdaf9249d32@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add NULL check for compat variable to avoid crash in
cfg80211_chandef_compatible() if it got called with
some mixed up channel context where not all the users
compatible with each other, which shouldn't happen.
Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.ae0f10dfd36b.Iea98c74aeb87bf6ef49f6d0c8687bba0dbea2abd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In both of these cases (config_link, prep_channel) it is not needed
to parse the MBSSID data for a nontransmitted BSS. In the config_link
case the frame does not contain any MBSSID element and inheritance
rules are only needed for the ML STA profile. While in the
prep_channel case the IEs have already been processed by cfg80211 and
are already exploded.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.66d2605ff0ad.I7cdd1d390e7b0735c46204231a9e636d45b7f1e4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Extend AID element addition in TDLS setup request and response
frames to add it when HE or EHT capabilities are supported.
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.483bf44ce684.Ia2387eb24c06fa41febc213923160bedafce2085@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add HE and EHT capabilities IE in TDLS setup request,
response, confirm and discovery response frames.
Signed-off-by: Abhishek Naik <abhishek.naik@intel.com>
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.c77128828b0d.Ied2d8800847c759718c2c35e8f6c0902afd6bca1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If the device is associated with an AP MLD, then TDLS data frames
should have
- A1 = peer address,
- A2 = own MLD address (since the peer may now know about MLO), and
- A3 = BSSID.
Change the code to do that.
Signed-off-by: Abhishek Naik <abhishek.naik@intel.com>
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.4bf648b63dfd.I98ef1dabd14b74a92120750f7746a7a512011701@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Userspace can now select the link to use for TDLS management
frames (indicating e.g. which BSSID should be used), use the
link_id received from cfg80211 to build the frames.
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.ce1fc230b505.Ie773c5679805001f5a52680d68d9ce0232c57648@changeid
[Benjamin fixed some locking]
Co-developed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
[fix sta mutex locking too]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
For multi-link operation(MLO) TDLS management
frames need to be transmitted on a specific link.
The TDLS setup request will add BSSID along with
peer address and userspace will pass the link-id
based on BSSID value to the driver(or mac80211).
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.cb3d87c22812.Ia3d15ac4a9a182145bf2d418bcb3ddf4539cd0a7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
-Wstringop-overflow is legitimately warning us about extra_size
pontentially being zero at some point, hence potenially ending
up _allocating_ zero bytes of memory for extra pointer and then
trying to access such object in a call to copy_from_user().
Fix this by adding a sanity check to ensure we never end up
trying to allocate zero bytes of data for extra pointer, before
continue executing the rest of the code in the function.
Address the following -Wstringop-overflow warning seen when built
m68k architecture with allyesconfig configuration:
from net/wireless/wext-core.c:11:
In function '_copy_from_user',
inlined from 'copy_from_user' at include/linux/uaccess.h:183:7,
inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:825:7:
arch/m68k/include/asm/string.h:48:25: warning: '__builtin_memset' writing 1 or more bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
48 | #define memset(d, c, n) __builtin_memset(d, c, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/uaccess.h:153:17: note: in expansion of macro 'memset'
153 | memset(to + (n - res), 0, res);
| ^~~~~~
In function 'kmalloc',
inlined from 'kzalloc' at include/linux/slab.h:694:9,
inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:819:10:
include/linux/slab.h:577:16: note: at offset 1 into destination object of size 0 allocated by '__kmalloc'
577 | return __kmalloc(size, flags);
| ^~~~~~~~~~~~~~~~~~~~~~
This help with the ongoing efforts to globally enable
-Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/315
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZItSlzvIpjdjNfd8@work
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In mesh mode, ieee80211_chandef_he_6ghz_oper() is called by
mesh_matches_local() for every received mesh beacon.
On a 6 GHz mesh of a HE-only phy, this spams that the hardware does not
have EHT capabilities, even if the received mesh beacon does not have an
EHT element.
Unlike HE, not supporting EHT in the 6 GHz band is not an error so do
not print anything in this case.
Fixes: 5dca295dd767 ("mac80211: Add initial support for EHT and 320 MHz channels")
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230614132648.28995-1-nicolas.cavallari@green-communications.fr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Add debugfs entry to report dormant (valid but disabled) links.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230611121219.7fa5f022adfb.Iff6fa3e1a3b00ae726612f9d5a31f7fe2fcbfc68@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When the association is complete, do not configure disabled
links, and track them as part of the interface data.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.c194fabeb81a.Iaefdef5ba0492afe9a5ede14c68060a4af36e444@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There are some locking changes that will later otherwise
cause conflicts, so merge wireless into wireless-next to
avoid those.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The double ifdefs (one for the variable declaration and
one around the code) are quite aesthetically displeasing.
Factor this code out into a helper for easier wrapping.
This will become even more ugly when another skb ext
comparison is added in the future.
The resulting machine code looks the same, the compiler
seems to try to use %rax more and some blocks more around
but I haven't spotted minor differences.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 7d81ee8722d6 ("svcrdma: Single-stage RDMA Read") changed the
behavior of svc_rdma_recvfrom() but neglected to update the
documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
In blamed commit, we missed the fact that ip6_validate_gw()
could change dev under us from ip6_route_check_nh()
In this fix, I use GFP_ATOMIC in order to not pass too many additional
arguments to ip6_validate_gw() and ip6_route_check_nh() only
for a rarely used debug feature.
syzbot reported:
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 5006 at lib/refcount.c:31 refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 5006 Comm: syz-executor403 Not tainted 6.4.0-rc5-syzkaller-01229-g97c5209b3d37 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
RIP: 0010:refcount_warn_saturate+0x1d7/0x1f0 lib/refcount.c:31
Code: 05 fb 8e 51 0a 01 e8 98 95 38 fd 0f 0b e9 d3 fe ff ff e8 ac d9 70 fd 48 c7 c7 00 d3 a6 8a c6 05 d8 8e 51 0a 01 e8 79 95 38 fd <0f> 0b e9 b4 fe ff ff 48 89 ef e8 1a d7 c3 fd e9 5c fe ff ff 0f 1f
RSP: 0018:ffffc900039df6b8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888026d71dc0 RSI: ffffffff814c03b7 RDI: 0000000000000001
RBP: ffff888146a505fc R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 1ffff9200073bedc
R13: 00000000ffffffef R14: ffff888146a505fc R15: ffff8880284eb5a8
FS: 0000555556c88300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004585c0 CR3: 000000002b1b1000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
ref_tracker_free+0x539/0x820 lib/ref_tracker.c:236
netdev_tracker_free include/linux/netdevice.h:4097 [inline]
netdev_put include/linux/netdevice.h:4114 [inline]
netdev_put include/linux/netdevice.h:4110 [inline]
fib6_nh_init+0xb96/0x1bd0 net/ipv6/route.c:3624
ip6_route_info_create+0x10f3/0x1980 net/ipv6/route.c:3791
ip6_route_add+0x28/0x150 net/ipv6/route.c:3835
ipv6_route_ioctl+0x3fc/0x570 net/ipv6/route.c:4459
inet6_ioctl+0x246/0x290 net/ipv6/af_inet6.c:569
sock_do_ioctl+0xcc/0x230 net/socket.c:1189
sock_ioctl+0x1f8/0x680 net/socket.c:1306
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
Fixes: 70f7457ad6d6 ("net: create device lookup API with reference tracking")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Per-VMA locking allows us to lock a struct vm_area_struct without
taking the process-wide mmap lock in read mode.
Consider a process workload where the mmap lock is taken constantly in
write mode. In this scenario, all zerocopy receives are periodically
blocked during that period of time - though in principle, the memory
ranges being used by TCP are not touched by the operations that need
the mmap write lock. This results in performance degradation.
Now consider another workload where the mmap lock is never taken in
write mode, but there are many TCP connections using receive zerocopy
that are concurrently receiving. These connections all take the mmap
lock in read mode, but this does induce a lot of contention and atomic
ops for this process-wide lock. This results in additional CPU
overhead caused by contending on the cache line for this lock.
However, with per-vma locking, both of these problems can be avoided.
As a test, I ran an RPC-style request/response workload with 4KB
payloads and receive zerocopy enabled, with 100 simultaneous TCP
connections. I measured perf cycles within the
find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
without per-vma locking enabled.
When using process-wide mmap semaphore read locking, about 1% of
measured perf cycles were within this path. With per-VMA locking, this
value dropped to about 0.45%.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
$ make C=1 W=1 net/sunrpc/svc.o
make[1]: Entering directory 'linux/obj/manet.1015granger.net'
GEN Makefile
CALL linux/server-development/scripts/checksyscalls.sh
DESCEND objtool
INSTALL libsubcmd_headers
DESCEND bpf/resolve_btfids
INSTALL libsubcmd_headers
CC [M] net/sunrpc/svc.o
CHECK linux/server-development/net/sunrpc/svc.c
linux/server-development/net/sunrpc/svc.c:1225:9: warning: incorrect type in argument 1 (different address spaces)
linux/server-development/net/sunrpc/svc.c:1225:9: expected struct spinlock [usertype] *lock
linux/server-development/net/sunrpc/svc.c:1225:9: got struct spinlock [noderef] __rcu *
linux/server-development/net/sunrpc/svc.c:1227:40: warning: incorrect type in argument 1 (different address spaces)
linux/server-development/net/sunrpc/svc.c:1227:40: expected struct spinlock [usertype] *lock
linux/server-development/net/sunrpc/svc.c:1227:40: got struct spinlock [noderef] __rcu *
make[1]: Leaving directory 'linux/obj/manet.1015granger.net'
Warning introduced by commit 913292c97d75 ("sched.h: Annotate
sighand_struct with __rcu").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Part of an effort to remove strlcpy() tree-wide [1].
Direct replacement is safe here since the getter in kernel_params_ops
handles -errno return [2].
[1] https://github.com/KSPP/linux/issues/89
[2] https://elixir.bootlin.com/linux/v6.4-rc6/source/include/linux/moduleparam.h#L52
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Remove a couple of dprintk call sites that are of little value.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
The preceding block comment before svc_register_xprt_class() is
not related to that function.
While we're here, add proper documenting comments for these two
publicly-visible functions.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Clean up.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This event brackets the svcrdma_post_* trace points. If this trace
event is enabled but does not appear as expected, that indicates a
chunk_ctxt leak.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Try to catch incorrect calling contexts mechanically rather than by
code review.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Micro-optimization: Call ktime_get() only when ->xpo_recvfrom() has
given us a full RPC message to process. rq_stime isn't used
otherwise, so this avoids pointless work.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Now that we have bulk page allocation and release APIs, it's more
efficient to use those than it is for nfsd threads to wait for send
completions. Previous patches have eliminated the calls to
wait_for_completion() and complete(), in order to avoid scheduler
overhead.
Now release pages-under-I/O in the send completion handler using
the efficient bulk release API.
I've measured a 7% reduction in cumulative CPU utilization in
svc_rdma_sendto(), svc_rdma_wc_send(), and svc_xprt_release(). In
particular, using release_pages() instead of complete() cuts the
time per svc_rdma_wc_send() call by two-thirds. This helps improve
scalability because svc_rdma_wc_send() is single-threaded per
connection.
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
I noticed that svc_rqst_release_pages() was still unnecessarily
releasing a page when svc_rdma_recvfrom() returns zero.
Fixes: a53d5cb0646a ("svcrdma: Avoid releasing a page in svc_xprt_release()")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Under certain circumstances, the tcp receive buffer memory limit
set by autotuning (sk_rcvbuf) is increased due to incoming data
packets as a result of the window not closing when it should be.
This can result in the receive buffer growing all the way up to
tcp_rmem[2], even for tcp sessions with a low BDP.
To reproduce: Connect a TCP session with the receiver doing
nothing and the sender sending small packets (an infinite loop
of socket send() with 4 bytes of payload with a sleep of 1 ms
in between each send()). This will cause the tcp receive buffer
to grow all the way up to tcp_rmem[2].
As a result, a host can have individual tcp sessions with receive
buffers of size tcp_rmem[2], and the host itself can reach tcp_mem
limits, causing the host to go into tcp memory pressure mode.
The fundamental issue is the relationship between the granularity
of the window scaling factor and the number of byte ACKed back
to the sender. This problem has previously been identified in
RFC 7323, appendix F [1].
The Linux kernel currently adheres to never shrinking the window.
In addition to the overallocation of memory mentioned above, the
current behavior is functionally incorrect, because once tcp_rmem[2]
is reached when no remediations remain (i.e. tcp collapse fails to
free up any more memory and there are no packets to prune from the
out-of-order queue), the receiver will drop in-window packets
resulting in retransmissions and an eventual timeout of the tcp
session. A receive buffer full condition should instead result
in a zero window and an indefinite wait.
In practice, this problem is largely hidden for most flows. It
is not applicable to mice flows. Elephant flows can send data
fast enough to "overrun" the sk_rcvbuf limit (in a single ACK),
triggering a zero window.
But this problem does show up for other types of flows. Examples
are websockets and other type of flows that send small amounts of
data spaced apart slightly in time. In these cases, we directly
encounter the problem described in [1].
RFC 7323, section 2.4 [2], says there are instances when a retracted
window can be offered, and that TCP implementations MUST ensure
that they handle a shrinking window, as specified in RFC 1122,
section 4.2.2.16 [3]. All prior RFCs on the topic of tcp window
management have made clear that sender must accept a shrunk window
from the receiver, including RFC 793 [4] and RFC 1323 [5].
This patch implements the functionality to shrink the tcp window
when necessary to keep the right edge within the memory limit by
autotuning (sk_rcvbuf). This new functionality is enabled with
the new sysctl: net.ipv4.tcp_shrink_window
Additional information can be found at:
https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
[4] https://www.rfc-editor.org/rfc/rfc793
[5] https://www.rfc-editor.org/rfc/rfc1323
Signed-off-by: Mike Freemon <mfreemon@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
devlink_port_type_warn is scheduled for port devlink and warning
when the port type is not set. But from this warning it is not easy
found out which device (driver) has no devlink port set.
[ 3709.975552] Type was not set for devlink port.
[ 3709.975579] WARNING: CPU: 1 PID: 13092 at net/devlink/leftover.c:6775 devlink_port_type_warn+0x11/0x20
[ 3709.993967] Modules linked in: openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nfnetlink bluetooth rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs vhost_net vhost vhost_iotlb tap tun bridge stp llc qrtr intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal mlx5_ib intel_powerclamp coretemp dell_wmi ledtrig_audio sparse_keymap ipmi_ssif kvm_intel ib_uverbs rfkill ib_core video kvm iTCO_wdt acpi_ipmi intel_vsec irqbypass ipmi_si iTCO_vendor_support dcdbas ipmi_devintf mei_me ipmi_msghandler rapl mei intel_cstate isst_if_mmio isst_if_mbox_pci dell_smbios intel_uncore isst_if_common i2c_i801 dell_wmi_descriptor wmi_bmof i2c_smbus intel_pch_thermal pcspkr acpi_power_meter xfs libcrc32c sd_mod sg nvme_tcp mgag200 i2c_algo_bit nvme_fabrics drm_shmem_helper drm_kms_helper nvme syscopyarea ahci sysfillrect sysimgblt nvme_core fb_sys_fops crct10dif_pclmul libahci mlx5_core sfc crc32_pclmul nvme_common drm
[ 3709.994030] crc32c_intel mtd t10_pi mlxfw libata tg3 mdio megaraid_sas psample ghash_clmulni_intel pci_hyperv_intf wmi dm_multipath sunrpc dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse
[ 3710.108431] CPU: 1 PID: 13092 Comm: kworker/1:1 Kdump: loaded Not tainted 5.14.0-319.el9.x86_64 #1
[ 3710.108435] Hardware name: Dell Inc. PowerEdge R750/0PJ80M, BIOS 1.8.2 09/14/2022
[ 3710.108437] Workqueue: events devlink_port_type_warn
[ 3710.108440] RIP: 0010:devlink_port_type_warn+0x11/0x20
[ 3710.108443] Code: 84 76 fe ff ff 48 c7 03 20 0e 1a ad 31 c0 e9 96 fd ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 48 c7 c7 18 24 4e ad e8 ef 71 62 ff <0f> 0b c3 cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f6 87
[ 3710.108445] RSP: 0018:ff3b6d2e8b3c7e90 EFLAGS: 00010282
[ 3710.108447] RAX: 0000000000000000 RBX: ff366d6580127080 RCX: 0000000000000027
[ 3710.108448] RDX: 0000000000000027 RSI: 00000000ffff86de RDI: ff366d753f41f8c8
[ 3710.108449] RBP: ff366d658ff5a0c0 R08: ff366d753f41f8c0 R09: ff3b6d2e8b3c7e18
[ 3710.108450] R10: 0000000000000001 R11: 0000000000000023 R12: ff366d753f430600
[ 3710.108451] R13: ff366d753f436900 R14: 0000000000000000 R15: ff366d753f436905
[ 3710.108452] FS: 0000000000000000(0000) GS:ff366d753f400000(0000) knlGS:0000000000000000
[ 3710.108453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3710.108454] CR2: 00007f1c57bc74e0 CR3: 000000111d26a001 CR4: 0000000000773ee0
[ 3710.108456] PKRU: 55555554
[ 3710.108457] Call Trace:
[ 3710.108458] <TASK>
[ 3710.108459] process_one_work+0x1e2/0x3b0
[ 3710.108466] ? rescuer_thread+0x390/0x390
[ 3710.108468] worker_thread+0x50/0x3a0
[ 3710.108471] ? rescuer_thread+0x390/0x390
[ 3710.108473] kthread+0xdd/0x100
[ 3710.108477] ? kthread_complete_and_exit+0x20/0x20
[ 3710.108479] ret_from_fork+0x1f/0x30
[ 3710.108485] </TASK>
[ 3710.108486] ---[ end trace 1b4b23cd0c65d6a0 ]---
After patch:
[ 402.473064] ice 0000:41:00.0: Type was not set for devlink port.
[ 402.473064] ice 0000:41:00.1: Type was not set for devlink port.
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230615095447.8259-1-poros@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current mctp_newroute() contains two exactly same check against
rtm->rtm_type
static int mctp_newroute(...)
{
...
if (rtm->rtm_type != RTN_UNICAST) { // (1)
NL_SET_ERR_MSG(extack, "rtm_type must be RTN_UNICAST");
return -EINVAL;
}
...
if (rtm->rtm_type != RTN_UNICAST) // (2)
return -EINVAL;
...
}
This commits removes the (2) check as it is redundant.
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20230615152240.1749428-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
kcm_write_msgs() calls unreserve_psock() to release its hold on the
underlying TCP socket if it has run out of things to transmit, but if we
have nothing in the write queue on entry (e.g. because someone did a
zero-length sendmsg), we don't actually go into the transmission loop and
as a consequence don't call reserve_psock().
Fix this by skipping the call to unreserve_psock() if we didn't reserve a
psock.
Fixes: c31a25e1db48 ("kcm: Send multiple frags in one sendmsg()")
Reported-by: syzbot+dd1339599f1840e4cc65@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000a61ffe05fe0c3d08@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+dd1339599f1840e4cc65@syzkaller.appspotmail.com
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/20787.1686828722@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since the return values
from the helper macros are ignored by the callers.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230613003326.3538391-1-azeemshaikh38@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|