Age | Commit message (Collapse) | Author |
|
The `char` type with no explicit sign is sometimes signed and sometimes
unsigned. This code will break on platforms such as arm, where char is
unsigned. So mark it here as explicitly signed, so that the
todrop_counter decrement and subsequent comparison is correct.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Jakub reported that the addition of the "network_byte_order"
member in struct nla_policy increases size of 32bit platforms.
Instead of scraping the bit from elsewhere Johannes suggested
to add explicit NLA_BE types instead, so do this here.
NLA_POLICY_MAX_BE() macro is removed again, there is no need
for it: NLA_POLICY_MAX(NLA_BE.., ..) will do the right thing.
NLA_BE64 can be added later.
Fixes: 08724ef69907 ("netlink: introduce NLA_POLICY_MAX_BE")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20221031123407.9158-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When lan966x was receiving a frame, then it was building the skb and
after that it was calling dma_unmap_single with frame size as the
length. This actually has 2 issues:
1. It is using a length to map and a different length to unmap.
2. When the unmap was happening, the data was sync for cpu but it could
be that this will overwrite what build_skb was initializing.
The fix for these two problems is to change the order of operations.
First to sync the frame for cpu, then to build the skb and in the end to
unmap using the correct size but without sync the frame again for cpu.
Fixes: c8349639324a ("net: lan966x: Add FDMA functionality")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20221031133421.1283196-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After commits 36a6503fedda ("tcp: refine tcp_prune_ofo_queue()
to not drop all packets") and 72cd43ba64fc1
("tcp: free batches of packets in tcp_prune_ofo_queue()")
tcp_prune_ofo_queue() drops a fraction of ooo queue,
to make room for incoming packet.
However it makes no sense to drop packets that are
before the incoming packet, in sequence space.
In order to recover from packet losses faster,
it makes more sense to only drop ooo packets
which are after the incoming packet.
Tested:
packetdrill test:
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [3800], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 0>
+.1 < . 1:1(0) ack 1 win 1024
+0 accept(3, ..., ...) = 4
+.01 < . 200:300(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 200:300>
+.01 < . 400:500(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 400:500 200:300>
+.01 < . 600:700(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 600:700 400:500 200:300>
+.01 < . 800:900(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 800:900 600:700 400:500 200:300>
+.01 < . 1000:1100(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 1000:1100 800:900 600:700 400:500>
+.01 < . 1200:1300(100) ack 1 win 1024
+0 > . 1:1(0) ack 1 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
// this packet is dropped because we have no room left.
+.01 < . 1400:1500(100) ack 1 win 1024
+.01 < . 1:200(199) ack 1 win 1024
// Make sure kernel did not drop 200:300 sequence
+0 > . 1:1(0) ack 300 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
// Make room, since our RCVBUF is very small
+0 read(4, ..., 299) = 299
+.01 < . 300:400(100) ack 1 win 1024
+0 > . 1:1(0) ack 500 <nop,nop, sack 1200:1300 1000:1100 800:900 600:700>
+.01 < . 500:600(100) ack 1 win 1024
+0 > . 1:1(0) ack 700 <nop,nop, sack 1200:1300 1000:1100 800:900>
+0 read(4, ..., 400) = 400
+.01 < . 700:800(100) ack 1 win 1024
+0 > . 1:1(0) ack 900 <nop,nop, sack 1200:1300 1000:1100>
+.01 < . 900:1000(100) ack 1 win 1024
+0 > . 1:1(0) ack 1100 <nop,nop, sack 1200:1300>
+.01 < . 1100:1200(100) ack 1 win 1024
// This checks that 1200:1300 has not been removed from ooo queue
+0 > . 1:1(0) ack 1300
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221101035234.3910189-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Horatiu Vultur says:
====================
net: lan966x: Fixes for when MTU is changed
There were multiple problems in different parts of the driver when
the MTU was changed.
The first problem was that the HW was missing to configure the correct
value, it was missing ETH_HLEN and ETH_FCS_LEN. The second problem was
when vlan filtering was enabled/disabled, the MRU was not adjusted
corretly. While the last issue was that the FDMA was calculated wrongly
the correct maximum MTU.
====================
Link: https://lore.kernel.org/r/20221030213636.1031408-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When MTU is changed, FDMA is required to calculate what is the maximum
size of the frame that it can received. So it can calculate what is the
page order needed to allocate for the received frames.
The first problem was that, when the max MTU was calculated it was
reading the value from dev and not from HW, so in this way it was
missing L2 header + the FCS.
The other problem was that once the skb is created using
__build_skb_around, it would reserve some space for skb_shared_info.
So if we received a frame which size is at the limit of the page order
then the creating will failed because it would not have space to put all
the data.
Fixes: 2ea1cbac267e ("net: lan966x: Update FDMA to change MTU.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When vlan filtering is enabled/disabled, it is required to adjust the
maximum received frame size that it can received. When vlan filtering is
enabled, it would all to receive extra 4 bytes, that are the vlan tag.
So the maximum frame size would be 1522 with a vlan tag. If vlan
filtering is disabled then the maximum frame size would be 1518
regardless if there is or not a vlan tag.
Fixes: 6d2c186afa5d ("net: lan966x: Add vlan support.")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the MTU was changed, the lan966x didn't take in consideration
the L2 header and the FCS. So the HW was configured with a smaller
value than what was desired. Therefore the correct value to configure
the HW would be new_mtu + ETH_HLEN + ETH_FCS_LEN.
The vlan tag is not considered here, because at the time when the
blamed commit was added, there was no vlan filtering support. The
vlan fix will be part of the next patch.
Fixes: d28d6d2e37d1 ("net: lan966x: add port module support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
inet[46]_pton check the input length against
a sane length limit (INET[6]_ADDRSTRLEN), but
the strlen value gets truncated due to being stored in an int,
so there's a theoretical potential for a >4G string to pass
the limit test.
Use size_t since that's what strlen actually returns.
I've had a hunt for callers that could hit this, but
I've not managed to find anything that doesn't get checked with
some other limit first; but it's possible that I've missed
something in the depth of the storage target paths.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20221029014604.114024-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull documentation fixes from Jonathan Corbet:
"Four small fixes for the docs tree"
* tag 'docs-6.1-fixes' of git://git.lwn.net/linux:
docs/process/howto: Replace C89 with C11
Documentation: Fix spelling mistake in hacking.rst
Documentation: process: replace outdated LTS table w/ link
tracing/histogram: Update document for KEYS_MAX size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Fix a loop that occurs when using multiple net namespaces
* tag 'nfsd-6.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix net-namespace logic in __nfsd_file_cache_purge
|
|
If the namespace doesn't match the one in "net", then we'll continue,
but that doesn't cause another rhashtable_walk_next call, so it will
loop infinitely.
Fixes: ce502f81ba88 ("NFSD: Convert the filecache to use rhashtable")
Reported-by: Petr Vorel <pvorel@suse.cz>
Link: https://lore.kernel.org/ltp/Y1%2FP8gDAcWC%2F+VR3@pevik/
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc fixes from Paul McKenney:
"This contains a couple of fixes for string-function bugs"
* tag 'nolibc-urgent.2022.10.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
tools/nolibc/string: Fix memcmp() implementation
tools/nolibc: Fix missing strlen() definition and infinite loop with gcc-12
|
|
Pull kvm fixes from Paolo Bonzini:
"x86:
- fix lock initialization race in gfn-to-pfn cache (+selftests)
- fix two refcounting errors
- emulator fixes
- mask off reserved bits in CPUID
- fix bug with disabling SGX
RISC-V:
- update MAINTAINERS"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign()
KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format
KVM: x86: emulator: update the emulation mode after CR0 write
KVM: x86: emulator: update the emulation mode after rsm
KVM: x86: emulator: introduce emulator_recalc_and_set_mode
KVM: x86: emulator: em_sysexit should update ctxt->mode
KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test
KVM: selftests: Add tests in xen_shinfo_test to detect lock races
KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache
KVM: Initialize gfn_to_pfn_cache locks in dedicated helper
KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable
KVM: x86: Exempt pending triple fault from event injection sanity check
MAINTAINERS: git://github -> https://github.com for kvm-riscv
KVM: debugfs: Return retval of simple_attr_open() if it fails
KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open()
KVM: x86: Mask off reserved bits in CPUID.8000001FH
KVM: x86: Mask off reserved bits in CPUID.8000001AH
KVM: x86: Mask off reserved bits in CPUID.80000008H
KVM: x86: Mask off reserved bits in CPUID.80000006H
KVM: x86: Mask off reserved bits in CPUID.80000001H
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- fix use after free in exar driver
- spelling fix in comment
* tag 'linux-watchdog-6.1-rc4' of git://www.linux-watchdog.org/linux-watchdog:
drivers: watchdog: exar_wdt.c fix use after free
watchdog: sp805_wdt: fix spelling typo in comment
|
|
If an error occurs after the first kzalloc() the corresponding memory
allocation is never freed.
Add the missing kfree() in the error handling path, as already done in the
remove() function.
Fixes: 7e773594dada ("sfc: Separate efx_nic memory from net_device memory")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/dc114193121c52c8fa3779e49bdd99d4b41344a9.1667077009.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix 'cofiguration' typo in BPF samples README.
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221030180254.34138-1-tegongkang@gmail.com
|
|
No need to postpone this to the commit release path, since no packets
are walking over this object, this is accessed from control plane only.
This helped uncovered UAF triggered by races with the netlink notifier.
Fixes: 9dd732e0bdf5 ("netfilter: nf_tables: memleak flow rule from commit path")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
commit release path is invoked via call_rcu and it runs lockless to
release the objects after rcu grace period. The netlink notifier handler
might win race to remove objects that the transaction context is still
referencing from the commit release path.
Call rcu_barrier() to ensure pending rcu callbacks run to completion
if the list of transactions to be destroyed is not empty.
Fixes: 6001a930ce03 ("netfilter: nftables: introduce table ownership")
Reported-by: syzbot+8f747f62763bc6c32916@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
On 32-bit kernels, 64-bit syscall arguments are split into two
registers. For that to work with syscall wrappers, the prototype of the
syscall must have the argument split so that the wrapper macro properly
unpacks the arguments from pt_regs.
The fanotify_mark() syscall is one such syscall, which already has a
split prototype, guarded behind ARCH_SPLIT_ARG64.
So select ARCH_SPLIT_ARG64 to get that prototype and fix fanotify_mark()
on 32-bit kernels with syscall wrappers.
Note also that fanotify_mark() is the only usage of ARCH_SPLIT_ARG64.
Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221101034852.2340319-1-mpe@ellerman.id.au
|
|
Eric Dumazet says:
====================
inet: add drop monitor support
I recently tried to analyse flakes in ip_defrag selftest.
This failed miserably.
IPv4 and IPv6 reassembly units are causing false kfree_skb()
notifications. It is time to deal with this issue.
First two patches are changing core networking to better
deal with eventual skb frag_list chains, in respect
of kfree_skb/consume_skb status.
Last three patches are adding three new drop reasons,
and make sure skbs that have been reassembled into
a large datagram are no longer viewed as dropped ones.
After this, understanding why ip_defrag selftest is flaky
is possible using standard drop monitoring tools.
====================
Link: https://lore.kernel.org/r/20221029154520.2747444-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IPv4 reassembly unit can decide to drop frags based on
/proc/sys/net/ipv4/ipfrag_max_dist sysctl.
Add a specific drop reason to track this specific
and weird case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Used to track skbs freed after a timeout happened
in a reassmbly unit.
Passing a @reason argument to inet_frag_rbtree_purge()
allows to use correct consumed status for frags
that have been successfully re-assembled.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is used to track when a duplicate segment received by various
reassembly units is dropped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When an skb with a frag list is consumed, we currently
pretend all skbs in the frag list were dropped.
In order to fix this, add a @reason argument to skb_release_data()
and skb_release_all().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This will allow to simply use in the future:
kfree_skb_reason(skb, reason);
Instead of repeating sequences like:
if (dropped)
kfree_skb_reason(skb, reason);
else
consume_skb(skb);
For instance, following patch in the series is adding
@reason to skb_release_data() and skb_release_all(),
so that we can propagate a meaningful @reason whenever
consume_skb()/kfree_skb() have to take care of a potential frag_list.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RDMA overflows can happen if the Ethernet controller does not have
enough bandwidth allocated at the memory controller level, report RDMA
overflows and deal with saturation, similar to the RBUF overflow
counter.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221028222141.3208429-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Recently, we got two syzkaller problems because of oversize packet
when napi frags enabled.
One of the problems is because the first seg size of the iov_iter
from user space is very big, it is 2147479538 which is bigger than
the threshold value for bail out early in __alloc_pages(). And
skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc
reserves without __GFP_NOWARN flag. Thus we got a warning as following:
========================================================
WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
...
Call trace:
__alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
__alloc_pages_node include/linux/gfp.h:550 [inline]
alloc_pages_node include/linux/gfp.h:564 [inline]
kmalloc_large_node+0x94/0x350 mm/slub.c:4038
__kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
__kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
__skb_grow include/linux/skbuff.h:2779 [inline]
tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
The other problem is because odd IPv6 packets without NEXTHDR_NONE
extension header and have big packet length, it is 2127925 which is
bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in
ipv6_gro_receive(), network_header offset and transport_header offset
are all bigger than U16_MAX. That would trigger skb->network_header
and skb->transport_header overflow error, because they are all '__u16'
type. Eventually, it would affect the value for __skb_push(skb, value),
and make it be a big value. After __skb_push() in ipv6_gro_receive(),
skb->data would less than skb->head, an out of bounds memory bug occurred.
That would trigger the problem as following:
==================================================================
BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260
...
Call trace:
dump_backtrace+0xd8/0x130
show_stack+0x1c/0x50
dump_stack_lvl+0x64/0x7c
print_address_description.constprop.0+0xbc/0x2e8
print_report+0x100/0x1e4
kasan_report+0x80/0x120
__asan_load8+0x78/0xa0
eth_type_trans+0x100/0x260
napi_gro_frags+0x164/0x550
tun_get_user+0xda4/0x1270
tun_chr_write_iter+0x74/0x130
do_iter_readv_writev+0x130/0x1ec
do_iter_write+0xbc/0x1e0
vfs_writev+0x13c/0x26c
To fix the problems, restrict the packet size less than
(ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved
skb space in napi_alloc_skb() because transport_header is an offset from
skb->head. Add len check in tun_napi_alloc_frags() simply.
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the FTMAC100 is used as a DSA master, then it is expected that frames
which are MTU sized on the wire facing the external switch port (1500
octets in L2 payload, plus L2 header) also get a DSA tag when seen by
the host port.
This extra tag increases the length of the packet as the host port sees
it, and the FTMAC100 is not prepared to handle frames whose length
exceeds 1518 octets (including FCS) at all.
Only a minimal rework is needed to support this configuration. Since
MTU-sized DSA-tagged frames still fit within a single buffer (RX_BUF_SIZE),
we just need to optimize the resource management rather than implement
multi buffer RX.
In ndo_change_mtu(), we toggle the FTMAC100_MACCR_RX_FTL bit to tell the
hardware to drop (or not) frames with an L2 payload length larger than
1500. We need to replicate the MACCR configuration in ftmac100_start_hw()
as well, since there is a hardware reset there which clears previous
settings.
The advantage of dynamically changing FTMAC100_MACCR_RX_FTL is that when
dev->mtu is at the default value of 1500, large frames are automatically
dropped in hardware and we do not spend CPU cycles dropping them.
Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Link: https://lore.kernel.org/r/20221028183220.155948-3-saproj@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver uses the MAX_PKT_SIZE (1518) for both MTU reporting and for
TX. However, the 2 places do not measure the same thing.
On TX, skb->len measures the entire L2 packet length (without FCS, which
software does not possess). So the comparison against 1518 there is
correct.
What is not correct is the reporting of dev->max_mtu as 1518. Since MTU
measures L2 *payload* length (excluding L2 overhead) and not total L2
packet length, it means that the correct max_mtu supported by this
device is the standard 1500. Anything higher than that will be dropped
on RX currently.
To fix this, subtract VLAN_ETH_HLEN from MAX_PKT_SIZE when reporting the
max_mtu, since that is the difference between L2 payload length and
total L2 length as seen by software.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Link: https://lore.kernel.org/r/20221028183220.155948-2-saproj@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eliminate one check in the data path and move it elsewhere, to where our
real limitation is. We'll want to start processing "too long" frames in
the driver (currently there is a hardware MAC setting which drops
theses).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Link: https://lore.kernel.org/r/20221028183220.155948-1-saproj@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the .port_set_rgmii_delay hook is missing for the 88E6320
family, which causes failure to retrieve an IP address via DHCP.
Add mv88e6320_port_set_rgmii_delay() that allows applying the RGMII
delay for ports 2, 5, and 6, which are the only ports that can be used
in RGMII mode.
Tested on a custom i.MX8MN board connected to an 88E6320 switch.
This change also applies safely to the 88E6321 variant.
The only difference between 88E6320 versus 88E6321 is the temperature
grade and pinout.
They share exactly the same MDIO register map for ports 2, 5, and 6,
which are the only ports that can be used in RGMII mode.
Signed-off-by: Steffen Bätz <steffen@innosonix.de>
[fabio: Improved commit log and extended it to mv88e6321_ops]
Signed-off-by: Fabio Estevam <festevam@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221028163158.198108-1-festevam@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Changed maintainers for vnic driver, since Dany has new responsibilities.
Also added Nick Child as reviewer.
Signed-off-by: Rick Lindsley <ricklind@us.ibm.com>
Link: https://lore.kernel.org/r/20221028203509.4070154-1-ricklind@us.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Hangbin Liu says:
====================
rtnetlink: Honour NLM_F_ECHO flag in rtnl_{new, del}link
Netlink messages are used for communicating between user and kernel space.
When user space configures the kernel with netlink messages, it can set the
NLM_F_ECHO flag to request the kernel to send the applied configuration back
to the caller. This allows user space to retrieve configuration information
that are filled by the kernel (either because these parameters can only be
set by the kernel or because user space let the kernel choose a default
value).
The kernel has support this feature in some places like RTM_{NEW, DEL}ADDR,
RTM_{NEW, DEL}ROUTE. This patch set handles NLM_F_ECHO flag and send link
info back after rtnl_{new, del}link.
====================
Link: https://lore.kernel.org/r/20221028084224.3509611-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch use the new helper unregister_netdevice_many_notify() for
rtnl_delete_link(), so that the kernel could reply unicast when userspace
set NLM_F_ECHO flag to request the new created interface info.
At the same time, the parameters of rtnl_delete_link() need to be updated
since we need nlmsghdr and portid info.
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch pass the netlink header message in rtnl_newlink_create() to
the new updated rtnl_configure_link(), so that the kernel could reply
unicast when userspace set NLM_F_ECHO flag to request the new created
interface info.
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add new helper unregister_netdevice_many_notify(), pass netlink message
header and portid, which could be used to notify userspace when flag
NLM_F_ECHO is set.
Make the unregister_netdevice_many() as a wrapper of new function
unregister_netdevice_many_notify().
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch pass netlink message header and portid to rtnl_configure_link()
All the functions in this call chain need to add the parameters so we can
use them in the last call rtnl_notify(), and notify the userspace about
the new link info if NLM_F_ECHO flag is set.
- rtnl_configure_link()
- __dev_notify_flags()
- rtmsg_ifinfo()
- rtmsg_ifinfo_event()
- rtmsg_ifinfo_build_skb()
- rtmsg_ifinfo_send()
- rtnl_notify()
Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub
suggested.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When this device is deferred, there is often no way to determine what
the cause was. Add some debug prints to make it easier to figure out
what is blocking the probe.
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Link: https://lore.kernel.org/r/20221027190005.400839-1-sean.anderson@seco.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With the introduction of syscall wrappers all wrappers for syscalls with
64-bit arguments must be handled specially, not only those that have
unaligned 64-bit arguments. This left out the fallocate() and
sync_file_range2() syscalls.
Fixes: 7e92e01b7245 ("powerpc: Provide syscall wrapper")
Fixes: e23750623835 ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/87mt9cxd6g.fsf_-_@igel.home
|
|
The macros are defined backwards.
This affects the following compat syscalls:
- compat_sys_truncate64()
- compat_sys_ftruncate64()
- compat_sys_fallocate()
- compat_sys_sync_file_range()
- compat_sys_fadvise64_64()
- compat_sys_readahead()
- compat_sys_pread64()
- compat_sys_pwrite64()
Fixes: 43d5de2b67d7 ("asm-generic: compat: Support BE for long long args in 32-bit ABIs")
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Add list of affected syscalls]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/871qqoyvni.fsf_-_@igel.home
|
|
Now that the 32bit UP oddity is gone and 32bit uses always a sequence
count, there is no need for the fetch_irq() variants anymore.
Convert to the regular interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20221026123110.331690-1-bigeasy@linutronix.de
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes and regression fixes:
- fix a corner case when handling tree-mod-log chagnes in reallocated
notes
- fix crash on raid0 filesystems created with <5.4 mkfs.btrfs that
could lead to division by zero
- add missing super block checksum verification after thawing
filesystem
- handle one more case in send when dealing with orphan files
- fix parameter type mismatch for generation when reading dentry
- improved error handling in raid56 code
- better struct bio packing after recent cleanups"
* tag 'for-6.1-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: don't use btrfs_chunk::sub_stripes from disk
btrfs: fix type of parameter generation in btrfs_get_dentry
btrfs: send: fix send failure of a subcase of orphan inodes
btrfs: make thaw time super block check to also verify checksum
btrfs: fix tree mod log mishandling of reallocated nodes
btrfs: reorder btrfs_bio for better packing
btrfs: raid56: avoid double freeing for rbio if full_stripe_write() failed
btrfs: raid56: properly handle the error when unable to find the missing stripe
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM fix from Paul Moore:
"A single patch to the capabilities code to fix a potential memory leak
in the xattr allocation error handling"
* tag 'lsm-pr-20221031' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
|
|
Avoid that the hardware path is shown twice in the kernel log, and clean
up the output of the version numbers to show up in the same order as
they are listed in the hardware database in the hardware.c file.
Additionally, optimize the memory footprint of the hardware database
and mark some code as init code.
Fixes: cab56b51ec0e ("parisc: Fix device names in /proc/iomem")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.9+
|
|
We can't use "skb" again after passing it to qdisc_enqueue(). This is
basically identical to commit 2f09707d0c97 ("sch_sfb: Also store skb
len before calling child enqueue").
Fixes: d7f4f332f082 ("sch_red: update backlog as well")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Variable i is just being incremented and it's never used anywhere else. The
variable and the increment are redundant so remove it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jacob Keller says:
====================
ptp: convert drivers to .adjfine
Many drivers implementing PTP have not yet migrated to the new .adjfine
frequency adjustment implementation.
A handful of these drivers use hardware with a simple increment value which
is adjusted by multiplying by the adjustment factor and then dividing by
1 billion. This calculation is very easy to convert to .adjfine, by simply
updating the divisor.
Introduce new helper functions, diff_by_scaled_ppm and adjust_by_scaled_ppm
which perform the most common calculations used by drivers for this purpose.
The adjust_by_scaled_ppm takes the base increment and scaled PPM value, and
calculates the new increment to use.
A few drivers need the difference and direction rather than a raw increment
value. The diff_by_scaled_ppm calculates the difference and returns true if
it should be a subtraction, false otherwise. This most closely aligns with
existing driver implementations.
I previously submitted v1 of this series at [1], and got some feedback only
on a handful of drivers. In the interest of merging the changes which have
received feedback, I've dropped the following drivers out of this send:
* ptp_phc
* ptp_ipx46x
* tg3
* hclge
* stmac
* cpts
I plan to submit those drivers changes again at a later date. As before,
there are some drivers which are not trivial to convert to the new helper
functions. While they may be able to work, their implementation is different
and I lack the hardware or datasheets to determine what the correct
implementation would be.
* drivers/net/ethernet/broadcom/bnx2x
* drivers/net/ethernet/broadcom/bnxt
* drivers/net/ethernet/cavium/liquidio
* drivers/net/ethernet/chelsio/cxgb4
* drivers/net/ethernet/freescale
* drivers/net/ethernet/qlogic/qed
* drivers/net/ethernet/qlogic/qede
* drivers/net/ethernet/sfc
* drivers/net/ethernet/sfc/siena
* drivers/net/ethernet/ti/am65-cpts.c
* drivers/ptp/ptp_dte.c
My end goal is to drop the .adjfreq implementation entirely, and to that end
I plan on modifying these drivers in the future to directly use
scaled_ppm_to_ppb as the simplest method to convert them.
Changes since v2:
* Rebased to allow landing in 6.2
* Added Richard's Acked-by
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Siva Reddy Kallam <siva.kallam@broadcom.com>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Bryan Whitehead <bryan.whitehead@microchip.com>
Cc: Sergey Shtylyov <s.shtylyov@omp.ru>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Vivek Thampi <vithampi@vmware.com>
Cc: VMware PV-Drivers Reviewers <pv-drivers@vmware.com>
Cc: Jie Wang <wangjie125@huawei.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Guangbin Huang <huangguangbin2@huawei.com>
Cc: Eran Ben Elisha <eranbe@nvidia.com>
Cc: Aya Levin <ayal@nvidia.com>
Cc: Cai Huoqing <cai.huoqing@linux.dev>
Cc: Biju Das <biju.das.jz@bp.renesas.com>
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Wan Jiabing <wanjiabing@vivo.com>
Cc: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Arnd Bergmann <arnd@arndb.de>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The xgbe implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this driver to .adjfine and use adjust_by_scaled_ppm to calculate
the new addend value.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ravb implementation of .adjfreq is implemented in terms of a
straight forward "base * ppb / 1 billion" calculation.
Convert this driver to .adjfine and use the adjust_by_scaled_ppm helper
function to calculate the new addend.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Cc: Sergey Shtylyov <s.shtylyov@omp.ru>
Cc: Biju Das <biju.das.jz@bp.renesas.com>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Cc: linux-renesas-soc@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|