summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-02-03bpf: test_run: Fix OOB access in bpf_prog_test_run_xdpLorenzo Bianconi
Fix the following kasan issue reported by syzbot: BUG: KASAN: slab-out-of-bounds in __skb_frag_set_page include/linux/skbuff.h:3242 [inline] BUG: KASAN: slab-out-of-bounds in bpf_prog_test_run_xdp+0x10ac/0x1150 net/bpf/test_run.c:972 Write of size 8 at addr ffff888048c75000 by task syz-executor.5/23405 CPU: 1 PID: 23405 Comm: syz-executor.5 Not tainted 5.16.0-syzkaller #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 __skb_frag_set_page include/linux/skbuff.h:3242 [inline] bpf_prog_test_run_xdp+0x10ac/0x1150 net/bpf/test_run.c:972 bpf_prog_test_run kernel/bpf/syscall.c:3356 [inline] __sys_bpf+0x1858/0x59a0 kernel/bpf/syscall.c:4658 __do_sys_bpf kernel/bpf/syscall.c:4744 [inline] __se_sys_bpf kernel/bpf/syscall.c:4742 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4742 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f4ea30dd059 RSP: 002b:00007f4ea1a52168 EFLAGS: 00000246 ORIG_RAX: 0000000000000141 RAX: ffffffffffffffda RBX: 00007f4ea31eff60 RCX: 00007f4ea30dd059 RDX: 0000000000000048 RSI: 0000000020000000 RDI: 000000000000000a RBP: 00007f4ea313708d R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc8367c5af R14: 00007f4ea1a52300 R15: 0000000000022000 </TASK> Allocated by task 23405: kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:437 [inline] ____kasan_kmalloc mm/kasan/common.c:516 [inline] ____kasan_kmalloc mm/kasan/common.c:475 [inline] __kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525 kmalloc include/linux/slab.h:586 [inline] kzalloc include/linux/slab.h:715 [inline] bpf_test_init.isra.0+0x9f/0x150 net/bpf/test_run.c:411 bpf_prog_test_run_xdp+0x2f8/0x1150 net/bpf/test_run.c:941 bpf_prog_test_run kernel/bpf/syscall.c:3356 [inline] __sys_bpf+0x1858/0x59a0 kernel/bpf/syscall.c:4658 __do_sys_bpf kernel/bpf/syscall.c:4744 [inline] __se_sys_bpf kernel/bpf/syscall.c:4742 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4742 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff888048c74000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 0 bytes to the right of 4096-byte region [ffff888048c74000, ffff888048c75000) The buggy address belongs to the page: page:ffffea0001231c00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x48c70 head:ffffea0001231c00 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000010200 dead000000000100 dead000000000122 ffff888010c42140 raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated prep_new_page mm/page_alloc.c:2434 [inline] get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4165 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5389 alloc_pages+0x1aa/0x310 mm/mempolicy.c:2271 alloc_slab_page mm/slub.c:1799 [inline] allocate_slab mm/slub.c:1944 [inline] new_slab+0x28a/0x3b0 mm/slub.c:2004 ___slab_alloc+0x87c/0xe90 mm/slub.c:3018 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3105 slab_alloc_node mm/slub.c:3196 [inline] __kmalloc_node_track_caller+0x2cb/0x360 mm/slub.c:4957 kmalloc_reserve net/core/skbuff.c:354 [inline] __alloc_skb+0xde/0x340 net/core/skbuff.c:426 alloc_skb include/linux/skbuff.h:1159 [inline] nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:745 [inline] nsim_dev_trap_report drivers/net/netdevsim/dev.c:802 [inline] nsim_dev_trap_report_work+0x29a/0xbc0 drivers/net/netdevsim/dev.c:843 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 page last free stack trace: reset_page_owner include/linux/page_owner.h:24 [inline] free_pages_prepare mm/page_alloc.c:1352 [inline] free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1404 free_unref_page_prepare mm/page_alloc.c:3325 [inline] free_unref_page+0x19/0x690 mm/page_alloc.c:3404 qlink_free mm/kasan/quarantine.c:157 [inline] qlist_free_all+0x6d/0x160 mm/kasan/quarantine.c:176 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:283 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447 kasan_slab_alloc include/linux/kasan.h:260 [inline] slab_post_alloc_hook mm/slab.h:732 [inline] slab_alloc_node mm/slub.c:3230 [inline] slab_alloc mm/slub.c:3238 [inline] kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3243 getname_flags.part.0+0x50/0x4f0 fs/namei.c:138 getname_flags include/linux/audit.h:323 [inline] getname+0x8e/0xd0 fs/namei.c:217 do_sys_openat2+0xf5/0x4d0 fs/open.c:1208 do_sys_open fs/open.c:1230 [inline] __do_sys_openat fs/open.c:1246 [inline] __se_sys_openat fs/open.c:1241 [inline] __x64_sys_openat+0x13f/0x1f0 fs/open.c:1241 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Memory state around the buggy address: ffff888048c74f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888048c74f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ ffff888048c75080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888048c75100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fixes: 1c19499825246 ("bpf: introduce frags support to bpf_prog_test_run_xdp()") Reported-by: syzbot+6d70ca7438345077c549@syzkaller.appspotmail.com Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/688c26f9dd6e885e58e8e834ede3f0139bb7fa95.1643835097.git.lorenzo@kernel.org
2022-02-03bpf, docs: Better document the atomic instructionsChristoph Hellwig
Use proper tables and RST markup to document the atomic instructions in a structured way. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220131183638.3934982-6-hch@lst.de
2022-02-03bpf, docs: Better document the extended instruction formatChristoph Hellwig
In addition to the normal 64-bit instruction encoding, eBPF also has a single instruction that uses a second 64-bit bits for a second immediate value. Instead of only documenting this format deep down in the document mention it in the instruction encoding section. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220131183638.3934982-5-hch@lst.de
2022-02-03bpf, docs: Better document the legacy packet access instructionChristoph Hellwig
Use consistent terminology and structured RST elements to better document these two oddball instructions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220131183638.3934982-4-hch@lst.de
2022-02-03bpf, docs: Better document the regular load and store instructionsChristoph Hellwig
Add a separate section and a little intro blurb for the regular load and store instructions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220131183638.3934982-3-hch@lst.de
2022-02-03bpf, docs: Document the byte swapping instructionsChristoph Hellwig
Add a section to document the byte swapping instructions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220131183638.3934982-2-hch@lst.de
2022-02-03Merge branch 'for-5.17-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Eric's fix for a long standing cgroup1 permission issue where it only checks for uid 0 instead of CAP which inadvertently allows unprivileged userns roots to modify release_agent userhelper - Fixes for the fallout from Waiman's recent cpuset work * 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning cgroup-v1: Require capabilities to set release_agent cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask() cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
2022-02-03Merge branch 'net-ipa-enable-register-retention'Jakub Kicinski
Alex Elder says: ==================== net: ipa: enable register retention With runtime power management in place, we sometimes need to issue a command to enable retention of IPA register values before power collapse. This requires a new Device Tree property, whose presence will also be used to signal that the command is required. ==================== Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03net: ipa: request IPA register values be retainedAlex Elder
In some cases, the IPA hardware needs to request the always-on subsystem (AOSS) to coordinate with the IPA microcontroller to retain IPA register values at power collapse. This is done by issuing a QMP request to the AOSS microcontroller. A similar request ondoes that request. We must get and hold the "QMP" handle early, because we might get back EPROBE_DEFER for that. But the actual request should be sent while we know the IPA clock is active, and when we know the microcontroller is operational. Fixes: 1aac309d3207 ("net: ipa: use autosuspend") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03dt-bindings: net: qcom,ipa: add optional qcom,qmp propertyAlex Elder
For some systems, the IPA driver must make a request to ensure that its registers are retained across power collapse of the IPA hardware. On such systems, we'll use the existence of the "qcom,qmp" property as a signal that this request is required. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03cgroup/cpuset: Fix "suspicious RCU usage" lockdep warningWaiman Long
It was found that a "suspicious RCU usage" lockdep warning was issued with the rcu_read_lock() call in update_sibling_cpumasks(). It is because the update_cpumasks_hier() function may sleep. So we have to release the RCU lock, call update_cpumasks_hier() and reacquire it afterward. Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks() instead of stating that in the comment. Fixes: 4716909cc5c5 ("cpuset: Track cpusets that use parent's effective_cpus") Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-03Merge branch 'bpf-libbpf-deprecated-cleanup'Daniel Borkmann
Andrii Nakryiko says: ==================== Clean up remaining missed uses of deprecated libbpf APIs across samples/bpf, selftests/bpf, libbpf, and bpftool. Also fix uninit variable warning in bpftool. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2022-02-03samples/bpf: Get rid of bpf_prog_load_xattr() useAndrii Nakryiko
Remove all the remaining uses of deprecated bpf_prog_load_xattr() API. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220202225916.3313522-7-andrii@kernel.org
2022-02-03selftests/bpf: Redo the switch to new libbpf XDP APIsAndrii Nakryiko
Switch to using new bpf_xdp_*() APIs across all selftests. Take advantage of a more straightforward and user-friendly semantics of old_prog_fd (0 means "don't care") in few places. This is a redo of 544356524dd6 ("selftests/bpf: switch to new libbpf XDP APIs"), which was previously reverted to minimize conflicts during bpf and bpf-next tree merge. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220202225916.3313522-6-andrii@kernel.org
2022-02-03selftests/bpf: Remove usage of deprecated feature probing APIsAndrii Nakryiko
Switch to libbpf_probe_*() APIs instead of the deprecated ones. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220202225916.3313522-5-andrii@kernel.org
2022-02-03bpftool: Fix uninit variable compilation warningAndrii Nakryiko
Newer GCC complains about capturing the address of unitialized variable. While there is nothing wrong with the code (the variable is filled out by the kernel), initialize the variable anyway to make compiler happy. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220202225916.3313522-4-andrii@kernel.org
2022-02-03bpftool: Stop supporting BPF offload-enabled feature probingAndrii Nakryiko
libbpf 1.0 is not going to support passing ifindex to BPF prog/map/helper feature probing APIs. Remove the support for BPF offload feature probing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220202225916.3313522-3-andrii@kernel.org
2022-02-03libbpf: Stop using deprecated bpf_map__is_offload_neutral()Andrii Nakryiko
Open-code bpf_map__is_offload_neutral() logic in one place in to-be-deprecated bpf_prog_load_xattr2. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20220202225916.3313522-2-andrii@kernel.org
2022-02-03tools/resolve_btfids: Do not print any commands when building silentlyNathan Chancellor
When building with 'make -s', there is some output from resolve_btfids: $ make -sj"$(nproc)" oldconfig prepare MKDIR .../tools/bpf/resolve_btfids/libbpf/ MKDIR .../tools/bpf/resolve_btfids//libsubcmd LINK resolve_btfids Silent mode means that no information should be emitted about what is currently being done. Use the $(silent) variable from Makefile.include to avoid defining the msg macro so that there is no information printed. Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
2022-02-03Revert "mm/gup: small refactoring: simplify try_grab_page()"John Hubbard
This reverts commit 54d516b1d62ff8f17cee2da06e5e4706a0d00b8a That commit did a refactoring that effectively combined fast and slow gup paths (again). And that was again incorrect, for two reasons: a) Fast gup and slow gup get reference counts on pages in different ways and with different goals: see Linus' writeup in commit cd1adf1b63a1 ("Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly""), and b) try_grab_compound_head() also has a specific check for "FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller can fall back to slow gup. This resulted in new failures, as recently report by Will McVicker [1]. But (a) has problems too, even though they may not have been reported yet. So just revert this. Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com [1] Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()") Reported-and-tested-by: Will McVicker <willmcvicker@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Minchan Kim <minchan@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: stable@vger.kernel.org # 5.15 Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03Merge tag 'mips-fixes-5.17_2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fixes from Thomas Bogendoerfer: - fix missed change for PTR->PTR_WD conversion - kernel-doc fixes * tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: KVM: fix vz.c kernel-doc notation MIPS: octeon: Fix missed PTR->PTR_WD conversion
2022-02-03Merge branch 'dsa-mv88e6xxx-phylink_generic_validate'David S. Miller
Russell King says: ==================== net: dsa: mv88e6xxx: convert to phylink_generic_validate() The overall objective of this series is to convert the mv88e6xxx DSA driver to use phylink_generic_validate(). Patch 1 adds a new helper mv88e6352_g2_scratch_port_has_serdes() which indicates whether an 88e6352 port has a serdes associated with it. This is necessary as ports 4 and 5 will normally be in automedia mode, where the CMODE field in the port status register will change e.g. between 15 (internal PHY) and 9 (1000base-X) depending on whether the serdes has link. The existing code caches the cmode field, and depending whether the serdes has link at probe time, determines whether we allow things such as the serdes statistics to be accessed. This means if the link isn't up at probe time, the serdes is essentially unavailable. Patch 1 addresses this by reading the pin configuration to find out whether the serdes is attached to port 4 or port 5. Patch 2 is a joint effort between myself and Marek Behún, adding the supported interfaces and MAC capabilities to all mv88e6xxx supported switch devices. This is slightly more restrictive than the original code as we didn't used to care too much about the interface mode, but with this we do - which is why we must know if there's a serdes associated now. Patch 3 switches mv88e6xxx to use the generic validation by removing the initialisation of the phylink_validate pointer in the dsa_ops struct. Patch 4 updates the statistics code to use the new helper in patch 1, so the serdes statistics are available even if the link was down at driver probe time. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: improve 88e6352 serdes statistics detectionRussell King (Oracle)
The decision whether to report serdes statistics currently depends on the cached C_Mode value for the port, read at probe time or updated by configuration. However, port 4 can be in "automedia" mode when it is used as a serdes port, meaning it switches between the internal PHY and the serdes, changing the read-only C_Mode value depending on which first gains link. Consequently, the C_Mode value read at probe does not accurately reflect whether the port has the serdes associated with it. In "net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()", we added a way to read the hardware configuration to determine which port has the serdes associated with it. Use this to determine which port reports the serdes statistics. Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: convert to phylink_generic_validate()Russell King (Oracle)
Now that the mv88e6xxx chip drivers are supplying the supported interfaces and MAC capabilities, switch the driver to use the generic phylink validation implementation by removing our own validation implementations. This causes DSA to call phylink_generic_validate() on our behalf. Reviewed-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: populate supported_interfaces and mac_capabilitiesRussell King (Oracle)
Populate the supported interfaces and MAC capabilities for the Marvell MV88E6xxx DSA switches in preparation to using these for the validation functionality. Patch co-authored by Marek. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marek Behún <kabel@kernel.org> [ fixed 6341 and 6393x ] Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: add mv88e6352_g2_scratch_port_has_serdes()Russell King (Oracle)
Read the hardware configuration to determine which port is attached to the serdes. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'dsa-mv88e6xxx-port-isolation'David S. Miller
Tobias Waldekranz says: ==================== net: dsa: mv88e6xxx: Improve standalone port isolation The ideal isolation between standalone ports satisfies two properties: 1. Packets from one standalone port must not be forwarded to any other port. 2. Packets from a standalone port must be sent to the CPU port. mv88e6xxx solves (1) by isolating standalone ports using the PVT. Up to this point though, (2) has not guaranteed; as the ATU is still consulted, there is a chance that incoming packets never reach the CPU if its DA has previously been used as the SA of an earlier packet (see 1/5 for more details). This is typically not a problem, except for one very useful setup in which switch ports are looped in order to run the bridge kselftests in tools/testing/selftests/net/forwarding. This series attempts to solve (2). Ideally, we could simply use the "ForceMap" bit of more modern chips (Agate and newer) to classify all incoming packets as MGMT. This is not available on older silicon that is still widely used (Opal Plus chips like the 6097 for example). Instead, this series takes a two pronged approach: 1/5: Always clear MapDA on standalone ports to make sure that no ATU entry can lead packets astray. This solves (2) for single-chip systems. 2/5: Trivial prep work for 4/5. 3/5: Trivial prep work for 4/5. 4/5: On multi-chip systems though, this is not enough. On the incoming chip, the packet will be forced out towards the CPU thanks to 1/5, but on any intermediate chips the ATU is still consulted. We override this behavior by marking the reserved standalone VID (0) as a policy VID, the DSA ports' VID policy is set to TRAP. This will cause the packet to be reclassified as MGMT on the first intermediate chip, after which it's a straight shot towards the CPU. Finally, we allow more tests to be run on mv88e6xxx: 5/5: The bridge_vlan{,un}aware suites sets an ageing_time of 10s on the bridge it creates, but mv88e6xxx has a minimum supported time of 15s. Allow this time to be overridden in forwarding.config. With this series in place, mv88e6xxx passes the following kselftest suites: - bridge_port_isolation.sh - bridge_sticky_fdb.sh - bridge_vlan_aware.sh - bridge_vlan_unaware.sh v1 -> v2: - Wording/spelling (Vladimir) - Use standard iterator in dsa_switch_upstream_port (Vladimir) - Limit enabling of VTU port policy to downstream DSA ports (Vladimir) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03selftests: net: bridge: Parameterize ageing timeoutTobias Waldekranz
Allow the ageing timeout that is set on bridges to be customized from forwarding.config. This allows the tests to be run on hardware which does not support a 10s timeout (e.g. mv88e6xxx). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Improve multichip isolation of standalone portsTobias Waldekranz
Given that standalone ports are now configured to bypass the ATU and forward all frames towards the upstream port, extend the ATU bypass to multichip systems. Load VID 0 (standalone) into the VTU with the policy bit set. Since VID 4095 (bridged) is already loaded, we now know that all VIDs in use are always available in all VTUs. Therefore, we can safely enable 802.1Q on DSA ports. Setting the DSA ports' VTU policy to TRAP means that all incoming frames on VID 0 will be classified as MGMT - as a result, the ATU is bypassed on all subsequent switches. With this isolation in place, we are able to support configurations that are simultaneously very quirky and very useful. Quirky because it involves looping cables between local switchports like in this example: CPU | .------. .---0---. | .----0----. | sw0 | | | sw1 | '-1-2-3-' | '-1-2-3-4-' $ @ '---' $ @ % % We have three physically looped pairs ($, @, and %). This is very useful because it allows us to run the kernel's kselftests for the bridge on mv88e6xxx hardware. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Enable port policy support on 6097Tobias Waldekranz
This chip has support for the same per-port policy actions found in later versions of LinkStreet devices. Fixes: f3a2cd326e44 ("net: dsa: mv88e6xxx: introduce .port_set_policy") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Support policy entries in the VTUTobias Waldekranz
A VTU entry with policy enabled is used in combination with a port's VTU policy setting to override normal switching behavior for frames assigned to the entry's VID. A typical example is to Treat all frames in a particular VLAN as control traffic, and trap them to the CPU. In which case the relevant user port's VTU policy would be set to TRAP. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: mv88e6xxx: Improve isolation of standalone portsTobias Waldekranz
Clear MapDA on standalone ports to bypass any ATU lookup that might point the packet in the wrong direction. This means that all packets are flooded using the PVT config. So make sure that standalone ports are only allowed to communicate with the local upstream port. Here is a scenario in which this is needed: CPU | .----. .---0---. | .--0--. | sw0 | | | sw1 | '-1-2-3-' | '-1-2-' '---' - sw0p1 and sw1p1 are bridged - sw0p2 and sw1p2 are in standalone mode - Learning must be enabled on sw0p3 in order for hardware forwarding to work properly between bridged ports 1. A packet with SA :aa comes in on sw1p2 1a. Egresses sw1p0 1b. Ingresses sw0p3, ATU adds an entry for :aa towards port 3 1c. Egresses sw0p0 2. A packet with DA :aa comes in on sw0p2 2a. If an ATU lookup is done at this point, the packet will be incorrectly forwarded towards sw0p3. With this change in place, the ATU is bypassed and the packet is forwarded in accordance with the PVT, which only contains the CPU port. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'ptp-virtual-clock-improvements'David S. Miller
Miroslav Lichvar says: ==================== Virtual PTP clock improvements and fix v2: - dropped patch changing initial time of virtual clocks The first patch fixes an oops when unloading a driver with PTP clock and enabled virtual clocks. The other patches add missing features to make synchronization with virtual clocks work as well as with the physical clock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: add getcrosststamp() to virtual clocks.Miroslav Lichvar
If the physical clock supports cross timestamping (it has the getcrosststamp() function), provide a wrapper in the virtual clock to enable cross timestamping. This adds support for the PTP_SYS_OFFSET_PRECISE ioctl. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: add gettimex64() to virtual clocks.Miroslav Lichvar
If the physical clock has the gettimex64() function, provide a gettimex64() wrapper in the virtual clock to enable more accurate and stable synchronization. This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: increase maximum adjustment of virtual clocks.Miroslav Lichvar
Increase the maximum frequency offset of virtual clocks to 50% to enable faster slewing corrections. This value cannot be represented as scaled ppm when long has 32 bits, but that is already the case for other drivers, even those that provide the adjfine() function, i.e. 32-bit applications are expected to check for the limit. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03ptp: unregister virtual clocks when unregistering physical clock.Miroslav Lichvar
When unregistering a physical clock which has some virtual clocks, unregister the virtual clocks with it. This fixes the following oops, which can be triggered by unloading a driver providing a PTP clock when it has enabled virtual clocks: BUG: unable to handle page fault for address: ffffffffc04fc4d8 Oops: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:ptp_vclock_read+0x31/0xb0 Call Trace: timecounter_read+0xf/0x50 ptp_vclock_refresh+0x2c/0x50 ? ptp_clock_release+0x40/0x40 ptp_aux_kworker+0x17/0x30 kthread_worker_fn+0x9b/0x240 ? kthread_should_park+0x30/0x30 kthread+0xe2/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Fixes: 73f37068d540 ("ptp: support ptp physical/virtual clocks conversion") Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Cc: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03page_pool: Refactor page_pool to enable fragmenting after allocationAlexander Duyck
This change is meant to permit a driver to perform "fragmenting" of the page from within the driver instead of the current model which requires pre-partitioning the page. The main motivation behind this is to support use cases where the page will be split up by the driver after DMA instead of before. With this change it becomes possible to start using page pool to replace some of the existing use cases where multiple references were being used for a single page, but the number needed was unknown as the size could be dynamic. For example, with this code it would be possible to do something like the following to handle allocation: page = page_pool_alloc_pages(); if (!page) return NULL; page_pool_fragment_page(page, DRIVER_PAGECNT_BIAS_MAX); rx_buf->page = page; rx_buf->pagecnt_bias = DRIVER_PAGECNT_BIAS_MAX; Then we would process a received buffer by handling it with: rx_buf->pagecnt_bias--; Once the page has been fully consumed we could then flush the remaining instances with: if (page_pool_defrag_page(page, rx_buf->pagecnt_bias)) continue; page_pool_put_defragged_page(pool, page -1, !!budget); The general idea is that we want to have the ability to allocate a page with excess fragment count and then trim off the unneeded fragments. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'dsa-phylink_generic_validate'David S. Miller
Russell King says: ==================== Trivial DSA conversions to phylink_generic_validate() This series converts five DSA drivers to use phylink_generic_validate(). No feedback or testing reports were received from the CFT posting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: xrs700x: convert to phylink_generic_validate()Russell King (Oracle)
Populate the supported interfaces and MAC capabilities for the xrs700x family of DSA switches and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. According to commit ee00b24f32eb ("net: dsa: add Arrow SpeedChips XRS700x driver") the switch supports one RMII port and up to three RGMII ports. This commit assumes that port 0 is the RMII port and the remainder are RGMII. This commit also results in the Autoneg bit being set in the ethtool link modes, which wasn't in the original; if this switch supports RGMII to a 10/100/1G PHY, then surely we want to allow Autoneg on the PHY. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: qca8k: convert to phylink_generic_validate()Russell King (Oracle)
Populate the supported interfaces and MAC capabilities for the QCA8K DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. In making this change, we bring consistency to the ethtool linkmodes that phylink's validate step produces, thereby following the expected behaviour as the phylink documentation has explained. Specifically, the ethtool 1000baseX_Full capability is now permitted for all interface modes, as it is a property of the PHY driver whether 1000baseX fiber connections can be supported. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: ksz8795: convert to phylink_generic_validate()Russell King (Oracle)
Populate the supported interfaces and MAC capabilities for the Microchip KSZ8795 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: bcm_sf2: convert to phylink_generic_validate()Russell King (Oracle)
Populate the supported interfaces and MAC capabilities for the bcm_sf2 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. The exclusion of Gigabit linkmodes for MII and Reverse MII links is handled within phylink_generic_validate() in phylink, so there is no need to make them conditional on the interface mode in the driver. Thanks to Florian Fainelli for suggesting how to populate the supported interfaces. Link: https://lore.kernel.org/r/3b3fed98-0c82-99e9-dc72-09fe01c2bcf3@gmail.com Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03net: dsa: ar9331: convert to phylink_generic_validate()Russell King (Oracle)
Populate the supported interfaces and MAC capabilities for the AR9331 DSA switch and remove the old validate implementation to allow DSA to use phylink_generic_validate() for this switch driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03Merge branch 'mptcp-next'David S. Miller
Mat Martineau says: ==================== mptcp: Miscellaneous changes for 5.18 Patch 1 has some minor cleanup in mptcp_write_options(). Patch 2 moves a rarely-needed branch to optimize mptcp_write_options(). Patch 3 adds a comment explaining which combinations of MPTCP option headers are expected. Patch 4 adds a pr_debug() for the MPTCP_RST option. Patches 5-7 allow setting MPTCP_PM_ADDR_FLAG_FULLMESH with the "set flags" netlink command. This allows changing the behavior of existing path manager endpoints. The flag was previously only set at endpoint creation time. Associated selftests also updated. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03selftests: mptcp: add fullmesh setting testsGeliang Tang
This patch added the fullmesh setting and clearing selftests in mptcp_join.sh. Now we can set both backup and fullmesh flags, so avoid using the words 'backup' and 'bkup'. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03selftests: mptcp: set fullmesh flag in pm_nl_ctlGeliang Tang
This patch added the fullmesh flag setting and clearing support in pm_nl_ctl: # pm_nl_ctl set ip flags fullmesh # pm_nl_ctl set ip flags nofullmesh Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: set fullmesh flag in pm_netlinkGeliang Tang
This patch added the fullmesh flag setting support in pm_netlink. If the fullmesh flag of the address is changed, remove all the related subflows, update the fullmesh flag and create subflows again. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: print out reset infos of MP_RSTGeliang Tang
This patch printed out the reset infos, reset_transient and reset_reason, of MP_RST in mptcp_parse_option() to show that MP_RST is received. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-03mptcp: clarify when options can be usedMatthieu Baerts
RFC8684 doesn't seem to clearly specify which MPTCP options can be used together. Some options are mutually exclusive -- e.g. MP_CAPABLE and MP_JOIN --, some can be used together -- e.g. DSS + MP_PRIO --, some can but we prefer not to -- e.g. DSS + ADD_ADDR -- and some have to be used together at some points -- e.g. MP_FAIL and DSS. We need to clarify this as a base before allowing other modifications. For example, does it make sense to send a RM_ADDR with an MPC or MPJ? This remains open for possible future discussions. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>