summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-03ipv6: sr: block BH in seg6_output_core() and seg6_input_core()Eric Dumazet
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in seg6_output_core() is not good enough, because seg6_output_core() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter seg6_output_core() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Apply a similar change in seg6_input_core(). Fixes: fa79581ea66c ("ipv6: sr: fix several BUGs when preemption is enabled") Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Lebrun <dlebrun@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03net: ipv6: rpl_iptunnel: block BH in rpl_output() and rpl_input()Eric Dumazet
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in rpl_output() is not good enough, because rpl_output() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter rpl_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Apply a similar change in rpl_input(). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Aring <aahringo@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03ipv6: ioam: block BH from ioam6_output()Eric Dumazet
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled. Disabling preemption in ioam6_output() is not good enough, because ioam6_output() is called from process context, lwtunnel_output() only uses rcu_read_lock(). We might be interrupted by a softirq, re-enter ioam6_output() and corrupt dst_cache data structures. Fix the race by using local_bh_disable() instead of preempt_disable(). Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Justin Iurman <justin.iurman@uliege.be> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03vmxnet3: disable rx data ring on dma allocation failureMatthias Stocker
When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base, the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset rq->data_ring.desc_size for the data ring that failed, which presumably causes the hypervisor to reference it on packet reception. To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell the hypervisor to disable this feature. [ 95.436876] kernel BUG at net/core/skbuff.c:207! [ 95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1 [ 95.441558] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018 [ 95.443481] RIP: 0010:skb_panic+0x4d/0x4f [ 95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50 ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9 ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24 [ 95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246 [ 95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f [ 95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f [ 95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60 [ 95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000 [ 95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0 [ 95.455682] FS: 0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000 [ 95.457178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0 [ 95.459791] Call Trace: [ 95.460515] <IRQ> [ 95.461180] ? __die_body.cold+0x19/0x27 [ 95.462150] ? die+0x2e/0x50 [ 95.462976] ? do_trap+0xca/0x110 [ 95.463973] ? do_error_trap+0x6a/0x90 [ 95.464966] ? skb_panic+0x4d/0x4f [ 95.465901] ? exc_invalid_op+0x50/0x70 [ 95.466849] ? skb_panic+0x4d/0x4f [ 95.467718] ? asm_exc_invalid_op+0x1a/0x20 [ 95.468758] ? skb_panic+0x4d/0x4f [ 95.469655] skb_put.cold+0x10/0x10 [ 95.470573] vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3] [ 95.471853] vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3] [ 95.473185] __napi_poll+0x2b/0x160 [ 95.474145] net_rx_action+0x2c6/0x3b0 [ 95.475115] handle_softirqs+0xe7/0x2a0 [ 95.476122] __irq_exit_rcu+0x97/0xb0 [ 95.477109] common_interrupt+0x85/0xa0 [ 95.478102] </IRQ> [ 95.478846] <TASK> [ 95.479603] asm_common_interrupt+0x26/0x40 [ 95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ 95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 [ 95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246 [ 95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000 [ 95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001 [ 95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3 [ 95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260 [ 95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000 [ 95.495035] acpi_safe_halt+0x14/0x20 [ 95.496127] acpi_idle_do_entry+0x2f/0x50 [ 95.497221] acpi_idle_enter+0x7f/0xd0 [ 95.498272] cpuidle_enter_state+0x81/0x420 [ 95.499375] cpuidle_enter+0x2d/0x40 [ 95.500400] do_idle+0x1e5/0x240 [ 95.501385] cpu_startup_entry+0x29/0x30 [ 95.502422] start_secondary+0x11c/0x140 [ 95.503454] common_startup_64+0x13e/0x141 [ 95.504466] </TASK> [ 95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse [ 95.516536] ---[ end trace 0000000000000000 ]--- Fixes: 6f4833383e85 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()") Signed-off-by: Matthias Stocker <mstocker@barracuda.com> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Reviewed-by: Ronak Doshi <ronak.doshi@broadcom.com> Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03r8152: Wake up the system if the we need a resetDouglas Anderson
If we get to the end of the r8152's suspend() routine and we find that the USB device is INACCESSIBLE then it means that some of our preparation for suspend didn't take place. We need a USB reset to get ourselves back in a consistent state so we can try again and that can't happen during system suspend. Call pm_wakeup_event() to wake the system up in this case. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/66590f25.170a0220.8b5ad.1752@mx.google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03r8152: If inaccessible at resume time, issue a resetDouglas Anderson
If we happened to get a USB transfer error during the transition to suspend then the usb_queue_reset_device() that r8152_control_msg() calls will get dropped on the floor. This is because usb_lock_device_for_reset() (which usb_queue_reset_device() uses) silently fails if it's called when a device is suspended or if too much time passes. Let's resolve this by resetting the device ourselves in r8152's resume() function. NOTE: due to timing, it's _possible_ that we could end up with two USB resets: the one queued previously and the one called from the resume() patch. This didn't happen in test cases I ran, though it's conceivably possible. We can't easily know if this happened since usb_queue_reset_device() can just silently drop the reset request. In any case, it's not expected that this is a problem since the two resets can't run at the same time (because of the device lock) and it should be OK to reset the device twice. If somehow the double-reset causes problems we could prevent resets from being queued up while suspend is running. Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/66590f22.170a0220.8b5ad.1750@mx.google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-03Merge tag 'cxl-fixes-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: - Compile fix for cxl-test from missing linux/vmalloc.h - Fix for memregion leaks in devm_cxl_add_region() * tag 'cxl-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/region: Fix memregion leaks in devm_cxl_add_region() cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c
2024-06-03selftests/bpf: Drop duplicate bpf_map_lookup_elem in test_sockmapGeliang Tang
bpf_map_lookup_elem is invoked in bpf_prog3() already, no need to invoke it again. This patch drops it. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/ea8458462b876ee445173e3effb535fd126137ed.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Check length of recv in test_sockmapGeliang Tang
The value of recv in msg_loop may be negative, like EWOULDBLOCK, so it's necessary to check if it is positive before accumulating it to bytes_recvd. Fixes: 16962b2404ac ("bpf: sockmap, add selftests") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/5172563f7c7b2a2e953cef02e89fc34664a7b190.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Fix size of map_fd in test_sockmapGeliang Tang
The array size of map_fd[] is 9, not 8. This patch changes it as a more general form: ARRAY_SIZE(map_fd). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/0972529ee01ebf8a8fd2b310bdec90831c94be77.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Drop prog_fd array in test_sockmapGeliang Tang
The program fds can be got by using bpf_program__fd(progs[]), then prog_fd becomes useless. This patch drops it. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/9a6335e4d8dbab23c0d8906074457ceddd61e74b.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Replace tx_prog_fd with tx_prog in test_sockmapGeliang Tang
bpf_program__attach_sockmap() needs to take a parameter of type bpf_program instead of an fd, so tx_prog_fd becomes useless. This patch uses a pointer tx_prog to point to an item in progs[] array. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/23b37f932c547dd1ebfe154bbc0b0e957be21ee6.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Use bpf_link attachments in test_sockmapGeliang Tang
Switch attachments to bpf_link using bpf_program__attach_sockmap() instead of bpf_prog_attach(). This patch adds a new array progs[] to replace prog_fd[] array, set in populate_progs() for each program in bpf object. And another new array links[] to save the attached bpf_link. It is initalized as NULL in populate_progs, set as the return valuses of bpf_program__attach_sockmap(), and detached by bpf_link__detach(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/32cf8376a810e2e9c719f8e4cfb97132ed2d1f9c.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Drop duplicate definition of i in test_sockmapGeliang Tang
There's already a definition of i in run_options() at the beginning, no need to define a new one in "if (tx_prog_fd > 0)" block. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/8d690682330a59361562bca75d6903253d16f312.1716446893.git.tanggeliang@kylinos.cn
2024-06-03selftests/bpf: Fix tx_prog_fd values in test_sockmapGeliang Tang
The values of tx_prog_fd in run_options() should not be 0, so set it as -1 in else branch, and test it using "if (tx_prog_fd > 0)" condition, not "if (tx_prog_fd)" or "if (tx_prog_fd >= 0)". Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/08b20ffc544324d40939efeae93800772a91a58e.1716446893.git.tanggeliang@kylinos.cn
2024-06-03Merge tag 'kvm-riscv-fixes-6.10-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini
into HEAD KVM/riscv fixes for 6.10, take #1 - No need to use mask when hart-index-bits is 0 - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
2024-06-03Merge branch 'kvm-fixes-6.10-1' into HEADPaolo Bonzini
* Fixes and debugging help for the #VE sanity check. Also disable it by default, even for CONFIG_DEBUG_KERNEL, because it was found to trigger spuriously (most likely a processor erratum as the exact symptoms vary by generation). * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled situation (GIF=0 or interrupt shadow) when the processor supports virtual NMI. While generally KVM will not request an NMI window when virtual NMIs are supported, in this case it *does* have to single-step over the interrupt shadow or enable the STGI intercept, in order to deliver the latched second NMI. * Drop support for hand tuning APIC timer advancement from userspace. Since we have adaptive tuning, and it has proved to work well, drop the module parameter for manual configuration and with it a few stupid bugs that it had.
2024-06-03KVM: x86: Drop support for hand tuning APIC timer advancement from userspaceSean Christopherson
Remove support for specifying a static local APIC timer advancement value, and instead present a read-only boolean parameter to let userspace enable or disable KVM's dynamic APIC timer advancement. Realistically, it's all but impossible for userspace to specify an advancement that is more precise than what KVM's adaptive tuning can provide. E.g. a static value needs to be tuned for the exact hardware and kernel, and if KVM is using hrtimers, likely requires additional tuning for the exact configuration of the entire system. Dropping support for a userspace provided value also fixes several flaws in the interface. E.g. KVM interprets a negative value other than -1 as a large advancement, toggling between a negative and positive value yields unpredictable behavior as vCPUs will switch from dynamic to static advancement, changing the advancement in the middle of VM creation can result in different values for vCPUs within a VM, etc. Those flaws are mostly fixable, but there's almost no justification for taking on yet more complexity (it's minimal complexity, but still non-zero). The only arguments against using KVM's adaptive tuning is if a setup needs a higher maximum, or if the adjustments are too reactive, but those are arguments for letting userspace control the absolute max advancement and the granularity of each adjustment, e.g. similar to how KVM provides knobs for halt polling. Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com Cc: Shuling Zhou <zhoushuling@huawei.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240522010304.1650603-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Delegate LBR virtualization to the processorRavi Bangoria
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. Although KVM currently enforces LBRV for SEV-ES guests, there are multiple issues with it: o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR interception is used to dynamically toggle LBRV for performance reasons, this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3: [guest ~]# wrmsr 0x1d9 0x4 KVM: entry failed, hardware error 0xffffffff EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR is of swap type A. o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA encryption. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Co-developed-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absentRavi Bangoria
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. So, prevent SEV-ES guests when LBRV support is missing. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Prevent MSR access post VMSA encryptionNikunj A Dadhania
KVM currently allows userspace to read/write MSRs even after the VMSA is encrypted. This can cause unintentional issues if MSR access has side- effects. For ex, while migrating a guest, userspace could attempt to migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on the target. Fix this by preventing access to those MSRs which are context switched via the VMSA, once the VMSA is encrypted. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03Merge tag 'loongarch-fixes-6.10-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Some bootloader interface fixes, a dts fix, and a trivial cleanup" * tag 'loongarch-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: Fix GMAC's phy-mode definitions in dts LoongArch: Override higher address bits in JUMP_VIRT_ADDR LoongArch: Fix entry point in kernel image header LoongArch: Add all CPUs enabled by fdt to NUMA node 0 LoongArch: Fix built-in DTB detection LoongArch: Remove CONFIG_ACPI_TABLE_UPGRADE in platform_init()
2024-06-03bpf: Fix a potential use-after-free in bpf_link_free()Cong Wang
After commit 1a80dbcb2dba, bpf_link can be freed by link->ops->dealloc_deferred, but the code still tests and uses link->ops->dealloc afterward, which leads to a use-after-free as reported by syzbot. Actually, one of them should be sufficient, so just call one of them instead of both. Also add a WARN_ON() in case of any problematic implementation. Fixes: 1a80dbcb2dba ("bpf: support deferring bpf_link dealloc to after RCU grace period") Reported-by: syzbot+1989ee16d94720836244@syzkaller.appspotmail.com Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240602182703.207276-1-xiyou.wangcong@gmail.com
2024-06-03cpufreq: intel_pstate: Fix unchecked HWP MSR accessSrinivas Pandruvada
Fix unchecked MSR access error for processors with no HWP support. On such processors, maximum frequency can be changed by the system firmware using ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED. This results in accessing HWP MSR 0x771. Call Trace: <TASK> generic_exec_single+0x58/0x120 smp_call_function_single+0xbf/0x110 rdmsrl_on_cpu+0x46/0x60 intel_pstate_get_hwp_cap+0x1b/0x70 intel_pstate_update_limits+0x2a/0x60 acpi_processor_notify+0xb7/0x140 acpi_ev_notify_dispatch+0x3b/0x60 HWP MSR 0x771 can be only read on a CPU which supports HWP and enabled. Hence intel_pstate_get_hwp_cap() can only be called when hwp_active is true. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/ Fixes: e8217b4bece3 ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently") Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-03bpf, devmap: Remove unnecessary if check in for loopThorsten Blum
The iterator variable dst cannot be NULL and the if check can be removed. Remove it and fix the following Coccinelle/coccicheck warning reported by itnull.cocci: ERROR: iterator variable bound on line 762 cannot be NULL Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240529101900.103913-2-thorsten.blum@toblux.com
2024-06-03test_bpf: Add missing MODULE_DESCRIPTION()Jeff Johnson
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_bpf.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240531-md-lib-test_bpf-v1-1-868e4bd2f9ed@quicinc.com
2024-06-03bpftool: Fix typo in MAX_NUM_METRICS macro nameSwan Beaujard
Correct typo in bpftool profiler and change all instances of 'MATRICS' to 'METRICS' in the profiler.bpf.c file. Signed-off-by: Swan Beaujard <beaujardswan@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/bpf/20240602225812.81171-1-beaujardswan@gmail.com
2024-06-03selftests/bpf: Remove unused struct 'libcap'Dr. David Alan Gilbert
'libcap' is unused since commit b1c2768a82b9 ("bpf: selftests: Remove libcap usage from test_verifier"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240602234112.225107-4-linux@treblig.org
2024-06-03selftests/bpf: Remove unused 'key_t' structsDr. David Alan Gilbert
'key_t' is unused in a couple of files since the original commit 60dd49ea6539 ("selftests/bpf: Add test for bpf array map iterators"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240602234112.225107-3-linux@treblig.org
2024-06-03selftests/bpf: Remove unused struct 'scale_test_def'Dr. David Alan Gilbert
'scale_test_def' is unused since commit 3762a39ce85f ("selftests/bpf: Split out bpf_verif_scale selftests into multiple tests"). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240602234112.225107-2-linux@treblig.org
2024-06-03riscv, bpf: Introduce shift add helper with Zba optimizationXiao Wang
Zba extension is very useful for generating addresses that index into array of basic data types. This patch introduces sh2add and sh3add helpers for RV32 and RV64 respectively, to accelerate addressing for array of unsigned long data. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240524075543.4050464-3-xiao.w.wang@intel.com
2024-06-03tomoyo: update project linksTetsuo Handa
TOMOYO project has moved to SourceForge.net . Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
2024-06-03wifi: ath12k: add hw_link_id in ath12k_pdevKarthikeyan Periyasamy
Currently, hw_link_id is sent in WMI service ready event but it is not parsed anywhere. But, in future, for multi-link operation, this parameter would be needed by many WMI commands such as WMI beacon template (WMI_BCN_TMPL_CMDID), WMI vdev start for Multi-link virtual AP interfaces (WMI_VDEV_START_REQUEST_CMDID), WMI peer assoc command (WMI_PEER_ASSOC_CMDID) for Multi-link peer and so on. Hence, add changes to parse and store the hw_link_id received in WMI service ready event in ath12k_pdev structure. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1 Signed-off-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com> Signed-off-by: Harshitha Prem <quic_hprem@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240529054955.4105240-1-quic_hprem@quicinc.com
2024-06-03wifi: ath12k: add panic handlerBaochen Qiang
Currently for ath12k PCI devices, firmware could be running in one of several execution environments, e.g., PBL, SBL and mission mode etc. Among which PBL is the only stage where PCIe link negotiation could happen. So normally firmware runs in PBL in order to be enumerated during system reboot. However it might not work in kernel crash scenario: ath12k target is not found after warm reboot from kernel crash. This is because when kernel crashes, ath12k host does nothing to firmware. And during warm reboot, WLAN power is sustained. So firmware is likely to keep running in mission mode throughout the bootup process. As a result PCIe link is not established and thus target not enumerated. So add a handler in panic notification list for ath12k. When kernel crashes, this handler gets called and tries to reset target to PBL state. Then PCIe link negotiation could happen and target gets enumerated. This change applies to all PCI devices including WCN7850 and QCN9274. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4 Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240529021533.10861-1-quic_bqiang@quicinc.com
2024-06-03Merge branch 'Felix-DSA-probing-cleanup'David S. Miller
Vladimir Oltean says: ==================== Probing cleanup for the Felix DSA driver This is a follow-up to Russell King's request for code consolidation among felix_vsc9959, seville_vsc9953 and ocelot_ext, stated here: https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/ Details are in individual patches. Testing was done on NXP LS1028A (felix_vsc9959). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: unexport felix_phylink_mac_ops and felix_switch_opsVladimir Oltean
Now that the common felix_register_switch() from the umbrella driver is the only entity that accesses these data structures, we can remove them from the list of the exported symbols. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Sai Krishna <saikrishnag@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: common probing codeVladimir Oltean
Russell King suggested that felix_vsc9959, seville_vsc9953 and ocelot_ext have a large portion of duplicated init code, which could be made common [1]. [1]: https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/ Here, we take the following common steps: - "felix" and "ds" structure allocation - "felix", "ocelot" and "ds" basic structure initialization - dsa_register_switch() call and we make a common function out of them. For every driver except felix_vsc9959, this is also the entire probing procedure. For felix_vsc9959, we also need to do some PCI-specific stuff, which can easily be reordered to be done before, and unwound on failure. We also have to convert the bus-specific platform_set_drvdata() and pci_set_drvdata() calls into dev_set_drvdata(). But this should have no impact on the behavior. Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: use ds->num_tx_queues = OCELOT_NUM_TC for all modelsVladimir Oltean
Russell King points out that seville_vsc9953 populates felix->info->num_tx_queues = 8, but this doesn't make it all the way into ds->num_tx_queues (which is how the user interface netdev queues get allocated) [1]. [1]: https://lore.kernel.org/all/20240415160150.yejcazpjqvn7vhxu@skbuf/ When num_tx_queues=0 for seville, this is implicitly converted to 1 by dsa_user_create(), and this is good enough for basic operation for a switch port. The tc qdisc offload layer works with netdev TX queues, so for QoS offload we need to pretend we have multiple TX queues. The VSC9953, like ocelot_ext, doesn't export QoS offload, so it doesn't really matter. But we can definitely set num_tx_queues=8 for all switches. The felix->info->num_tx_queues construct itself seems unnecessary. It was introduced by commit de143c0e274b ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload") at a time when vsc9959 (LS1028A) was the only switch supported by the driver. 8 traffic classes, and 1 queue per traffic class, is a common architectural feature of all switches in the family. So they could all just set OCELOT_NUM_TC and be fine. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: move devm_request_threaded_irq() to felix_setup()Vladimir Oltean
The current placement of devm_request_threaded_irq() is inconvenient. It is between the allocation of the "felix" structure and dsa_register_switch(), both of which we'd like to refactor into a function that's common for all switches. But the IRQ is specific to felix_vsc9959. A closer inspection of the felix_irq_handler() code suggests that it does things that depend on the data structures having been fully initialized. For example, ocelot_get_txtstamp() takes &port->tx_skbs.lock, which has only been initialized in ocelot_init_port() which has not run yet. It is not one of those IRQF_SHARED IRQs, so CONFIG_DEBUG_SHIRQ_FIXME shouldn't apply here, and thus, it doesn't really matter, because in practice, the IRQ will not be triggered so early. Nonetheless, it is a good practice for the driver to be prepared for it to fire as soon as it is requested. Create a new felix->info method for running custom code for vsc9959 from within felix_setup(), and move the request_irq() call there. The ocelot_ext should have an IRQ as well, so this should be a step in the right direction for that model (VSC7512) as well. Some minor changes are made while moving the code. Casts from void * aren't necessary, so drop them, and rename felix_irq_handler() to the more specific vsc9959_irq_handler(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: consistently use devres in felix_pci_probe()Vladimir Oltean
Russell King suggested that felix_vsc9959, seville_vsc9953 and ocelot_ext have a large portion of duplicated init and teardown code, which could be made common [1]. The teardown code could even be simplified away if we made use of devres, something which is used here and there in the felix driver, just not very consistently. [1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/ Prepare the ground in the felix_vsc9959 driver, by allocating the data structures using devres and deleting the kfree() calls. This also deletes the "Failed to allocate ..." message, since memory allocation errors are extremely loud anyway, and it's hard to miss them. Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: delete open coded status = "disabled" parsingVladimir Oltean
Since commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status"), PCI device drivers with OF bindings no longer need this check. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: use devres in seville_probe()Vladimir Oltean
Russell King suggested that felix_vsc9959, seville_vsc9953 and ocelot_ext have a large portion of duplicated init and teardown code, which could be made common [1]. The teardown code could even be simplified away if we made use of devres, something which is used here and there in the felix driver, just not very consistently. [1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/ Prepare the ground in the seville_vsc9953 driver, by allocating the data structures using devres and deleting the kfree() calls. This also deletes the "Failed to allocate ..." message, since memory allocation errors are extremely loud anyway, and it's hard to miss them. Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: dsa: ocelot: use devres in ocelot_ext_probe()Vladimir Oltean
Russell King suggested that felix_vsc9959, seville_vsc9953 and ocelot_ext have a large portion of duplicated init and teardown code, which could be made common [1]. The teardown code could even be simplified away if we made use of devres, something which is used here and there in the felix driver, just not very consistently. [1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/ Prepare the ground in the ocelot_ext driver, by allocating the data structures using devres and deleting the kfree() calls. This also deletes the "Failed to allocate ..." message, since memory allocation errors are extremely loud anyway, and it's hard to miss them. Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03Merge branch 'net-smc-snd_buf-rcv_buf'David S. Miller
Guangguan Wang says: ==================== net/smc: Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf. The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_ RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB. TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2] whose default value is 4MB or 6MB, is much larger than SMC-R's upper boundary. In some scenarios, such as Recommendation System, the communication pattern is mainly large size send/recv, where the size of snd_buf and rcv_buf greatly affects performance. Due to the upper boundary disadvantage, SMC-R performs poor than TCP in those scenarios. So it is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf, so that the SMC-R's snd_buf and rcv_buf can be configured to larger size for performance gain in such scenarios. The SMC-R rcv_buf's size will be transferred to peer by the field rmbe_size in clc accept and confirm message. The length of the field rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As the real memory usage is determined by the value of net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES to the maximum value has no side affects. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net/smc: change SMCR_RMBE_SIZES from 5 to 15Guangguan Wang
SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf. The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_ RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB. TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2] whose default value is 4MB or 6MB, is much larger than SMC-R's upper boundary. In some scenarios, such as Recommendation System, the communication pattern is mainly large size send/recv, where the size of snd_buf and rcv_buf greatly affects performance. Due to the upper boundary disadvantage, SMC-R performs poor than TCP in those scenarios. So it is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf, so that the SMC-R's snd_buf and rcv_buf can be configured to larger size for performance gain in such scenarios. The SMC-R rcv_buf's size will be transferred to peer by the field rmbe_size in clc accept and confirm message. The length of the field rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf is 512MB. As the real memory usage is determined by the value of net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES to the maximum value has no side affects. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Co-developed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when ↵Guangguan Wang
CONFIG_ARCH_NO_SG_CHAIN is defined SG_MAX_SINGLE_ALLOC is used to limit maximum number of entries that will be allocated in one piece of scatterlist. When the entries of scatterlist exceeds SG_MAX_SINGLE_ALLOC, sg chain will be used. From commit 7c703e54cc71 ("arch: switch the default on ARCH_HAS_SG_CHAIN"), we can know that the macro CONFIG_ARCH_NO_SG_CHAIN is used to identify whether sg chain is supported. So, SMC-R's rmb buffer should be limited by SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is defined. Fixes: a3fe3d01bd0d ("net/smc: introduce sg-logic for RMBs") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Co-developed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03net: phy: micrel: fix KSZ9477 PHY issues after suspend/resumeTristram Ha
When the PHY is powered up after powered down most of the registers are reset, so the PHY setup code needs to be done again. In addition the interrupt register will need to be setup again so that link status indication works again. Fixes: 26dd2974c5b5 ("net: phy: micrel: Move KSZ9477 errata fixes to PHY driver") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-03LoongArch: Fix GMAC's phy-mode definitions in dtsHuacai Chen
The GMAC of Loongson chips cannot insert the correct 1.5-2ns delay. So we need the PHY to insert internal delays for both transmit and receive data lines from/to the PHY device. Fix this by changing the "phy-mode" from "rgmii" to "rgmii-id" in dts. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03LoongArch: Override higher address bits in JUMP_VIRT_ADDRJiaxun Yang
In JUMP_VIRT_ADDR we are performing an or calculation on address value directly from pcaddi. This will only work if we are currently running from direct 1:1 mapping addresses or firmware's DMW is configured exactly same as kernel. Still, we should not rely on such assumption. Fix by overriding higher bits in address comes from pcaddi, so we can get rid of or operator. Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03LoongArch: Fix entry point in kernel image headerJiaxun Yang
Currently kernel entry in head.S is in DMW address range, firmware is instructed to jump to this address after loading the kernel image. However kernel should not make any assumption on firmware's DMW setting, thus the entry point should be a physical address falls into direct translation region. Fix by converting entry address to physical and amend entry calculation logic in libstub accordingly. BTW, use ABSOLUTE() to calculate variables to make Clang/LLVM happy. Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>