Age | Commit message (Collapse) | Author |
|
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
Disabling preemption in seg6_output_core() is not good enough,
because seg6_output_core() is called from process context,
lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter seg6_output_core()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of
preempt_disable().
Apply a similar change in seg6_input_core().
Fixes: fa79581ea66c ("ipv6: sr: fix several BUGs when preemption is enabled")
Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Lebrun <dlebrun@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
Disabling preemption in rpl_output() is not good enough,
because rpl_output() is called from process context,
lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter rpl_output()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of
preempt_disable().
Apply a similar change in rpl_input().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Aring <aahringo@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As explained in commit 1378817486d6 ("tipc: block BH
before using dst_cache"), net/core/dst_cache.c
helpers need to be called with BH disabled.
Disabling preemption in ioam6_output() is not good enough,
because ioam6_output() is called from process context,
lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter ioam6_output()
and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of
preempt_disable().
Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Justin Iurman <justin.iurman@uliege.be>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base,
the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset
rq->data_ring.desc_size for the data ring that failed, which presumably
causes the hypervisor to reference it on packet reception.
To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell
the hypervisor to disable this feature.
[ 95.436876] kernel BUG at net/core/skbuff.c:207!
[ 95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1
[ 95.441558] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 95.443481] RIP: 0010:skb_panic+0x4d/0x4f
[ 95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50
ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9
ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24
[ 95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246
[ 95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f
[ 95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
[ 95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60
[ 95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000
[ 95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0
[ 95.455682] FS: 0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000
[ 95.457178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0
[ 95.459791] Call Trace:
[ 95.460515] <IRQ>
[ 95.461180] ? __die_body.cold+0x19/0x27
[ 95.462150] ? die+0x2e/0x50
[ 95.462976] ? do_trap+0xca/0x110
[ 95.463973] ? do_error_trap+0x6a/0x90
[ 95.464966] ? skb_panic+0x4d/0x4f
[ 95.465901] ? exc_invalid_op+0x50/0x70
[ 95.466849] ? skb_panic+0x4d/0x4f
[ 95.467718] ? asm_exc_invalid_op+0x1a/0x20
[ 95.468758] ? skb_panic+0x4d/0x4f
[ 95.469655] skb_put.cold+0x10/0x10
[ 95.470573] vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3]
[ 95.471853] vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3]
[ 95.473185] __napi_poll+0x2b/0x160
[ 95.474145] net_rx_action+0x2c6/0x3b0
[ 95.475115] handle_softirqs+0xe7/0x2a0
[ 95.476122] __irq_exit_rcu+0x97/0xb0
[ 95.477109] common_interrupt+0x85/0xa0
[ 95.478102] </IRQ>
[ 95.478846] <TASK>
[ 95.479603] asm_common_interrupt+0x26/0x40
[ 95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20
[ 95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
[ 95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246
[ 95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000
[ 95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001
[ 95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3
[ 95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260
[ 95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000
[ 95.495035] acpi_safe_halt+0x14/0x20
[ 95.496127] acpi_idle_do_entry+0x2f/0x50
[ 95.497221] acpi_idle_enter+0x7f/0xd0
[ 95.498272] cpuidle_enter_state+0x81/0x420
[ 95.499375] cpuidle_enter+0x2d/0x40
[ 95.500400] do_idle+0x1e5/0x240
[ 95.501385] cpu_startup_entry+0x29/0x30
[ 95.502422] start_secondary+0x11c/0x140
[ 95.503454] common_startup_64+0x13e/0x141
[ 95.504466] </TASK>
[ 95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback
vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock
sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev
loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx
crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel
sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi
ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc
scsi_dh_alua ip6_tables ip_tables fuse
[ 95.516536] ---[ end trace 0000000000000000 ]---
Fixes: 6f4833383e85 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()")
Signed-off-by: Matthias Stocker <mstocker@barracuda.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Ronak Doshi <ronak.doshi@broadcom.com>
Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we get to the end of the r8152's suspend() routine and we find that
the USB device is INACCESSIBLE then it means that some of our
preparation for suspend didn't take place. We need a USB reset to get
ourselves back in a consistent state so we can try again and that
can't happen during system suspend. Call pm_wakeup_event() to wake the
system up in this case.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/66590f25.170a0220.8b5ad.1752@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If we happened to get a USB transfer error during the transition to
suspend then the usb_queue_reset_device() that r8152_control_msg()
calls will get dropped on the floor. This is because
usb_lock_device_for_reset() (which usb_queue_reset_device() uses)
silently fails if it's called when a device is suspended or if too
much time passes.
Let's resolve this by resetting the device ourselves in r8152's
resume() function.
NOTE: due to timing, it's _possible_ that we could end up with two USB
resets: the one queued previously and the one called from the resume()
patch. This didn't happen in test cases I ran, though it's conceivably
possible. We can't easily know if this happened since
usb_queue_reset_device() can just silently drop the reset request. In
any case, it's not expected that this is a problem since the two
resets can't run at the same time (because of the device lock) and it
should be OK to reset the device twice. If somehow the double-reset
causes problems we could prevent resets from being queued up while
suspend is running.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/66590f22.170a0220.8b5ad.1750@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dave Jiang:
- Compile fix for cxl-test from missing linux/vmalloc.h
- Fix for memregion leaks in devm_cxl_add_region()
* tag 'cxl-fixes-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Fix memregion leaks in devm_cxl_add_region()
cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c
|
|
bpf_map_lookup_elem is invoked in bpf_prog3() already, no need to invoke
it again. This patch drops it.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/ea8458462b876ee445173e3effb535fd126137ed.1716446893.git.tanggeliang@kylinos.cn
|
|
The value of recv in msg_loop may be negative, like EWOULDBLOCK, so it's
necessary to check if it is positive before accumulating it to bytes_recvd.
Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/5172563f7c7b2a2e953cef02e89fc34664a7b190.1716446893.git.tanggeliang@kylinos.cn
|
|
The array size of map_fd[] is 9, not 8. This patch changes it as a more
general form: ARRAY_SIZE(map_fd).
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/0972529ee01ebf8a8fd2b310bdec90831c94be77.1716446893.git.tanggeliang@kylinos.cn
|
|
The program fds can be got by using bpf_program__fd(progs[]), then
prog_fd becomes useless. This patch drops it.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/9a6335e4d8dbab23c0d8906074457ceddd61e74b.1716446893.git.tanggeliang@kylinos.cn
|
|
bpf_program__attach_sockmap() needs to take a parameter of type bpf_program
instead of an fd, so tx_prog_fd becomes useless. This patch uses a pointer
tx_prog to point to an item in progs[] array.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/23b37f932c547dd1ebfe154bbc0b0e957be21ee6.1716446893.git.tanggeliang@kylinos.cn
|
|
Switch attachments to bpf_link using bpf_program__attach_sockmap() instead
of bpf_prog_attach().
This patch adds a new array progs[] to replace prog_fd[] array, set in
populate_progs() for each program in bpf object.
And another new array links[] to save the attached bpf_link. It is
initalized as NULL in populate_progs, set as the return valuses of
bpf_program__attach_sockmap(), and detached by bpf_link__detach().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/32cf8376a810e2e9c719f8e4cfb97132ed2d1f9c.1716446893.git.tanggeliang@kylinos.cn
|
|
There's already a definition of i in run_options() at the beginning, no
need to define a new one in "if (tx_prog_fd > 0)" block.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/8d690682330a59361562bca75d6903253d16f312.1716446893.git.tanggeliang@kylinos.cn
|
|
The values of tx_prog_fd in run_options() should not be 0, so set it as -1
in else branch, and test it using "if (tx_prog_fd > 0)" condition, not
"if (tx_prog_fd)" or "if (tx_prog_fd >= 0)".
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/08b20ffc544324d40939efeae93800772a91a58e.1716446893.git.tanggeliang@kylinos.cn
|
|
into HEAD
KVM/riscv fixes for 6.10, take #1
- No need to use mask when hart-index-bits is 0
- Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
|
|
* Fixes and debugging help for the #VE sanity check. Also disable
it by default, even for CONFIG_DEBUG_KERNEL, because it was found
to trigger spuriously (most likely a processor erratum as the
exact symptoms vary by generation).
* Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled
situation (GIF=0 or interrupt shadow) when the processor supports
virtual NMI. While generally KVM will not request an NMI window
when virtual NMIs are supported, in this case it *does* have to
single-step over the interrupt shadow or enable the STGI intercept,
in order to deliver the latched second NMI.
* Drop support for hand tuning APIC timer advancement from userspace.
Since we have adaptive tuning, and it has proved to work well,
drop the module parameter for manual configuration and with it a
few stupid bugs that it had.
|
|
Remove support for specifying a static local APIC timer advancement value,
and instead present a read-only boolean parameter to let userspace enable
or disable KVM's dynamic APIC timer advancement. Realistically, it's all
but impossible for userspace to specify an advancement that is more
precise than what KVM's adaptive tuning can provide. E.g. a static value
needs to be tuned for the exact hardware and kernel, and if KVM is using
hrtimers, likely requires additional tuning for the exact configuration of
the entire system.
Dropping support for a userspace provided value also fixes several flaws
in the interface. E.g. KVM interprets a negative value other than -1 as a
large advancement, toggling between a negative and positive value yields
unpredictable behavior as vCPUs will switch from dynamic to static
advancement, changing the advancement in the middle of VM creation can
result in different values for vCPUs within a VM, etc. Those flaws are
mostly fixable, but there's almost no justification for taking on yet more
complexity (it's minimal complexity, but still non-zero).
The only arguments against using KVM's adaptive tuning is if a setup needs
a higher maximum, or if the adjustments are too reactive, but those are
arguments for letting userspace control the absolute max advancement and
the granularity of each adjustment, e.g. similar to how KVM provides knobs
for halt polling.
Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com
Cc: Shuling Zhou <zhoushuling@huawei.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240522010304.1650603-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. Although KVM currently enforces LBRV for SEV-ES guests, there
are multiple issues with it:
o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR
interception is used to dynamically toggle LBRV for performance reasons,
this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3:
[guest ~]# wrmsr 0x1d9 0x4
KVM: entry failed, hardware error 0xffffffff
EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
is of swap type A.
o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the
VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA
encryption.
[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
2023, Vol 2, 15.35.2 Enabling SEV-ES.
https://bugzilla.kernel.org/attachment.cgi?id=304653
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Co-developed-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES
guests. So, prevent SEV-ES guests when LBRV support is missing.
[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
2023, Vol 2, 15.35.2 Enabling SEV-ES.
https://bugzilla.kernel.org/attachment.cgi?id=304653
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM currently allows userspace to read/write MSRs even after the VMSA is
encrypted. This can cause unintentional issues if MSR access has side-
effects. For ex, while migrating a guest, userspace could attempt to
migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on
the target. Fix this by preventing access to those MSRs which are context
switched via the VMSA, once the VMSA is encrypted.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Some bootloader interface fixes, a dts fix, and a trivial cleanup"
* tag 'loongarch-fixes-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix GMAC's phy-mode definitions in dts
LoongArch: Override higher address bits in JUMP_VIRT_ADDR
LoongArch: Fix entry point in kernel image header
LoongArch: Add all CPUs enabled by fdt to NUMA node 0
LoongArch: Fix built-in DTB detection
LoongArch: Remove CONFIG_ACPI_TABLE_UPGRADE in platform_init()
|
|
After commit 1a80dbcb2dba, bpf_link can be freed by
link->ops->dealloc_deferred, but the code still tests and uses
link->ops->dealloc afterward, which leads to a use-after-free as
reported by syzbot. Actually, one of them should be sufficient, so
just call one of them instead of both. Also add a WARN_ON() in case
of any problematic implementation.
Fixes: 1a80dbcb2dba ("bpf: support deferring bpf_link dealloc to after RCU grace period")
Reported-by: syzbot+1989ee16d94720836244@syzkaller.appspotmail.com
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240602182703.207276-1-xiyou.wangcong@gmail.com
|
|
Fix unchecked MSR access error for processors with no HWP support. On
such processors, maximum frequency can be changed by the system firmware
using ACPI event ACPI_PROCESSOR_NOTIFY_HIGEST_PERF_CHANGED. This results
in accessing HWP MSR 0x771.
Call Trace:
<TASK>
generic_exec_single+0x58/0x120
smp_call_function_single+0xbf/0x110
rdmsrl_on_cpu+0x46/0x60
intel_pstate_get_hwp_cap+0x1b/0x70
intel_pstate_update_limits+0x2a/0x60
acpi_processor_notify+0xb7/0x140
acpi_ev_notify_dispatch+0x3b/0x60
HWP MSR 0x771 can be only read on a CPU which supports HWP and enabled.
Hence intel_pstate_get_hwp_cap() can only be called when hwp_active is
true.
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/linux-pm/20240529155740.Hq2Hw7be@linutronix.de/
Fixes: e8217b4bece3 ("cpufreq: intel_pstate: Update the maximum CPU frequency consistently")
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The iterator variable dst cannot be NULL and the if check can be removed.
Remove it and fix the following Coccinelle/coccicheck warning reported
by itnull.cocci:
ERROR: iterator variable bound on line 762 cannot be NULL
Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240529101900.103913-2-thorsten.blum@toblux.com
|
|
make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in lib/test_bpf.o
Add the missing invocation of the MODULE_DESCRIPTION() macro.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240531-md-lib-test_bpf-v1-1-868e4bd2f9ed@quicinc.com
|
|
Correct typo in bpftool profiler and change all instances of 'MATRICS' to
'METRICS' in the profiler.bpf.c file.
Signed-off-by: Swan Beaujard <beaujardswan@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240602225812.81171-1-beaujardswan@gmail.com
|
|
'libcap' is unused since commit b1c2768a82b9 ("bpf: selftests: Remove libcap
usage from test_verifier"). Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240602234112.225107-4-linux@treblig.org
|
|
'key_t' is unused in a couple of files since the original commit 60dd49ea6539
("selftests/bpf: Add test for bpf array map iterators"). Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240602234112.225107-3-linux@treblig.org
|
|
'scale_test_def' is unused since commit 3762a39ce85f ("selftests/bpf: Split out
bpf_verif_scale selftests into multiple tests"). Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240602234112.225107-2-linux@treblig.org
|
|
Zba extension is very useful for generating addresses that index into array
of basic data types. This patch introduces sh2add and sh3add helpers for
RV32 and RV64 respectively, to accelerate addressing for array of unsigned
long data.
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240524075543.4050464-3-xiao.w.wang@intel.com
|
|
TOMOYO project has moved to SourceForge.net .
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
|
|
Currently, hw_link_id is sent in WMI service ready event but it
is not parsed anywhere.
But, in future, for multi-link operation, this parameter would be
needed by many WMI commands such as WMI beacon template
(WMI_BCN_TMPL_CMDID), WMI vdev start for Multi-link virtual AP
interfaces (WMI_VDEV_START_REQUEST_CMDID), WMI peer assoc command
(WMI_PEER_ASSOC_CMDID) for Multi-link peer and so on.
Hence, add changes to parse and store the hw_link_id received in
WMI service ready event in ath12k_pdev structure.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
Signed-off-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com>
Signed-off-by: Harshitha Prem <quic_hprem@quicinc.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://msgid.link/20240529054955.4105240-1-quic_hprem@quicinc.com
|
|
Currently for ath12k PCI devices, firmware could be running in one of
several execution environments, e.g., PBL, SBL and mission mode etc. Among
which PBL is the only stage where PCIe link negotiation could happen. So
normally firmware runs in PBL in order to be enumerated during system reboot.
However it might not work in kernel crash scenario: ath12k target is not
found after warm reboot from kernel crash. This is because when kernel crashes,
ath12k host does nothing to firmware. And during warm reboot, WLAN power
is sustained. So firmware is likely to keep running in mission mode throughout
the bootup process. As a result PCIe link is not established and thus target
not enumerated.
So add a handler in panic notification list for ath12k. When kernel crashes,
this handler gets called and tries to reset target to PBL state. Then PCIe
link negotiation could happen and target gets enumerated.
This change applies to all PCI devices including WCN7850 and QCN9274.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://msgid.link/20240529021533.10861-1-quic_bqiang@quicinc.com
|
|
Vladimir Oltean says:
====================
Probing cleanup for the Felix DSA driver
This is a follow-up to Russell King's request for code consolidation
among felix_vsc9959, seville_vsc9953 and ocelot_ext, stated here:
https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/
Details are in individual patches. Testing was done on NXP LS1028A
(felix_vsc9959).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that the common felix_register_switch() from the umbrella driver
is the only entity that accesses these data structures, we can remove
them from the list of the exported symbols.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init code, which could be
made common [1].
[1]: https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/
Here, we take the following common steps:
- "felix" and "ds" structure allocation
- "felix", "ocelot" and "ds" basic structure initialization
- dsa_register_switch() call
and we make a common function out of them.
For every driver except felix_vsc9959, this is also the entire probing
procedure. For felix_vsc9959, we also need to do some PCI-specific
stuff, which can easily be reordered to be done before, and unwound on
failure.
We also have to convert the bus-specific platform_set_drvdata() and
pci_set_drvdata() calls into dev_set_drvdata(). But this should have no
impact on the behavior.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King points out that seville_vsc9953 populates
felix->info->num_tx_queues = 8, but this doesn't make it all the way
into ds->num_tx_queues (which is how the user interface netdev queues
get allocated) [1].
[1]: https://lore.kernel.org/all/20240415160150.yejcazpjqvn7vhxu@skbuf/
When num_tx_queues=0 for seville, this is implicitly converted to 1 by
dsa_user_create(), and this is good enough for basic operation for a
switch port. The tc qdisc offload layer works with netdev TX queues,
so for QoS offload we need to pretend we have multiple TX queues. The
VSC9953, like ocelot_ext, doesn't export QoS offload, so it doesn't
really matter. But we can definitely set num_tx_queues=8 for all
switches.
The felix->info->num_tx_queues construct itself seems unnecessary.
It was introduced by commit de143c0e274b ("net: dsa: felix: Configure
Time-Aware Scheduler via taprio offload") at a time when vsc9959
(LS1028A) was the only switch supported by the driver.
8 traffic classes, and 1 queue per traffic class, is a common
architectural feature of all switches in the family. So they could
all just set OCELOT_NUM_TC and be fine.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current placement of devm_request_threaded_irq() is inconvenient.
It is between the allocation of the "felix" structure and
dsa_register_switch(), both of which we'd like to refactor into a
function that's common for all switches. But the IRQ is specific to
felix_vsc9959.
A closer inspection of the felix_irq_handler() code suggests that
it does things that depend on the data structures having been fully
initialized. For example, ocelot_get_txtstamp() takes
&port->tx_skbs.lock, which has only been initialized in
ocelot_init_port() which has not run yet.
It is not one of those IRQF_SHARED IRQs, so CONFIG_DEBUG_SHIRQ_FIXME
shouldn't apply here, and thus, it doesn't really matter, because in
practice, the IRQ will not be triggered so early. Nonetheless, it is a
good practice for the driver to be prepared for it to fire as soon as it
is requested.
Create a new felix->info method for running custom code for vsc9959 from
within felix_setup(), and move the request_irq() call there. The
ocelot_ext should have an IRQ as well, so this should be a step in the
right direction for that model (VSC7512) as well.
Some minor changes are made while moving the code. Casts from void *
aren't necessary, so drop them, and rename felix_irq_handler() to the
more specific vsc9959_irq_handler().
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.
[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/
Prepare the ground in the felix_vsc9959 driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled
status"), PCI device drivers with OF bindings no longer need this check.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.
[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/
Prepare the ground in the seville_vsc9953 driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Russell King suggested that felix_vsc9959, seville_vsc9953 and
ocelot_ext have a large portion of duplicated init and teardown code,
which could be made common [1]. The teardown code could even be
simplified away if we made use of devres, something which is used here
and there in the felix driver, just not very consistently.
[1] https://lore.kernel.org/all/Zh1GvcOTXqb7CpQt@shell.armlinux.org.uk/
Prepare the ground in the ocelot_ext driver, by allocating the data
structures using devres and deleting the kfree() calls. This also
deletes the "Failed to allocate ..." message, since memory allocation
errors are extremely loud anyway, and it's hard to miss them.
Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Guangguan Wang says:
====================
net/smc: Change the upper boundary of SMC-R's snd_buf and rcv_buf to 512MB
SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_
RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB.
TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2]
whose default value is 4MB or 6MB, is much larger than SMC-R's upper
boundary.
In some scenarios, such as Recommendation System, the communication
pattern is mainly large size send/recv, where the size of snd_buf and
rcv_buf greatly affects performance. Due to the upper boundary
disadvantage, SMC-R performs poor than TCP in those scenarios. So it
is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf,
so that the SMC-R's snd_buf and rcv_buf can be configured to larger size
for performance gain in such scenarios.
The SMC-R rcv_buf's size will be transferred to peer by the field
rmbe_size in clc accept and confirm message. The length of the field
rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES
is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES
in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum
value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf
is 512MB. As the real memory usage is determined by the value of
net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES
to the maximum value has no side affects.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SMCR_RMBE_SIZES is the upper boundary of SMC-R's snd_buf and rcv_buf.
The maximum bytes of snd_buf and rcv_buf can be calculated by 2^SMCR_
RMBE_SIZES * 16KB. SMCR_RMBE_SIZES = 5 means the upper boundary is 512KB.
TCP's snd_buf and rcv_buf max size is configured by net.ipv4.tcp_w/rmem[2]
whose default value is 4MB or 6MB, is much larger than SMC-R's upper
boundary.
In some scenarios, such as Recommendation System, the communication
pattern is mainly large size send/recv, where the size of snd_buf and
rcv_buf greatly affects performance. Due to the upper boundary
disadvantage, SMC-R performs poor than TCP in those scenarios. So it
is time to enlarge the upper boundary size of SMC-R's snd_buf and rcv_buf,
so that the SMC-R's snd_buf and rcv_buf can be configured to larger size
for performance gain in such scenarios.
The SMC-R rcv_buf's size will be transferred to peer by the field
rmbe_size in clc accept and confirm message. The length of the field
rmbe_size is four bits, which means the maximum value of SMCR_RMBE_SIZES
is 15. In case of frequently adjusting the value of SMCR_RMBE_SIZES
in different scenarios, set the value of SMCR_RMBE_SIZES to the maximum
value 15, which means the upper boundary of SMC-R's snd_buf and rcv_buf
is 512MB. As the real memory usage is determined by the value of
net.smc.w/rmem, not by the upper boundary, set the value of SMCR_RMBE_SIZES
to the maximum value has no side affects.
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
CONFIG_ARCH_NO_SG_CHAIN is defined
SG_MAX_SINGLE_ALLOC is used to limit maximum number of entries that
will be allocated in one piece of scatterlist. When the entries of
scatterlist exceeds SG_MAX_SINGLE_ALLOC, sg chain will be used. From
commit 7c703e54cc71 ("arch: switch the default on ARCH_HAS_SG_CHAIN"),
we can know that the macro CONFIG_ARCH_NO_SG_CHAIN is used to identify
whether sg chain is supported. So, SMC-R's rmb buffer should be limited
by SG_MAX_SINGLE_ALLOC only when the macro CONFIG_ARCH_NO_SG_CHAIN is
defined.
Fixes: a3fe3d01bd0d ("net/smc: introduce sg-logic for RMBs")
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Co-developed-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the PHY is powered up after powered down most of the registers are
reset, so the PHY setup code needs to be done again. In addition the
interrupt register will need to be setup again so that link status
indication works again.
Fixes: 26dd2974c5b5 ("net: phy: micrel: Move KSZ9477 errata fixes to PHY driver")
Signed-off-by: Tristram Ha <tristram.ha@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The GMAC of Loongson chips cannot insert the correct 1.5-2ns delay. So
we need the PHY to insert internal delays for both transmit and receive
data lines from/to the PHY device. Fix this by changing the "phy-mode"
from "rgmii" to "rgmii-id" in dts.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
In JUMP_VIRT_ADDR we are performing an or calculation on address value
directly from pcaddi.
This will only work if we are currently running from direct 1:1 mapping
addresses or firmware's DMW is configured exactly same as kernel. Still,
we should not rely on such assumption.
Fix by overriding higher bits in address comes from pcaddi, so we can
get rid of or operator.
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Currently kernel entry in head.S is in DMW address range, firmware is
instructed to jump to this address after loading the kernel image.
However kernel should not make any assumption on firmware's DMW
setting, thus the entry point should be a physical address falls into
direct translation region.
Fix by converting entry address to physical and amend entry calculation
logic in libstub accordingly.
BTW, use ABSOLUTE() to calculate variables to make Clang/LLVM happy.
Cc: stable@vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|