summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-30bcachefs: improve missing journal write device error messageKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-30Merge tag 'modules-6.15-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules fixes from Petr Pavlu: "A single series to properly handle the module_kobject creation. This fixes a problem with missing /sys/module/<module>/drivers for built-in modules" * tag 'modules-6.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: drivers: base: handle module_kobject creation kernel: globalize lookup_or_create_module_kobject() kernel: refactor lookup_or_create_module_kobject() kernel: param: rename locate_module_kobject
2025-04-30cifs: Fix changing times and read-only attr over SMB1 smb_set_file_info() ↵Pali Rohár
function Function CIFSSMBSetPathInfo() is not supported by non-NT servers and returns error. Fallback code via open filehandle and CIFSSMBSetFileInfo() does not work neither because CIFS_open() works also only on NT server. Therefore currently the whole smb_set_file_info() function as a SMB1 callback for the ->set_file_info() does not work with older non-NT SMB servers, like Win9x and others. This change implements fallback code in smb_set_file_info() which will works with any server and allows to change time values and also to set or clear read-only attributes. To make existing fallback code via CIFSSMBSetFileInfo() working with also non-NT servers, it is needed to change open function from CIFS_open() (which is NT specific) to cifs_open_file() which works with any server (this is just a open wrapper function which choose the correct open function supported by the server). CIFSSMBSetFileInfo() is working also on non-NT servers, but zero time values are not treated specially. So first it is needed to fill all time values if some of them are missing, via cifs_query_path_info() call. There is another issue, opening file in write-mode (needed for changing attributes) is not possible when the file has read-only attribute set. The only option how to clear read-only attribute is via SMB_COM_SETATTR command. And opening directory is not possible neither and here the SMB_COM_SETATTR command is the only option how to change attributes. And CIFSSMBSetFileInfo() does not honor setting read-only attribute, so for setting is also needed to use SMB_COM_SETATTR command. Existing code in cifs_query_path_info() is already using SMB_COM_GETATTR as a fallback code path (function SMBQueryInformation()), so introduce a new function SMBSetInformation which will implement SMB_COM_SETATTR command. My testing showed that Windows XP SMB1 client is also using SMB_COM_SETATTR command for setting or clearing read-only attribute against non-NT server. So this can prove that this is the correct way how to do it. With this change it is possible set all 4 time values and all attributes, including clearing and setting read-only bit on non-NT SMB servers. Tested against Win98 SMB1 server. This change fixes "touch" command which was failing when called on existing file. And fixes also "chmod +w" and "chmod -w" commands which were also failing (as they are changing read-only attribute). Note that this change depends on following change "cifs: Improve cifs_query_path_info() and cifs_query_file_info()" as it require to query all 4 time attribute values. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-30cifs: Fix and improve cifs_query_path_info() and cifs_query_file_info()Pali Rohár
When CAP_NT_SMBS was not negotiated then do not issue CIFSSMBQPathInfo() and CIFSSMBQFileInfo() commands. CIFSSMBQPathInfo() is not supported by non-NT Win9x SMB server and CIFSSMBQFileInfo() returns from Win9x SMB server bogus data in Attributes field (for example lot of files are marked as reparse points, even Win9x does not support them and read-only bit is not marked for read-only files). Correct information is returned by CIFSFindFirst() or SMBQueryInformation() command. So as a fallback in cifs_query_path_info() function use CIFSFindFirst() with SMB_FIND_FILE_FULL_DIRECTORY_INFO level which is supported by both NT and non-NT servers and as a last option use SMBQueryInformation() as it was before. And in function cifs_query_file_info() immediately returns -EOPNOTSUPP when not communicating with NT server. Client then revalidate inode entry by the cifs_query_path_info() call, which is working fine. So fstat() syscall on already opened file will receive correct information. Note that both fallback functions in non-UNICODE mode expands wildcards. Therefore those fallback functions cannot be used on paths which contain SMB wildcard characters (* ? " > <). CIFSFindFirst() returns all 4 time attributes as opposite of SMBQueryInformation() which returns only one. With this change it is possible to query all 4 times attributes from Win9x server and at the same time, client minimize sending of unsupported commands to server. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-30smb: client: fix zero length for mkdir POSIX create contextJethro Donaldson
SMB create requests issued via smb311_posix_mkdir() have an incorrect length of zero bytes for the POSIX create context data. ksmbd server rejects such requests and logs "cli req too short" causing mkdir to fail with "invalid argument" on the client side. It also causes subsequent rmmod to crash in cifs_destroy_request_bufs() Inspection of packets sent by cifs.ko using wireshark show valid data for the SMB2_POSIX_CREATE_CONTEXT is appended with the correct offset, but with an incorrect length of zero bytes. Fails with ksmbd+cifs.ko only as Windows server/client does not use POSIX extensions. Fix smb311_posix_mkdir() to set req->CreateContextsLength as part of appending the POSIX creation context to the request. Signed-off-by: Jethro Donaldson <devel@jro.nz> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Reviewed-by: Namjae Jeon <linkinjeon@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-30cpufreq: intel_pstate: Unchecked MSR aceess in legacy modeSrinivas Pandruvada
When turbo mode is unavailable on a Skylake-X system, executing the command: # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo results in an unchecked MSR access error: WRMSR to 0x199 (attempted to write 0x0000000100001300). This issue was reproduced on an OEM (Original Equipment Manufacturer) system and is not a common problem across all Skylake-X systems. This error occurs because the MSR 0x199 Turbo Engage Bit (bit 32) is set when turbo mode is disabled. The issue arises when intel_pstate fails to detect that turbo mode is disabled. Here intel_pstate relies on MSR_IA32_MISC_ENABLE bit 38 to determine the status of turbo mode. However, on this system, bit 38 is not set even when turbo mode is disabled. According to the Intel Software Developer's Manual (SDM), the BIOS sets this bit during platform initialization to enable or disable opportunistic processor performance operations. Logically, this bit should be set in such cases. However, the SDM also specifies that "OS and applications must use CPUID leaf 06H to detect processors with opportunistic processor performance operations enabled." Therefore, in addition to checking MSR_IA32_MISC_ENABLE bit 38, verify that CPUID.06H:EAX[1] is 0 to accurately determine if turbo mode is disabled. Fixes: 4521e1a0ce17 ("cpufreq: intel_pstate: Reflect current no_turbo state correctly") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-04-30soundwire: intel_auxdevice: Fix system suspend/resume handlingRafael J. Wysocki
Before commit bca84a7b93fd ("PM: sleep: Use DPM_FLAG_SMART_SUSPEND conditionally") the runtime PM status of the device in intel_resume() had always been RPM_ACTIVE because setting DPM_FLAG_SMART_SUSPEND had caused the core to call pm_runtime_set_active() for that device during the "noirq" resume phase. For this reason, the pm_runtime_suspended() check in intel_resume() had never triggered and the code depending on it had never run. That had not caused any observable functional issues to appear, so effectively the code in question had never been needed. After commit bca84a7b93fd the core does not call pm_runtime_set_active() for all devices with DPM_FLAG_SMART_SUSPEND set any more and the code depending on the pm_runtime_suspended() check in intel_resume() runs if the device is runtime-suspended prior to a system-wide suspend transition. Unfortunately, when it runs, it breaks things due to the attempt to runtime-resume bus->dev which most likely is not ready for a runtime resume at that point. It also does other more-or-less questionable things. Namely, it calls pm_runtime_idle() for a device with a nonzero runtime PM usage counter which has no effect (all devices have nonzero runtime PM usage counters during system-wide suspend and resume). It also calls pm_runtime_mark_last_busy() for the device even though devices cannot runtime-suspend during system-wide suspend and resume (because their runtime PM usage counters are nonzero) and an analogous call is made in the same function later. Moreover, it sets the runtime PM status of the device to RPM_ACTIVE before activating it. For the reasons listed above, remove that code altogether. On top of that, add a pm_runtime_disable() call to intel_suspend() to prevent the device from being runtime-resumed at any point after intel_suspend() has started to manipulate it because the changes made by that function would be undone by a runtime-suspend of the device. Next, once runtime PM has been disabled, the runtime PM status of the device cannot change, so pm_runtime_status_suspended() can be used instead of pm_runtime_suspended() in intel_suspend(). Finally, make intel_resume() call pm_runtime_set_active() at the end to set the runtime PM status of the device to "active" because it has just been activated and re-enable runtime PM for it after that. Additionally, drop the setting of DPM_FLAG_SMART_SUSPEND from the driver because it has no effect on devices handled by it. Fixes: bca84a7b93fd ("PM: sleep: Use DPM_FLAG_SMART_SUSPEND conditionally") Reported-by: Bard Liao <yung-chuan.liao@linux.intel.com> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/12680420.O9o76ZdvQC@rjwysocki.net
2025-04-30io_uring/fdinfo: annotate racy sq/cq head/tail readsJens Axboe
syzbot complains about the cached sq head read, and it's totally right. But we don't need to care, it's just reading fdinfo, and reading the CQ or SQ tail/head entries are known racy in that they are just a view into that very instant and may of course be outdated by the time they are reported. Annotate both the SQ head and CQ tail read with data_race() to avoid this syzbot complaint. Link: https://lore.kernel.org/io-uring/6811f6dc.050a0220.39e3a1.0d0e.GAE@google.com/ Reported-by: syzbot+3e77fd302e99f5af9394@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-30nvmet-auth: always free derived key dataHannes Reinecke
After calling nvme_auth_derive_tls_psk() we need to free the resulting psk data, as either TLS is disable (and we don't need the data anyway) or the psk data is copied into the resulting key (and can be free, too). Fixes: fa2e0f8bbc68 ("nvmet-tcp: support secure channel concatenation") Reported-by: Yi Zhang <yi.zhang@redhat.com> Suggested-by: Maurizio Lombardi <mlombard@bsdbackstore.eu> Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvmet-tcp: don't restore null sk_state_changeAlistair Francis
queue->state_change is set as part of nvmet_tcp_set_queue_sock(), but if the TCP connection isn't established when nvmet_tcp_set_queue_sock() is called then queue->state_change isn't set and sock->sk->sk_state_change isn't replaced. As such we don't need to restore sock->sk->sk_state_change if queue->state_change is NULL. This avoids NULL pointer dereferences such as this: [ 286.462026][ C0] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 286.462814][ C0] #PF: supervisor instruction fetch in kernel mode [ 286.463796][ C0] #PF: error_code(0x0010) - not-present page [ 286.464392][ C0] PGD 8000000140620067 P4D 8000000140620067 PUD 114201067 PMD 0 [ 286.465086][ C0] Oops: Oops: 0010 [#1] SMP KASAN PTI [ 286.465559][ C0] CPU: 0 UID: 0 PID: 1628 Comm: nvme Not tainted 6.15.0-rc2+ #11 PREEMPT(voluntary) [ 286.466393][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 286.467147][ C0] RIP: 0010:0x0 [ 286.467420][ C0] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ 286.467977][ C0] RSP: 0018:ffff8883ae008580 EFLAGS: 00010246 [ 286.468425][ C0] RAX: 0000000000000000 RBX: ffff88813fd34100 RCX: ffffffffa386cc43 [ 286.469019][ C0] RDX: 1ffff11027fa68b6 RSI: 0000000000000008 RDI: ffff88813fd34100 [ 286.469545][ C0] RBP: ffff88813fd34160 R08: 0000000000000000 R09: ffffed1027fa682c [ 286.470072][ C0] R10: ffff88813fd34167 R11: 0000000000000000 R12: ffff88813fd344c3 [ 286.470585][ C0] R13: ffff88813fd34112 R14: ffff88813fd34aec R15: ffff888132cdd268 [ 286.471070][ C0] FS: 00007fe3c04c7d80(0000) GS:ffff88840743f000(0000) knlGS:0000000000000000 [ 286.471644][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 286.472543][ C0] CR2: ffffffffffffffd6 CR3: 000000012daca000 CR4: 00000000000006f0 [ 286.473500][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 286.474467][ C0] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 [ 286.475453][ C0] Call Trace: [ 286.476102][ C0] <IRQ> [ 286.476719][ C0] tcp_fin+0x2bb/0x440 [ 286.477429][ C0] tcp_data_queue+0x190f/0x4e60 [ 286.478174][ C0] ? __build_skb_around+0x234/0x330 [ 286.478940][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.479659][ C0] ? __pfx_tcp_data_queue+0x10/0x10 [ 286.480431][ C0] ? tcp_try_undo_loss+0x640/0x6c0 [ 286.481196][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.482046][ C0] ? kvm_clock_get_cycles+0x14/0x30 [ 286.482769][ C0] ? ktime_get+0x66/0x150 [ 286.483433][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.484146][ C0] tcp_rcv_established+0x6e4/0x2050 [ 286.484857][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.485523][ C0] ? ipv4_dst_check+0x160/0x2b0 [ 286.486203][ C0] ? __pfx_tcp_rcv_established+0x10/0x10 [ 286.486917][ C0] ? lock_release+0x217/0x2c0 [ 286.487595][ C0] tcp_v4_do_rcv+0x4d6/0x9b0 [ 286.488279][ C0] tcp_v4_rcv+0x2af8/0x3e30 [ 286.488904][ C0] ? raw_local_deliver+0x51b/0xad0 [ 286.489551][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.490198][ C0] ? __pfx_tcp_v4_rcv+0x10/0x10 [ 286.490813][ C0] ? __pfx_raw_local_deliver+0x10/0x10 [ 286.491487][ C0] ? __pfx_nf_confirm+0x10/0x10 [nf_conntrack] [ 286.492275][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.492900][ C0] ip_protocol_deliver_rcu+0x8f/0x370 [ 286.493579][ C0] ip_local_deliver_finish+0x297/0x420 [ 286.494268][ C0] ip_local_deliver+0x168/0x430 [ 286.494867][ C0] ? __pfx_ip_local_deliver+0x10/0x10 [ 286.495498][ C0] ? __pfx_ip_local_deliver_finish+0x10/0x10 [ 286.496204][ C0] ? ip_rcv_finish_core+0x19a/0x1f20 [ 286.496806][ C0] ? lock_release+0x217/0x2c0 [ 286.497414][ C0] ip_rcv+0x455/0x6e0 [ 286.497945][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.498550][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.499137][ C0] ? __pfx_ip_rcv_finish+0x10/0x10 [ 286.499763][ C0] ? lock_release+0x217/0x2c0 [ 286.500327][ C0] ? dl_scaled_delta_exec+0xd1/0x2c0 [ 286.500922][ C0] ? __pfx_ip_rcv+0x10/0x10 [ 286.501480][ C0] __netif_receive_skb_one_core+0x166/0x1b0 [ 286.502173][ C0] ? __pfx___netif_receive_skb_one_core+0x10/0x10 [ 286.502903][ C0] ? lock_acquire+0x2b2/0x310 [ 286.503487][ C0] ? process_backlog+0x372/0x1350 [ 286.504087][ C0] ? lock_release+0x217/0x2c0 [ 286.504642][ C0] process_backlog+0x3b9/0x1350 [ 286.505214][ C0] ? process_backlog+0x372/0x1350 [ 286.505779][ C0] __napi_poll.constprop.0+0xa6/0x490 [ 286.506363][ C0] net_rx_action+0x92e/0xe10 [ 286.506889][ C0] ? __pfx_net_rx_action+0x10/0x10 [ 286.507437][ C0] ? timerqueue_add+0x1f0/0x320 [ 286.507977][ C0] ? sched_clock_cpu+0x68/0x540 [ 286.508492][ C0] ? lock_acquire+0x2b2/0x310 [ 286.509043][ C0] ? kvm_sched_clock_read+0xd/0x20 [ 286.509607][ C0] ? handle_softirqs+0x1aa/0x7d0 [ 286.510187][ C0] handle_softirqs+0x1f2/0x7d0 [ 286.510754][ C0] ? __pfx_handle_softirqs+0x10/0x10 [ 286.511348][ C0] ? irqtime_account_irq+0x181/0x290 [ 286.511937][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.512510][ C0] do_softirq.part.0+0x89/0xc0 [ 286.513100][ C0] </IRQ> [ 286.513548][ C0] <TASK> [ 286.513953][ C0] __local_bh_enable_ip+0x112/0x140 [ 286.514522][ C0] ? __dev_queue_xmit+0x85d/0x3450 [ 286.515072][ C0] __dev_queue_xmit+0x872/0x3450 [ 286.515619][ C0] ? nft_do_chain+0xe16/0x15b0 [nf_tables] [ 286.516252][ C0] ? __pfx___dev_queue_xmit+0x10/0x10 [ 286.516817][ C0] ? selinux_ip_postroute+0x43c/0xc50 [ 286.517433][ C0] ? __pfx_selinux_ip_postroute+0x10/0x10 [ 286.518061][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.518606][ C0] ? ip_output+0x164/0x4a0 [ 286.519149][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.519671][ C0] ? ip_finish_output2+0x17d5/0x1fb0 [ 286.520258][ C0] ip_finish_output2+0xb4b/0x1fb0 [ 286.520787][ C0] ? __pfx_ip_finish_output2+0x10/0x10 [ 286.521355][ C0] ? __ip_finish_output+0x15d/0x750 [ 286.521890][ C0] ip_output+0x164/0x4a0 [ 286.522372][ C0] ? __pfx_ip_output+0x10/0x10 [ 286.522872][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.523402][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.524031][ C0] ? __pfx_ip_finish_output+0x10/0x10 [ 286.524605][ C0] ? __ip_queue_xmit+0x999/0x2260 [ 286.525200][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.525744][ C0] ? ipv4_dst_check+0x16a/0x2b0 [ 286.526279][ C0] ? lock_release+0x217/0x2c0 [ 286.526793][ C0] __ip_queue_xmit+0x1883/0x2260 [ 286.527324][ C0] ? __skb_clone+0x54c/0x730 [ 286.527827][ C0] __tcp_transmit_skb+0x209b/0x37a0 [ 286.528374][ C0] ? __pfx___tcp_transmit_skb+0x10/0x10 [ 286.528952][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.529472][ C0] ? seqcount_lockdep_reader_access.constprop.0+0x82/0x90 [ 286.530152][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.530691][ C0] tcp_write_xmit+0xb81/0x88b0 [ 286.531224][ C0] ? mod_memcg_state+0x4d/0x60 [ 286.531736][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.532253][ C0] __tcp_push_pending_frames+0x90/0x320 [ 286.532826][ C0] tcp_send_fin+0x141/0xb50 [ 286.533352][ C0] ? __pfx_tcp_send_fin+0x10/0x10 [ 286.533908][ C0] ? __local_bh_enable_ip+0xab/0x140 [ 286.534495][ C0] inet_shutdown+0x243/0x320 [ 286.535077][ C0] nvme_tcp_alloc_queue+0xb3b/0x2590 [nvme_tcp] [ 286.535709][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.536314][ C0] ? __pfx_nvme_tcp_alloc_queue+0x10/0x10 [nvme_tcp] [ 286.536996][ C0] ? do_raw_spin_unlock+0x54/0x1e0 [ 286.537550][ C0] ? _raw_spin_unlock+0x29/0x50 [ 286.538127][ C0] ? do_raw_spin_lock+0x129/0x260 [ 286.538664][ C0] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 286.539249][ C0] ? nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.539892][ C0] ? __wake_up+0x40/0x60 [ 286.540392][ C0] nvme_tcp_alloc_admin_queue+0xd5/0x340 [nvme_tcp] [ 286.541047][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.541589][ C0] nvme_tcp_setup_ctrl+0x8b/0x7a0 [nvme_tcp] [ 286.542254][ C0] ? _raw_spin_unlock_irqrestore+0x4c/0x60 [ 286.542887][ C0] ? __pfx_nvme_tcp_setup_ctrl+0x10/0x10 [nvme_tcp] [ 286.543568][ C0] ? trace_hardirqs_on+0x12/0x120 [ 286.544166][ C0] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 286.544792][ C0] ? nvme_change_ctrl_state+0x196/0x2e0 [nvme_core] [ 286.545477][ C0] nvme_tcp_create_ctrl+0x839/0xb90 [nvme_tcp] [ 286.546126][ C0] nvmf_dev_write+0x3db/0x7e0 [nvme_fabrics] [ 286.546775][ C0] ? rw_verify_area+0x69/0x520 [ 286.547334][ C0] vfs_write+0x218/0xe90 [ 286.547854][ C0] ? do_syscall_64+0x9f/0x190 [ 286.548408][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.549037][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.549659][ C0] ? __pfx_vfs_write+0x10/0x10 [ 286.550259][ C0] ? do_syscall_64+0x9f/0x190 [ 286.550840][ C0] ? syscall_exit_to_user_mode+0x8e/0x280 [ 286.551516][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.552180][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.552834][ C0] ? ksys_read+0xf5/0x1c0 [ 286.553386][ C0] ? __pfx_ksys_read+0x10/0x10 [ 286.553964][ C0] ksys_write+0xf5/0x1c0 [ 286.554499][ C0] ? __pfx_ksys_write+0x10/0x10 [ 286.555072][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.555698][ C0] ? syscall_exit_to_user_mode+0x93/0x280 [ 286.556319][ C0] ? do_syscall_64+0x54/0x190 [ 286.556866][ C0] do_syscall_64+0x93/0x190 [ 286.557420][ C0] ? rcu_read_unlock+0x17/0x60 [ 286.557986][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.558526][ C0] ? lock_release+0x217/0x2c0 [ 286.559087][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.559659][ C0] ? count_memcg_events.constprop.0+0x4a/0x60 [ 286.560476][ C0] ? exc_page_fault+0x7a/0x110 [ 286.561064][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.561647][ C0] ? lock_release+0x217/0x2c0 [ 286.562257][ C0] ? do_user_addr_fault+0x171/0xa00 [ 286.562839][ C0] ? do_user_addr_fault+0x4a2/0xa00 [ 286.563453][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.564112][ C0] ? rcu_is_watching+0x11/0xb0 [ 286.564677][ C0] ? irqentry_exit_to_user_mode+0x84/0x270 [ 286.565317][ C0] ? trace_hardirqs_on_prepare+0xdb/0x120 [ 286.565922][ C0] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 286.566542][ C0] RIP: 0033:0x7fe3c05e6504 [ 286.567102][ C0] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 8b 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 [ 286.568931][ C0] RSP: 002b:00007fff76444f58 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 286.569807][ C0] RAX: ffffffffffffffda RBX: 000000003b40d930 RCX: 00007fe3c05e6504 [ 286.570621][ C0] RDX: 00000000000000cf RSI: 000000003b40d930 RDI: 0000000000000003 [ 286.571443][ C0] RBP: 0000000000000003 R08: 00000000000000cf R09: 000000003b40d930 [ 286.572246][ C0] R10: 0000000000000000 R11: 0000000000000202 R12: 000000003b40cd60 [ 286.573069][ C0] R13: 00000000000000cf R14: 00007fe3c07417f8 R15: 00007fe3c073502e [ 286.573886][ C0] </TASK> Closes: https://lore.kernel.org/linux-nvme/5hdonndzoqa265oq3bj6iarwtfk5dewxxjtbjvn5uqnwclpwt6@a2n6w3taxxex/ Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvmet-tcp: select CONFIG_TLS from CONFIG_NVME_TARGET_TCP_TLSAlistair Francis
Ensure that TLS support is enabled in the kernel when CONFIG_NVME_TARGET_TCP_TLS is enabled. Without this the code compiles, but does not actually work unless something else enables CONFIG_TLS. Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall") Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvme-tcp: select CONFIG_TLS from CONFIG_NVME_TCP_TLSAlistair Francis
Ensure that TLS support is enabled in the kernel when CONFIG_NVME_TCP_TLS is enabled. Without this the code compiles, but does not actually work unless something else enables CONFIG_TLS. Fixes: be8e82caa68 ("nvme-tcp: enable TLS handshake upcall") Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30nvme-tcp: fix premature queue removal and I/O failoverMichael Liang
This patch addresses a data corruption issue observed in nvme-tcp during testing. In an NVMe native multipath setup, when an I/O timeout occurs, all inflight I/Os are canceled almost immediately after the kernel socket is shut down. These canceled I/Os are reported as host path errors, triggering a failover that succeeds on a different path. However, at this point, the original I/O may still be outstanding in the host's network transmission path (e.g., the NIC’s TX queue). From the user-space app's perspective, the buffer associated with the I/O is considered completed since they're acked on the different path and may be reused for new I/O requests. Because nvme-tcp enables zero-copy by default in the transmission path, this can lead to corrupted data being sent to the original target, ultimately causing data corruption. We can reproduce this data corruption by injecting delay on one path and triggering i/o timeout. To prevent this issue, this change ensures that all inflight transmissions are fully completed from host's perspective before returning from queue stop. To handle concurrent I/O timeout from multiple namespaces under the same controller, always wait in queue stop regardless of queue's state. This aligns with the behavior of queue stopping in other NVMe fabric transports. Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Michael Liang <mliang@purestorage.com> Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com> Reviewed-by: Randy Jennings <randyj@purestorage.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-04-30Merge branch 'bnxt_en-fixes'David S. Miller
Michael Chan says: ==================== bnxt_en: Misc. bug fixes This series fixes a bug in the driver initialization path, MSIX setup sequencing issue in the FW error and AER paths, a missing skb_mark_for_recycle() in the VLAN error path, some ethtool coredump fixes, an ethtool selftest fix, and an ethtool register dump byte order fix. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix ethtool -d byte order for 32-bit valuesMichael Chan
For version 1 register dump that includes the PCIe stats, the existing code incorrectly assumes that all PCIe stats are 64-bit values. Fix it by using an array containing the starting and ending index of the 32-bit values. The loop in bnxt_get_regs() will use the array to do proper endian swap for the 32-bit values. Fixes: b5d600b027eb ("bnxt_en: Add support for 'ethtool -d'") Reviewed-by: Shruti Parab <shruti.parab@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix out-of-bound memcpy() during ethtool -wShruti Parab
When retrieving the FW coredump using ethtool, it can sometimes cause memory corruption: BUG: KFENCE: memory corruption in __bnxt_get_coredump+0x3ef/0x670 [bnxt_en] Corrupted memory at 0x000000008f0f30e8 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#45): __bnxt_get_coredump+0x3ef/0x670 [bnxt_en] ethtool_get_dump_data+0xdc/0x1a0 __dev_ethtool+0xa1e/0x1af0 dev_ethtool+0xa8/0x170 dev_ioctl+0x1b5/0x580 sock_do_ioctl+0xab/0xf0 sock_ioctl+0x1ce/0x2e0 __x64_sys_ioctl+0x87/0xc0 do_syscall_64+0x5c/0xf0 entry_SYSCALL_64_after_hwframe+0x78/0x80 ... This happens when copying the coredump segment list in bnxt_hwrm_dbg_dma_data() with the HWRM_DBG_COREDUMP_LIST FW command. The info->dest_buf buffer is allocated based on the number of coredump segments returned by the FW. The segment list is then DMA'ed by the FW and the length of the DMA is returned by FW. The driver then copies this DMA'ed segment list to info->dest_buf. In some cases, this DMA length may exceed the info->dest_buf length and cause the above BUG condition. Fix it by capping the copy length to not exceed the length of info->dest_buf. The extra DMA data contains no useful information. This code path is shared for the HWRM_DBG_COREDUMP_LIST and the HWRM_DBG_COREDUMP_RETRIEVE FW commands. The buffering is different for these 2 FW commands. To simplify the logic, we need to move the line to adjust the buffer length for HWRM_DBG_COREDUMP_RETRIEVE up, so that the new check to cap the copy length will work for both commands. Fixes: c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix coredump logic to free allocated bufferShruti Parab
When handling HWRM_DBG_COREDUMP_LIST FW command in bnxt_hwrm_dbg_dma_data(), the allocated buffer info->dest_buf is not freed in the error path. In the normal path, info->dest_buf is assigned to coredump->data and it will eventually be freed after the coredump is collected. Free info->dest_buf immediately inside bnxt_hwrm_dbg_dma_data() in the error path. Fixes: c74751f4c392 ("bnxt_en: Return error if FW returns more data than dump length") Reported-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Shruti Parab <shruti.parab@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: delay pci_alloc_irq_vectors() in the AER pathKashyap Desai
This patch is similar to the last patch to delay the pci_alloc_irq_vectors() call in the AER path until after calling bnxt_reserve_rings(). bnxt_reserve_rings() needs to properly map the MSIX table first before we call pci_alloc_irq_vectors() which may immediately write to the MSIX table in some architectures. Move the bnxt_init_int_mode() call from bnxt_io_slot_reset() to bnxt_io_resume() after calling bnxt_reserve_rings(). With this change, the AER path may call bnxt_open() -> bnxt_hwrm_if_change() with bp->irq_tbl set to NULL. bp->irq_tbl is cleared when we call bnxt_clear_int_mode() in bnxt_io_slot_reset(). So we cannot use !bp->irq_tbl to detect aborted FW reset. Add a new BNXT_FW_RESET_STATE_ABORT to detect aborted FW reset in bnxt_hwrm_if_change(). Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: call pci_alloc_irq_vectors() after bnxt_reserve_rings()Kashyap Desai
On some architectures (e.g. ARM), calling pci_alloc_irq_vectors() will immediately cause the MSIX table to be written. This will not work if we haven't called bnxt_reserve_rings() to properly map the MSIX table to the MSIX vectors reserved by FW. Fix the FW error recovery path to delay the bnxt_init_int_mode() -> pci_alloc_irq_vectors() call by removing it from bnxt_hwrm_if_change(). bnxt_request_irq() later in the code path will call it and by then the MSIX table is properly mapped. Fixes: 4343838ca5eb ("bnxt_en: Replace deprecated PCI MSIX APIs") Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Add missing skb_mark_for_recycle() in bnxt_rx_vlan()Somnath Kotur
If bnxt_rx_vlan() fails because the VLAN protocol ID is invalid, the SKB is freed but we're missing the call to recycle it. This may cause the warning: "page_pool_release_retry() stalled pool shutdown" Add the missing skb_mark_for_recycle() in bnxt_rx_vlan(). Fixes: 86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix ethtool selftest output in one of the failure casesKalesh AP
When RDMA driver is loaded, running offline self test is not supported and driver returns failure early. But it is not clearing the input buffer and hence the application prints some junk characters for individual test results. Fix it by clearing the buffer before returning. Fixes: 895621f1c816 ("bnxt_en: Don't support offline self test when RoCE driver is loaded") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30bnxt_en: Fix error handling path in bnxt_init_chip()Shravya KN
WARN_ON() is triggered in __flush_work() if bnxt_init_chip() fails because we call cancel_work_sync() on dim work that has not been initialized. WARNING: CPU: 37 PID: 5223 at kernel/workqueue.c:4201 __flush_work.isra.0+0x212/0x230 The driver relies on the BNXT_STATE_NAPI_DISABLED bit to check if dim work has already been cancelled. But in the bnxt_open() path, BNXT_STATE_NAPI_DISABLED is not set and this causes the error path to think that it needs to cancel the uninitalized dim work. Fix it by setting BNXT_STATE_NAPI_DISABLED during initialization. The bit will be cleared when we enable NAPI and initialize dim work. Fixes: 40452969a506 ("bnxt_en: Fix DIM shutdown") Suggested-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.Sean Christopherson
When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE value. Consulting only the host kernel's host vs. guest masks results in running the guest with PEBS enabled even when the guest doesn't want to use PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply looking at exclude_host can't differentiate between events created by host userspace, and events created by KVM on behalf of the guest. Running the guest with PEBS unexpectedly enabled typically manifests as crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when trying to record PEBS events. The issue is most easily reproduced by running `perf kvm top` from before commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after which, `perf kvm top` effectively stopped using PEBS). The userspace side of perf creates a guest-only PEBS event, which intel_guest_get_msrs() misconstrues a guest-*owned* PEBS event. Arguably, this is a userspace bug, as enabling PEBS on guest-only events simply cannot work, and userspace can kill VMs in many other ways (there is no danger to the host). However, even if this is considered to be bad userspace behavior, there's zero downside to perf/KVM restricting PEBS to guest-owned events. Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") fixed the case where host userspace is profiling KVM *and* userspace, but missed the case where userspace is profiling only KVM. Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon Reported-by: Seth Forshee <sforshee@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Tested-by: "Seth Forshee (DigitalOcean)" <sforshee@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250426001355.1026530-1-seanjc@google.com
2025-04-30Merge branch 'pds_core-cleanups'David S. Miller
Shannon Nelson says: ==================== pds_core: small code updates These are a few little code touch ups for a kdoc complaint, quicker error detection, and a cleaner initialization. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30pds_core: init viftype default in declarationShannon Nelson
Initialize the .enabled field of the FWCTL viftype default in the declaration rather than as a bit of code as it is always to be enabled and needs no logic around it. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30pds_core: smaller adminq poll starting intervalShannon Nelson
Shorten the adminq poll starting interval in order to notice any transaction errors more quickly. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30pds_core: remove extra name descriptionShannon Nelson
Fix the kernel-doc complaint include/linux/pds/pds_adminq.h:481: warning: Excess struct member 'name' description in 'pds_core_lif_getattr_comp' Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-04-30ALSA: hda/realtek: Fix built-mic regression on other ASUS modelsTakashi Iwai
A few ASUS models use the ALC256_FIXUP_ASUS_HEADSET_MODE although they have no built-in mic pin on NID 0x13, as found in the commit c1732ede5e80 ("ALSA: hda/realtek - Fix headset and mic on several Asus laptops with ALC256"). This was relatively harmless in the past as NID 0x13 was assigned as the secondary mic. But since the fix for the pin sort order, this pin became the primary one, hence user started noticing the broken input, and we've fixed already for a few ASUS models to switch to ALC256_FIXUP_ASUS_MIC_NO_PRESENCE. This patch corrects the other ASUS models to use the right quirk entry for fixing the built-in mic regression. Here we cover X541SA (1043:12e0), X541UV (1043:12f0), Z550SA (1043:13bf0) and X555UB (1043:1ccd). Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort") Link: https://bugzilla.kernel.org/show_bug.cgi?id=220058 Link: https://patch.msgid.link/20250430053210.31776-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-04-30erofs: remove unused enum typeHongbo Li
Opt_err is not used in EROFS, we can remove it. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Link: https://lore.kernel.org/r/20250429075056.689570-1-lihongbo22@huawei.com Signed-off-by: Gao Xiang <xiang@kernel.org>
2025-04-29Merge tag 'v6.15-p6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a regression in scompress" * tag 'v6.15-p6' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: scompress - increment scomp_scratch_users when already allocated
2025-04-30spi: spi-qpic-snand: fix NAND_READ_LOCATION_2 register handlingGabor Juhos
The precomputed value for the NAND_READ_LOCATION_2 register should be stored in 'snandc->regs->read_location2'. Fix the qcom_spi_set_read_loc_first() function accordingly. Fixes: 7304d1909080 ("spi: spi-qpic: add driver for QCOM SPI NAND flash Interface") Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Reviewed-by: Md Sadre Alam <quic_mdalam@quicinc.com> Link: https://patch.msgid.link/20250428-qpic-snand-readloc2-fix-v1-1-50ce0877ff72@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-30ASoC: Intel: catpt: avoid type mismatch in dev_dbg() formatArnd Bergmann
Depending on the architecture __ffs() returns either an 'unsigned long' or 'unsigned int' result. Compile-testing this driver on targets that use the latter produces a warning: sound/soc/intel/catpt/dsp.c: In function 'catpt_dsp_set_srampge': sound/soc/intel/catpt/dsp.c:181:44: error: format '%ld' expects argument of type 'long int', but argument 4 has type 'u32' {aka 'unsigned int'} [-Werror=format=] 181 | dev_dbg(cdev->dev, "sanitize block %ld: off 0x%08x\n", | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Change the type of the local variable to match the format string and avoid the warning on any architecture. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Cezary Rojewski <cezary.rojewski@intel.com> Link: https://patch.msgid.link/20250429073545.3558494-1-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-04-29Merge tag 'nf-next-25-04-29' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next: 1) Replace msecs_to_jiffies() by secs_to_jiffies(), from Easwar Hariharan. 2) Allow to compile xt_cgroup with cgroupsv2 support only, from Michal Koutny. 3) Prepare for sock_cgroup_classid() removal by wrapping it around ifdef, also from Michal Koutny. 4) Remove redundant pointer fetch on conntrack template, from Xuanqiang Luo. 5) Re-format one block in the tproxy documentation for consistency, from Chen Linxuan. 6) Expose set element count and type via netlink attributes, from Florian Westphal. * tag 'nf-next-25-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: export set count and backend name to userspace docs: tproxy: fix formatting for nft code block netfilter: conntrack: Remove redundant NFCT_ALIGN call net: cgroup: Guard users of sock_cgroup_classid() netfilter: xt_cgroup: Make it independent from net_cls netfilter: xt_IDLETIMER: convert timeouts to secs_to_jiffies() ==================== Link: https://patch.msgid.link/20250428221254.3853-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29idpf: remove unreachable code from setting mailboxMichal Swiatkowski
Remove code that isn't reached. There is no need to check for adapter->req_vec_chunks, because if it isn't set idpf_set_mb_vec_id() won't be called. Only one path when idpf_set_mb_vec_id() is called: idpf_intr_req() -> idpf_send_alloc_vectors_msg() -> adapter->req_vec_chunk is allocated here, otherwise an error is returned and idpf_intr_req() exits with an error. The idpf_set_mb_vec_id() becomes one-liner and it is called only once. Remove it and set mailbox vector index directly. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Reviewed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29tools: ynl: fix typo in info stringRuben Wauters
replaces formmated with formatted also corrects grammar by replacing a with an, and capitalises RST Signed-off-by: Ruben Wauters <rubenru09@aol.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250428215541.6029-1-rubenru09@aol.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29idpf: assign extracted ptype to struct libeth_rqe_info fieldMateusz Polchlopek
Assign the ptype extracted from qword to the ptype field of struct libeth_rqe_info. Remove the now excess ptype param of idpf_rx_singleq_extract_fields(), idpf_rx_singleq_extract_base_fields() and idpf_rx_singleq_extract_flex_fields(). Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29ixgbe: devlink: add devlink region support for E610Slawomir Mrozowicz
Provide support for the following devlink cmds: -DEVLINK_CMD_REGION_GET -DEVLINK_CMD_REGION_NEW -DEVLINK_CMD_REGION_DEL -DEVLINK_CMD_REGION_READ ixgbe devlink region implementation, similarly to the ice one, lets user to create snapshots of content of Non Volatile Memory, content of Shadow RAM, and capabilities of the device. For both NVM and SRAM regions provide .read() handler to let user read their contents without the need to create full snapshots. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Co-developed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29ixgbe: add E610 .set_phys_id() callback implementationJedrzej Jagielski
Legacy implementation of .set_phys_id() ethtool callback is not applicable for E610 device. Add new implementation which uses 0x06E9 command by calling ixgbe_aci_set_port_id_led(). Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29ixgbe: apply different rules for setting FC on E610Jedrzej Jagielski
E610 device doesn't support disabling FC autonegotiation. Create dedicated E610 .set_pauseparam() implementation and assign it to ixgbe_ethtool_ops_e610. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29ixgbe: add support for ACPI WOL for E610Jedrzej Jagielski
Currently only APM (Advanced Power Management) is supported by the ixgbe driver. It works for magic packets only, as for different sources of wake-up E610 adapter utilizes different feature. Add E610 specific implementation of ixgbe_set_wol() callback. When any of broadcast/multicast/unicast wake-up is set, disable APM and configure ACPI (Advanced Configuration and Power Interface). Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29ixgbe: create E610 specific ethtool_ops structureJedrzej Jagielski
E610's implementation of various ethtool ops is different than the ones corresponding to ixgbe legacy products. Therefore create separate E610 ethtool_ops struct which will be filled out in the forthcoming patches. Add adequate ops struct basing on MAC type. This step requires changing a bit the flow of probing by placing ixgbe_set_ethtool_ops after hw.mac.type is assigned. So move the whole netdev assignment block after hw.mac.type is known. This step doesn't have any additional impact on probing sequence. Suggested-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Bharath R <bharath.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29igc: Change Tx mode for MQPRIO offloadingKurt Kanzenbach
The current MQPRIO offload implementation uses the legacy TSN Tx mode. In this mode the hardware uses four packet buffers and considers queue priorities. In order to harmonize the TAPRIO implementation with MQPRIO, switch to the regular TSN Tx mode. This mode also uses four packet buffers and considers queue priorities. In addition to the legacy mode, transmission is always coupled to Qbv. The driver already has mechanisms to use a dummy schedule of 1 second with all gates open for ETF. Simply use this for MQPRIO too. This reduces code and makes it easier to add support for frame preemption later. Tested on i225 with real time application using high priority queue, iperf3 using low priority queue and network TAP device. Acked-by: Faizal Rahim <faizal.abdul.rahim@linux.intel.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29igc: Limit netdev_tc calls to MQPRIOKurt Kanzenbach
Limit netdev_tc calls to MQPRIO. Currently these calls are made in igc_tsn_enable_offload() and igc_tsn_disable_offload() which are used by TAPRIO and ETF as well. However, these are only required for MQPRIO. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29igb: Get rid of spurious interruptsKurt Kanzenbach
When running the igc with XDP/ZC in busy polling mode with deferral of hard interrupts, interrupts still happen from time to time. That is caused by the igb task watchdog which triggers Rx interrupts periodically. That mechanism has been introduced to overcome skb/memory allocation failures [1]. So the Rx clean functions stop processing the Rx ring in case of such failure. The task watchdog triggers Rx interrupts periodically in the hope that memory became available in the mean time. The current behavior is undesirable for real time applications, because the driver induced Rx interrupts trigger also the softirq processing. However, all real time packets should be processed by the application which uses the busy polling method. Therefore, only trigger the Rx interrupts in case of real allocation failures. Introduce a new flag for signaling that condition. Follow the same logic as in commit 8dcf2c212078 ("igc: Get rid of spurious interrupts"). [1] - https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=3be507547e6177e5c808544bd6a2efa2c7f1d436 Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Sweta Kumari <sweta.kumari@intel.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29igb: Add support for persistent NAPI configKurt Kanzenbach
Use netif_napi_add_config() to assign persistent per-NAPI config. This is useful for preserving NAPI settings when changing queue counts or for user space programs using SO_INCOMING_NAPI_ID. Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29igb: Link queues to NAPI instancesKurt Kanzenbach
Link queues to NAPI instances via netdev-genl API. This is required to use XDP/ZC busy polling. See commit 5ef44b3cb43b ("xsk: Bring back busy polling support") for details. This also allows users to query the info with netlink: |$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ | --dump queue-get --json='{"ifindex": 2}' |[{'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'rx'}, | {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'rx'}, | {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'rx'}, | {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'rx'}, | {'id': 0, 'ifindex': 2, 'napi-id': 8201, 'type': 'tx'}, | {'id': 1, 'ifindex': 2, 'napi-id': 8202, 'type': 'tx'}, | {'id': 2, 'ifindex': 2, 'napi-id': 8203, 'type': 'tx'}, | {'id': 3, 'ifindex': 2, 'napi-id': 8204, 'type': 'tx'}] Add rtnl locking to PCI error handlers, because netif_queue_set_napi() requires the lock held. While at __igb_open() use RCT coding style. Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Joe Damato <jdamato@fastly.com> Tested-by: Sweta Kumari <sweta.kumari@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29igb: Link IRQs to NAPI instancesKurt Kanzenbach
Link IRQs to NAPI instances via netdev-genl API. This allows users to query that information via netlink: |$ ./tools/net/ynl/pyynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ | --dump napi-get --json='{"ifindex": 2}' |[{'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8204, | 'ifindex': 2, | 'irq': 127, | 'irq-suspend-timeout': 0}, | {'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8203, | 'ifindex': 2, | 'irq': 126, | 'irq-suspend-timeout': 0}, | {'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8202, | 'ifindex': 2, | 'irq': 125, | 'irq-suspend-timeout': 0}, | {'defer-hard-irqs': 0, | 'gro-flush-timeout': 0, | 'id': 8201, | 'ifindex': 2, | 'irq': 124, | 'irq-suspend-timeout': 0}] |$ cat /proc/interrupts | grep enp2s0 |123: 0 1 IR-PCI-MSIX-0000:02:00.0 0-edge enp2s0 |124: 0 7 IR-PCI-MSIX-0000:02:00.0 1-edge enp2s0-TxRx-0 |125: 0 0 IR-PCI-MSIX-0000:02:00.0 2-edge enp2s0-TxRx-1 |126: 0 5 IR-PCI-MSIX-0000:02:00.0 3-edge enp2s0-TxRx-2 |127: 0 0 IR-PCI-MSIX-0000:02:00.0 4-edge enp2s0-TxRx-3 Reviewed-by: Joe Damato <jdamato@fastly.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-29net: phy: aquantia: fix commenting formatAryan Srivastava
Comment was erroneously added with /**, amend this to use /* as it is not a kernel-doc. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202504262247.1UBrDBVN-lkp@intel.com/ Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250428214920.813038-1-aryan.srivastava@alliedtelesis.co.nz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29net: ipv6: fix UDPv6 GSO segmentation with NATFelix Fietkau
If any address or port is changed, update it in all packets and recalculate checksum. Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250426153210.14044-1-nbd@nbd.name Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-29Merge branch 'fix-felix-dsa-taprio-gates-after-clock-jump'Jakub Kicinski
Vladimir Oltean says: ==================== Fix Felix DSA taprio gates after clock jump Richie Pearn presented a reproducible situation where traffic would get blocked on the NXP LS1028A switch if a certain taprio schedule was applied, and stepping the PTP clock would take place. The latter event is an expected initial occurrence, but also at runtime, for example when transitioning from one grandmaster to another. The issue is completely described in patch 1/4, which also contains the fix, but it has left me with some doubts regarding the need for vsc9959_tas_clock_adjust() in general. In order to prove to myself that vsc9959_tas_clock_adjust() is needed in general, I have written a selftest for the tc-taprio data path in patch 4/4. On the LS1028A, we can clearly see the following failures without that function: INFO: Forcing a backward clock jump TEST: ping [FAIL] INFO: Setting up taprio after PTP TEST: In band with gate [FAIL] Reception of 100 packets failed TEST: Out of band with gate [FAIL] Reception of 100 packets failed As for testing my fix from patch 1/4, that was quite a bit more complex to do automatically. In fact, I couldn't find any other schedule that would fail to be updated by vsc9959_tas_clock_adjust() as cleanly as the schedule from Richie, so I've added that specific schedule as the test_clock_jump_backward() test. The test ordering is also (unfortunately) very strategic. Running the selftest to the end dirties the GCL RAM, and when running test_clock_jump_backward() once again, the GCL entries won't be all zeroes as they were the first time around. They will contain bits and pieces of old schedules, making it very challenging to make it fail. Thus, test_clock_jump_backward() is the first in the test suite, and without patch 1/4, it is only supposed to fail the _first_ time when running after a clean boot. ==================== Link: https://patch.msgid.link/20250426144859.3128352-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>