summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-15mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vmaLiu Shixin
syzkaller reported such a BUG_ON(): ------------[ cut here ]------------ kernel BUG at mm/khugepaged.c:1835! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP ... CPU: 6 UID: 0 PID: 8009 Comm: syz.15.106 Kdump: loaded Tainted: G W 6.13.0-rc6 #22 Tainted: [W]=WARN Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : collapse_file+0xa44/0x1400 lr : collapse_file+0x88/0x1400 sp : ffff80008afe3a60 ... Call trace: collapse_file+0xa44/0x1400 (P) hpage_collapse_scan_file+0x278/0x400 madvise_collapse+0x1bc/0x678 madvise_vma_behavior+0x32c/0x448 madvise_walk_vmas.constprop.0+0xbc/0x140 do_madvise.part.0+0xdc/0x2c8 __arm64_sys_madvise+0x68/0x88 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0xc8/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x34/0x128 el0t_64_sync_handler+0xc8/0xd0 el0t_64_sync+0x190/0x198 This indicates that the pgoff is unaligned. After analysis, I confirm the vma is mapped to /dev/zero. Such a vma certainly has vm_file, but it is set to anonymous by mmap_zero(). So even if it's mmapped by 2m-unaligned, it can pass the check in thp_vma_allowable_order() as it is an anonymous-mmap, but then be collapsed as a file-mmap. It seems the problem has existed for a long time, but actually, since we have khugepaged_max_ptes_none check before, we will skip collapse it as it is /dev/zero and so has no present page. But commit d8ea7cc8547c limit the check for only khugepaged, so the BUG_ON() can be triggered by madvise_collapse(). Add vma_is_anonymous() check to make such vma be processed by hpage_collapse_scan_pmd(). Link: https://lkml.kernel.org/r/20250111034511.2223353-1-liushixin2@huawei.com Fixes: d8ea7cc8547c ("mm/khugepaged: add flag to predicate khugepaged-only behavior") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Mattew Wilcox <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-15mm: shmem: use signed int for version handling in casefold optionKaran Sanghavi
Fixes an issue where the use of an unsigned data type in `shmem_parse_opt_casefold()` caused incorrect evaluation of negative conditions. Link: https://lkml.kernel.org/r/20250111-unsignedcompare1601569-v3-1-c861b4221831@gmail.com Fixes: 58e55efd6c72 ("tmpfs: Add casefold lookup support") Reviewed-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Karan Sanghavi <karansanghvi98@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Hugh Dickens <hughd@google.com> Cc: Shuah khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-15alloc_tag: skip pgalloc_tag_swap if profiling is disabledSuren Baghdasaryan
When memory allocation profiling is disabled, there is no need to swap allocation tags during migration. Skip it to avoid unnecessary overhead. Once I added these checks, the overhead of the mode when memory profiling is enabled but turned off went down by about 50%. Link: https://lkml.kernel.org/r/20241226211639.1357704-2-surenb@google.com Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Cc: David Wang <00107082@163.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-15mm: page_alloc: fix missed updates of lowmem_reserve in ↵zihan zhou
adjust_managed_page_count In the kernel, the zone's lowmem_reserve and _watermark, and the global variable 'totalreserve_pages' depend on the value of managed_pages, but after running adjust_managed_page_count, these values aren't updated, which causes some problems. For example, in a system with six 1GB large pages, we found that the value of protection in zoneinfo (zone->lowmem_reserve), is not right. Its value seems to be calculated from the initial managed_pages, but after the managed_pages changed, was not updated. Only after reading the file /proc/sys/vm/lowmem_reserve_ratio, updates happen. read file /proc/sys/vm/lowmem_reserve_ratio: lowmem_reserve_ratio_sysctl_handler ----setup_per_zone_lowmem_reserve --------calculate_totalreserve_pages protection changed after reading file: [root@test ~]# cat /proc/zoneinfo | grep protection protection: (0, 2719, 57360, 0) protection: (0, 0, 54640, 0) protection: (0, 0, 0, 0) protection: (0, 0, 0, 0) [root@test ~]# cat /proc/sys/vm/lowmem_reserve_ratio 256 256 32 0 [root@test ~]# cat /proc/zoneinfo | grep protection protection: (0, 2735, 63524, 0) protection: (0, 0, 60788, 0) protection: (0, 0, 0, 0) protection: (0, 0, 0, 0) lowmem_reserve increased also makes the totalreserve_pages increased, which causes a decrease in available memory. The one above is just a test machine, and the increase is not significant. On our online machine, the reserved memory will increase by several GB due to reading this file. It is clearly unreasonable to cause a sharp drop in available memory just by reading a file. In this patch, we update reserve memory when update managed_pages, The size of reserved memory becomes stable. But it seems that the _watermark should also be updated along with the managed_pages. We have not done it because we are unsure if it is reasonable to set the watermark through the initial managed_pages. If it is not reasonable, we will propose new patch. Link: https://lkml.kernel.org/r/20241225021034.45693-1-15645113830zzh@gmail.com Signed-off-by: zihan zhou <15645113830zzh@gmail.com> Signed-off-by: yaowenchao <yaowenchao@jd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-15net: make page_pool_ref_netmem work with net iovsPavel Begunkov
page_pool_ref_netmem() should work with either netmem representation, but currently it casts to a page with netmem_to_page(), which will fail with net iovs. Use netmem_get_pp_ref_count_ref() instead. Fixes: 8ab79ed50cf1 ("page_pool: devmem support") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://lore.kernel.org/20250108220644.3528845-2-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16Merge tag 'drm-misc-fixes-2025-01-15' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v6.13: - itee-it6263 error handling fix. - Fix warn when unloading v3d. - Fix W=1 build for kunit tests. - Fix backlight regression for macbooks 5,1 in nouveau. - Handle YCbCr420 better in bridge code, with tests. - Fix cross-device fence handling in nouveau. - Fix BO reservation handling in vmwgfx. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/a89adcd5-2042-4e7f-93f4-2b299bb1ef17@linux.intel.com
2025-01-15smb: client: fix double free of TCP_Server_Info::hostnamePaulo Alcantara
When shutting down the server in cifs_put_tcp_session(), cifsd thread might be reconnecting to multiple DFS targets before it realizes it should exit the loop, so @server->hostname can't be freed as long as cifsd thread isn't done. Otherwise the following can happen: RIP: 0010:__slab_free+0x223/0x3c0 Code: 5e 41 5f c3 cc cc cc cc 4c 89 de 4c 89 cf 44 89 44 24 08 4c 89 1c 24 e8 fb cf 8e 00 44 8b 44 24 08 4c 8b 1c 24 e9 5f fe ff ff <0f> 0b 41 f7 45 08 00 0d 21 00 0f 85 2d ff ff ff e9 1f ff ff ff 80 RSP: 0018:ffffb26180dbfd08 EFLAGS: 00010246 RAX: ffff8ea34728e510 RBX: ffff8ea34728e500 RCX: 0000000000800068 RDX: 0000000000800068 RSI: 0000000000000000 RDI: ffff8ea340042400 RBP: ffffe112041ca380 R08: 0000000000000001 R09: 0000000000000000 R10: 6170732e31303000 R11: 70726f632e786563 R12: ffff8ea34728e500 R13: ffff8ea340042400 R14: ffff8ea34728e500 R15: 0000000000800068 FS: 0000000000000000(0000) GS:ffff8ea66fd80000(0000) 000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffc25376080 CR3: 000000012a2ba001 CR4: PKRU: 55555554 Call Trace: <TASK> ? show_trace_log_lvl+0x1c4/0x2df ? show_trace_log_lvl+0x1c4/0x2df ? __reconnect_target_unlocked+0x3e/0x160 [cifs] ? __die_body.cold+0x8/0xd ? die+0x2b/0x50 ? do_trap+0xce/0x120 ? __slab_free+0x223/0x3c0 ? do_error_trap+0x65/0x80 ? __slab_free+0x223/0x3c0 ? exc_invalid_op+0x4e/0x70 ? __slab_free+0x223/0x3c0 ? asm_exc_invalid_op+0x16/0x20 ? __slab_free+0x223/0x3c0 ? extract_hostname+0x5c/0xa0 [cifs] ? extract_hostname+0x5c/0xa0 [cifs] ? __kmalloc+0x4b/0x140 __reconnect_target_unlocked+0x3e/0x160 [cifs] reconnect_dfs_server+0x145/0x430 [cifs] cifs_handle_standard+0x1ad/0x1d0 [cifs] cifs_demultiplex_thread+0x592/0x730 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0xdd/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> Fixes: 7be3248f3139 ("cifs: To match file servers, make sure the server hostname matches") Reported-by: Jay Shin <jaeshin@redhat.com> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-01-15hwmon: (ltc2991) Fix mixed signed/unsigned in DIV_ROUND_CLOSESTDavid Lechner
Fix use of DIV_ROUND_CLOSEST where a possibly negative value is divided by an unsigned type by casting the unsigned type to the signed type of the same size (st->r_sense_uohm[channel] has type of u32). The docs on the DIV_ROUND_CLOSEST macro explain that dividing a negative value by an unsigned type is undefined behavior. The actual behavior is that it converts both values to unsigned before doing the division, for example: int ret = DIV_ROUND_CLOSEST(-100, 3U); results in ret == 1431655732 instead of -33. Fixes: 2b9ea4262ae9 ("hwmon: Add driver for ltc2991") Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20250115-hwmon-ltc2991-fix-div-round-closest-v1-1-b4929667e457@baylibre.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-01-15net: ethernet: xgbe: re-add aneg to supported features in PHY quirksHeiner Kallweit
In 4.19, before the switch to linkmode bitmaps, PHY_GBIT_FEATURES included feature bits for aneg and TP/MII ports. SUPPORTED_TP | \ SUPPORTED_MII) SUPPORTED_10baseT_Full) SUPPORTED_100baseT_Full) SUPPORTED_1000baseT_Full) PHY_100BT_FEATURES | \ PHY_DEFAULT_FEATURES) PHY_1000BT_FEATURES) Referenced commit expanded PHY_GBIT_FEATURES, silently removing PHY_DEFAULT_FEATURES. The removed part can be re-added by using the new PHY_GBIT_FEATURES definition. Not clear to me is why nobody seems to have noticed this issue. I stumbled across this when checking what it takes to make phy_10_100_features_array et al private to phylib. Fixes: d0939c26c53a ("net: ethernet: xgbe: expand PHY_GBIT_FEAUTRES") Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/46521973-7738-4157-9f5e-0bb6f694acba@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: pcs: xpcs: actively unset DW_VR_MII_DIG_CTRL1_2G5_EN for 1G SGMIIVladimir Oltean
xpcs_config_2500basex() sets DW_VR_MII_DIG_CTRL1_2G5_EN, but xpcs_config_aneg_c37_sgmii() never unsets it. So, on a protocol change from 2500base-x to sgmii, the DW_VR_MII_DIG_CTRL1_2G5_EN bit will remain set. Fixes: f27abde3042a ("net: pcs: add 2500BASEX support for Intel mGbE controller") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250114164721.2879380-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15net: pcs: xpcs: fix DW_VR_MII_DIG_CTRL1_2G5_EN bit being set for 1G SGMII ↵Vladimir Oltean
w/o inband On a port with SGMII fixed-link at SPEED_1000, DW_VR_MII_DIG_CTRL1 gets set to 0x2404. This is incorrect, because bit 2 (DW_VR_MII_DIG_CTRL1_2G5_EN) is set. It comes from the previous write to DW_VR_MII_AN_CTRL, because the "val" variable is reused and is dirty. Actually, its value is 0x4, aka FIELD_PREP(DW_VR_MII_PCS_MODE_MASK, DW_VR_MII_PCS_MODE_C37_SGMII). Resolve the issue by clearing "val" to 0 when writing to a new register. After the fix, the register value is 0x2400. Prior to the blamed commit, when the read-modify-write was open-coded, the code saved the content of the DW_VR_MII_DIG_CTRL1 register in the "ret" variable. Fixes: ce8d6081fcf4 ("net: pcs: xpcs: add _modify() accessors") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20250114164721.2879380-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15genirq/timings: Add kernel-doc for a function parameterRandy Dunlap
Add the description for @now to eliminate a kernel-doc warning. timings.c:537: warning: Function parameter or struct member 'now' not described in 'irq_timings_next_event' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250111062954.910657-1-rdunlap@infradead.org
2025-01-15genirq: Remove IRQ_MOVE_PCNTXT and related codeThomas Gleixner
Now that x86 is converted over to use the IRQCHIP_MOVE_DEFERRED flags, remove IRQ*_MOVE_PCNTXT and related code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210103335.626707225@linutronix.de
2025-01-15x86/apic: Convert to IRQCHIP_MOVE_DEFERREDThomas Gleixner
Instead of marking individual interrupts as safe to be migrated in arbitrary contexts, mark the interrupt chips, which require the interrupt to be moved in actual interrupt context, with the new IRQCHIP_MOVE_DEFERRED flag. This makes more sense because this is a per interrupt chip property and not restricted to individual interrupts. That flips the logic from the historical opt-out to a opt-in model. This is simpler to handle for other architectures, which default to unrestricted affinity setting. It also allows to cleanup the redundant core logic significantly. All interrupt chips, which belong to a top-level domain sitting directly on top of the x86 vector domain are marked accordingly, unless the related setup code marks the interrupts with IRQ_MOVE_PCNTXT, i.e. XEN. No functional change intended. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/all/20241210103335.563277044@linutronix.de
2025-01-15i2c: testunit: on errors, repeat NACK until STOPWolfram Sang
This backend requests a NACK from the controller driver when it detects an error. If that request gets ignored from some reason, subsequent accesses will wrongly be handled OK. To fix this, an error now changes the state machine, so the backend will report NACK until a STOP condition has been detected. This make the driver more robust against controllers which will sadly apply the NACK not to the current byte but the next one. Fixes: a8335c64c5f0 ("i2c: add slave testunit driver") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-01-15i2c: rcar: fix NACK handling when being a targetWolfram Sang
When this controller is a target, the NACK handling had two issues. First, the return value from the backend was not checked on the initial WRITE_REQUESTED. So, the driver missed to send a NACK in this case. Also, the NACK always arrives one byte late on the bus, even in the WRITE_RECEIVED case. This seems to be a HW issue. We should then not rely on the backend to correctly NACK the superfluous byte as well. Fix both issues by introducing a flag which gets set whenever the backend requests a NACK and keep sending it until we get a STOP condition. Fixes: de20d1857dd6 ("i2c: rcar: add slave support") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-01-15i2c: mux: demux-pinctrl: correct commentWolfram Sang
Two characters flipped, fix them. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-01-15i2c: mux: demux-pinctrl: check initial mux selection, tooWolfram Sang
When misconfigured, the initial setup of the current mux channel can fail, too. It must be checked as well. Fixes: 50a5ba876908 ("i2c: mux: demux-pinctrl: add driver") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-01-15Revert "mtd: spi-nor: core: replace dummy buswidth from addr to data"Pratyush Yadav
This reverts commit 98d1fb94ce75f39febd456d6d3cbbe58b6678795. The commit uses data nbits instead of addr nbits for dummy phase. This causes a regression for all boards where spi-tx-bus-width is smaller than spi-rx-bus-width. It is a common pattern for boards to have spi-tx-bus-width == 1 and spi-rx-bus-width > 1. The regression causes all reads with a dummy phase to become unavailable for such boards, leading to a usually slower 0-dummy-cycle read being selected. Most controllers' supports_op hooks call spi_mem_default_supports_op(). In spi_mem_default_supports_op(), spi_mem_check_buswidth() is called to check if the buswidths for the op can actually be supported by the board's wiring. This wiring information comes from (among other things) the spi-{tx,rx}-bus-width DT properties. Based on these properties, SPI_TX_* or SPI_RX_* flags are set by of_spi_parse_dt(). spi_mem_check_buswidth() then uses these flags to make the decision whether an op can be supported by the board's wiring (in a way, indirectly checking against spi-{rx,tx}-bus-width). Now the tricky bit here is that spi_mem_check_buswidth() does: if (op->dummy.nbytes && spi_check_buswidth_req(mem, op->dummy.buswidth, true)) return false; The true argument to spi_check_buswidth_req() means the op is treated as a TX op. For a board that has say 1-bit TX and 4-bit RX, a 4-bit dummy TX is considered as unsupported, and the op gets rejected. The commit being reverted uses the data buswidth for dummy buswidth. So for reads, the RX buswidth gets used for the dummy phase, uncovering this issue. In reality, a dummy phase is neither RX nor TX. As the name suggests, these are just dummy cycles that send or receive no data, and thus don't really need to have any buswidth at all. Ideally, dummy phases should not be checked against the board's wiring capabilities at all, and should only be sanity-checked for having a sane buswidth value. Since we are now at rc7 and such a change might introduce many unexpected bugs, revert the commit for now. It can be sent out later along with the spi_mem_check_buswidth() fix. Fixes: 98d1fb94ce75 ("mtd: spi-nor: core: replace dummy buswidth from addr to data") Reported-by: Alexander Stein <alexander.stein@ew.tq-group.com> Closes: https://lore.kernel.org/linux-mtd/3342163.44csPzL39Z@steina-w/ Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Pratyush Yadav <pratyush@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2025-01-15signal/posixtimers: Handle ignore/blocked sequences correctlyThomas Gleixner
syzbot triggered the warning in posixtimer_send_sigqueue(), which warns about a non-ignored signal being already queued on the ignored list. The warning is actually bogus, as the following sequence causes this: signal($SIG, SIGIGN); timer_settime(...); // arm periodic timer timer fires, signal is ignored and queued on ignored list sigprocmask(SIG_BLOCK, ...); // block the signal timer_settime(...); // re-arm periodic timer timer fires, signal is not ignored because it is blocked ---> Warning triggers as signal is on the ignored list Ideally timer_settime() could remove the signal, but that's racy and incomplete vs. other scenarios and requires a full reevaluation of the pending signal list. Instead of adding more complexity, handle it gracefully by removing the warning and requeueing the signal to the pending list. That's correct versus: 1) sig[timed]wait() as that does not check for SIGIGN and only relies on dequeue_signal() -> posixtimers_deliver_signal() to check whether the pending signal is still valid. 2) Unblocking of the signal. - If the unblocking happens before SIGIGN is replaced by a signal handler, then the timer is rearmed in dequeue_signal(), but get_signal() will ignore it. The next timer expiry will move it back to the ignored list. - If SIGIGN was replaced before unblocking, then the signal will be delivered and a subsequent expiry will queue a signal on the pending list again. There is a related scenario to trigger the complementary warning in the signal ignored path, which does not expect the signal to be on the pending list when it is ignored. That can be triggered even before the above change via: task1 task2 signal($SIG, SIGIGN); sigprocmask(SIG_BLOCK, ...); timer_create(); // Signal target is task2 timer_settime(...); // arm periodic timer timer fires, signal is not ignored because it is blocked and queued on the pending list of task2 syscall() // Sets the pending flag sigprocmask(SIG_UNBLOCK, ...); -> preemption, task2 cannot dequeue the signal timer_settime(...); // re-arm periodic timer timer fires, signal is ignored ---> Warning triggers as signal is on task2's pending list and the thread group is not exiting Consequently, remove that warning too and just keep the signal on the pending list. The following attempt to deliver the signal on return to user space of task2 will ignore the signal and a subsequent expiry will bring it back to the ignored list, if it did not get blocked or un-ignored before that. Fixes: df7a996b4dab ("signal: Queue ignored posixtimers on ignore list") Reported-by: syzbot+3c2e3cc60665d71de2f7@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/87ikqhcnjn.ffs@tglx
2025-01-15io_uring/register: cache old SQ/CQ head reading for copiesJens Axboe
The SQ and CQ ring heads are read twice - once for verifying that it's within bounds, and once inside the loops copying SQE and CQE entries. This is technically incorrect, in case the values could get modified in between verifying them and using them in the copy loop. While this won't lead to anything truly nefarious, it may cause longer loop times for the copies than expected. Read the ring head values once, and use the verified value in the copy loops. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-15io_uring/register: document io_register_resize_rings() shared mem usageJens Axboe
It can be a bit hard to tell which parts of io_register_resize_rings() are operating on shared memory, and which ones are not. And anything reading or writing to those regions should really use the read/write once primitives. Hence add those, ensuring sanity in how this memory is accessed, and helping document the shared nature of it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-15Merge tag 'ti-driver-soc-for-v6.14' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux into arm/fixes TI SoC driver updates for v6.14 - Build fixup when CONFIG_TI_PRUSS is disabled.
2025-01-15io_uring/register: use stable SQ/CQ ring data during resizeJens Axboe
Normally the kernel would not expect an application to modify any of the data shared with the kernel during a resize operation, but of course the kernel cannot always assume good intent on behalf of the application. As part of resizing the rings, existing SQEs and CQEs are copied over to the new storage. Resizing uses the masks in the newly allocated shared storage to index the arrays, however it's possible that malicious userspace could modify these after they have been sanity checked. Use the validated and locally stored CQ and SQ ring sizing for masking to ensure the values are both stable and valid. Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-15hwmon: (drivetemp) Set scsi command timeout to 10sRussell Harmon
There's at least one drive (MaxDigitalData OOS14000G) such that if it receives a large amount of I/O while entering an idle power state will first exit idle before responding, including causing SMART temperature requests to be delayed. This causes the drivetemp request to exceed its timeout of 1 second. Signed-off-by: Russell Harmon <russ@har.mn> Link: https://lore.kernel.org/r/20250115131340.3178988-1-russ@har.mn Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-01-15hwmon: (acpi_power_meter) Fix a check for the return value of ↵Kazuhiro Abe
read_domain_devices(). After commit fabb1f813ec0 ("hwmon: (acpi_power_meter) Fix fail to load module on platform without _PMD method"), the acpi_power_meter driver fails to load if the platform has _PMD method. To address this, add a check for successful read_domain_devices(). Tested on Nvidia Grace machine. Fixes: fabb1f813ec0 ("hwmon: (acpi_power_meter) Fix fail to load module on platform without _PMD method") Signed-off-by: Kazuhiro Abe <fj1078ii@aa.jp.fujitsu.com> Link: https://lore.kernel.org/r/20250115073532.3211000-1-fj1078ii@aa.jp.fujitsu.com [groeck: Dropped unnecessary () from expression] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2025-01-15Merge tag 'reset-fixes-for-v6.13' of git://git.pengutronix.de/pza/linux into ↵Arnd Bergmann
arm/fixes Reset controller fixes for v6.13 * Fix rzg2l-usb-vbus-regulator lookup by assigning the proper of node to the allocated platform device in the rzg2l-usbphy-ctrl driver. * tag 'reset-fixes-for-v6.13' of git://git.pengutronix.de/pza/linux: reset: rzg2l-usbphy-ctrl: Assign proper of node to the allocated device Link: https://lore.kernel.org/r/20250113163642.1757160-1-p.zabel@pengutronix.de Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-01-15drm/xe/oa: Add missing VISACTL mux registersAshutosh Dixit
Add missing VISACTL mux registers required for some OA config's (e.g. RenderPipeCtrl). Fixes: cdf02fe1a94a ("drm/xe/oa/uapi: Add/remove OA config perf ops") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250111021539.2920346-1-ashutosh.dixit@intel.com (cherry picked from commit c26f22dac3449d8a687237cdfc59a6445eb8f75a) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe: make change ccs_mode a synchronous actionMaciej Patelczyk
If ccs_mode is being modified via /sys/class/drm/cardX/device/tileY/gtY/ccs_mode the asynchronous reset is triggered and the write returns immediately. With that some test receive false information about number of CCS engines or even fail if they proceed without delay after changing the ccs_mode. Changing the ccs_mode change from async to sync to prevent failures in tests. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211111727.1481476-3-maciej.patelczyk@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 480fb9806e2e073532f7786166287114c696b340) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe: introduce xe_gt_reset and xe_gt_wait_for_resetMaciej Patelczyk
Add synchronous version gt reset as there are few places where it is expected. Also add a wait helper to wait until gt reset is done. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211111727.1481476-2-maciej.patelczyk@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 155c77f45f63dd58a37eeb0896b0b140ab785836) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe/guc: Adding steering info support for GuC register listsJesus Narvaez
The guc_mmio_reg interface supports steering, but it is currently not implemented. This will allow the GuC to control steering of MMIO registers after save-restore and avoid reading from fused off MCR register instances. Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212190100.3768068-1-jesus.narvaez@intel.com (cherry picked from commit ee5a1321df90891d59d83b7c9d5b6c5b755d059d) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15genirq: Provide IRQCHIP_MOVE_DEFERREDThomas Gleixner
The logic of GENERIC_PENDING_IRQ is backwards for historical reasons. Most interrupt controllers allow to move the interrupt from arbitrary contexts. If GENERIC_PENDING_IRQ is enabled by an architecture to support a chip, which requires the affinity change to happen in interrupt context, all other chips have to be marked with IRQF_MOVE_PCNTXT. That's tedious and there is no real good reason for the extra flags in the irq descriptor and the irq data status fields. In fact the decision whether interrupts can be moved in arbitrary context or not is a property of the interrupt chip. To simplify adoption for RISC-V provide a new mechanism which is enabled via a config switch and allows to add a flag to irq_chip::flags to request that interrupt affinity changes are deferred. Setting the top level chip of an interrupt evaluates the flag and maps it into the existing logic. The config switch and the various PCNTXT flags are temporary until x86 is converted over to this scheme. This intermediate step also allows trivial backporting of the mechanism to plug the affinity change race of various RISC-V interrupt controllers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210103335.500314436@linutronix.de
2025-01-15hexagon: Remove GENERIC_PENDING_IRQ leftoverThomas Gleixner
Commented out since 2011.... Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Brian Cain <bcain@quicinc.com> Link: https://lore.kernel.org/all/20241210103335.437630614@linutronix.de
2025-01-15ARC: Remove GENERIC_PENDING_IRQThomas Gleixner
Nothing uses the actual functionality and the MCIP controller sets the flags which disables the deferred affinity change. The other interrupt controller does not support affinity setting at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Vineet Gupta <vgupta@kernel.org>   # arch/arc/ Link: https://lore.kernel.org/all/20241210103335.373392568@linutronix.de
2025-01-15genirq: Remove handle_enforce_irqctx() wrapperThomas Gleixner
Now that it is unconditionally available, remove the wrapper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210101811.561078243@linutronix.de
2025-01-15genirq: Make handle_enforce_irqctx() unconditionally availableThomas Gleixner
Commit 1b57d91b969c ("irqchip/gic-v2, v3: Prevent SW resends entirely") sett the flag which enforces interrupt handling in interrupt context and prevents software base resends for ARM GIC v2/v3. But it missed that the helper function which checks the flag was hidden behind CONFIG_GENERIC_PENDING_IRQ, which is not set by ARM[64]. Make the helper unconditionally available so that the enforcement actually works. Fixes: 1b57d91b969c ("irqchip/gic-v2, v3: Prevent SW resends entirely") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241210101811.497716609@linutronix.de
2025-01-15irqchip: Plug a OF node reference leak in platform_irqchip_probe()Joe Hattori
platform_irqchip_probe() leaks a OF node when irq_init_cb() fails. Fix it by declaring par_np with the __free(device_node) cleanup construct. This bug was found by an experimental static analysis tool that I am developing. Fixes: f8410e626569 ("irqchip: Add IRQCHIP_PLATFORM_DRIVER_BEGIN/END and IRQCHIP_MATCH helper macros") Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241215033945.3414223-1-joe@pf.is.s.u-tokyo.ac.jp
2025-01-15selftests: net: Adapt ethtool mq tests to fix in qdisc graftVictor Nogueira
Because of patch[1] the graft behaviour changed So the command: tcq replace parent 100:1 handle 204: Is no longer valid and will not delete 100:4 added by command: tcq replace parent 100:4 handle 204: pfifo_fast So to maintain the original behaviour, this patch manually deletes 100:4 and grafts 100:1 Note: This change will also work fine without [1] [1] https://lore.kernel.org/netdev/20250111151455.75480-1-jhs@mojatatu.com/T/#u Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-15irqchip/loongarch-avec: Add multi-nodes topology supportTianyang Zhang
avecintc_init() enables the Advanced Interrupt Controller (AVEC) of the boot CPU node, but nothing enables the AVEC on secondary nodes. Move the enablement to the CPU hotplug callback so that secondary nodes get the AVEC enabled too. In theory enabling it once per node would be sufficient, but redundant enabling does no hurt, so keep the code simple and do it unconditionally. Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Huacai Chen <chenhuacai@loongson.cn> Link: https://lore.kernel.org/all/20250111023704.17285-1-zhangtianyang@loongson.cn
2025-01-15irqchip/ts4800: Replace seq_printf() by seq_puts()Geert Uytterhoeven
Simplify "seq_printf(p, "%s", ...)" to "seq_puts(p, ...)". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/1ba5692126804f9e1ff062ac24939b24030b4f72.1733403985.git.geert+renesas@glider.be
2025-01-15irqchip/ti-sci-inta : Add module build supportNicolas Frayer
Add module build support in Kconfig for the TI SCI interrupt aggregator driver. The driver's default build is built-in and it also depends on ARCH_K3 as the driver uses some 64 bit ops and should only be built for 64-bit platforms. Signed-off-by: Nicolas Frayer <nfrayer@baylibre.com> Signed-off-by: Guillaume La Roque <glaroque@baylibre.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/all/20241224-timodules-v4-2-c5e010f58e2c@baylibre.com
2025-01-15irqchip/ti-sci-intr: Add module build supportNicolas Frayer
Add module build support in Kconfig for the TI SCI interrupt router driver. This driver depends on the TI sci firmware driver which aready supports module build. Signed-off-by: Nicolas Frayer <nfrayer@baylibre.com> Signed-off-by: Guillaume La Roque <glaroque@baylibre.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nishanth Menon <nm@ti.com> Link: https://lore.kernel.org/all/20241224-timodules-v4-1-c5e010f58e2c@baylibre.com
2025-01-15irqchip/irq-brcmstb-l2: Replace brcmstb_l2_mask_and_ack() by generic functionDr. David Alan Gilbert
Replace brcmstb_l2_mask_and_ack() by the generic irq_gc_mask_disable_and_ack_set(). brcmstb_l2_mask_and_ack() was added in commit 49aa6ef0b439 ("irqchip/brcmstb-l2: Remove some processing from the handler") in September 2017 with a comment saying it was actually generic and someone should add it to the generic code. commit 20608924cc2e ("genirq: generic chip: Add irq_gc_mask_disable_and_ack_set()") did that a few weeks later, however no one went back and took the brcmstb variant out. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/all/20241224001727.149337-1-linux@treblig.org
2025-01-15irqchip: keystone: Use syscon_regmap_lookup_by_phandle_argsKrzysztof Kozlowski
Use syscon_regmap_lookup_by_phandle_args() which is a wrapper over syscon_regmap_lookup_by_phandle() combined with getting the syscon argument. Except simpler code this annotates within one line that given phandle has arguments, so grepping for code would be easier. There is also no real benefit in printing errors on missing syscon argument, because this is done just too late: runtime check on static/build-time data. Dtschema and Devicetree bindings offer the static/build-time check for this already. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250111185414.183971-1-krzysztof.kozlowski@linaro.org
2025-01-15irqchip/sunxi-nmi: Add missing SKIP_WAKE flagPhilippe Simons
Some boards with Allwinner SoCs connect the PMIC's IRQ pin to the SoC's NMI pin instead of a normal GPIO. Since the power key is connected to the PMIC, and people expect to wake up a suspended system via this key, the NMI IRQ controller must stay alive when the system goes into suspend. Add the SKIP_WAKE flag to prevent the sunxi NMI controller from going to sleep, so that the power key can wake up those systems. [ tglx: Fixed up coding style ] Signed-off-by: Philippe Simons <simons.philippe@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250112123402.388520-1-simons.philippe@gmail.com
2025-01-15irqchip/gic-v3-its: Don't enable interrupts in its_irq_set_vcpu_affinity()Tomas Krcka
The following call-chain leads to enabling interrupts in a nested interrupt disabled section: irq_set_vcpu_affinity() irq_get_desc_lock() raw_spin_lock_irqsave() <--- Disable interrupts its_irq_set_vcpu_affinity() guard(raw_spinlock_irq) <--- Enables interrupts when leaving the guard() irq_put_desc_unlock() <--- Warns because interrupts are enabled This was broken in commit b97e8a2f7130, which replaced the original raw_spin_[un]lock() pair with guard(raw_spinlock_irq). Fix the issue by using guard(raw_spinlock). [ tglx: Massaged change log ] Fixes: b97e8a2f7130 ("irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()") Signed-off-by: Tomas Krcka <krckatom@amazon.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241230150825.62894-1-krckatom@amazon.de
2025-01-15irqchip/gic-v3: Handle CPU_PM_ENTER_FAILED correctlyYogesh Lal
When a CPU attempts to enter low power mode, it disables the redistributor and Group 1 interrupts and reinitializes the system registers upon wakeup. If the transition into low power mode fails, then the CPU_PM framework invokes the PM notifier callback with CPU_PM_ENTER_FAILED to allow the drivers to undo the state changes. The GIC V3 driver ignores CPU_PM_ENTER_FAILED, which leaves the GIC in disabled state. Handle CPU_PM_ENTER_FAILED in the same way as CPU_PM_EXIT to restore normal operation. [ tglx: Massage change log, add Fixes tag ] Fixes: 3708d52fc6bb ("irqchip: gic-v3: Implement CPU PM notifier") Signed-off-by: Yogesh Lal <quic_ylal@quicinc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241220093907.2747601-1-quic_ylal@quicinc.com
2025-01-15drm/bridge: ite-it6263: Prevent error pointer dereference in probe()Dan Carpenter
If devm_i2c_new_dummy_device() fails then we were supposed to return an error code, but instead the function continues and will crash on the next line. Add the missing return statement. Fixes: 049723628716 ("drm/bridge: Add ITE IT6263 LVDS to HDMI converter") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Liu Ying <victor.liu@nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/804a758b-f2e7-4116-b72d-29bc8905beed@stanley.mountain
2025-01-14net: fec: handle page_pool_dev_alloc_pages errorKevin Groeneveld
The fec_enet_update_cbd function calls page_pool_dev_alloc_pages but did not handle the case when it returned NULL. There was a WARN_ON(!new_page) but it would still proceed to use the NULL pointer and then crash. This case does seem somewhat rare but when the system is under memory pressure it can happen. One case where I can duplicate this with some frequency is when writing over a smbd share to a SATA HDD attached to an imx6q. Setting /proc/sys/vm/min_free_kbytes to higher values also seems to solve the problem for my test case. But it still seems wrong that the fec driver ignores the memory allocation error and can crash. This commit handles the allocation error by dropping the current packet. Fixes: 95698ff6177b5 ("net: fec: using page pool to manage RX buffers") Signed-off-by: Kevin Groeneveld <kgroeneveld@lenbrook.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250113154846.1765414-1-kgroeneveld@lenbrook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14net: netpoll: ensure skb_pool list is always initializedJohn Sperbeck
When __netpoll_setup() is called directly, instead of through netpoll_setup(), the np->skb_pool list head isn't initialized. If skb_pool_flush() is later called, then we hit a NULL pointer in skb_queue_purge_reason(). This can be seen with this repro, when CONFIG_NETCONSOLE is enabled as a module: ip tuntap add mode tap tap0 ip link add name br0 type bridge ip link set dev tap0 master br0 modprobe netconsole netconsole=4444@10.0.0.1/br0,9353@10.0.0.2/ rmmod netconsole The backtrace is: BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page ... ... ... Call Trace: <TASK> __netpoll_free+0xa5/0xf0 br_netpoll_cleanup+0x43/0x50 [bridge] do_netpoll_cleanup+0x43/0xc0 netconsole_netdev_event+0x1e3/0x300 [netconsole] unregister_netdevice_notifier+0xd9/0x150 cleanup_module+0x45/0x920 [netconsole] __se_sys_delete_module+0x205/0x290 do_syscall_64+0x70/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e Move the skb_pool list setup and initial skb fill into __netpoll_setup(). Fixes: 221a9c1df790 ("net: netpoll: Individualize the skb pool") Signed-off-by: John Sperbeck <jsperbeck@google.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250114011354.2096812-1-jsperbeck@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>