summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-05-03drm/amdgpu: Enable doorbell selfring after resize FB BARShane Xiao
[Why] The selfring doorbell aperture will change when resize FB BAR successfully during gmc sw init, we should reorder the sequence of enabling doorbell selfring aperture. [How] Move enable_doorbell_selfring_aperture from *_common_hw_init to *_common_late_init. This fixes the potential issue that GPU ring its own doorbell when this device is in translated mode when iommu is on. v2: Remove *_enable_doorbell_aperture functions (Christian) v3: Add comments to note that why we need enable doorbell selfring late (Christian) Signed-off-by: Shane Xiao <shane.xiao@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Christian K�nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: Use the default reset when loading or reloading the driverlyndonli
Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: Fix mode2 reset for sienna cichlidlyndonli
Before this change, sienna_cichlid_get_reset_handler will always return NULL, although the module parameter reset_method is 3 when loading amdgpu driver. Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03Merge tag 'parisc-for-6.4-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "Two important fixes in here: - The argument pointer register was wrong when calling 64-bit firmware functions, which may cause random memory corruption or crashes. - Ensure page alignment in cache flush functions, otherwise not all memory might get flushed. The rest are cleanups (mmap implementation, panic path) and usual smaller updates. Summary: - Calculate correct argument pointer in real64_call_asm() - Cleanup mmap implementation regarding color alignment (John David Anglin) - Spinlock fixes in panic path (Guilherme G. Piccoli) - build doc update for parisc64 (Randy Dunlap) - Ensure page alignment in flush functions" * tag 'parisc-for-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix argument pointer in real64_call_asm() parisc: Cleanup mmap implementation regarding color alignment parisc: Drop HP-UX constants and structs from grfioctl.h parisc: Ensure page alignment in flush functions parisc: Replace regular spinlock with spin_trylock on panic path parisc: update kbuild doc. aliases for parisc64 parisc: Limit amount of kgdb breakpoints on parisc
2023-05-03Merge tag 'modules-6.4-rc1-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules fix from Luis Chamberlain: "One fix by Arnd far for modules which came in after the first pull request. The issue was found as part of some late compile tests with 0-day. I take it 0-day does some secondary late builds with after some initial ones" * tag 'modules-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: include internal.h in module/dups.c
2023-05-03Merge tag 'sysctl-6.4-rc1-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull more sysctl updates from Luis Chamberlain: "As mentioned on my first pull request for sysctl-next, for v6.4-rc1 we're very close to being able to deprecating register_sysctl_paths(). I was going to assess the situation after the first week of the merge window. That time is now and things are looking good. We only have one which had already an ACK for so I'm picking this up here now and the last patch is the one that uses an axe. I have boot tested the last patch and 0-day build completed successfully" * tag 'sysctl-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: remove register_sysctl_paths() kernel: pid_namespace: simplify sysctls with register_sysctl()
2023-05-03Merge tag 'uml-for-linus-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull uml updates from Richard Weinberger: - Make stub data pages configurable - Make it harder to mix user and kernel code by accident * tag 'uml-for-linus-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: make stub data pages size tweakable um: prevent user code in modules um: further clean up user_syms um: don't export printf() um: hostfs: define our own API boundary um: add __weak for exported functions
2023-05-03Merge tag 'ubifs-for-linus-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBI and UBIFS updates from Richard Weinberger: "UBI: - Fix error value for try_write_vid_and_data() - Minor cleanups UBIFS: - Fixes for various memory leaks - Minor cleanups" * tag 'ubifs-for-linus-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Fix memleak when insert_old_idx() failed Revert "ubifs: dirty_cow_znode: Fix memleak in error handling path" ubifs: Fix memory leak in do_rename ubifs: Free memory for tmpfile name ubi: Fix return value overwrite issue in try_write_vid_and_data() ubifs: Remove return in compr_exit() ubi: Simplify bool conversion
2023-05-03Merge tag 'pm-6.4-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These fix a hibernation test mode regression and clean up the intel_idle driver. Specifics: - Make test_resume work again after the changes that made hibernation open the snapshot device in exclusive mode (Chen Yu) - Clean up code in several places in intel_idle (Artem Bityutskiy)" * tag 'pm-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_idle: mark few variables as __read_mostly intel_idle: do not sprinkle module parameter definitions around intel_idle: fix confusing message intel_idle: improve C-state flags handling robustness intel_idle: further intel_idle_init_cstates_icpu() cleanup intel_idle: clean up intel_idle_init_cstates_icpu() intel_idle: use pr_info() instead of printk() PM: hibernate: Do not get block device exclusively in test_resume mode PM: hibernate: Turn snapshot_test into global variable
2023-05-03Merge tag 'acpi-6.4-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These add two ACPI-related quirks and extend support for Apple device properties supplied via ACPI _DSM. Specifics: - Do not turn off unused power resources during initialization on the Toshiba Click Mini (Hans de Goede) - Support strings in device properties supplied by ACPI _DSM on Apple platforms (Hector Martin) - Add an ACPI device ID quirk for Lenovo Yoga Tablet 2 (Marius Hoch)" * tag 'acpi-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: property: Support strings in Apple _DSM props ACPI: x86: utils: Remove Lenovo Yoga Tablet 2's MAGN0001 ACPI: PM: Do not turn of unused power resources on the Toshiba Click Mini
2023-05-03Merge tag 'thermal-6.4-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more thermal control updates from Rafael Wysocki: "These are mostly cleanups on top of the previously merged thermal control changes plus some driver fixes and the removal of the Intel Menlow thermal driver. Specifics: - Add compatible DT bindings for imx6sll and imx6ul to fix a dtbs check warning (Stefan Wahren) - Update the example in the DT bindings to reflect changes with the ADC node name for QCom TM and TM5 (Marijn Suijten) - Fix comments for the cpuidle_cooling_register() function to match the function prototype (Chenggang Wang) - Fix inconsistent temperature read and some Mediatek variant board reboot by reverting a change and handling the temperature differently (AngeloGioacchino Del Regno) - Fix a memory leak in the initialization error path for the Mediatek driver (Kang Chen) - Use of_address_to_resource() in the Mediatek driver (Rob Herring) - Fix unit address in the QCom tsens driver DT bindings (Krzysztof Kozlowski) - Clean up the step-wise thermal governor (Zhang Rui) - Introduce thermal_zone_device() for accessing the device field of struct thermal_zone_device and two drivers use it (Daniel Lezcano) - Clean up the ACPI thermal driver a bit (Daniel Lezcano) - Delete the thermal driver for Intel Menlow platforms that is not expected to have any users (Rafael Wysocki)" * tag 'thermal-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: intel: menlow: Get rid of this driver ACPI: thermal: Move to dedicated function sysfs extra attr creation ACPI: thermal: Use thermal_zone_device() thermal: intel: pch_thermal: Use thermal driver device to write a trace thermal: core: Encapsulate tz->device field thermal: gov_step_wise: Adjust code logic to match comment thermal: gov_step_wise: Delete obsolete comment dt-bindings: thermal: qcom-tsens: Correct unit address thermal/drivers/mediatek: Use of_address_to_resource() thermal/drivers/mediatek: Change clk_prepare_enable to devm_clk_get_enabled in mtk_thermal_probe thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe thermal/drivers/mediatek: Add temperature constraints to validate read Revert "thermal/drivers/mediatek: Add delay after thermal banks initialization" thermal/drivers/cpuidle_cooling: Delete unmatched comments dt-bindings: thermal: Use generic ADC node name in examples dt-bindings: imx-thermal: Add imx6sll and imx6ul compatible
2023-05-03Merge tag 'pwm/for-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "The bulk of this is trivial conversions to the new .remove_new() callback for drivers as part of Uwe's effort to clean that up. Other than that a driver is added for Apple devices and various small fixes are included for existing drivers. Last but not least, this finally gets rid of the old pwm_request() and pwm_free() APIs are removed since the last user was dropped in v6.3" * tag 'pwm/for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (44 commits) pwm: Remove unused radix tree pwm: Delete deprecated functions pwm_request() and pwm_free() pwm: meson: Fix g12a ao clk81 name pwm: meson: Fix axg ao mux parents pwm: stm32: Enforce settings for PWM capture MAINTAINERS: Add entries for Apple PWM driver pwm: Add Apple PWM controller dt-bindings: pwm: Add Apple PWM controller pwm: mtk-disp: Configure double buffering before reading in .get_state() pwm: mtk-disp: Disable shadow registers before setting backlight values pwm: stm32-lp: Drop of_match_ptr for ID table pwm: rcar: Drop of_match_ptr for ID table dt-bindings: pwm: Convert Amlogic Meson PWM binding dt-bindings: pwm: mediatek: Add mediatek,mt7986 compatible pwm: xilinx: Convert to platform remove callback returning void pwm: vt8500: Convert to platform remove callback returning void pwm: tiehrpwm: Convert to platform remove callback returning void pwm: tiecap: Convert to platform remove callback returning void pwm: tegra: Convert to platform remove callback returning void pwm: sun4i: Convert to platform remove callback returning void ...
2023-05-03Merge tag 'soundwire-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire updates from Vinod Koul: "This features AMD soundwire controller driver, a bunch of Intel changes for future platform support, sdw API updates etc: - Support for AMD soundwire controller - Intel driver updates to support future platforms - Core API sdw_nread/nwrite_no_pm updates to handle page boundaries" * tag 'soundwire-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (38 commits) soundwire: intel_auxdevice: improve pm_prepare step soundwire: bus: Fix unbalanced pm_runtime_put() causing usage count underflow soundwire: intel: don't save hw_params for use in prepare soundwire: bus: Update sdw_nread/nwrite_no_pm to handle page boundaries soundwire: bus: Update kernel doc for no_pm functions soundwire: bus: Remove now outdated comments on no_pm IO soundwire: stream: uniquify dev_err() logs soundwire: stream: remove bus->dev from logs on multiple buses soundwire: amd: add pm_prepare callback and pm ops support soundwire: amd: handle SoundWire wake enable interrupt soundwire: amd: add runtime pm ops for AMD SoundWire manager driver soundwire: amd: add SoundWire manager interrupt handling soundwire: amd: enable build for AMD SoundWire manager driver soundwire: amd: register SoundWire manager dai ops soundwire: amd: Add support for AMD Manager driver soundwire: export sdw_compute_slave_ports() function soundwire: stream: restore cumulative bus bandwidth when compute_params callback failed soundwire: bandwidth allocation: Use hweight32() to calculate set bits soundwire: qcom: gracefully handle too many ports in DT soundwire: qcom: define hardcoded version magic numbers ...
2023-05-03Merge tag 'phy-for-6.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy updates from Vinod Koul: "New support: - UFS PHY for Qualcomm SA8775p, SM7150 - PCIe 2 lane phy support for sc8180x and PCIe PHY for SDX65 - Mediatke hdmi phy support for mt8195 - rockchip naneng combo phy support for RK358 Updates: - Drop Thunder Bay eMMC PHY driver - RC support for PCIe phy for Qualcomm SDX55 - SGMII support in WIZ driver for J721E - PCIe and multilink SGMII PHY support in cadence driver - Big pile of platform remove callback returning void conversions" * tag 'phy-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: (77 commits) phy: cadence: cdns-dphy-rx: Add common module reset support phy: ti: j721e-wiz: Add SGMII support in WIZ driver for J721E dt-bindings: phy: ti: phy-gmii-sel: Add support for J784S4 CPSW9G phy: ti: j721e-wiz: Fix unreachable code in wiz_mode_select() phy: cadence: Sierra: Add PCIe + SGMII PHY multilink configuration phy: mediatek: add support for phy-mtk-hdmi-mt8195 phy: phy-mtk-hdmi: Add generic phy configure callback dt-bindings: phy: mediatek: hdmi-phy: Add mt8195 compatible phy: tegra: xusb: Add missing tegra_xusb_port_unregister for usb2_port and ulpi_port dt-bindings: phy: ti,phy-j721e-wiz: document clock-output-names dt-bindings: phy: ti,phy-j721e-wiz: drop assigned-clocks dt-bindings: phy: ti,phy-am654-serdes: drop assigned-clocks type dt-bindings: phy: cadence-torrent: drop assigned-clocks dt-bindings: phy: cadence-sierra: drop assigned-clocks phy: rockchip: remove unused hw_to_inno function phy: qualcomm: phy-qcom-qmp-ufs: add definitions for sa8775p dt-bindings: phy: qmp-ufs: describe the UFS PHY for sa8775p phy: qcom-qmp-pcie: drop sdm845_qhp_pcie_rx_tbl phy: qcom-qmp-pcie: sc8180x PCIe PHY has 2 lanes phy: qcom-qmp-ufs: Add SM7150 support ...
2023-05-03Merge tag 'dmaengine-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine updates from Vinod Koul: "New support: - Apple admac t8112 device support - StarFive JH7110 DMA controller Updates: - Big pile of idxd updates to support IAA 2.0 device capabilities, DSA 2.0 Event Log and completion record faulting features and new DSA operations - at_xdmac supend & resume updates and driver code cleanup - k3-udma supend & resume support - k3-psil thread support for J784s4" * tag 'dmaengine-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (57 commits) dmaengine: idxd: add per wq PRS disable dmaengine: idxd: add pid to exported sysfs attribute for opened file dmaengine: idxd: expose fault counters to sysfs dmaengine: idxd: add a device to represent the file opened dmaengine: idxd: add per file user counters for completion record faults dmaengine: idxd: process batch descriptor completion record faults dmaengine: idxd: add descs_completed field for completion record dmaengine: idxd: process user page faults for completion record dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling dmaengine: idxd: create kmem cache for event log fault items dmaengine: idxd: add per DSA wq workqueue for processing cr faults dmanegine: idxd: add debugfs for event log dump dmaengine: idxd: add interrupt handling for event log dmaengine: idxd: setup event log configuration dmaengine: idxd: add event log size sysfs attribute dmaengine: idxd: make misc interrupt one shot dt-bindings: dma: snps,dw-axi-dmac: constrain the items of resets for JH7110 dma dt-bindings: dma: Drop unneeded quotes dmaengine: at_xdmac: align declaration of ret with the rest of variables dmaengine: at_xdmac: add a warning message regarding for unpaused channels ...
2023-05-03Merge tag 'for-6.4-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pateldipen1984/linux Pull hardware timestamp engine updates from Dipen Patel: "The changes for the hte subsystem include: - Add Tegra234 HTE provider and relevant DT bindings - Update MAINTAINERS file for the HTE subsystem" * tag 'for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pateldipen1984/linux: hte: tegra-194: Use proper includes hte: Use device_match_of_node() hte: tegra-194: Fix off by one in tegra_hte_map_to_line_id() hte: tegra: fix 'struct of_device_id' build error hte: Use of_property_present() for testing DT property presence gpio: tegra186: Add Tegra234 hte support hte: handle nvidia,gpio-controller property hte: Deprecate nvidia,slices property hte: Add Tegra234 provider hte: Re-phrase tegra API document arm64: tegra: Add Tegra234 GTE nodes dt-bindings: timestamp: Deprecate nvidia,slices property dt-bindings: timestamp: Add Tegra234 support MAINTAINERS: Add HTE/timestamp subsystem details
2023-05-03x86-64: mm: clarify the 'positive addresses' user address rulesLinus Torvalds
Dave Hansen found the "(long) addr >= 0" code in the x86-64 access_ok checks somewhat confusing, and suggested using a helper to clarify what the code is doing. So this does exactly that: clarifying what the sign bit check is all about, by adding a helper macro that makes it clear what it is testing. This also adds some explicit comments talking about how even with LAM enabled, any addresses with the sign bit will still GP-fault in the non-canonical region just above the sign bit. This is all what allows us to do the user address checks with just the sign bit, and furthermore be a bit cavalier about accesses that might be done with an additional offset even past that point. (And yes, this talks about 'positive' even though zero is also a valid user address and so technically we should call them 'non-negative'. But I don't think using 'non-negative' ends up being more understandable). Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: mm: remove 'sign' games from LAM untagged_addr*() macrosLinus Torvalds
The intent of the sign games was to not modify kernel addresses when untagging them. However, that had two issues: (a) it didn't actually work as intended, since the mask was calculated as 'addr >> 63' on an _unsigned_ address. So instead of getting a mask of all ones for kernel addresses, you just got '1'. (b) untagging a kernel address isn't actually a valid operation anyway. Now, (a) had originally been true for both 'untagged_addr()' and the remote version of it, but had accidentally been fixed for the regular version of untagged_addr() by commit e0bddc19ba95 ("x86/mm: Reduce untagged_addr() overhead for systems without LAM"). That one rewrote the shift to be part of the alternative asm code, and in the process changed the unsigned shift into a signed 'sar' instruction. And while it is true that we don't want to turn what looks like a kernel address into a user address by masking off the high bit, that doesn't need these sign masking games - all it needs is that the mm context 'untag_mask' value has the high bit set. Which it always does. So simplify the code by just removing the superfluous (and in the case of untagged_addr_remote(), still buggy) sign bit games in the address masking. Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> headerLinus Torvalds
The x86 <asm/uaccess.h> file has grown features that are specific to x86-64 like LAM support and the related access_ok() changes. They really should be in the <asm/uaccess_64.h> file and not pollute the generic x86 header. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: mm: remove architecture-specific 'access_ok()' defineLinus Torvalds
There's already a generic definition of 'access_ok()' in the asm-generic/access_ok.h header file, and the only difference bwteen that and the x86-specific one is the added check for WARN_ON_IN_IRQ(). And it turns out that the reason for that check is long gone: it used to use a "user_addr_max()" inline function that depended on the current thread, and caused problems in non-thread contexts. For details, see commits 7c4788950ba5 ("x86/uaccess, sched/preempt: Verify access_ok() context") and in particular commit ae31fe51a3cc ("perf/x86: Restore TASK_SIZE check on frame pointer") about how and why this came to be. But that "current task" issue was removed in the big set_fs() removal by Christoph Hellwig in commit 47058bb54b57 ("x86: remove address space overrides using set_fs()"). So the reason for the test and the architecture-specific access_ok() define no longer exists, and is actually harmful these days. For example, it led various 'copy_from_user_nmi()' games (eg using __range_not_ok() instead, and then later converted to __access_ok() when that became ok). And that in turn meant that LAM was broken for the frame following before this series, because __access_ok() used to not do the address untagging. Accessing user state still needs care in many contexts, but access_ok() is not the place for this test. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds torvalds@linux-foundation.org>
2023-05-03x86-64: make access_ok() independent of LAMLinus Torvalds
The linear address masking (LAM) code made access_ok() more complicated, in that it now needs to untag the address in order to verify the access range. See commit 74c228d20a51 ("x86/uaccess: Provide untagged_addr() and remove tags before address check"). We were able to avoid that overhead in the get_user/put_user code paths by simply using the sign bit for the address check, and depending on the GP fault if the address was non-canonical, which made it all independent of LAM. And we can do the same thing for access_ok(): simply check that the user pointer range has the high bit clear. No need to bother with any address bit masking. In fact, we can go a bit further, and just check the starting address for known small accesses ranges: any accesses that overflow will still be in the non-canonical area and will still GP fault. To still make syzkaller catch any potentially unchecked user addresses, we'll continue to warn about GP faults that are caused by accesses in the non-canonical range. But we'll limit that to purely "high bit set and past the one-page 'slop' area". We could probably just do that "check only starting address" for any arbitrary range size: realistically all kernel accesses to user space will be done starting at the low address. But let's leave that kind of optimization for later. As it is, this already allows us to generate simpler code and not worry about any tag bits in the address. The one thing to look out for is the GUP address check: instead of actually copying data in the virtual address range (and thus bad addresses being caught by the GP fault), GUP will look up the page tables manually. As a result, the page table limits need to be checked, and that was previously implicitly done by the access_ok(). With the relaxed access_ok() check, we need to just do an explicit check for TASK_SIZE_MAX in the GUP code instead. The GUP code already needs to do the tag bit unmasking anyway, so there this is all very straightforward, and there are no LAM issues. Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03tracing: Fix permissions for the buffer_percent fileOndrej Mosnacek
This file defines both read and write operations, yet it is being created as read-only. This means that it can't be written to without the CAP_DAC_OVERRIDE capability. Fix the permissions to allow root to write to it without the need to override DAC perms. Link: https://lore.kernel.org/linux-trace-kernel/20230503140114.3280002-1-omosnace@redhat.com Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 03329f993978 ("tracing: Add tracefs file buffer_percentage") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-03parisc: Fix argument pointer in real64_call_asm()Helge Deller
Fix the argument pointer (ap) to point to real-mode memory instead of virtual memory. It's interesting that this issue hasn't shown up earlier, as this could have happened with any 64-bit PDC ROM code. I just noticed it because I suddenly faced a HPMC while trying to execute the 64-bit STI ROM code of an Visualize-FXe graphics card for the STI text console. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org>
2023-05-03parisc: Cleanup mmap implementation regarding color alignmentJohn David Anglin
This change simplifies the randomization of file mapping regions. It reworks the code to remove duplication. The flow is now similar to that for mips. Finally, we consistently use the do_color_align variable to determine when color alignment is needed. Tested on rp3440. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-03parisc: Drop HP-UX constants and structs from grfioctl.hHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-03parisc: Ensure page alignment in flush functionsHelge Deller
Matthew Wilcox noticed, that if ARCH_HAS_FLUSH_ON_KUNMAP is defined (which is the case for PA-RISC), __kunmap_local() calls kunmap_flush_on_unmap(), which may call the parisc flush functions with a non-page-aligned address and thus the page might not be fully flushed. This patch ensures that flush_kernel_dcache_page_asm() and flush_kernel_dcache_page_asm() will always operate on page-aligned addresses. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.0+
2023-05-03parisc: Replace regular spinlock with spin_trylock on panic pathGuilherme G. Piccoli
The panic notifiers' callbacks execute in an atomic context, with interrupts/preemption disabled, and all CPUs not running the panic function are off, so it's very dangerous to wait on a regular spinlock, there's a risk of deadlock. Refactor the panic notifier of parisc/power driver to make use of spin_trylock - for that, we've added a second version of the soft-power function. Also, some comments were reorganized and trailing white spaces, useless header inclusion and blank lines were removed. Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeroen Roovers <jer@xs4all.nl> Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-03parisc: update kbuild doc. aliases for parisc64Randy Dunlap
ARCH=parisc64 is now supported for 64-bit parisc builds, so add this alias to the kbuild.rst documentation. Fixes: 3dcfb729b5f4 ("parisc: Make CONFIG_64BIT available for ARCH=parisc64 only") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-parisc@vger.kernel.org Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Acked-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-03parisc: Limit amount of kgdb breakpoints on pariscHelge Deller
kgdb is rarely used and 40 breakpoints seems enough to debug parisc specific bugs. Signed-off-by: Helge Deller <deller@gmx.de>
2023-05-03Merge branch 'pm-sleep'Rafael J. Wysocki
Merge hibernation test mode fix for 6.4-rc1. * pm-sleep: PM: hibernate: Do not get block device exclusively in test_resume mode PM: hibernate: Turn snapshot_test into global variable
2023-05-03Merge branches 'acpi-pm' and 'acpi-properties'Rafael J. Wysocki
Merge an ACPI power management quirk and a change related to the handling of ACPI device properties for 6.4-rc1: - Do not turn off unused power resources during initialization on the Toshiba Click Mini (Hans de Goede). - Support strings in device properties supplied by ACPI _DSM on Apple platforms (Hector Martin). * acpi-pm: ACPI: PM: Do not turn of unused power resources on the Toshiba Click Mini * acpi-properties: ACPI: property: Support strings in Apple _DSM props
2023-05-03Merge branch 'thermal-core'Rafael J. Wysocki
Merge additional thermal core and ACPI thermal changes for 6.4-rc1: - Clean up the step-wise thermal governor (Zhang Rui). - Introduce thermal_zone_device() for accessing the device field of struct thermal_zone_device and two drivers use it (Daniel Lezcano). - Clean up the ACPI thermal driver a bit (Daniel Lezcano). - Delete the thermal driver for Intel Menlow platforms that is not expected to have any users (Rafael Wysocki). * thermal-core: thermal: intel: menlow: Get rid of this driver ACPI: thermal: Move to dedicated function sysfs extra attr creation ACPI: thermal: Use thermal_zone_device() thermal: intel: pch_thermal: Use thermal driver device to write a trace thermal: core: Encapsulate tz->device field thermal: gov_step_wise: Adjust code logic to match comment thermal: gov_step_wise: Delete obsolete comment
2023-05-03drm/i915/dsi: Use unconditional msleep() instead of intel_dsi_msleep()Hans de Goede
The intel_dsi_msleep() helper skips sleeping if the MIPI-sequences have a version of 3 or newer and the panel is in vid-mode. This is based on the big comment around line 730 which starts with "Panel enable/disable sequences from the VBT spec.", where the "v3 video mode seq" column does not have any wait t# entries. Checking the Windows driver shows that it does always honor the VBT delays independent of the version of the VBT sequences. Commit 6fdb335f1c9c ("drm/i915/dsi: Use unconditional msleep for the panel_on_delay when there is no reset-deassert MIPI-sequence") switched to a direct msleep() instead of intel_dsi_msleep() when there is no MIPI_SEQ_DEASSERT_RESET sequence, to fix the panel on an Acer Aspire Switch 10 E SW3-016 not turning on. And now testing on a Nextbook Ares 8A shows that panel_on_delay must always be honored otherwise the panel will not turn on. Instead of only always using regular msleep() for panel_on_delay do as Windows does and always use regular msleep() everywhere were intel_dsi_msleep() is used and drop the intel_dsi_msleep() helper. Changes in v2: - Replace all intel_dsi_msleep() calls instead of just the intel_dsi_msleep(panel_on_delay) call Cc: stable@vger.kernel.org Fixes: 6fdb335f1c9c ("drm/i915/dsi: Use unconditional msleep for the panel_on_delay when there is no reset-deassert MIPI-sequence") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230425194441.68086-1-hdegoede@redhat.com (cherry picked from commit fa83c12132f71302f7d4b02758dc0d46048d3f5f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2023-05-03drm/i915/mtl: Add the missing CPU transcoder mask in intel_device_infoRadhakrishna Sripada
CPU transcoder mask is used to iterate over the available CPU transcoders in the macro for_each_cpu_transcoder(). The macro is broken on MTL and got highlighted when audio state was being tracked for each transcoder added in [1]. Add the missing CPU transcoder mask which is similar to ADL-P mask but without DSI transcoders. [1]: https://patchwork.freedesktop.org/patch/523723/ Fixes: 7835303982d1 ("drm/i915/mtl: Add MeteorLake PCI IDs") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Acked-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420221248.2511314-1-radhakrishna.sripada@intel.com (cherry picked from commit bddc18913bd44adae5c828fd514d570f43ba1576) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2023-05-03drm/i915/guc: Actually return an error if GuC version range check failsJohn Harrison
Dan Carpenter pointed out that 'err' was not being set in the case where the GuC firmware version range check fails. Fix that. Note that while this is a bug fix for a previous patch (see Fixes tag below). It is an exceedingly low risk bug. The range check is asserting that the GuC firmware version is within spec. So it should not be possible to ever have a firmware file that fails this check. If larger version numbers are required in the future, that would be a backwards breaking spec change and thus require a major version bump, in which case an old i915 driver would not load that new version anyway. Fixes: 9bbba0667f37 ("drm/i915/guc: Use GuC submission API version number") Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230421224742.2357198-1-John.C.Harrison@Intel.com (cherry picked from commit 80ab31799002166ac7c660bacfbff4f85bc29107) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2023-05-02module: include internal.h in module/dups.cArnd Bergmann
Two newly introduced functions are declared in a header that is not included before the definition, causing a warning with sparse or 'make W=1': kernel/module/dups.c:118:6: error: no previous prototype for 'kmod_dup_request_exists_wait' [-Werror=missing-prototypes] 118 | bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/module/dups.c:220:6: error: no previous prototype for 'kmod_dup_request_announce' [-Werror=missing-prototypes] 220 | void kmod_dup_request_announce(char *module_name, int ret) | ^~~~~~~~~~~~~~~~~~~~~~~~~ Add an explicit include to ensure the prototypes match. Fixes: 8660484ed1cf ("module: add debugging auto-load duplicate module support") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202304141440.DYO4NAzp-lkp@intel.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-02sysctl: remove register_sysctl_paths()Luis Chamberlain
The deprecation for register_sysctl_paths() is over. We can rejoice as we nuke register_sysctl_paths(). The routine register_sysctl_table() was the only user left of register_sysctl_paths(), so we can now just open code and move the implementation over to what used to be to __register_sysctl_paths(). The old dynamic struct ctl_table_set *set is now the point to sysctl_table_root.default_set. The old dynamic const struct ctl_path *path was being used in the routine register_sysctl_paths() with a static: static const struct ctl_path null_path[] = { {} }; Since this is a null path we can now just simplfy the old routine and remove its use as its always empty. This saves us a total of 230 bytes. $ ./scripts/bloat-o-meter vmlinux.old vmlinux add/remove: 2/7 grow/shrink: 1/1 up/down: 1015/-1245 (-230) Function old new delta register_leaf_sysctl_tables.constprop - 524 +524 register_sysctl_table 22 497 +475 __pfx_register_leaf_sysctl_tables.constprop - 16 +16 null_path 8 - -8 __pfx_register_sysctl_paths 16 - -16 __pfx_register_leaf_sysctl_tables 16 - -16 __pfx___register_sysctl_paths 16 - -16 __register_sysctl_base 29 12 -17 register_sysctl_paths 18 - -18 register_leaf_sysctl_tables 534 - -534 __register_sysctl_paths 620 - -620 Total: Before=21259666, After=21259436, chg -0.00% Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-05-02kernel: pid_namespace: simplify sysctls with register_sysctl()Luis Chamberlain
register_sysctl_paths() is only required if your child (directories) have entries and pid_namespace does not. So use register_sysctl_init() instead where we don't care about the return value and use register_sysctl() where we do. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Jeff Xu <jeffxu@google.com> Link: https://lore.kernel.org/r/20230302202826.776286-9-mcgrof@kernel.org
2023-05-02mm: change per-VMA lock statistics to be disabled by defaultSuren Baghdasaryan
Change CONFIG_PER_VMA_LOCK_STATS to be disabled by default, as most users don't need it. Add configuration help to clarify its usage. Link: https://lkml.kernel.org/r/20230428173533.18158-1-surenb@google.com Fixes: 52f238653e45 ("mm: introduce per-VMA lock statistics") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02MAINTAINERS: update Michal Simek's emailMichal Simek
@xilinx.com is still working but better to switch to new amd.com after AMD/Xilinx acquisition. Link: https://lkml.kernel.org/r/bd073d026f8c367a9cfb45d26d39f26e40c665dc.1683035692.git.michal.simek@amd.com Signed-off-by: Michal Simek <michal.simek@amd.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> Cc: Qais Yousef <qyousef@layalina.io> Cc: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm/mempolicy: correctly update prev when policy is equal on mbindLorenzo Stoakes
The refactoring in commit f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator") introduces a subtle bug which arises when attempting to apply a new NUMA policy across a range of VMAs in mbind_range(). The refactoring passes a **prev pointer to keep track of the previous VMA in order to reduce duplication, and in all but one case it keeps this correctly updated. The bug arises when a VMA within the specified range has an equivalent policy as determined by mpol_equal() - which unlike other cases, does not update prev. This can result in a situation where, later in the iteration, a VMA is found whose policy does need to change. At this point, vma_merge() is invoked with prev pointing to a VMA which is before the previous VMA. Since vma_merge() discovers the curr VMA by looking for the one immediately after prev, it will now be in a situation where this VMA is incorrect and the merge will not proceed correctly. This is checked in the VM_WARN_ON() invariant case with end > curr->vm_end, which, if a merge is possible, results in a warning (if CONFIG_DEBUG_VM is specified). I note that vma_merge() performs these invariant checks only after merge_prev/merge_next are checked, which is debatable as it hides this issue if no merge is possible even though a buggy situation has arisen. The solution is simply to update the prev pointer even when policies are equal. This caused a bug to arise in the 6.2.y stable tree, and this patch resolves this bug. Link: https://lkml.kernel.org/r/83f1d612acb519d777bebf7f3359317c4e7f4265.1682866629.git.lstoakes@gmail.com Fixes: f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator") Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202304292203.44ddeff6-oliver.sang@intel.com Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02relayfs: fix out-of-bounds access in relay_file_readZhang Zhengming
There is a crash in relay_file_read, as the var from point to the end of last subbuf. The oops looks something like: pc : __arch_copy_to_user+0x180/0x310 lr : relay_file_read+0x20c/0x2c8 Call trace: __arch_copy_to_user+0x180/0x310 full_proxy_read+0x68/0x98 vfs_read+0xb0/0x1d0 ksys_read+0x6c/0xf0 __arm64_sys_read+0x20/0x28 el0_svc_common.constprop.3+0x84/0x108 do_el0_svc+0x74/0x90 el0_svc+0x1c/0x28 el0_sync_handler+0x88/0xb0 el0_sync+0x148/0x180 We get the condition by analyzing the vmcore: 1). The last produced byte and last consumed byte both at the end of the last subbuf 2). A softirq calls function(e.g __blk_add_trace) to write relay buffer occurs when an program is calling relay_file_read_avail(). relay_file_read relay_file_read_avail relay_file_read_consume(buf, 0, 0); //interrupted by softirq who will write subbuf .... return 1; //read_start point to the end of the last subbuf read_start = relay_file_read_start_pos //avail is equal to subsize avail = relay_file_read_subbuf_avail //from points to an invalid memory address from = buf->start + read_start //system is crashed copy_to_user(buffer, from, avail) Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com Fixes: 8d62fdebdaf9 ("relay file read: start-pos fix") Signed-off-by: Zhang Zhengming <zhang.zhengming@h3c.com> Reviewed-by: Zhao Lei <zhao_lei1@hoperun.com> Reviewed-by: Zhou Kete <zhou.kete@h3c.com> Reviewed-by: Pengcheng Yang <yangpc@wangsu.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02kasan: hw_tags: avoid invalid virt_to_page()Mark Rutland
When booting with 'kasan.vmalloc=off', a kernel configured with support for KASAN_HW_TAGS will explode at boot time due to bogus use of virt_to_page() on a vmalloc adddress. With CONFIG_DEBUG_VIRTUAL selected this will be reported explicitly, and with or without CONFIG_DEBUG_VIRTUAL the kernel will dereference a bogus address: | ------------[ cut here ]------------ | virt_to_phys used for non-linear address: (____ptrval____) (0xffff800008000000) | WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x78/0x80 | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc3-00073-g83865133300d-dirty #4 | Hardware name: linux,dummy-virt (DT) | pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __virt_to_phys+0x78/0x80 | lr : __virt_to_phys+0x78/0x80 | sp : ffffcd076afd3c80 | x29: ffffcd076afd3c80 x28: 0068000000000f07 x27: ffff800008000000 | x26: fffffbfff0000000 x25: fffffbffff000000 x24: ff00000000000000 | x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000 | x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000 | x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004 | x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000003 | x11: 00000000ffffefff x10: c0000000ffffefff x9 : 0000000000000000 | x8 : 0000000000000000 x7 : 205d303030303030 x6 : 302e30202020205b | x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000 | x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : 000000000000004f | Call trace: | __virt_to_phys+0x78/0x80 | __kasan_unpoison_vmalloc+0xd4/0x478 | __vmalloc_node_range+0x77c/0x7b8 | __vmalloc_node+0x54/0x64 | init_IRQ+0x94/0xc8 | start_kernel+0x194/0x420 | __primary_switched+0xbc/0xc4 | ---[ end trace 0000000000000000 ]--- | Unable to handle kernel paging request at virtual address 03fffacbe27b8000 | Mem abort info: | ESR = 0x0000000096000004 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x04: level 0 translation fault | Data abort info: | ISV = 0, ISS = 0x00000004 | CM = 0, WnR = 0 | swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041bc5000 | [03fffacbe27b8000] pgd=0000000000000000, p4d=0000000000000000 | Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.3.0-rc3-00073-g83865133300d-dirty #4 | Hardware name: linux,dummy-virt (DT) | pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __kasan_unpoison_vmalloc+0xe4/0x478 | lr : __kasan_unpoison_vmalloc+0xd4/0x478 | sp : ffffcd076afd3ca0 | x29: ffffcd076afd3ca0 x28: 0068000000000f07 x27: ffff800008000000 | x26: 0000000000000000 x25: 03fffacbe27b8000 x24: ff00000000000000 | x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000 | x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000 | x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004 | x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000001 | x11: 0000800008000000 x10: ffff800008000000 x9 : ffffb2f8dee00000 | x8 : 000ffffb2f8dee00 x7 : 205d303030303030 x6 : 302e30202020205b | x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000 | x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : ffffb2f8dee00000 | Call trace: | __kasan_unpoison_vmalloc+0xe4/0x478 | __vmalloc_node_range+0x77c/0x7b8 | __vmalloc_node+0x54/0x64 | init_IRQ+0x94/0xc8 | start_kernel+0x194/0x420 | __primary_switched+0xbc/0xc4 | Code: d34cfc08 aa1f03fa 8b081b39 d503201f (f9400328) | ---[ end trace 0000000000000000 ]--- | Kernel panic - not syncing: Attempted to kill the idle task! This is because init_vmalloc_pages() erroneously calls virt_to_page() on a vmalloc address, while virt_to_page() is only valid for addresses in the linear/direct map. Since init_vmalloc_pages() expects virtual addresses in the vmalloc range, it must use vmalloc_to_page() rather than virt_to_page(). We call init_vmalloc_pages() from __kasan_unpoison_vmalloc(), where we check !is_vmalloc_or_module_addr(), suggesting that we might encounter a non-vmalloc address. Luckily, this never happens. By design, we only call __kasan_unpoison_vmalloc() on pointers in the vmalloc area, and I have verified that we don't violate that expectation. Given that, is_vmalloc_or_module_addr() must always be true for any legitimate argument to __kasan_unpoison_vmalloc(). Correct init_vmalloc_pages() to use vmalloc_to_page(), and remove the redundant and misleading use of is_vmalloc_or_module_addr() in __kasan_unpoison_vmalloc(). Link: https://lkml.kernel.org/r/20230418164212.1775741-1-mark.rutland@arm.com Fixes: 6c2f761dad7851d8 ("kasan: fix zeroing vmalloc memory with HW_TAGS") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm: hwpoison: coredump: support recovery from dump_user_range()Kefeng Wang
dump_user_range() is used to copy the user page to a coredump file, but if a hardware memory error occurred during copy, which called from __kernel_write_iter() in dump_user_range(), it crashes, CPU: 112 PID: 7014 Comm: mca-recover Not tainted 6.3.0-rc2 #425 pc : __memcpy+0x110/0x260 lr : _copy_from_iter+0x3bc/0x4c8 ... Call trace: __memcpy+0x110/0x260 copy_page_from_iter+0xcc/0x130 pipe_write+0x164/0x6d8 __kernel_write_iter+0x9c/0x210 dump_user_range+0xc8/0x1d8 elf_core_dump+0x308/0x368 do_coredump+0x2e8/0xa40 get_signal+0x59c/0x788 do_signal+0x118/0x1f8 do_notify_resume+0xf0/0x280 el0_da+0x130/0x138 el0t_64_sync_handler+0x68/0xc0 el0t_64_sync+0x188/0x190 Generally, the '->write_iter' of file ops will use copy_page_from_iter() and copy_page_from_iter_atomic(), change memcpy() to copy_mc_to_kernel() in both of them to handle #MC during source read, which stop coredump processing and kill the task instead of kernel panic, but the source address may not always a user address, so introduce a new copy_mc flag in struct iov_iter{} to indicate that the iter could do a safe memory copy, also introduce the helpers to set/cleck the flag, for now, it's only used in coredump's dump_user_range(), but it could expand to any other scenarios to fix the similar issue. Link: https://lkml.kernel.org/r/20230417045323.11054-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Tong Tiangen <tongtiangen@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm/page_alloc: add some comments to explain the possible hole in ↵Baolin Wang
__pageblock_pfn_to_page() Now the __pageblock_pfn_to_page() is used by set_zone_contiguous(), which checks whether the given zone contains holes, and uses pfn_to_online_page() to validate if the start pfn is online and valid, as well as using pfn_valid() to validate the end pfn. However, the __pageblock_pfn_to_page() function may return non-NULL even if the end pfn of a pageblock is in a memory hole in some situations. For example, if the pageblock order is MAX_ORDER, which will fall into 2 sub-sections, and the end pfn of the pageblock may be hole even though the start pfn is online and valid. See below memory layout as an example and suppose the pageblock order is MAX_ORDER. [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff] [ 0.000000] DMA32 empty [ 0.000000] Normal [mem 0x0000000100000000-0x0000001fa7ffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000040000000-0x0000001fa3c7ffff] [ 0.000000] node 0: [mem 0x0000001fa3c80000-0x0000001fa3ffffff] [ 0.000000] node 0: [mem 0x0000001fa4000000-0x0000001fa402ffff] [ 0.000000] node 0: [mem 0x0000001fa4030000-0x0000001fa40effff] [ 0.000000] node 0: [mem 0x0000001fa40f0000-0x0000001fa73cffff] [ 0.000000] node 0: [mem 0x0000001fa73d0000-0x0000001fa745ffff] [ 0.000000] node 0: [mem 0x0000001fa7460000-0x0000001fa746ffff] [ 0.000000] node 0: [mem 0x0000001fa7470000-0x0000001fa758ffff] [ 0.000000] node 0: [mem 0x0000001fa7590000-0x0000001fa7dfffff] Focus on the last memory range, and there is a hole for the range [mem 0x0000001fa7590000-0x0000001fa7dfffff]. That means the last pageblock will contain the range from 0x1fa7c00000 to 0x1fa7ffffff, since the pageblock must be 4M aligned. And in this pageblock, these pfns will fall into 2 sub-section (the sub-section size is 2M aligned). So, the 1st sub-section (indicates pfn range: 0x1fa7c00000 - 0x1fa7dfffff ) in this pageblock is valid by calling subsection_map_init() in free_area_init(), but the 2nd sub-section (indicates pfn range: 0x1fa7e00000 - 0x1fa7ffffff ) in this pageblock is not valid. This did not break anything until now, but the zone continuous is fragile in this possible scenario. So as previous discussion[1], it is better to add some comments to explain this possible issue in case there are some future pfn walkers that rely on this. [1] https://lore.kernel.org/all/87r0sdsmr6.fsf@yhuang6-desk2.ccr.corp.intel.com/ Link: https://lkml.kernel.org/r/5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/5c26368865e79c743a453dea48d30670b19d2e4f.1682425534.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm/ksm: move disabling KSM from s390/gmap code to KSM codeDavid Hildenbrand
Let's factor out actual disabling of KSM. The existing "mm->def_flags &= ~VM_MERGEABLE;" was essentially a NOP and can be dropped, because def_flags should never include VM_MERGEABLE. Note that we don't currently prevent re-enabling KSM. This should now be faster in case KSM was never enabled, because we only conditionally iterate all VMAs. Further, it certainly looks cleaner. Link: https://lkml.kernel.org/r/20230422210156.33630-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Stefan Roesch <shr@devkernel.io> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02selftests/ksm: ksm_functional_tests: add prctl unmerge testDavid Hildenbrand
Let's test whether setting PR_SET_MEMORY_MERGE to 0 after setting it to 1 will unmerge pages, similar to how setting MADV_UNMERGEABLE after setting MADV_MERGEABLE would. Link: https://lkml.kernel.org/r/20230422205420.30372-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Stefan Roesch <shr@devkernel.io> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0David Hildenbrand
Patch series "mm/ksm: improve PR_SET_MEMORY_MERGE=0 handling and cleanup disabling KSM", v2. (1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE does, (2) add a selftest for it and (3) factor out disabling of KSM from s390/gmap code. This patch (of 3): Let's unmerge any KSM pages when setting PR_SET_MEMORY_MERGE=0, and clear the VM_MERGEABLE flag from all VMAs -- just like KSM would. Of course, only do that if we previously set PR_SET_MEMORY_MERGE=1. Link: https://lkml.kernel.org/r/20230422205420.30372-1-david@redhat.com Link: https://lkml.kernel.org/r/20230422205420.30372-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Stefan Roesch <shr@devkernel.io> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm/damon/paddr: fix missing folio_sz update in damon_pa_young()Kefeng Wang
The *folio_sz in damon_pa_young() will be used(as last_folio_sz) by __damon_pa_check_access(), so it's need to be updated, fix missing branch. Link: https://lkml.kernel.org/r/20230308083311.120951-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-02mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate()Kefeng Wang
Omit one line by unified folio_put(), and make code more clear. Link: https://lkml.kernel.org/r/20230308083311.120951-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>