summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-09-27usb-storage: Add Hiksemi USB3-FW to IGNORE_UASHongling Zeng
The UAS mode of Hiksemi USB_HDD is reported to fail to work on several platforms with the following error message, then after re-connecting the device will be offlined and not working at all. [ 592.518442][ 2] sd 8:0:0:0: [sda] tag#17 uas_eh_abort_handler 0 uas-tag 18 inflight: CMD [ 592.527575][ 2] sd 8:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 03 6f 88 00 00 04 00 00 [ 592.536330][ 2] sd 8:0:0:0: [sda] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD [ 592.545266][ 2] sd 8:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 07 44 1a 88 00 00 08 00 These disks have a broken uas implementation, the tag field of the status iu-s is not set properly,so we need to fall-back to usb-storage. Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@kernel.org> Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn> Link: https://lore.kernel.org/r/1663901185-21067-1-git-send-email-zenghongling@kylinos.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-27uas: add no-uas quirk for Hiksemi usb_diskHongling Zeng
The UAS mode of Hiksemi is reported to fail to work on several platforms with the following error message, then after re-connecting the device will be offlined and not working at all. [ 592.518442][ 2] sd 8:0:0:0: [sda] tag#17 uas_eh_abort_handler 0 uas-tag 18 inflight: CMD [ 592.527575][ 2] sd 8:0:0:0: [sda] tag#17 CDB: Write(10) 2a 00 03 6f 88 00 00 04 00 00 [ 592.536330][ 2] sd 8:0:0:0: [sda] tag#0 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD [ 592.545266][ 2] sd 8:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 07 44 1a 88 00 00 08 00 These disks have a broken uas implementation, the tag field of the status iu-s is not set properly,so we need to fall-back to usb-storage. Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@kernel.org> Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn> Link: https://lore.kernel.org/r/1663901173-21020-1-git-send-email-zenghongling@kylinos.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-27usb: dwc3: st: Fix node's child namePatrice Chotard
Update node's child name from "dwc3" to "usb", this fixes the following issue: [3.773852] usb-st-dwc3 8f94000.dwc3: failed to find dwc3 core node Fixes: 3120910a099b ("ARM: dts: stih407-family: Harmonize DWC USB3 DT nodes name") Reported-by: Jerome Audu <jerome.audu@st.com> Signed-off-by: Patrice Chotard <patrice.chotard@foss.st.com> Link: https://lore.kernel.org/r/20220926124359.304770-1-patrice.chotard@foss.st.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-27usb: typec: ucsi: Remove incorrect warningHeikki Krogerus
Sink only devices do not have any source capabilities, so the driver should not warn about that. Also DRP (Dual Role Power) capable devices, such as USB Type-C docking stations, do not return any source capabilities unless they are plugged to a power supply themselves. Fixes: 1f4642b72be7 ("usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Cc: <stable@vger.kernel.org> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20220922145924.80667-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-27net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()Lukas Wunner
Commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") introduced a WARN() on resume from system sleep if a PHY is not in PHY_HALTED state. Commit 6dbe852c379f ("net: phy: Don't WARN for PHY_READY state in mdio_bus_phy_resume()") added an exemption for PHY_READY state from the WARN(). It turns out PHY_UP state needs to be exempted as well because the following may happen on suspend: mdio_bus_phy_suspend() phy_stop_machine() phydev->state = PHY_UP # if (phydev->state >= PHY_UP) Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/netdev/2b1a1588-505e-dff3-301d-bfc1fb14d685@samsung.com/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Xiaolei Wang <xiaolei.wang@windriver.com> Link: https://lore.kernel.org/r/8128fdb51eeebc9efbf3776a4097363a1317aaf1.1663905575.git.lukas@wunner.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-27Merge tag 'thunderbolt-for-v6.0' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus Mika writes: "thunderbolt: Fix for v6.0 final This includes a single fix from Mario that resets the plug event delay back to the USB4 spec value. This has been in linux-next with no reported issues." * tag 'thunderbolt-for-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: thunderbolt: Explicitly reset plug events delay back to USB4 spec value
2022-09-27net: stmmac: power up/down serdes in stmmac_open/releaseJunxiao Chang
This commit fixes DMA engine reset timeout issue in suspend/resume with ADLink I-Pi SMARC Plus board which dmesg shows: ... [ 54.678271] PM: suspend exit [ 54.754066] intel-eth-pci 0000:00:1d.2 enp0s29f2: PHY [stmmac-3:01] driver [Maxlinear Ethernet GPY215B] (irq=POLL) [ 54.755808] intel-eth-pci 0000:00:1d.2 enp0s29f2: Register MEM_TYPE_PAGE_POOL RxQ-0 ... [ 54.780482] intel-eth-pci 0000:00:1d.2 enp0s29f2: Register MEM_TYPE_PAGE_POOL RxQ-7 [ 55.784098] intel-eth-pci 0000:00:1d.2: Failed to reset the dma [ 55.784111] intel-eth-pci 0000:00:1d.2 enp0s29f2: stmmac_hw_setup: DMA engine initialization failed [ 55.784115] intel-eth-pci 0000:00:1d.2 enp0s29f2: stmmac_open: Hw setup failed ... The issue is related with serdes which impacts clock. There is serdes in ADLink I-Pi SMARC board ethernet controller. Please refer to commit b9663b7ca6ff78 ("net: stmmac: Enable SERDES power up/down sequence") for detial. When issue is reproduced, DMA engine clock is not ready because serdes is not powered up. To reproduce DMA engine reset timeout issue with hardware which has serdes in GBE controller, install Ubuntu. In Ubuntu GUI, click "Power Off/Log Out" -> "Suspend" menu, it disables network interface, then goes to sleep mode. When it wakes up, it enables network interface again. Stmmac driver is called in this way: 1. stmmac_release: Stop network interface. In this function, it disables DMA engine and network interface; 2. stmmac_suspend: It is called in kernel suspend flow. But because network interface has been disabled(netif_running(ndev) is false), it does nothing and returns directly; 3. System goes into S3 or S0ix state. Some time later, system is waken up by keyboard or mouse; 4. stmmac_resume: It does nothing because network interface has been disabled; 5. stmmac_open: It is called to enable network interace again. DMA engine is initialized in this API, but serdes is not power on so there will be DMA engine reset timeout issue. Similarly, serdes powerdown should be added in stmmac_release. Network interface might be disabled by cmd "ifconfig eth0 down", DMA engine, phy and mac have been disabled in ndo_stop callback, serdes should be powered down as well. It doesn't make sense that serdes is on while other components have been turned off. If ethernet interface is in enabled state(netif_running(ndev) is true) before suspend/resume, the issue couldn't be reproduced because serdes could be powered up in stmmac_resume. Because serdes_powerup is added in stmmac_open, it doesn't need to be called in probe function. Fixes: b9663b7ca6ff78 ("net: stmmac: Enable SERDES power up/down sequence") Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Reviewed-by: Voon Weifeng <weifeng.voon@intel.com> Tested-by: Jimmy JS Chen <jimmyjs.chen@adlinktech.com> Tested-by: Looi, Hong Aun <hong.aun.looi@intel.com> Link: https://lore.kernel.org/r/20220923050448.1220250-1-junxiao.chang@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-27wifi: mac80211: mlme: Fix double unlock on assoc success handlingRafael Mendonca
Commit 6911458dc428 ("wifi: mac80211: mlme: refactor assoc success handling") moved the per-link setup out of ieee80211_assoc_success() into a new function ieee80211_assoc_config_link() but missed to remove the unlock of 'sta_mtx' in case of HE capability/operation missing on HE AP, which leads to a double unlock: ieee80211_assoc_success() { ... ieee80211_assoc_config_link() { ... if (!(link->u.mgd.conn_flags & IEEE80211_CONN_DISABLE_HE) && (!elems->he_cap || !elems->he_operation)) { mutex_unlock(&sdata->local->sta_mtx); ... } ... } ... mutex_unlock(&sdata->local->sta_mtx); ... } Fixes: 6911458dc428 ("wifi: mac80211: mlme: refactor assoc success handling") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Link: https://lore.kernel.org/r/20220925143420.784975-1-rafaelmendsr@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27wifi: mac80211: mlme: Fix missing unlock on beacon RXRafael Mendonca
Commit 98b0b467466c ("wifi: mac80211: mlme: use correct link_sta") switched to link station instead of deflink and added some checks to do that, which are done with the 'sta_mtx' mutex held. However, the error path of these checks does not unlock 'sta_mtx' before returning. Fixes: 98b0b467466c ("wifi: mac80211: mlme: use correct link_sta") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Link: https://lore.kernel.org/r/20220924184042.778676-1-rafaelmendsr@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()Paweł Lenkow
During our testing of WFM200 module over SDIO on i.MX6Q-based platform, we discovered a memory corruption on the system, tracing back to the wfx driver. Using kfence, it was possible to trace it back to the root cause, which is hw->max_rates set to 8 in wfx_init_common, while the maximum defined by IEEE80211_TX_TABLE_SIZE is 4. This causes array out-of-bounds writes during updates of the rate table, as seen below: BUG: KFENCE: memory corruption in kfree_rcu_work+0x320/0x36c Corrupted memory at 0xe0a4ffe0 [ 0x03 0x03 0x03 0x03 0x01 0x00 0x00 0x02 0x02 0x02 0x09 0x00 0x21 0xbb 0xbb 0xbb ] (in kfence-#81): kfree_rcu_work+0x320/0x36c process_one_work+0x3ec/0x920 worker_thread+0x60/0x7a4 kthread+0x174/0x1b4 ret_from_fork+0x14/0x2c 0x0 kfence-#81: 0xe0a4ffc0-0xe0a4ffdf, size=32, cache=kmalloc-64 allocated by task 297 on cpu 0 at 631.039555s: minstrel_ht_update_rates+0x38/0x2b0 [mac80211] rate_control_tx_status+0xb4/0x148 [mac80211] ieee80211_tx_status_ext+0x364/0x1030 [mac80211] ieee80211_tx_status+0xe0/0x118 [mac80211] ieee80211_tasklet_handler+0xb0/0xe0 [mac80211] tasklet_action_common.constprop.0+0x11c/0x148 __do_softirq+0x1a4/0x61c irq_exit+0xcc/0x104 call_with_stack+0x18/0x20 __irq_svc+0x80/0xb0 wq_worker_sleeping+0x10/0x100 wq_worker_sleeping+0x10/0x100 schedule+0x50/0xe0 schedule_timeout+0x2e0/0x474 wait_for_completion+0xdc/0x1ec mmc_wait_for_req_done+0xc4/0xf8 mmc_io_rw_extended+0x3b4/0x4ec sdio_io_rw_ext_helper+0x290/0x384 sdio_memcpy_toio+0x30/0x38 wfx_sdio_copy_to_io+0x88/0x108 [wfx] wfx_data_write+0x88/0x1f0 [wfx] bh_work+0x1c8/0xcc0 [wfx] process_one_work+0x3ec/0x920 worker_thread+0x60/0x7a4 kthread+0x174/0x1b4 ret_from_fork+0x14/0x2c 0x0 After discussion on the wireless mailing list it was clarified that the issue has been introduced by: commit ee0e16ab756a ("mac80211: minstrel_ht: fill all requested rates") and fix shall be in minstrel_ht_update_rates in rc80211_minstrel_ht.c. Fixes: ee0e16ab756a ("mac80211: minstrel_ht: fill all requested rates") Link: https://lore.kernel.org/all/12e5adcd-8aed-f0f7-70cc-4fb7b656b829@camlingroup.com/ Link: https://lore.kernel.org/linux-wireless/20220915131445.30600-1-lech.perczak@camlingroup.com/ Cc: Jérôme Pouiller <jerome.pouiller@silabs.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Peter Seiderer <ps.report@gmx.net> Cc: Kalle Valo <kvalo@kernel.org> Cc: Krzysztof Drobiński <krzysztof.drobinski@camlingroup.com>, Signed-off-by: Paweł Lenkow <pawel.lenkow@camlingroup.com> Signed-off-by: Lech Perczak <lech.perczak@camlingroup.com> Reviewed-by: Peter Seiderer <ps.report@gmx.net> Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com> Acked-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27wifi: mac80211: fix regression with non-QoS driversHans de Goede
Commit 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port") changed ieee80211_tx_control_port() to aways call __ieee80211_select_queue() without checking local->hw.queues. __ieee80211_select_queue() returns a queue-id between 0 and 3, which means that now ieee80211_tx_control_port() may end up setting the queue-mapping for a skb to a value higher then local->hw.queues if local->hw.queues is less then 4. Specifically this is a problem for ralink rt2500-pci cards where local->hw.queues is 2. There this causes rt2x00queue_get_tx_queue() to return NULL and the following error to be logged: "ieee80211 phy0: rt2x00mac_tx: Error - Attempt to send packet over invalid queue 2", after which association with the AP fails. Other callers of __ieee80211_select_queue() skip calling it when local->hw.queues < IEEE80211_NUM_ACS, add the same check to ieee80211_tx_control_port(). This fixes ralink rt2500-pci and similar cards when less then 4 tx-queues no longer working. Fixes: 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port") Cc: Markus Theil <markus.theil@tu-ilmenau.de> Suggested-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220918192052.443529-1-hdegoede@redhat.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27wifi: mac80211: ensure vif queues are operational after startAlexander Wetzel
Make sure local->queue_stop_reasons and vif.txqs_stopped stay in sync. When a new vif is created the queues may end up in an inconsistent state and be inoperable: Communication not using iTXQ will work, allowing to e.g. complete the association. But the 4-way handshake will time out. The sta will not send out any skbs queued in iTXQs. All normal attempts to start the queues will fail when reaching this state. local->queue_stop_reasons will have marked all queues as operational but vif.txqs_stopped will still be set, creating an inconsistent internal state. In reality this seems to be race between the mac80211 function ieee80211_do_open() setting SDATA_STATE_RUNNING and the wake_txqs_tasklet: Depending on the driver and the timing the queues may end up to be operational or not. Cc: stable@vger.kernel.org Fixes: f856373e2f31 ("wifi: mac80211: do not wake queues on a vif that is being stopped") Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220915130946.302803-1-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27wifi: mac80211: don't start TX with fq->lock to fix deadlockAlexander Wetzel
ieee80211_txq_purge() calls fq_tin_reset() and ieee80211_purge_tx_queue(); Both are then calling ieee80211_free_txskb(). Which can decide to TX the skb again. There are at least two ways to get a deadlock: 1) When we have a TDLS teardown packet queued in either tin or frags ieee80211_tdls_td_tx_handle() will call ieee80211_subif_start_xmit() while we still hold fq->lock. ieee80211_txq_enqueue() will thus deadlock. 2) A variant of the above happens if aggregation is up and running: In that case ieee80211_iface_work() will deadlock with the original task: The original tasks already holds fq->lock and tries to get sta->lock after kicking off ieee80211_iface_work(). But the worker can get sta->lock prior to the original task and will then spin for fq->lock. Avoid these deadlocks by not sending out any skbs when called via ieee80211_free_txskb(). Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20220915124120.301918-1-alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27wifi: cfg80211: fix MCS divisor valueTamizh Chelvam Raja
The Bitrate for HE/EHT MCS6 is calculated wrongly due to the incorrect MCS divisor value for mcs6. Fix it with the proper value. previous mcs_divisor value = (11769/6144) = 1.915527 fixed mcs_divisor value = (11377/6144) = 1.851725 Fixes: 9c97c88d2f4b ("cfg80211: Add support to calculate and report 4096-QAM HE rates") Signed-off-by: Tamizh Chelvam Raja <quic_tamizhr@quicinc.com> Link: https://lore.kernel.org/r/20220908181034.9936-1-quic_tamizhr@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27media: rkvdec: Disable H.264 error detectionNicolas Dufresne
Quite often, the HW get stuck in error condition if a stream error was detected. As documented, the HW should stop immediately and self reset. There is likely a problem or a miss-understanding of the self reset mechanism, as unless we make a long pause, the next command will then report an error even if there is no error in it. Disabling error detection fixes the issue, and let the decoder continue after an error. This patch is safe for backport into older kernels. Fixes: cd33c830448b ("media: rkvdec: Add the rkvdec driver") Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Reviewed-by: Brian Norris <briannorris@chromium.org> Tested-by: Brian Norris <briannorris@chromium.org> Reviewed-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-27media: mediatek: vcodec: Drop platform_get_resource(IORESOURCE_IRQ)Nícolas F. R. A. Prado
Commit a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT core") removed support for calling platform_get_resource(..., IORESOURCE_IRQ, ...) on DT-based drivers, but the probe() function of mtk-vcodec's encoder was still making use of it. This caused the encoder driver to fail probe. Since the platform_get_resource() call was only being used to check for the presence of the interrupt (its returned resource wasn't even used) and platform_get_irq() was already being used to get the IRQ, simply drop the use of platform_get_resource(IORESOURCE_IRQ) and handle the failure of platform_get_irq(), to get the driver probing again. [hverkuil: drop unused struct resource *res] Fixes: a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT core") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-27media: dvb_vb2: fix possible out of bound accessHangyu Hua
vb2_core_qbuf and vb2_core_querybuf don't check the range of b->index controlled by the user. Fix this by adding range checking code before using them. Fixes: 57868acc369a ("media: videobuf2: Add new uAPI for DVB streaming I/O") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-27media: v4l2-ioctl.c: fix incorrect error pathHans Verkuil
If allocating array_buf fails, or copying data from userspace into that buffer fails, then just free memory and return the error. Don't attempt to call video_put_user() since there is no point, and it would copy back data on error even if INFO_FL_ALWAYS_COPY wasn't set. So if writing the array back to userspace fails, then don't go to out_array_args, instead just continue with the regular code that just returns the error unless 'always_copy' is set. Update the VIDIOC_G/S/TRY_EXT_CTRLS ioctls to set the ALWAYS_COPY flag since they now need it. Before this worked due to this buggy code, but now that that is fixed these ioctls need to set this flag explicitly. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-27media: v4l2-compat-ioctl32.c: zero buffer passed to v4l2_compat_get_array_args()Hans Verkuil
The v4l2_compat_get_array_args() function can leave uninitialized memory in the buffer it is passed. So zero it before copying array elements from userspace into the buffer. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reported-by: syzbot+ff18193ff05f3f87f226@syzkaller.appspotmail.com Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-09-27nvme-pci: disable Write Zeroes on Phison E3C/E4CTina Hsu
E3C/E4C SSDs do support the Write Zeroes command in theory, but have very bad performance when using it. As the firmware has been frozen for these products we can not expect firmware improvements for it, so disable Write Zeroes. Signed-off-by: Tina Hsu <tina_hsu@phison.corp-partner.google.com> [hch: update the commit message] Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devicesMichael Kelley
The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are non-functional on NVMe devices because the nvme_pr_clear() and nvme_pr_release() functions set the IEKEY field incorrectly. The IEKEY field should be set only when the key is zero (i.e, not specified). The current code does it backwards. Furthermore, the NVMe spec describes the persistent reservation "clear" function as an option on the reservation release command. The current implementation of nvme_pr_clear() erroneously uses the reservation register command. Fix these errors. Note that NVMe version 1.3 and later specify that setting the IEKEY field will return an error of Invalid Field in Command. The fix will set IEKEY when the key is zero, which is appropriate as these ioctls consider a zero key to be "unspecified", and the intention of the spec change is to require a valid key. Tested on a version 1.4 PCI NVMe device in an Azure VM. Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code") Fixes: 1d277a637a71 ("NVMe: Add persistent reservation ops") Signed-off-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-09-27libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205Niklas Cassel
Commit 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile") added an explicit entry for AMD Green Sardine AHCI controller using the board_ahci_mobile configuration (this configuration has later been renamed to board_ahci_low_power). The board_ahci_low_power configuration enables support for low power modes. This explicit entry takes precedence over the generic AHCI controller entry, which does not enable support for low power modes. Therefore, when commit 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile") was backported to stable kernels, it make some Pioneer optical drives, which was working perfectly fine before the commit was backported, stop working. The real problem is that the Pioneer optical drives do not handle low power modes correctly. If these optical drives would have been tested on another AHCI controller using the board_ahci_low_power configuration, this issue would have been detected earlier. Unfortunately, the board_ahci_low_power configuration is only used in less than 15% of the total AHCI controller entries, so many devices have never been tested with an AHCI controller with low power modes. Fixes: 1527f69204fe ("ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile") Cc: stable@vger.kernel.org Reported-by: Jaap Berkhout <j.j.berkhout@staalenberk.nl> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2022-09-26Merge tag 'x86_urgent_for_v6.0-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Dave Hansen: - A performance fix for recent large AMD systems that avoids an ancient cpu idle hardware workaround - A new Intel model number. Folks like these upstream as soon as possible so that each developer doing feature development doesn't need to carry their own #define - SGX fixes for a userspace crash and a rare kernel warning * tag 'x86_urgent_for_v6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ACPI: processor idle: Practically limit "Dummy wait" workaround to old Intel systems x86/sgx: Handle VA page allocation failure for EAUG on PF. x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd x86/cpu: Add CPU model numbers for Meteor Lake
2022-09-26ARM: dts: integrator: Fix DMA rangesLinus Walleij
A recent change affecting the behaviour of phys_to_dma() to actually require the device tree ranges to work unmasked a bug in the Integrator DMA ranges. The PL110 uses the CMA allocator to obtain coherent allocations from a dedicated 1MB video memory, leading to the following call chain: drm_gem_cma_create() dma_alloc_attrs() dma_alloc_from_dev_coherent() __dma_alloc_from_coherent() dma_get_device_base() phys_to_dma() translate_phys_to_dma() phys_to_dma() by way of translate_phys_to_dma() will nowadays not provide 1:1 mappings unless the ranges are properly defined in the device tree and reflected into the dev->dma_range_map. There is a bug in the device trees because the DMA ranges are incorrectly specified, and the patch uncovers this bug. Solution: - Fix the LB (logic bus) ranges to be 1-to-1 like they should have always been. - Provide a 1:1 dma-ranges attribute to the PL110. - Mark the PL110 display controller as DMA coherent. This makes the DMA ranges work right and makes the PL110 framebuffer work again. Fixes: af6f23b88e95 ("ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma by default") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220926073311.1610568-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-09-26efi: libstub: remove pointless goto kludgeArd Biesheuvel
Remove some goto cruft that serves no purpose and obfuscates the code. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-26efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmapArd Biesheuvel
Currently, struct efi_boot_memmap is a struct that is passed around between callers of efi_get_memory_map() and the users of the resulting data, and which carries pointers to various variables whose values are provided by the EFI GetMemoryMap() boot service. This is overly complex, and it is much easier to carry these values in the struct itself. So turn the struct into one that carries these data items directly, including a flex array for the variable number of EFI memory descriptors that the boot service may return. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-26efi: libstub: avoid efi_get_memory_map() for allocating the virt mapArd Biesheuvel
The virt map is a set of efi_memory_desc_t descriptors that are passed to SetVirtualAddressMap() to inform the firmware about the desired virtual mapping of the regions marked as EFI_MEMORY_RUNTIME. The only reason we currently call the efi_get_memory_map() helper is that it gives us an allocation that is guaranteed to be of sufficient size. However, efi_get_memory_map() has grown some additional complexity over the years, and today, we're actually better off calling the EFI boot service directly with a zero size, which tells us how much memory should be enough for the virt map. While at it, avoid creating the VA map allocation if we will not be using it anyway, i.e., if efi_novamap is true. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-09-26Merge tag 'mm-hotfixes-stable-2022-09-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull last (?) hotfixes from Andrew Morton: "26 hotfixes. 8 are for issues which were introduced during this -rc cycle, 18 are for earlier issues, and are cc:stable" * tag 'mm-hotfixes-stable-2022-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (26 commits) x86/uaccess: avoid check_object_size() in copy_from_user_nmi() mm/page_isolation: fix isolate_single_pageblock() isolation behavior mm,hwpoison: check mm when killing accessing process mm/hugetlb: correct demote page offset logic mm: prevent page_frag_alloc() from corrupting the memory mm: bring back update_mmu_cache() to finish_fault() frontswap: don't call ->init if no ops are registered mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all() mm: fix madivse_pageout mishandling on non-LRU page powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush mm: gup: fix the fast GUP race against THP collapse mm: fix dereferencing possible ERR_PTR vmscan: check folio_test_private(), not folio_get_private() mm: fix VM_BUG_ON in __delete_from_swap_cache() tools: fix compilation after gfp_types.h split mm/damon/dbgfs: fix memory leak when using debugfs_lookup() mm/migrate_device.c: copy pte dirty bit to page mm/migrate_device.c: add missing flush_cache_page() mm/migrate_device.c: flush TLB while holding PTL x86/mm: disable instrumentations of mm/pgprot.c ...
2022-09-26net: hippi: Add missing pci_disable_device() in rr_init_one()ruanjinjie
Add missing pci_disable_device() if rr_init_one() fails Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20220923094320.3109154-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probePeng Wu
The devm_ioremap() function returns NULL on error, it doesn't return error pointers. Fixes: 3a1a274e933f ("mlxbf_gige: compute MDIO period based on i1clk") Signed-off-by: Peng Wu <wupeng58@huawei.com> Link: https://lore.kernel.org/r/20220923023640.116057-1-wupeng58@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26cxgb4: fix missing unlock on ETHOFLD desc collect fail pathRafael Mendonca
The label passed to the QDESC_GET for the ETHOFLD TXQ, RXQ, and FLQ, is the 'out' one, which skips the 'out_unlock' label, and thus doesn't unlock the 'uld_mutex' before returning. Additionally, since commit 5148e5950c67 ("cxgb4: add EOTID tracking and software context dump"), the access to these ETHOFLD hardware queues should be protected by the 'mqprio_mutex' instead. Fixes: 2d0cb84dd973 ("cxgb4: add ETHOFLD hardware queue support") Fixes: 5148e5950c67 ("cxgb4: add EOTID tracking and software context dump") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Reviewed-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Link: https://lore.kernel.org/r/20220922175109.764898-1-rafaelmendsr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26Merge tag 'ext4_for_linus_fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull missed ext4 fix from Ted Ts'o: "Fix an potential unitialzied variable bug; this was a fixup that I had forgotten to apply before the last pull request for ext4. My bad" * tag 'ext4_for_linus_fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fixup possible uninitialized variable access in ext4_mb_choose_next_group_cr1()
2022-09-26net: sched: act_ct: fix possible refcount leak in tcf_ct_init()Hangyu Hua
nf_ct_put need to be called to put the refcount got by tcf_ct_fill_params to avoid possible refcount leak when tcf_ct_flow_table_get fails. Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Link: https://lore.kernel.org/r/20220923020046.8021-1-hbh25y@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26x86/uaccess: avoid check_object_size() in copy_from_user_nmi()Kees Cook
The check_object_size() helper under CONFIG_HARDENED_USERCOPY is designed to skip any checks where the length is known at compile time as a reasonable heuristic to avoid "likely known-good" cases. However, it can only do this when the copy_*_user() helpers are, themselves, inline too. Using find_vmap_area() requires taking a spinlock. The check_object_size() helper can call find_vmap_area() when the destination is in vmap memory. If show_regs() is called in interrupt context, it will attempt a call to copy_from_user_nmi(), which may call check_object_size() and then find_vmap_area(). If something in normal context happens to be in the middle of calling find_vmap_area() (with the spinlock held), the interrupt handler will hang forever. The copy_from_user_nmi() call is actually being called with a fixed-size length, so check_object_size() should never have been called in the first place. Given the narrow constraints, just replace the __copy_from_user_inatomic() call with an open-coded version that calls only into the sanitizers and not check_object_size(), followed by a call to raw_copy_from_user(). [akpm@linux-foundation.org: no instrument_copy_from_user() in my tree...] Link: https://lkml.kernel.org/r/20220919201648.2250764-1-keescook@chromium.org Link: https://lore.kernel.org/all/CAOUHufaPshtKrTWOz7T7QFYUNVGFm0JBjvM700Nhf9qEL9b3EQ@mail.gmail.com Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Yu Zhao <yuzhao@google.com> Reported-by: Florian Lehner <dev@der-flo.net> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Florian Lehner <dev@der-flo.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm/page_isolation: fix isolate_single_pageblock() isolation behaviorZi Yan
set_migratetype_isolate() does not allow isolating MIGRATE_CMA pageblocks unless it is used for CMA allocation. isolate_single_pageblock() did not have the same behavior when it is used together with set_migratetype_isolate() in start_isolate_page_range(). This allows alloc_contig_range() with migratetype other than MIGRATE_CMA, like MIGRATE_MOVABLE (used by alloc_contig_pages()), to isolate first and last pageblock but fail the rest. The failure leads to changing migratetype of the first and last pageblock to MIGRATE_MOVABLE from MIGRATE_CMA, corrupting the CMA region. This can happen during gigantic page allocations. Like Doug said here: https://lore.kernel.org/linux-mm/a3363a52-883b-dcd1-b77f-f2bb378d6f2d@gmail.com/T/#u, for gigantic page allocations, the user would notice no difference, since the allocation on CMA region will fail as well as it did before. But it might hurt the performance of device drivers that use CMA, since CMA region size decreases. Fix it by passing migratetype into isolate_single_pageblock(), so that set_migratetype_isolate() used by isolate_single_pageblock() will prevent the isolation happening. Link: https://lkml.kernel.org/r/20220914023913.1855924-1-zi.yan@sent.com Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Doug Berger <opendmb@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Doug Berger <opendmb@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm,hwpoison: check mm when killing accessing processShuai Xue
The GHES code calls memory_failure_queue() from IRQ context to queue work into workqueue and schedule it on the current CPU. Then the work is processed in memory_failure_work_func() by kworker and calls memory_failure(). When a page is already poisoned, commit a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") make memory_failure() call kill_accessing_process() that: - holds mmap locking of current->mm - does pagetable walk to find the error virtual address - and sends SIGBUS to the current process with error info. However, the mm of kworker is not valid, resulting in a null-pointer dereference. So check mm when killing the accessing process. [akpm@linux-foundation.org: remove unrelated whitespace alteration] Link: https://lkml.kernel.org/r/20220914064935.7851-1-xueshuai@linux.alibaba.com Fixes: a3f5d80ea401 ("mm,hwpoison: send SIGBUS with error virutal address") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Bixuan Cui <cuibixuan@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm/hugetlb: correct demote page offset logicDoug Berger
With gigantic pages it may not be true that struct page structures are contiguous across the entire gigantic page. The nth_page macro is used here in place of direct pointer arithmetic to correct for this. Mike said: : This error could cause addressing exceptions. However, this is only : possible in configurations where CONFIG_SPARSEMEM && : !CONFIG_SPARSEMEM_VMEMMAP. Such a configuration option is rare and : unknown to be the default anywhere. Link: https://lkml.kernel.org/r/20220914190917.3517663-1-opendmb@gmail.com Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support") Signed-off-by: Doug Berger <opendmb@gmail.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm: prevent page_frag_alloc() from corrupting the memoryMaurizio Lombardi
A number of drivers call page_frag_alloc() with a fragment's size > PAGE_SIZE. In low memory conditions, __page_frag_cache_refill() may fail the order 3 cache allocation and fall back to order 0; In this case, the cache will be smaller than the fragment, causing memory corruptions. Prevent this from happening by checking if the newly allocated cache is large enough for the fragment; if not, the allocation will fail and page_frag_alloc() will return NULL. Link: https://lkml.kernel.org/r/20220715125013.247085-1-mlombard@redhat.com Fixes: b63ae8ca096d ("mm/net: Rename and move page fragment handling from net/ to mm/") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Cc: Chen Lin <chen45464546@163.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm: bring back update_mmu_cache() to finish_fault()Sergei Antonov
Running this test program on ARMv4 a few times (sometimes just once) reproduces the bug. int main() { unsigned i; char paragon[SIZE]; void* ptr; memset(paragon, 0xAA, SIZE); ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0); if (ptr == MAP_FAILED) return 1; printf("ptr = %p\n", ptr); for (i=0;i<10000;i++){ memset(ptr, 0xAA, SIZE); if (memcmp(ptr, paragon, SIZE)) { printf("Unexpected bytes on iteration %u!!!\n", i); break; } } munmap(ptr, SIZE); } In the "ptr" buffer there appear runs of zero bytes which are aligned by 16 and their lengths are multiple of 16. Linux v5.11 does not have the bug, "git bisect" finds the first bad commit: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths") Before the commit update_mmu_cache() was called during a call to filemap_map_pages() as well as finish_fault(). After the commit finish_fault() lacks it. Bring back update_mmu_cache() to finish_fault() to fix the bug. Also call update_mmu_tlb() only when returning VM_FAULT_NOPAGE to more closely reproduce the code of alloc_set_pte() function that existed before the commit. On many platforms update_mmu_cache() is nop: x86, see arch/x86/include/asm/pgtable ARMv6+, see arch/arm/include/asm/tlbflush.h So, it seems, few users ran into this bug. Link: https://lkml.kernel.org/r/20220908204809.2012451-1-saproj@gmail.com Fixes: f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault() codepaths") Signed-off-by: Sergei Antonov <saproj@gmail.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26frontswap: don't call ->init if no ops are registeredChristoph Hellwig
If no frontswap module (i.e. zswap) was registered, frontswap_ops will be NULL. In such situation, swapon crashes with the following stack trace: Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=00000020a4fab000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 96000004 [#1] SMP Modules linked in: zram fsl_dpaa2_eth pcs_lynx phylink ahci_qoriq crct10dif_ce ghash_ce sbsa_gwdt fsl_mc_dpio nvme lm90 nvme_core at803x xhci_plat_hcd rtc_fsl_ftm_alarm xgmac_mdio ahci_platform i2c_imx ip6_tables ip_tables fuse Unloaded tainted modules: cppc_cpufreq():1 CPU: 10 PID: 761 Comm: swapon Not tainted 6.0.0-rc2-00454-g22100432cf14 #1 Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Jun 21 2022 pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : frontswap_init+0x38/0x60 lr : __do_sys_swapon+0x8a8/0x9f4 sp : ffff80000969bcf0 x29: ffff80000969bcf0 x28: ffff37bee0d8fc00 x27: ffff80000a7f5000 x26: fffffcdefb971e80 x25: ffffaba797453b90 x24: 0000000000000064 x23: ffff37c1f209d1a8 x22: ffff37bee880e000 x21: ffffaba797748560 x20: ffff37bee0d8fce4 x19: ffffaba797748488 x18: 0000000000000014 x17: 0000000030ec029a x16: ffffaba795a479b0 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000001 x11: ffff37c63c0aba18 x10: 0000000000000000 x9 : ffffaba7956b8c88 x8 : ffff80000969bcd0 x7 : 0000000000000000 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffffaba79730f000 x2 : ffff37bee0d8fc00 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: frontswap_init+0x38/0x60 __do_sys_swapon+0x8a8/0x9f4 __arm64_sys_swapon+0x28/0x3c invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0xd4/0xf4 do_el0_svc+0x38/0x4c el0_svc+0x34/0x10c el0t_64_sync_handler+0x11c/0x150 el0t_64_sync+0x190/0x194 Code: d000e283 910003fd f9006c41 f946d461 (f9400021) ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20220909130829.3262926-1-hch@lst.de Fixes: 1da0d94a3ec8 ("frontswap: remove support for multiple ops") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()Naoya Horiguchi
NULL pointer dereference is triggered when calling thp split via debugfs on the system with offlined memory blocks. With debug option enabled, the following kernel messages are printed out: page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000 flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff) raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: unmovable page page:000000007d7ab72e is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:1248! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 16 PID: 20964 Comm: bash Tainted: G I 6.0.0-rc3-foll-numa+ #41 ... RIP: 0010:split_huge_pages_write+0xcf4/0xe30 This shows that page_to_nid() in page_zone() is unexpectedly called for an offlined memmap. Use pfn_to_online_page() to get struct page in PFN walker. Link: https://lkml.kernel.org/r/20220908041150.3430269-1-naoya.horiguchi@linux.dev Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm: fix madivse_pageout mishandling on non-LRU pageMinchan Kim
MADV_PAGEOUT tries to isolate non-LRU pages and gets a warning from isolate_lru_page below. Fix it by checking PageLRU in advance. ------------[ cut here ]------------ trying to isolate tail page WARNING: CPU: 0 PID: 6175 at mm/folio-compat.c:158 isolate_lru_page+0x130/0x140 Modules linked in: CPU: 0 PID: 6175 Comm: syz-executor.0 Not tainted 5.18.12 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:isolate_lru_page+0x130/0x140 Link: https://lore.kernel.org/linux-mm/485f8c33.2471b.182d5726afb.Coremail.hantianshuo@iie.ac.cn/ Link: https://lkml.kernel.org/r/20220908151204.762596-1-minchan@kernel.org Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT") Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: 韩天ç`• <hantianshuo@iie.ac.cn> Suggested-by: Yang Shi <shy828301@gmail.com> Acked-by: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flushYang Shi
The IPI broadcast is used to serialize against fast-GUP, but fast-GUP will move to use RCU instead of disabling local interrupts in fast-GUP. Using an IPI is the old-styled way of serializing against fast-GUP although it still works as expected now. And fast-GUP now fixed the potential race with THP collapse by checking whether PMD is changed or not. So IPI broadcast in radix pmd collapse flush is not necessary anymore. But it is still needed for hash TLB. Link: https://lkml.kernel.org/r/20220907180144.555485-2-shy828301@gmail.com Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26mm: gup: fix the fast GUP race against THP collapseYang Shi
Since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer sufficient to handle concurrent GUP-fast in all cases, it only handles traditional IPI-based GUP-fast correctly. On architectures that send an IPI broadcast on TLB flush, it works as expected. But on the architectures that do not use IPI to broadcast TLB flush, it may have the below race: CPU A CPU B THP collapse fast GUP gup_pmd_range() <-- see valid pmd gup_pte_range() <-- work on pte pmdp_collapse_flush() <-- clear pmd and flush __collapse_huge_page_isolate() check page pinned <-- before GUP bump refcount pin the page check PTE <-- no change __collapse_huge_page_copy() copy data to huge page ptep_clear() install huge pmd for the huge page return the stale page discard the stale page The race can be fixed by checking whether PMD is changed or not after taking the page pin in fast GUP, just like what it does for PTE. If the PMD is changed it means there may be parallel THP collapse, so GUP should back off. Also update the stale comment about serializing against fast GUP in khugepaged. Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()") Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-09-26usbnet: Fix memory leak in usbnet_disconnect()Peilin Ye
Currently usbnet_disconnect() unanchors and frees all deferred URBs using usb_scuttle_anchored_urbs(), which does not free urb->context, causing a memory leak as reported by syzbot. Use a usb_get_from_anchor() while loop instead, similar to what we did in commit 19cfe912c37b ("Bluetooth: btusb: Fix memory leak in play_deferred"). Also free urb->sg. Reported-and-tested-by: syzbot+dcd3e13cf4472f2e0ba1@syzkaller.appspotmail.com Fixes: 69ee472f2706 ("usbnet & cdc-ether: Autosuspend for online devices") Fixes: 638c5115a794 ("USBNET: support DMA SG") Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Link: https://lore.kernel.org/r/20220923042551.2745-1-yepeilin.cs@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26io_uring: register single issuer task at creationDylan Yudaken
Instead of picking the task from the first submitter task, rather use the creator task or in the case of disabled (IORING_SETUP_R_DISABLED) the enabling task. This approach allows a lot of simplification of the logic here. This removes init logic from the submission path, which can always be a bit confusing, but also removes the need for locking to write (or read) the submitter_task. Users that want to move a ring before submitting can create the ring disabled and then enable it on the submitting task. Signed-off-by: Dylan Yudaken <dylany@fb.com> Fixes: 97bbdc06a444 ("io_uring: add IORING_SETUP_SINGLE_ISSUER") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-26ext4: fixup possible uninitialized variable access in ↵Jan Kara
ext4_mb_choose_next_group_cr1() Variable 'grp' may be left uninitialized if there's no group with suitable average fragment size (or larger). Fix the problem by initializing it earlier. Link: https://lore.kernel.org/r/20220922091542.pkhedytey7wzp5fi@quack3 Fixes: 83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree") Cc: stable@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-09-26Revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"Sasha Levin
This reverts commit fe2c9c61f668cde28dac2b188028c5299cedcc1e. On Tue, Sep 13, 2022 at 05:48:58PM +0100, Russell King (Oracle) wrote: >What happens if this is built as a module, and the module is loaded, >binds (and creates the directory), then is removed, and then re- >inserted? Nothing removes the old directory, so doesn't >debugfs_create_dir() fail, resulting in subsequent failure to add >any subsequent debugfs entries? > >I don't think this patch should be backported to stable trees until >this point is addressed. Revert until a proper fix is available as the original behavior was better. Cc: Marcin Wojtas <mw@semihalf.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: stable@kernel.org Reported-by: Russell King <linux@armlinux.org.uk> Fixes: fe2c9c61f668 ("net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()") Signed-off-by: Sasha Levin <sashal@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20220923234736.657413-1-sashal@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26drm/i915/gt: Restrict forced preemption to the active contextChris Wilson
When we submit a new pair of contexts to ELSP for execution, we start a timer by which point we expect the HW to have switched execution to the pending contexts. If the promotion to the new pair of contexts has not occurred, we declare the executing context to have hung and force the preemption to take place by resetting the engine and resubmitting the new contexts. This can lead to an unfair situation where almost all of the preemption timeout is consumed by the first context which just switches into the second context immediately prior to the timer firing and triggering the preemption reset (assuming that the timer interrupts before we process the CS events for the context switch). The second context hasn't yet had a chance to yield to the incoming ELSP (and send the ACk for the promotion) and so ends up being blamed for the reset. If we see that a context switch has occurred since setting the preemption timeout, but have not yet received the ACK for the ELSP promotion, rearm the preemption timer and check again. This is especially significant if the first context was not schedulable and so we used the shortest timer possible, greatly increasing the chance of accidentally blaming the second innocent context. Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption") Fixes: d12acee84ffb ("drm/i915/execlists: Cancel banned contexts on schedule-out") Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> Tested-by: Andrzej Hajda <andrzej.hajda@intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220921135258.1714873-1-andrzej.hajda@intel.com (cherry picked from commit 107ba1a2c705f4358f2602ec2f2fd821bb651f42) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2022-09-26perf tests powerpc: Fix branch stack sampling test to include sanity check ↵Athira Rajeev
for branch filter Commit b55878c90ab92a24 ("perf test: Add test for branch stack sampling") added test for branch stack sampling. There is a sanity check in the beginning to skip the test if the hardware doesn't support branch stack sampling. Snippet <<>> skip the test if the hardware doesn't support branch stack sampling perf record -b -o- -B true > /dev/null 2>&1 || exit 2 <<>> But the testcase also uses branch sample types: save_type, any. if any platform doesn't support the branch filters used in the test, the testcase will fail. In powerpc, currently mutliple branch filters are not supported and hence this test fails in powerpc. Fix the sanity check to look at the support for branch filters used in this test before proceeding with the test. Fixes: b55878c90ab92a24 ("perf test: Add test for branch stack sampling") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Reviewed-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20220921145255.20972-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>