summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-08-20libbpf: Fix map index used in error messageToke Høiland-Jørgensen
The error message emitted by bpf_object__init_user_btf_maps() was using the wrong section ID. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200819110534.9058-1-toke@redhat.com
2020-08-20powerpc/pseries: Do not initiate shutdown when system is running on UPSVasant Hegde
As per PAPR we have to look for both EPOW sensor value and event modifier to identify the type of event and take appropriate action. In LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes": SYSTEM_SHUTDOWN 3 The system must be shut down. An EPOW-aware OS logs the EPOW error log information, then schedules the system to be shut down to begin after an OS defined delay internal (default is 10 minutes.) Then in section 10.3.2.2.8 there is table 146 "Platform Event Log Format, Version 6, EPOW Section", which includes the "EPOW Event Modifier": For EPOW sensor value = 3 0x01 = Normal system shutdown with no additional delay 0x02 = Loss of utility power, system is running on UPS/Battery 0x03 = Loss of system critical functions, system should be shutdown 0x04 = Ambient temperature too high All other values = reserved We have a user space tool (rtas_errd) on LPAR to monitor for EPOW_SHUTDOWN_ON_UPS. Once it gets an event it initiates shutdown after predefined time. It also starts monitoring for any new EPOW events. If it receives "Power restored" event before predefined time it will cancel the shutdown. Otherwise after predefined time it will shutdown the system. Commit 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") changed our handling of the "on UPS/Battery" case, to immediately shutdown the system. This breaks existing setups that rely on the userspace tool to delay shutdown and let the system run on the UPS. Fixes: 79872e35469b ("powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [mpe: Massage change log and add PAPR references] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com
2020-08-20io_uring: kill extra iovec=NULL in import_iovec()Pavel Begunkov
If io_import_iovec() returns an error, return iovec is undefined and must not be used, so don't set it to NULL when failing. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20io_uring: comment on kfree(iovec) checksPavel Begunkov
kfree() handles NULL pointers well, but io_{read,write}() checks it because of performance reasons. Leave a comment there for those who are tempted to patch it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20io_uring: fix racy req->flags modificationPavel Begunkov
Setting and clearing REQ_F_OVERFLOW in io_uring_cancel_files() and io_cqring_overflow_flush() are racy, because they might be called asynchronously. REQ_F_OVERFLOW flag in only needed for files cancellation, so if it can be guaranteed that requests _currently_ marked inflight can't be overflown, the problem will be solved with removing the flag altogether. That's how the patch works, it removes inflight status of a request in io_cqring_fill_event() whenever it should be thrown into CQ-overflow list. That's Ok to do, because no opcode specific handling can be done after io_cqring_fill_event(), the same assumption as with "struct io_completion" patches. And it already have a good place for such cleanups, which is io_clean_op(). A nice side effect of this is removing this inflight check from the hot path. note on synchronisation: now __io_cqring_fill_event() may be taking two spinlocks simultaneously, completion_lock and inflight_lock. It's fine, because we never do that in reverse order, and CQ-overflow of inflight requests shouldn't happen often. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-20Revert "RDMA/hns: Reserve one sge in order to avoid local length error"Weihang Li
This patch caused some issues on SEND operation, and it should be reverted to make the drivers work correctly. There will be a better solution that has been tested carefully to solve the original problem. This reverts commit 711195e57d341e58133d92cf8aaab1db24e4768d. Fixes: 711195e57d34 ("RDMA/hns: Reserve one sge in order to avoid local length error") Link: https://lore.kernel.org/r/1597829984-20223-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20RDMA/hfi1: Correct an interlock issue for TID RDMA WRITE requestKaike Wan
The following message occurs when running an AI application with TID RDMA enabled: hfi1 0000:7f:00.0: hfi1_0: [QP74] hfi1_tid_timeout 4084 hfi1 0000:7f:00.0: hfi1_0: [QP70] hfi1_tid_timeout 4084 The issue happens when TID RDMA WRITE request is followed by an IB_WR_RDMA_WRITE_WITH_IMM request, the latter could be completed first on the responder side. As a result, no ACK packet for the latter could be sent because the TID RDMA WRITE request is still being processed on the responder side. When the TID RDMA WRITE request is eventually completed, the requester will wait for the IB_WR_RDMA_WRITE_WITH_IMM request to be acknowledged. If the next request is another TID RDMA WRITE request, no TID RDMA WRITE DATA packet could be sent because the preceding IB_WR_RDMA_WRITE_WITH_IMM request is not completed yet. Consequently the IB_WR_RDMA_WRITE_WITH_IMM will be retried but it will be ignored on the responder side because the responder thinks it has already been completed. Eventually the retry will be exhausted and the qp will be put into error state on the requester side. On the responder side, the TID resource timer will eventually expire because no TID RDMA WRITE DATA packets will be received for the second TID RDMA WRITE request. There is also risk of a write-after-write memory corruption due to the issue. Fix by adding a requester side interlock to prevent any potential data corruption and TID RDMA protocol error. Fixes: a0b34f75ec20 ("IB/hfi1: Add interlock between a TID RDMA request and other requests") Link: https://lore.kernel.org/r/20200811174931.191210.84093.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> # 5.4.x+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20RDMA/bnxt_re: Do not add user qps to flushlistSelvin Xavier
Driver shall add only the kernel qps to the flush list for clean up. During async error events from the HW, driver is adding qps to this list without checking if the qp is kernel qp or not. Add a check to avoid user qp addition to the flush list. Fixes: 942c9b6ca8de ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing") Fixes: c50866e2853a ("bnxt_re: fix the regression due to changes in alloc_pbl") Link: https://lore.kernel.org/r/1596689148-4023-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20RDMA/core: Fix spelling mistake "Could't" -> "Couldn't"Colin Ian King
There is a spelling mistake in a pr_warn message. Fix it. Link: https://lore.kernel.org/r/20200810075824.46770-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-20powerpc/perf: Fix soft lockups due to missed interrupt accountingAthira Rajeev
Performance monitor interrupt handler checks if any counter has overflown and calls record_and_restart() in core-book3s which invokes perf_event_overflow() to record the sample information. Apart from creating sample, perf_event_overflow() also does the interrupt and period checks via perf_event_account_interrupt(). Currently we record information only if the SIAR (Sampled Instruction Address Register) valid bit is set (using siar_valid() check) and hence the interrupt check. But it is possible that we do sampling for some events that are not generating valid SIAR, and hence there is no chance to disable the event if interrupts are more than max_samples_per_tick. This leads to soft lockup. Fix this by adding perf_event_account_interrupt() in the invalid SIAR code path for a sampling event. ie if SIAR is invalid, just do interrupt check and don't record the sample information. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com
2020-08-20Documentation: efi: remove description of efi=old_mapArd Biesheuvel
The old EFI runtime region mapping logic that was kept around for some time has finally been removed entirely, along with the SGI UV1 support code that was its last remaining user. So remove any mention of the efi=old_map command line parameter from the docs. Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20efi/x86: Move 32-bit code into efi_32.cArd Biesheuvel
Now that the old memmap code has been removed, some code that was left behind in arch/x86/platform/efi/efi.c is only used for 32-bit builds, which means it can live in efi_32.c as well. So move it over. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20efi/libstub: Handle unterminated cmdlineArvind Sankar
Make the command line parsing more robust, by handling the case it is not NUL-terminated. Use strnlen instead of strlen, and make sure that the temporary copy is NUL-terminated before parsing. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200813185811.554051-4-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20efi/libstub: Handle NULL cmdlineArvind Sankar
Treat a NULL cmdline the same as empty. Although this is unlikely to happen in practice, the x86 kernel entry does check for NULL cmdline and handles it, so do it here as well. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200729193300.598448-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20efi/libstub: Stop parsing arguments at "--"Arvind Sankar
Arguments after "--" are arguments for init, not for the kernel. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200725155916.1376773-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20efi: add missed destroy_workqueue when efisubsys_init failsLi Heng
destroy_workqueue() should be called to destroy efi_rts_wq when efisubsys_init() init resources fails. Cc: <stable@vger.kernel.org> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Heng <liheng40@huawei.com> Link: https://lore.kernel.org/r/1595229738-10087-1-git-send-email-liheng40@huawei.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20efi/x86: Mark kernel rodata non-executable for mixed modeArvind Sankar
When remapping the kernel rodata section RO in the EFI pagetables, the protection flags that were used for the text section are being reused, but the rodata section should not be marked executable. Cc: <stable@vger.kernel.org> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://lore.kernel.org/r/20200717194526.3452089-1-nivedita@alum.mit.edu Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-08-20opp: Enable resources again if they were disabled earlierRajendra Nayak
dev_pm_opp_set_rate() can now be called with freq = 0 in order to either drop performance or bandwidth votes or to disable regulators on platforms which support them. In such cases, a subsequent call to dev_pm_opp_set_rate() with the same frequency ends up returning early because 'old_freq == freq' Instead make it fall through and put back the dropped performance and bandwidth votes and/or enable back the regulators. Cc: v5.3+ <stable@vger.kernel.org> # v5.3+ Fixes: cd7ea582866f ("opp: Make dev_pm_opp_set_rate() handle freq = 0 to drop performance votes") Reported-by: Sajida Bhanu <sbhanu@codeaurora.org> Reviewed-by: Sibi Sankar <sibis@codeaurora.org> Reported-by: Matthias Kaehlcke <mka@chromium.org> Tested-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org> [ Viresh: Don't skip clk_set_rate() and massaged changelog ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-08-20Fix build error when CONFIG_ACPI is not set/enabled:Randy Dunlap
../arch/x86/pci/xen.c: In function ‘pci_xen_init’: ../arch/x86/pci/xen.c:410:2: error: implicit declaration of function ‘acpi_noirq_set’; did you mean ‘acpi_irq_get’? [-Werror=implicit-function-declaration] acpi_noirq_set(); Fixes: 88e9ca161c13 ("xen/pci: Use acpi_noirq_set() helper to avoid #ifdef") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xenproject.org Cc: linux-pci@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-20efi: avoid error message when booting under XenJuergen Gross
efifb_probe() will issue an error message in case the kernel is booted as Xen dom0 from UEFI as EFI_MEMMAP won't be set in this case. Avoid that message by calling efi_mem_desc_lookup() only if EFI_MEMMAP is set. Fixes: 38ac0287b7f4 ("fbdev/efifb: Honour UEFI memory map attributes when mapping the FB") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-19Merge tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fixes from Alex Williamson: - Fix lockdep issue reported for recursive read-lock (Alex Williamson) - Fix missing unwind in type1 replay function (Alex Williamson) * tag 'vfio-v5.9-rc2' of git://github.com/awilliam/linux-vfio: vfio/type1: Add proper error unwind for vfio_iommu_replay() vfio-pci: Avoid recursive read-lock usage
2020-08-19Revert "drm/amdgpu: disable gfxoff for navy_flounder"Jiansong Chen
This reverts commit 9c9b17a7d19a8e21db2e378784fff1128b46c9d3. Newly released sdma fw (51.52) provides a fix for the issue. Signed-off-by: Jiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-20powerpc/powernv/pci: Fix possible crash when releasing DMA resourcesFrederic Barrat
Fix a typo introduced during recent code cleanup, which could lead to silently not freeing resources or an oops message (on PCI hotplug or CAPI reset). Only impacts ioda2, the code path for ioda1 is correct. Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200819130741.16769-1-fbarrat@linux.ibm.com
2020-08-19net: gemini: Fix missing free_netdev() in error path of ↵Wang Hai
gemini_ethernet_port_probe() Replace alloc_etherdev_mq with devm_alloc_etherdev_mqs. In this way, when probe fails, netdev can be freed automatically. Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19net: atlantic: Use readx_poll_timeout() for large timeoutSebastian Andrzej Siewior
Commit 8dcf2ad39fdb2 ("net: atlantic: add hwmon getter for MAC temperature") implemented a read callback with an udelay(10000U). This fails to compile on ARM because the delay is >1ms. I doubt that it is needed to spin for 10ms even if possible on x86. >From looking at the code, the context appears to be preemptible so using usleep() should work and avoid busy spinning. Use readx_poll_timeout() in the poll loop. Fixes: 8dcf2ad39fdb2 ("net: atlantic: add hwmon getter for MAC temperature") Cc: Mark Starovoytov <mstarovoitov@marvell.com> Cc: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19ptp: ptp_clockmatrix: use i2c_master_send for i2c writeMin Li
The old code for i2c write would break on some controllers, which fails at handling Repeated Start Condition. So we will just use i2c_master_send to handle write in one transanction. Changes since v1: - Remove indentation change Signed-off-by: Min Li <min.li.xe@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19netlink: fix state reallocation in policy exportJohannes Berg
Evidently, when I did this previously, we didn't have more than 10 policies and didn't run into the reallocation path, because it's missing a memset() for the unused policies. Fix that. Fixes: d07dcf9aadd6 ("netlink: add infrastructure to expose policies to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19Merge branch 'Bug-fixes-for-ENA-ethernet-driver'David S. Miller
Shay Agroskin says: ==================== Bug fixes for ENA ethernet driver This series adds the following: - Fix undesired call to ena_restore after returning from suspend - Fix condition inside a WARN_ON - Fix overriding previous value when updating missed_tx statistic v1->v2: - fix bug when calling reset routine after device resources are freed (Jakub) v2->v3: - fix wrong hash in 'Fixes' tag ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19net: ena: Make missed_tx stat incrementalShay Agroskin
Most statistics in ena driver are incremented, meaning that a stat's value is a sum of all increases done to it since driver/queue initialization. This patch makes all statistics this way, effectively making missed_tx statistic incremental. Also added a comment regarding rx_drops and tx_drops to make it clearer how these counters are calculated. Fixes: 11095fdb712b ("net: ena: add statistics for missed tx packets") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19net: ena: Change WARN_ON expression in ena_del_napi_in_range()Shay Agroskin
The ena_del_napi_in_range() function unregisters the napi handler for rings in a given range. This function had the following WARN_ON macro: WARN_ON(ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring); This macro prints the call stack if the expression inside of it is true [1], but the expression inside of it is the wanted situation. The expression checks whether the ring has an XDP queue and its index corresponds to a XDP one. This patch changes the expression to !ENA_IS_XDP_INDEX(adapter, i) && adapter->ena_napi[i].xdp_ring which indicates an unwanted situation. Also, change the structure of the function. The napi handler is unregistered for all rings, and so there's no need to check whether the index is an XDP index or not. By removing this check the code becomes much more readable. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19net: ena: Prevent reset after device destructionShay Agroskin
The reset work is scheduled by the timer routine whenever it detects that a device reset is required (e.g. when a keep_alive signal is missing). When releasing device resources in ena_destroy_device() the driver cancels the scheduling of the timer routine without destroying the reset work explicitly. This creates the following bug: The driver is suspended and the ena_suspend() function is called -> This function calls ena_destroy_device() to free the net device resources -> The driver waits for the timer routine to finish its execution and then cancels it, thus preventing from it to be called again. If, in its final execution, the timer routine schedules a reset, the reset routine might be called afterwards,and a redundant call to ena_restore_device() would be made. By changing the reset routine we allow it to read the device's state accurately. This is achieved by checking whether ENA_FLAG_TRIGGER_RESET flag is set before resetting the device and making both the destruction function and the flag check are under rtnl lock. The ENA_FLAG_TRIGGER_RESET is cleared at the end of the destruction routine. Also surround the flag check with 'likely' because we expect that the reset routine would be called only when ENA_FLAG_TRIGGER_RESET flag is set. The destruction of the timer and reset services in __ena_shutoff() have to stay, even though the timer routine is destroyed in ena_destroy_device(). This is to avoid a case in which the reset routine is scheduled after free_netdev() in __ena_shutoff(), which would create an access to freed memory in adapter->flags. Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-19of: address: Work around missing device_type property in pcie nodesMarc Zyngier
Recent changes to the DT PCI bus parsing made it mandatory for device tree nodes describing a PCI controller to have the 'device_type = "pci"' property for the node to be matched. Although this follows the letter of the specification, it breaks existing device-trees that have been working fine for years. Rockchip rk3399-based systems are a prime example of such collateral damage, and have stopped discovering their PCI bus. In order to paper over it, let's add a workaround to the code matching the device type, and accept as PCI any node that is named "pcie", A warning will hopefully nudge the user into updating their DT to a fixed version if they can, but the incentive is obviously pretty small. Fixes: 2f96593ecc37 ("of_address: Add bus type match for pci ranges parser") Suggested-by: Rob Herring <robh+dt@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200819094255.474565-1-maz@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2020-08-19dt: writing-schema: Miscellaneous grammar fixesGeert Uytterhoeven
- Add missing verb, - Fix accidental plural. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20200819124850.20543-1-geert+renesas@glider.be
2020-08-19lib/string.c: Use freestanding environmentArvind Sankar
gcc can transform the loop in a naive implementation of memset/memcpy etc into a call to the function itself. This optimization is enabled by -ftree-loop-distribute-patterns. This has been the case for a while, but gcc-10.x enables this option at -O2 rather than -O3 as in previous versions. Add -ffreestanding, which implicitly disables this optimization with gcc. It is unclear whether clang performs such optimizations, but hopefully it will also not do so in a freestanding environment. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-19x86/boot/compressed: Use builtin mem functions for decompressorArvind Sankar
Since commits c041b5ad8640 ("x86, boot: Create a separate string.h file to provide standard string functions") fb4cac573ef6 ("x86, boot: Move memcmp() into string.h and string.c") the decompressor stub has been using the compiler's builtin memcpy, memset and memcmp functions, _except_ where it would likely have the largest impact, in the decompression code itself. Remove the #undef's of memcpy and memset in misc.c so that the decompressor code also uses the compiler builtins. The rationale given in the comment doesn't really apply: just because some functions use the out-of-line version is no reason to not use the builtin version in the rest. Replace the comment with an explanation of why memzero and memmove are being #define'd. Drop the suggestion to #undef in boot/string.h as well: the out-of-line versions are not really optimized versions, they're generic code that's good enough for the preboot environment. The compiler will likely generate better code for constant-size memcpy/memset/memcmp if it is allowed to. Most decompressors' performance is unchanged, with the exception of LZ4 and 64-bit ZSTD. Before After ARCH LZ4 73ms 10ms 32 LZ4 120ms 10ms 64 ZSTD 90ms 74ms 64 Measurements on QEMU on 2.2GHz Broadwell Xeon, using defconfig kernels. Decompressor code size has small differences, with the largest being that 64-bit ZSTD decreases just over 2k. The largest code size increase was on 64-bit XZ, of about 400 bytes. Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu> Suggested-by: Nick Terrell <nickrterrell@gmail.com> Tested-by: Nick Terrell <nickrterrell@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-19io_uring: use system_unbound_wq for ring exit workJens Axboe
We currently use system_wq, which is unbounded in terms of number of workers. This means that if we're exiting tons of rings at the same time, then we'll briefly spawn tons of event kworkers just for a very short blocking time as the rings exit. Use system_unbound_wq instead, which has a sane cap on the concurrency level. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-19ext4: limit the length of per-inode prealloc listbrookxu
In the scenario of writing sparse files, the per-inode prealloc list may be very long, resulting in high overhead for ext4_mb_use_preallocated(). To circumvent this problem, we limit the maximum length of per-inode prealloc list to 512 and allow users to modify it. After patching, we observed that the sys ratio of cpu has dropped, and the system throughput has increased significantly. We created a process to write the sparse file, and the running time of the process on the fixed kernel was significantly reduced, as follows: Running time on unfixed kernel: [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat real 0m2.051s user 0m0.008s sys 0m2.026s Running time on fixed kernel: [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat real 0m0.471s user 0m0.004s sys 0m0.395s Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: reorganize if statement of ext4_mb_release_context()brookxu
Reorganize the if statement of ext4_mb_release_context(), make it easier to read. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/5439ac6f-db79-ad68-76c1-a4dda9aa0cc3@gmail.com Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: add mb_debug logging when there are lost chunksbrookxu
Lost chunks are when some other process raced with the current thread to grab a particular block allocation. Add mb_debug log for developers who wants to see how often this is happening for a particular workload. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/0a165ac0-1912-aebd-8a0d-b42e7cd1aea1@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ext4: Fix comment typo "the the".kyoungho koo
I have found double typed comments "the the". So i modified it to one "the" Signed-off-by: kyoungho koo <rnrudgh@gmail.com> Link: https://lore.kernel.org/r/20200424171620.GA11943@koo-Z370-HD3 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19jbd2: clean up checksum verification in do_one_pass()Shijie Luo
Remove the unnecessary chksum_err and checksum_seen variables as well as some redundant code to make the function easier to understand. [ With changes suggested by jack@ and tytso@ ] Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200819122955.33526-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-08-19ALSA: hda: avoid reset of sdo_limitSameer Pujar
By default 'sdo_limit' is initialized with a default value of '8' as per spec. This is overridden in cases where a different value is required. However this is getting reset when snd_hdac_bus_init_chip() is called again, which happens during runtime PM cycle. Avoid this reset by moving 'sdo_limit' setup to 'snd_hdac_bus_init()' function which would be called only once. Fixes: 67ae482a59e9 ("ALSA: hda: add member to store ratio for stripe control") Cc: <stable@vger.kernel.org> Signed-off-by: Sameer Pujar <spujar@nvidia.com> Link: https://lore.kernel.org/r/1597851130-6765-1-git-send-email-spujar@nvidia.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-08-19drm/i915/tgl: Make sure TC-cold is blocked before enabling TC AUX power wellsImre Deak
The dependency between power wells is determined by the ordering of the power well list: when enabling the power wells for a domain, this happens walking the power well list forward, while disabling them happens in the reverse direction. Accordingly a power well on the list must follow any other power well it depends on. Since the TC AUX power wells depend on TC-cold being blocked, move the TC-cold off power well before all AUX power wells. Fixes: 3c02934b24e3 ("drm/i915/tc/tgl: Implement TC cold sequences") Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200720232952.16228-1-imre.deak@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit b302a2e68807604af2a5015816c1d117747989b6) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19drm/i915/selftests: Avoid passing a random 0 into ilog2George Spelvin
igt_mm_config() calls ilog2() on the (pseudo)random 21-bit number s>>12. Once in 2 million seeds, this is zero and ilog2 summons the nasal demons. There was an attempt to handle this case with a max(), but that's too late; ms could already be something bizarre. Given that the low 12 bits of s and ms are always zero, it's a lot simpler just to divide them by 4096, then everything fits into 32 bits, and we can easily generate a random number 1 <= s <= 0x1fffff. Fixes: 14d1b9a6247c ("drm/i915: buddy allocator") Signed-off-by: George Spelvin <lkml@sdf.org> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200325192429.GA8865@SDF.ORG Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 21118e8e56479ef33460fbd63a5ad0535843b666) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19drm/i915: Fix wrong return value in intel_atomic_check()Tianjia Zhang
In the case of calling check_digital_port_conflicts() failed, a negative error code -EINVAL should be returned. Fixes: bf5da83e4bd80 ("drm/i915: Move check_digital_port_conflicts() earier") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200802111535.5200-1-tianjia.zhang@linux.alibaba.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 66b51b801d05ee54a0f23628cb8220189adb715e) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19drm/i915: Update bw_buddy pagemask tableMatt Roper
A recent bspec update removed the LPDDR4 single channel entry from the buddy register table, but added a new four-channel entry. Workaround 1409767108 hasn't been updated with any guidance for four channel configurations, so we leave that alternate table unchanged for now. Bspec 49218 Fixes: 3fa01d642fa7 ("drm/i915/tgl: Program BW_BUDDY registers during display init") Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200612204734.3674650-1-matthew.d.roper@intel.com Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit ecb40d0826fda213ebb58d49e7d5b4752480e130) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19drm/i915/display: Check for an LPSP encoder before dereferencingChris Wilson
Avoid a GPF at <1>[ 20.177320] BUG: kernel NULL pointer dereference, address: 000000000000007c <1>[ 20.177322] #PF: supervisor read access in kernel mode <1>[ 20.177323] #PF: error_code(0x0000) - not-present page <6>[ 20.177324] PGD 0 P4D 0 <4>[ 20.177327] Oops: 0000 [#1] PREEMPT SMP PTI <4>[ 20.177328] CPU: 1 PID: 944 Comm: debugfs_test Not tainted 5.8.0-rc7-CI-CI_DRM_8814+ #1 <4>[ 20.177330] Hardware name: Dell Inc. XPS 13 9360/0823VW, BIOS 2.9.0 07/09/2018 <4>[ 20.177372] RIP: 0010:i915_lpsp_capability_show+0x44/0xc0 [i915] <4>[ 20.177374] Code: 0f b6 81 ca 0d 00 00 3c 0b 74 77 76 19 3c 0c 75 44 83 7e 7c 01 7e 2f 48 c7 c6 d7 b9 47 a0 e8 43 df 06 e1 31 c0 c3 3c 09 72 2b <8b> 46 7c 85 c0 75 e6 8b 82 e4 00 00 00 89 c2 83 e2 fb 83 fa 0a 74 <4>[ 20.177376] RSP: 0018:ffffc90000cebe38 EFLAGS: 00010246 <4>[ 20.177377] RAX: 0000000000000009 RBX: ffff888267fe6a58 RCX: ffff888252d10000 <4>[ 20.177378] RDX: ffff88824a9a4000 RSI: 0000000000000000 RDI: ffff888267fe6a30 <4>[ 20.177379] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 <4>[ 20.177380] R10: 0000000000000001 R11: 0000000000000000 R12: ffffc90000cebf08 <4>[ 20.177381] R13: 00000000ffffffff R14: 0000000000000001 R15: ffff888267fe6a30 <4>[ 20.177383] FS: 00007f6f9c6b5e40(0000) GS:ffff888276480000(0000) knlGS:0000000000000000 <4>[ 20.177384] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 20.177385] CR2: 000000000000007c CR3: 0000000255f04006 CR4: 00000000003606e0 <4>[ 20.177386] Call Trace: <4>[ 20.177390] seq_read+0xcb/0x420 which is presumably from having no encoder attached at that time. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2175 Fixes: 8806211fe7b3 ("drm/i915: Add i915_lpsp_capability debugfs") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Animesh Manna <animesh.manna@intel.com> Cc: Anshuman Gupta <anshuman.gupta@intel.com> Cc: Uma Shankar <uma.shankar@intel.com> Cc: "Ville Syrjälä" <ville.syrjala@linux.intel.com> Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200729130912.30093-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit a22b1a9bb0d72a58d5b836653f28d97ee8fea1c4) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19drm/i915: Copy default modparams to mock i915_deviceChris Wilson
Since we use the module parameters stored inside the drm_i915_device itself, we need to ensure the mock i915_device also sets up the right defaults. Fixes: 8a25c4be583d ("drm/i915/params: switch to device specific parameters") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200728150600.4509-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 98ef067453709444a264939940f7b3a5dfdfa09e) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19drm/i915: Provide the perf pmu.moduleChris Wilson
Rather than manually implement our own module reference counting for perf pmu events, finally realise that there is a module parameter to struct pmu for this very purpose. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200716094643.31410-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit 27e897beec1c59861f15d4d3562c39ad1143620f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-08-19Merge tag 'gvt-next-fixes-2020-08-05' of https://github.com/intel/gvt-linux ↵Jani Nikula
into drm-intel-fixes gvt-next-fixes-2020-08-05 - Fix guest suspend/resume low performance handling of shadow ppgtt (Colin) - Fix PV notifier handling for guest suspend/resume (Colin) Signed-off-by: Jani Nikula <jani.nikula@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200805080207.GY27035@zhen-hp.sh.intel.com