summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-07-09Merge tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Multiple patches to add support for fcntl() leases over NFSv4. - A sysfs interface to display more information about the various transport connections used by the RPC client - A sysfs interface to allow a suitably privileged user to offline a transport that may no longer point to a valid server - A sysfs interface to allow a suitably privileged user to change the server IP address used by the RPC client Stable fixes: - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues Bugfixes: - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base() - SUNRPC: prevent port reuse on transports which don't request it. - NFSv3: Fix memory leak in posix_acl_create() - NFS: Various fixes to attribute revalidation timeouts - NFSv4: Fix handling of non-atomic change attribute updates - NFSv4: If a server is down, don't cause mounts to other servers to hang as well - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT - NFS: Fix mount failures due to incorrect setting of the has_sec_mnt_opts filesystem flag - NFS: Ensure nfs_readpage returns promptly when an internal error occurs - NFS: Fix fscache read from NFS after cache error - pNFS: Various bugfixes around the LAYOUTGET operation" * tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits) NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3 NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times NFSv4/pnfs: Clean up layout get on open NFSv4/pnfs: Fix layoutget behaviour after invalidation NFSv4/pnfs: Fix the layout barrier update NFS: Fix fscache read from NFS after cache error NFS: Ensure nfs_readpage returns promptly when internal error occurs sunrpc: remove an offlined xprt using sysfs sunrpc: provide showing transport's state info in the sysfs directory sunrpc: display xprt's queuelen of assigned tasks via sysfs sunrpc: provide multipath info in the sysfs directory NFSv4.1 identify and mark RPC tasks that can move between transports sunrpc: provide transport info in the sysfs directory SUNRPC: take a xprt offline using sysfs sunrpc: add dst_attr attributes to the sysfs xprt directory SUNRPC for TCP display xprt's source port in sysfs xprt_info SUNRPC query transport's source port SUNRPC display xprt's main value in sysfs's xprt_info SUNRPC mark the first transport sunrpc: add add sysfs directory per xprt under each xprt_switch ...
2021-07-09Merge tag 'f2fs-for-5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've improved the compression support especially for Android such as allowing compression for mmap files, replacing the immutable bit with internal bit to prohibits data writes explicitly, and adding a mount option, "compress_cache", to improve random reads. And, we added "readonly" feature to compact the partition w/ compression enabled, which will be useful for Android RO partitions. Enhancements: - support compression for mmap file - use an f2fs flag instead of IMMUTABLE bit for compression - support RO feature w/ extent_cache - fully support swapfile with file pinning - improve atgc tunability - add nocompress extensions to unselect files for compression Bug fixes: - fix false alaram on iget failure during GC - fix race condition on global pointers when there are multiple f2fs instances - add MODULE_SOFTDEP for initramfs As usual, we've also cleaned up some places for better code readability (e.g., sysfs/feature, debugging messages, slab cache name, and docs)" * tag 'f2fs-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits) f2fs: drop dirty node pages when cp is in error status f2fs: initialize page->private when using for our internal use f2fs: compress: add nocompress extensions support MAINTAINERS: f2fs: update my email address f2fs: remove false alarm on iget failure during GC f2fs: enable extent cache for compression files in read-only f2fs: fix to avoid adding tab before doc section f2fs: introduce f2fs_casefolded_name slab cache f2fs: swap: support migrating swapfile in aligned write mode f2fs: swap: remove dead codes f2fs: compress: add compress_inode to cache compressed blocks f2fs: clean up /sys/fs/f2fs/<disk>/features f2fs: add pin_file in feature list f2fs: Advertise encrypted casefolding in sysfs f2fs: Show casefolding support only when supported f2fs: support RO feature f2fs: logging neatening f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit f2fs: compress: remove unneeded preallocation f2fs: atgc: export entries for better tunability via sysfs ...
2021-07-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Pull yet more updates from Andrew Morton: "54 patches. Subsystems affected by this patch series: lib, mm (slub, secretmem, cleanups, init, pagemap, and mremap), and debug" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (54 commits) powerpc/mm: enable HAVE_MOVE_PMD support powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache mm/mremap: allow arch runtime override mm/mremap: hold the rmap lock in write mode when moving page table entries. mm/mremap: use pmd/pud_poplulate to update page table entries mm/mremap: don't enable optimized PUD move if page table levels is 2 mm/mremap: convert huge PUD move to separate helper selftest/mremap_test: avoid crash with static build selftest/mremap_test: update the test to handle pagesize other than 4K mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t * mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t * kdump: use vmlinux_build_id to simplify buildid: fix kernel-doc notation buildid: mark some arguments const scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm scripts/decode_stacktrace.sh: support debuginfod x86/dumpstack: use %pSb/%pBb for backtrace printing arm64: stacktrace: use %pSb for backtrace printing module: add printk formats to add module build ID to stacktraces ...
2021-07-09io_uring: remove dead non-zero 'poll' checkJens Axboe
Colin reports that Coverity complains about checking for poll being non-zero after having dereferenced it multiple times. This is a valid complaint, and actually a leftover from back when this code was based on the aio poll code. Kill the redundant check. Link: https://lore.kernel.org/io-uring/fe70c532-e2a7-3722-58a1-0fa4e5c5ff2c@canonical.com/ Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-09Merge tag 'irqchip-fixes-5.14-1' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - Fix a MIPS bug where irqdomain loopkups could occur in a context where RCU is not allowed - Fix a documentation bug for handle_domain_irq
2021-07-09MIPS: vdso: Invalid GIC access through VDSOMartin Fäcknitz
Accessing raw timers (currently only CLOCK_MONOTONIC_RAW) through VDSO doesn't return the correct time when using the GIC as clock source. The address of the GIC mapped page is in this case not calculated correctly. The GIC mapped page is calculated from the VDSO data by subtracting PAGE_SIZE: void *get_gic(const struct vdso_data *data) { return (void __iomem *)data - PAGE_SIZE; } However, the data pointer is not page aligned for raw clock sources. This is because the VDSO data for raw clock sources (CS_RAW = 1) is stored after the VDSO data for coarse clock sources (CS_HRES_COARSE = 0). Therefore, only the VDSO data for CS_HRES_COARSE is page aligned: +--------------------+ | | | vd[CS_RAW] | ---+ | vd[CS_HRES_COARSE] | | +--------------------+ | -PAGE_SIZE | | | | GIC mapped page | <--+ | | +--------------------+ When __arch_get_hw_counter() is called with &vd[CS_RAW], get_gic returns the wrong address (somewhere inside the GIC mapped page). The GIC counter values are not returned which results in an invalid time. Fixes: a7f4df4e21dd ("MIPS: VDSO: Add implementations of gettimeofday() and clock_gettime()") Signed-off-by: Martin Fäcknitz <faecknitz@hotsplots.de> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-07-09bpf: Selftest to verify mixing bpf2bpf calls and tailcalls with insn patchJohn Fastabend
This adds some extra noise to the tailcall_bpf2bpf4 tests that will cause verify to patch insns. This then moves around subprog start/end insn index and poke descriptor insn index to ensure that verify and JIT will continue to track these correctly. If done correctly verifier should pass this program same as before and JIT should emit tail call logic. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210707223848.14580-3-john.fastabend@gmail.com
2021-07-09bpf: Track subprog poke descriptors correctly and fix use-after-freeJohn Fastabend
Subprograms are calling map_poke_track(), but on program release there is no hook to call map_poke_untrack(). However, on program release, the aux memory (and poke descriptor table) is freed even though we still have a reference to it in the element list of the map aux data. When we run map_poke_run(), we then end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run(): [...] [ 402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e [ 402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337 [ 402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G I 5.12.0+ #399 [ 402.824715] Call Trace: [ 402.824719] dump_stack+0x93/0xc2 [ 402.824727] print_address_description.constprop.0+0x1a/0x140 [ 402.824736] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824740] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824744] kasan_report.cold+0x7c/0xd8 [ 402.824752] ? prog_array_map_poke_run+0xc2/0x34e [ 402.824757] prog_array_map_poke_run+0xc2/0x34e [ 402.824765] bpf_fd_array_map_update_elem+0x124/0x1a0 [...] The elements concerned are walked as follows: for (i = 0; i < elem->aux->size_poke_tab; i++) { poke = &elem->aux->poke_tab[i]; [...] The access to size_poke_tab is a 4 byte read, verified by checking offsets in the KASAN dump: [ 402.825004] The buggy address belongs to the object at ffff8881905a7800 which belongs to the cache kmalloc-1k of size 1024 [ 402.825008] The buggy address is located 320 bytes inside of 1024-byte region [ffff8881905a7800, ffff8881905a7c00) The pahole output of bpf_prog_aux: struct bpf_prog_aux { [...] /* --- cacheline 5 boundary (320 bytes) --- */ u32 size_poke_tab; /* 320 4 */ [...] In general, subprograms do not necessarily manage their own data structures. For example, BTF func_info and linfo are just pointers to the main program structure. This allows reference counting and cleanup to be done on the latter which simplifies their management a bit. The aux->poke_tab struct, however, did not follow this logic. The initial proposed fix for this use-after-free bug further embedded poke data tracking into the subprogram with proper reference counting. However, Daniel and Alexei questioned why we were treating these objects special; I agree, its unnecessary. The fix here removes the per subprogram poke table allocation and map tracking and instead simply points the aux->poke_tab pointer at the main programs poke table. This way, map tracking is simplified to the main program and we do not need to manage them per subprogram. This also means, bpf_prog_free_deferred(), which unwinds the program reference counting and kfrees objects, needs to ensure that we don't try to double free the poke_tab when free'ing the subprog structures. This is easily solved by NULL'ing the poke_tab pointer. The second detail is to ensure that per subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do this, we add a pointer in the poke structure to point at the subprogram value so JITs can easily check while walking the poke_tab structure if the current entry belongs to the current program. The aux pointer is stable and therefore suitable for such comparison. On the jit_subprogs() error path, we omit cleaning up the poke->aux field because these are only ever referenced from the JIT side, but on error we will never make it to the JIT, so its fine to leave them dangling. Removing these pointers would complicate the error path for no reason. However, we do need to untrack all poke descriptors from the main program as otherwise they could race with the freeing of JIT memory from the subprograms. Lastly, a748c6975dea3 ("bpf: propagate poke descriptors to subprograms") had an off-by-one on the subprogram instruction index range check as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'. However, subprog_end is the next subprogram's start instruction. Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210707223848.14580-2-john.fastabend@gmail.com
2021-07-09irqchip/mips: Fix RCU violation when using irqdomain lookup on interrupt entryMarc Zyngier
Since d4a45c68dc81 ("irqdomain: Protect the linear revmap with RCU"), any irqdomain lookup requires the RCU read lock to be held. This assumes that the architecture code will be structured such as irq_enter() will be called *before* the interrupt is looked up in the irq domain. However, this isn't the case for MIPS, and a number of drivers are structured to do it the other way around when handling an interrupt in their root irqchip (secondary irqchips are OK by construction). This results in a RCU splat on a lockdep-enabled kernel when the kernel takes an interrupt from idle, as reported by Guenter Roeck. Note that this could have fired previously if any driver had used tree-based irqdomain, which always had the RCU requirement. To solve this, provide a MIPS-specific helper (do_domain_IRQ()) as the pendent of do_IRQ() that will do thing in the right order (and maybe save some cycles in the process). Ideally, MIPS would be moved over to using handle_domain_irq(), but that's much more ambitious. Reported-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> [maz: add dependency on CONFIG_IRQ_DOMAIN after report from the kernelci bot] Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20210705172352.GA56304@roeck-us.net Link: https://lore.kernel.org/r/20210706110647.3979002-1-maz@kernel.org
2021-07-08net: bcmgenet: Ensure all TX/RX queues DMAs are disabledFlorian Fainelli
Make sure that we disable each of the TX and RX queues in the TDMA and RDMA control registers. This is a correctness change to be symmetrical with the code that enables the TX and RX queues. Tested-by: Maxime Ripard <maxime@cerno.tech> Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08cifs: use helpers when parsing uid/gid mount options and validate themRonnie Sahlberg
Use the nice helpers to initialize and the uid/gid/cred_uid when passed as mount arguments. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Acked-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-07-08Merge branch 'ncsi-phy-link-up'David S. Miller
Ivan Mikhaylov says: ==================== net/ncsi: Add NCSI Intel OEM command to keep PHY link up Add NCSI Intel OEM command to keep PHY link up and prevents any channel resets during the host load on i210. Also includes dummy response handler for Intel manufacturer id. Changes from v1: 1. sparse fixes about casts 2. put it after ncsi_dev_state_probe_cis instead of ncsi_dev_state_probe_channel because sometimes channel is not ready after it 3. inl -> intel ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08net/ncsi: add dummy response handler for Intel boardsIvan Mikhaylov
Add the dummy response handler for Intel boards to prevent incorrect handling of OEM commands. Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08net/ncsi: add NCSI Intel OEM command to keep PHY upIvan Mikhaylov
This allows to keep PHY link up and prevents any channel resets during the host load. It is KEEP_PHY_LINK_UP option(Veto bit) in i210 datasheet which block PHY reset and power state changes. Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08net/ncsi: fix restricted cast warning of sparseIvan Mikhaylov
Sparse reports: net/ncsi/ncsi-rsp.c:406:24: warning: cast to restricted __be32 net/ncsi/ncsi-manage.c:732:33: warning: cast to restricted __be32 net/ncsi/ncsi-manage.c:756:25: warning: cast to restricted __be32 net/ncsi/ncsi-manage.c:779:25: warning: cast to restricted __be32 Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08net: microchip: sparx5: fix kconfig warningRandy Dunlap
PHY_SPARX5_SERDES depends on OF so SPARX5_SWITCH should also depend on OF since 'select' does not follow any dependencies. WARNING: unmet direct dependencies detected for PHY_SPARX5_SERDES Depends on [n]: (ARCH_SPARX5 || COMPILE_TEST [=n]) && OF [=n] && HAS_IOMEM [=y] Selected by [y]: - SPARX5_SWITCH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROCHIP [=y] && NET_SWITCHDEV [=y] && HAS_IOMEM [=y] Fixes: 3cfa11bac9bb ("net: sparx5: add the basic sparx5 driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: linux-arm-kernel@lists.infradead.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08cxgb4: fix IRQ free race during driver unloadShahjada Abul Husain
IRQs are requested during driver's ndo_open() and then later freed up in disable_interrupts() during driver unload. A race exists where driver can set the CXGB4_FULL_INIT_DONE flag in ndo_open() after the disable_interrupts() in driver unload path checks it, and hence misses calling free_irq(). Fix by unregistering netdevice first and sync with driver's ndo_open(). This ensures disable_interrupts() checks the flag correctly and frees up the IRQs properly. Fixes: b37987e8db5f ("cxgb4: Disable interrupts and napi before unregistering netdev") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08mt76: mt7921: continue to probe driver when fw already downloadedAaron Ma
When reboot system, no power cycles, firmware is already downloaded, return -EIO will break driver as error: mt7921e: probe of 0000:03:00.0 failed with error -5 Skip firmware download and continue to probe. Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Fixes: 1c099ab44727c ("mt76: mt7921: add MCU support") Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08atl1c: fix Mikrotik 10/25G NIC detectionGatis Peisenieks
Since Mikrotik 10/25G NIC MDIO op emulation is not 100% reliable, on rare occasions it can happen that some physical functions of the NIC do not get initialized due to timeouted early MDIO op. This changes the atl1c probe on Mikrotik 10/25G NIC not to depend on MDIO op emulation. Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08s390: preempt: Fix preempt_count initializationValentin Schneider
S390's init_idle_preempt_count(p, cpu) doesn't actually let us initialize the preempt_count of the requested CPU's idle task: it unconditionally writes to the current CPU's. This clearly conflicts with idle_threads_init(), which intends to initialize *all* the idle tasks, including their preempt_count (or their CPU's, if the arch uses a per-CPU preempt_count). Unfortunately, it seems the way s390 does things doesn't let us initialize every possible CPU's preempt_count early on, as the pages where this resides are only allocated when a CPU is brought up and are freed when it is brought down. Let the arch-specific code set a CPU's preempt_count when its lowcore is allocated, and turn init_idle_preempt_count() into an empty stub. Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20210707163338.1623014-1-valentin.schneider@arm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/linkage: increase asm symbols alignment to 16Vasily Gorbik
Both clang and gcc (for -march=z13 and later) align functions to 16 bytes at -O2 to benefit branch prediction. Make asm symbols alignment consistent with that. This also benefits potential ftrace code patching, which is currently able to patch 8 aligned bytes at once. With defconfig this currently increases .text size by 4104 bytes. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn()Heiko Carstens
Lower case matches the call_on_stack() macro and is easier to read. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: add type checking to CALL_ON_STACK_NORETURN() macroHeiko Carstens
Make sure the to be called function takes no arguments (and returns void). Otherwise usage of CALL_ON_STACK_NORETURN() would generate broken code. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: remove old CALL_ON_STACK() macroHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/softirq: use call_on_stack() macroHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/lib: use call_on_stack() macroHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/smp: use call_on_stack() macroHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/kexec: use call_on_stack() macroHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/irq: use call_on_stack() macroHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/mm: use call_on_stack() macroHeiko Carstens
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: introduce proper type handling call_on_stack() macroHeiko Carstens
The existing CALL_ON_STACK() macro allows for subtle bugs: - There is no type checking of the function that is being called. That is: missing or too many arguments do not cause any compile error or warning. The same is true if the return type of the called function changes. This can lead to quite random bugs. - Sign and zero extension of arguments is missing. Given that the s390 C ABI requires that the caller of a function performs proper sign and zero extension this can also lead to subtle bugs. - If arguments to the CALL_ON_STACK() macros contain functions calls register corruption can happen due to register asm constructs being used. Therefore introduce a new call_on_stack() macro which is supposed to fix all these problems. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/irq: simplify on_async_stack()Heiko Carstens
Make on_async_stack() a bit more readable, even though as usual it depends if one considers "!!!" readable or not. At least the new construct to check if the async stack is in use or not is a bit shorter and generates slightly better code. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/irq: inline do_softirq_own_stack()Heiko Carstens
Move do_softirq_own_stack() to proper header file so it can be inlined; saving a few cycles. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/irq: simplify do_softirq_own_stack()Heiko Carstens
do_softirq_own_stack() is always called from task context and therefore it is not necessary to check if the async stack is currently used. Remove the check and directly switch to async stack. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/ap: get rid of register asm in ap_dqap()Harald Freudenberger
This is the second part of the cleanup for the header file ap.h to remove the register asm statements. This patch deals with the inline ap_dqap() function where within the assembler code an odd register of an register pair is to be addressed. [hca@linux.ibm.com: this intentionally breaks compilation with any clang compilers prior to llvm-project commit 458eac257377 ("[SystemZ] Support the 'N' code for the odd register in inline-asm."). This is hopefully the last clang kernel compile breakage caused by incompatibilities between gcc and clang.] Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: rename PIF_SYSCALL_RESTART to PIF_EXECVE_PGSTE_RESTARTSven Schnelle
PIF_SYSCALL_RESTART is now only used to restart execve when loading PGSTE binaries. Rename the flag to reflect that, and avoid people thinking that this bit has anything to do with generic syscall restarting. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390: move restart of execve() syscallSven Schnelle
On s390, execve might have to be restarted for PGSTE binaries like kvm. In the past this was done via the PIF_SYSCALL_RESTART bit. However, with the recent changes, syscalls are now restarted differently. Now that execve() is the only call that might get restarted via PIF_SYSCALL_RESTART, move the loop to do_syscall(). This also has the advantage that the restart is no longer visible to userspace. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/signal: remove sigreturn on stackSven Schnelle
{rt_}sigreturn is now called from the vdso, so we no longer need the svc on the stack, and therefore no hack to support that mechanism on machines with non-executable stack. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08s390/signal: switch to using vdso for sigreturn and syscall restartSven Schnelle
with generic entry, there's a bug when it comes to restarting of signals. The failing sequence is: a) a signal is coming in, and no handler is registered, so the lower part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c sets PIF_SYSCALL_RESTART. b) a second signal gets pending while the kernel is still in the exit loop, and for that one, a handler exists. c) The first part of arch_do_signal_or_restart() is called. That part calls handle_signal(), which sets up stack + registers for handling the signal. d) __do_syscall() in arch/s390/kernel/syscall.c checks for PIF_SYSCALL_RESTART right before leaving to userspace. If it is set, it restart's the syscall. However, the registers are already setup for handling a signal from c). The syscall is now restarted with the wrong arguments. Change the code to: - use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because we cannot rewind and go back to userspace on s390 because the system call number might be encoded in the svc instruction. - for all other syscalls we rewind the PSW and return to userspace. Cc: <stable@kernel.org> # v5.12+ d57778feb987: s390/vdso: always enable vdso Cc: <stable@kernel.org> # v5.12+ 686341f2548b: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall Cc: <stable@kernel.org> # v5.12+ 43e1f76b0b69: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE Cc: <stable@kernel.org> # v5.12+ 779df2248739: s390/vdso: add minimal compat vdso Cc: <stable@kernel.org> # v5.12+ Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-07-08io_uring: mitigate unlikely iopoll lagPavel Begunkov
We have requests like IORING_OP_FILES_UPDATE that don't go through ->iopoll_list but get completed in place under ->uring_lock, and so after dropping the lock io_iopoll_check() should expect that some CQEs might have get completed in a meanwhile. Currently such events won't be accounted in @nr_events, and the loop will continue to poll even if there is enough of CQEs. It shouldn't be a problem as it's not likely to happen and so, but not nice either. Just return earlier in this case, it should be enough. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66ef932cc66a34e3771bbae04b2953a8058e9d05.1625747741.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-07-08ptp: Relocate lookup cookie to correct block.Jonathan Lemon
An earlier commit set the pps_lookup cookie, but the line was somehow added to the wrong code block. Correct this. Fixes: 8602e40fc813 ("ptp: Set lookup cookie when creating a PTP PPS source.") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Dario Binacchi <dariobin@libero.it> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08Merge tag 'drm-next-2021-07-08-1' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull drm fixes from Dave Airlie: "Some fixes for rc1 that came in the past weeks, mainly a bunch of amdgpu fixes, some i915 and the rest are misc around the place. I'm sending this a bit early so some more stuff may show up, but I'll probably take tomorrow off. dma-buf: - doc fixes amdgpu: - Misc Navi fixes - Powergating fix - Yellow Carp updates - Beige Goby updates - S0ix fix - Revert overlay validation fix - GPU reset fix for DC - PPC64 fix - Add new dimgrey cavefish DID - RAS fix - TTM fixes amdkfd: - SVM fixes radeon: - Fix missing drm_gem_object_put in error path - NULL ptr deref fix i915: - display DP VSC fix - DG1 display fix - IRQ fixes - IRQ demidlayering gma500: - bo leaks in error paths fixed" * tag 'drm-next-2021-07-08-1' of git://anongit.freedesktop.org/drm/drm: (52 commits) drm/i915: Drop all references to DRM IRQ midlayer drm/i915: Use the correct IRQ during resume drm/i915/display/dg1: Correctly map DPLLs during state readout drm/i915/display: Do not zero past infoframes.vsc drm/amdgpu: Conditionally reset SDMA RAS error counts drm/amdkfd: Maintain svm_bo reference in page->zone_device_data drm/amdkfd: add invalid pages debug at vram migration drm/amdkfd: skip migration for pages already in VRAM drm/amdkfd: skip invalid pages during migrations drm/amdkfd: classify and map mixed svm range pages in GPU drm/amdkfd: use hmm range fault to get both domain pfns drm/amdgpu: get owner ref in validate and map drm/amdkfd: set owner ref to svm range prefault drm/amdkfd: add owner ref param to get hmm pages drm/amdkfd: device pgmap owner at the svm migrate init drm/amdkfd: inc counter on child ranges with xnack off drm/amd/display: Extend DMUB diagnostic logging to DCN3.1 drm/amdgpu: Update NV SIMD-per-CU to 2 drm/amdgpu: add new dimgrey cavefish DID drm/amd/pm: skip PrepareMp1ForUnload message in s0ix ...
2021-07-08ipv6: tcp: drop silly ICMPv6 packet too big messagesEric Dumazet
While TCP stack scales reasonably well, there is still one part that can be used to DDOS it. IPv6 Packet too big messages have to lookup/insert a new route, and if abused by attackers, can easily put hosts under high stress, with many cpus contending on a spinlock while one is stuck in fib6_run_gc() ip6_protocol_deliver_rcu() icmpv6_rcv() icmpv6_notify() tcp_v6_err() tcp_v6_mtu_reduced() inet6_csk_update_pmtu() ip6_rt_update_pmtu() __ip6_rt_update_pmtu() ip6_rt_cache_alloc() ip6_dst_alloc() dst_alloc() ip6_dst_gc() fib6_run_gc() spin_lock_bh() ... Some of our servers have been hit by malicious ICMPv6 packets trying to _increase_ the MTU/MSS of TCP flows. We believe these ICMPv6 packets are a result of a bug in one ISP stack, since they were blindly sent back for _every_ (small) packet sent to them. These packets are for one TCP flow: 09:24:36.266491 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.266509 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.316688 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.316704 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.608151 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 TCP stack can filter some silly requests : 1) MTU below IPV6_MIN_MTU can be filtered early in tcp_v6_err() 2) tcp_v6_mtu_reduced() can drop requests trying to increase current MSS. This tests happen before the IPv6 routing stack is entered, thus removing the potential contention and route exhaustion. Note that IPv6 stack was performing these checks, but too late (ie : after the route has been added, and after the potential garbage collect war) v2: fix typo caught by Martin, thanks ! v3: exports tcp_mtu_to_mss(), caught by David, thanks ! Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08Merge tag 'pwm/for-5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm Pull pwm updates from Thierry Reding: "This contains mostly various fixes, cleanups and some conversions to the atomic API. One noteworthy change is that PWM consumers can now pass a hint to the PWM core about the PWM usage, enabling PWM providers to implement various optimizations. There's also a fair bit of simplification here with the addition of some device-managed helpers as well as unification between the DT and ACPI firmware interfaces" * tag 'pwm/for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (50 commits) pwm: Remove redundant assignment to pointer pwm pwm: ep93xx: Fix read of uninitialized variable ret pwm: ep93xx: Prepare clock before using it pwm: ep93xx: Unfold legacy callbacks into ep93xx_pwm_apply() pwm: ep93xx: Implement .apply callback pwm: vt8500: Only unprepare the clock after the pwmchip was removed pwm: vt8500: Drop if with an always false condition pwm: tegra: Assert reset only after the PWM was unregistered pwm: tegra: Don't needlessly enable and disable the clock in .remove() pwm: tegra: Don't modify HW state in .remove callback pwm: tegra: Drop an if block with an always false condition pwm: core: Simplify some devm_*pwm*() functions pwm: core: Remove unused devm_pwm_put() pwm: core: Unify fwnode checks in the module pwm: core: Reuse fwnode_to_pwmchip() in ACPI case pwm: core: Convert to use fwnode for matching docs: firmware-guide: ACPI: Add a PWM example dt-bindings: pwm: pwm-tiecap: Add compatible string for AM64 SoC dt-bindings: pwm: pwm-tiecap: Convert to json schema pwm: sprd: Don't check the return code of pwmchip_remove() ...
2021-07-08Merge tag 'clk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull more clk updates from Stephen Boyd: - A handful of fixes for lmk04832 driver - Migrate the basic clk divider to use determine rate ops - Fix modpost build for hisilicon hi3559a driver - Actually set the parent in k210_clk_set_parent() * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: Revert "clk: divider: Switch from .round_rate to .determine_rate by default" clk: hisilicon: hi3559a: Drop __init markings everywhere clk: meson: regmap: switch to determine_rate for the dividers clk: divider: Switch from .round_rate to .determine_rate by default clk: divider: Add re-usable determine_rate implementations clk: k210: Fix k210_clk_set_parent() clk: lmk04832: Fix spelling mistakes in dev_err messages and comments clk: lmk04832: fix return value check in lmk04832_probe() clk: stm32mp1: fix missing spin_lock_init()
2021-07-08Merge tag 'pci-v5.14-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Fix dsm_label_utf16s_to_utf8s() buffer overrun (Krzysztof Wilczyński) - Rely on lengths from scnprintf(), dsm_label_utf16s_to_utf8s() (Krzysztof Wilczyński) - Use sysfs_emit() and sysfs_emit_at() in "show" functions (Krzysztof Wilczyński) - Fix 'resource_alignment' newline issues (Krzysztof Wilczyński) - Add 'devspec' newline (Krzysztof Wilczyński) - Dynamically map ECAM regions (Russell King) Resource management: - Coalesce host bridge contiguous apertures (Kai-Heng Feng) PCIe native device hotplug: - Ignore Link Down/Up caused by DPC (Lukas Wunner) Power management: - Leave Apple Thunderbolt controllers on for s2idle or standby (Konstantin Kharlamov) Virtualization: - Work around Huawei Intelligent NIC VF FLR erratum (Chiqijun) - Clarify error message for unbound IOV devices (Moritz Fischer) - Add pci_reset_bus_function() Secondary Bus Reset interface (Raphael Norwitz) Peer-to-peer DMA: - Simplify distance calculation (Christoph Hellwig) - Finish RCU conversion of pdev->p2pdma (Eric Dumazet) - Rename upstream_bridge_distance() and rework doc (Logan Gunthorpe) - Collect acs list in stack buffer to avoid sleeping (Logan Gunthorpe) - Use correct calc_map_type_and_dist() return type (Logan Gunthorpe) - Warn if host bridge not in whitelist (Logan Gunthorpe) - Refactor pci_p2pdma_map_type() (Logan Gunthorpe) - Avoid pci_get_slot(), which may sleep (Logan Gunthorpe) Altera PCIe controller driver: - Add Joyce Ooi as Altera PCIe maintainer (Joyce Ooi) Broadcom iProc PCIe controller driver: - Fix multi-MSI base vector number allocation (Sandor Bodo-Merle) - Support multi-MSI only on uniprocessor kernel (Sandor Bodo-Merle) Freescale i.MX6 PCIe controller driver: - Limit DBI register length for imx6qp PCIe (Richard Zhu) - Add "vph-supply" for PHY supply voltage (Richard Zhu) - Enable PHY internal regulator when supplied >3V (Richard Zhu) - Remove imx6_pcie_probe() redundant error message (Zhen Lei) Intel Gateway PCIe controller driver: - Fix INTx enable (Martin Blumenstingl) Marvell Aardvark PCIe controller driver: - Fix checking for PIO Non-posted Request (Pali Rohár) - Implement workaround for the readback value of VEND_ID (Pali Rohár) MediaTek PCIe controller driver: - Remove redundant error printing in mtk_pcie_subsys_powerup() (Zhen Lei) MediaTek PCIe Gen3 controller driver: - Add missing MODULE_DEVICE_TABLE (Zou Wei) Microchip PolarFlare PCIe controller driver: - Make struct event_descs static (Krzysztof Wilczyński) Microsoft Hyper-V host bridge driver: - Fix race condition when removing the device (Long Li) - Remove bus device removal unused refcount/functions (Long Li) Mobiveil PCIe controller driver: - Remove unused readl and writel functions (Krzysztof Wilczyński) NVIDIA Tegra PCIe controller driver: - Add missing MODULE_DEVICE_TABLE (Zou Wei) NVIDIA Tegra194 PCIe controller driver: - Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift (Jon Hunter) - Fix host initialization during resume (Vidya Sagar) Rockchip PCIe controller driver: - Register IRQ handlers after device and data are ready (Javier Martinez Canillas)" * tag 'pci-v5.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits) PCI/P2PDMA: Finish RCU conversion of pdev->p2pdma PCI: xgene: Annotate __iomem pointer PCI: Fix kernel-doc formatting PCI: cpcihp: Declare cpci_debug in header file MAINTAINERS: Add Joyce Ooi as Altera PCIe maintainer PCI: rockchip: Register IRQ handlers after device and data are ready PCI: tegra194: Fix tegra_pcie_ep_raise_msi_irq() ill-defined shift PCI: aardvark: Implement workaround for the readback value of VEND_ID PCI: aardvark: Fix checking for PIO Non-posted Request PCI: tegra194: Fix host initialization during resume PCI: tegra: Add missing MODULE_DEVICE_TABLE PCI: imx6: Enable PHY internal regulator when supplied >3V dt-bindings: imx6q-pcie: Add "vph-supply" for PHY supply voltage PCI: imx6: Limit DBI register length for imx6qp PCIe PCI: imx6: Remove imx6_pcie_probe() redundant error message PCI: intel-gw: Fix INTx enable PCI: iproc: Support multi-MSI only on uniprocessor kernel PCI: iproc: Fix multi-MSI base vector number allocation PCI: mediatek-gen3: Add missing MODULE_DEVICE_TABLE PCI: Dynamically map ECAM regions ...
2021-07-09scripts: add generic syscallnr.shMasahiro Yamada
Like syscallhdr.sh and syscalltbl.sh, add a simple script to generate the __NR_syscalls, which should not be exported to userspace. This script is useful to replace arch/mips/kernel/syscalls/syscallnr.sh, refactor arch/s390/kernel/syscalls/syscalltbl, and eliminate the code surrounded by #ifdef __KERNEL__ / #endif from exported uapi/asm/unistd_*.h files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-07-09scripts: check duplicated syscall number in syscall tableMasahiro Yamada
Currently, syscall{hdr,tbl}.sh sorts the entire syscall table, but you can assume it is already sorted by the syscall number. The generated syscall table does not work if the same syscall number appears twice. Check it in the script. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-07-08powerpc/mm: enable HAVE_MOVE_PMD supportAneesh Kumar K.V
mremap HAVE_MOVE_PMD/PUD optimization time comparison for 1GB region: 1GB mremap - Source PTE-aligned, Destination PTE-aligned mremap time: 2292772ns 1GB mremap - Source PMD-aligned, Destination PMD-aligned mremap time: 1158928ns 1GB mremap - Source PUD-aligned, Destination PUD-aligned mremap time: 63886ns Link: https://lkml.kernel.org/r/20210616045735.374532-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08powerpc/book3s64/mm: update flush_tlb_range to flush page walk cacheAneesh Kumar K.V
flush_tlb_range is special in that we don't specify the page size used for the translation. Hence when flushing TLB we flush the translation cache for all possible page sizes. The kernel also uses the same interface when moving page tables around. Such a move requires us to flush the page walk cache. Instead of adding another interface to force page walk cache flush, update flush_tlb_range to flush page walk cache if the range flushed is more than the PMD range. A page table move will always involve an invalidate range more than PMD_SIZE. Running microbenchmark with mprotect and parallel memory access didn't show any observable performance impact. Link: https://lkml.kernel.org/r/20210616045735.374532-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>