summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-12-04openvswitch: support asymmetric conntrackAaron Conole
The openvswitch module shares a common conntrack and NAT infrastructure exposed via netfilter. It's possible that a packet needs both SNAT and DNAT manipulation, due to e.g. tuple collision. Netfilter can support this because it runs through the NAT table twice - once on ingress and again after egress. The openvswitch module doesn't have such capability. Like netfilter hook infrastructure, we should run through NAT twice to keep the symmetry. Fixes: 05752523e565 ("openvswitch: Interface with NAT.") Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04io-wq: clear node->next on list deletionJens Axboe
If someone removes a node from a list, and then later adds it back to a list, we can have invalid data in ->next. This can cause all sorts of issues. One such use case is the IORING_OP_POLL_ADD command, which will do just that if we race and get woken twice without any pending events. This is a pretty rare case, but can happen under extreme loads. Dan reports that he saw the following crash: BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD d283ce067 P4D d283ce067 PUD e5ca04067 PMD 0 Oops: 0002 [#1] SMP CPU: 17 PID: 10726 Comm: tao:fast-fiber Kdump: loaded Not tainted 5.2.9-02851-gac7bc042d2d1 #116 Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019 RIP: 0010:io_wqe_enqueue+0x3e/0xd0 Code: 34 24 74 55 8b 47 58 48 8d 6f 50 85 c0 74 50 48 89 df e8 35 7c 75 00 48 83 7b 08 00 48 8b 14 24 0f 84 84 00 00 00 48 8b 4b 10 <48> 89 11 48 89 53 10 83 63 20 fe 48 89 c6 48 89 df e8 0c 7a 75 00 RSP: 0000:ffffc90006858a08 EFLAGS: 00010082 RAX: 0000000000000002 RBX: ffff889037492fc0 RCX: 0000000000000000 RDX: ffff888e40cc11a8 RSI: ffff888e40cc11a8 RDI: ffff889037492fc0 RBP: ffff889037493010 R08: 00000000000000c3 R09: ffffc90006858ab8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888e40cc11a8 R13: 0000000000000000 R14: 00000000000000c3 R15: ffff888e40cc1100 FS: 00007fcddc9db700(0000) GS:ffff88903fa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000e479f5003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> io_poll_wake+0x12f/0x2a0 __wake_up_common+0x86/0x120 __wake_up_common_lock+0x7a/0xc0 sock_def_readable+0x3c/0x70 tcp_rcv_established+0x557/0x630 tcp_v6_do_rcv+0x118/0x3c0 tcp_v6_rcv+0x97e/0x9d0 ip6_protocol_deliver_rcu+0xe3/0x440 ip6_input+0x3d/0xc0 ? ip6_protocol_deliver_rcu+0x440/0x440 ipv6_rcv+0x56/0xd0 ? ip6_rcv_finish_core.isra.18+0x80/0x80 __netif_receive_skb_one_core+0x50/0x70 netif_receive_skb_internal+0x2f/0xa0 napi_gro_receive+0x125/0x150 mlx5e_handle_rx_cqe+0x1d9/0x5a0 ? mlx5e_poll_tx_cq+0x305/0x560 mlx5e_poll_rx_cq+0x49f/0x9c5 mlx5e_napi_poll+0xee/0x640 ? smp_reschedule_interrupt+0x16/0xd0 ? reschedule_interrupt+0xf/0x20 net_rx_action+0x286/0x3d0 __do_softirq+0xca/0x297 irq_exit+0x96/0xa0 do_IRQ+0x54/0xe0 common_interrupt+0xf/0xf </IRQ> RIP: 0033:0x7fdc627a2e3a Code: 31 c0 85 d2 0f 88 f6 00 00 00 55 48 89 e5 41 57 41 56 4c 63 f2 41 55 41 54 53 48 83 ec 18 48 85 ff 0f 84 c7 00 00 00 48 8b 07 <41> 89 d4 49 89 f5 48 89 fb 48 85 c0 0f 84 64 01 00 00 48 83 78 10 when running a networked workload with about 5000 sockets being polled for. Fix this by clearing node->next when the node is being removed from the list. Fixes: 6206f0e180d4 ("io-wq: shrink io_wq_work a bit") Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-05powerpc/pmem: Convert to EXPORT_SYMBOL_GPLAneesh Kumar K.V
All other architecture export this as GPL symbol Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191202064018.155083-1-aneesh.kumar@linux.ibm.com
2019-12-05powerpc/archrandom: fix arch_get_random_seed_int()Ard Biesheuvel
Commit 01c9348c7620ec65 powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_* updated arch_get_random_[int|long]() to be NOPs, and moved the hardware RNG backing to arch_get_random_seed_[int|long]() instead. However, it failed to take into account that arch_get_random_int() was implemented in terms of arch_get_random_long(), and so we ended up with a version of the former that is essentially a NOP as well. Fix this by calling arch_get_random_seed_long() from arch_get_random_seed_int() instead. Fixes: 01c9348c7620ec65 ("powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191204115015.18015-1-ardb@kernel.org
2019-12-04drm/dp_mst: Correct the bug in drm_dp_update_payload_part1()Wayne Lin
[Why] If the payload_state is DP_PAYLOAD_DELETE_LOCAL in series, current code doesn't delete the payload at current index and just move the index to next one after shuffling payloads. [How] Drop the i++ increasing part in for loop head and decide whether to increase the index or not according to payload_state of current payload. Changes since v1: * Refine the code to have it easy reading * Amend the commit message to meet the way code is modified now. Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Fixes: 706246c761dd ("drm/dp_mst: Refactor drm_dp_update_payload_part1()") Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Juston Li <juston.li@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v5.1+ [Added cc for stable] Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203042423.5961-1-Wayne.Lin@amd.com
2019-12-04Merge branch 'net-convert-ipv6_stub-to-ip6_dst_lookup_flow'David S. Miller
Sabrina Dubroca says: ==================== net: convert ipv6_stub to ip6_dst_lookup_flow Xiumei Mu reported a bug in a VXLAN over IPsec setup: IPv6 | ESP | VXLAN Using this setup, packets go out unencrypted, because VXLAN over IPv6 gets its route from ipv6_stub->ipv6_dst_lookup (in vxlan6_get_route), which doesn't perform an XFRM lookup. This patchset first makes ip6_dst_lookup_flow suitable for some existing users of ipv6_stub->ipv6_dst_lookup by adding a 'net' argument, then converts all those users. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca
ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04net: ipv6: add net argument to ip6_dst_lookup_flowSabrina Dubroca
This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow, as some modules currently pass a net argument without a socket to ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument"). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more KVM updates from Paolo Bonzini: - PPC secure guest support - small x86 cleanup - fix for an x86-specific out-of-bounds write on a ioctl (not guest triggerable, data not attacker-controlled) * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: vmx: Stop wasting a page for guest_msrs KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332) Documentation: kvm: Fix mention to number of ioctls classes powerpc: Ultravisor: Add PPC_UV config option KVM: PPC: Book3S HV: Support reset of secure guest KVM: PPC: Book3S HV: Handle memory plug/unplug to secure VM KVM: PPC: Book3S HV: Radix changes for secure guest KVM: PPC: Book3S HV: Shared pages support for secure guests KVM: PPC: Book3S HV: Support for running secure guests mm: ksm: Export ksm_madvise() KVM x86: Move kvm cpuid support out of svm
2019-12-04Merge tag 'riscv/for-v5.5-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Paul Walmsley: "A few minor RISC-V updates for v5.5-rc1 that arrived late. New features: - Dump some kernel virtual memory map details to the console if CONFIG_DEBUG_VM is enabled Other improvements: - Enable more debugging options in the primary defconfigs Cleanups: - Clean up Kconfig indentation" * tag 'riscv/for-v5.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Add address map dumper riscv: defconfigs: enable more debugging options riscv: defconfigs: enable debugfs riscv: Fix Kconfig indentation
2019-12-04Merge tag 'please-pull-misc-5.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 update from Tony Luck: "Cleanup some leftover para-virtualization pieces" * tag 'please-pull-misc-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: ia64: remove stale paravirt leftovers
2019-12-04Merge tag 'acpi-5.5-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull additional ACPI updates from Rafael Wysocki: "These close a nasty race condition in the ACPI memory mappings management code and an invalid parameter check in a library routing, allow GPE 0xFF to be masked via kernel command line, add a new lid switch blacklist entry and clean up Kconfig. Specifics: - Fix locking issue in acpi_os_map_cleanup() leading to a race condition that can be harnessed for provoking a kernel panic from user space (Francesco Ruggeri) - Fix parameter check in acpi_bus_get_private_data() (Vamshi K Sthambamkadi) - Allow GPE 0xFF to be masked via kernel command line (Yunfeng Ye) - Add a new lid switch blacklist entry for Acer Switch 10 SW5-032 to the ACPI button driver (Hans de Goede) - Clean up Kconfig (Krzysztof Kozlowski)" * tag 'acpi-5.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data() ACPI: Fix Kconfig indentation ACPI: OSL: only free map once in osl.c ACPI: button: Add DMI quirk for Acer Switch 10 SW5-032 lid-switch ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100
2019-12-04Merge tag 'pm-5.5-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull additional power management updates from Rafael Wysocki: "These fix an ACPI EC driver bug exposed by the recent rework of the suspend-to-idle code flow, reintroduce frequency constraints into device PM QoS (in preparation for adding QoS support to devfreq), drop a redundant field from struct cpuidle_state and clean up Kconfig in some places. Specifics: - Avoid a race condition in the ACPI EC driver that may cause systems to be unable to leave suspend-to-idle (Rafael Wysocki) - Drop the "disabled" field, which is redundant, from struct cpuidle_state (Rafael Wysocki) - Reintroduce device PM QoS frequency constraints (temporarily introduced and than dropped during the 5.4 cycle) in preparation for adding QoS support to devfreq (Leonard Crestez) - Clean up indentation (in multiple places) and the cpuidle drivers help text in Kconfig (Krzysztof Kozlowski, Randy Dunlap)" * tag 'pm-5.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PM: s2idle: Rework ACPI events synchronization ACPI: EC: Rework flushing of pending work PM / devfreq: Add missing locking while setting suspend_freq PM / QoS: Restore DEV_PM_QOS_MIN/MAX_FREQUENCY PM / QoS: Reorder pm_qos/freq_qos/dev_pm_qos structs PM / QoS: Initial kunit test PM / QoS: Redefine FREQ_QOS_MAX_DEFAULT_VALUE to S32_MAX power: avs: Fix Kconfig indentation cpufreq: Fix Kconfig indentation cpuidle: minor Kconfig help text fixes cpuidle: Drop disabled field from struct cpuidle_state cpuidle: Fix Kconfig indentation
2019-12-04io_uring: ensure deferred timeouts copy necessary dataJens Axboe
If we defer a timeout, we should ensure that we copy the timespec when we have consumed the sqe. This is similar to commit f67676d160c6 for read/write requests. We already did this correctly for timeouts deferred as links, but do it generally and use the infrastructure added by commit 1a6b74fc8702 instead of having the timeout deferral use its own. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04cifs: fix possible uninitialized access and race on iface_listAurelien Aptel
iface[0] was accessed regardless of the count value and without locking. * check count before accessing any ifaces * make copy of iface list (it's a simple POD array) and use it without locking. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
2019-12-04cifs: Fix lookup of SMB connections on multichannelPaulo Alcantara (SUSE)
With the addition of SMB session channels, we introduced new TCP server pointers that have no sessions or tcons associated with them. In this case, when we started looking for TCP connections, we might end up picking session channel rather than the master connection, hence failing to get either a session or a tcon. In order to fix that, this patch introduces a new "is_channel" field to TCP_Server_Info structure so we can skip session channels during lookup of connections. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-04io_uring: allow IO_SQE_* flags on IORING_OP_TIMEOUTJens Axboe
There's really no reason why we forbid things like link/drain etc on regular timeout commands. Enable the usual SQE flags on timeouts. Reported-by: 李通洲 <carter.li@eoitek.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04iomap: fix sub-page uptodate handlingChristoph Hellwig
bio completions can race when a page spans more than one file system block. Add a spinlock to synchronize marking the page uptodate. Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads") Reported-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-04Merge branch 'v5.5/vfio/jiang-yi-irq-bypass-unregister-v1' into v5.5/vfio/nextAlex Williamson
2019-12-04drm/omap: fix dma_addr refcountingTomi Valkeinen
cec4fa7511ef7a73eb635834e9d85b25a5b47a98 ("drm/omap: use refcount API to track the number of users of dma_addr") changed omap_gem.c to use refcounting API to track dma_addr uses. However, the driver only tracks the refcounts for non-contiguous buffers, and the patch didn't fully take this in account. After the patch, the driver always decreased refcount in omap_gem_unpin, instead of decreasing the refcount only for non-contiguous buffers. This leads to refcounting mismatch. As for the contiguous cases the refcount is never increased, fix this issue by returning from omap_gem_unpin if the buffer being unpinned is contiguous. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191114080343.30704-1-tomi.valkeinen@ti.com Fixes: cec4fa7511ef ("drm/omap: use refcount API to track the number of users of dma_addr") Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2019-12-04null_blk: remove unused variable warning on !CONFIG_BLK_DEV_ZONEDJens Axboe
If BLK_DEV_ZONED isn't set, 'ret' isn't used. This makes gcc complain, rightfully. Move ret where it is used. Fixes: 979d54475e0b ("null_blk: cleanup null_gendisk_register") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04brd: warn on un-aligned bufferMing Lei
Queue dma alignment limit requires users(fs, target, ...) of block layer to pass aligned buffer. So far brd doesn't support un-aligned buffer, even though it is easy to support it. However, given brd is often used for debug purpose, and there are other drivers which can't support un-aligned buffer too. So add warning so that brd users know what to fix. Reported-by: Stephen Rust <srust@blockbridge.com> Cc: Stephen Rust <srust@blockbridge.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04brd: remove max_hw_sectors queue limitMing Lei
Now we depend on blk_queue_split() to respect most of queue limit (the only one exception could be dma alignment), however blk_queue_split() isn't used for brd, so this limit isn't respected since v4.3. Also max_hw_sectors limit doesn't play a big role for brd, which is added since brd is added to tree for unknown reason. So remove it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-04ALSA: pcm: oss: Avoid potential buffer overflowsTakashi Iwai
syzkaller reported an invalid access in PCM OSS read, and this seems to be an overflow of the internal buffer allocated for a plugin. Since the rate plugin adjusts its transfer size dynamically, the calculation for the chained plugin might be bigger than the given buffer size in some extreme cases, which lead to such an buffer overflow as caught by KASAN. Fix it by limiting the max transfer size properly by checking against the destination size in each plugin transfer callback. Reported-by: syzbot+f153bde47a62e0b05f83@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191204144824.17801-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-12-04orangefs: posix open permission checking...Mike Marshall
Orangefs has no open, and orangefs checks file permissions on each file access. Posix requires that file permissions be checked on open and nowhere else. Orangefs-through-the-kernel needs to seem posix compliant. The VFS opens files, even if the filesystem provides no method. We can see if a file was successfully opened for read and or for write by looking at file->f_mode. When writes are flowing from the page cache, file is no longer available. We can trust the VFS to have checked file->f_mode before writing to the page cache. The mode of a file might change between when it is opened and IO commences, or it might be created with an arbitrary mode. We'll make sure we don't hit EACCES during the IO stage by using UID 0. Some of the time we have access without changing to UID 0 - how to check? Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-12-04tracing: Do not create directories if lockdown is in affectSteven Rostedt (VMware)
If lockdown is disabling tracing on boot up, it prevents the tracing files from even bering created. But when that happens, there's several places that will give a warning that the files were not created as that is usually a sign of a bug. Add in strategic locations where a check is made to see if tracing is disabled by lockdown, and if it is, do not go further, and fail silently (but print that tracing is disabled by lockdown, without doing a WARN_ON()). Cc: Matthew Garrett <mjg59@google.com> Fixes: 17911ff38aa5 ("tracing: Add locked_down checks to the open calls of files created for tracefs") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-05powerpc: Fix vDSO clock_getres()Vincenzo Frascino
clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. Fix the powerpc vdso implementation of clock_getres keeping a copy of hrtimer_resolution in vdso data and using that directly. Fixes: a7f290dad32e ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel") Cc: stable@vger.kernel.org Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr
2019-12-05powerpc/pmem: Fix kernel crash due to wrong range value usage in ↵Aneesh Kumar K.V
flush_dcache_range This patch fix the below kernel crash. BUG: Unable to handle kernel data access on read at 0xc000000380000000 Faulting instruction address: 0xc00000000008b6f0 cpu 0x5: Vector: 300 (Data Access) at [c0000000d8587790] pc: c00000000008b6f0: arch_remove_memory+0x150/0x210 lr: c00000000008b720: arch_remove_memory+0x180/0x210 sp: c0000000d8587a20 msr: 800000000280b033 dar: c000000380000000 dsisr: 40000000 current = 0xc0000000d8558600 paca = 0xc00000000fff8f00 irqmask: 0x03 irq_happened: 0x01 pid = 1220, comm = ndctl enter ? for help memunmap_pages+0x33c/0x410 devm_action_release+0x30/0x50 release_nodes+0x30c/0x3a0 device_release_driver_internal+0x178/0x240 unbind_store+0x74/0x190 drv_attr_store+0x44/0x60 sysfs_kf_write+0x74/0xa0 kernfs_fop_write+0x1b0/0x260 __vfs_write+0x3c/0x70 vfs_write+0xe4/0x200 ksys_write+0x7c/0x140 system_call+0x5c/0x68 Fixes: 076265907cf9 ("powerpc: Chunk calls to flush_dcache_range in arch_*_memory") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191204052909.59145-1-aneesh.kumar@linux.ibm.com
2019-12-05powerpc/xive: Skip ioremap() of ESB pages for LSI interruptsCédric Le Goater
The PCI INTx interrupts and other LSI interrupts are handled differently under a sPAPR platform. When the interrupt source characteristics are queried, the hypervisor returns an H_INT_ESB flag to inform the OS that it should be using the H_INT_ESB hcall for interrupt management and not loads and stores on the interrupt ESB pages. A default -1 value is returned for the addresses of the ESB pages. The driver ignores this condition today and performs a bogus IO mapping. Recent changes and the DEBUG_VM configuration option make the bug visible with : kernel BUG at arch/powerpc/include/asm/book3s/64/pgtable.h:612! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=1024 NUMA pSeries Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-0.rc6.git0.1.fc32.ppc64le #1 NIP: c000000000f63294 LR: c000000000f62e44 CTR: 0000000000000000 REGS: c0000000fa45f0d0 TRAP: 0700 Not tainted (5.4.0-0.rc6.git0.1.fc32.ppc64le) ... NIP ioremap_page_range+0x4c4/0x6e0 LR ioremap_page_range+0x74/0x6e0 Call Trace: ioremap_page_range+0x74/0x6e0 (unreliable) do_ioremap+0x8c/0x120 __ioremap_caller+0x128/0x140 ioremap+0x30/0x50 xive_spapr_populate_irq_data+0x170/0x260 xive_irq_domain_map+0x8c/0x170 irq_domain_associate+0xb4/0x2d0 irq_create_mapping+0x1e0/0x3b0 irq_create_fwspec_mapping+0x27c/0x3e0 irq_create_of_mapping+0x98/0xb0 of_irq_parse_and_map_pci+0x168/0x230 pcibios_setup_device+0x88/0x250 pcibios_setup_bus_devices+0x54/0x100 __of_scan_bus+0x160/0x310 pcibios_scan_phb+0x330/0x390 pcibios_init+0x8c/0x128 do_one_initcall+0x60/0x2c0 kernel_init_freeable+0x290/0x378 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x80 Fixes: bed81ee181dd ("powerpc/xive: introduce H_INT_ESB hcall") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191203163642.2428-1-clg@kaod.org
2019-12-05powerpc/kasan: Fix boot failure with RELOCATABLE && FSL_BOOKEChristophe Leroy
When enabling CONFIG_RELOCATABLE and CONFIG_KASAN on FSL_BOOKE, the kernel doesn't boot. relocate_init() requires KASAN early shadow area to be set up because it needs access to the device tree through generic functions. Call kasan_early_init() before calling relocate_init() Reported-by: Lexi Shao <shaolexi@huawei.com> Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b58426f1664a4b344ff696d18cacf3b3e8962111.1575036985.git.christophe.leroy@c-s.fr
2019-12-04drm/tegra: Run hub cleanup on ->remove()Thierry Reding
The call to tegra_display_hub_cleanup() that takes care of disabling the window groups is missing from the driver's ->remove() callback. Call it to make sure the runtime PM reference counts for the display controllers are balanced. Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: sor: Make the +5V HDMI supply optionalThierry Reding
The SOR supports multiple display modes, but only when driving an HDMI monitor does it make sense to control the +5V power supply. eDP and DP don't need this, so make it optional. This fixes a crash observed during system suspend/resume. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: Silence expected errors on IOMMU attachThierry Reding
Subdevices may not be hooked up to an IOMMU via device tree. Detect such situations and avoid confusing users by not emitting an error message. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: vic: Export module device tableThierry Reding
Export the module device table to ensure the VIC compatible strings are listed in the module's aliases table. This in turn causes the driver to be automatically loaded on boot if VIC is the only enabled subdevice of the logical host1x DRM device. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: sor: Implement system suspend/resumeThierry Reding
Upon system suspend, make sure the +5V HDMI regulator is disabled. This avoids potentially leaking current to the HDMI connector. This also makes sure that upon resume the regulator is enabled again, which in some cases is necessary to properly restore the state of the supply on resume. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: Use proper IOVA address for cursor imageThierry Reding
The IOVA address for the cursor is the result of mapping the buffer object for the given display controller. Make sure to use the proper IOVA address as stored in the cursor's plane state. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: gem: Remove premature import restrictionsThierry Reding
All the display related blocks on Tegra require contiguous memory. Using the DMA API, there is no knowing at import time which device will end up using the buffer, so it's not known whether or not an IOMMU will be used to map the buffer. Move the check for non-contiguous buffers/mappings to the tegra_dc_pin() function which is now the earliest point where it is known if a DMA BUF can be used by the given device or not. v2: add check for contiguous buffer/mapping in tegra_dc_pin() Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: gem: Properly pin imported buffersThierry Reding
Buffers that are imported from a DMA-BUF don't have pages allocated with them. At the same time an SG table for them can't be derived using the DMA API helpers because the necessary information doesn't exist. However there's already an SG table that was created during import, so this can simply be duplicated. Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04drm/tegra: hub: Remove bogus connection mutex checkThierry Reding
The Tegra DRM never actually took that lock, so the driver was broken until generic locking was added in commit: commit b962a12050a387e4bbf3a48745afe1d29d396b0d Author: Rob Clark <robdclark@gmail.com> Date: Mon Oct 22 14:31:22 2018 +0200 drm/atomic: integrate modeset lock with private objects It's now no longer necessary to take that lock, so drop the check. v2: add rationale for why it is now safe to drop the check (Daniel) Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Thierry Reding <treding@nvidia.com>
2019-12-04arm64: mm: Fix column alignment for UXN in kernel_page_tablesMark Brown
UXN is the only individual PTE bit other than the PTE_ATTRINDX_MASK ones which doesn't have both a set and a clear value provided, meaning that the columns in the table won't all be aligned. The PTE_ATTRINDX_MASK values are all both mutually exclusive and longer so are listed last to make a single final column for those values. Ensure everything is aligned by providing a clear value for UXN. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-12-04arm64: insn: consistently handle exit textMark Rutland
A kernel built with KASAN && FTRACE_WITH_REGS && !MODULES, produces a boot-time splat in the bowels of ftrace: | [ 0.000000] ftrace: allocating 32281 entries in 127 pages | [ 0.000000] ------------[ cut here ]------------ | [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2019 ftrace_bug+0x27c/0x328 | [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-rc3-00008-g7f08ae53a7e3 #13 | [ 0.000000] Hardware name: linux,dummy-virt (DT) | [ 0.000000] pstate: 60000085 (nZCv daIf -PAN -UAO) | [ 0.000000] pc : ftrace_bug+0x27c/0x328 | [ 0.000000] lr : ftrace_init+0x640/0x6cc | [ 0.000000] sp : ffffa000120e7e00 | [ 0.000000] x29: ffffa000120e7e00 x28: ffff00006ac01b10 | [ 0.000000] x27: ffff00006ac898c0 x26: dfffa00000000000 | [ 0.000000] x25: ffffa000120ef290 x24: ffffa0001216df40 | [ 0.000000] x23: 000000000000018d x22: ffffa0001244c700 | [ 0.000000] x21: ffffa00011bf393c x20: ffff00006ac898c0 | [ 0.000000] x19: 00000000ffffffff x18: 0000000000001584 | [ 0.000000] x17: 0000000000001540 x16: 0000000000000007 | [ 0.000000] x15: 0000000000000000 x14: ffffa00010432770 | [ 0.000000] x13: ffff940002483519 x12: 1ffff40002483518 | [ 0.000000] x11: 1ffff40002483518 x10: ffff940002483518 | [ 0.000000] x9 : dfffa00000000000 x8 : 0000000000000001 | [ 0.000000] x7 : ffff940002483519 x6 : ffffa0001241a8c0 | [ 0.000000] x5 : ffff940002483519 x4 : ffff940002483519 | [ 0.000000] x3 : ffffa00011780870 x2 : 0000000000000001 | [ 0.000000] x1 : 1fffe0000d591318 x0 : 0000000000000000 | [ 0.000000] Call trace: | [ 0.000000] ftrace_bug+0x27c/0x328 | [ 0.000000] ftrace_init+0x640/0x6cc | [ 0.000000] start_kernel+0x27c/0x654 | [ 0.000000] random: get_random_bytes called from print_oops_end_marker+0x30/0x60 with crng_init=0 | [ 0.000000] ---[ end trace 0000000000000000 ]--- | [ 0.000000] ftrace faulted on writing | [ 0.000000] [<ffffa00011bf393c>] _GLOBAL__sub_D_65535_0___tracepoint_initcall_level+0x4/0x28 | [ 0.000000] Initializing ftrace call sites | [ 0.000000] ftrace record flags: 0 | [ 0.000000] (0) | [ 0.000000] expected tramp: ffffa000100b3344 This is due to an unfortunate combination of several factors. Building with KASAN results in the compiler generating anonymous functions to register/unregister global variables against the shadow memory. These functions are placed in .text.startup/.text.exit, and given mangled names like _GLOBAL__sub_{I,D}_65535_0_$OTHER_SYMBOL. The kernel linker script places these in .init.text and .exit.text respectively, which are both discarded at runtime as part of initmem. Building with FTRACE_WITH_REGS uses -fpatchable-function-entry=2, which also instruments KASAN's anonymous functions. When these are discarded with the rest of initmem, ftrace removes dangling references to these call sites. Building without MODULES implicitly disables STRICT_MODULE_RWX, and causes arm64's patch_map() function to treat any !core_kernel_text() symbol as something that can be modified in-place. As core_kernel_text() is only true for .text and .init.text, with the latter depending on system_state < SYSTEM_RUNNING, we'll treat .exit.text as something that can be patched in-place. However, .exit.text is mapped read-only. Hence in this configuration the ftrace init code blows up while trying to patch one of the functions generated by KASAN. We could try to filter out the call sites in .exit.text rather than initializing them, but this would be inconsistent with how we handle .init.text, and requires hooking into core bits of ftrace. The behaviour of patch_map() is also inconsistent today, so instead let's clean that up and have it consistently handle .exit.text. This patch teaches patch_map() to handle .exit.text at init time, preventing the boot-time splat above. The flow of patch_map() is reworked to make the logic clearer and minimize redundant conditionality. Fixes: 3b23e4991fb66f6d ("arm64: implement ftrace with regs") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-12-04arm64: mm: Fix initialisation of DMA zones on non-NUMA systemsWill Deacon
John reports that the recently merged commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") breaks the boot on his DB845C board: | Booting Linux on physical CPU 0x0000000000 [0x517f803c] | Linux version 5.4.0-mainline-10675-g957a03b9e38f | Machine model: Thundercomm Dragonboard 845c | [...] | Built 1 zonelists, mobility grouping on. Total pages: -188245 | Kernel command line: earlycon | firmware_class.path=/vendor/firmware/ androidboot.hardware=db845c | init=/init androidboot.boot_devices=soc/1d84000.ufshc | printk.devkmsg=on buildvariant=userdebug root=/dev/sda2 | androidboot.bootdevice=1d84000.ufshc androidboot.serialno=c4e1189c | androidboot.baseband=sda | msm_drm.dsi_display0=dsi_lt9611_1080_video_display: | androidboot.slot_suffix=_a skip_initramfs rootwait ro init=/init | | <hangs indefinitely here> This is because, when CONFIG_NUMA=n, zone_sizes_init() fails to handle memblocks that fall entirely within the ZONE_DMA region and erroneously ends up trying to add a negatively-sized region into the following ZONE_DMA32, which is later interpreted as a large unsigned region by the core MM code. Rework the non-NUMA implementation of zone_sizes_init() so that the start address of the memblock being processed is adjusted according to the end of the previous zone, which is then range-checked before updating the hole information of subsequent zones. Cc: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/lkml/CALAqxLVVcsmFrDKLRGRq7GewcW405yTOxG=KR3csVzQ6bXutkA@mail.gmail.com Fixes: 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-12-04kvm: vmx: Stop wasting a page for guest_msrsJim Mattson
We will never need more guest_msrs than there are indices in vmx_msr_index. Thus, at present, the guest_msrs array will not exceed 168 bytes. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-04KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332)Paolo Bonzini
The bounds check was present in KVM_GET_SUPPORTED_CPUID but not KVM_GET_EMULATED_CPUID. Reported-by: syzbot+e3f4897236c4eeb8af4f@syzkaller.appspotmail.com Fixes: 84cffe499b94 ("kvm: Emulate MOVBE", 2013-10-29) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-04xen-blkback: allow module to be cleanly unloadedPaul Durrant
Add a module_exit() to perform the necessary clean-up. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-04xen/xenbus: reference count registered modulesPaul Durrant
To prevent a PV driver module being removed whilst attached to its other end, and hence xenbus calling into potentially invalid text, take a reference on the module before calling the probe() method (dropping it if unsuccessful) and drop the reference after returning from the remove() method. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-12-04docs/core-api: Remove possibly confusing sub-headings from Bit OperationsMichael Ellerman
The recent commit 81d2c6f81996 ("kasan: support instrumented bitops combined with generic bitops"), split the KASAN instrumented bitops into separate headers for atomic, non-atomic and locking operations. This was done to allow arches to include just the instrumented bitops they need, while also using some of the generic bitops in asm-generic/bitops (which are automatically instrumented). The generic bitops are already split into atomic, non-atomic and locking headers. This split required an update to kernel-api.rst because it included include/asm-generic/bitops-instrumented.h, which no longer exists. So now kernel-api.rst includes all three instrumented headers to get the definitions for all the bitops. When adding the three headers it seemed sensible to add sub-headings for each, ie. "Atomic", "Non-atomic" and "Locking". The confusion is that test_bit() is (and always has been) in non-atomic.h, but is documented elsewhere (atomic_bitops.txt) as being atomic. So having it appear under the "Non-atomic" heading is possibly confusing. Probably test_bit() should move from bitops/non-atomic.h to atomic.h, but that has flow on effects. For now just remove the newly added sub-headings in the documentation, so we at least aren't adding to the confusion about whether test_bit() is atomic or not. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-12-04Merge branches 'acpi-bus', 'acpi-button', 'acpi-sysfs' and 'acpi-misc'Rafael J. Wysocki
* acpi-bus: ACPI: bus: Fix NULL pointer check in acpi_bus_get_private_data() * acpi-button: ACPI: button: Add DMI quirk for Acer Switch 10 SW5-032 lid-switch * acpi-sysfs: ACPI: sysfs: Change ACPI_MASKABLE_GPE_MAX to 0x100 * acpi-misc: ACPI: Fix Kconfig indentation
2019-12-04Merge branches 'pm-sleep', 'pm-cpuidle', 'pm-cpufreq', 'pm-devfreq' and 'pm-avs'Rafael J. Wysocki
* pm-sleep: ACPI: PM: s2idle: Rework ACPI events synchronization ACPI: EC: Rework flushing of pending work * pm-cpuidle: cpuidle: minor Kconfig help text fixes cpuidle: Drop disabled field from struct cpuidle_state cpuidle: Fix Kconfig indentation * pm-cpufreq: cpufreq: Fix Kconfig indentation * pm-devfreq: PM / devfreq: Add missing locking while setting suspend_freq * pm-avs: power: avs: Fix Kconfig indentation
2019-12-04ia64: agp: Replace empty define with do whileCorentin Labbe
It's dangerous to use empty code define. Furthermore it lead to the following warning: drivers/char/agp/generic.c: In function « agp_generic_destroy_page »: drivers/char/agp/generic.c:1266:28: attention : suggest braces around empty body in an « if » statement [-Wempty-body] So let's replace emptyness by "do {} while(0)" Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/1574324085-4338-6-git-send-email-clabbe@baylibre.com