summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-10-29drm/panfrost: Don't dereference bogus MMU pointersRobin Murphy
It seems that killing an application while faults are occurring (particularly with a GPU in FPGA at a whopping 40MHz) can lead to handling a lingering page fault after all the address space contexts have already been freed. In this situation, the LRU list is empty so addr_to_drm_mm_node() ends up dereferencing the list head as if it were a struct panfrost_mmu entry; this leaves "mmu->as" actually pointing at the pfdev->alloc_mask bitmap, which is also empty, and given that the fault has a high likelihood of being in AS0, hilarity ensues. Sadly, the cleanest solution seems to involve another goto. Oh well, at least it's robust... Fixes: 65e51e30d862 ("drm/panfrost: Prevent race when handling page fault") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/9a0b09e6b5851f0d4428b72dd6b8b4c0d0ef4206.1572293305.git.robin.murphy@arm.com
2019-10-29drm/panfrost: fix -Wmissing-prototypes warningsYi Wang
We get these warnings when build kernel W=1: drivers/gpu/drm/panfrost/panfrost_perfcnt.c:35:6: warning: no previous prototype for ‘panfrost_perfcnt_clean_cache_done’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_perfcnt.c:40:6: warning: no previous prototype for ‘panfrost_perfcnt_sample_done’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_perfcnt.c:190:5: warning: no previous prototype for ‘panfrost_ioctl_perfcnt_enable’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_perfcnt.c:218:5: warning: no previous prototype for ‘panfrost_ioctl_perfcnt_dump’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_perfcnt.c:250:6: warning: no previous prototype for ‘panfrost_perfcnt_close’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_perfcnt.c:264:5: warning: no previous prototype for ‘panfrost_perfcnt_init’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_perfcnt.c:320:6: warning: no previous prototype for ‘panfrost_perfcnt_fini’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_mmu.c:227:6: warning: no previous prototype for ‘panfrost_mmu_flush_range’ [-Wmissing-prototypes] drivers/gpu/drm/panfrost/panfrost_mmu.c:435:5: warning: no previous prototype for ‘panfrost_mmu_map_fault_addr’ [-Wmissing-prototypes] For file panfrost_mmu.c, make functions static to fix this. For file panfrost_perfcnt.c, include header file can fix this. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Reviewed-by: Steven Price <steven.price@arm.com> Cc: stable@vger.kernel.org [robh: fixup function parameter alignment] Signed-off-by: Rob Herring <robh@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/1571967015-42854-1-git-send-email-wang.yi59@zte.com.cn
2019-10-29net: hisilicon: Fix "Trying to free already-free IRQ"Jiangfeng Xiao
When rmmod hip04_eth.ko, we can get the following warning: Task track: rmmod(1623)>bash(1591)>login(1581)>init(1) ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1623 at kernel/irq/manage.c:1557 __free_irq+0xa4/0x2ac() Trying to free already-free IRQ 200 Modules linked in: ping(O) pramdisk(O) cpuinfo(O) rtos_snapshot(O) interrupt_ctrl(O) mtdblock mtd_blkdevrtfs nfs_acl nfs lockd grace sunrpc xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables nf_reject_ipv CPU: 0 PID: 1623 Comm: rmmod Tainted: G O 4.4.193 #1 Hardware name: Hisilicon A15 [<c020b408>] (rtos_unwind_backtrace) from [<c0206624>] (show_stack+0x10/0x14) [<c0206624>] (show_stack) from [<c03f2be4>] (dump_stack+0xa0/0xd8) [<c03f2be4>] (dump_stack) from [<c021a780>] (warn_slowpath_common+0x84/0xb0) [<c021a780>] (warn_slowpath_common) from [<c021a7e8>] (warn_slowpath_fmt+0x3c/0x68) [<c021a7e8>] (warn_slowpath_fmt) from [<c026876c>] (__free_irq+0xa4/0x2ac) [<c026876c>] (__free_irq) from [<c0268a14>] (free_irq+0x60/0x7c) [<c0268a14>] (free_irq) from [<c0469e80>] (release_nodes+0x1c4/0x1ec) [<c0469e80>] (release_nodes) from [<c0466924>] (__device_release_driver+0xa8/0x104) [<c0466924>] (__device_release_driver) from [<c0466a80>] (driver_detach+0xd0/0xf8) [<c0466a80>] (driver_detach) from [<c0465e18>] (bus_remove_driver+0x64/0x8c) [<c0465e18>] (bus_remove_driver) from [<c02935b0>] (SyS_delete_module+0x198/0x1e0) [<c02935b0>] (SyS_delete_module) from [<c0202ed0>] (__sys_trace_return+0x0/0x10) ---[ end trace bb25d6123d849b44 ]--- Currently "rmmod hip04_eth.ko" call free_irq more than once as devres_release_all and hip04_remove both call free_irq. This results in a 'Trying to free already-free IRQ' warning. To solve the problem free_irq has been moved out of hip04_remove. Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29net: aquantia: fix unintention integer overflow on left shiftColin Ian King
Shifting the integer value 1 is evaluated using 32-bit arithmetic and then used in an expression that expects a 64-bit value, so there is potentially an integer overflow. Fix this by using the BIT_ULL macro to perform the shift and avoid the overflow. Addresses-Coverity: ("Unintentional integer overflow") Fixes: 04a1839950d9 ("net: aquantia: implement data PTP datapath") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29net: aquantia: fix spelling mistake: tx_queus -> tx_queuesColin Ian King
There is a spelling mistake in a netdev_err error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29fjes: Handle workqueue allocation failureWill Deacon
In the highly unlikely event that we fail to allocate either of the "/txrx" or "/control" workqueues, we should bail cleanly rather than blindly march on with NULL queue pointer(s) installed in the 'fjes_adapter' instance. Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Nicolas Waisman <nico@semmle.com> Link: https://lore.kernel.org/lkml/CADJ_3a8WFrs5NouXNqS5WYe7rebFP+_A5CheeqAyD_p7DFJJcg@mail.gmail.com/ Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-29arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003Bjorn Andersson
With the introduction of 'cce360b54ce6 ("arm64: capabilities: Filter the entries based on a given mask")' the Qualcomm Falkor/Kryo errata 1003 is no long applied. The result of not applying errata 1003 is that MSM8996 runs into various RCU stalls and fails to boot most of the times. Give 1003 a "type" to ensure they are not filtered out in update_cpu_capabilities(). Fixes: cce360b54ce6 ("arm64: capabilities: Filter the entries based on a given mask") Cc: stable@vger.kernel.org Reported-by: Mark Brown <broonie@kernel.org> Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Will Deacon <will@kernel.org>
2019-10-29drm/etnaviv: fix dumping of iommuv2Christian Gmeiner
etnaviv_iommuv2_dump_size(..) returns the number of PTE * SZ_4K but etnaviv_iommuv2_dump(..) increments buf pointer even if there is no PTE. This results in a bad buf pointer which gets used for memcpy(..), when copying the MMU state in the coredump buffer. Fixes: afb7b3b1deb4 ("drm/etnaviv: implement IOMMUv2 translation") Cc: stable@vger.kernel.org Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2019-10-29drm/etnaviv: reinstate MMUv1 command buffer window checkLucas Stach
The switch to per-process address spaces erroneously dropped the check which validated that the command buffer is mapped through the linear apperture as required by the hardware. This turned a system misconfiguration with a helpful error message into a very hard to debug issue. Reinstate the check at the appropriate location. Fixes: 17e4660ae3d7 (drm/etnaviv: implement per-process address spaces on MMUv2) Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-10-29drm/etnaviv: fix deadlock in GPU coredumpLucas Stach
The GPU coredump function violates the locking order by holding the MMU context lock while trying to acquire the etnaviv_gem_object lock. This results in a possible ABBA deadlock with other codepaths which follow the established locking order. Fortunately this is easy to fix by dropping the MMU context lock earlier, as the BO dumping doesn't need the MMU context to be stable. The only thing the BO dumping cares about are the BO mappings, which are stable across the lifetime of the job. Fixes: 27b67278e007 (drm/etnaviv: rework MMU handling) [ Not really the first bad commit, but the one where this fix applies cleanly. Stable kernels need a manual backport. ] Reported-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2019-10-29Merge tag 'fuse-fixes-5.4-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Mostly virtiofs fixes, but also fixes a regression and couple of longstanding data/metadata writeback ordering issues" * tag 'fuse-fixes-5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: redundant get_fuse_inode() calls in fuse_writepages_fill() fuse: Add changelog entries for protocols 7.1 - 7.8 fuse: truncate pending writes on O_TRUNC fuse: flush dirty data/metadata before non-truncate setattr virtiofs: Remove set but not used variable 'fc' virtiofs: Retry request submission from worker context virtiofs: Count pending forgets as in_flight forgets virtiofs: Set FR_SENT flag only after request has been sent virtiofs: No need to check fpq->connected state virtiofs: Do not end request in submission context fuse: don't advise readdirplus for negative lookup fuse: don't dereference req->args on finished request virtio-fs: don't show mount options virtio-fs: Change module name to virtiofs.ko
2019-10-29arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by defaultCatalin Marinas
Shared and writable mappings (__S.1.) should be clean (!dirty) initially and made dirty on a subsequent write either through the hardware DBM (dirty bit management) mechanism or through a write page fault. A clean pte for the arm64 kernel is one that has PTE_RDONLY set and PTE_DIRTY clear. The PAGE_SHARED{,_EXEC} attributes have PTE_WRITE set (PTE_DBM) and PTE_DIRTY clear. Prior to commit 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"), it was the responsibility of set_pte_at() to set the PTE_RDONLY bit and mark the pte clean if the software PTE_DIRTY bit was not set. However, the above commit removed the pte_sw_dirty() check and the subsequent setting of PTE_RDONLY in set_pte_at() while leaving the PAGE_SHARED{,_EXEC} definitions unchanged. The result is that shared+writable mappings are now dirty by default Fix the above by explicitly setting PTE_RDONLY in PAGE_SHARED{,_EXEC}. In addition, remove the superfluous PTE_DIRTY bit from the kernel PROT_* attributes. Fixes: 73e86cb03cf2 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()") Cc: <stable@vger.kernel.org> # 4.14.x- Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2019-10-29um-ubd: Entrust re-queue to the upper layersAnton Ivanov
Fixes crashes due to ubd requeue logic conflicting with the block-mq logic. Crash is reproducible in 5.0 - 5.3. Fixes: 53766defb8c8 ("um: Clean-up command processing in UML UBD driver") Cc: stable@vger.kernel.org # v5.0+ Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29nvme-multipath: remove unused groups_only mode in ana logAnton Eidelman
groups_only mode in nvme_read_ana_log() is no longer used: remove it. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29nvme-multipath: fix possible io hang after ctrl reconnectAnton Eidelman
The following scenario results in an IO hang: 1) ctrl completes a request with NVME_SC_ANA_TRANSITION. NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered. 2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl. This fails because ctrl disconnects. Therefore nvme_update_ns_ana_state() is not called and NVME_NS_ANA_PENDING bit in ns->flags is not cleared. 3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls nvme_read_ana_log(ctrl, groups_only=true). However, nvme_update_ana_state() does not update namespaces because nr_nsids = 0 (due to groups_only mode). 4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK. Result: The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set. Consequently ctrl will never be considered a viable path by __nvme_find_path(). IO will hang if ctrl is the only or the last path to the namespace. More generally, while ctrl is reconnecting, its ANA state may change. And because nvme_mpath_init() requests ANA log in groups_only mode, these changes are not propagated to the existing ctrl namespaces. This may result in a mal-function or an IO hang. Solution: nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false. This will not harm the new ctrl case (no namespaces present), and will make sure the ANA state of namespaces gets updated after reconnect. Note: Another option would be for nvme_mpath_init() to invoke nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29libbpf: Don't use kernel-side u32 type in xsk.cAndrii Nakryiko
u32 is a kernel-side typedef. User-space library is supposed to use __u32. This breaks Github's projection of libbpf. Do u32 -> __u32 fix. Fixes: 94ff9ebb49a5 ("libbpf: Fix compatibility for kernels without need_wakeup") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20191029055953.2461336-1-andriin@fb.com
2019-10-29sched/topology: Allow sched_asym_cpucapacity to be disabledValentin Schneider
While the static key is correctly initialized as being disabled, it will remain forever enabled once turned on. This means that if we start with an asymmetric system and hotplug out enough CPUs to end up with an SMP system, the static key will remain set - which is obviously wrong. We should detect this and turn off things like misfit migration and capacity aware wakeups. As Quentin pointed out, having separate root domains makes this slightly trickier. We could have exclusive cpusets that create an SMP island - IOW, the domains within this root domain will not see any asymmetry. This means we can't just disable the key on domain destruction, we need to count how many asymmetric root domains we have. Consider the following example using Juno r0 which is 2+4 big.LITTLE, where two identical cpusets are created: they both span both big and LITTLE CPUs: asym0 asym1 [ ][ ] L L B L L B $ cgcreate -g cpuset:asym0 $ cgset -r cpuset.cpus=0,1,3 asym0 $ cgset -r cpuset.mems=0 asym0 $ cgset -r cpuset.cpu_exclusive=1 asym0 $ cgcreate -g cpuset:asym1 $ cgset -r cpuset.cpus=2,4,5 asym1 $ cgset -r cpuset.mems=0 asym1 $ cgset -r cpuset.cpu_exclusive=1 asym1 $ cgset -r cpuset.sched_load_balance=0 . (the CPU numbering may look odd because on the Juno LITTLEs are CPUs 0,3-5 and bigs are CPUs 1-2) If we make one of those SMP (IOW remove asymmetry) by e.g. hotplugging its big core, we would end up with an SMP cpuset and an asymmetric cpuset - the static key must remain set, because we still have one asymmetric root domain. With the above example, this could be done with: $ echo 0 > /sys/devices/system/cpu/cpu2/online Which would result in: asym0 asym1 [ ][ ] L L B L L When both SMP and asymmetric cpusets are present, all CPUs will observe sched_asym_cpucapacity being set (it is system-wide), but not all CPUs observe asymmetry in their sched domain hierarchy: per_cpu(sd_asym_cpucapacity, <any CPU in asym0>) == <some SD at DIE level> per_cpu(sd_asym_cpucapacity, <any CPU in asym1>) == NULL Change the simple key enablement to an increment, and decrement the key counter when destroying domains that cover asymmetric CPUs. Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Dietmar.Eggemann@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hannes@cmpxchg.org Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: qperret@google.com Cc: tj@kernel.org Cc: vincent.guittot@linaro.org Fixes: df054e8445a4 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations") Link: https://lkml.kernel.org/r/20191023153745.19515-3-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-29sched/topology: Don't try to build empty sched domainsValentin Schneider
Turns out hotplugging CPUs that are in exclusive cpusets can lead to the cpuset code feeding empty cpumasks to the sched domain rebuild machinery. This leads to the following splat: Internal error: Oops: 96000004 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23 Hardware name: ARM Juno development board (r0) (DT) Workqueue: events cpuset_hotplug_workfn pstate: 60000005 (nZCv daif -PAN -UAO) pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969) lr : build_sched_domains (kernel/sched/topology.c:1966) Call trace: build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969) partition_sched_domains_locked (kernel/sched/topology.c:2250) rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019) rebuild_sched_domains (kernel/cgroup/cpuset.c:1032) cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2)) process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274) worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416) kthread (kernel/kthread.c:255) ret_from_fork (arch/arm64/kernel/entry.S:1167) Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853) The faulty line in question is: cap = arch_scale_cpu_capacity(cpumask_first(cpu_map)); and we're not checking the return value against nr_cpu_ids (we shouldn't have to!), which leads to the above. Prevent generate_sched_domains() from returning empty cpumasks, and add some assertion in build_sched_domains() to scream bloody murder if it happens again. The above splat was obtained on my Juno r0 with the following reproducer: $ cgcreate -g cpuset:asym $ cgset -r cpuset.cpus=0-3 asym $ cgset -r cpuset.mems=0 asym $ cgset -r cpuset.cpu_exclusive=1 asym $ cgcreate -g cpuset:smp $ cgset -r cpuset.cpus=4-5 smp $ cgset -r cpuset.mems=0 smp $ cgset -r cpuset.cpu_exclusive=1 smp $ cgset -r cpuset.sched_load_balance=0 . $ echo 0 > /sys/devices/system/cpu/cpu4/online $ echo 0 > /sys/devices/system/cpu/cpu5/online Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar.Eggemann@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: hannes@cmpxchg.org Cc: lizefan@huawei.com Cc: morten.rasmussen@arm.com Cc: qperret@google.com Cc: tj@kernel.org Cc: vincent.guittot@linaro.org Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection") Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-28libbpf: Fix off-by-one error in ELF sanity checkAndrii Nakryiko
libbpf's bpf_object__elf_collect() does simple sanity check after iterating over all ELF sections, if checks that .strtab index is correct. Unfortunately, due to section indices being 1-based, the check breaks for cases when .strtab ends up being the very last section in ELF. Fixes: 77ba9a5b48a7 ("tools lib bpf: Fetch map names from correct strtab") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191028233727.1286699-1-andriin@fb.com
2019-10-28libbpf: Fix compatibility for kernels without need_wakeupMagnus Karlsson
When the need_wakeup flag was added to AF_XDP, the format of the XDP_MMAP_OFFSETS getsockopt was extended. Code was added to the kernel to take care of compatibility issues arrising from running applications using any of the two formats. However, libbpf was not extended to take care of the case when the application/libbpf uses the new format but the kernel only supports the old format. This patch adds support in libbpf for parsing the old format, before the need_wakeup flag was added, and emulating a set of static need_wakeup flags that will always work for the application. v2 -> v3: * Incorporated code improvements suggested by Jonathan Lemon v1 -> v2: * Rebased to bpf-next * Rewrote the code as the previous version made you blind Fixes: a4500432c2587cb2a ("libbpf: add support for need_wakeup flag in AF_XDP part") Reported-by: Eloy Degen <degeneloy@gmail.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1571995035-21889-1-git-send-email-magnus.karlsson@intel.com
2019-10-28atm: remove unneeded semicolonYueHaibing
remove unneeded semicolon. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28Merge tag 'batadv-net-for-davem-20191025' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== Here are two batman-adv bugfixes: * Fix free/alloc race for OGM and OGMv2, by Sven Eckelmann (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28sock: remove unneeded semicolonYueHaibing
remove unneeded semicolon. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: mediatek: remove unneeded semicolonYueHaibing
remove unneeded semicolon. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28mlxsw: spectrum_buffers: remove unneeded semicolonYueHaibing
Remove excess semicolon after closing parenthesis. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: usb: lan78xx: Disable interrupts before calling generic_handle_irq()Daniel Wagner
lan78xx_status() will run with interrupts enabled due to the change in ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler"). generic_handle_irq() expects to be run with IRQs disabled. [ 4.886203] 000: irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts [ 4.886243] 000: WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x154/0x168 [ 4.896294] 000: Modules linked in: [ 4.896301] 000: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.6 #39 [ 4.896310] 000: Hardware name: Raspberry Pi 3 Model B+ (DT) [ 4.896315] 000: pstate: 60000005 (nZCv daif -PAN -UAO) [ 4.896321] 000: pc : __handle_irq_event_percpu+0x154/0x168 [ 4.896331] 000: lr : __handle_irq_event_percpu+0x154/0x168 [ 4.896339] 000: sp : ffff000010003cc0 [ 4.896346] 000: x29: ffff000010003cc0 x28: 0000000000000060 [ 4.896355] 000: x27: ffff000011021980 x26: ffff00001189c72b [ 4.896364] 000: x25: ffff000011702bc0 x24: ffff800036d6e400 [ 4.896373] 000: x23: 000000000000004f x22: ffff000010003d64 [ 4.896381] 000: x21: 0000000000000000 x20: 0000000000000002 [ 4.896390] 000: x19: ffff8000371c8480 x18: 0000000000000060 [ 4.896398] 000: x17: 0000000000000000 x16: 00000000000000eb [ 4.896406] 000: x15: ffff000011712d18 x14: 7265746e69206465 [ 4.896414] 000: x13: ffff000010003ba0 x12: ffff000011712df0 [ 4.896422] 000: x11: 0000000000000001 x10: ffff000011712e08 [ 4.896430] 000: x9 : 0000000000000001 x8 : 000000000003c920 [ 4.896437] 000: x7 : ffff0000118cc410 x6 : ffff0000118c7f00 [ 4.896445] 000: x5 : 000000000003c920 x4 : 0000000000004510 [ 4.896453] 000: x3 : ffff000011712dc8 x2 : 0000000000000000 [ 4.896461] 000: x1 : 73a3f67df94c1500 x0 : 0000000000000000 [ 4.896466] 000: Call trace: [ 4.896471] 000: __handle_irq_event_percpu+0x154/0x168 [ 4.896481] 000: handle_irq_event_percpu+0x50/0xb0 [ 4.896489] 000: handle_irq_event+0x40/0x98 [ 4.896497] 000: handle_simple_irq+0xa4/0xf0 [ 4.896505] 000: generic_handle_irq+0x24/0x38 [ 4.896513] 000: intr_complete+0xb0/0xe0 [ 4.896525] 000: __usb_hcd_giveback_urb+0x58/0xd8 [ 4.896533] 000: usb_giveback_urb_bh+0xd0/0x170 [ 4.896539] 000: tasklet_action_common.isra.0+0x9c/0x128 [ 4.896549] 000: tasklet_hi_action+0x24/0x30 [ 4.896556] 000: __do_softirq+0x120/0x23c [ 4.896564] 000: irq_exit+0xb8/0xd8 [ 4.896571] 000: __handle_domain_irq+0x64/0xb8 [ 4.896579] 000: bcm2836_arm_irqchip_handle_irq+0x60/0xc0 [ 4.896586] 000: el1_irq+0xb8/0x140 [ 4.896592] 000: arch_cpu_idle+0x10/0x18 [ 4.896601] 000: do_idle+0x200/0x280 [ 4.896608] 000: cpu_startup_entry+0x20/0x28 [ 4.896615] 000: rest_init+0xb4/0xc0 [ 4.896623] 000: arch_call_rest_init+0xc/0x14 [ 4.896632] 000: start_kernel+0x454/0x480 Fixes: ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler") Cc: Woojung Huh <woojung.huh@microchip.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stefan Wahren <wahrenst@gmx.net> Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Miller <davem@davemloft.net> Signed-off-by: Daniel Wagner <dwagner@suse.de> Tested-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dsa: sja1105: improve NET_DSA_SJA1105_TAS dependencyArnd Bergmann
An earlier bugfix introduced a dependency on CONFIG_NET_SCH_TAPRIO, but this missed the case of NET_SCH_TAPRIO=m and NET_DSA_SJA1105=y, which still causes a link error: drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_setup_tc_taprio': sja1105_tas.c:(.text+0x5c): undefined reference to `taprio_offload_free' sja1105_tas.c:(.text+0x3b4): undefined reference to `taprio_offload_get' drivers/net/dsa/sja1105/sja1105_tas.o: In function `sja1105_tas_teardown': sja1105_tas.c:(.text+0x6ec): undefined reference to `taprio_offload_free' Change the dependency to only allow selecting the TAS code when it can link against the taprio code. Fixes: a8d570de0cc6 ("net: dsa: sja1105: Add dependency for NET_DSA_SJA1105_TAS") Fixes: 317ab5b86c8e ("net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: ethernet: ftgmac100: Fix DMA coherency issue with SW checksumBenjamin Herrenschmidt
We are calling the checksum helper after the dma_map_single() call to map the packet. This is incorrect as the checksumming code will touch the packet from the CPU. This means the cache won't be properly flushes (or the bounce buffering will leave us with the unmodified packet to DMA). This moves the calculation of the checksum & vlan tags to before the DMA mapping. This also has the side effect of fixing another bug: If the checksum helper fails, we goto "drop" to drop the packet, which will not unmap the DMA mapping. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Fixes: 05690d633f30 ("ftgmac100: Upgrade to NETIF_F_HW_CSUM") Reviewed-by: Vijay Khemka <vijaykhemka@fb.com> Tested-by: Vijay Khemka <vijaykhemka@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28Merge branch 'mv88e6xxx-Allow-config-of-ATU-hash-algorithm'David S. Miller
Andrew Lunn says: ==================== mv88e6xxx: Allow config of ATU hash algorithm v2: Pass a pointer for where the hash should be stored, return a plain errno, or 0. Document the parameter. v3: Document type of parameter, and valid range Add break statements to default clause of switch Directly use ctx->val.vu8 v4: Consistently use devlink, not a mix of devlink and dl. Fix allocation of devlink priv Remove upper case from parameter name Make mask 16 bit wide. v5: Back to using the parameter name ATU_hash v6: Rebase net-next/master ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.Andrew Lunn
Some of the marvell switches have bits controlling the hash algorithm the ATU uses for MAC addresses. In some industrial settings, where all the devices are from the same manufacture, and hence use the same OUI, the default hashing algorithm is not optimal. Allow the other algorithms to be selected via devlink. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dsa: Add support for devlink device parametersAndrew Lunn
Add plumbing to allow DSA drivers to register parameters with devlink. To keep with the abstraction, the DSA drivers pass the ds structure to these helpers, and the DSA core then translates that to the devlink structure associated to the device. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28r8169: use helper rtl_hw_aspm_clkreq_enable also in rtl_hw_start_8168g_2Heiner Kallweit
One place in the driver was left where the open-coded functionality hasn't been replaced with helper rtl_hw_aspm_clkreq_enable yet. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: fix sk_page_frag() recursion from memory reclaimTejun Heo
sk_page_frag() optimizes skb_frag allocations by using per-task skb_frag cache when it knows it's the only user. The condition is determined by seeing whether the socket allocation mask allows blocking - if the allocation may block, it obviously owns the task's context and ergo exclusively owns current->task_frag. Unfortunately, this misses recursion through memory reclaim path. Please take a look at the following backtrace. [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10 ... tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 sock_xmit.isra.24+0xa1/0x170 [nbd] nbd_send_cmd+0x1d2/0x690 [nbd] nbd_queue_rq+0x1b5/0x3b0 [nbd] __blk_mq_try_issue_directly+0x108/0x1b0 blk_mq_request_issue_directly+0xbd/0xe0 blk_mq_try_issue_list_directly+0x41/0xb0 blk_mq_sched_insert_requests+0xa2/0xe0 blk_mq_flush_plug_list+0x205/0x2a0 blk_flush_plug_list+0xc3/0xf0 [1] blk_finish_plug+0x21/0x2e _xfs_buf_ioapply+0x313/0x460 __xfs_buf_submit+0x67/0x220 xfs_buf_read_map+0x113/0x1a0 xfs_trans_read_buf_map+0xbf/0x330 xfs_btree_read_buf_block.constprop.42+0x95/0xd0 xfs_btree_lookup_get_block+0x95/0x170 xfs_btree_lookup+0xcc/0x470 xfs_bmap_del_extent_real+0x254/0x9a0 __xfs_bunmapi+0x45c/0xab0 xfs_bunmapi+0x15/0x30 xfs_itruncate_extents_flags+0xca/0x250 xfs_free_eofblocks+0x181/0x1e0 xfs_fs_destroy_inode+0xa8/0x1b0 destroy_inode+0x38/0x70 dispose_list+0x35/0x50 prune_icache_sb+0x52/0x70 super_cache_scan+0x120/0x1a0 do_shrink_slab+0x120/0x290 shrink_slab+0x216/0x2b0 shrink_node+0x1b6/0x4a0 do_try_to_free_pages+0xc6/0x370 try_to_free_mem_cgroup_pages+0xe3/0x1e0 try_charge+0x29e/0x790 mem_cgroup_charge_skmem+0x6a/0x100 __sk_mem_raise_allocated+0x18e/0x390 __sk_mem_schedule+0x2a/0x40 [0] tcp_sendmsg_locked+0x8eb/0xe10 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x26d/0x2b0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In [0], tcp_send_msg_locked() was using current->page_frag when it called sk_wmem_schedule(). It already calculated how many bytes can be fit into current->page_frag. Due to memory pressure, sk_wmem_schedule() called into memory reclaim path which called into xfs and then IO issue path. Because the filesystem in question is backed by nbd, the control goes back into the tcp layer - back into tcp_sendmsg_locked(). nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes sense - it's in the process of freeing memory and wants to be able to, e.g., drop clean pages to make forward progress. However, this confused sk_page_frag() called from [2]. Because it only tests whether the allocation allows blocking which it does, it now thinks current->page_frag can be used again although it already was being used in [0]. After [2] used current->page_frag, the offset would be increased by the used amount. When the control returns to [0], current->page_frag's offset is increased and the previously calculated number of bytes now may overrun the end of allocated memory leading to silent memory corruptions. Fix it by adding gfpflags_normal_context() which tests sleepable && !reclaim and use it to determine whether to use current->task_frag. v2: Eric didn't like gfp flags being tested twice. Introduce a new helper gfpflags_normal_context() and combine the two tests. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28Merge branch 'net-dsa-b53-Add-support-for-MDB'David S. Miller
Florian Fainelli says: ==================== net: dsa: b53: Add support for MDB This patch series adds support for programming multicast database entries on b53 and bcm_sf2. This is extracted from a previously submitted series that added managed mode support, but these patches are usable in isolation. The larger series still needs to be reworked. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dsa: bcm_sf2: Wire up MDB operationsFlorian Fainelli
Leverage the recently add b53_mdb_{add,del,prepare} functions since they work as-is for bcm_sf2. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dsa: b53: Add support for MDBFlorian Fainelli
In preparation for supporting IGMP snooping with or without the use of a bridge, add support within b53_common.c to program the ARL entries for multicast operations. The key difference is that a multicast ARL entry is comprised of a bitmask of enabled ports, instead of a port number. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28udp: fix data-race in udp_set_dev_scratch()Eric Dumazet
KCSAN reported a data-race in udp_set_dev_scratch() [1] The issue here is that we must not write over skb fields if skb is shared. A similar issue has been fixed in commit 89c22d8c3b27 ("net: Fix skb csum races when peeking") While we are at it, use a helper only dealing with udp_skb_scratch(skb)->csum_unnecessary, as this allows udp_set_dev_scratch() to be called once and thus inlined. [1] BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1: udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308 __first_packet_length+0x147/0x420 net/ipv4/udp.c:1556 first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579 udp_poll+0xea/0x110 net/ipv4/udp.c:2720 sock_poll+0xed/0x250 net/socket.c:1256 vfs_poll include/linux/poll.h:90 [inline] do_select+0x7d0/0x1020 fs/select.c:534 core_sys_select+0x381/0x550 fs/select.c:677 do_pselect.constprop.0+0x11d/0x160 fs/select.c:759 __do_sys_pselect6 fs/select.c:784 [inline] __se_sys_pselect6 fs/select.c:769 [inline] __x64_sys_pselect6+0x12e/0x170 fs/select.c:769 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0: udp_skb_csum_unnecessary include/net/udp.h:358 [inline] udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310 inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680 __do_sys_recvmmsg net/socket.c:2703 [inline] __se_sys_recvmmsg net/socket.c:2696 [inline] __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28Merge branch 'mvpp2-improvements-in-rx-path'David S. Miller
Matteo Croce says: ==================== mvpp2 improvements in rx path Refactor some code in the RX path to allow prefetching some data from the packet header. The first patch is only a refactor, the second one reduces the data synced, while the third one adds the prefetch. The packet rate improvement with the second patch is very small (1606 => 1620 kpps), while the prefetch bumps it up by 14%: 1620 => 1853 kpps. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28mvpp2: prefetch frame headerMatteo Croce
When receiving traffic, eth_type_trans() is high up on the perf top list, because it's the first function which access the packet data. Move the DMA unmap a bit higher, and put a prefetch just after it, so we have more time to load the data into the cache. The packet rate increase is about 14% with a tc drop test: 1620 => 1853 kpps Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28mvpp2: sync only the received frameMatteo Croce
In the RX path we always sync against the maximum frame size for that pool. Do the DMA sync and the unmap separately, so we can only sync by the size of the received frame. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28mvpp2: refactor frame drop routineMatteo Croce
Move some code down to remove a backward goto. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28isdn: hfcsusb: Spelling and grammar fixesGeert Uytterhoeven
Fix misspellings of "endpoints", "configuration", and "device's". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28tipc: Spelling s/enpoint/endpoint/Geert Uytterhoeven
Fix misspelling of "endpoint". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: Fix various misspellings of "connect"Geert Uytterhoeven
Fix misspellings of "disconnect", "disconnecting", "connections", and "disconnected". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Kalle Valo <kvalo@codeaurora.org> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: Fix misspellings of "configure" and "configuration"Geert Uytterhoeven
Fix various misspellings of "configuration" and "configure". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dpaa2: Use the correct style for SPDX License IdentifierNishad Kamdar
This patch corrects the SPDX License Identifier style in header files related to DPAA2 Ethernet driver supporting Freescale SoCs with DPAA2. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used) Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dsa: qca8k: Initialize the switch with correct number of portsMichal Vokáč
Since commit 0394a63acfe2 ("net: dsa: enable and disable all ports") the dsa core disables all unused ports of a switch. In this case disabling ports with numbers higher than QCA8K_NUM_PORTS causes that some switch registers are overwritten with incorrect content. To fix this, initialize the dsa_switch->num_ports with correct number of ports. Fixes: 7e99e3470172 ("net: dsa: remove dsa_switch_alloc helper") Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: dsa: fix dereference on ds->dev before null check errorColin Ian King
Currently ds->dev is dereferenced on the assignments of pdata and np before ds->dev is null checked, hence there is a potential null pointer dereference on ds->dev. Fix this by assigning pdata and np after the ds->dev null pointer sanity check. Addresses-Coverity: ("Dereference before null check") Fixes: 7e99e3470172 ("net: dsa: remove dsa_switch_alloc helper") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28Merge branch 'net-avoid-KCSAN-splats'David S. Miller
Eric Dumazet says: ==================== net: avoid KCSAN splats Often times we use skb_queue_empty() without holding a lock, meaning that other cpus (or interrupt) can change the queue under us. This is fine, but we need to properly annotate the lockless intent to make sure the compiler wont over optimize things. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-28net: add READ_ONCE() annotation in __skb_wait_for_more_packets()Eric Dumazet
__skb_wait_for_more_packets() can be called while other cpus can feed packets to the socket receive queue. KCSAN reported : BUG: KCSAN: data-race in __skb_wait_for_more_packets / __udp_enqueue_schedule_skb write to 0xffff888102e40b58 of 8 bytes by interrupt on cpu 0: __skb_insert include/linux/skbuff.h:1852 [inline] __skb_queue_before include/linux/skbuff.h:1958 [inline] __skb_queue_tail include/linux/skbuff.h:1991 [inline] __udp_enqueue_schedule_skb+0x2d7/0x410 net/ipv4/udp.c:1470 __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline] udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057 udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074 udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233 __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300 udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 read to 0xffff888102e40b58 of 8 bytes by task 13035 on cpu 1: __skb_wait_for_more_packets+0xfa/0x320 net/core/datagram.c:100 __skb_recv_udp+0x374/0x500 net/ipv4/udp.c:1683 udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838 sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871 ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480 do_recvmmsg+0x19a/0x5c0 net/socket.c:2601 __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680 __do_sys_recvmmsg net/socket.c:2703 [inline] __se_sys_recvmmsg net/socket.c:2696 [inline] __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 13035 Comm: syz-executor.3 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>