summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-02-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (kasan, mremap, tmpfs, selftests, memcg, and slub), MAINTAINERS, squashfs, nilfs2, and firmware" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: nilfs2: make splice write available again mm, slub: better heuristic for number of cpus when calculating slab order Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" MAINTAINERS: update Andrey Ryabinin's email address selftests/vm: rename file run_vmtests to run_vmtests.sh tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 mm/mremap: fix BUILD_BUG_ON() error in get_extent firmware_loader: align .builtin_fw to 8 kasan: fix stack traces dependency for HW_TAGS squashfs: add more sanity checks in xattr id lookup squashfs: add more sanity checks in inode lookup squashfs: add more sanity checks in id lookup squashfs: avoid out of bounds writes in decompressors
2021-02-10nilfs2: make splice write available againJoachim Henke
Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was caused by commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops"). This patch initializes the splice_write field in file_operations, like most file systems do, to restore the functionality. Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Joachim Henke <joachim.henke@t-systems.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [5.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-10mm, slub: better heuristic for number of cpus when calculating slab orderVlastimil Babka
When creating a new kmem cache, SLUB determines how large the slab pages will based on number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/free from per-cpu slabs before accessing shared structures, but also potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using number of CPUs is that larger systems will be more likely to benefit from reduced contention, and also should have enough memory to spare. Number of CPUs used to be determined as nr_cpu_ids, which is number of possible cpus, but on some systems many will never be onlined, thus commit 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order") changed it to nr_online_cpus(). However, for kmem caches created early before CPUs are onlined, this may lead to permamently low slab page sizes. Vincent reports a regression [1] of hackbench on arm64 systems: "I'm facing significant performances regression on a large arm64 server system (224 CPUs). Regressions is also present on small arm64 system (8 CPUs) but in a far smaller order of magnitude On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16 v5.11-rc4 : 9.135sec (+/- 0.45%) v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%) v5.10: 3.136sec (+/- 0.40%)" Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting page allocator contention: "i.e. the patch incurs a 7% to 32% performance penalty. This bisected cleanly yesterday when I was looking for the regression and then found the thread. Numerous caches change size. For example, kmalloc-512 goes from order-0 (vanilla) to order-2 with the revert. So mostly this is down to the number of times SLUB calls into the page allocator which only caches order-0 pages on a per-cpu basis" Clearly num_online_cpus() doesn't work too early in bootup. We could change the order dynamically in a memory hotplug callback, but runtime order changing for existing kmem caches has been already shown as dangerous, and removed in 32a6f409b693 ("mm, slub: remove runtime allocation order changes"). It could be resurrected in a safe manner with some effort, but to fix the regression we need something simpler. We could use num_present_cpus() that should be the number of physically present CPUs even before they are onlined. That would work for PowerPC [3], which triggered the original commit, but that still doesn't work on arm64 [4] as explained in [5]. So this patch tries to determine the best available value without specific arch knowledge. - num_present_cpus() if the number is larger than 1, as that means the arch is likely setting it properly - nr_cpu_ids otherwise This should fix the reported regressions while also keeping the effect of 045ab8c9487b for PowerPC systems. It's possible there are configurations where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time bloated, so these (if they exist) would keep the large orders based on nr_cpu_ids as was before 045ab8c9487b. [1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/ [2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/ [3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/ [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/ [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/ Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Vincent Guittot <vincent.guittot@linaro.org> Reported-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Mel Gorman <mgorman@techsingularity.net> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jann Horn <jannh@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-10gpio: ep93xx: Fix single irqchip with multi gpiochipsNikita Shubin
Fixes the following warnings which results in interrupts disabled on port B/F: gpio gpiochip1: (B): detected irqchip that is shared with multiple gpiochips: please fix the driver. gpio gpiochip5: (F): detected irqchip that is shared with multiple gpiochips: please fix the driver. - added separate irqchip for each interrupt capable gpiochip - provided unique names for each irqchip Fixes: d2b091961510 ("gpio: ep93xx: Pass irqchip when adding gpiochip") Cc: <stable@vger.kernel.org> Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-02-10gpio: ep93xx: fix BUG_ON port F usageNikita Shubin
Two index spaces and ep93xx_gpio_port are confusing. Instead add a separate struct to store necessary data and remove ep93xx_gpio_port. - add struct to store IRQ related data for each IRQ capable chip - replace offset array with defined offsets - add IRQ registers offset for each IRQ capable chip into ep93xx_gpio_banks ------------[ cut here ]------------ kernel BUG at drivers/gpio/gpio-ep93xx.c:64! ---[ end trace 3f6544e133e9f5ae ]--- Fixes: fd935fc421e74 ("gpio: ep93xx: Do not pingpong irq numbers") Cc: <stable@vger.kernel.org> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@gmail.com> Signed-off-by: Nikita Shubin <nikita.shubin@maquefel.me> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-02-10gpio: mxs: GPIO_MXS should not default to y unconditionallyGeert Uytterhoeven
Merely enabling CONFIG_COMPILE_TEST should not enable additional code. To fix this, restrict the automatic enabling of GPIO_MXS to ARCH_MXS, and ask the user in case of compile-testing. Fixes: 6876ca311bfca5d7 ("gpio: mxs: add COMPILE_TEST support for GPIO_MXS") Cc: <stable@vger.kernel.org> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
2021-02-10drm/sun4i: dw-hdmi: Fix max. frequency for H6Jernej Skrabec
It turns out that reasoning for lowering max. supported frequency is wrong. Scrambling works just fine. Several now fixed bugs prevented proper functioning, even with rates lower than 340 MHz. Issues were just more pronounced with higher frequencies. Fix that by allowing max. supported frequency in HW and fix the comment. Fixes: cd9063757a22 ("drm/sun4i: DW HDMI: Lower max. supported rate for H6") Reviewed-by: Chen-Yu Tsai <wens@csie.org> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-6-jernej.skrabec@siol.net
2021-02-10drm/sun4i: Fix H6 HDMI PHY configurationJernej Skrabec
As it turns out, vendor HDMI PHY driver for H6 has a pretty big table of predefined values for various pixel clocks. However, most of them are not useful/tested because they come from reference driver code. Vendor PHY driver is concerned with only few of those, namely 27 MHz, 74.25 MHz, 148.5 MHz, 297 MHz and 594 MHz. These are all frequencies for standard CEA modes. Fix sun50i_h6_cur_ctr and sun50i_h6_phy_config with the values only for aforementioned frequencies. Table sun50i_h6_mpll_cfg doesn't need to be changed because values are actually frequency dependent and not so much SoC dependent. See i.MX6 documentation for explanation of those values for similar PHY. Fixes: c71c9b2fee17 ("drm/sun4i: Add support for Synopsys HDMI PHY") Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-5-jernej.skrabec@siol.net
2021-02-10drm/sun4i: dw-hdmi: always set clock rateJernej Skrabec
As expected, HDMI controller clock should always match pixel clock. In the past, changing HDMI controller rate would seemingly worsen situation. However, that was the result of other bugs which are now fixed. Fix that by removing set_rate quirk and always set clock rate. Fixes: 40bb9d3147b2 ("drm/sun4i: Add support for H6 DW HDMI controller") Reviewed-by: Chen-Yu Tsai <wens@csie.org> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-4-jernej.skrabec@siol.net
2021-02-10drm/sun4i: tcon: set sync polarity for tcon1 channelJernej Skrabec
Channel 1 has polarity bits for vsync and hsync signals but driver never sets them. It turns out that with pre-HDMI2 controllers seemingly there is no issue if polarity is not set. However, with HDMI2 controllers (H6) there often comes to de-synchronization due to phase shift. This causes flickering screen. It's safe to assume that similar issues might happen also with pre-HDMI2 controllers. Solve issue with setting vsync and hsync polarity. Note that display stacks with tcon top have polarity bits actually in tcon0 polarity register. Fixes: 9026e0d122ac ("drm: Add Allwinner A10 Display Engine support") Reviewed-by: Chen-Yu Tsai <wens@csie.org> Tested-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210209175900.7092-3-jernej.skrabec@siol.net
2021-02-10drm/i915: Fix overlay frontbuffer trackingVille Syrjälä
We don't have a persistent fb holding a reference to the frontbuffer object, so every time we do the get+put we throw the frontbuffer object immediately away. And so the next time around we get a pristine frontbuffer object with bits==0 even for the old vma. This confuses the frontbuffer tracking code which understandably expects the old frontbuffer to have the overlay's bit set. Fix this by hanging on to the frontbuffer reference until the next flip. And just to make this a bit more clear let's track the frontbuffer explicitly instead of just grabbing it via the old vma. Cc: stable@vger.kernel.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1136 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210209021918.16234-2-ville.syrjala@linux.intel.com Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit 553c23bdb4775130f333f07a51b047276bc53f79) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-02-09Revert "drm/amd/display: Update NV1x SR latency values"Alex Deucher
This reverts commit 4a3dea8932d3b1199680d2056dd91d31d94d70b7. This causes blank screens for some users. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1482 Cc: Alvin Lee <alvin.lee2@amd.com> Cc: Jun Lei <Jun.Lei@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2021-02-10 The following pull-request contains BPF updates for your *net* tree. We've added 5 non-merge commits during the last 8 day(s) which contain a total of 3 files changed, 22 insertions(+), 21 deletions(-). The main changes are: 1) Fix missed execution of kprobes BPF progs when kprobe is firing via int3, from Alexei Starovoitov. 2) Fix potential integer overflow in map max_entries for stackmap on 32 bit archs, from Bui Quang Minh. 3) Fix a verifier pruning and a insn rewrite issue related to 32 bit ops, from Daniel Borkmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> c# Please enter a commit message to explain why this merge is necessary,
2021-02-09cifs: do not disable noperm if multiuser mount option is not providedRonnie Sahlberg
Fixes small regression in implementation of new mount API. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reported-by: Hyunchul Lee <hyc.lee@gmail.com> Tested-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-09Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"Johannes Weiner
This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Acked-by: Chris Down <chris@chrisdown.name> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09MAINTAINERS: update Andrey Ryabinin's email addressAndrey Ryabinin
Update my email, @virtuozzo.com will stop working shortly. Link: https://lkml.kernel.org/r/20210204223904.3824-1-ryabinin.a.a@gmail.com Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09selftests/vm: rename file run_vmtests to run_vmtests.shRong Chen
Commit c2aa8afc36fa has renamed run_vmtests in Makefile, but the file still uses the old name. The kernel test robot reported the following issue: # selftests: vm: run_vmtests.sh # Warning: file run_vmtests.sh is missing! not ok 1 selftests: vm: run_vmtests.sh Link: https://lkml.kernel.org/r/20210205085507.1479894-1-rong.a.chen@intel.com Fixes: c2aa8afc36fa (selftests/vm: rename run_vmtests --> run_vmtests.sh) Signed-off-by: Rong Chen <rong.a.chen@intel.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09tmpfs: disallow CONFIG_TMPFS_INODE64 on alphaSeth Forshee
As with s390, alpha is a 64-bit architecture with a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, whereas passing "inode64" in the mount options will fail. This leads to erroneous behaviours such as this: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. Prevent CONFIG_TMPFS_INODE64 from being selected on alpha. Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Chris Down <chris@chrisdown.name> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09tmpfs: disallow CONFIG_TMPFS_INODE64 on s390Seth Forshee
Currently there is an assumption in tmpfs that 64-bit architectures also have a 64-bit ino_t. This is not true on s390 which has a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, but passing the "inode64" mount option will fail. This leads to the following behavior: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. As mount sees "inode64" in the mount options and thus passes it in the options for the remount. So prevent CONFIG_TMPFS_INODE64 from being selected on s390. Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Chris Down <chris@chrisdown.name> Cc: Hugh Dickins <hughd@google.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09mm/mremap: fix BUILD_BUG_ON() error in get_extentArnd Bergmann
clang can't evaluate this function argument at compile time when the function is not inlined, which leads to a link time failure: ld.lld: error: undefined symbol: __compiletime_assert_414 >>> referenced by mremap.c >>> mremap.o:(get_extent) in archive mm/built-in.a Mark the function as __always_inline to avoid it. Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org Fixes: 9ad9718bfa41 ("mm/mremap: calculate extent in one place") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09firmware_loader: align .builtin_fw to 8Fangrui Song
arm64 references the start address of .builtin_fw (__start_builtin_fw) with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC relocations. The compiler is allowed to emit the R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in include/linux/firmware.h is 8-byte aligned. The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a multiple of 8, which may not be the case if .builtin_fw is empty. Unconditionally align .builtin_fw to fix the linker error. 32-bit architectures could use ALIGN(4) but that would add unnecessary complexity, so just use ALIGN(8). Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/1204 Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image") Signed-off-by: Fangrui Song <maskray@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Douglas Anderson <dianders@chromium.org> Acked-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09kasan: fix stack traces dependency for HW_TAGSAndrey Konovalov
Currently, whether the alloc/free stack traces collection is enabled by default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL. The intention for this dependency was to only enable collection on slow debug kernels due to a significant perf and memory impact. As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option and is enabled on many productions kernels including Android and Ubuntu. As the result, this dependency is pointless and only complicates the code and documentation. Having stack traces collection disabled by default would make the hardware mode work differently to to the software ones, which is confusing. This change removes the dependency and enables stack traces collection by default. Looking into the future, this default might makes sense for production kernels, assuming we implement a fast stack trace collection approach. Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Branislav Rankov <Branislav.Rankov@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09squashfs: add more sanity checks in xattr id lookupPhillip Lougher
Sysbot has reported a warning where a kmalloc() attempt exceeds the maximum limit. This has been identified as corruption of the xattr_ids count when reading the xattr id lookup table. This patch adds a number of additional sanity checks to detect this corruption and others. 1. It checks for a corrupted xattr index read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This would cause an out of bounds read. 2. It checks against corruption of the xattr_ids count. This can either lead to the above kmalloc failure, or a smaller than expected table to be read. 3. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09squashfs: add more sanity checks in inode lookupPhillip Lougher
Sysbot has reported an "slab-out-of-bounds read" error which has been identified as being caused by a corrupted "ino_num" value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the inodes count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large inodes count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. [phillip@squashfs.org.uk: fix checkpatch issue] Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09squashfs: add more sanity checks in id lookupPhillip Lougher
Sysbot has reported a number of "slab-out-of-bounds reads" and "use-after-free read" errors which has been identified as being caused by a corrupted index value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the ids count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large ids count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com Reported-by: syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com Reported-by: syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com Reported-by: syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09squashfs: avoid out of bounds writes in decompressorsPhillip Lougher
Patch series "Squashfs: fix BIO migration regression and add sanity checks". Patch [1/4] fixes a regression introduced by the "migrate from ll_rw_block usage to BIO" patch, which has produced a number of Sysbot/Syzkaller reports. Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption issues which have produced Sysbot reports in the id, inode and xattr lookup code. Each patch has been tested against the Sysbot reproducers using the given kernel configuration. They have the appropriate "Reported-by:" lines added. Additionally, all of the reproducer filesystems are indirectly fixed by patch [4/4] due to the fact they all have xattr corruption which is now detected there. Additional testing with other configurations and architectures (32bit, big endian), and normal filesystems has also been done to trap any inadvertent regressions caused by the additional sanity checks. This patch (of 4): This is a regression introduced by the patch "migrate from ll_rw_block usage to BIO". Sysbot/Syskaller has reported a number of "out of bounds writes" and "unable to handle kernel paging request in squashfs_decompress" errors which have been identified as a regression introduced by the above patch. Specifically, the patch removed the following sanity check if (length < 0 || length > output->length || (index + length) > msblk->bytes_used) This check did two things: 1. It ensured any reads were not beyond the end of the filesystem 2. It ensured that the "length" field read from the filesystem was within the expected maximum length. Without this any corrupted values can over-run allocated buffers. Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Cc: Philippe Liard <pliard@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-09Merge tag 'i3c/fixes-for-5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c fix from Alexandre Belloni: "A single build warning fix" * tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c/master/mipi-i3c-hci: Fix position of __maybe_unused in i3c_hci_of_match
2021-02-10bpf: Fix 32 bit src register truncation on div/modDaniel Borkmann
While reviewing a different fix, John and I noticed an oddity in one of the BPF program dumps that stood out, for example: # bpftool p d x i 13 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 [...] In line 2 we noticed that the mov32 would 32 bit truncate the original src register for the div/mod operation. While for the two operations the dst register is typically marked unknown e.g. from adjust_scalar_min_max_vals() the src register is not, and thus verifier keeps tracking original bounds, simplified: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = -1 1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=invP-1 R1_w=invP-1 R10=fp0 2: (3c) w0 /= w1 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0 3: (77) r1 >>= 32 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0 4: (bf) r0 = r1 5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0 5: (95) exit processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Runtime result of r0 at exit is 0 instead of expected -1. Remove the verifier mov32 src rewrite in div/mod and replace it with a jmp32 test instead. After the fix, we result in the following code generation when having dividend r1 and divisor r6: div, 64 bit: div, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2 3: (ac) w1 ^= w1 3: (ac) w1 ^= w1 4: (05) goto pc+1 4: (05) goto pc+1 5: (3f) r1 /= r6 5: (3c) w1 /= w6 6: (b7) r0 = 0 6: (b7) r0 = 0 7: (95) exit 7: (95) exit mod, 64 bit: mod, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1 3: (9f) r1 %= r6 3: (9c) w1 %= w6 4: (b7) r0 = 0 4: (b7) r0 = 0 5: (95) exit 5: (95) exit x86 in particular can throw a 'divide error' exception for div instruction not only for divisor being zero, but also for the case when the quotient is too large for the designated register. For the edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT since we always zero edx (rdx). Hence really the only protection needed is against divisor being zero. Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero") Co-developed-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-10bpf: Fix verifier jmp32 pruning decision logicDaniel Borkmann
Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes: func#0 @0 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = 808464450 1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b4) w4 = 808464432 2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0 2: (9c) w4 %= w0 3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 3: (66) if w4 s> 0x30303030 goto pc+0 R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit propagating r0 from 6 to 7: safe 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 propagating r0 7: safe propagating r0 from 6 to 7: safe processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 The underlying program was xlated as follows: # bpftool p d x i 10 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 5: (66) if w4 s> 0x30303030 goto pc+0 6: (7f) r0 >>= r0 7: (bc) w0 = w0 8: (15) if r0 == 0x0 goto pc+1 9: (9c) w4 %= w0 10: (66) if w0 s> 0x3030 goto pc+0 11: (d6) if w0 s<= 0x303030 goto pc+1 12: (05) goto pc-1 13: (95) exit The verifier rewrote original instructions it recognized as dead code with 'goto pc-1', but reality differs from verifier simulation in that we are actually able to trigger a hang due to hitting the 'goto pc-1' instructions. Taking a closer look at the verifier analysis, the reason is that it misjudges its pruning decision at the first 'from 6 to 7: safe' occasion. What happens is that while both old/cur registers are marked as precise, they get misjudged for the jmp32 case as range_within() yields true, meaning that the prior verification path with a wider register bound could be verified successfully and therefore the current path with a narrower register bound is deemed safe as well whereas in reality it's not. R0 old/cur path's bounds compare as follows: old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff) cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff) old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff The 64 bit bounds generally look okay and while the information that got propagated from 32 to 64 bit looks correct as well, it's not precise enough for judging a conditional jmp32. Given the latter only operates on subregisters we also need to take these into account as well for a range_within() probe in order to be able to prune paths. Extending the range_within() constraint to both bounds will be able to tell us that the old signed 32 bit bounds are not wider than the cur signed 32 bit bounds. With the fix in place, the program will now verify the 'goto' branch case as it should have been: [...] 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit 7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: (30) r0 = *(u8 *)skb[808464432] BPF_LD_[ABS|IND] uses reserved fields processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1 The bug is quite subtle in the sense that when verifier would determine that a given branch is dead code, it would (here: wrongly) remove these instructions from the program and hard-wire the taken branch for privileged programs instead of the 'goto pc-1' rewrites which will cause hard to debug problems. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-10bpf: Fix verifier jsgt branch analysis on max boundDaniel Borkmann
Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return code for both will tell the caller whether a given conditional jump is taken or not, e.g. 1 means branch will be taken [for the involved registers] and the goto target will be executed, 0 means branch will not be taken and instead we fall-through to the next insn, and last but not least a -1 denotes that it is not known at verification time whether a branch will be taken or not. Now while the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval since the branch will also be taken for reg->s32_max_value == sval. The jgt branch analysis, for example, gets this right. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Fixes: 4f7b3e82589e ("bpf: improve verifier branch analysis") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) nf_conntrack_tuple_taken() needs to recheck zone for NAT clash resolution, from Florian Westphal. 2) Restore support for stateful expressions when set definition specifies no stateful expressions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09vsock: fix locking in vsock_shutdown()Stefano Garzarella
In vsock_shutdown() we touched some socket fields without holding the socket lock, such as 'state' and 'sk_flags'. Also, after the introduction of multi-transport, we are accessing 'vsk->transport' in vsock_send_shutdown() without holding the lock and this call can be made while the connection is in progress, so the transport can change in the meantime. To avoid issues, we hold the socket lock when we enter in vsock_shutdown() and release it when we leave. Among the transports that implement the 'shutdown' callback, only hyperv_transport acquired the lock. Since the caller now holds it, we no longer take it. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09Merge branch 'hns3-fixes'David S. Miller
Huazhong Tan says: ==================== net: hns3: fixes for -net The parameters sent from vf may be unreliable. If these parameters are used directly, memory overwriting may occur. So this series adds some checks for this case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09net: hns3: add a check for index in hclge_get_rss_key()Yufeng Mo
The index is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this index before using it in hclge_get_rss_key(). Fixes: a638b1d8cc87 ("net: hns3: fix get VF RSS issue") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()Yufeng Mo
The tqp_index is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this tqp_index before using it in hclge_get_ring_chain_from_mbx(). Fixes: 84e095d64ed9 ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09net: hns3: add a check for queue_id in hclge_reset_vf_queue()Yufeng Mo
The queue_id is received from vf, if use it directly, an out-of-bound issue may be caused, so add a check for this queue_id before using it in hclge_reset_vf_queue(). Fixes: 1a426f8b40fc ("net: hns3: fix the VF queue reset flow error") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09net: dsa: felix: implement port flushing on .phylink_mac_link_downVladimir Oltean
There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09drm/i915/tgl+: Make sure TypeC FIA is powered up when initializing itImre Deak
The TypeC FIA can be powered down if the TC-COLD power state is allowed, so block the TC-COLD state when initializing the FIA. Note that this isn't needed on ICL where the FIA is never modular and which has no generic way to block TC-COLD (except for platforms with a legacy TypeC port and on those too only via these legacy ports, not via a DP-alt/TBT port). Cc: <stable@vger.kernel.org> # v5.10+ Cc: José Roberto de Souza <jose.souza@intel.com> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3027 Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210208154303.6839-1-imre.deak@intel.com Reviewed-by: Jos� Roberto de Souza <jose.souza@intel.com> (cherry picked from commit f48993e5d26b079e8c80fff002499a213dbdb1b4) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2021-02-09cifs: fix dfs-linksRonnie Sahlberg
This fixes a regression following dfs links that was introduced in the patch series for the new mount api. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-02-09x86/build: Disable CET instrumentation in the kernel for 32-bit tooBorislav Petkov
Commit 20bf2b378729 ("x86/build: Disable CET instrumentation in the kernel") disabled CET instrumentation which gets added by default by the Ubuntu gcc9 and 10 by default, but did that only for 64-bit builds. It would still fail when building a 32-bit target. So disable CET for all x86 builds. Fixes: 20bf2b378729 ("x86/build: Disable CET instrumentation in the kernel") Reported-by: AC <achirvasub@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: AC <achirvasub@gmail.com> Link: https://lkml.kernel.org/r/YCCIgMHkzh/xT4ex@arch-chirva.localdomain
2021-02-08scsi: scsi_debug: Fix a memory leakMaurizio Lombardi
The sdebug_q_arr pointer must be freed when the module is unloaded. $ cat /sys/kernel/debug/kmemleak unreferenced object 0xffff888e1cfb0000 (size 4096): comm "modprobe", pid 165555, jiffies 4325987516 (age 685.194s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000458f4f5d>] 0xffffffffc06702d9 [<000000003edc4b1f>] do_one_initcall+0xe9/0x57d [<00000000da7d518c>] do_init_module+0x1d1/0x6f0 [<000000009a6a9248>] load_module+0x36bd/0x4f50 [<00000000ddb0c3ce>] __do_sys_init_module+0x1db/0x260 [<000000009532db57>] do_syscall_64+0xa5/0x420 [<000000002916b13d>] entry_SYSCALL_64_after_hwframe+0x6a/0xdf Fixes: 87c715dcde63 ("scsi: scsi_debug: Add per_host_store option") Link: https://lore.kernel.org/r/20210208111734.34034-1-mlombard@redhat.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-02-08Merge branch 'bridge-mrp'David S. Miller
Horatiu Vultur says: ==================== bridge: mrp: Fix br_mrp_port_switchdev_set_state Based on the discussion here[1], there was a problem with the function br_mrp_port_switchdev_set_state. The problem was that it was called both with BR_STATE* and BR_MRP_PORT_STATE* types. This patch series fixes this issue and removes SWITCHDEV_ATTR_ID_MRP_PORT_STAT because is not used anymore. [1] https://www.spinics.net/lists/netdev/msg714816.html ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STATHoratiu Vultur
Now that MRP started to use also SWITCHDEV_ATTR_ID_PORT_STP_STATE to notify HW, then SWITCHDEV_ATTR_ID_MRP_PORT_STAT is not used anywhere else, therefore we can remove it. Fixes: c284b545900830 ("switchdev: mrp: Extend switchdev API to offload MRP") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_stateHoratiu Vultur
The function br_mrp_port_switchdev_set_state was called both with MRP port state and STP port state, which is an issue because they don't match exactly. Therefore, update the function to be used only with STP port state and use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE. The choice of using STP over MRP is that the drivers already implement SWITCHDEV_ATTR_ID_PORT_STP_STATE and already in SW we update the port STP state. Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API") Fixes: fadd409136f0f2 ("bridge: switchdev: mrp: Implement MRP API for switchdev") Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.") Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08net: watchdog: hold device global xmit lock during tx disableEdwin Peer
Prevent netif_tx_disable() running concurrently with dev_watchdog() by taking the device global xmit lock. Otherwise, the recommended: netif_carrier_off(dev); netif_tx_disable(dev); driver shutdown sequence can happen after the watchdog has already checked carrier, resulting in possible false alarms. This is because netif_tx_lock() only sets the frozen bit without maintaining the locks on the individual queues. Fixes: c3f26a269c24 ("netdev: Fix lockdep warnings in multiqueue configurations.") Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-09netfilter: nftables: relax check for stateful expressions in set definitionPablo Neira Ayuso
Restore the original behaviour where users are allowed to add an element with any stateful expression if the set definition specifies no stateful expressions. Make sure upper maximum number of stateful expressions of NFT_SET_EXPR_MAX is not reached. Fixes: 8cfd9b0f8515 ("netfilter: nftables: generalize set expressions support") Fixes: 48b0ae046ee9 ("netfilter: nftables: netlink support for several set element expressions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-09netfilter: conntrack: skip identical origin tuple in same zone onlyFlorian Westphal
The origin skip check needs to re-test the zone. Else, we might skip a colliding tuple in the reply direction. This only occurs when using 'directional zones' where origin tuples reside in different zones but the reply tuples share the same zone. This causes the new conntrack entry to be dropped at confirmation time because NAT clash resolution was elided. Fixes: 4e35c1cb9460240 ("netfilter: nf_nat: skip nat clash resolution for same-origin entries") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-08vsock/virtio: update credit only if socket is not closedStefano Garzarella
If the socket is closed or is being released, some resources used by virtio_transport_space_update() such as 'vsk->trans' may be released. To avoid a use after free bug we should only update the available credit when we are sure the socket is still open and we have the lock held. Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20210208144454.84438-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-08Merge tag 'trace-v5.11-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix output of top level event tracing 'enable' file. When writing a tool for enabling events in the tracing system, an anomaly was discovered. The top level event 'enable' file would never show '1' when all events were enabled. The system and event 'enable' files worked as expected. The reason was because the top level event 'enable' file included the 'ftrace' tracer events, which are not controlled by the 'enable' file and would cause the output to be wrong. This appears to have been a bug since it was created" * tag 'trace-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Do not count ftrace events in top level enable output
2021-02-08net: fix iteration for sctp transport seq_filesNeilBrown
The sctp transport seq_file iterators take a reference to the transport in the ->start and ->next functions and releases the reference in the ->show function. The preferred handling for such resources is to release them in the subsequent ->next or ->stop function call. Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") there is no guarantee that ->show will be called after ->next, so this function can now leak references. So move the sctp_transport_put() call to ->next and ->stop. Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Reported-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>