Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"This fixes a corner case of fatal SIGSYS being ignored since v5.15.
Along with the signal fix is a change to seccomp so that seeing
another syscall after a fatal filter result will cause seccomp to kill
the process harder.
Summary:
- Force HANDLER_EXIT even for SIGNAL_UNKILLABLE
- Make seccomp self-destruct after fatal filter results
- Update seccomp samples for easier behavioral demonstration"
* tag 'seccomp-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
samples/seccomp: Adjust sample to also provide kill option
seccomp: Invalidate seccomp mode to catch death failures
signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE
|
|
Merge misc fixes from Andrew Morton:
"5 patches.
Subsystems affected by this patch series: binfmt, procfs, and mm
(vmscan, memcg, and kfence)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kfence: make test case compatible with run time set sample interval
mm: memcg: synchronize objcg lists with a dedicated spinlock
mm: vmscan: remove deadlock due to throttling failing to make progress
fs/proc: task_mmu.c: don't read mapcount for migration entry
fs/binfmt_elf: fix PT_LOAD p_align values for loaders
|
|
When the KCONFIG_AUTOCONFIG is specified (e.g. export \
KCONFIG_AUTOCONFIG=output/config/auto.conf), the directory of
include/config/ will not be created, so kconfig can't create deps
files in it and auto.conf can't be generated.
Signed-off-by: Jing Leng <jleng@ambarella.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This reverts commit 269cbcf7b72de6f0016806d4a0cec1d689b55a87.
It causes build errors as reported by the kernel test robot.
Link: https://lore.kernel.org/r/202202112236.AwoOTtHO-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 269cbcf7b72d ("usb: dwc2: drd: fix soft connect when gadget is unconfigured")
Cc: stable@kernel.org
Cc: Amelie Delaunay <amelie.delaunay@foss.st.com>
Cc: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
Cc: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The parameter kfence_sample_interval can be set via boot parameter and
late shell command, which is convenient for automated tests and KFENCE
parameter optimization. However, KFENCE test case just uses
compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test
case not run as users desired. Export kfence_sample_interval, so that
KFENCE test case can use run-time-set sample interval.
Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Christian Knig <christian.koenig@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:
LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
WARNING: possible circular locking dependency detected
5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
------------------------------------------------------
mmap1/202299 is trying to acquire lock:
00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
but task is already holding lock:
00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sighand->siglock){-.-.}-{2:2}:
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
__lock_task_sighand+0x90/0x190
cgroup_freeze_task+0x2e/0x90
cgroup_migrate_execute+0x11c/0x608
cgroup_update_dfl_csses+0x246/0x270
cgroup_subtree_control_write+0x238/0x518
kernfs_fop_write_iter+0x13e/0x1e0
new_sync_write+0x100/0x190
vfs_write+0x22c/0x2d8
ksys_write+0x6c/0xf8
__do_syscall+0x1da/0x208
system_call+0x82/0xb0
-> #0 (css_set_lock){..-.}-{2:2}:
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(css_set_lock);
lock(&sighand->siglock);
lock(css_set_lock);
*** DEADLOCK ***
2 locks held by mmap1/202299:
#0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
#1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
stack backtrace:
CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
dump_stack_lvl+0x76/0x98
check_noncircular+0x136/0x158
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
INFO: lockdep is turned off.
In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg. Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.
This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).
In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.
To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.
Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A soft lockup bug in kcompactd was reported in a private bugzilla with
the following visible in dmesg;
watchdog: BUG: soft lockup - CPU#33 stuck for 26s! [kcompactd0:479]
watchdog: BUG: soft lockup - CPU#33 stuck for 52s! [kcompactd0:479]
watchdog: BUG: soft lockup - CPU#33 stuck for 78s! [kcompactd0:479]
watchdog: BUG: soft lockup - CPU#33 stuck for 104s! [kcompactd0:479]
The machine had 256G of RAM with no swap and an earlier failed
allocation indicated that node 0 where kcompactd was run was potentially
unreclaimable;
Node 0 active_anon:29355112kB inactive_anon:2913528kB active_file:0kB
inactive_file:0kB unevictable:64kB isolated(anon):0kB isolated(file):0kB
mapped:8kB dirty:0kB writeback:0kB shmem:26780kB shmem_thp:
0kB shmem_pmdmapped: 0kB anon_thp: 23480320kB writeback_tmp:0kB
kernel_stack:2272kB pagetables:24500kB all_unreclaimable? yes
Vlastimil Babka investigated a crash dump and found that a task
migrating pages was trying to drain PCP lists;
PID: 52922 TASK: ffff969f820e5000 CPU: 19 COMMAND: "kworker/u128:3"
Call Trace:
__schedule
schedule
schedule_timeout
wait_for_completion
__flush_work
__drain_all_pages
__alloc_pages_slowpath.constprop.114
__alloc_pages
alloc_migration_target
migrate_pages
migrate_to_node
do_migrate_pages
cpuset_migrate_mm_workfn
process_one_work
worker_thread
kthread
ret_from_fork
This failure is specific to CONFIG_PREEMPT=n builds. The root of the
problem is that kcompact0 is not rescheduling on a CPU while a task that
has isolated a large number of the pages from the LRU is waiting on
kcompact0 to reschedule so the pages can be released. While
shrink_inactive_list() only loops once around too_many_isolated, reclaim
can continue without rescheduling if sc->skipped_deactivate == 1 which
could happen if there was no file LRU and the inactive anon list was not
low.
Link: https://lkml.kernel.org/r/20220203100326.GD3301@suse.de
Fixes: d818fca1cac3 ("mm/vmscan: throttle reclaim and compaction when too may pages are isolated")
Signed-off-by: Mel Gorman <mgorman@suse.de>
Debugged-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The syzbot reported the below BUG:
kernel BUG at include/linux/page-flags.h:785!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline]
RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744
Call Trace:
page_mapcount include/linux/mm.h:837 [inline]
smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466
smaps_pte_entry fs/proc/task_mmu.c:538 [inline]
smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601
walk_pmd_range mm/pagewalk.c:128 [inline]
walk_pud_range mm/pagewalk.c:205 [inline]
walk_p4d_range mm/pagewalk.c:240 [inline]
walk_pgd_range mm/pagewalk.c:277 [inline]
__walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379
walk_page_vma+0x277/0x350 mm/pagewalk.c:530
smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768
smap_gather_stats fs/proc/task_mmu.c:741 [inline]
show_smap+0xc6/0x440 fs/proc/task_mmu.c:822
seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272
seq_read+0x3e0/0x5b0 fs/seq_file.c:162
vfs_read+0x1b5/0x600 fs/read_write.c:479
ksys_read+0x12d/0x250 fs/read_write.c:619
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The reproducer was trying to read /proc/$PID/smaps when calling
MADV_FREE at the mean time. MADV_FREE may split THPs if it is called
for partial THP. It may trigger the below race:
CPU A CPU B
----- -----
smaps walk: MADV_FREE:
page_mapcount()
PageCompound()
split_huge_page()
page = compound_head(page)
PageDoubleMap(page)
When calling PageDoubleMap() this page is not a tail page of THP anymore
so the BUG is triggered.
This could be fixed by elevated refcount of the page before calling
mapcount, but that would prevent it from counting migration entries, and
it seems overkilling because the race just could happen when PMD is
split so all PTE entries of tail pages are actually migration entries,
and smaps_account() does treat migration entries as mapcount == 1 as
Kirill pointed out.
Add a new parameter for smaps_account() to tell this entry is migration
entry then skip calling page_mapcount(). Don't skip getting mapcount
for device private entries since they do track references with mapcount.
Pagemap also has the similar issue although it was not reported. Fixed
it as well.
[shy828301@gmail.com: v4]
Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
[nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()]
Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org
Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com
Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com
Acked-by: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rui Salvaterra reported that Aisleroit solitaire crashes with "Wrong
__data_start/_end pair" assertion from libgc after update to v5.17-rc1.
Bisection pointed to commit 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD
p_align values for static PIE") that fixed handling of static PIEs, but
made the condition that guards load_bias calculation to exclude loader
binaries.
Restoring the check for presence of interpreter fixes the problem.
Link: https://lkml.kernel.org/r/20220202121433.3697146-1-rppt@kernel.org
Fixes: 9630f0d60fec ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If NIC had packets in tx queue at the moment link down event
happened, it could result in tx timeout when link got back up.
Since device has more than one tx queue we need to reset them
accordingly.
Fixes: 057f4af2b171 ("atl1c: add 4 RX/TX queue support for Mikrotik 10/25G NIC")
Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Link: https://lore.kernel.org/r/20220211065123.4187615-1-gatis@mikrotik.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We cannot do the cancel_work_sync from after the unregister_netdev, as
the dev pointer is no longer valid, causing a uaf on ldisc unregister
(or device close).
Instead, do the cancel_work_sync from the ndo_uninit op, where the dev
still exists, but the queue has stopped.
Fixes: 7bd9890f3d74 ("mctp: serial: cancel tx work on ldisc close")
Reported-by: Luo Likang <luolikang@nsfocus.com>
Tested-by: Luo Likang <luolikang@nsfocus.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20220211011552.1861886-1-jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The reset input to the LAN9303 chip is active low, and devicetree
gpio handles reflect this. Therefore, the gpio should be requested
with an initial state of high in order for the reset signal to be
asserted. Other uses of the gpio already use the correct polarity.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fianelil <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220209145454.19749-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is a fairly large set of bugfixes, most of which had been sent a
while ago but only now made it into the soc tree:
Maintainer file updates:
- Claudiu Beznea now co-maintains the at91 soc family, replacing
Ludovic Desroches.
- Michael Walle maintains the sl28cpld drivers
- Alain Volmat and Raphael Gallais-Pou take over some drivers for ST
platforms
- Alim Akhtar is an additional reviewer for Samsung platforms
Code fixes:
- Op-tee had a problem with object lifetime that needs a slightly
complex fix, as well as another bug with error handling.
- Several minor issues for the OMAP platform, including a regression
with the timer
- A Kconfig change to fix a build-time issue on Intel SoCFPGA
Device tree fixes:
- The Amlogic Meson platform fixes a boot regression on am1-odroid, a
spurious interrupt, and a problem with reserved memory regions
- In the i.MX platform, several bug fixes are needed to make devices
work correctly: SD card detection, alarmtimer, and sound card on
some board. One patch for the GPU got in there by accident and gets
reverted again.
- TI K3 needs a fix for J721S2 serial port numbers
- ux500 needs a fix to mount the SD card as root on the Skomer phone"
* tag 'soc-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (46 commits)
Revert "arm64: dts: imx8mn-venice-gw7902: disable gpu"
arm64: Remove ARCH_VULCAN
MAINTAINERS: add myself as a maintainer for the sl28cpld
MAINTAINERS: add IRC to ARM sub-architectures and Devicetree
MAINTAINERS: arm: samsung: add Git tree and IRC
ARM: dts: Fix boot regression on Skomer
ARM: dts: spear320: Drop unused and undocumented 'irq-over-gpio' property
soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
docs/ABI: testing: aspeed-uart-routing: Escape asterisk
MAINTAINERS: update drm/stm drm/sti and cec/sti maintainers
MAINTAINERS: Update Benjamin Gaignard maintainer status
ARM: socfpga: fix missing RESET_CONTROLLER
arm64: dts: meson-sm1-odroid: fix boot loop after reboot
arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
arm64: dts: meson-g12: add ATF BL32 reserved-memory region
arm64: dts: meson-gx: add ATF BL32 reserved-memory region
arm64: dts: meson-sm1-bananapi-m5: fix wrong GPIO domain for GPIOE_2
arm64: dts: meson-sm1-odroid: use correct enable-gpio pin for tf-io regulator
arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
optee: use driver internal tee_context for some rpc
...
|
|
Yonghong Song says:
====================
The patch [1] exposed a bpf_timer initialization bug in function
check_and_init_map_value(). With bug fix here, the patch [1]
can be applied with all selftests passed. Please see individual
patches for fix details.
[1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/
Changelog:
v3 -> v4:
. move header file in patch #1 to avoid bpf-next merge conflict
v2 -> v3:
. switch patch #1 and patch #2 for better bisecting
v1 -> v2:
. add Fixes tag for patch #1
. rebase against bpf tree
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The patch in [1] intends to fix a bpf_timer related issue,
but the fix caused existing 'timer' selftest to fail with
hang or some random errors. After some debug, I found
an issue with check_and_init_map_value() in the hashtab.c.
More specifically, in hashtab.c, we have code
l_new = bpf_map_kmalloc_node(&htab->map, ...)
check_and_init_map_value(&htab->map, l_new...)
Note that bpf_map_kmalloc_node() does not do initialization
so l_new contains random value.
The function check_and_init_map_value() intends to zero the
bpf_spin_lock and bpf_timer if they exist in the map.
But I found bpf_spin_lock is zero'ed but bpf_timer is not zero'ed.
With [1], later copy_map_value() skips copying of
bpf_spin_lock and bpf_timer. The non-zero bpf_timer caused
random failures for 'timer' selftest.
Without [1], for both bpf_spin_lock and bpf_timer case,
bpf_timer will be zero'ed, so 'timer' self test is okay.
For check_and_init_map_value(), why bpf_spin_lock is zero'ed
properly while bpf_timer not. In bpf uapi header, we have
struct bpf_spin_lock {
__u32 val;
};
struct bpf_timer {
__u64 :64;
__u64 :64;
} __attribute__((aligned(8)));
The initialization code:
*(struct bpf_spin_lock *)(dst + map->spin_lock_off) =
(struct bpf_spin_lock){};
*(struct bpf_timer *)(dst + map->timer_off) =
(struct bpf_timer){};
It appears the compiler has no obligation to initialize anonymous fields.
For example, let us use clang with bpf target as below:
$ cat t.c
struct bpf_timer {
unsigned long long :64;
};
struct bpf_timer2 {
unsigned long long a;
};
void test(struct bpf_timer *t) {
*t = (struct bpf_timer){};
}
void test2(struct bpf_timer2 *t) {
*t = (struct bpf_timer2){};
}
$ clang -target bpf -O2 -c -g t.c
$ llvm-objdump -d t.o
...
0000000000000000 <test>:
0: 95 00 00 00 00 00 00 00 exit
0000000000000008 <test2>:
1: b7 02 00 00 00 00 00 00 r2 = 0
2: 7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2
3: 95 00 00 00 00 00 00 00 exit
gcc11.2 does not have the above issue. But from
INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
Programming languages — C
http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf
page 157:
Except where explicitly stated otherwise, for the purposes of
this subclause unnamed members of objects of structure and union
type do not participate in initialization. Unnamed members of
structure objects have indeterminate value even after initialization.
To fix the problem, let use memset for bpf_timer case in
check_and_init_map_value(). For consistency, memset is also
used for bpf_spin_lock case.
[1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/
Fixes: 68134668c17f3 ("bpf: Add map side support for bpf timers.")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220211194953.3142152-1-yhs@fb.com
|
|
Currently the following code in check_and_init_map_value()
*(struct bpf_timer *)(dst + map->timer_off) =
(struct bpf_timer){};
can help generate bpf_timer definition in vmlinuxBTF.
But the code above may not zero the whole structure
due to anonymour members and that code will be replaced
by memset in the subsequent patch and
bpf_timer definition will disappear from vmlinuxBTF.
Let us emit the type explicitly so bpf program can continue
to use it from vmlinux.h.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220211194948.3141529-1-yhs@fb.com
|
|
Kumar Kartikeya says:
====================
A fix for an oversight in copy_map_value that leads to kernel crash.
Also, a question for BPF developers:
It seems in arraymap.c, we always do check_and_free_timer_in_array after we do
copy_map_value in map_update_elem callback, but the same is not done for
hashtab.c. Is there a specific reason for this difference in behavior, or did I
miss that it happens for hashtab.c as well?
Changlog:
---------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20220209051113.870717-1-memxor@gmail.com
* Fix build error for selftests patch due to missing SYS_PREFIX in bpf tree
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add a test that validates that timer value is not overwritten when doing
a copy_map_value call in the kernel. Without the prior fix, this test
triggers a crash.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220209070324.1093182-3-memxor@gmail.com
|
|
When both bpf_spin_lock and bpf_timer are present in a BPF map value,
copy_map_value needs to skirt both objects when copying a value into and
out of the map. However, the current code does not set both s_off and
t_off in copy_map_value, which leads to a crash when e.g. bpf_spin_lock
is placed in map value with bpf_timer, as bpf_map_update_elem call will
be able to overwrite the other timer object.
When the issue is not fixed, an overwriting can produce the following
splat:
[root@(none) bpf]# ./test_progs -t timer_crash
[ 15.930339] bpf_testmod: loading out-of-tree module taints kernel.
[ 16.037849] ==================================================================
[ 16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520
[ 16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325
[ 16.039399]
[ 16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G OE 5.16.0+ #278
[ 16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014
[ 16.040485] Call Trace:
[ 16.040645] <TASK>
[ 16.040805] dump_stack_lvl+0x59/0x73
[ 16.041069] ? __pv_queued_spin_lock_slowpath+0x32b/0x520
[ 16.041427] kasan_report.cold+0x116/0x11b
[ 16.041673] ? __pv_queued_spin_lock_slowpath+0x32b/0x520
[ 16.042040] __pv_queued_spin_lock_slowpath+0x32b/0x520
[ 16.042328] ? memcpy+0x39/0x60
[ 16.042552] ? pv_hash+0xd0/0xd0
[ 16.042785] ? lockdep_hardirqs_off+0x95/0xd0
[ 16.043079] __bpf_spin_lock_irqsave+0xdf/0xf0
[ 16.043366] ? bpf_get_current_comm+0x50/0x50
[ 16.043608] ? jhash+0x11a/0x270
[ 16.043848] bpf_timer_cancel+0x34/0xe0
[ 16.044119] bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81
[ 16.044500] bpf_trampoline_6442477838_0+0x36/0x1000
[ 16.044836] __x64_sys_nanosleep+0x5/0x140
[ 16.045119] do_syscall_64+0x59/0x80
[ 16.045377] ? lock_is_held_type+0xe4/0x140
[ 16.045670] ? irqentry_exit_to_user_mode+0xa/0x40
[ 16.046001] ? mark_held_locks+0x24/0x90
[ 16.046287] ? asm_exc_page_fault+0x1e/0x30
[ 16.046569] ? asm_exc_page_fault+0x8/0x30
[ 16.046851] ? lockdep_hardirqs_on+0x7e/0x100
[ 16.047137] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 16.047405] RIP: 0033:0x7f9e4831718d
[ 16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48
[ 16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023
[ 16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d
[ 16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0
[ 16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0
[ 16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30
[ 16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 16.051608] </TASK>
[ 16.051762] ==================================================================
Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert a commit that reduced the number of IRQs used but resulted in
interrupt storms (Bjorn Helgaas)"
* tag 'pci-v5.17-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI/portdrv: Do not setup up IRQs if there are no users"
|
|
This reverts commit 0e8ae5a6ff5952253cd7cc0260df838ab4c21009.
0e8ae5a6ff59 ("PCI/portdrv: Do not setup up IRQs if there are no users")
reduced usage of IRQs when we don't think we need them. But Joey, Sergiu,
and David reported choppy GUI rendering, systems that became unresponsive
every few seconds, incorrect values reported by cpufreq, and high IRQ 16
CPU usage.
Joey bisected the issues to 0e8ae5a6ff59, so revert it until we figure out
a better solution.
Link: https://lore.kernel.org/r/20220210222717.GA658201@bhelgaas
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215533
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215546
Reported-by: Joey Corleone <joey.corleone@mail.ru>
Reported-by: Sergiu Deitsch <sergiu.deitsch@gmail.com>
Reported-by: David Spencer <dspencer577@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.16+
Cc: Jan Kiszka <jan.kiszka@siemens.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid undefined behavior when stack backtracing, which
manifests in GCC as incorrect stack addresses
- A few fixes for the XIP kernels
- A fix to tracking NUMA state on CPU hotplug
- Support for the recently relesaed binutils-2.38, which changed the
default ISA version to one without CSRs or fence.i in 'I' extension
* tag 'riscv-for-linus-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix build with binutils 2.38
riscv: cpu-hotplug: clear cpu from numa map when teardown
riscv: extable: fix err reg writing in dedicated uaccess handler
riscv/mm: Add XIP_FIXUP for riscv_pfn_base
riscv/mm: Add XIP_FIXUP for phys_ram_base
riscv: Fix XIP_FIXUP_FLASH_OFFSET
riscv: eliminate unreliable __builtin_frame_address(1)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Enable Cortex-A510 erratum 2051678 by default as we do with other
errata.
- arm64 IORT: Check the node revision for PMCG resources to cope with
old firmware based on a broken revision of the spec that had no way
to describe the second register page (when an implementation is using
the recommended RELOC_CTRS feature).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ACPI/IORT: Check node revision for PMCG resources
arm64: Enable Cortex-A510 erratum 2051678 by default
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert two commits that turned out to be problematic and fix two
issues related to wakeup from suspend-to-idle on x86.
Specifics:
- Revert a recent change that attempted to avoid issues with
conflicting address ranges during PCI initialization, because it
turned out to introduce a regression (Hans de Goede).
- Revert a change that limited EC GPE wakeups from suspend-to-idle to
systems based on Intel hardware, because it turned out that systems
based on hardware from other vendors depended on that functionality
too (Mario Limonciello).
- Fix two issues related to the handling of wakeup interrupts and
wakeup events signaled through the EC GPE during suspend-to-idle on
x86 (Rafael Wysocki)"
* tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
PM: s2idle: ACPI: Fix wakeup interrupts handling
ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
- Revert debug commit that causes unexpected data corruption
- Fix muti-block reservation regression
* tag 'gfs2-v5.16-rc3-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix gfs2_release for non-writers regression
Revert "gfs2: check context in gfs2_glock_put"
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request
- nvme-tcp: fix bogus request completion when failing to send AER
(Sagi Grimberg)
- add the missing nvme_complete_req tracepoint for batched
completion (Bean Huo)
- Revert of the loop async autoclear issue that has continued to plague
us this release. A few patchsets exists to improve this, but they are
too invasive to be considered at this point (Tetsuo)
* tag 'block-5.17-2022-02-11' of git://git.kernel.dk/linux-block:
loop: revert "make autoclear operation asynchronous"
nvme-tcp: fix bogus request completion when failing to send AER
nvme: add nvme_complete_req tracepoint for batched completion
|
|
Pull io_uring fixes from Jens Axboe:
- Fix a false-positive warning from an older gcc (Alviro)
- Allow oom killer invocations from io_uring_setup (Shakeel)
* tag 'io_uring-5.17-2022-02-11' of git://git.kernel.dk/linux-block:
mm: io_uring: allow oom-killer from io_uring_setup
io_uring: Clean up a false-positive warning from GCC 9.3.0
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- use sleeping variants of GPIO accessors where needed
in gpio-aggregator
- never return kernel's internal error codes to user-space
in gpiolib core
- use the correct register for reading output values in
gpio-sifive
- fix line hogging in gpio-sim
* tag 'gpio-fixes-for-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix hogs with custom chip labels
gpio: sifive: use the correct register to read output values
gpiolib: Never return internal error codes to user space
gpio: aggregator: Fix calling into sleeping GPIO controllers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fixes from Damien Le Moal:
"A couple of additional fixes for 5.17-rc4:
- Fix compilation warnings in the sata_fsl driver (powerpc) (me)
- Disable TRIM commands on M88V29 devices as these commands are
failing despite the device reporting it supports TRIM (Zoltan)"
* tag 'ata-5.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-core: Disable TRIM on M88V29
ata: sata_fsl: fix sscanf() and sysfs_emit() format strings
|
|
Pull drm fixes from Dave Airlie:
"Regular fixes pull, mostly i915 and amd fixes, along with a
maintainers update for fbdev core.
Otherwise just some build fixes and vc4 HDMI fixes.
fbdev:
- MAINTAINERS: add Daniel as fbdev core module maintainer
- build warning fix
- implicit type cast fix
panel:
- simple: Fix assignments from panel_dpi_probe()
privacy-screen:
- fix docs warning
i915:
- non-x86 build fix
- ttm error propogation fix
- drrs on hsw/ivb disabled
- BIOS readout fixes
- missing stackdepot oops fix
amd:
- DCN 3.1 display fixes
- GC 10.3.1 harvest fix
- Page flip irq fix
- hwmon label fix
- DCN 2.0 display fix
rockchip:
- fix HDMI error cleanup
- fix RK3399 VOP register fields
vc4:
- HDMI fixes
- remove redundant code"
* tag 'drm-fixes-2022-02-11' of git://anongit.freedesktop.org/drm/drm: (25 commits)
drm/amdgpu/display: change pipe policy for DCN 2.0
drm/amd/pm: fix hwmon node of power1_label create issue
drm/amd/display: keep eDP Vdd on when eDP stream is already enabled
drm/amd/display: fix yellow carp wm clamping
drm/amd/display: Cap pflip irqs per max otg number
drm/amdgpu: add utcl2_harvest to gc 10.3.1
display/amd: decrease message verbosity about watermarks table failure
drm/rockchip: vop: Correct RK3399 VOP register fields
drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
MAINTAINERS: Add entry for fbdev core
fbcon: Avoid 'cap' set but not used warning
drm/privacy-screen: Fix sphinx warning
drm/i915: Workaround broken BIOS DBUF configuration on TGL/RKL
drm/i915: Populate pipe dbuf slices more accurately during readout
drm/i915: Allow !join_mbus cases for adlp+ dbuf configuration
drm/i915: Fix header test for !CONFIG_X86
drm/i915/ttm: Return some errors instead of trying memcpy move
drm/i915: Disable DRRS on IVB/HSW port != A
drm/i915: Fix oops due to missing stack depot
drm/vc4: crtc: Fix redundant variable assignment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fixes to the RTLA tooling
- A fix to a tp_printk overriding tp_printk_stop_on_boot on the
command line
* tag 'trace-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix tp_printk option related with tp_printk_stop_on_boot
MAINTAINERS: Add RTLA entry
rtla: Fix segmentation fault when failing to enable -t
rtla/trace: Error message fixup
rtla/utils: Fix session duration parsing
rtla: Follow kernel version
|
|
If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active
which can lead to the target vCPU not noticing the interrupt.
To fix this use load-acquire/store-release so that, if the target vCPU
is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
AVIC. If AVIC has been disabled in the meanwhile, proceed with the
KVM_REQ_EVENT-based delivery.
Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and
in fact it can be handled in exactly the same way; the only difference
lies in who has set IRR, whether svm_deliver_interrupt or the processor.
Therefore, svm_complete_interrupt_delivery can be used to fix incomplete
IPI vmexits as well.
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
SVM has to set IRR for both the AVIC and the software-LAPIC case,
so pull it up to the common function that handles both configurations.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The check on the current CPU adds an extra level of indentation to
svm_deliver_avic_intr and conflates documentation on what happens
if the vCPU exits (of interest to svm_deliver_avic_intr) and migrates
(only of interest to avic_ring_doorbell, which calls get/put_cpu()).
Extract the wrmsr to a separate function and rewrite the
comment in svm_deliver_avic_intr().
Co-developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There is no vmx_pi_mmio_test file. Remove it to get rid of error while
creation of selftest archive:
rsync: [sender] link_stat "/kselftest/kvm/x86_64/vmx_pi_mmio_test" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Fixes: 6a58150859fd ("selftest: KVM: Add intra host migration tests")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Message-Id: <20220210172352.1317554-1-usama.anjum@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.17, take #3
- Fix pending state read of a HW interrupt
|
|
When a file is opened for writing, the vfs code (do_dentry_open)
calls get_write_access for the inode, thus incrementing the inode's write
count. That writer normally then creates a multi-block reservation for
the inode (i_res) that can be re-used by other writers, which speeds up
writes for applications that stupidly loop on open/write/close.
When the writes are all done, the multi-block reservation should be
deleted when the file is closed by the last "writer."
Commit 0ec9b9ea4f83 broke that concept when it moved the call to
gfs2_rs_delete before the check for FMODE_WRITE. Non-writers have no
business removing the multi-block reservations of writers. In fact, if
someone opens and closes the file for RO while a writer has a
multi-block reservation, the RO closer will delete the reservation
midway through the write, and this results in:
kernel BUG at fs/gfs2/rgrp.c:677! (or thereabouts) which is:
BUG_ON(rs->rs_requested); from function gfs2_rs_deltree.
This patch moves the check back inside the check for FMODE_WRITE.
Fixes: 0ec9b9ea4f83 ("gfs2: Check for active reservation in gfs2_release")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
It turns out that the might_sleep() call that commit 660a6126f8c3 adds
is triggering occasional data corruption in testing. We're not sure
about the root cause yet, but since this commit was added as a debugging
aid only, revert it for now.
This reverts commit 660a6126f8c3208f6df8d552039cda078a8426d1.
Fixes: 660a6126f8c3 ("gfs2: check context in gfs2_glock_put")
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Merge a revert of a problematic commit for 5.17-rc4.
* acpi-x86:
x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for 5.17-rc4
Here are some new device ids for 5.17-rc4.
All have been in linux-next with no reported issues.
* tag 'usb-serial-5.17-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: cp210x: add CPI Bulk Coin Recycler id
USB: serial: cp210x: add NCR Retail IO box id
USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
USB: serial: option: add ZTE MF286D modem
USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
wireless fixes for v5.17
Second set of fixes for v5.17. This is the first pull request with
both driver and stack patches.
Most important here are a regression fix for brcmfmac USB devices and
an iwlwifi fix for use after free when the firmware was missing. We
have new maintainers for ath9k and wcn36xx as well as ath6kl is now
orphaned. Also smaller fixes to iwlwifi and stack.
|
|
The kernel test robot is reporting that xfstest which does
umount ext2 on xfs
umount xfs
sequence started failing, for commit 322c4293ecc58110 ("loop: make
autoclear operation asynchronous") removed a guarantee that fput() of
backing file is processed before lo_release() from close() returns to
user mode.
And syzbot is reporting that deferring destroy_workqueue() from
__loop_clr_fd() to a WQ context did not help [1]. Revert that commit.
Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: syzbot <syzbot+831661966588c802aae9@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20220211071554.3424-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If bpf_msg_push_data() is called with len 0 (as it happens during
selftests/bpf/test_sockmap), we do not need to do anything and can
return early.
Calling bpf_msg_push_data() with len 0 previously lead to a wrong ENOMEM
error: we later called get_order(copy + len); if len was 0, copy + len
was also often 0 and get_order() returned some undefined value (at the
moment 52). alloc_pages() caught that and failed, but then bpf_msg_push_data()
returned ENOMEM. This was wrong because we are most probably not out of
memory and actually do not need any additional memory.
Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/df69012695c7094ccb1943ca02b4920db3537466.1644421921.git.fmaurer@redhat.com
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Add selftest for nft_synproxy, from Florian Westphal.
2) xt_socket destroy path incorrectly disables IPv4 defrag for
IPv6 traffic (typo), from Eric Dumazet.
3) Fix exit value selftest nft_concat_range.sh, from Hangbin Liu.
4) nft_synproxy disables the IPv4 hooks if the IPv6 hooks fail
to be registered.
5) disable rp_filter on router in selftest nft_fib.sh, also
from Hangbin Liu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
trace_napi_poll_hit() is reading stat->dev while another thread can write
on it from dropmon_net_event()
Use READ_ONCE()/WRITE_ONCE() here, RCU rules are properly enforced already,
we only have to take care of load/store tearing.
BUG: KCSAN: data-race in dropmon_net_event / trace_napi_poll_hit
write to 0xffff88816f3ab9c0 of 8 bytes by task 20260 on cpu 1:
dropmon_net_event+0xb8/0x2b0 net/core/drop_monitor.c:1579
notifier_call_chain kernel/notifier.c:84 [inline]
raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:392
call_netdevice_notifiers_info net/core/dev.c:1919 [inline]
call_netdevice_notifiers_extack net/core/dev.c:1931 [inline]
call_netdevice_notifiers net/core/dev.c:1945 [inline]
unregister_netdevice_many+0x867/0xfb0 net/core/dev.c:10415
ip_tunnel_delete_nets+0x24a/0x280 net/ipv4/ip_tunnel.c:1123
vti_exit_batch_net+0x2a/0x30 net/ipv4/ip_vti.c:515
ops_exit_list net/core/net_namespace.c:173 [inline]
cleanup_net+0x4dc/0x8d0 net/core/net_namespace.c:597
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
read to 0xffff88816f3ab9c0 of 8 bytes by interrupt on cpu 0:
trace_napi_poll_hit+0x89/0x1c0 net/core/drop_monitor.c:292
trace_napi_poll include/trace/events/napi.h:14 [inline]
__napi_poll+0x36b/0x3f0 net/core/dev.c:6366
napi_poll net/core/dev.c:6432 [inline]
net_rx_action+0x29e/0x650 net/core/dev.c:6519
__do_softirq+0x158/0x2de kernel/softirq.c:558
do_softirq+0xb1/0xf0 kernel/softirq.c:459
__local_bh_enable_ip+0x68/0x70 kernel/softirq.c:383
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock_bh+0x33/0x40 kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:394 [inline]
ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
wg_packet_decrypt_worker+0x73c/0x780 drivers/net/wireguard/receive.c:506
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
value changed: 0xffff88815883e000 -> 0x0000000000000000
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 26435 Comm: kworker/0:1 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
Fixes: 4ea7e38696c7 ("dropmon: add ability to detect when hardware dropsrxpackets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we fail to copy the just created file descriptor to userland, we
try to clean up by putting back 'fd' and freeing 'ib'. The code uses
put_unused_fd() for the former which is wrong, as the file descriptor
was already published by fd_install() which gets called internally by
anon_inode_getfd().
This makes the error handling code leaving a half cleaned up file
descriptor table around and a partially destructed 'file' object,
allowing userland to play use-after-free tricks on us, by abusing
the still usable fd and making the code operate on a dangling
'file->private_data' pointer.
Instead of leaving the kernel in a partially corrupted state, don't
attempt to explicitly clean up and leave this to the process exit
path that'll release any still valid fds, including the one created
by the previous call to anon_inode_getfd(). Simply return -EFAULT to
indicate the error.
Fixes: f73f7f4da581 ("iio: buffer: add ioctl() to support opening extra buffers for IIO device")
Cc: stable@kernel.org
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Alexandru Ardelean <ardeleanalex@gmail.com>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Nuno Sa <Nuno.Sa@analog.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The callback functions of clcsock will be saved and replaced during
the fallback. But if the fallback happens more than once, then the
copies of these callback functions will be overwritten incorrectly,
resulting in a loop call issue:
clcsk->sk_error_report
|- smc_fback_error_report() <------------------------------|
|- smc_fback_forward_wakeup() | (loop)
|- clcsock_callback() (incorrectly overwritten) |
|- smc->clcsk_error_report() ------------------|
So this patch fixes the issue by saving these function pointers only
once in the fallback and avoiding overwriting.
Reported-by: syzbot+4de3c0e8a263e1e499bc@syzkaller.appspotmail.com
Fixes: 341adeec9ada ("net/smc: Forward wakeup to smc socket waitqueue after fallback")
Link: https://lore.kernel.org/r/0000000000006d045e05d78776f6@google.com
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It appears that a read access to GIC[DR]_I[CS]PENDRn doesn't always
result in the pending interrupts being accurately reported if they are
mapped to a HW interrupt. This is particularily visible when acking
the timer interrupt and reading the GICR_ISPENDR1 register immediately
after, for example (the interrupt appears as not-pending while it really
is...).
This is because a HW interrupt has its 'active and pending state' kept
in the *physical* distributor, and not in the virtual one, as mandated
by the spec (this is what allows the direct deactivation). The virtual
distributor only caries the pending and active *states* (note the
plural, as these are two independent and non-overlapping states).
Fix it by reading the HW state back, either from the timer itself or
from the distributor if necessary.
Reported-by: Ricardo Koller <ricarkol@google.com>
Tested-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208123726.3604198-1-maz@kernel.org
|
|
When the gadget driver hasn't been (yet) configured, and the cable is
connected to a HOST, the SFTDISCON gets cleared unconditionally, so the
HOST tries to enumerate it.
At the host side, this can result in a stuck USB port or worse. When
getting lucky, some dmesg can be observed at the host side:
new high-speed USB device number ...
device descriptor read/64, error -110
Fix it in drd, by checking the enabled flag before calling
dwc2_hsotg_core_connect(). It will be called later, once configured,
by the normal flow:
- udc_bind_to_driver
- usb_gadget_connect
- dwc2_hsotg_pullup
- dwc2_hsotg_core_connect
Fixes: 17f934024e84 ("usb: dwc2: override PHY input signals with usb role switch support")
Cc: stable@kernel.org
Reviewed-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
Acked-by: Minas Harutyunyan <Minas.Harutyunyan@synopsys.com>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Link: https://lore.kernel.org/r/1644423353-17859-1-git-send-email-fabrice.gasnier@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Check the size of the RNDIS_MSG_SET command given to us before
attempting to respond to an invalid message size.
Reported-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Cc: stable@kernel.org
Tested-by: Szymon Heidrich <szymon.heidrich@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|