Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
* tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove unneeded semicolon in um_arch.c
um: Remove the repeated declaration
um: fix error return code in winch_tramp()
um: fix error return code in slip_open()
um: Fix stack pointer alignment
um: implement flush_cache_vmap/flush_cache_vunmap
um: add a UML specific futex implementation
um: enable the use of optimized xor routines in UML
um: Add support for host CPU flags and alignment
um: allow not setting extra rpaths in the linux binary
um: virtio/pci: enable suspend/resume
um: add PCI over virtio emulation driver
um: irqs: allow invoking time-travel handler multiple times
um: time-travel/signals: fix ndelay() in interrupt
um: expose time-travel mode to userspace side
um: export signals_enabled directly
um: remove unused smp_sigio_handler() declaration
lib: add iomem emulation (logic_iomem)
um: allow disabling NO_IOMEM
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull UBIFS updates from Richard Weinberger:
- Fix for a race xattr list and modification
- Various minor fixes (spelling, return codes, ...)
* tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode
ubifs: Fix spelling mistakes
ubifs: Remove ui_mutex in ubifs_xattr_get and change_xattr
ubifs: Fix races between xattr_{set|get} and listxattr operations
ubifs: fix snprintf() checking
ubifs: journal: Fix error return code in ubifs_jnl_write_inode()
|
|
Some different PMU types may have the same substring. For example, on
Icelake server we have PMU types "uncore_imc" and
"uncore_imc_free_running". Both PMU types have the substring
"uncore_imc". But the parser wrongly thinks they are the same PMU type.
We enable an imc event,
perf stat -e uncore_imc/event=0xe3/ -a -- sleep 1
Perf actually expands the event to:
uncore_imc_0/event=0xe3/
uncore_imc_1/event=0xe3/
uncore_imc_2/event=0xe3/
uncore_imc_3/event=0xe3/
uncore_imc_4/event=0xe3/
uncore_imc_5/event=0xe3/
uncore_imc_6/event=0xe3/
uncore_imc_7/event=0xe3/
uncore_imc_free_running_0/event=0xe3/
uncore_imc_free_running_1/event=0xe3/
uncore_imc_free_running_3/event=0xe3/
uncore_imc_free_running_4/event=0xe3/
That's because the "uncore_imc_free_running" matches the
pattern "uncore_imc*".
Now we check that the last characters of PMU name is '_<digit>'.
For example, for pattern "uncore_imc*", "uncore_imc_0" is parsed ok, but
"uncore_imc_free_running_0" fails.
Fixes: b2b9d3a3f0211c5d ("perf pmu: Support wildcards on pmu name in dynamic pmu events")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210701064253.1175-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Some symbols may not be resolved if a user only monitors one type of
PMU.
$ sudo perf record -e cpu_atom/branch-instructions/ ./big_small_workload
$ sudo perf report –stdio
# Overhead Command Shared Object Symbol
# ........ ......... ................. .....................
#
28.02% perf-exec [unknown] [.] 0x0000000000401cf6
11.32% perf-exec [unknown] [.] 0x0000000000401d04
10.90% perf-exec [unknown] [.] 0x0000000000401d11
10.61% perf-exec [unknown] [.] 0x0000000000401cfc
To parse symbols the metadata records, e.g., PERF_RECORD_COMM, which are
generated by the kernel, are required.
To decide whether to generate the metadata records, the kernel relies on
the event_filter_match() to filter the unrelated events.
On a hybrid system, event_filter_match() further checks the CPU mask of
the current enabled PMU. If an event is collected on the CPU which
doesn't have an enabled PMU, it's treated as an unrelated event.
The "big_small_workload" is created in a big core, but runs on a small
core. The metadata records are filtered, because the user only monitors
the PMU of the small core. The big core PMU is not enabled.
For a hybrid system, a dummy event is required to generate the complete
side-band events.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/1625760212-18441-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The Topdown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors.
The Topdown metrics L1 event was added as default in 42641d6f4d15e6db
("perf stat: Add Topdown metrics events as default events")
From the Sapphire Rapids server and later platforms, the same dedicated
"metrics" register is extended to support both L1 and L2 events.
Add both L1 and L2 Topdown metrics events as default to enrich the
default measuring information if the new measurement register is
available.
On legacy systems there is no change to avoid extra multiplexing.
The topdown_level indicates the max metrics level for the top-down
statistics. Set it to 2 to display all L1 and L2 Topdown metrics events.
With the patch:
$ perf stat sleep 1
Performance counter stats for 'sleep 1':
0.59 msec task-clock # 0.001 CPUs utilized
1 context-switches # 1.687 K/sec
0 cpu-migrations # 0.000 /sec
76 page-faults # 128.198 K/sec
1,405,318 cycles # 2.371 GHz
1,471,136 instructions # 1.05 insn per cycle
310,132 branches # 523.136 M/sec
10,435 branch-misses # 3.36% of all branches
8,431,908 slots # 14.223 G/sec
1,554,116 topdown-retiring # 18.4% retiring
1,289,585 topdown-bad-spec # 15.2% bad speculation
2,810,636 topdown-fe-bound # 33.2% frontend bound
2,810,636 topdown-be-bound # 33.2% backend bound
231,464 topdown-heavy-ops # 2.7% heavy operations # 15.6% light operations
1,223,453 topdown-br-mispredict # 14.5% branch mispredict # 0.8% machine clears
1,884,779 topdown-fetch-lat # 22.3% fetch latency # 10.9% fetch bandwidth
1,454,917 topdown-mem-bound # 17.2% memory bound # 16.0% Core bound
1.001179699 seconds time elapsed
0.000000000 seconds user
0.001238000 seconds sys
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/1625760169-18396-1-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move the implementation of evlist__set_leader() to a new libperf
perf_evlist__set_leader() function with the same functionality make it a
libperf exported API.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move evsel::nr_groups to perf_evsel::nr_groups, so we can move the group
interface to libperf.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move evsel::leader to perf_evsel::leader, so we can move the group
interface to libperf.
Also add several evsel helpers to ease up the transition:
struct evsel *evsel__leader(struct evsel *evsel);
- get leader evsel
bool evsel__has_leader(struct evsel *evsel, struct evsel *leader);
- true if evsel has leader as leader
bool evsel__is_leader(struct evsel *evsel);
- true if evsel is itw own leader
void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
- set leader for evsel
Committer notes:
Fix this when building with 'make BUILD_BPF_SKEL=1'
tools/perf/util/bpf_counter.c
- if (evsel->leader->core.nr_members > 1) {
+ if (evsel->core.leader->nr_members > 1) {
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Move evsel::idx to perf_evsel::idx, so we can move the group interface
to libperf.
Committer notes:
Fixup evsel->idx usage in tools/perf/util/bpf_counter_cgroup.c, that
appeared in my tree in my local tree.
Also fixed up these:
$ find tools/perf/ -name "*.[ch]" | xargs grep 'evsel->idx'
tools/perf/ui/gtk/annotate.c: evsel->idx + i);
tools/perf/ui/gtk/annotate.c: evsel->idx);
$
That running 'make -C tools/perf build-test' caught.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Requested-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210706151704.73662-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Ext4 regression and bug fixes"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: inline jbd2_journal_[un]register_shrinker()
ext4: fix flags validity checking for EXT4_IOC_CHECKPOINT
ext4: fix possible UAF when remounting r/o a mmp-protected file system
ext4: use ext4_grp_locked_error in mb_find_extent
ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock
Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"
|
|
Pull ceph updates from Ilya Dryomov:
"We have new filesystem client metrics for reporting I/O sizes from
Xiubo, two patchsets from Jeff that begin to untangle some heavyweight
blocking locks in the filesystem and a bunch of code cleanups"
* tag 'ceph-for-5.14-rc1' of git://github.com/ceph/ceph-client:
ceph: take reference to req->r_parent at point of assignment
ceph: eliminate ceph_async_iput()
ceph: don't take s_mutex in ceph_flush_snaps
ceph: don't take s_mutex in try_flush_caps
ceph: don't take s_mutex or snap_rwsem in ceph_check_caps
ceph: eliminate session->s_gen_ttl_lock
ceph: allow ceph_put_mds_session to take NULL or ERR_PTR
ceph: clean up locking annotation for ceph_get_snap_realm and __lookup_snap_realm
ceph: add some lockdep assertions around snaprealm handling
ceph: decoding error in ceph_update_snap_realm should return -EIO
ceph: add IO size metrics support
ceph: update and rename __update_latency helper to __update_stdev
ceph: simplify the metrics struct
libceph: fix doc warnings in cls_lock_client.c
libceph: remove unnecessary ret variable in ceph_auth_init()
libceph: fix some spelling mistakes
libceph: kill ceph_none_authorizer::reply_buf
ceph: make ceph_queue_cap_snap static
ceph: make ceph_netfs_read_ops static
ceph: remove bogus checks and WARN_ONs from ceph_set_page_dirty
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Features:
- Multiple patches to add support for fcntl() leases over NFSv4.
- A sysfs interface to display more information about the various
transport connections used by the RPC client
- A sysfs interface to allow a suitably privileged user to offline a
transport that may no longer point to a valid server
- A sysfs interface to allow a suitably privileged user to change the
server IP address used by the RPC client
Stable fixes:
- Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues
Bugfixes:
- SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base()
- SUNRPC: prevent port reuse on transports which don't request it.
- NFSv3: Fix memory leak in posix_acl_create()
- NFS: Various fixes to attribute revalidation timeouts
- NFSv4: Fix handling of non-atomic change attribute updates
- NFSv4: If a server is down, don't cause mounts to other servers to
hang as well
- pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT
- NFS: Fix mount failures due to incorrect setting of the
has_sec_mnt_opts filesystem flag
- NFS: Ensure nfs_readpage returns promptly when an internal error
occurs
- NFS: Fix fscache read from NFS after cache error
- pNFS: Various bugfixes around the LAYOUTGET operation"
* tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits)
NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3
NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times
NFSv4/pnfs: Clean up layout get on open
NFSv4/pnfs: Fix layoutget behaviour after invalidation
NFSv4/pnfs: Fix the layout barrier update
NFS: Fix fscache read from NFS after cache error
NFS: Ensure nfs_readpage returns promptly when internal error occurs
sunrpc: remove an offlined xprt using sysfs
sunrpc: provide showing transport's state info in the sysfs directory
sunrpc: display xprt's queuelen of assigned tasks via sysfs
sunrpc: provide multipath info in the sysfs directory
NFSv4.1 identify and mark RPC tasks that can move between transports
sunrpc: provide transport info in the sysfs directory
SUNRPC: take a xprt offline using sysfs
sunrpc: add dst_attr attributes to the sysfs xprt directory
SUNRPC for TCP display xprt's source port in sysfs xprt_info
SUNRPC query transport's source port
SUNRPC display xprt's main value in sysfs's xprt_info
SUNRPC mark the first transport
sunrpc: add add sysfs directory per xprt under each xprt_switch
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've improved the compression support especially for
Android such as allowing compression for mmap files, replacing the
immutable bit with internal bit to prohibits data writes explicitly,
and adding a mount option, "compress_cache", to improve random reads.
And, we added "readonly" feature to compact the partition w/
compression enabled, which will be useful for Android RO partitions.
Enhancements:
- support compression for mmap file
- use an f2fs flag instead of IMMUTABLE bit for compression
- support RO feature w/ extent_cache
- fully support swapfile with file pinning
- improve atgc tunability
- add nocompress extensions to unselect files for compression
Bug fixes:
- fix false alaram on iget failure during GC
- fix race condition on global pointers when there are multiple f2fs
instances
- add MODULE_SOFTDEP for initramfs
As usual, we've also cleaned up some places for better code
readability (e.g., sysfs/feature, debugging messages, slab cache
name, and docs)"
* tag 'f2fs-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits)
f2fs: drop dirty node pages when cp is in error status
f2fs: initialize page->private when using for our internal use
f2fs: compress: add nocompress extensions support
MAINTAINERS: f2fs: update my email address
f2fs: remove false alarm on iget failure during GC
f2fs: enable extent cache for compression files in read-only
f2fs: fix to avoid adding tab before doc section
f2fs: introduce f2fs_casefolded_name slab cache
f2fs: swap: support migrating swapfile in aligned write mode
f2fs: swap: remove dead codes
f2fs: compress: add compress_inode to cache compressed blocks
f2fs: clean up /sys/fs/f2fs/<disk>/features
f2fs: add pin_file in feature list
f2fs: Advertise encrypted casefolding in sysfs
f2fs: Show casefolding support only when supported
f2fs: support RO feature
f2fs: logging neatening
f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit
f2fs: compress: remove unneeded preallocation
f2fs: atgc: export entries for better tunability via sysfs
...
|
|
Pull yet more updates from Andrew Morton:
"54 patches.
Subsystems affected by this patch series: lib, mm (slub, secretmem,
cleanups, init, pagemap, and mremap), and debug"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (54 commits)
powerpc/mm: enable HAVE_MOVE_PMD support
powerpc/book3s64/mm: update flush_tlb_range to flush page walk cache
mm/mremap: allow arch runtime override
mm/mremap: hold the rmap lock in write mode when moving page table entries.
mm/mremap: use pmd/pud_poplulate to update page table entries
mm/mremap: don't enable optimized PUD move if page table levels is 2
mm/mremap: convert huge PUD move to separate helper
selftest/mremap_test: avoid crash with static build
selftest/mremap_test: update the test to handle pagesize other than 4K
mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *
mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *
kdump: use vmlinux_build_id to simplify
buildid: fix kernel-doc notation
buildid: mark some arguments const
scripts/decode_stacktrace.sh: indicate 'auto' can be used for base path
scripts/decode_stacktrace.sh: silence stderr messages from addr2line/nm
scripts/decode_stacktrace.sh: support debuginfod
x86/dumpstack: use %pSb/%pBb for backtrace printing
arm64: stacktrace: use %pSb for backtrace printing
module: add printk formats to add module build ID to stacktraces
...
|
|
Colin reports that Coverity complains about checking for poll being
non-zero after having dereferenced it multiple times. This is a valid
complaint, and actually a leftover from back when this code was based
on the aio poll code.
Kill the redundant check.
Link: https://lore.kernel.org/io-uring/fe70c532-e2a7-3722-58a1-0fa4e5c5ff2c@canonical.com/
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- Fix a MIPS bug where irqdomain loopkups could occur in a context
where RCU is not allowed
- Fix a documentation bug for handle_domain_irq
|
|
Accessing raw timers (currently only CLOCK_MONOTONIC_RAW) through VDSO
doesn't return the correct time when using the GIC as clock source.
The address of the GIC mapped page is in this case not calculated
correctly. The GIC mapped page is calculated from the VDSO data by
subtracting PAGE_SIZE:
void *get_gic(const struct vdso_data *data) {
return (void __iomem *)data - PAGE_SIZE;
}
However, the data pointer is not page aligned for raw clock sources.
This is because the VDSO data for raw clock sources (CS_RAW = 1) is
stored after the VDSO data for coarse clock sources (CS_HRES_COARSE = 0).
Therefore, only the VDSO data for CS_HRES_COARSE is page aligned:
+--------------------+
| |
| vd[CS_RAW] | ---+
| vd[CS_HRES_COARSE] | |
+--------------------+ | -PAGE_SIZE
| | |
| GIC mapped page | <--+
| |
+--------------------+
When __arch_get_hw_counter() is called with &vd[CS_RAW], get_gic returns
the wrong address (somewhere inside the GIC mapped page). The GIC counter
values are not returned which results in an invalid time.
Fixes: a7f4df4e21dd ("MIPS: VDSO: Add implementations of gettimeofday() and clock_gettime()")
Signed-off-by: Martin Fäcknitz <faecknitz@hotsplots.de>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
This adds some extra noise to the tailcall_bpf2bpf4 tests that will cause
verify to patch insns. This then moves around subprog start/end insn
index and poke descriptor insn index to ensure that verify and JIT will
continue to track these correctly.
If done correctly verifier should pass this program same as before and
JIT should emit tail call logic.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210707223848.14580-3-john.fastabend@gmail.com
|
|
Subprograms are calling map_poke_track(), but on program release there is no
hook to call map_poke_untrack(). However, on program release, the aux memory
(and poke descriptor table) is freed even though we still have a reference to
it in the element list of the map aux data. When we run map_poke_run(), we then
end up accessing free'd memory, triggering KASAN in prog_array_map_poke_run():
[...]
[ 402.824689] BUG: KASAN: use-after-free in prog_array_map_poke_run+0xc2/0x34e
[ 402.824698] Read of size 4 at addr ffff8881905a7940 by task hubble-fgs/4337
[ 402.824705] CPU: 1 PID: 4337 Comm: hubble-fgs Tainted: G I 5.12.0+ #399
[ 402.824715] Call Trace:
[ 402.824719] dump_stack+0x93/0xc2
[ 402.824727] print_address_description.constprop.0+0x1a/0x140
[ 402.824736] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824740] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824744] kasan_report.cold+0x7c/0xd8
[ 402.824752] ? prog_array_map_poke_run+0xc2/0x34e
[ 402.824757] prog_array_map_poke_run+0xc2/0x34e
[ 402.824765] bpf_fd_array_map_update_elem+0x124/0x1a0
[...]
The elements concerned are walked as follows:
for (i = 0; i < elem->aux->size_poke_tab; i++) {
poke = &elem->aux->poke_tab[i];
[...]
The access to size_poke_tab is a 4 byte read, verified by checking offsets
in the KASAN dump:
[ 402.825004] The buggy address belongs to the object at ffff8881905a7800
which belongs to the cache kmalloc-1k of size 1024
[ 402.825008] The buggy address is located 320 bytes inside of
1024-byte region [ffff8881905a7800, ffff8881905a7c00)
The pahole output of bpf_prog_aux:
struct bpf_prog_aux {
[...]
/* --- cacheline 5 boundary (320 bytes) --- */
u32 size_poke_tab; /* 320 4 */
[...]
In general, subprograms do not necessarily manage their own data structures.
For example, BTF func_info and linfo are just pointers to the main program
structure. This allows reference counting and cleanup to be done on the latter
which simplifies their management a bit. The aux->poke_tab struct, however,
did not follow this logic. The initial proposed fix for this use-after-free
bug further embedded poke data tracking into the subprogram with proper
reference counting. However, Daniel and Alexei questioned why we were treating
these objects special; I agree, its unnecessary. The fix here removes the per
subprogram poke table allocation and map tracking and instead simply points
the aux->poke_tab pointer at the main programs poke table. This way, map
tracking is simplified to the main program and we do not need to manage them
per subprogram.
This also means, bpf_prog_free_deferred(), which unwinds the program reference
counting and kfrees objects, needs to ensure that we don't try to double free
the poke_tab when free'ing the subprog structures. This is easily solved by
NULL'ing the poke_tab pointer. The second detail is to ensure that per
subprogram JIT logic only does fixups on poke_tab[] entries it owns. To do
this, we add a pointer in the poke structure to point at the subprogram value
so JITs can easily check while walking the poke_tab structure if the current
entry belongs to the current program. The aux pointer is stable and therefore
suitable for such comparison. On the jit_subprogs() error path, we omit
cleaning up the poke->aux field because these are only ever referenced from
the JIT side, but on error we will never make it to the JIT, so its fine to
leave them dangling. Removing these pointers would complicate the error path
for no reason. However, we do need to untrack all poke descriptors from the
main program as otherwise they could race with the freeing of JIT memory from
the subprograms. Lastly, a748c6975dea3 ("bpf: propagate poke descriptors to
subprograms") had an off-by-one on the subprogram instruction index range
check as it was testing 'insn_idx >= subprog_start && insn_idx <= subprog_end'.
However, subprog_end is the next subprogram's start instruction.
Fixes: a748c6975dea3 ("bpf: propagate poke descriptors to subprograms")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210707223848.14580-2-john.fastabend@gmail.com
|
|
Since d4a45c68dc81 ("irqdomain: Protect the linear revmap with RCU"),
any irqdomain lookup requires the RCU read lock to be held.
This assumes that the architecture code will be structured such as
irq_enter() will be called *before* the interrupt is looked up
in the irq domain. However, this isn't the case for MIPS, and a number
of drivers are structured to do it the other way around when handling
an interrupt in their root irqchip (secondary irqchips are OK by
construction).
This results in a RCU splat on a lockdep-enabled kernel when the kernel
takes an interrupt from idle, as reported by Guenter Roeck.
Note that this could have fired previously if any driver had used
tree-based irqdomain, which always had the RCU requirement.
To solve this, provide a MIPS-specific helper (do_domain_IRQ())
as the pendent of do_IRQ() that will do thing in the right order
(and maybe save some cycles in the process).
Ideally, MIPS would be moved over to using handle_domain_irq(),
but that's much more ambitious.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
[maz: add dependency on CONFIG_IRQ_DOMAIN after report from the kernelci bot]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Serge Semin <fancer.lancer@gmail.com>
Link: https://lore.kernel.org/r/20210705172352.GA56304@roeck-us.net
Link: https://lore.kernel.org/r/20210706110647.3979002-1-maz@kernel.org
|
|
Make sure that we disable each of the TX and RX queues in the TDMA and
RDMA control registers. This is a correctness change to be symmetrical
with the code that enables the TX and RX queues.
Tested-by: Maxime Ripard <maxime@cerno.tech>
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the nice helpers to initialize and the uid/gid/cred_uid when passed as mount arguments.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Ivan Mikhaylov says:
====================
net/ncsi: Add NCSI Intel OEM command to keep PHY link up
Add NCSI Intel OEM command to keep PHY link up and prevents any channel
resets during the host load on i210. Also includes dummy response handler for
Intel manufacturer id.
Changes from v1:
1. sparse fixes about casts
2. put it after ncsi_dev_state_probe_cis instead of
ncsi_dev_state_probe_channel because sometimes channel is not ready
after it
3. inl -> intel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the dummy response handler for Intel boards to prevent incorrect
handling of OEM commands.
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This allows to keep PHY link up and prevents any channel resets during
the host load.
It is KEEP_PHY_LINK_UP option(Veto bit) in i210 datasheet which
block PHY reset and power state changes.
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sparse reports:
net/ncsi/ncsi-rsp.c:406:24: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:732:33: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:756:25: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:779:25: warning: cast to restricted __be32
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
PHY_SPARX5_SERDES depends on OF so SPARX5_SWITCH should also depend
on OF since 'select' does not follow any dependencies.
WARNING: unmet direct dependencies detected for PHY_SPARX5_SERDES
Depends on [n]: (ARCH_SPARX5 || COMPILE_TEST [=n]) && OF [=n] && HAS_IOMEM [=y]
Selected by [y]:
- SPARX5_SWITCH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROCHIP [=y] && NET_SWITCHDEV [=y] && HAS_IOMEM [=y]
Fixes: 3cfa11bac9bb ("net: sparx5: add the basic sparx5 driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IRQs are requested during driver's ndo_open() and then later
freed up in disable_interrupts() during driver unload.
A race exists where driver can set the CXGB4_FULL_INIT_DONE
flag in ndo_open() after the disable_interrupts() in driver
unload path checks it, and hence misses calling free_irq().
Fix by unregistering netdevice first and sync with driver's
ndo_open(). This ensures disable_interrupts() checks the flag
correctly and frees up the IRQs properly.
Fixes: b37987e8db5f ("cxgb4: Disable interrupts and napi before unregistering netdev")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When reboot system, no power cycles, firmware is already downloaded,
return -EIO will break driver as error:
mt7921e: probe of 0000:03:00.0 failed with error -5
Skip firmware download and continue to probe.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Fixes: 1c099ab44727c ("mt76: mt7921: add MCU support")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since Mikrotik 10/25G NIC MDIO op emulation is not 100% reliable,
on rare occasions it can happen that some physical functions of
the NIC do not get initialized due to timeouted early MDIO op.
This changes the atl1c probe on Mikrotik 10/25G NIC not to
depend on MDIO op emulation.
Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
S390's init_idle_preempt_count(p, cpu) doesn't actually let us initialize the
preempt_count of the requested CPU's idle task: it unconditionally writes
to the current CPU's. This clearly conflicts with idle_threads_init(),
which intends to initialize *all* the idle tasks, including their
preempt_count (or their CPU's, if the arch uses a per-CPU preempt_count).
Unfortunately, it seems the way s390 does things doesn't let us initialize
every possible CPU's preempt_count early on, as the pages where this
resides are only allocated when a CPU is brought up and are freed when it
is brought down.
Let the arch-specific code set a CPU's preempt_count when its lowcore is
allocated, and turn init_idle_preempt_count() into an empty stub.
Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210707163338.1623014-1-valentin.schneider@arm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Both clang and gcc (for -march=z13 and later) align functions to 16
bytes at -O2 to benefit branch prediction.
Make asm symbols alignment consistent with that.
This also benefits potential ftrace code patching, which is currently
able to patch 8 aligned bytes at once.
With defconfig this currently increases .text size by 4104 bytes.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Lower case matches the call_on_stack() macro and is easier to read.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make sure the to be called function takes no arguments (and returns void).
Otherwise usage of CALL_ON_STACK_NORETURN() would generate broken code.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The existing CALL_ON_STACK() macro allows for subtle bugs:
- There is no type checking of the function that is being called. That
is: missing or too many arguments do not cause any compile error or
warning. The same is true if the return type of the called function
changes. This can lead to quite random bugs.
- Sign and zero extension of arguments is missing. Given that the s390
C ABI requires that the caller of a function performs proper sign
and zero extension this can also lead to subtle bugs.
- If arguments to the CALL_ON_STACK() macros contain functions calls
register corruption can happen due to register asm constructs being
used.
Therefore introduce a new call_on_stack() macro which is supposed to
fix all these problems.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Make on_async_stack() a bit more readable, even though as usual it
depends if one considers "!!!" readable or not.
At least the new construct to check if the async stack is in use or
not is a bit shorter and generates slightly better code.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Move do_softirq_own_stack() to proper header file so it can be
inlined; saving a few cycles.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
do_softirq_own_stack() is always called from task context and
therefore it is not necessary to check if the async stack is
currently used.
Remove the check and directly switch to async stack.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
This is the second part of the cleanup for the header file ap.h
to remove the register asm statements. This patch deals with
the inline ap_dqap() function where within the assembler code
an odd register of an register pair is to be addressed.
[hca@linux.ibm.com: this intentionally breaks compilation with any
clang compilers prior to llvm-project commit 458eac257377 ("[SystemZ]
Support the 'N' code for the odd register in inline-asm.").
This is hopefully the last clang kernel compile breakage caused by
incompatibilities between gcc and clang.]
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
PIF_SYSCALL_RESTART is now only used to restart execve when loading
PGSTE binaries. Rename the flag to reflect that, and avoid people
thinking that this bit has anything to do with generic syscall
restarting.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
On s390, execve might have to be restarted for PGSTE binaries
like kvm. In the past this was done via the PIF_SYSCALL_RESTART
bit. However, with the recent changes, syscalls are now restarted
differently. Now that execve() is the only call that might get
restarted via PIF_SYSCALL_RESTART, move the loop to do_syscall().
This also has the advantage that the restart is no longer visible
to userspace.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
{rt_}sigreturn is now called from the vdso, so we no longer
need the svc on the stack, and therefore no hack to support that
mechanism on machines with non-executable stack.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
with generic entry, there's a bug when it comes to restarting of signals.
The failing sequence is:
a) a signal is coming in, and no handler is registered, so the lower
part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c
sets PIF_SYSCALL_RESTART.
b) a second signal gets pending while the kernel is still in the exit
loop, and for that one, a handler exists.
c) The first part of arch_do_signal_or_restart() is called. That part
calls handle_signal(), which sets up stack + registers for handling
the signal.
d) __do_syscall() in arch/s390/kernel/syscall.c checks for
PIF_SYSCALL_RESTART right before leaving to userspace. If it is set,
it restart's the syscall. However, the registers are already setup
for handling a signal from c). The syscall is now restarted with the
wrong arguments.
Change the code to:
- use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because
we cannot rewind and go back to userspace on s390 because the system call
number might be encoded in the svc instruction.
- for all other syscalls we rewind the PSW and return to userspace.
Cc: <stable@kernel.org> # v5.12+ d57778feb987: s390/vdso: always enable vdso
Cc: <stable@kernel.org> # v5.12+ 686341f2548b: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall
Cc: <stable@kernel.org> # v5.12+ 43e1f76b0b69: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE
Cc: <stable@kernel.org> # v5.12+ 779df2248739: s390/vdso: add minimal compat vdso
Cc: <stable@kernel.org> # v5.12+
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|