Age | Commit message | Author
2018-10-30 | perf trace: Consider syscall aliases too | Arnaldo Carvalho de Melo
When trying to trace the 'umount' syscall on x86_64 I noticed that it was failing:

  # trace -e umount umount /mnt
  event syntax error: 'umount'
                       \___ parser error
  Run 'perf list' for a list of valid events

   Usage: perf trace [<options>] [<command>]
      or: perf trace [<options>] -- <command> [<options>]
      or: perf trace record [<options>] [<command>]
      or: perf trace record [<options>] -- <command> [<options>]

      -e, --event <event>   event/syscall selector. use 'perf list' to list available events
  #

This is because on x86-64 we have it just as 'umount2':

  $ grep umount arch/x86/entry/syscalls/syscall_64.tbl
  166  common  umount2  __x64_sys_umount
  $

So if the syscall name lookup fails, fall back to looking at the aliases we have in the syscall_fmts table and then re-lookup. Now:

  # trace -e umount umount -f /mnt
  umount: /mnt: not mounted.
       1.759 ( 0.004 ms): umount/18365 umount2(name: 0x55fbfcbc4480, flags: 1) = -1 EINVAL Invalid argument
  #

Time to beautify the flags arg :-)

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Peterson <benjamin@python.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ukweodgzbmjd25lfkgryeft1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
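The alias fallback described above can be sketched roughly as follows. This is an illustration only, not perf's implementation: the two dictionaries and the function name `lookup_syscall` are hypothetical stand-ins for the real syscall table and the syscall_fmts table, and only the 'umount2' entry comes from the commit message.

```python
# Hypothetical miniature tables; only the umount2 entry is taken from the
# commit message, and the layout is an assumption made for this sketch.
SYSCALL_TABLE = {"umount2": 166}          # canonical name -> syscall number
SYSCALL_ALIASES = {"umount2": "umount"}   # canonical name -> alias


def lookup_syscall(name):
    """Return (canonical_name, number), or None if the name is unknown.

    The direct lookup is tried first; if it fails, the alias table is
    searched and the lookup is repeated with the canonical name.
    """
    if name in SYSCALL_TABLE:
        return name, SYSCALL_TABLE[name]
    for canonical, alias in SYSCALL_ALIASES.items():
        if alias == name and canonical in SYSCALL_TABLE:
            return canonical, SYSCALL_TABLE[canonical]
    return None
```

With these tables, asking for 'umount' resolves to the 'umount2' entry, which matches the trace output shown in the commit message.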
2018-10-30 | perf trace beauty: Beautify mount/umount's 'flags' argument | Arnaldo Carvalho de Melo
  # trace -e mount mount -o ro -t debugfs nodev /mnt
     0.000 ( 1.040 ms): mount/27235 mount(dev_name: 0x5601cc8c64e0, dir_name: 0x5601cc8c6500, type: 0x5601cc8c6480, flags: RDONLY) = 0
  # trace -e mount mount -o remount,relatime -t debugfs nodev /mnt
     0.000 ( 2.946 ms): mount/27262 mount(dev_name: 0x55f4a73d64e0, dir_name: 0x55f4a73d6500, type: 0x55f4a73d6480, flags: REMOUNT|RELATIME) = 0
  # trace -e mount mount -o remount,strictatime -t debugfs nodev /mnt
     0.000 ( 2.934 ms): mount/27265 mount(dev_name: 0x5617f71d94e0, dir_name: 0x5617f71d9500, type: 0x5617f71d9480, flags: REMOUNT|STRICTATIME) = 0
  # trace -e mount mount -o remount,suid,silent -t debugfs nodev /mnt
     0.000 ( 0.049 ms): mount/27273 mount(dev_name: 0x55ad65df24e0, dir_name: 0x55ad65df2500, type: 0x55ad65df2480, flags: REMOUNT|SILENT) = 0
  # trace -e mount mount -o remount,rw,sync,lazytime -t debugfs nodev /mnt
     0.000 ( 2.684 ms): mount/27281 mount(dev_name: 0x561216055530, dir_name: 0x561216055550, type: 0x561216055510, flags: SYNCHRONOUS|REMOUNT|LAZYTIME) = 0
  # trace -e mount mount -o remount,dirsync -t debugfs nodev /mnt
     0.000 ( 3.512 ms): mount/27314 mount(dev_name: 0x55c4e7188480, dir_name: 0x55c4e7188530, type: 0x55c4e71884a0, flags: REMOUNT|DIRSYNC, data: 0x55c4e71884e0) = 0
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Peterson <benjamin@python.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-i5ncao73c0bd02qprgrq6wb9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-30 | perf trace beauty: Allow syscalls to mask an argument before considering it | Arnaldo Carvalho de Melo
Take mount's 'flags' arg, to cope with this semantic, as defined in do_mount in fs/namespace.c:

  /*
   * Pre-0.97 versions of mount() didn't have a flags word. When the
   * flags word was introduced its top half was required to have the
   * magic value 0xC0ED, and this remained so until 2.4.0-test9.
   * Therefore, if this magic number is present, it carries no
   * information and must be discarded.
   */

We need to mask this arg and then check whether it is zero, in which case we simply don't print the arg name and value.

The next patch will use this for mount's 'flags' arg.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Peterson <benjamin@python.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-btue14k5jemayuykfrwsnh85@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
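A minimal sketch of the mask-then-test step, assuming the MS_MGC_VAL and MS_MGC_MSK values from linux/fs.h. The function names are hypothetical and the output format is only illustrative; perf's actual beautifier is written in C and is structured differently.

```python
MS_MGC_VAL = 0xC0ED0000  # legacy magic value carried in the top half of 'flags'
MS_MGC_MSK = 0xFFFF0000  # mask selecting that top half


def mask_mount_flags(flags):
    """Discard the legacy magic number if it is present in the top half."""
    if (flags & MS_MGC_MSK) == MS_MGC_VAL:
        flags &= ~MS_MGC_MSK
    return flags


def format_flags_arg(flags):
    """Return the text for the 'flags' argument, or None to suppress it.

    After masking, a zero value carries no information, so the argument
    name and value are not printed at all.
    """
    masked = mask_mount_flags(flags)
    if masked == 0:
        return None
    return "flags: %#x" % masked
```

For example, a raw value of 0xC0ED0000 is suppressed entirely, while 0xC0ED0001 is shown as if only the low bit had been passed.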
2018-10-30 | perf beauty: Introduce strarray__scnprintf_flags() | Arnaldo Carvalho de Melo
Generalizing pkey_alloc__scnprintf_access_rights(), so that we can use it with other flags-like arguments, such as mount's mountflags argument. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Benjamin Peterson <benjamin@python.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-o3ymi3104m8moaz9865g09w9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
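The idea of a generic flags formatter can be sketched as below. This is not the signature or behavior of strarray__scnprintf_flags(); the function name, the dictionary-based table and the hexadecimal fallback for unknown bits are assumptions made for illustration. The bit values are the MS_ constants for the named mount flags.

```python
def format_flags(names, flags):
    """Render a bitmask as 'A|B|C' using a {bit_value: name} table.

    Bits without a name are appended in hexadecimal so that no
    information is silently dropped (an assumption of this sketch).
    """
    parts = []
    remaining = flags
    for bit in sorted(names):
        if remaining & bit:
            parts.append(names[bit])
            remaining &= ~bit
    if remaining:
        parts.append("%#x" % remaining)
    return "|".join(parts)


# A small subset of the mount flags, keyed by bit value.
MOUNT_FLAG_NAMES = {
    1: "RDONLY",
    16: "SYNCHRONOUS",
    32: "REMOUNT",
    128: "DIRSYNC",
    1 << 21: "RELATIME",
    1 << 25: "LAZYTIME",
}
```

Because the table is a parameter, the same routine can serve any flags-like argument, which is the point of the generalization.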
2018-10-30 | perf beauty: Switch from GPL v2.0 to LGPL v2.1 | Arnaldo Carvalho de Melo
The intention is to have this as a library, since it is not perf specific at all. I did the switch for the files where I'm the only contributor, with the exception of a few lines changed by Jiri Olsa. Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-a04q6chdyjknm1hr305ulx8h@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-30 | perf beauty: Add a generator for MS_ mount/umount's flag constants | Arnaldo Carvalho de Melo
It'll use the tools/include copy of linux/fs.h to generate a table to be used by tools, initially by the 'mount' and 'umount' beautifiers in 'perf trace', but that could also be used to translate from a string constant to the integer value to be used in an eBPF or tracefs tracepoint filter.

When used without any args it produces:

  $ tools/perf/trace/beauty/mount_flags.sh
  static const char *mount_flags[] = {
          [1 ? (ilog2(1) + 1) : 0] = "RDONLY",
          [2 ? (ilog2(2) + 1) : 0] = "NOSUID",
          [4 ? (ilog2(4) + 1) : 0] = "NODEV",
          [8 ? (ilog2(8) + 1) : 0] = "NOEXEC",
          [16 ? (ilog2(16) + 1) : 0] = "SYNCHRONOUS",
          [32 ? (ilog2(32) + 1) : 0] = "REMOUNT",
          [64 ? (ilog2(64) + 1) : 0] = "MANDLOCK",
          [128 ? (ilog2(128) + 1) : 0] = "DIRSYNC",
          [1024 ? (ilog2(1024) + 1) : 0] = "NOATIME",
          [2048 ? (ilog2(2048) + 1) : 0] = "NODIRATIME",
          [4096 ? (ilog2(4096) + 1) : 0] = "BIND",
          [8192 ? (ilog2(8192) + 1) : 0] = "MOVE",
          [16384 ? (ilog2(16384) + 1) : 0] = "REC",
          [32768 ? (ilog2(32768) + 1) : 0] = "SILENT",
          [16 + 1] = "POSIXACL",
          [17 + 1] = "UNBINDABLE",
          [18 + 1] = "PRIVATE",
          [19 + 1] = "SLAVE",
          [20 + 1] = "SHARED",
          [21 + 1] = "RELATIME",
          [22 + 1] = "KERNMOUNT",
          [23 + 1] = "I_VERSION",
          [24 + 1] = "STRICTATIME",
          [25 + 1] = "LAZYTIME",
          [26 + 1] = "SUBMOUNT",
          [27 + 1] = "NOREMOTELOCK",
          [28 + 1] = "NOSEC",
          [29 + 1] = "BORN",
          [30 + 1] = "ACTIVE",
          [31 + 1] = "NOUSER",
  };
  $

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Benjamin Peterson <benjamin@python.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mgutbbkmip9gfnmd28ikg7xt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
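The index expression in the generated table, ilog2(value) + 1 for a non-zero value and 0 otherwise, and the reverse translation from a name back to its integer value can be sketched as follows. The helper names and the three-entry table are hypothetical; only the index rule and the flag names come from the output above.

```python
def table_index(value):
    """Index used by the generated table: ilog2(value) + 1, or 0 for zero.

    For a positive integer, bit_length() equals floor(log2(value)) + 1.
    """
    return value.bit_length() if value else 0


# A few entries keyed the same way as the generated C table.
MOUNT_FLAGS = {
    table_index(1): "RDONLY",
    table_index(32): "REMOUNT",
    21 + 1: "RELATIME",
}


def flag_value(name):
    """Translate a flag name back to its integer value, e.g. for a filter."""
    for index, flag_name in MOUNT_FLAGS.items():
        if flag_name == name:
            return 1 << (index - 1)
    raise KeyError(name)
```

Reserving index 0 for the value zero is what allows a zero-valued constant to share the table with single-bit flags.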
2018-10-30 | tools include uapi: Grab a copy of linux/fs.h | Arnaldo Carvalho de Melo
We'll use it to create tables for the 'flags' argument to the 'mount' and 'umount' syscalls. Add it to check_headers.sh so that when a new protocol gets added we get a notification during the build process. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Benjamin Peterson <benjamin@python.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-yacf9jvkwfwg2g95r2us3xb3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-30 | selftests/powerpc: Relax L1d miss targets for rfi_flush test | Naveen N. Rao
When running the rfi_flush test, if the system is loaded, we see two issues:
1. The L1d misses when rfi_flush is disabled increase significantly due to other workloads interfering with the cache.
2. The L1d misses when rfi_flush is enabled sometimes go slightly below the expected number of misses.

To address these, let's relax the expected number of L1d misses:
1. When rfi_flush is disabled, we allow up to half the number of misses expected when rfi_flush is enabled.
2. When rfi_flush is enabled, we allow the number of cache misses to be ~1% lower than expected.

Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
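The relaxed thresholds can be illustrated with a small check. This is a sketch under assumptions: the function name, the parameter names and the exact comparison operators are hypothetical, and the selftest itself is written in C and may compute its bounds differently.

```python
def l1d_misses_ok(misses, expected_enabled, rfi_flush_enabled):
    """Decide whether a measured L1d miss count is acceptable.

    expected_enabled is the miss count expected with rfi_flush enabled.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    if rfi_flush_enabled:
        # Accept counts down to about 1% below the expected value.
        return misses * 100 >= expected_enabled * 99
    # Disabled: accept anything up to half of the flush-enabled expectation.
    return misses * 2 <= expected_enabled
```

With an expectation of 1,000,000 misses, a loaded system reporting 995,000 misses with the flush enabled, or 400,000 with it disabled, would still pass under these bounds.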
2018-10-30 | ALSA: firewire-lib: fix insufficient PCM rule for period/buffer size | Takashi Sakamoto
In a former commit, the PCM constraint based on the LCM of SYT_INTERVAL was replaced by a PCM rule. However, the new PCM rule returns -EINVAL in cases where the max/min values of the buffer/period size are not multiples of one of the SYT_INTERVAL values. For example, pulseaudio always fails to configure the PCM substream.

This commit changes the strategy for the PCM rule. Whereas the buggy rules had a single dependency (rate from period, period from rate, rate from buffer, buffer from rate), the revised rule has double dependencies (period from period/rate, buffer from buffer/rate). The step value is calculated from the table of SYT_INTERVAL and the list of available rates. This prevents an interval template that causes -EINVAL in a call to snd_interval_refine().

Fixes: 5950229582bc('ALSA: firewire-lib: add PCM rules to obsolete PCM constraints based on LCM of SYT_INTERVAL')
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2018-10-30 | x86/numa_emulation: Fix uniform-split numa emulation | Dave Jiang
The numa_emulation() routine in the 'uniform' case walks through all the physical 'memblk' instances and divides them into N emulated nodes with split_nodes_size_interleave_uniform(). As each physical node is consumed it is removed from the physical memblk array in the numa_remove_memblk_from() helper.

Since split_nodes_size_interleave_uniform() handles advancing the array as the 'memblk' is consumed it is expected that the base of the array is always specified as the argument.

Otherwise, on multi-socket (> 2) configurations the uniform-split capability can generate an invalid numa configuration leading to boot failures with signatures like the following:

  rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
  Sending NMI from CPU 0 to CPUs 2:
  NMI backtrace for cpu 2
  CPU: 2 PID: 1332 Comm: pgdatinit0 Not tainted 4.19.0-rc8-next-20181019-baseline #59
  RIP: 0010:__init_single_page.isra.74+0x81/0x90
  [..]
  Call Trace:
   deferred_init_pages+0xaa/0xe3
   deferred_init_memmap+0x18f/0x318
   kthread+0xf8/0x130
   ? deferred_free_pages.isra.105+0xc9/0xc9
   ? kthread_stop+0x110/0x110
   ret_from_fork+0x35/0x40

Fixes: 1f6a2c6d9f121 ("x86/numa_emulation: Introduce uniform split capability")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/154049911459.2685845.9210186007479774286.stgit@dwillia2-desk3.amr.corp.intel.com
2018-10-30 | x86/paravirt: Remove unused _paravirt_ident_32 | Juergen Gross
There is no user of _paravirt_ident_32 left in the tree. Remove it together with the related paravirt_patch_ident_32(). paravirt_patch_ident_64() can be moved inside CONFIG_PARAVIRT_XXL=y. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181030063301.15054-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-30 | perf/core: Clean up inconsisent indentation | Colin Ian King
Replace a run of spaces with a tab to clean up the indentation.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20181029233211.21475-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-30 | xtensa: clean up xtensa-specific property sections | Max Filippov
xtensa-specific property sections may be section-specific. They should be collected in the order of appearance. .gnu.linkonce.prop.* input sections should be collected into the .xt.prop output section. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-10-30 | xtensa: use DWARF_DEBUG in the vmlinux.lds.S | Max Filippov
Xtensa doesn't have anything custom in its debug sections list. Use macro DWARF_DEBUG instead of opencoding it. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-10-30 | Merge branches 'pm-cpuidle' and 'pm-cpufreq' | Rafael J. Wysocki
* pm-cpuidle:
  cpuidle: menu: Remove get_loadavg() from the performance multiplier
  sched: Factor out nr_iowait and nr_iowait_cpu

* pm-cpufreq:
  cpufreq: remove unused arm_big_little_dt driver
  cpufreq: drop ARM_BIG_LITTLE_CPUFREQ support for ARM64
  cpufreq: intel_pstate: Fix compilation for !CONFIG_ACPI
2018-10-29 | sparc64: Remvoe set_fs() from perf_callchain_user(). | David S. Miller
Ever since commit 88b0193d9418 ("perf/callchain: Force USER_DS when invoking perf_callchain_user()") the caller does this for us. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | rtnetlink: Disallow FDB configuration for non-Ethernet device | Ido Schimmel
When an FDB entry is configured, the address is validated to have the length of an Ethernet address, but the device for which the address is configured can be of any type.

The above can result in the use of uninitialized memory when the address is later compared against existing addresses since 'dev->addr_len' is used and it may be greater than ETH_ALEN, as with ip6tnl devices.

Fix this by making sure that FDB entries are only configured for Ethernet devices.

  BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
  CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x14b/0x190 lib/dump_stack.c:113
   kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
   __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
   memcmp+0x11d/0x180 lib/string.c:863
   dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
   ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
   rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
   rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
   netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
   rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
   netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
   netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
   netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
   sock_sendmsg_nosec net/socket.c:621 [inline]
   sock_sendmsg net/socket.c:631 [inline]
   ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
   __sys_sendmsg net/socket.c:2152 [inline]
   __do_sys_sendmsg net/socket.c:2161 [inline]
   __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
   __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
   do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
   entry_SYSCALL_64_after_hwframe+0x63/0xe7
  RIP: 0033:0x440ee9
  Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
  RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
  RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
  RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
  R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
  R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000

  Uninit was created at:
   kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
   kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
   kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
   kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
   slab_post_alloc_hook mm/slab.h:446 [inline]
   slab_alloc_node mm/slub.c:2718 [inline]
   __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
   __kmalloc_reserve net/core/skbuff.c:138 [inline]
   __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
   alloc_skb include/linux/skbuff.h:996 [inline]
   netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
   netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
   sock_sendmsg_nosec net/socket.c:621 [inline]
   sock_sendmsg net/socket.c:631 [inline]
   ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
   __sys_sendmsg net/socket.c:2152 [inline]
   __do_sys_sendmsg net/socket.c:2161 [inline]
   __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
   __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
   do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
   entry_SYSCALL_64_after_hwframe+0x63/0xe7

v2:
* Make error message more specific (David)

Fixes: 090096bf3db1 ("net: generic fdb support for drivers without ndo_fdb_<op>")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | sctp: check policy more carefully when getting pr status | Xin Long
When getting pr_assocstatus and pr_streamstatus via sctp_getsockopt, the case where policy is set to SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK is not processed correctly. It even causes a slab-out-of-bounds access in sctp_getsockopt_pr_streamstatus().

This patch fixes it by returning -EINVAL for this case.

Fixes: 0ac1077e3a54 ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL")
Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | sctp: clear the transport of some out_chunk_list chunks in sctp_assoc_rm_peer | Xin Long
If a transport is removed by asconf while some chunks with this transport are still queued on out_chunk_list, a use-after-free will occur later when this transport is accessed from these chunks in sctp_outq_flush().

This is an old bug; we fix it by clearing the transport of these chunks in out_chunk_list when removing a transport in sctp_assoc_rm_peer().

Reported-by: syzbot+56a40ceee5fb35932f4d@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | Merge branch 'mlxsw-Couple-of-fixes' | David S. Miller
Ido Schimmel says:

====================
mlxsw: Couple of fixes

First patch makes sure mlxsw does not ignore user requests to delete FDB entries that were learned by the device.

Second patch fixes a use-after-free that can be triggered by requesting a reload via devlink when the previous reload failed.

Please consider both patches for stable. They apply cleanly to both 4.18.y and 4.19.y.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | mlxsw: core: Fix devlink unregister flow | Shalom Toledo
After a failed reload, the driver is still registered to devlink, its devlink instance is still allocated and the 'reload_fail' flag is set. Then, in the next reload try, the driver's allocated devlink instance will be freed without unregistering from devlink and its components (e.g, resources). This scenario can cause a use-after-free if the user tries to execute command via devlink user-space tool. Fix by not freeing the devlink instance during reload (failed or not). Fixes: 24cc68ad6c46 ("mlxsw: core: Add support for reload") Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | mlxsw: spectrum_switchdev: Don't ignore deletions of learned MACs | Petr Machata
Demands to remove FDB entries should be honored even if the FDB entry in question was originally learned, and not added by the user. Therefore ignore the added_by_user datum for SWITCHDEV_FDB_DEL_TO_DEVICE. Fixes: 816a3bed9549 ("switchdev: Add fdb.added_by_user to switchdev notifications") Signed-off-by: Petr Machata <petrm@mellanox.com> Suggested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | hinic: Fix l4_type parameter in hinic_task_set_tunnel_l4 | Nathan Chancellor
Clang warns:

  drivers/net/ethernet/huawei/hinic/hinic_tx.c:392:34: error: implicit conversion from enumeration type 'enum hinic_l4_tunnel_type' to different enumeration type 'enum hinic_l4_offload_type' [-Werror,-Wenum-conversion]
          hinic_task_set_tunnel_l4(task, TUNNEL_UDP_NO_CSUM,
          ~~~~~~~~~~~~~~~~~~~~~~~~       ^~~~~~~~~~~~~~~~~~
  1 error generated.

It seems that hinic_task_set_tunnel_l4 was meant to take an enum of type hinic_l4_tunnel_type, not hinic_l4_offload_type, given both the name of the functions and the values used.

Fixes: cc18a7543d2f ("net-next/hinic: add checksum offload and TSO support")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | Documentation: ip-sysctl.txt: Document tcp_fwmark_accept | Lorenzo Colitti
This patch documents the tcp_fwmark_accept sysctl that was added in 3.15. Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | bonding: fix length of actor system | Tobias Jungel
The attribute IFLA_BOND_AD_ACTOR_SYSTEM is sent to user space with a length of sizeof(bond->params.ad_actor_system), which is 8 bytes. This patch aligns the length to ETH_ALEN so that the same MAC address is exposed as when using sysfs.

Fixes: f87fda00b6ed2 ("bonding: prevent out of bound accesses")
Signed-off-by: Tobias Jungel <tobias.jungel@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-29 | ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12 | Hangbin Liu
Similar to the ipv6 mcast commit 89225d1ce6af3 ("net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.")

i) RFC3376 8.12. Older Version Querier Present Timeout says:

   The Older Version Querier Interval is the time-out for transitioning a host back to IGMPv3 mode once an older version query is heard. When an older version query is received, hosts set their Older Version Querier Present Timer to Older Version Querier Interval.

   This value MUST be ((the Robustness Variable) times (the Query Interval in the last Query received)) plus (one Query Response Interval).

Currently we only use the hardcoded value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT. Fix it by adding two new items, mr_qi (Query Interval) and mr_qri (Query Response Interval), to struct in_device. Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri. We need to update these values when receiving IGMPv3 queries.

Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
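As a worked example of the switchback formula (mr_qrv * mr_qi) + mr_qri, the sketch below plugs in the default values given in RFC 3376 section 8: a Robustness Variable of 2, a Query Interval of 125 seconds and a Query Response Interval of 10 seconds. The function name is hypothetical, and working in plain seconds is an assumption made for readability; the kernel keeps its timers in its own units.

```python
def older_version_querier_timeout(qrv, qi, qri):
    """RFC 3376 8.12: (Robustness Variable * Query Interval) + Query Response Interval.

    All intervals are taken in seconds in this sketch.
    """
    return qrv * qi + qri


# RFC 3376 default values.
DEFAULT_QRV = 2     # Robustness Variable
DEFAULT_QI = 125    # Query Interval, in seconds
DEFAULT_QRI = 10    # Query Response Interval, in seconds

default_timeout = older_version_querier_timeout(DEFAULT_QRV, DEFAULT_QI, DEFAULT_QRI)
```

With the defaults this gives 2 * 125 + 10 = 260 seconds. A querier advertising a longer Query Interval lengthens the timeout accordingly, which a fixed constant cannot reflect.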
2018-10-29 | xtensa: add NOTES section to the linker script | Max Filippov
This section collects all source .note.* sections together in the vmlinux image. Without it .note.Linux section may be placed at address 0, while the rest of the kernel is at its normal address, resulting in a huge vmlinux.bin image that may not be linked into the xtensa Image.elf. Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-10-29 | Merge tag 'rpmsg-v4.20' of git://github.com/andersson/remoteproc | Linus Torvalds
Pull rpmsg updates from Bjorn Andersson:
 "This migrates rpmsg_char to use read/write_iter to allow being operated using aio, removes the message size alignment requirements from glink, closes a potential memory leak in SMD and switches to %pOFn for printing device_node names"

* tag 'rpmsg-v4.20' of git://github.com/andersson/remoteproc:
  rpmsg: glink: smem: Support rx peak for size less than 4 bytes
  rpmsg: smd: fix memory leak on channel create
  rpmsg: glink: Remove chunk size word align warning
  rpmsg: Convert to using %pOFn instead of device_node.name
  rpmsg: char: Migrate to iter versions of read and write
2018-10-29 | Merge tag 'rproc-v4.20' of git://github.com/andersson/remoteproc | Linus Torvalds
Pull remoteproc updates from Bjorn Andersson:
 "This contains a series of patches that reworks the memory carveout handling in remoteproc, in order to allow this to be reused for statically allocated memory regions to be used for e.g. firmware.

  It adds support for audio DSP (both TZ-assisted and non-TZ assisted) and compute DSP on Qualcomm SDM845, TZ-assisted audio DSP, compute DSP and WiFi processor on Qualcomm QCS404 and through some renaming of the drivers cleans up the naming situation.

  Finally support for custom coredump segment handlers is added and is used in the Qualcomm modem remoteproc driver to gather memory dumps of the firmware"

* tag 'rproc-v4.20' of git://github.com/andersson/remoteproc: (36 commits)
  remoteproc: qcom: q6v5-mss: Register segments/dumpfn for coredump
  remoteproc: qcom: q6v5-mss: Add custom dump function for modem
  remoteproc: qcom: q6v5-mss: Refactor mba load/unload sequence
  remoteproc: Add mechanism for custom dump function assignment
  remoteproc: Introduce custom dump function for each remoteproc segment
  remoteproc: modify vring allocation to rely on centralized carveout allocator
  remoteproc: qcom: q6v5: shore up resource probe handling
  remoteproc: qcom: qcom_q6v5_adsp: Fix some return value check
  remoteproc: modify rproc_handle_carveout to support pre-registered region
  remoteproc: add helper function to check carveout device address
  remoteproc: add helper function to allocate rproc_mem_entry from reserved memory
  remoteproc: add alloc ops in rproc_mem_entry struct
  remoteproc: introduce rproc_find_carveout_by_name function
  remoteproc: introduce rproc_add_carveout function
  remoteproc: add helper function to allocate and init rproc_mem_entry struct
  remoteproc: add name in rproc_mem_entry struct
  remoteproc: add release ops in rproc_mem_entry struct
  remoteproc: add rproc_va_to_pa function
  remoteproc: configure IOMMU only if device address requested
  remoteproc: qcom: q6v5-mss: add SCM probe dependency
  ...
2018-10-30 | xfs: remove [cm]time update from reflink calls | Darrick J. Wong
Now that the vfs remap helper dirties the inode [cm]time for us, xfs no longer needs to do that on its own. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | xfs: remove xfs_reflink_remap_range | Darrick J. Wong
Since xfs_file_remap_range is a thin wrapper, move the contents of xfs_reflink_remap_range into the shell. This cuts down on the vfs calls being made from internal xfs code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | xfs: remove redundant remap partial EOF block checks | Darrick J. Wong
Now that we've moved the partial EOF block checks to the VFS helpers, we can remove the redundant functionality from XFS. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | xfs: support returning partial reflink results | Darrick J. Wong
Back when the XFS reflink code only supported clone_file_range, we were only able to return zero or negative error codes to userspace. However, now that copy_file_range (which returns bytes copied) can use XFS' clone_file_range, we have the opportunity to return partial results. For example, if userspace sends a 1GB clone request and we run out of space halfway through, we at least can tell userspace that we completed 512M of that request like a regular write. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | xfs: clean up xfs_reflink_remap_blocks call site | Darrick J. Wong
Move the offset <-> blocks unit conversions into xfs_reflink_remap_blocks to make the call site less ugly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | xfs: fix pagecache truncation prior to reflink | Darrick J. Wong
Prior to remapping blocks, it is necessary to remove pages from the destination file's page cache. Unfortunately, the truncation is not aggressive enough -- if page size > block size, we'll end up zeroing subpage blocks instead of removing them. So, round the start offset down and the end offset up to page boundaries. We already wrote all the dirty data so the larger range shouldn't be a problem. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
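The rounding described above, start offset down and end offset up to page boundaries, can be sketched as follows. The function name is hypothetical, the end offset is treated as exclusive for simplicity, and the page size is assumed to be a power of two; the kernel code uses its own helpers and conventions.

```python
def round_to_page_boundaries(start, end, page_size=4096):
    """Widen the byte range [start, end) to whole pages.

    page_size must be a power of two for the mask arithmetic to be valid.
    """
    mask = page_size - 1
    aligned_start = start & ~mask          # round down
    aligned_end = (end + mask) & ~mask     # round up
    return aligned_start, aligned_end
```

With 64 KiB pages and 4 KiB blocks, remapping the single block at [4096, 8192) widens to the whole page [0, 65536), so the page is removed from the page cache rather than partially zeroed.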
2018-10-30 | ocfs2: remove ocfs2_reflink_remap_range | Darrick J. Wong
Since ocfs2_remap_file_range is a thin shell around ocfs2_reflink_remap_range, move everything from the latter into the former.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | ocfs2: support partial clone range and dedupe range | Darrick J. Wong
Change the ocfs2 remap code to allow for returning partial results. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | ocfs2: fix pagecache truncation prior to reflink | Darrick J. Wong
Prior to remapping blocks, it is necessary to remove pages from the destination file's page cache. Unfortunately, the truncation is not aggressive enough -- if page size > block size, we'll end up zeroing subpage blocks instead of removing them. So, round the start offset down and the end offset up to page boundaries. We already wrote all the dirty data so the larger range should be fine. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | ocfs2: truncate page cache for clone destination file before remapping | Darrick J. Wong
When cloning blocks into another file, truncate the page cache before we start remapping blocks so that concurrent reads wait for us to finish. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30 | vfs: clean up generic_remap_file_range_prep return value | Darrick J. Wong
Since the remap prep function can update the length of the remap request, we can change this function to return the usual return status instead of the odd behavior it has now. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: hide file range comparison functionDarrick J. Wong
There are no callers of vfs_dedupe_file_range_compare, so we might as well make it a static helper and remove the export. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: enable remap callers that can handle short operationsDarrick J. Wong
Plumb in a remap flag that enables the filesystem remap handler to shorten remapping requests for callers that can handle it. Now copy_file_range can report partial success (in case we run up against alignment problems, resource limits, etc.). We also enable CAN_SHORTEN for fideduperange to maintain existing userspace-visible behavior where xfs/btrfs shorten the dedupe range to avoid stale post-eof data exposure. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
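A minimal model of the "can shorten" idea, under stated assumptions: the flag names, the helper name, and the choice of -EINVAL as the failure code are all hypothetical stand-ins for illustration, not the kernel's definitions. The point is only that a request exceeding what the filesystem can remap is trimmed when the caller opted in, and rejected otherwise.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define SKETCH_REMAP_DEDUP       (1u << 0)
#define SKETCH_REMAP_CAN_SHORTEN (1u << 1)

/*
 * *len is the requested length; allowed is how many bytes the
 * filesystem can actually remap (after alignment, limits, etc.).
 * Returns 0 on success, possibly with *len reduced, or a negative
 * errno if the request would need shortening and the caller did not
 * say it can cope with that.
 */
static int sketch_check_len(uint64_t *len, uint64_t allowed,
			    unsigned int flags)
{
	if (allowed >= *len)
		return 0;		/* request fits as-is */

	if (flags & SKETCH_REMAP_CAN_SHORTEN) {
		*len = allowed;		/* caller accepts a short result */
		return 0;
	}

	return -EINVAL;			/* assumed error code for the sketch */
}
```

A caller in the style of copy_file_range would pass the shorten flag and report the reduced length as a partial success; a caller that needs all-or-nothing behavior would leave the flag clear and handle the error.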
2018-10-30vfs: plumb remap flags through the vfs dedupe functionsDarrick J. Wong
Plumb a remap_flags argument through the vfs_dedupe_file_range_one functions so that dedupe can take advantage of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: plumb remap flags through the vfs clone functionsDarrick J. Wong
Plumb a remap_flags argument through the {do,vfs}_clone_file_range functions so that clone can take advantage of it. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: make remap_file_range functions take and return bytes completedDarrick J. Wong
Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource limits in a graceful manner. A subsequent patch will enable copy_file_range to signal to the ->clone_file_range implementation that it can handle a short length, which will be returned in the function's return value. For now the short return is not implemented anywhere so the behavior won't change -- either copy_file_range manages to clone the entire range or it tries an alternative. Neither clone ioctl can take advantage of this, alas. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: remap helper should update destination inode metadataDarrick J. Wong
Extend generic_remap_file_range_prep to handle inode metadata updates when remapping into a file. If the operation can possibly alter the file contents, we must update the ctime and mtime and remove security privileges, just like we do for regular file writes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: pass remap flags to generic_remap_checksDarrick J. Wong
Pass the same remap flags to generic_remap_checks for consistency. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: pass remap flags to generic_remap_file_range_prepDarrick J. Wong
Plumb the remap flags through the filesystem from the vfs function dispatcher all the way to the prep function to prepare for behavior changes in subsequent patches. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: combine the clone and dedupe into a single remap_file_rangeDarrick J. Wong
Combine the clone_file_range and dedupe_file_range operations into a single remap_file_range file operation dispatch since they're fundamentally the same operation. The differences between the two can be made in the prep functions. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-30vfs: rename clone_verify_area to remap_verify_areaDarrick J. Wong
Since we use clone_verify_area for both clone and dedupe range checks, rename the function to make it clear that it's for both. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>