summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-10-25i40e: Fix flow-type by setting GL_HASH_INSET registersSlawomir Laba
Fix setting bits for specific flow_type for GLQF_HASH_INSET register. In previous version all of the bits were set only in hena register, while in inset only one bit was set. In order for this working correctly on all types of cards these bits needs to be set correctly for both hena and inset registers. Fixes: eb0dd6e4a3b3 ("i40e: Allow RSS Hash set with less than four parameters") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221024100526.1874914-3-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25i40e: Fix VF hang when reset is triggered on another VFSylwester Dziedziuch
When a reset was triggered on one VF with i40e_reset_vf global PF state __I40E_VF_DISABLE was set on a PF until the reset finished. If immediately after triggering reset on one VF there is a request to reset on another it will cause a hang on VF side because VF will be notified of incoming reset but the reset will never happen because of this global state, we will get such error message: [ +4.890195] iavf 0000:86:02.1: Never saw reset and VF will hang waiting for the reset to be triggered. Fix this by introducing new VF state I40E_VF_STATE_RESETTING that will be set on a VF if it is currently resetting instead of the global __I40E_VF_DISABLE PF state. Fixes: 3ba9bcb4b68f ("i40e: add locking around VF reset") Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221024100526.1874914-2-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25i40e: Fix ethtool rx-flow-hash setting for X722Slawomir Laba
When enabling flow type for RSS hash via ethtool: ethtool -N $pf rx-flow-hash tcp4|tcp6|udp4|udp6 s|d the driver would fail to setup this setting on X722 device since it was using the mask on the register dedicated for X710 devices. Apply a different mask on the register when setting the RSS hash for the X722 device. When displaying the flow types enabled via ethtool: ethtool -n $pf rx-flow-hash tcp4|tcp6|udp4|udp6 the driver would print wrong values for X722 device. Fix this issue by testing masks for X722 device in i40e_get_rss_hash_opts function. Fixes: eb0dd6e4a3b3 ("i40e: Allow RSS Hash set with less than four parameters") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Signed-off-by: Michal Jaron <michalx.jaron@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20221024100526.1874914-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25Merge branch 'net-lan743x-pci11010-pci11414-devices-enhancements'Jakub Kicinski
Raju Lakkaraju says: ==================== net: lan743x: PCI11010 / PCI11414 devices Enhancements This patch series continues with the addition of supported features for the Ethernet function of the PCI11010 / PCI11414 devices to the LAN743x driver. ==================== Link: https://lore.kernel.org/r/20221024082516.661199-1-Raju.Lakkaraju@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25net: phy: micrel: Add PHY Auto/MDI/MDI-X set driver for KSZ9131Raju Lakkaraju
Add support for MDI-X status and configuration for KSZ9131 chips Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25net: lan743x: Add support for get_pauseparam and set_pauseparamRaju Lakkaraju
Add pause get and set functions Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25ipv6: ensure sane device mtu in tunnelsEric Dumazet
Another syzbot report [1] with no reproducer hints at a bug in ip6_gre tunnel (dev:ip6gretap0) Since ipv6 mcast code makes sure to read dev->mtu once and applies a sanity check on it (see commit b9b312a7a451 "ipv6: mcast: better catch silly mtu values"), a remaining possibility is that a layer is able to set dev->mtu to an underflowed value (high order bit set). This could happen indeed in ip6gre_tnl_link_config_route(), ip6_tnl_link_config() and ipip6_tunnel_bind_dev() Make sure to sanitize mtu value in a local variable before it is written once on dev->mtu, as lockless readers could catch wrong temporary value. [1] skbuff: skb_over_panic: text:ffff80000b7a2f38 len:40 put:40 head:ffff000149dcf200 data:ffff000149dcf2b0 tail:0xd8 end:0xc0 dev:ip6gretap0 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:120 Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 10241 Comm: kworker/1:1 Not tainted 6.0.0-rc7-syzkaller-18095-gbbed346d5a96 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022 Workqueue: mld mld_ifc_work pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_panic+0x4c/0x50 net/core/skbuff.c:116 lr : skb_panic+0x4c/0x50 net/core/skbuff.c:116 sp : ffff800020dd3b60 x29: ffff800020dd3b70 x28: 0000000000000000 x27: ffff00010df2a800 x26: 00000000000000c0 x25: 00000000000000b0 x24: ffff000149dcf200 x23: 00000000000000c0 x22: 00000000000000d8 x21: ffff80000b7a2f38 x20: ffff00014c2f7800 x19: 0000000000000028 x18: 00000000000001a9 x17: 0000000000000000 x16: ffff80000db49158 x15: ffff000113bf1a80 x14: 0000000000000000 x13: 00000000ffffffff x12: ffff000113bf1a80 x11: ff808000081c0d5c x10: 0000000000000000 x9 : 73f125dc5c63ba00 x8 : 73f125dc5c63ba00 x7 : ffff800008161d1c x6 : 0000000000000000 x5 : 0000000000000080 x4 : 0000000000000001 x3 : 0000000000000000 x2 : ffff0001fefddcd0 x1 : 0000000100000000 x0 : 0000000000000089 Call trace: skb_panic+0x4c/0x50 net/core/skbuff.c:116 skb_over_panic net/core/skbuff.c:125 [inline] skb_put+0xd4/0xdc net/core/skbuff.c:2049 ip6_mc_hdr net/ipv6/mcast.c:1714 [inline] mld_newpack+0x14c/0x270 net/ipv6/mcast.c:1765 add_grhead net/ipv6/mcast.c:1851 [inline] add_grec+0xa20/0xae0 net/ipv6/mcast.c:1989 mld_send_cr+0x438/0x5a8 net/ipv6/mcast.c:2115 mld_ifc_work+0x38/0x290 net/ipv6/mcast.c:2653 process_one_work+0x2d8/0x504 kernel/workqueue.c:2289 worker_thread+0x340/0x610 kernel/workqueue.c:2436 kthread+0x12c/0x158 kernel/kthread.c:376 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860 Code: 91011400 aa0803e1 a90027ea 94373093 (d4210000) Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221024020124.3756833-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25fs/binfmt_elf: Fix memory leak in load_elf_binary()Li Zetao
There is a memory leak reported by kmemleak: unreferenced object 0xffff88817104ef80 (size 224): comm "xfs_admin", pid 47165, jiffies 4298708825 (age 1333.476s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 60 a8 b3 00 81 88 ff ff a8 10 5a 00 81 88 ff ff `.........Z..... backtrace: [<ffffffff819171e1>] __alloc_file+0x21/0x250 [<ffffffff81918061>] alloc_empty_file+0x41/0xf0 [<ffffffff81948cda>] path_openat+0xea/0x3d30 [<ffffffff8194ec89>] do_filp_open+0x1b9/0x290 [<ffffffff8192660e>] do_open_execat+0xce/0x5b0 [<ffffffff81926b17>] open_exec+0x27/0x50 [<ffffffff81a69250>] load_elf_binary+0x510/0x3ed0 [<ffffffff81927759>] bprm_execve+0x599/0x1240 [<ffffffff8192a997>] do_execveat_common.isra.0+0x4c7/0x680 [<ffffffff8192b078>] __x64_sys_execve+0x88/0xb0 [<ffffffff83bbf0a5>] do_syscall_64+0x35/0x80 If "interp_elf_ex" fails to allocate memory in load_elf_binary(), the program will take the "out_free_ph" error handing path, resulting in "interpreter" file resource is not released. Fix it by adding an error handing path "out_free_file", which will release the file resource when "interp_elf_ex" failed to allocate memory. Fixes: 0693ffebcfe5 ("fs/binfmt_elf.c: allocate less for static executable") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221024154421.982230-1-lizetao1@huawei.com
2022-10-25exec: Copy oldsighand->action under spin-lockBernd Edlinger
unshare_sighand should only access oldsighand->action while holding oldsighand->siglock, to make sure that newsighand->action is in a consistent state. Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Cc: stable@vger.kernel.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/AM8PR10MB470871DEBD1DED081F9CC391E4389@AM8PR10MB4708.EURPRD10.PROD.OUTLOOK.COM
2022-10-25overflow: Refactor test skips for Clang-specific issuesKees Cook
Convert test exclusion into test skipping. This brings the logic for why a test is being skipped into the test itself, instead of having to spread ifdefs around the code. This will make cleanup easier as minimum tests get raised. Drop __maybe_unused so missed tests will be noticed again and clean up whitespace. For example, clang-11 on i386: [15:52:32] ================== overflow (18 subtests) ================== [15:52:32] [PASSED] u8_u8__u8_overflow_test [15:52:32] [PASSED] s8_s8__s8_overflow_test [15:52:32] [PASSED] u16_u16__u16_overflow_test [15:52:32] [PASSED] s16_s16__s16_overflow_test [15:52:32] [PASSED] u32_u32__u32_overflow_test [15:52:32] [PASSED] s32_s32__s32_overflow_test [15:52:32] [SKIPPED] u64_u64__u64_overflow_test [15:52:32] [SKIPPED] s64_s64__s64_overflow_test [15:52:32] [SKIPPED] u32_u32__int_overflow_test [15:52:32] [PASSED] u32_u32__u8_overflow_test [15:52:32] [PASSED] u8_u8__int_overflow_test [15:52:32] [PASSED] int_int__u8_overflow_test [15:52:32] [PASSED] shift_sane_test [15:52:32] [PASSED] shift_overflow_test [15:52:32] [PASSED] shift_truncate_test [15:52:32] [PASSED] shift_nonsense_test [15:52:32] [PASSED] overflow_allocation_test [15:52:32] [PASSED] overflow_size_helpers_test [15:52:32] ==================== [PASSED] overflow ===================== [15:52:32] ============================================================ [15:52:32] Testing complete. Ran 18 tests: passed: 15, skipped: 3 Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Tom Rix <trix@redhat.com> Cc: Daniel Latypov <dlatypov@google.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Cc: llvm@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20221006230017.1833458-1-keescook@chromium.org
2022-10-25overflow: disable failing tests for older clang versionsNick Desaulniers
Building the overflow kunit tests with clang-11 fails with: $ ./tools/testing/kunit/kunit.py run --arch=arm --make_options LLVM=1 \ overflow ... ld.lld: error: undefined symbol: __mulodi4 ... Clang 11 and earlier generate unwanted libcalls for signed output, unsigned input. Disable these tests for now, but should these become used in the kernel we might consider that as justification for dropping clang-11 support. Keep the clang-11 build alive a little bit longer. Avoid -Wunused-function warnings via __maybe_unused. To test W=1: $ make LLVM=1 -j128 defconfig $ ./scripts/config -e KUNIT -e KUNIT_ALL $ make LLVM=1 -j128 olddefconfig lib/overflow_kunit.o W=1 Link: https://github.com/ClangBuiltLinux/linux/issues/1711 Link: https://github.com/llvm/llvm-project/commit/3203143f1356a4e4e3ada231156fc6da6e1a9f9d Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221006171751.3444575-1-ndesaulniers@google.com
2022-10-25overflow: Fix kern-doc markup for functionsKees Cook
Fix the kern-doc markings for several of the overflow helpers and move their location into the core kernel API documentation, where it belongs (it's not driver-specific). Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: linux-hardening@vger.kernel.org Reviewed-by: Akira Yokosawa <akiyks@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2022-10-25tools headers cpufeatures: Sync with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: 257449c6a50298bd ("x86/cpufeatures: Add LbrExtV2 feature bit") This only causes these perf files to be rebuilt: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Link: https://lore.kernel.org/lkml/Y1g6vGPqPhOrXoaN@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools headers uapi: Sync linux/stat.h with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: 825cf206ed510c4a ("statx: add direct I/O alignment information") That add a constant that was manually added to tools/perf/trace/beauty/statx.c, at some point this should move to the shell based automated way. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/stat.h' differs from latest version at 'include/uapi/linux/stat.h' diff -u tools/include/uapi/linux/stat.h include/uapi/linux/stat.h Cc: Eric Biggers <ebiggers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/Y1gGQL5LonnuzeYd@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools include UAPI: Sync sound/asound.h copy with the kernel sourcesArnaldo Carvalho de Melo
Picking the changes from: 69ab6f5b00b1804e ("ALSA: Remove some left-over license text in include/uapi/sound/") Which entails no changes in the tooling side as it doesn't introduce new SNDRV_PCM_IOCTL_ ioctls. To silence this perf tools build warning: Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h' diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools headers uapi: Update linux/in.h copyArnaldo Carvalho de Melo
To get the changes in: 65b32f801bfbc54d ("uapi: move IPPROTO_L2TP to in.h") 5854a09b49574da5 ("net/ipv4: Use __DECLARE_FLEX_ARRAY() helper") That ends up automatically adding the new IPPROTO_L2TP to the socket args beautifiers: $ tools/perf/trace/beauty/socket.sh > before $ cp include/uapi/linux/in.h tools/include/uapi/linux/in.h $ tools/perf/trace/beauty/socket.sh > after $ diff -u before after --- before 2022-10-25 12:17:02.577892416 -0300 +++ after 2022-10-25 12:17:10.806113033 -0300 @@ -20,6 +20,7 @@ [98] = "ENCAP", [103] = "PIM", [108] = "COMP", + [115] = "L2TP", [132] = "SCTP", [136] = "UDPLITE", [137] = "MPLS", $ Now 'perf trace' will decode that 115 into "L2TP" and it will also be possible to use it in tracepoint filter expressions. Addresses this tools/perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h' diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Wojciech Drewek <wojciech.drewek@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/lkml/Y1f%2FGe6vjQrGjYiK@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench'Arnaldo Carvalho de Melo
We also need to add SYM_TYPED_FUNC_START() to util/include/linux/linkage.h and update tools/perf/check_headers.sh to ignore the include cfi_types.h line when checking if the kernel original files drifted from the copies we carry. This is to get the changes from: ccace936eec7b805 ("x86: Add types to indirectly called assembly functions") Addressing these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/lkml/Y1f3VRIec9EBgX6F@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools headers arm64: Sync arm64's cputype.h with the kernel sourcesArnaldo Carvalho de Melo
To get the changes in: 0e5d5ae837c8ce04 ("arm64: Add AMPERE1 to the Spectre-BHB affected list") That addresses this perf build warning: Warning: Kernel ABI header at 'tools/arch/arm64/include/asm/cputype.h' differs from latest version at 'arch/arm64/include/asm/cputype.h' diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: D Scott Phillips <scott@os.amperecomputing.com> https://lore.kernel.org/lkml/Y1fy5GD7ZYvkeufv@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf test: Do not fail Intel-PT misc test w/o libpythonNamhyung Kim
The virtual LBR test uses a python script to check the max size of branch stack in the Intel-PT generated LBR. But it didn't check whether python scripting is available (as it's optional). Let's skip the test if the python support is not available. Fixes: f77811a0f62577d2 ("perf test: test_intel_pt.sh: Add 9 tests") Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Ammy Yi <ammy.yi@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221021181055.60183-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf list: Fix PMU name pai_crypto in perf list on s390Thomas Richter
Commit e0b23af82d6f454c ("perf list: Add PMU pai_crypto event description for IBM z16") introduced the "Processor Activity Instrumentation" for cryptographic counters for z16. The PMU device driver exports the counters via sysfs files listed in directory /sys/devices/pai_crypto. To specify an event from that PMU, use 'perf stat -e pai_crypto/XXX/'. However the JSON file mentioned in above commit exports the counter decriptions in file pmu-events/arch/s390/cf_z16/pai.json. Rename this file to pmu-events/arch/s390/cf_z16/pai_crypto.json to make the naming consistent. Now 'perf list' shows the counter names under pai_crypto section: pai_crypto: CRYPTO_ALL [CRYPTO ALL. Unit: pai_crypto] ... Output before was pai: CRYPTO_ALL [CRYPTO ALL. Unit: pai_crypto] ... Fixes: e0b23af82d6f454c ("perf list: Add PMU pai_crypto event description for IBM z16") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20221021082557.2695382-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf record: Fix event fd racesIan Rogers
The write call may set errno which is problematic if occurring in a function also setting errno. Save and restore errno around the write call. done_fd may be used after close, clear it as part of the close and check its validity in the signal handler. Suggested-by: <gthelen@google.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anand K Mistry <amistry@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20221024011024.462518-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf bpf: Fix build with libbpf 0.7.0 by checking if ↵Arnaldo Carvalho de Melo
bpf_program__set_insns() is available During the transition to libbpf 1.0 some functions that perf used were deprecated and finally removed from libbpf, so bpf_program__set_insns() was introduced for perf to continue to use its bpf loader. But when build with LIBBPF_DYNAMIC=1 we now need to check if that function is available so that perf can build with older libbpf versions, even if the end result is emitting a warning to the user that the use of the perf BPF loader requires a newer libbpf, since bpf_program__set_insns() touches libbpf objects internal state. This affects only 'perf trace' when using bpf C code or pre-compiled bytecode as an event. Noticed on RHEL9, that has libbpf 0.7.0, where bpf_program__set_insns() isn't available. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf bpf: Fix build with libbpf 0.7.0 by adding prototype for bpf_load_program()Arnaldo Carvalho de Melo
The bpf_load_program() prototype appeared in tools/lib/bpf/bpf.h as deprecated, but nowadays its completely removed, so add it back for building with the system libbpf when using 'make LIBBPF_DYNAMIC=1'. This is a stop gap hack till we do like tools/bpf does with bpftool, i.e. bootstrap the libbpf build and install it in the perf build directory when not using 'make LIBBPF_DYNAMIC=1'. That has to be done to all libraries in tools/lib/, so tha we can remove -Itools/lib/ from the tools/perf CFLAGS. Noticed when building with LIBBPF_DYNAMIC=1 and libbpf 0.7.0 on RHEL9. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf vendor events power10: Fix hv-24x7 metric eventsKajol Jain
Testcase stat_all_metrics.sh fails in powerpc: 90: perf all metrics test : FAILED! The testcase "stat_all_metrics.sh" verifies perf stat result for all the metric events present in perf list. It runs perf metric events with various commands and expects non-empty metric result. Incase of powerpc:hv-24x7 events, some of the event count can be 0 based on system configuration. And if that event used as denominator in divide equation, it can cause divide by 0 error. The current nest_metric.json file creating divide by 0 issue for some of the metric events, which results in failure of the "stat_all_metrics.sh" test case. Most of the metrics events have cycles or an event which expect to have a larger value as denominator, so adding 1 to the denominator of the metric expression as a fix. Result in powerpc box after this patch changes: 90: perf all metrics test : Ok Fixes: a3cbcadfdfc330c2 ("perf vendor events power10: Adds 24x7 nest metric events for power10 platform") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Link: https://lore.kernel.org/r/20221014140220.122251-1-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf docs: Fix man page build wrt perf-arm-coresight.txtAdrian Hunter
perf build assumes documentation files starting with "perf-" are man pages but perf-arm-coresight.txt is not a man page: asciidoc: ERROR: perf-arm-coresight.txt: line 2: malformed manpage title asciidoc: ERROR: perf-arm-coresight.txt: line 3: name section expected asciidoc: FAILED: perf-arm-coresight.txt: line 3: section title expected make[3]: *** [Makefile:266: perf-arm-coresight.xml] Error 1 make[3]: *** Waiting for unfinished jobs.... make[2]: *** [Makefile.perf:895: man] Error 2 Fix by renaming it. Fixes: dc2e0fb00bb2b24f ("perf test coresight: Add relevant documentation about ARM64 CoreSight testing") Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reported-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Carsten Haitzler <carsten.haitzler@arm.com> Cc: coresight@lists.linaro.org Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/a176a3e1-6ddc-bb63-e41c-15cda8c2d5d2@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools headers UAPI: Sync powerpc syscall tables with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in these csets: e237506238352f3b ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs") That doesn't cause any changes in the perf tools. As a reminder, this table is used in tools perf to allow features such as: [root@five ~]# perf trace -e set_mempolicy_home_node ^C[root@five ~]# [root@five ~]# perf trace -v -e set_mempolicy_home_node Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 253729 && common_pid != 3585) && (id == 450) mmap size 528384B ^C[root@five ~] [root@five ~]# perf trace -v -e set* --max-events 5 Using CPUID AuthenticAMD-25-21-0 event qualifier tracepoint filter: (common_pid != 253734 && common_pid != 3585) && (id == 38 || id == 54 || id == 105 || id == 106 || id == 109 || id == 112 || id == 113 || id == 114 || id == 116 || id == 117 || id == 119 || id == 122 || id == 123 || id == 141 || id == 160 || id == 164 || id == 170 || id == 171 || id == 188 || id == 205 || id == 218 || id == 238 || id == 273 || id == 308 || id == 450) mmap size 528384B 0.000 ( 0.008 ms): bash/253735 setpgid(pid: 253735 (bash), pgid: 253735 (bash)) = 0 6849.011 ( 0.008 ms): bash/16046 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0 6849.080 ( 0.005 ms): bash/253736 setpgid(pid: 253736 (bash), pgid: 253736 (bash)) = 0 7437.718 ( 0.009 ms): gnome-shell/253737 set_robust_list(head: 0x7f34b527e920, len: 24) = 0 13445.986 ( 0.010 ms): bash/16046 setpgid(pid: 253738 (bash), pgid: 253738 (bash)) = 0 [root@five ~]# That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w set_mempolicy_home_node tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/s390/entry/syscalls/syscall.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_mempolicy_home_node tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:450 common set_mempolicy_home_node sys_set_mempolicy_home_node $ $ grep -w set_mempolicy_home_node /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c [450] = "set_mempolicy_home_node", $ This addresses these perf build warnings: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h' diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl' diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl' diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl' diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl' diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/lkml/Y01HN2DGkWz8tC%2FJ@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25Merge tag 'platform-drivers-x86-v6.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "The only thing which stands out is a fix for a backlight regression on Chromebooks (under drivers/acpi, with ack from Rafael). Other then that nothing special to report just various small fixes and hardware-id additions" * tag 'platform-drivers-x86-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: ACPI: video: Fix missing native backlight on Chromebooks platform/x86/intel: pmc/core: Add Raptor Lake support to pmc core driver leds: simatic-ipc-leds-gpio: fix incorrect LED to GPIO mapping platform/x86/amd: pmc: Read SMU version during suspend on Cezanne systems platform/x86: thinkpad_acpi: Fix reporting a non present second fan on some models platform/x86: asus-wmi: Add support for ROG X16 tablet mode
2022-10-25net: dev: Convert sa_data to flexible array in struct sockaddrKees Cook
One of the worst offenders of "fake flexible arrays" is struct sockaddr, as it is the classic example of why GCC and Clang have been traditionally forced to treat all trailing arrays as fake flexible arrays: in the distant misty past, sa_data became too small, and code started just treating it as a flexible array, even though it was fixed-size. The special case by the compiler is specifically that sizeof(sa->sa_data) and FORTIFY_SOURCE (which uses __builtin_object_size(sa->sa_data, 1)) do not agree (14 and -1 respectively), which makes FORTIFY_SOURCE treat it as a flexible array. However, the coming -fstrict-flex-arrays compiler flag will remove these special cases so that FORTIFY_SOURCE can gain coverage over all the trailing arrays in the kernel that are _not_ supposed to be treated as a flexible array. To deal with this change, convert sa_data to a true flexible array. To keep the structure size the same, move sa_data into a union with a newly introduced sa_data_min with the original size. The result is that FORTIFY_SOURCE can continue to have no idea how large sa_data may actually be, but anything using sizeof(sa->sa_data) must switch to sizeof(sa->sa_data_min). Cc: Jens Axboe <axboe@kernel.dk> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: David Ahern <dsahern@kernel.org> Cc: Dylan Yudaken <dylany@fb.com> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Petr Machata <petrm@nvidia.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: syzbot <syzkaller@googlegroups.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221018095503.never.671-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-25media: vivid.rst: loop_video is set on the capture devnodeHans Verkuil
The example on how to use and test Capture Overlay specified the wrong video device node. Back in 2015 the loop_video control moved from the output device to the capture device, but this example code is still referring to the output video device. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25media: vivid: set num_in/outputs to 0 if not supportedHans Verkuil
If node_types does not have video/vbi/meta inputs or outputs, then set num_inputs/num_outputs to 0 instead of 1. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 0c90f649d2f5 (media: vivid: add vivid_create_queue() helper) Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25media: vivid: drop GFP_DMA32Hans Verkuil
>From what I can see, this is not needed. And since using it issues a 'deprecated' warning, just drop it. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25media: vivid: fix control handler mutex deadlockHans Verkuil
vivid_update_format_cap() can be called from an s_ctrl callback. In that case (keep_controls == true) no control framework functions can be called that take the control handler mutex. The new call to v4l2_ctrl_modify_dimensions() did exactly that. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 6bc7643d1b9c (media: vivid: add pixel_array test control) Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25media: videodev2.h: V4L2_DV_BT_BLANKING_HEIGHT should check 'interlaced'Hans Verkuil
If it is a progressive (non-interlaced) format, then ignore the interlaced timing values. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 7f68127fa11f ([media] videodev2.h: defines to calculate blanking and frame sizes) Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25media: v4l2-dv-timings: add sanity checks for blanking valuesHans Verkuil
Add sanity checks to v4l2_valid_dv_timings() to ensure that the provided blanking values are reasonable. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: b18787ed1ce3 ([media] v4l2-dv-timings: add new helper module) Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25media: vivid: dev->bitmap_cap wasn't freed in all casesHans Verkuil
Whenever the compose width/height values change, the dev->bitmap_cap vmalloc'ed array must be freed and dev->bitmap_cap set to NULL. This was done in some places, but not all. This is only an issue if overlay support is enabled and the bitmap clipping is used. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: ef834f7836ec ([media] vivid: add the video capture and output parts) Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25media: vivid: s_fbuf: add more sanity checksHans Verkuil
VIDIOC_S_FBUF is by definition a scary ioctl, which is why only root can use it. But at least check if the framebuffer parameters match that of one of the framebuffer created by vivid, and reject anything else. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: ef834f7836ec ([media] vivid: add the video capture and output parts) Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-10-25bnx2: Use kmalloc_size_roundup() to match ksize() usageKees Cook
Round up allocations with kmalloc_size_roundup() so that build_skb()'s use of ksize() is always accurate and no special handling of the memory is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE. Cc: Rasesh Mody <rmody@marvell.com> Cc: GR-Linux-NIC-Dev@marvell.com Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221022021004.gonna.489-kees@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25Merge branch 'mptcp-socket-option-updates'Paolo Abeni
Mat Martineau says: ==================== mptcp: Socket option updates Patches 1 and 3 refactor a recent socket option helper function for more generic use, and make use of it in a couple of places. Patch 2 adds TCP_FASTOPEN_NO_COOKIE functionality to MPTCP sockets, similar to TCP_FASTOPEN_CONNECT support recently added in v6.1 ==================== Link: https://lore.kernel.org/r/20221022004505.160988-1-mathew.j.martineau@linux.intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25mptcp: sockopt: use new helper for TCP_DEFER_ACCEPTMatthieu Baerts
mptcp_setsockopt_sol_tcp_defer() was doing the same thing as mptcp_setsockopt_first_sf_only() except for the returned code in case of error. Ignoring the error is needed to mimic how TCP_DEFER_ACCEPT is handled when used with "plain" TCP sockets. The specific function for TCP_DEFER_ACCEPT can be replaced by the new mptcp_setsockopt_first_sf_only() helper and errors can be ignored to stay compatible with TCP. A bit of cleanup. Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25mptcp: add TCP_FASTOPEN_NO_COOKIE supportMatthieu Baerts
The goal of this socket option is to configure MPTCP + TFO without cookie per socket. It was already possible to enable TFO without a cookie per netns by setting net.ipv4.tcp_fastopen sysctl knob to the right value. Per route was also supported by setting 'fastopen_no_cookie' option. This patch adds a per socket support like it is possible to do with TCP thanks to TCP_FASTOPEN_NO_COOKIE socket option. The only thing to do here is to relay the request to the first subflow like it is already done for TCP_FASTOPEN_CONNECT. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25mptcp: sockopt: make 'tcp_fastopen_connect' genericMatthieu Baerts
There are other socket options that need to act only on the first subflow, e.g. all TCP_FASTOPEN* socket options. This is similar to the getsockopt version. In the next commit, this new mptcp_setsockopt_first_sf_only() helper is used by other another option. Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25Merge branch 'soreuseport-fix-broken-so_incoming_cpu'Paolo Abeni
Kuniyuki Iwashima says: ==================== soreuseport: Fix broken SO_INCOMING_CPU. setsockopt(SO_INCOMING_CPU) for UDP/TCP is broken since 4.5/4.6 due to these commits: * e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") * c125e80b8868 ("soreuseport: fast reuseport TCP socket selection") These commits introduced the O(1) socket selection algorithm and removed O(n) iteration over the list, but it ignores the score calculated by compute_score(). As a result, it caused two misbehaviours: * Unconnected sockets receive packets sent to connected sockets * SO_INCOMING_CPU does not work The former is fixed by commit acdcecc61285 ("udp: correct reuseport selection with connected sockets"). This series fixes the latter and adds some tests for SO_INCOMING_CPU. ==================== Link: https://lore.kernel.org/r/20221021204435.4259-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25selftest: Add test for SO_INCOMING_CPU.Kuniyuki Iwashima
Some highly optimised applications use SO_INCOMING_CPU to make them efficient, but they didn't test if it's working correctly by getsockopt() to avoid slowing down. As a result, no one noticed it had been broken for years, so it's a good time to add a test to catch future regression. The test does 1) Create $(nproc) TCP listeners associated with each CPU. 2) Create 32 child sockets for each listener by calling sched_setaffinity() for each CPU. 3) Check if accept()ed sockets' sk_incoming_cpu matches listener's one. If we see -EAGAIN, SO_INCOMING_CPU is broken. However, we might not see any error even if broken; the kernel could miraculously distribute all SYN to correct listeners. Not to let that happen, we must increase the number of clients and CPUs to some extent, so the test requires $(nproc) >= 2 and creates 64 sockets at least. Test: $ nproc 96 $ ./so_incoming_cpu Before the previous patch: # Starting 12 tests from 5 test cases. # RUN so_incoming_cpu.before_reuseport.test1 ... # so_incoming_cpu.c:191:test1:Expected cpu (5) == i (0) # test1: Test terminated by assertion # FAIL so_incoming_cpu.before_reuseport.test1 not ok 1 so_incoming_cpu.before_reuseport.test1 ... # FAILED: 0 / 12 tests passed. # Totals: pass:0 fail:12 xfail:0 xpass:0 skip:0 error:0 After: # Starting 12 tests from 5 test cases. # RUN so_incoming_cpu.before_reuseport.test1 ... # so_incoming_cpu.c:199:test1:SO_INCOMING_CPU is very likely to be working correctly with 3072 sockets. # OK so_incoming_cpu.before_reuseport.test1 ok 1 so_incoming_cpu.before_reuseport.test1 ... # PASSED: 12 / 12 tests passed. # Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25soreuseport: Fix socket selection for SO_INCOMING_CPU.Kuniyuki Iwashima
Kazuho Oku reported that setsockopt(SO_INCOMING_CPU) does not work with setsockopt(SO_REUSEPORT) since v4.6. With the combination of SO_REUSEPORT and SO_INCOMING_CPU, we could build a highly efficient server application. setsockopt(SO_INCOMING_CPU) associates a CPU with a TCP listener or UDP socket, and then incoming packets processed on the CPU will likely be distributed to the socket. Technically, a socket could even receive packets handled on another CPU if no sockets in the reuseport group have the same CPU receiving the flow. The logic exists in compute_score() so that a socket will get a higher score if it has the same CPU with the flow. However, the score gets ignored after the blamed two commits, which introduced a faster socket selection algorithm for SO_REUSEPORT. This patch introduces a counter of sockets with SO_INCOMING_CPU in a reuseport group to check if we should iterate all sockets to find a proper one. We increment the counter when * calling listen() if the socket has SO_INCOMING_CPU and SO_REUSEPORT * enabling SO_INCOMING_CPU if the socket is in a reuseport group Also, we decrement it when * detaching a socket out of the group to apply SO_INCOMING_CPU to migrated TCP requests * disabling SO_INCOMING_CPU if the socket is in a reuseport group When the counter reaches 0, we can get back to the O(1) selection algorithm. The overall changes are negligible for the non-SO_INCOMING_CPU case, and the only notable thing is that we have to update sk_incomnig_cpu under reuseport_lock. Otherwise, the race prevents transitioning to the O(n) algorithm and results in the wrong socket selection. cpu1 (setsockopt) cpu2 (listen) +-----------------+ +-------------+ lock_sock(sk1) lock_sock(sk2) reuseport_update_incoming_cpu(sk1, val) . | /* set CPU as 0 */ |- WRITE_ONCE(sk1->incoming_cpu, val) | | spin_lock_bh(&reuseport_lock) | reuseport_grow(sk2, reuse) | . | |- more_socks_size = reuse->max_socks * 2U; | |- if (more_socks_size > U16_MAX && | | reuse->num_closed_socks) | | . | | |- RCU_INIT_POINTER(sk1->sk_reuseport_cb, NULL); | | `- __reuseport_detach_closed_sock(sk1, reuse) | | . | | `- reuseport_put_incoming_cpu(sk1, reuse) | | . | | | /* Read shutdown()ed sk1's sk_incoming_cpu | | | * without lock_sock(). | | | */ | | `- if (sk1->sk_incoming_cpu >= 0) | | . | | | /* decrement not-yet-incremented | | | * count, which is never incremented. | | | */ | | `- __reuseport_put_incoming_cpu(reuse); | | | `- spin_lock_bh(&reuseport_lock) | |- spin_lock_bh(&reuseport_lock) | |- reuse = rcu_dereference_protected(sk1->sk_reuseport_cb, ...) |- if (!reuse) | . | | /* Cannot increment reuse->incoming_cpu. */ | `- goto out; | `- spin_unlock_bh(&reuseport_lock) Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection") Reported-by: Kazuho Oku <kazuhooku@gmail.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25Merge branch 'net-ipa-validation-cleanup'Paolo Abeni
Alex Elder says: ==================== net: ipa: validation cleanup This series gathers a set of IPA driver cleanups, mostly involving code that ensures certain things are known to be correct *early* (either at build or initializatin time), so they can be assumed good during normal operation. The first removes three constant symbols, by making a (reasonable) assumption that a routing table consists of entries for the modem followed by entries for the AP, with no unused entries between them. The second removes two checks that are redundant (they verify the sizes of two memory regions are in range, which will have been done earlier for all regions). The third adds some new checks to routing and filter tables that can be done at "init time" (without requiring any access to IPA hardware). The fourth moves a check that routing and filter table addresses can be encoded within certain IPA immediate commands, so it's performed earlier; the checks can be done without touching IPA hardware. The fifth moves some other command-related checks earlier, for the same reason. The sixth removes the definition ipa_table_valid(), because what it does has become redundant. Finally, the last patch moves two more validation calls so they're done very early in the probe process. This will be required by some upcoming patches, which will record the size of the routing and filter tables at this time so they're available for subsequent initialization. ==================== Link: https://lore.kernel.org/r/20221021191340.4187935-1-elder@linaro.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25net: ipa: check table memory regions earlierAlex Elder
Verify that the sizes of the routing and filter table memory regions are valid as part of memory initialization, rather than waiting for table initialization. The main reason to do this is that upcoming patches use these memory region sizes to determine the number of entries in these tables, and we'll want to know these sizes are good sooner. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25net: ipa: kill ipa_table_valid()Alex Elder
What ipa_table_valid() (and ipa_table_valid_one(), which it calls) does is ensure that the memory regions that hold routing and filter tables have reasonable size. Specifically, it checks that the size of a region is sufficient (or rather, exactly the right size) to hold the maximum number of entries supported by the driver. (There is an additional check that's erroneous, but in practice it is never reached.) Recently ipa_table_mem_valid() was added, which is called by ipa_table_init(). That function verifies that all table memory regions are of sufficient size, and requires hashed tables to have zero size if hashing is not supported. It only ensures the filter table is large enough to hold the number of endpoints that support filtering, but that is adequate. Therefore everything that ipa_table_valid() does is redundant, so get rid of it. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25net: ipa: introduce ipa_cmd_init()Alex Elder
Currently, ipa_cmd_data_valid() is called by ipa_mem_config(). Nothing it does requires access to hardware though, so it can be done during the init phase of IPA driver startup. Create a new function ipa_cmd_init(), whose purpose is to do early initialization related to IPA immediate commands. It will call the build-time validation function, then will make the two calls made previously by ipa_cmd_data_valid(). This make ipa_cmd_data_valid() unnecessary, so get rid of it. Rename ipa_cmd_header_valid() to be ipa_cmd_header_init_local_valid(), so its name is clearer about which IPA immediate command it is associated with. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25net: ipa: verify table sizes fit in commands earlyAlex Elder
We currently verify the table size and offset fit in the immediate command fields that must encode them in ipa_table_valid_one(). We can now make this check earlier, in ipa_table_mem_valid(). The non-hashed IPv4 filter and route tables will always exist, and their sizes will match the IPv6 tables, as well as the hashed tables (if supported). So it's sufficient to verify the offset and size of the IPv4 non-hashed tables fit into these fields. Rename the function ipa_cmd_table_init_valid(), to reinforce that it is the TABLE_INIT immediate command fields we're checking. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-25net: ipa: validate IPA table memory earlierAlex Elder
Add checks in ipa_table_init() to ensure the memory regions defined for IPA filter and routing tables are valid. For routing tables, the checks ensure: - The non-hashed IPv4 and IPv6 routing tables are defined - The non-hashed IPv4 and IPv6 routing tables are the same size - The number entries in the non-hashed IPv4 routing table is enough to hold the number entries available to the modem, plus at least one usable by the AP. For filter tables, the checks ensure: - The non-hashed IPv4 and IPv6 filter tables are defined - The non-hashed IPv4 and IPv6 filter tables are the same size - The number entries in the non-hashed IPv4 filter table is enough to hold the endpoint bitmap, plus an entry for each defined endpoint that supports filtering. In addition, for both routing and filter tables: - If hashing isn't supported (IPA v4.2), hashed tables are zero size - If hashing *is* supported, all hashed tables are the same size as their non-hashed counterparts. When validating the size of routing tables, require the AP to have at least one entry (in addition to those used by the modem). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>