path: root/arch/x86/include/asm/cmpxchg.h
2025-03-24  Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds)

    Pull core x86 updates from Ingo Molnar:

    "x86 CPU features support:
       - Generate the <asm/cpufeaturemasks.h> header based on build config (H. Peter Anvin, Xin Li)
       - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
       - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
       - Enable modifying CPU bug flags with '{clear,set}cpuid=' (Brendan Jackman)
       - Utilize CPU-type for CPU matching (Pawan Gupta)
       - Warn about unmet CPU feature dependencies (Sohil Mehta)
       - Prepare for new Intel Family numbers (Sohil Mehta)

     Percpu code:
       - Standardize & reorganize the x86 percpu layout and related cleanups (Brian Gerst)
       - Convert the stackprotector canary to a regular percpu variable (Brian Gerst)
       - Add a percpu subsection for cache hot data (Brian Gerst)
       - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
       - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

     MM:
       - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction (Rik van Riel)
       - Rework ROX cache to avoid writable copy (Mike Rapoport)
       - PAT: restore large ROX pages after fragmentation (Kirill A. Shutemov, Mike Rapoport)
       - Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov)
       - Robustify page table initialization (Kirill A. Shutemov)
       - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
       - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW (Matthew Wilcox)

     KASLR:
       - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir Singh)

     CPU bugs:
       - Implement FineIBT-BHI mitigation (Peter Zijlstra)
       - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
       - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
       - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)

     System calls:
       - Break up entry/common.c (Brian Gerst)
       - Move sysctls into arch/x86 (Joel Granados)

     Intel LAM support updates: (Maciej Wieczor-Retman)
       - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
       - selftests/lam: Skip test if LAM is disabled
       - selftests/lam: Test get_user() LAM pointer handling

     AMD SMN access updates:
       - Add SMN offsets to exclusive region access (Mario Limonciello)
       - Add support for debugfs access to SMN registers (Mario Limonciello)
       - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

     Power management updates: (Patryk Wlazlyn)
       - Allow calling mwait_play_dead with an arbitrary hint
       - ACPI/processor_idle: Add FFH state handling
       - intel_idle: Provide the default enter_dead() handler
       - Eliminate mwait_play_dead_cpuid_hint()

     Build system:
       - Raise the minimum GCC version to 8.1 (Brian Gerst)
       - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

     Kconfig: (Arnd Bergmann)
       - Add cmpxchg8b support back to Geode CPUs
       - Drop 32-bit "bigsmp" machine support
       - Rework CONFIG_GENERIC_CPU compiler flags
       - Drop configuration options for early 64-bit CPUs
       - Remove CONFIG_HIGHMEM64G support
       - Drop CONFIG_SWIOTLB for PAE
       - Drop support for CONFIG_HIGHPTE
       - Document CONFIG_X86_INTEL_MID as 64-bit-only
       - Remove old STA2x11 support
       - Only allow CONFIG_EISA for 32-bit

     Headers:
       - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers (Thomas Huth)

     Assembly code & machine code patching:
       - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
       - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
       - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
       - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
       - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
       - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> (Uros Bizjak)
       - Use named operands in inline asm (Uros Bizjak)
       - Improve performance by using asm_inline() for atomic locking instructions (Uros Bizjak)

     Earlyprintk:
       - Harden early_serial (Peter Zijlstra)

     NMI handler:
       - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() (Waiman Long)

     Miscellaneous fixes and cleanups:
       - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner, Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly Kuznetsov, Xin Li, liuye"

    * tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
      zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
      x86/asm: Make asm export of __ref_stack_chk_guard unconditional
      x86/mm: Only do broadcast flush from reclaim if pages were unmapped
      perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
      perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
      x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
      x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
      x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
      x86/asm: Use asm_inline() instead of asm() in clwb()
      x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
      x86/hweight: Use asm_inline() instead of asm()
      x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
      x86/hweight: Use named operands in inline asm()
      x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
      x86/head/64: Avoid Clang < 17 stack protector in startup code
      x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
      x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
      x86/cpu/intel: Limit the non-architectural constant_tsc model checks
      x86/mm/pat: Replace Intel x86_model checks with VFM ones
      x86/cpu/intel: Fix fast string initialization for extended Families
      ...
2025-03-19  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions  (Uros Bizjak)

    According to:

      https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html

    the use of asm pseudo directives in an asm template can confuse the
    compiler into wrongly estimating the size of the generated code. The
    LOCK_PREFIX macro expands to several asm pseudo directives, so its use
    in atomic locking instructions causes the instruction length estimate
    to be significantly wrong (a specially instrumented compiler reports
    the estimated length of these asm templates as 6 instructions). This
    incorrect estimate in turn leads to sub-optimal inlining decisions,
    sub-optimal instruction scheduling and sub-optimal code block
    alignments for functions that use these locking primitives.

    Use asm_inline instead:

      https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html

    which is a feature that makes GCC pretend some inline assembler code
    is tiny (while it would otherwise think it is huge). For code size
    estimation, the size of such an asm is then taken as the minimum size
    of one instruction, ignoring how many instructions the compiler thinks
    it really contains.

    bloat-o-meter reports the following code size increase (x86_64
    defconfig, gcc-14.2.1):

      add/remove: 82/283 grow/shrink: 870/372 up/down: 76272/-43618 (32654)
      Total: Before=22770320, After=22802974, chg +0.14%

    with top grows (>500 bytes):

      Function                                  old     new   delta
      ----------------------------------------------------------------
      copy_process                             6465   10191   +3726
      balance_dirty_pages_ratelimited_flags     237    2949   +2712
      icl_plane_update_noarm                   5800    7969   +2169
      samsung_input_mapping                    3375    5170   +1795
      ext4_do_update_inode.isra                   -    1526   +1526
      __schedule                               2416    3472   +1056
      __i915_vma_resource_unhold                  -     946    +946
      sched_mm_cid_after_execve                 175    1097    +922
      __do_sys_membarrier                         -     862    +862
      filemap_fault                            2666    3462    +796
      nl80211_send_wiphy                      11185   11874    +689
      samsung_input_mapping.cold                900    1500    +600
      virtio_gpu_queue_fenced_ctrl_buffer       839    1410    +571
      ilk_update_pipe_csc                      1201    1735    +534
      enable_step                                 -     525    +525
      icl_color_commit_noarm                   1334    1847    +513
      tg3_read_bc_ver                             -     501    +501

    and top shrinks (>500 bytes):

      Function                                  old     new   delta
      ----------------------------------------------------------------
      nl80211_send_iftype_data                  580       -    -580
      samsung_gamepad_input_mapping.isra.cold   604       -    -604
      virtio_gpu_queue_ctrl_sgs                 724       -    -724
      tg3_get_invariants                       9218    8376    -842
      __i915_vma_resource_unhold.part           899       -    -899
      ext4_mark_iloc_dirty                     1735     106   -1629
      samsung_gamepad_input_mapping.isra       2046       -   -2046
      icl_program_input_csc                    2203       -   -2203
      copy_mm                                  2242       -   -2242
      balance_dirty_pages                      2657       -   -2657

    These code size changes can be grouped into 4 groups:

    a) Some functions now include once-called functions in full or in
       part. These are:

         Function                                  old     new   delta
         ----------------------------------------------------------------
         copy_process                             6465   10191   +3726
         balance_dirty_pages_ratelimited_flags     237    2949   +2712
         icl_plane_update_noarm                   5800    7969   +2169
         samsung_input_mapping                    3375    5170   +1795
         ext4_do_update_inode.isra                   -    1526   +1526

       which now include:

         Function                                  old     new   delta
         ----------------------------------------------------------------
         copy_mm                                  2242       -   -2242
         balance_dirty_pages                      2657       -   -2657
         icl_program_input_csc                    2203       -   -2203
         samsung_gamepad_input_mapping.isra       2046       -   -2046
         ext4_mark_iloc_dirty                     1735     106   -1629

    b) ISRA [interprocedural scalar replacement of aggregates]: an
       interprocedural pass that removes unused function return values
       (turning functions returning a value which is never used into void
       functions) and removes unused function parameters. It can also
       replace an aggregate parameter by a set of other parameters
       representing part of the original, turning those passed by
       reference into new ones which pass the value directly. Top grows
       and shrinks of this group are listed below:

         Function                                       old     new   delta
         ----------------------------------------------------------------
         ext4_do_update_inode.isra                        -    1526   +1526
         nfs4_begin_drain_session.isra                    -     249    +249
         nfs4_end_drain_session.isra                      -     168    +168
         __guc_action_register_multi_lrc_v70.isra       335     500    +165
         __i915_gem_free_objects.isra                     -     144    +144
         ...
         membarrier_register_private_expedited.isra     108       -    -108
         syncobj_eventfd_entry_func.isra                445     314    -131
         __ext4_sb_bread_gfp.isra                       140       -    -140
         class_preempt_notrace_destructor.isra          145       -    -145
         p9_fid_put.isra                                151       -    -151
         __mm_cid_try_get.isra                          238       -    -238
         membarrier_global_expedited.isra               294       -    -294
         mm_cid_get.isra                                295       -    -295
         samsung_gamepad_input_mapping.isra.cold        604       -    -604
         samsung_gamepad_input_mapping.isra            2046       -   -2046

    c) Different split points of hot/cold splits that just move code
       around. Top grows and shrinks of this group are listed below:

         Function                                  old     new   delta
         ----------------------------------------------------------------
         samsung_input_mapping.cold                900    1500    +600
         __i915_request_reset.cold                 311     389     +78
         nfs_update_inode.cold                      77     153     +76
         __do_sys_swapon.cold                      404     455     +51
         copy_process.cold                           -      45     +45
         tg3_get_invariants.cold                    73     115     +42
         ...
         hibernate.cold                            671     643     -28
         copy_mm.cold                               31       -     -31
         software_resume.cold                      249     207     -42
         io_poll_wake.cold                         106      54     -52
         samsung_gamepad_input_mapping.isra.cold   604       -    -604

    d) Full inlining of small functions with a locking instruction (~150
       cases). These bring in most of the code size increase, because the
       removed function body is now inlined in multiple places. E.g.:

         0000000000a50e10 <release_devnum>:
           a50e10: 48 63 07             movslq (%rdi),%rax
           a50e13: 85 c0                test   %eax,%eax
           a50e15: 7e 10                jle    a50e27 <release_devnum+0x17>
           a50e17: 48 8b 4f 50          mov    0x50(%rdi),%rcx
           a50e1b: f0 48 0f b3 41 50    lock btr %rax,0x50(%rcx)
           a50e21: c7 07 ff ff ff ff    movl   $0xffffffff,(%rdi)
           a50e27: e9 00 00 00 00       jmp    a50e2c <release_devnum+0x1c>
                   a50e28: R_X86_64_PLT32 __x86_return_thunk-0x4
           a50e2c: 0f 1f 40 00          nopl   0x0(%rax)

       is now fully inlined into its callers. This is desirable due to
       the per-function overhead of CPU bug mitigations like retpolines.

    FTR a) with -Os (where generated code size really matters) the x86_64
    defconfig object file shrinks by 24388 bytes, a 0.1% code size
    decrease:

      text      data     bss     dec       hex      filename
      23883860  4617284  814212  29315356  1bf511c  vmlinux-old.o
      23859472  4615404  814212  29289088  1beea80  vmlinux-new.o

    FTR b) clang recognizes "asm inline", but there was no difference in
    code sizes:

      text      data     bss     dec       hex      filename
      27577163  4503078  807732  32887973  1f5d4a5  vmlinux-clang-patched.o
      27577181  4503078  807732  32887991  1f5d4b7  vmlinux-clang-unpatched.o

    The performance impact of the patch was assessed by recompiling the
    Fedora 41 6.13.5 kernel and running lmbench with the old and new
    kernels. The most noticeable improvements were, from:

      Process fork+exit: 270.0952 microseconds
      Process fork+execve: 2620.3333 microseconds
      Process fork+/bin/sh -c: 6781.0000 microseconds
      File /usr/tmp/XXX write bandwidth: 1780350 KB/sec
      Pagefaults on /usr/tmp/XXX: 0.3875 microseconds

    to:

      Process fork+exit: 298.6842 microseconds
      Process fork+execve: 1662.7500 microseconds
      Process fork+/bin/sh -c: 2127.6667 microseconds
      File /usr/tmp/XXX write bandwidth: 1950077 KB/sec
      Pagefaults on /usr/tmp/XXX: 0.1958 microseconds

    and from:

      Socket bandwidth using localhost
      0.000001 2.52 MB/sec
      0.000064 163.02 MB/sec
      0.000128 321.70 MB/sec
      0.000256 630.06 MB/sec
      0.000512 1207.07 MB/sec
      0.001024 2004.06 MB/sec
      0.001437 2475.43 MB/sec
      10.000000 5817.34 MB/sec
      Avg xfer: 3.2KB, 41.8KB in 1.2230 millisecs, 34.15 MB/sec
      AF_UNIX sock stream bandwidth: 9850.01 MB/sec
      Pipe bandwidth: 4631.28 MB/sec

    to:

      Socket bandwidth using localhost
      0.000001 3.13 MB/sec
      0.000064 187.08 MB/sec
      0.000128 324.12 MB/sec
      0.000256 618.51 MB/sec
      0.000512 1137.13 MB/sec
      0.001024 1962.95 MB/sec
      0.001437 2458.27 MB/sec
      10.000000 6168.08 MB/sec
      Avg xfer: 3.2KB, 41.8KB in 1.0060 millisecs, 41.52 MB/sec
      AF_UNIX sock stream bandwidth: 9921.68 MB/sec
      Pipe bandwidth: 4649.96 MB/sec

    [ mingo: Prettified the changelog a bit. ]

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Link: https://lore.kernel.org/r/20250309170955.48919-1-ubizjak@gmail.com
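
    For illustration, a minimal sketch of the resulting shape, under the
    simplifying assumption shown in the comment (the in-tree LOCK_PREFIX
    additionally expands to SMP alternative pseudo-directives, which is
    exactly what inflated the size estimate):

      static __always_inline void arch_atomic_add(int i, atomic_t *v)
      {
              /* With asm_inline, the template is estimated as one
               * instruction instead of the 6 reported before. */
              asm_inline volatile(LOCK_PREFIX "addl %1, %0"
                                  : "+m" (v->counter)
                                  : "ir" (i)
                                  : "memory");
      }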
2025-02-28  x86/locking: Remove semicolon from "lock" prefix  (Uros Bizjak)

    The minimum version of binutils required to compile the kernel is 2.25.
    This version correctly handles the "lock" prefix, so it is possible to
    remove the semicolon, which was only needed to support ancient versions
    of GNU as.

    Due to the semicolon, the compiler considers "lock; insn" to be two
    separate instructions. Removing the semicolon makes asm length
    calculations more accurate, and consequently makes the compiler's
    scheduling and inlining decisions more accurate as well.

    Removing the semicolon also enables assembler checks involving the
    lock prefix. Trying to assemble e.g. "lock andl %eax, %ebx" results in:

      Error: expecting lockable instruction after `lock'

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/20250228085149.2478245-1-ubizjak@gmail.com
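
    For illustration, the essence of the change, heavily simplified (the
    real LOCK_PREFIX is additionally wrapped in SMP alternatives):

      /* before: the assembler treats "lock;" and the insn as two
       * statements, so no prefix/operand checking happens */
      #define LOCK_PREFIX "lock; "

      /* after: "lock " is a true prefix, and following it with a
       * non-lockable instruction is now an assembly error */
      #define LOCK_PREFIX "lock "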
2023-10-09  locking/atomic/x86: Introduce arch_sync_try_cmpxchg()  (Uros Bizjak)

    Introduce the arch_sync_try_cmpxchg() macro to improve code that uses
    the sync_try_cmpxchg() locking primitive. The new definitions reuse
    the existing __raw_try_cmpxchg() macros, but with their own "lock; "
    prefix.

    The new macros improve the assembly of the cmpxchg loop in
    evtchn_fifo_unmask() from drivers/xen/events/events_fifo.c from:

      57a: 85 c0                   test   %eax,%eax
      57c: 78 52                   js     5d0 <...>
      57e: 89 c1                   mov    %eax,%ecx
      580: 25 ff ff ff af          and    $0xafffffff,%eax
      585: c7 04 24 00 00 00 00    movl   $0x0,(%rsp)
      58c: 81 e1 ff ff ff ef       and    $0xefffffff,%ecx
      592: 89 4c 24 04             mov    %ecx,0x4(%rsp)
      596: 89 44 24 08             mov    %eax,0x8(%rsp)
      59a: 8b 74 24 08             mov    0x8(%rsp),%esi
      59e: 8b 44 24 04             mov    0x4(%rsp),%eax
      5a2: f0 0f b1 32             lock cmpxchg %esi,(%rdx)
      5a6: 89 04 24                mov    %eax,(%rsp)
      5a9: 8b 04 24                mov    (%rsp),%eax
      5ac: 39 c1                   cmp    %eax,%ecx
      5ae: 74 07                   je     5b7 <...>
      5b0: a9 00 00 00 40          test   $0x40000000,%eax
      5b5: 75 c3                   jne    57a <...>
      <...>

    to:

      578: a9 00 00 00 40          test   $0x40000000,%eax
      57d: 74 2b                   je     5aa <...>
      57f: 85 c0                   test   %eax,%eax
      581: 78 40                   js     5c3 <...>
      583: 89 c1                   mov    %eax,%ecx
      585: 25 ff ff ff af          and    $0xafffffff,%eax
      58a: 81 e1 ff ff ff ef       and    $0xefffffff,%ecx
      590: 89 4c 24 04             mov    %ecx,0x4(%rsp)
      594: 89 44 24 08             mov    %eax,0x8(%rsp)
      598: 8b 4c 24 08             mov    0x8(%rsp),%ecx
      59c: 8b 44 24 04             mov    0x4(%rsp),%eax
      5a0: f0 0f b1 0a             lock cmpxchg %ecx,(%rdx)
      5a4: 89 44 24 04             mov    %eax,0x4(%rsp)
      5a8: 75 30                   jne    5da <...>
      <...>
      5da: 8b 44 24 04             mov    0x4(%rsp),%eax
      5de: eb 98                   jmp    578 <...>

    The new code removes the MOV instructions at 585, 5a6 and 5a9, and the
    CMP at 5ac. Additionally, the compiler now assumes that the cmpxchg is
    likely to succeed and optimizes the code flow accordingly.

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: linux-kernel@vger.kernel.org
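
    The caller-side pattern this serves, sketched with hypothetical names
    (the real loop in evtchn_fifo_unmask() manipulates the event-channel
    MASKED/PENDING bits):

      u32 old = READ_ONCE(*word), new;

      do {
              new = old & ~clear_bits;        /* compute the update */
      } while (!sync_try_cmpxchg(word, &old, new));
      /* on failure, 'old' already holds the freshly observed value */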
2023-06-05  arch: Remove cmpxchg_double  (Peter Zijlstra)

    No moar users, remove the monster.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Mark Rutland <mark.rutland@arm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Tested-by: Mark Rutland <mark.rutland@arm.com>
    Link: https://lore.kernel.org/r/20230531132323.991907085@infradead.org
2023-04-29  locking/x86: Define arch_try_cmpxchg_local()  (Uros Bizjak)

    Define the target-specific arch_try_cmpxchg_local(). This definition
    overrides the generic arch_try_cmpxchg_local() fallback and enables a
    target-specific implementation of try_cmpxchg_local().

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/20230405141710.3551-5-ubizjak@gmail.com
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
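
    A sketch of what the override plausibly boils down to — the same raw
    macro as the SMP variants, but with an empty lock prefix, so it is
    atomic only with respect to the local CPU (the exact in-tree spelling
    may differ):

      #define arch_try_cmpxchg_local(ptr, pold, new)                   \
              __raw_try_cmpxchg((ptr), (pold), (new), sizeof(*(ptr)), "")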
2021-03-18  x86: Fix various typos in comments  (Ingo Molnar)

    Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a
    single commit should reduce the churn.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: linux-kernel@vger.kernel.org
2020-10-12  asm-generic/atomic: Add try_cmpxchg() fallbacks  (Peter Zijlstra)

    Only x86 provides try_cmpxchg() outside of the atomic_t interfaces;
    provide generic fallbacks that create this interface from the widely
    available cmpxchg() function.

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Will Deacon <will@kernel.org>
    Link: https://lore.kernel.org/r/159870621515.1229682.15506193091065001742.stgit@devnote2
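
    The shape of such a fallback, sketched (close to, but not verbatim,
    the generated kernel macro): perform a plain cmpxchg(), and report the
    observed value back through the 'old' pointer only when the compare
    failed.

      #define try_cmpxchg(_ptr, _oldp, _new)                           \
      ({                                                               \
              typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r;   \
              ___r = cmpxchg((_ptr), ___o, (_new));                    \
              if (unlikely(___r != ___o))                              \
                      *___op = ___r;  /* report the observed value */  \
              likely(___r == ___o);                                    \
      })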
2018-12-03  x86: Fix various typos in comments  (Ingo Molnar)

    Go over arch/x86/ and fix common typos in comments, and a typo in an
    actual function argument name. No change in functionality intended.

    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-01  x86/asm: Use CC_SET()/CC_OUT() in __cmpxchg_double()  (Uros Bizjak)

    Replace the open-coded use of the SETcc instruction with
    CC_SET()/CC_OUT() in __cmpxchg_double().

    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Link: https://lkml.kernel.org/r/CAFULd4YdvwwhXWHqqPsGk5+TLG71ozgSscTZNsqmrm+Jzg941w@mail.gmail.com
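
    The idiom, sketched on a simplified 2x32-bit cmpxchg8b (variable names
    hypothetical): with GCC flag-output support, CC_OUT(e) becomes
    "=@cce" and no SETcc is emitted at all; on older compilers CC_SET(e)
    appends the explicit "sete" instead.

      bool ret;

      asm volatile(LOCK_PREFIX "cmpxchg8b %1"
                   CC_SET(e)
                   : CC_OUT(e) (ret), "+m" (*ptr),
                     "+a" (old_lo), "+d" (old_hi)
                   : "b" (new_lo), "c" (new_hi));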
2018-07-25  locking/atomics: Instrument xchg()  (Mark Rutland)

    While we instrument all of the (non-relaxed) atomic_*() functions and
    cmpxchg(), we missed xchg(). Let's add instrumentation for xchg(),
    fixing up x86 to implement arch_xchg().

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Will Deacon <will.deacon@arm.com>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: andy.shevchenko@gmail.com
    Cc: arnd@arndb.de
    Cc: aryabinin@virtuozzo.com
    Cc: catalin.marinas@arm.com
    Cc: glider@google.com
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: parri.andrea@gmail.com
    Cc: peter@hurleysoftware.com
    Link: http://lkml.kernel.org/r/20180716113017.3909-5-mark.rutland@arm.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
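
    A sketch of the wrapper layer this adds (close to the
    <asm-generic/atomic-instrumented.h> macro of that era): the arch does
    the real work in arch_xchg(), and the generic wrapper checks the
    access before delegating.

      #define xchg(ptr, new)                                           \
      ({                                                               \
              typeof(ptr) __ai_ptr = (ptr);                            \
              kasan_check_write(__ai_ptr, sizeof(*__ai_ptr));          \
              arch_xchg(__ai_ptr, (new));                              \
      })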
2018-03-12  locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h  (Dmitry Vyukov)

    Add the arch_ prefix to all atomic operations and include
    <asm-generic/atomic-instrumented.h>. This allows KASAN instrumentation
    to be added to all atomic ops.

    Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: kasan-dev@googlegroups.com
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/54f0eb64260b84199e538652e079a89b5423ad41.1517246437.git.dvyukov@google.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
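
    The resulting split, sketched for the simplest op (instrumentation
    call names as of that era; treat this as illustrative, not the exact
    in-tree code):

      /* arch code provides the raw, arch_-prefixed operation... */
      static __always_inline int arch_atomic_read(const atomic_t *v)
      {
              return READ_ONCE(v->counter);
      }

      /* ...and the instrumented wrapper adds the KASAN check: */
      static __always_inline int atomic_read(const atomic_t *v)
      {
              kasan_check_read(v, sizeof(*v));
              return arch_atomic_read(v);
      }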
2017-11-02  License cleanup: add SPDX GPL-2.0 license identifier to files with no license  (Greg Kroah-Hartman)

    Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.
    By default, all files without license information are under the
    default license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the
    'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
    binding shorthand, which can be used instead of the full boilerplate
    text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart
    and Philippe Ombredanne.

    How this work was done: Patches were generated and checked against
    linux-4.14-rc6 for a subset of the use cases:

      - the file had no licensing information in it,
      - the file was a */uapi/* one with no licensing information in it,
      - the file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and where references to
    a license had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to apply to a
    file was done in a spreadsheet of side-by-side results from the output
    of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files, created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis, with 60,537
    files assessed. Kate Stewart did a file-by-file comparison of the
    scanner results in the spreadsheet to determine which SPDX license
    identifier(s) to apply to each file. She confirmed any determination
    that was not immediately clear with lawyers working with the Linux
    Foundation.

    Criteria used to select files for SPDX license identifier tagging:

      - Files considered eligible had to be source code files.
      - Make and config files were included as candidates if they
        contained >5 lines of source.
      - The file already had some variant of a license header in it
        (even if <5 lines).

    All documentation files were explicitly excluded.

    The following heuristics were used to determine which SPDX license
    identifiers to apply:

      - When both scanners couldn't find any license traces, the file was
        considered to have no license information in it, and the top-level
        COPYING file license applied. For non-*/uapi/* files that summary
        was:

          SPDX license identifier                             # files
          ---------------------------------------------------|-------
          GPL-2.0                                               11139

        and resulted in the first patch in this series.

        If that file was a */uapi/* path one, it was "GPL-2.0 WITH
        Linux-syscall-note", otherwise it was "GPL-2.0". Results of that
        were:

          SPDX license identifier                             # files
          ---------------------------------------------------|-------
          GPL-2.0 WITH Linux-syscall-note                         930

        and resulted in the second patch in this series.

      - If a file had some form of licensing information in it and was
        one of the */uapi/* ones, it was denoted with the
        Linux-syscall-note if any GPL family license was found in the
        file or if it had no licensing in it (per the prior point).
        Results summary:

          SPDX license identifier                             # files
          ---------------------------------------------------|------
          GPL-2.0 WITH Linux-syscall-note                         270
          GPL-2.0+ WITH Linux-syscall-note                        169
          ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
          ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
          LGPL-2.1+ WITH Linux-syscall-note                        15
          GPL-1.0+ WITH Linux-syscall-note                         14
          ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
          LGPL-2.0+ WITH Linux-syscall-note                         4
          LGPL-2.1 WITH Linux-syscall-note                          3
          ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
          ((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1

        and that resulted in the third patch in this series.

      - When the two scanners agreed on the detected license(s), that
        became the concluded license(s).

      - When there was disagreement between the two scanners (one
        detected a license but the other didn't, or they both detected
        different licenses), a manual inspection of the file occurred.

      - In most cases a manual inspection of the information in the file
        resulted in a clear resolution of the license that should apply
        (and which scanner probably needed to revisit its heuristics).

      - When it was not immediately clear, the license identifier was
        confirmed with lawyers working with the Linux Foundation.

      - If there was any question as to the appropriate license
        identifier, the file was flagged for further research and to be
        revisited later in time.

    In total, over 70 hours of logged manual review was done on the
    spreadsheet to determine the SPDX license identifiers to apply to the
    source files by Kate, Philippe, Thomas and, in some cases, with
    confirmation by lawyers working with the Linux Foundation.

    Kate also obtained a third independent scan of the 4.13 code base from
    FOSSology, and compared selected files where the other two scanners
    disagreed against that SPDX file, to see if there were new insights.
    The Windriver scanner is based in part on an older version of
    FOSSology, so they are related.

    Thomas did random spot checks in about 500 files from the spreadsheets
    for the uapi headers and agreed with the SPDX license identifier in
    the files he inspected. For the non-uapi files Thomas did random spot
    checks in about 15000 files.

    In the initial set of patches against 4.14-rc6, 3 files were found to
    have copy/paste license identifier errors, and have been fixed to
    reflect the correct identifier.

    Additionally Philippe spent 10 hours this week doing a detailed manual
    inspection and review of the 12,461 patched files from the initial
    patch version, with:

      - a full scancode scan run, collecting the matched texts, detected
        license ids and scores
      - a review of anything where a license was detected (about 500+
        files) to ensure that the applied SPDX license was correct
      - a review of anything with no detection where the patch license
        was not GPL-2.0 WITH Linux-syscall-note, to ensure that the
        applied SPDX license was correct

    This produced a worksheet with 20 files needing minor correction. The
    worksheet was then exported into 3 different .csv files for the
    different types of files to be modified. These .csv files were then
    reviewed by Greg.

    Thomas wrote a script to parse the .csv files and add the proper SPDX
    tag to each file, in the format that the file expected. This script
    was further refined by Greg based on the output, to detect more types
    of files automatically and to distinguish between header and source
    .c files (which need different comment types). Finally, Greg ran the
    script using the .csv files to generate the patches.

    Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
    Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-28  locking/atomic/x86: Use 's64 *' for 'old' argument of atomic64_try_cmpxchg()  (Dmitry Vyukov)

    atomic64_try_cmpxchg() declares its 'old' argument as 'long *', which
    makes it impossible to use in portable code. If the caller passes
    'long *', it becomes 32 bits on 32-bit arches; if the caller passes
    's64 *', it does not compile on x86_64. Change the type of the 'old'
    argument to 's64 *' instead.

    Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: kasan-dev@googlegroups.com
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/fa6f77f2375150d26ea796a77e8b59195fd2ab13.1497690003.git.dvyukov@google.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30  locking/atomic: Fix atomic_try_cmpxchg() semantics  (Peter Zijlstra)

    Dmitry noted that the new atomic_try_cmpxchg() primitive is broken
    when the 'old' pointer doesn't point to the local stack. He writes:

      "Consider a classical lock-free stack push:

         node->next = atomic_read(&head);
         do {
         } while (!atomic_try_cmpxchg(&head, &node->next, node));

       This code is broken with the current implementation; the problem
       is the unconditional update of *__po.

       In case of success it writes the same value back into *__po, but
       in case of cmpxchg success we might have lost ownership of some
       memory locations, potentially including what __po points to. The
       same holds for the re-read of *__po."

    He also points out that this makes it surprisingly different from the
    similar C/C++ atomic operation.

    After investigating the code-gen differences caused by this patch,
    and a number of alternatives (Linus dislikes this interface lots), we
    arrived at these results (size of x86_64-defconfig/vmlinux):

      GCC-6.3.0:

        10735757  cmpxchg
        10726413  try_cmpxchg
        10730509  try_cmpxchg + patch
        10730445  try_cmpxchg-linus

      GCC-7 (20170327):

        10709514  cmpxchg
        10704266  try_cmpxchg
        10704266  try_cmpxchg + patch
        10704394  try_cmpxchg-linus

    From this we see that the patch has the advantage of better code-gen
    on GCC-7 and keeps the interface roughly consistent with the C
    language variant.

    Reported-by: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Fixes: a9ebf306f52c ("locking/atomic: Introduce atomic_try_cmpxchg()")
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
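
    In fallback terms, the fix is to write back through the 'old' pointer
    only on failure, and never touch it otherwise. A sketch (function
    name hypothetical):

      static inline bool try_cmpxchg_sketch(u32 *ptr, u32 *old, u32 new)
      {
              u32 o = *old;
              u32 r = cmpxchg(ptr, o, new);

              if (r != o)
                      *old = r;       /* failure: report observed value */
              return r == o;          /* success: *old left untouched */
      }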
2017-03-23  locking/atomic: Introduce atomic_try_cmpxchg()  (Peter Zijlstra)

    Add a new cmpxchg interface:

      bool try_cmpxchg(u{8,16,32,64} *ptr, u{8,16,32,64} *val, u{8,16,32,64} new);

    The boolean returns the result of the compare, and thus whether the
    exchange happened; in case of failure, the current value of *ptr is
    returned in *val. This allows simplification/improvement of loops
    like:

      for (;;) {
              new = val $op $imm;
              old = cmpxchg(ptr, val, new);
              if (old == val)
                      break;
              val = old;
      }

    into:

      do {
      } while (!try_cmpxchg(ptr, &val, val $op $imm));

    while also generating better code (GCC6 and onwards).

    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
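
    A worked caller, under assumed names — the classic increment-unless-
    zero pattern, which reads naturally with the new interface because
    'val' is refreshed automatically on each failed attempt:

      static bool inc_not_zero_sketch(atomic_t *r)
      {
              int val = atomic_read(r);

              do {
                      if (!val)
                              return false;   /* hit zero: give up */
              } while (!atomic_try_cmpxchg(r, &val, val + 1));

              return true;    /* 'val' holds the pre-increment value */
      }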
2016-09-30  x86/cmpxchg, locking/atomics: Remove superfluous definitions  (Nikolay Borisov)

    cmpxchg.h contained definitions for the unused (x)add_* operations,
    dating back to the original ticket spinlock implementation. Nowadays
    these are unused, so remove them.

    Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: hpa@zytor.com
    Link: http://lkml.kernel.org/r/1474913478-17757-1-git-send-email-n.borisov.lkml@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30  x86/cpufeature: Carve out X86_FEATURE_*  (Borislav Petkov)

    Move them to a separate header and establish the following dependency:

      x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

    This makes it easier to use the header in asm code without including
    the whole of cpufeature.h, and adds guards for asm.

    Suggested-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-13  arch: Remove __ARCH_HAVE_CMPXCHG  (Thomas Gleixner)

    We removed the only user of this define in the rtmutex code. Get rid
    of it.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2014-07-11  x86: Simplify __HAVE_ARCH_CMPXCHG tests  (Borislav Petkov)

    Both the 32-bit and the 64-bit cmpxchg.h headers define
    __HAVE_ARCH_CMPXCHG, and there's ifdeffery which checks it. But since
    both bitnesses define it, we can just as well move it up to the main
    cmpxchg header and simplify a bit of code in doing so.

    Signed-off-by: Borislav Petkov <bp@suse.de>
    Link: http://lkml.kernel.org/r/20140711104338.GB17083@pd.tnic
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-25  x86 cmpxchg.h: fix wrong comment  (Li Zhong)

    Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
    Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-10-02  UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers  (David Howells)

    Convert #include "..." to #include <path/...> in kernel system headers.

    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Acked-by: Dave Jones <davej@redhat.com>
2012-04-06  x86: Use correct byte-sized register constraint in __add()  (H. Peter Anvin)

    Similar to:

      2ca052a x86: Use correct byte-sized register constraint in __xchg_op()

    ... the __add() macro also needs to use a "q" constraint in the
    byte-sized case, lest we try to generate an illegal register.

    Link: http://lkml.kernel.org/r/4F7A3315.501@goop.org
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Cc: Jeremy Fitzhardinge <jeremy@goop.org>
    Cc: Leigh Scott <leigh123linux@googlemail.com>
    Cc: Thomas Reitmayr <treitmayr@devbase.at>
    Cc: <stable@vger.kernel.org> v3.3
2012-04-06  x86: Use correct byte-sized register constraint in __xchg_op()  (Jeremy Fitzhardinge)

    x86-64 can access the low byte of any register, but i386 can only do
    so with a subset of registers. 'r' causes compilation failures on
    i386, but 'q' expresses the constraint properly.

    Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
    Link: http://lkml.kernel.org/r/4F7A3315.501@goop.org
    Reported-by: Leigh Scott <leigh123linux@googlemail.com>
    Tested-by: Thomas Reitmayr <treitmayr@devbase.at>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Cc: <stable@vger.kernel.org> v3.3
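
    Why "q" matters here, illustrated on a sketch of the byte case
    (function name hypothetical): on i386, only %eax/%ebx/%ecx/%edx have
    byte subregisters, while "r" could hand the asm e.g. %esi, whose
    low-byte form %sil exists only in 64-bit mode (REX prefixes). This
    applies to both the __xchg_op() and __add() fixes above.

      static inline u8 xchg_u8_sketch(u8 *ptr, u8 val)
      {
              asm volatile("xchgb %0, %1"
                           : "+q" (val),   /* "q": a/b/c/d registers only */
                             "+m" (*ptr)
                           :
                           : "memory");
              return val;
      }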
2012-01-26  x86: Properly parenthesize cmpxchg() macro arguments  (Jan Beulich)

    Quite oddly, all of the arguments passed through from the top-level
    macros to the second level had parentheses even where they didn't
    need them, while the only expression (involving a parameter) that did
    need them lacked them.

    Very recently I got bitten by the lack thereof when using something
    like "array + index" for the first operand, with "array" being an
    array narrower than int.

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/4F2183A9020000780006F3E6@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
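
    The failure mode, illustrated with a hypothetical macro:

      /* buggy: pointer argument used unparenthesized inside sizeof */
      #define CMPXCHG(ptr, old, new)                                   \
              __cmpxchg((ptr), (old), (new), sizeof(*ptr))

      /* CMPXCHG(array + index, o, n), with u8 array[]:
       *   sizeof(*array + index) -> *array promotes to int -> 4 (wrong)
       * with sizeof(*(ptr)) the operand size is correctly 1 */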
2012-01-04  x86: Fix and improve cmpxchg_double{,_local}()  (Jan Beulich)

    Just like the per-CPU ones, these had several problems/shortcomings:

      - Only the first memory operand was mentioned in the asm()
        operands, and the 2x64-bit version didn't have a memory clobber,
        while the 2x32-bit one did. The former allowed the compiler to
        not recognize the need to re-load the data in case it had it
        cached in some register, while the latter was overly destructive.

      - The types of the local copies of the old and new values were
        incorrect (the types of the pointed-to variables should be used
        here, to make sure the respective old/new variable types are
        compatible).

      - The __dummy/__junk variables were pointless, given that local
        copies of the inputs already existed (and can hence be used for
        discarded outputs).

      - The 32-bit variant of cmpxchg_double_local() referenced
        cmpxchg16b_local().

    At once, also:

      - change the return value type to what it really is: 'bool'
      - unify the 32- and 64-bit variants
      - abstract out the common part of the 'normal' and 'local' variants

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/4F01F12A020000780006A19B@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-25  x86: consolidate xchg and xadd macros  (Jeremy Fitzhardinge)

    They both follow the same basic "put new value in location, return
    old value" pattern, so they can easily share the same macro.

    Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
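
    A sketch of the shared macro, reduced to the 32-bit case (the real
    one switches on sizeof(*ptr), and passes "" as the lock argument for
    xchg — XCHG with a memory operand is implicitly locked — versus
    LOCK_PREFIX for xadd):

      #define __xchg_op(ptr, arg, op, lock)                            \
      ({                                                               \
              __typeof__(*(ptr)) __ret = (arg);                        \
              asm volatile(lock #op "l %0, %1"                         \
                           : "+r" (__ret), "+m" (*(ptr))               \
                           : : "memory", "cc");                        \
              __ret;                                                   \
      })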
2011-11-25  x86/cmpxchg: add a locked add() helper  (Jeremy Fitzhardinge)

    Mostly to remove some conditional code in spinlock.h.

    Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
2011-08-29  x86, cmpxchg: Use __compiletime_error() to make usage messages a bit nicer  (Jeremy Fitzhardinge)

    Use __compiletime_error() to produce a compile-time error rather than
    a link-time one, where available.

    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
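
    The mechanism, sketched (the error string is illustrative; the
    kernel's __compiletime_error wrapper falls back to the old link-time
    failure when the compiler lacks the attribute):

      extern void __cmpxchg_wrong_size(void)
              __compiletime_error("Bad argument size for cmpxchg");

      /* ...inside the size switch: */
      default:
              __cmpxchg_wrong_size();  /* diagnosed at compile time */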
2011-08-29  x86: Add xadd helper macro  (Jeremy Fitzhardinge)

    Add a common xadd implementation.

    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
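
    Usage sketch (variable name hypothetical): xadd() atomically adds and
    returns the pre-add value, which is exactly the "take a ticket" step
    of the ticket spinlock that motivated these helpers.

      int prev = xadd(&ticket_counter, 1);  /* ticket_counter += 1,
                                               prev = old value */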
2011-08-29  x86, cmpxchg: Unify cmpxchg into cmpxchg.h  (Jeremy Fitzhardinge)

    Everything that's actually common between the 32- and 64-bit versions
    is moved into cmpxchg.h. xchg/cmpxchg will fail with a link error if
    they're passed an unsupported size (which includes 64-bit args on
    32-bit systems).

    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Link: http://lkml.kernel.org/r/4E5BCC40.3030501@goop.org
    Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
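
    The link-error trick described above, sketched: the guard is an
    extern declaration with no definition anywhere, so any call that
    survives dead-code elimination fails at link time (the 2011-08-29
    __compiletime_error() entry above later turned this into a
    compile-time diagnostic):

      extern void __xchg_wrong_size(void);

      switch (sizeof(*ptr)) {
      case 1: /* xchgb */ break;
      case 2: /* xchgw */ break;
      case 4: /* xchgl */ break;
      case 8: /* xchgq (64-bit only) */ break;
      default:
              __xchg_wrong_size();    /* optimized away for valid sizes */
      }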
2008-10-22  x86, um: ... and asm-x86 move  (Al Viro)

    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>