summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/uaccess_64.h
AgeCommit message (Collapse)Author
2025-04-13x86/uaccess: Use asm_inline() instead of asm() in __untagged_addr()Uros Bizjak
Use asm_inline() to instruct the compiler that the size of asm() is the minimum size of one instruction, ignoring how many instructions the compiler thinks it is. ALTERNATIVE macro that expands to several pseudo directives causes instruction length estimate to count more than 20 instructions. bloat-o-meter reports minimal code size increase (x86_64 defconfig with CONFIG_ADDRESS_MASKING, gcc-14.2.1): add/remove: 2/2 grow/shrink: 5/1 up/down: 2365/-1995 (370) Function old new delta ----------------------------------------------------- do_get_mempolicy - 1449 +1449 copy_nodes_to_user - 226 +226 __x64_sys_get_mempolicy 35 213 +178 syscall_user_dispatch_set_config 157 332 +175 __ia32_sys_get_mempolicy 31 206 +175 set_syscall_user_dispatch 29 181 +152 __do_sys_mremap 2073 2083 +10 sp_insert 133 117 -16 task_set_syscall_user_dispatch 172 - -172 kernel_get_mempolicy 1807 - -1807 Total: Before=21423151, After=21423521, chg +0.00% The code size increase is due to the compiler inlining more functions that inline untagged_addr(), e.g: task_set_syscall_user_dispatch() is now fully inlined in set_syscall_user_dispatch(): 000000000010b7e0 <set_syscall_user_dispatch>: 10b7e0: f3 0f 1e fa endbr64 10b7e4: 49 89 c8 mov %rcx,%r8 10b7e7: 48 89 d1 mov %rdx,%rcx 10b7ea: 48 89 f2 mov %rsi,%rdx 10b7ed: 48 89 fe mov %rdi,%rsi 10b7f0: 65 48 8b 3d 00 00 00 mov %gs:0x0(%rip),%rdi 10b7f7: 00 10b7f8: e9 03 fe ff ff jmp 10b600 <task_set_syscall_user_dispatch> that after inlining becomes: 000000000010b730 <set_syscall_user_dispatch>: 10b730: f3 0f 1e fa endbr64 10b734: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax 10b73b: 00 10b73c: 48 85 ff test %rdi,%rdi 10b73f: 74 54 je 10b795 <set_syscall_user_dispatch+0x65> 10b741: 48 83 ff 01 cmp $0x1,%rdi 10b745: 74 06 je 10b74d <set_syscall_user_dispatch+0x1d> 10b747: b8 ea ff ff ff mov $0xffffffea,%eax 10b74c: c3 ret 10b74d: 48 85 f6 test %rsi,%rsi 10b750: 75 7b jne 10b7cd <set_syscall_user_dispatch+0x9d> 10b752: 48 85 c9 test %rcx,%rcx 10b755: 74 1a je 10b771 <set_syscall_user_dispatch+0x41> 10b757: 48 89 cf mov %rcx,%rdi 10b75a: 49 b8 ef cd ab 89 67 movabs $0x123456789abcdef,%r8 10b761: 45 23 01 10b764: 90 nop 10b765: 90 nop 10b766: 90 nop 10b767: 90 nop 10b768: 90 nop 10b769: 90 nop 10b76a: 90 nop 10b76b: 90 nop 10b76c: 49 39 f8 cmp %rdi,%r8 10b76f: 72 6e jb 10b7df <set_syscall_user_dispatch+0xaf> 10b771: 48 89 88 48 08 00 00 mov %rcx,0x848(%rax) 10b778: 48 89 b0 50 08 00 00 mov %rsi,0x850(%rax) 10b77f: 48 89 90 58 08 00 00 mov %rdx,0x858(%rax) 10b786: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax) 10b78d: f0 80 48 08 20 lock orb $0x20,0x8(%rax) 10b792: 31 c0 xor %eax,%eax 10b794: c3 ret 10b795: 48 09 d1 or %rdx,%rcx 10b798: 48 09 f1 or %rsi,%rcx 10b79b: 75 aa jne 10b747 <set_syscall_user_dispatch+0x17> 10b79d: 48 c7 80 48 08 00 00 movq $0x0,0x848(%rax) 10b7a4: 00 00 00 00 10b7a8: 48 c7 80 50 08 00 00 movq $0x0,0x850(%rax) 10b7af: 00 00 00 00 10b7b3: 48 c7 80 58 08 00 00 movq $0x0,0x858(%rax) 10b7ba: 00 00 00 00 10b7be: c6 80 60 08 00 00 00 movb $0x0,0x860(%rax) 10b7c5: f0 80 60 08 df lock andb $0xdf,0x8(%rax) 10b7ca: 31 c0 xor %eax,%eax 10b7cc: c3 ret 10b7cd: 48 8d 3c 16 lea (%rsi,%rdx,1),%rdi 10b7d1: 48 39 fe cmp %rdi,%rsi 10b7d4: 0f 82 78 ff ff ff jb 10b752 <set_syscall_user_dispatch+0x22> 10b7da: e9 68 ff ff ff jmp 10b747 <set_syscall_user_dispatch+0x17> 10b7df: b8 f2 ff ff ff mov $0xfffffff2,%eax 10b7e4: c3 ret Please note a series of NOPs that get replaced with an alternative: 11f0: 65 48 23 05 00 00 00 and %gs:0x0(%rip),%rax 11f7: 00 Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250407072129.33440-1-ubizjak@gmail.com
2025-04-09x86/uaccess: Predict valid_user_address() returning trueMateusz Guzik
This works around what seems to be an optimization bug in GCC (at least 13.3.0), where it predicts access_ok() to fail despite the hint to the contrary. _copy_to_user() contains: if (access_ok(to, n)) { instrument_copy_to_user(to, from, n); n = raw_copy_to_user(to, from, n); } Where access_ok() is likely(__access_ok(addr, size)), yet the compiler emits conditional jumps forward for the case where it succeeds: <+0>: endbr64 <+4>: mov %rdx,%rcx <+7>: mov %rdx,%rax <+10>: xor %edx,%edx <+12>: add %rdi,%rcx <+15>: setb %dl <+18>: movabs $0x123456789abcdef,%r8 <+28>: test %rdx,%rdx <+31>: jne 0xffffffff81b3b7c6 <_copy_to_user+38> <+33>: cmp %rcx,%r8 <+36>: jae 0xffffffff81b3b7cb <_copy_to_user+43> <+38>: jmp 0xffffffff822673e0 <__x86_return_thunk> <+43>: nop <+44>: nop <+45>: nop <+46>: mov %rax,%rcx <+49>: rep movsb %ds:(%rsi),%es:(%rdi) <+51>: nop <+52>: nop <+53>: nop <+54>: mov %rcx,%rax <+57>: nop <+58>: nop <+59>: nop <+60>: jmp 0xffffffff822673e0 <__x86_return_thunk> Patching _copy_to_user() to likely() around the access_ok() use does not change the asm. However, spelling out the prediction *within* valid_user_address() does the trick: <+0>: endbr64 <+4>: xor %eax,%eax <+6>: mov %rdx,%rcx <+9>: add %rdi,%rdx <+12>: setb %al <+15>: movabs $0x123456789abcdef,%r8 <+25>: test %rax,%rax <+28>: jne 0xffffffff81b315e6 <_copy_to_user+54> <+30>: cmp %rdx,%r8 <+33>: jb 0xffffffff81b315e6 <_copy_to_user+54> <+35>: nop <+36>: nop <+37>: nop <+38>: rep movsb %ds:(%rsi),%es:(%rdi) <+40>: nop <+41>: nop <+42>: nop <+43>: nop <+44>: nop <+45>: nop <+46>: mov %rcx,%rax <+49>: jmp 0xffffffff82255ba0 <__x86_return_thunk> <+54>: mov %rcx,%rax <+57>: jmp 0xffffffff82255ba0 <__x86_return_thunk> Since we kinda expect valid_user_address() to be likely anyway, add the likely() annotation that also happens to work around this compiler bug. [ mingo: Moved the unlikely() branch into valid_user_address() & updated the changelog ] Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250401203029.1132135-1-mjguzik@gmail.com
2025-01-20x86: use cmov for user address maskingLinus Torvalds
This was a suggestion by David Laight, and while I was slightly worried that some micro-architecture would predict cmov like a conditional branch, there is little reason to actually believe any core would be that broken. Intel documents that their existing cores treat CMOVcc as a data dependency that will constrain speculation in their "Speculative Execution Side Channel Mitigations" whitepaper: "Other instructions such as CMOVcc, AND, ADC, SBB and SETcc can also be used to prevent bounds check bypass by constraining speculative execution on current family 6 processors (Intel® Core™, Intel® Atom™, Intel® Xeon® and Intel® Xeon Phi™ processors)" and while that leaves the future uarch issues open, that's certainly true of our traditional SBB usage too. Any core that predicts CMOV will be unusable for various crypto algorithms that need data-independent timing stability, so let's just treat CMOV as the safe choice that simplifies the address masking by avoiding an extra instruction and doesn't need a temporary register. Suggested-by: David Laight <David.Laight@aculab.com> Link: https://www.intel.com/content/dam/develop/external/us/en/documents/336996-speculative-execution-side-channel-mitigations.pdf Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-25x86: fix user address masking non-canonical speculation issueLinus Torvalds
It turns out that AMD has a "Meltdown Lite(tm)" issue with non-canonical accesses in kernel space. And so using just the high bit to decide whether an access is in user space or kernel space ends up with the good old "leak speculative data" if you have the right gadget using the result: CVE-2020-12965 “Transient Execution of Non-Canonical Accesses“ Now, the kernel surrounds the access with a STAC/CLAC pair, and those instructions end up serializing execution on older Zen architectures, which closes the speculation window. But that was true only up until Zen 5, which renames the AC bit [1]. That improves performance of STAC/CLAC a lot, but also means that the speculation window is now open. Note that this affects not just the new address masking, but also the regular valid_user_address() check used by access_ok(), and the asm version of the sign bit check in the get_user() helpers. It does not affect put_user() or clear_user() variants, since there's no speculative result to be used in a gadget for those operations. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Link: https://lore.kernel.org/all/80d94591-1297-4afb-b510-c665efd37f10@citrix.com/ Link: https://lore.kernel.org/all/20241023094448.GAZxjFkEOOF_DM83TQ@fat_crate.local/ [1] Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1010.html Link: https://arxiv.org/pdf/2108.10771 Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> # LAM case Fixes: 2865baf54077 ("x86: support user address masking instead of non-speculative conditional") Fixes: 6014bc27561f ("x86-64: make access_ok() independent of LAM") Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-22x86: make the masked_user_access_begin() macro use its argument only onceLinus Torvalds
This doesn't actually matter for any of the current users, but before merging it mainline, make sure we don't have any surprising semantics. We don't actually want to use an inline function here, because we want to allow - but not require - const pointer arguments, and return them as such. But we already had a local auto-type variable, so let's just use it to avoid any possible double evaluation. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-19x86: do the user address masking outside the user access areaLinus Torvalds
In any normal situation this really shouldn't matter, but in case the address passed in to masked_user_access_begin() were to be some complex expression, we should evaluate it fully before doing the 'stac' instruction. And even without that issue (which objdump would pick up on for any really bad case), just in general we should strive to minimize the amount of code we run with user accesses enabled. For example, even for the trivial pselect6() case, the code generation (obviously with a non-debug build) just diff with this ends up being - stac mov %rax,%rcx sar $0x3f,%rcx or %rax,%rcx + stac mov (%rcx),%r13 mov 0x8(%rcx),%r14 clac so the area delimeted by the 'stac / clac' pair is now literally just the two user access instructions, and the address generation has been moved out to before that code. This will be much more noticeable if we end up deciding that we can go back to just inlining "get_user()" using the new masked user access model. The get_user() pointers can often be more complex expressions involving kernel memory accesses or even function calls. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-19x86: support user address masking instead of non-speculative conditionalLinus Torvalds
The Spectre-v1 mitigations made "access_ok()" much more expensive, since it has to serialize execution with the test for a valid user address. All the normal user copy routines avoid this by just masking the user address with a data-dependent mask instead, but the fast "unsafe_user_read()" kind of patterms that were supposed to be a fast case got slowed down. This introduces a notion of using src = masked_user_access_begin(src); to do the user address sanity using a data-dependent mask instead of the more traditional conditional if (user_read_access_begin(src, len)) { model. This model only works for dense accesses that start at 'src' and on architectures that have a guard region that is guaranteed to fault in between the user space and the kernel space area. With this, the user access doesn't need to be manually checked, because a bad address is guaranteed to fault (by some architecture masking trick: on x86-64 this involves just turning an invalid user address into all ones, since we don't map the top of address space). This only converts a couple of examples for now. Example x86-64 code generation for loading two words from user space: stac mov %rax,%rcx sar $0x3f,%rcx or %rax,%rcx mov (%rcx),%r13 mov 0x8(%rcx),%r14 clac where all the error handling and -EFAULT is now purely handled out of line by the exception path. Of course, if the micro-architecture does badly at 'clac' and 'stac', the above is still pitifully slow. But at least we did as well as we could. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-11Merge tag 'x86-core-2024-03-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: - The biggest change is the rework of the percpu code, to support the 'Named Address Spaces' GCC feature, by Uros Bizjak: - This allows C code to access GS and FS segment relative memory via variables declared with such attributes, which allows the compiler to better optimize those accesses than the previous inline assembly code. - The series also includes a number of micro-optimizations for various percpu access methods, plus a number of cleanups of %gs accesses in assembly code. - These changes have been exposed to linux-next testing for the last ~5 months, with no known regressions in this area. - Fix/clean up __switch_to()'s broken but accidentally working handling of FPU switching - which also generates better code - Propagate more RIP-relative addressing in assembly code, to generate slightly better code - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to make it easier for distros to follow & maintain these options - Rework the x86 idle code to cure RCU violations and to clean up the logic - Clean up the vDSO Makefile logic - Misc cleanups and fixes * tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/idle: Select idle routine only once x86/idle: Let prefer_mwait_c1_over_halt() return bool x86/idle: Cleanup idle_setup() x86/idle: Clean up idle selection x86/idle: Sanitize X86_BUG_AMD_E400 handling sched/idle: Conditionally handle tick broadcast in default_idle_call() x86: Increase brk randomness entropy for 64-bit systems x86/vdso: Move vDSO to mmap region x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o x86/retpoline: Ensure default return thunk isn't used at runtime x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32 x86/vdso: Use $(addprefix ) instead of $(foreach ) x86/vdso: Simplify obj-y addition x86/vdso: Consolidate targets and clean-files x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS ...
2024-03-04x86/uaccess: Add missing __force to casts in __access_ok() and ↵Thomas Gleixner
valid_user_address() Sparse complains about losing the __user address space due to the cast to long: uaccess_64.h:88:24: sparse: warning: cast removes address space '__user' of expression Annotate it with __force to tell sparse that this is intentional. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240304005104.677606054@linutronix.de
2024-01-09x86/percpu: Use %RIP-relative address in untagged_addr()Uros Bizjak
%RIP-relative addresses are nowadays correctly handled in alternative instructions, so remove misleading comment and improve assembly to use %RIP-relative address. Also, explicitly using %gs: prefix will segfault for non-SMP builds. Use macros from percpu.h which will DTRT with segment prefix register as far as SMP/non-SMP builds are concerned. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradaed.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/20231213150357.5942-1-ubizjak%40gmail.com
2023-08-30x86: bring back rep movsq for user access on CPUs without ERMSMateusz Guzik
Intel CPUs ship with ERMS for over a decade, but this is not true for AMD. In particular one reasonably recent uarch (EPYC 7R13) does not have it (or at least the bit is inactive when running on the Amazon EC2 cloud -- I found rather conflicting information about AMD CPUs vs the extension). Hand-rolled mov loops executing in this case are quite pessimal compared to rep movsq for bigger sizes. While the upper limit depends on uarch, everyone is well south of 1KB AFAICS and sizes bigger than that are common. While technically ancient CPUs may be suffering from rep usage, gcc has been emitting it for years all over kernel code, so I don't think this is a legitimate concern. Sample result from read1_processes from will-it-scale (4KB reads/s): before: 1507021 after: 1721828 (+14%) Note that the cutoff point for rep usage is set to 64 bytes, which is way too conservative but I'm sticking to what was done in 47ee3f1dd93b ("x86: re-introduce support for ERMS copies for user space accesses"). That is to say *some* copies will now go slower, which is fixable but beyond the scope of this patch. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86-64: mm: clarify the 'positive addresses' user address rulesLinus Torvalds
Dave Hansen found the "(long) addr >= 0" code in the x86-64 access_ok checks somewhat confusing, and suggested using a helper to clarify what the code is doing. So this does exactly that: clarifying what the sign bit check is all about, by adding a helper macro that makes it clear what it is testing. This also adds some explicit comments talking about how even with LAM enabled, any addresses with the sign bit will still GP-fault in the non-canonical region just above the sign bit. This is all what allows us to do the user address checks with just the sign bit, and furthermore be a bit cavalier about accesses that might be done with an additional offset even past that point. (And yes, this talks about 'positive' even though zero is also a valid user address and so technically we should call them 'non-negative'. But I don't think using 'non-negative' ends up being more understandable). Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: mm: remove 'sign' games from LAM untagged_addr*() macrosLinus Torvalds
The intent of the sign games was to not modify kernel addresses when untagging them. However, that had two issues: (a) it didn't actually work as intended, since the mask was calculated as 'addr >> 63' on an _unsigned_ address. So instead of getting a mask of all ones for kernel addresses, you just got '1'. (b) untagging a kernel address isn't actually a valid operation anyway. Now, (a) had originally been true for both 'untagged_addr()' and the remote version of it, but had accidentally been fixed for the regular version of untagged_addr() by commit e0bddc19ba95 ("x86/mm: Reduce untagged_addr() overhead for systems without LAM"). That one rewrote the shift to be part of the alternative asm code, and in the process changed the unsigned shift into a signed 'sar' instruction. And while it is true that we don't want to turn what looks like a kernel address into a user address by masking off the high bit, that doesn't need these sign masking games - all it needs is that the mm context 'untag_mask' value has the high bit set. Which it always does. So simplify the code by just removing the superfluous (and in the case of untagged_addr_remote(), still buggy) sign bit games in the address masking. Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-03x86: uaccess: move 32-bit and 64-bit parts into proper <asm/uaccess_N.h> headerLinus Torvalds
The x86 <asm/uaccess.h> file has grown features that are specific to x86-64 like LAM support and the related access_ok() changes. They really should be in the <asm/uaccess_64.h> file and not pollute the generic x86 header. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-28Merge tag 'x86_cleanups_for_v6.4_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Unify duplicated __pa() and __va() definitions - Simplify sysctl tables registration - Remove unused symbols - Correct function name in comment * tag 'x86_cleanups_for_v6.4_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Centralize __pa()/__va() definitions x86: Simplify one-level sysctl registration for itmt_kern_table x86: Simplify one-level sysctl registration for abi_table2 x86/platform/intel-mid: Remove unused definitions from intel-mid.h x86/uaccess: Remove memcpy_page_flushcache() x86/entry: Change stale function name in comment to error_return()
2023-04-19x86: remove 'zerorest' argument from __copy_user_nocache()Linus Torvalds
Every caller passes in zero, meaning they don't want any partial copy to zero the remainder of the destination buffer. Which is just as well, because the implementation of that function didn't actually even look at that argument, and wasn't even aware it existed, although some misleading comments did mention it still. The 'zerorest' thing is a historical artifact of how "copy_from_user()" worked, in that it would zero the rest of the kernel buffer that it copied into. That zeroing still exists, but it's long since been moved to generic code, and the raw architecture-specific code doesn't do it. See _copy_from_user() in lib/usercopy.c for this all. However, while __copy_user_nocache() shares some history and superficial other similarities with copy_from_user(), it is in many ways also very different. In particular, while the code makes it *look* similar to the generic user copy functions that can copy both to and from user space, and take faults on both reads and writes as a result, __copy_user_nocache() does no such thing at all. __copy_user_nocache() always copies to kernel space, and will never take a page fault on the destination. What *can* happen, though, is that the non-temporal stores take a machine check because one of the use cases is for writing to stable memory, and any memory errors would then take synchronous faults. So __copy_user_nocache() does look a lot like copy_from_user(), but has faulting behavior that is more akin to our old copy_in_user() (which no longer exists, but copied from user space to user space and could fault on both source and destination). And it very much does not have the "zero the end of the destination buffer", since a problem with the destination buffer is very possibly the very source of the partial copy. So this whole thing was just a confusing historical artifact from having shared some code with a completely different function with completely different use cases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: improve on the non-rep 'copy_user' functionLinus Torvalds
The old 'copy_user_generic_unrolled' function was oddly implemented for largely historical reasons: it had been largely based on the uncached copy case, which has some other concerns. For example, the __copy_user_nocache() function uses 'movnti' for the destination stores, and those want the destination to be aligned. In contrast, the regular copy function doesn't really care, and trying to align things only complicates matters. Also, like the clear_user function, the copy function had some odd handling of the repeat counts, complicating the exception handling for no really good reason. So as with clear_user, just write it to keep all the byte counts in the %rcx register, exactly like the 'rep movs' functionality that this replaces. Unlike a real 'rep movs', we do allow for this to trash a few temporary registers to not have to unnecessarily save/restore registers on the stack. And like the clearing case, rename this to what it now clearly is: 'rep_movs_alternative', and make it one coherent function, so that it shows up as such in profiles (instead of the odd split between "copy_user_generic_unrolled" and "copy_user_short_string", the latter of which was not about strings at all, and which was shared with the uncached case). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: improve on the non-rep 'clear_user' functionLinus Torvalds
The old version was oddly written to have the repeat count in multiple registers. So instead of taking advantage of %rax being zero, it had some sub-counts in it. All just for a "single word clearing" loop, which isn't even efficient to begin with. So get rid of those games, and just keep all the state in the same registers we got it in (and that we should return things in). That not only makes this act much more like 'rep stos' (which this function is replacing), but makes it much easier to actually do the obvious loop unrolling. Also rename the function from the now nonsensical 'clear_user_original' to what it now clearly is: 'rep_stos_alternative'. End result: if we don't have a fast 'rep stosb', at least we can have a fast fallback for it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: inline the 'rep movs' in user copies for the FSRM caseLinus Torvalds
This does the same thing for the user copies as commit 0db7058e8e23 ("x86/clear_user: Make it faster") did for clear_user(). In other words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set, avoiding the function call entirely. In order to do that, it makes the calling convention for the out-of-line case ("copy_user_generic_unrolled") match the 'rep movs' calling convention, although it does also end up clobbering a number of additional registers. Also, to simplify code sharing in the low-level assembly with the __copy_user_nocache() function (that uses the normal C calling convention), we end up with a kind of mixed return value for the low-level asm code: it will return the result in both %rcx (to work as an alternative for the 'rep movs' case), _and_ in %rax (for the nocache case). We could avoid this by wrapping __copy_user_nocache() callers in an inline asm, but since the cost is just an extra register copy, it's probably not worth it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: move stac/clac from user copy routines into callersLinus Torvalds
This is preparatory work for inlining the 'rep movs' case, but also a cleanup. The __copy_user_nocache() function was mis-used by the rdma code to do uncached kernel copies that don't actually want user copies at all, and as a result doesn't want the stac/clac either. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: don't use REP_GOOD or ERMS for user memory clearingLinus Torvalds
The modern target to use is FSRS (Fast Short REP STOS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Note! This changes the conditional for the inlining from FSRM ("fast short rep movs") to FSRS ("fast short rep stos"). We'll have a separate fixup for AMD microarchitectures that have a good 'rep stosb' yet do not set the new Intel-specific FSRS bit (because FSRM was there first). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-18x86: don't use REP_GOOD or ERMS for user memory copiesLinus Torvalds
The modern target to use is FSRM (Fast Short REP MOVS), and the other cases should only be used for bigger areas (ie mainly things like page clearing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-16x86/uaccess: Remove memcpy_page_flushcache()Ira Weiny
Commit 21b56c847753 ("iov_iter: get rid of separate bvec and xarray callbacks") removed the calls to memcpy_page_flushcache(). In addition, memcpy_page_flushcache() uses the deprecated kmap_atomic(). Remove the unused x86 memcpy_page_flushcache() implementation and also get rid of one more kmap_atomic() user. [ dhansen: tweak changelog ] Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221230-kmap-x86-v1-1-15f1ecccab50%40intel.com
2022-08-18x86/clear_user: Make it fasterBorislav Petkov
Based on a patch by Mark Hemment <markhemm@googlemail.com> and incorporating very sane suggestions from Linus. The point here is to have the default case with FSRM - which is supposed to be the majority of x86 hw out there - if not now then soon - be directly inlined into the instruction stream so that no function call overhead is taking place. Drop the early clobbers from the @size and @addr operands as those are not needed anymore since we have single instruction alternatives. The benchmarks I ran would show very small improvements and a PF benchmark would even show weird things like slowdowns with higher core counts. So for a ~6m running the git test suite, the function gets called under 700K times, all from padzero(): <...>-2536 [006] ..... 261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900 <...>-2536 [006] ..... 261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160 <...>-2537 [008] ..... 261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850 <...>-2537 [008] ..... 261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900 ... which is around 1%-ish of the total time and which is consistent with the benchmark numbers. So Mel gave me the idea to simply measure how fast the function becomes. I.e.: start = rdtsc_ordered(); ret = __clear_user(to, n); end = rdtsc_ordered(); Computing the mean average of all the samples collected during the test suite run then shows some improvement: clear_user_original: Amean: 9219.71 (Sum: 6340154910, samples: 687674) fsrm: Amean: 8030.63 (Sum: 5522277720, samples: 687652) That's on Zen3. The situation looks a lot more confusing on Intel: Icelake: clear_user_original: Amean: 19679.4 (Sum: 13652560764, samples: 693750) Amean: 19743.7 (Sum: 13693470604, samples: 693562) (I ran it twice just to be sure.) ERMS: Amean: 20374.3 (Sum: 13910601024, samples: 682752) Amean: 20453.7 (Sum: 14186223606, samples: 693576) FSRM: Amean: 20458.2 (Sum: 13918381386, sample s: 680331) The original microbenchmark which people were complaining about: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=65536; done 2>&1 | grep copied 32207011840 bytes (32 GB, 30 GiB) copied, 1 s, 32.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.93069 s, 35.6 GB/s 37597741056 bytes (38 GB, 35 GiB) copied, 1 s, 37.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.78017 s, 38.6 GB/s 62020124672 bytes (62 GB, 58 GiB) copied, 2 s, 31.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 2.13716 s, 32.2 GB/s 60010004480 bytes (60 GB, 56 GiB) copied, 1 s, 60.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.14129 s, 60.2 GB/s 53212086272 bytes (53 GB, 50 GiB) copied, 1 s, 53.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.28398 s, 53.5 GB/s 55698259968 bytes (56 GB, 52 GiB) copied, 1 s, 55.7 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.22507 s, 56.1 GB/s 55306092544 bytes (55 GB, 52 GiB) copied, 1 s, 55.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.23647 s, 55.6 GB/s 54387539968 bytes (54 GB, 51 GiB) copied, 1 s, 54.4 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.25693 s, 54.7 GB/s 50566529024 bytes (51 GB, 47 GiB) copied, 1 s, 50.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.35096 s, 50.9 GB/s 58308165632 bytes (58 GB, 54 GiB) copied, 1 s, 58.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.17394 s, 58.5 GB/s Now the same thing with smaller buffers: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=8192; done 2>&1 | grep copied 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28485 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276112 s, 31.1 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.29136 s, 29.5 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.283803 s, 30.3 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.306503 s, 28.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.349169 s, 24.6 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276912 s, 31.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.265356 s, 32.4 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28464 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.242998 s, 35.3 GB/s is also not conclusive because it all depends on the buffer sizes, their alignments and when the microcode detects that cachelines can be aggregated properly and copied in bigger sizes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/CAHk-=wh=Mu_EYhtOmPn6AxoQZyEh-4fo2Zx3G7rBv1g7vwoKiw@mail.gmail.com
2021-09-08arch: remove compat_alloc_user_spaceArnd Bergmann
All users of compat_alloc_user_space() and copy_in_user() have been removed from the kernel, only a few functions in sparc remain that can be changed to calling arch_copy_in_user() instead. Link: https://lkml.kernel.org/r/20210727144859.4150043-7-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-06x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()Dan Williams
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
2020-03-18x86: get rid of small constant size cases in raw_copy_{to,from}_user()Al Viro
Very few call sites where that would be triggered remain, and none of those is anywhere near hot enough to bother. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-03x86/uaccess: Move copy_user_handle_tail() into asmPeter Zijlstra
By writing the function in asm we avoid cross object code flow and objtool no longer gets confused about a 'stray' CLAC. Also; the asm version is actually _simpler_. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-16x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handlingDan Williams
All copy_to_user() implementations need to be prepared to handle faults accessing userspace. The __memcpy_mcsafe() implementation handles both mmu-faults on the user destination and machine-check-exceptions on the source buffer. However, the memcpy_mcsafe() wrapper may silently fallback to memcpy() depending on build options and cpu-capabilities. Force copy_to_user_mcsafe() to always use __memcpy_mcsafe() when available, and otherwise disable all of the copy_to_user_mcsafe() infrastructure when __memcpy_mcsafe() is not available, i.e. CONFIG_X86_MCE=n. This fixes crashes of the form: run fstests generic/323 at 2018-07-02 12:46:23 BUG: unable to handle kernel paging request at 00007f0d50001000 RIP: 0010:__memcpy+0x12/0x20 [..] Call Trace: copyout_mcsafe+0x3a/0x50 _copy_to_iter_mcsafe+0xa1/0x4a0 ? dax_alive+0x30/0x50 dax_iomap_actor+0x1f9/0x280 ? dax_iomap_rw+0x100/0x100 iomap_apply+0xba/0x130 ? dax_iomap_rw+0x100/0x100 dax_iomap_rw+0x95/0x100 ? dax_iomap_rw+0x100/0x100 xfs_file_dax_read+0x7b/0x1d0 [xfs] xfs_file_read_iter+0xa7/0xc0 [xfs] aio_read+0x11c/0x1a0 Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()") Link: http://lkml.kernel.org/r/153108277790.37979.1486841789275803399.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-15x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()Dan Williams
Use the updated memcpy_mcsafe() implementation to define copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant difference from typical copy_to_iter() is that the ITER_KVEC and ITER_BVEC iterator types can fail to complete a full transfer. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: hch@lst.de Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-15x86/asm/memcpy_mcsafe: Add write-protection-fault handlingDan Williams
In preparation for using memcpy_mcsafe() to handle user copies it needs to be to handle write-protection faults while writing user pages. Add MMU-fault handlers alongside the machine-check exception handlers. Note that the machine check fault exception handling makes assumptions about source buffer alignment and poison alignment. In the write fault case, given the destination buffer is arbitrarily aligned, it needs a separate / additional fault handling approach. The mcsafe_handle_tail() helper is reused. The @limit argument is set to @len since there is no safety concern about retriggering an MMU fault, and this simplifies the assembly. Co-developed-by: Tony Luck <tony.luck@intel.com> Reported-by: Mika Penttilä <mika.penttila@nextfour.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: hch@lst.de Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/152539238635.31796.14056325365122961778.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-30x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospecDan Williams
Quoting Linus: I do think that it would be a good idea to very expressly document the fact that it's not that the user access itself is unsafe. I do agree that things like "get_user()" want to be protected, but not because of any direct bugs or problems with get_user() and friends, but simply because get_user() is an excellent source of a pointer that is obviously controlled from a potentially attacking user space. So it's a prime candidate for then finding _subsequent_ accesses that can then be used to perturb the cache. __uaccess_begin_nospec() covers __get_user() and copy_from_iter() where the limit check is far away from the user pointer de-reference. In those cases a barrier_nospec() prevents speculation with a potential pointer to privileged memory. uaccess_try_nospec covers get_user_try. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: kernel-hardening@lists.openwall.com Cc: gregkh@linuxfoundation.org Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727416953.33451.10508284228526170604.stgit@dwillia2-desk3.amr.corp.intel.com
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass ↵Dan Williams
operations The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and memcpy_flushcache, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache() and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. The first consumer of this interface is a new 'copy_from_iter' dax operation so that pmem can inject cache maintenance operations without imposing this overhead on other dax-capable drivers. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-03-29x86: switch to RAW_COPY_USERAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-29x86: don't wank with magical size in __copy_in_user()Al Viro
... especially since copy_in_user() doesn't Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28kill __copy_from_user_nocache()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28amd64: get rid of zeroingAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-05uaccess: drop duplicate includes from asm/uaccess.hAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-07-26x86/uaccess: Enable hardened usercopyKees Cook
Enables CONFIG_HARDENED_USERCOPY checks on x86. This is done both in copy_*_user() and __copy_*_user() because copy_*_user() actually calls down to _copy_*_user() and not __copy_*_user(). Based on code from PaX and grsecurity. Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
2016-05-20x86/kasan: instrument user memory access APIAndrey Ryabinin
Exchange between user and kernel memory is coded in assembly language. Which means that such accesses won't be spotted by KASAN as a compiler instruments only C code. Add explicit KASAN checks to user memory access API to ensure that userspace writes to (or reads from) a valid kernel memory. Note: Unlike others strncpy_from_user() is written mostly in C and KASAN sees memory accesses in it. However, it makes sense to add explicit check for all @count bytes that *potentially* could be written to the kernel. [aryabinin@virtuozzo.com: move kasan check under the condition] Link: http://lkml.kernel.org/r/1462869209-21096-1-git-send-email-aryabinin@virtuozzo.com Link: http://lkml.kernel.org/r/1462538722-1574-4-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-17x86: reorganize SMAP handling in user space accessesLinus Torvalds
This reorganizes how we do the stac/clac instructions in the user access code. Instead of adding the instructions directly to the same inline asm that does the actual user level access and exception handling, add them at a higher level. This is mainly preparation for the next step, where we will expose an interface to allow users to mark several accesses together as being user space accesses, but it does already clean up some code: - the inlined trivial cases of copy_in_user() now do stac/clac just once over the accesses: they used to do one pair around the user space read, and another pair around the write-back. - the {get,put}_user_ex() macros that are used with the catch/try handling don't do any stac/clac at all, because that happens in the try/catch surrounding them. Other than those two cleanups that happened naturally from the re-organization, this should not make any difference. Yet. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-08x86: clean up/fix 'copy_in_user()' tail zeroingLinus Torvalds
The rule for 'copy_from_user()' is that it zeroes the remaining kernel buffer even when the copy fails halfway, just to make sure that we don't leave uninitialized kernel memory around. Because even if we check for errors, some kernel buffers stay around after thge copy (think page cache). However, the x86-64 logic for user copies uses a copy_user_generic() function for all the cases, that set the "zerorest" flag for any fault on the source buffer. Which meant that it didn't just try to clear the kernel buffer after a failure in copy_from_user(), it also tried to clear the destination user buffer for the "copy_in_user()" case. Not only is that pointless, it also means that the clearing code has to worry about the tail clearing taking page faults for the user buffer case. Which is just stupid, since that case shouldn't happen in the first place. Get rid of the whole "zerorest" thing entirely, and instead just check if the destination is in kernel space or not. And then just use memset() to clear the tail of the kernel buffer if necessary. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-04x86, sparse: Do not force removal of __user when calling ↵Steven Rostedt
copy_to/from_user_nocheck() Commit ff47ab4ff3cdd "x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic" added a "_nocheck" call in between the copy_to/from_user() and copy_user_generic(). As both the normal and nocheck versions of theses calls use the proper __user annotation, a typecast to remove it should not be added. This causes sparse to spin out the following warnings: arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident> arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident> arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident> arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uaccess_64.h:207:47: expected void const [noderef] <asn:1>*src arch/x86/include/asm/uaccess_64.h:207:47: got void const *<noident> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140103164500.5f6478f5@gandalf.local.home Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-11-12Merge branch 'x86-uaccess-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 uaccess changes from Ingo Molnar: "A single change that micro-optimizes __copy_*_user_inatomic(), used by the futex code" * 'x86-uaccess-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic
2013-10-26x86: Unify copy_to_user() and add size checking to itJan Beulich
Similarly to copy_from_user(), where the range check is to protect against kernel memory corruption, copy_to_user() can benefit from such checking too: Here it protects against kernel information leaks. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5265059502000078000FC4F6@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com>
2013-10-26x86: Unify copy_from_user() size checkingJan Beulich
Commits 4a3127693001c61a21d1ce680db6340623f52e93 ("x86: Turn the copy_from_user check into an (optional) compile time warning") and 63312b6a6faae3f2e5577f2b001e3b504f10a2aa ("x86: Add a Kconfig option to turn the copy_from_user warnings into errors") touched only the 32-bit variant of copy_from_user(), whereas the original commit 9f0cf4adb6aa0bfccf675c938124e68f7f06349d ("x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()") also added the same code to the 64-bit one. Further the earlier conversion from an inline WARN() to the call to copy_from_user_overflow() went a little too far: When the number of bytes to be copied is not a constant (e.g. [looking at 3.11] in drivers/net/tun.c:__tun_chr_ioctl() or drivers/pci/pcie/aer/aer_inject.c:aer_inject_write()), the compiler will always have to keep the funtion call, and hence there will always be a warning. By using __builtin_constant_p() we can avoid this. And then this slightly extends the effect of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS in that apart from converting warnings to errors in the constant size case, it retains the (possibly wrong) warnings in the non-constant size case, such that if someone is prepared to get a few false positives, (s)he'll be able to recover the current behavior (except that these diagnostics now will never be converted to errors). Since the 32-bit variant (intentionally) didn't call might_fault(), the unification results in this being called twice now. Adding a suitable #ifdef would be the alternative if that's a problem. I'd like to point out though that with __compiletime_object_size() being restricted to gcc before 4.6, the whole construct is going to become more and more pointless going forward. I would question however that commit 2fb0815c9ee6b9ac50e15dd8360ec76d9fa46a2 ("gcc4: disable __compiletime_object_size for GCC 4.6+") was really necessary, and instead this should have been dealt with as is done here from the beginning. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5265056D02000078000FC4F3@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-10x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomicAndi Kleen
The 64bit __copy_{from,to}_user_inatomic always called copy_from_user_generic, but skipped the special optimizations for 1/2/4/8 byte accesses. This especially hurts the futex call, which accesses the 4 byte futex user value with a complicated fast string operation in a function call, instead of a single movl. Use __copy_{from,to}_user for _inatomic instead to get the same optimizations. The only problem was the might_fault() in those functions. So move that to new wrapper and call __copy_{f,t}_user_nocheck() from *_inatomic directly. 32bit already did this correctly by duplicating the code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1376687844-19857-2-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-28x86: uaccess s/might_sleep/might_fault/Michael S. Tsirkin
The only reason uaccess routines might sleep is if they fault. Make this explicit. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1369577426-26721-9-git-send-email-mst@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>