summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/arch_hweight.h
AgeCommit message (Collapse)Author
2016-06-08x86/hweight: Get rid of the special calling conventionBorislav Petkov
People complained about ARCH_HWEIGHT_CFLAGS and how it throws a wrench into kcov, lto, etc, experimentations. Add asm versions for __sw_hweight{32,64}() and do explicit saving and restoring of clobbered registers. This gets rid of the special calling convention. We get to call those functions on !X86_FEATURE_POPCNT CPUs. We still need to hardcode POPCNT and register operands as some old gas versions which we support, do not know about POPCNT. Btw, remove redundant REX prefix from 32-bit POPCNT because alternatives can do padding now. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1464605787-20603-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30x86/cpufeature: Carve out X86_FEATURE_*Borislav Petkov
Move them to a separate header and have the following dependency: x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h This makes it easier to use the header in asm code and not include the whole cpufeature.h and add guards for asm. Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-05x86/hweight: Force inlining of __arch_hweight{32,64}()Denys Vlasenko
With this config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os gcc-4.7.2 generates many copies of these tiny functions: __arch_hweight32 (35 copies): 55 push %rbp e8 66 9b 4a 00 callq __sw_hweight32 48 89 e5 mov %rsp,%rbp 5d pop %rbp c3 retq __arch_hweight64 (8 copies): 55 push %rbp e8 5e c2 8a 00 callq __sw_hweight64 48 89 e5 mov %rsp,%rbp 5d pop %rbp c3 retq See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 This patch fixes this via s/inline/__always_inline/ To avoid touching 32-bit case where such change was not tested to be a win, reformat __arch_hweight64() to have completely disjoint 64-bit and 32-bit implementations. IOW: made #ifdef / 32 bits and 64 bits instead of having #ifdef / #else / #endif inside a single function body. Only 64-bit __arch_hweight64() is __always_inline'd. text data bss dec filename 86971120 17195912 36659200 140826232 vmlinux.before 86970954 17195912 36659200 140826066 vmlinux Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Graf <tgraf@suug.ch> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1438697716-28121-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2010-05-17x86, hweight: Use a 32-bit popcnt for __arch_hweight32()H. Peter Anvin
Use a 32-bit popcnt instruction for __arch_hweight32(), even on x86-64. Even though the input register will *usually* be zero-extended due to the standard operation of the hardware, it isn't necessarily so if the input value was the result of truncating a 64-bit operation. Note: the POPCNT32 variant used on x86-64 has a technically unnecessary REX prefix to make it five bytes long, the same as a CALL instruction, therefore avoiding an unnecessary NOP. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <alpine.LFD.2.00.1005171443060.4195@i5.linux-foundation.org>
2010-04-06x86: Add optimized popcnt variantsBorislav Petkov
Add support for the hardware version of the Hamming weight function, popcnt, present in CPUs which advertize it under CPUID, Function 0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the default lib/hweight.c sw versions. A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost a 3x speedup on a F10h machine. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100318112015.GC11152@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>