hurd/glibc.git - glibc maintenance

Age	Commit message (Collapse)	Author
2018-07-19	benchtests: improve argument parsing through argparse library	Leonardo Sandoval
	The argparse library is used on compare_bench script to improve command line argument parsing. The 'schema validation file' is now optional, reducing by one the number of required parameters. * benchtests/scripts/compare_bench.py (__main__): use the argparse library to improve command line parsing. (__main__): make schema file as optional parameter (--schema), defaulting to benchtests/scripts/benchout.schema.json. (main): move out of the parsing stuff to __main_ and leave it only as caller of main comparison functions.
2018-07-16	Improve strstr performance	Wilco Dijkstra
	Improve strstr performance. Strstr tends to be slow because it uses many calls to memchr and a slow byte loop to scan for the next match. Performance is significantly improved by using strnlen on larger blocks and using strchr to search for the next matching character. strcasestr can also use strnlen to scan ahead, and memmem can use memchr to check for the next match. On the GLIBC bench tests the performance gains on Cortex-A72 are: strstr: +25% strcasestr: +4.3% memmem: +18% On a 256KB dataset strstr performance improves by 67%, strcasestr by 47%. Reviewd-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2018-06-12	benchtests: Add -f/--functions argument	H.J. Lu
	On x86-64, there may be multiple IFUNC implementations for a given function. But we may be only interested in a subset of them. This patch adds -f/--functions argument to compare a subset of IFUNC implementations. * benchtests/scripts/compare_strings.py (process_results): Add funcs argument. Compare only functions which are selected. (main): Check if base function is among selected functions. Pass selected functions to process_results. (__main__): Add -f/--functions argument.
2018-06-01	benchtests: Catch exceptions in input arguments	Leonardo Sandoval
	Catch runtime exceptions in case the user provided: wrong base function, attribute(s) or input file. In any of the latter, quit immediately with non-zero return code. * benchtests/scripts/compare_string.py: (process_results) Catch exception in non-existent base_func and catch exception in non-existent attribute. (parse_file) Catch exception in non-existent input file.
2018-06-01	benchtests: Add --no-diff and --no-header options	Leonardo Sandoval
	Having a string comparison report with neither diff numbers nor header yields a more useful output to be consumed by other tools. * benchtests/scripts/compare_string.py: Add --no-diff and --no-header options to avoid diff calculation and omit header, respectively. (main): process --no-diff and --no-header
2018-05-07	benchtests: Move iterator declaration into loop header	Siddhesh Poyarekar
	This is a minor style change to move the definition of I to its usage scope instead of at the top of the function. This is consistent with glibc style guidelines and more importantly it was getting in the way of my testing. * benchtests/bench-memcpy-walk.c (do_test): Move declaration of I into loop header. * benchtests/bench-memmove-walk.c (do_test): Likewise.
2018-03-19	Undefine attribute_hidden to fix benchtests	Wilco Dijkstra
	Add an undefine of attribute_hidden since it may be defined in some cases (it must be defined since it is used by some hp-timing configurations). * benchtests/bench-timing.h (attribute_hidden): Undefine.
2018-03-15	Use correct includes in benchtests	Wilco Dijkstra
	Currently the benchtests are run with internal GLIBC headers, which is incorrect. Defining _ISOMAC in the makefile ensures the internal headers are bypassed. Fix all tests which were relying on internal defines or includes. * benchtests/Makefile: Define _ISOMAC. * benchtests/bench-strcoll.c: Add missing sys/stat.h include. * benchtests/bench-string.h: Define inhibit_loop_to_libcall macro. * benchtests/bench-strstr.c: Define empty libc_hidden_builtin_def. * benchtests/bench-strtok.c (oldstrtok): Use rawmemchr. * benchtests/bench-timing.h: Define attribute_hidden.
2018-03-06	benchtests: Don't benchmark 0 length calls for strncmp	Siddhesh Poyarekar
	The 0 length strncmp is interesting for correctness but not for performance. * benchtests/bench-strncmp.c (test_main): Remove 0 length tests. (do_test_limit): Likewise.
2018-03-06	benchtests: Reallocate buffers for every strncmp implementation	Siddhesh Poyarekar
	Don't reuse buffers for different strncmp implementations since the earlier implementation will end up warming the cache for the later one. Eventually there should be a more elegant way to do this. * benchtests/bench-strncmp.c (do_test_limit): Reallocate buffers for every implementation. (do_test): Likewise.
2018-03-06	benchtests: Convert strncmp benchmark output to json	Siddhesh Poyarekar
	Make the output usable through the compare_strings.py script. * benchtests/bench-strncmp.c: Convert output to json.
2018-02-12	Remove slow paths from pow	Wilco Dijkstra
	Remove the slow paths from pow. Like several other double precision math functions, pow is exactly rounded. This is not required from math functions and causes major overheads as it requires multiple fallbacks using higher precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns of up to 100000x have been reported when the highest precision path triggers. All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1). The worst case error is ~0.506ULP. A simple test over a few hundred million values shows pow is 10% faster on average. This fixes BZ #13932. [BZ #13932] * sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove. * benchtests/pow-inputs: Update comment for slow path cases. * manual/probes.texi (slowpow_p10): Delete removed probe. (slowpow_p10): Likewise. * math/Makefile: Remove halfulp.c and slowpow.c. * sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1. * sysdeps/generic/math_private.h (__exp1): Remove error argument. (__halfulp): Remove. (__slowpow): Remove. * sysdeps/i386/fpu/halfulp.c: Delete file. * sysdeps/i386/fpu/slowpow.c: Likewise. * sysdeps/ia64/fpu/halfulp.c: Likewise. * sysdeps/ia64/fpu/slowpow.c: Likewise. * sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument, improve comments and add error analysis. * sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis. (power1): Remove function: (log1): Remove error argument, add error analysis. (my_log2): Remove function. * sysdeps/ieee754/dbl-64/halfulp.c: Delete file. * sysdeps/ieee754/dbl-64/slowpow.c: Likewise. * sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise. * sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise. * sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c. * sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1. * sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c, slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c. * sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define. * sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise. * sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file. * sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise. * sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-02	benchtests: Make bench-memcmp print json	Siddhesh Poyarekar
	The benchamrk result can now be studied using the compare_strings.py script. * benchtests/bench-memcmp.c: Print json instead of plain text.
2018-02-02	benchtests: Reallocate buffers for every test run	Siddhesh Poyarekar
	Keeping the buffers the same across test runs gives later invocations the advantage since they access cached data. Reallocate so that all test runs are on equal grounds. * benchtests/bench-memcmp.c (do_test): Call realloc_buf for every test run.
2018-01-01	Update copyright dates with scripts/update-copyrights.	Joseph Myers
	* All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.
2017-12-15	Convert strcmp benchmark output to json format	Siddhesh Poyarekar
	The format is now parseable with the compare_strings.py script.
2017-11-28	benchtests: Enable BENCHSET to run subset of tests	Victor Rodriguez
	This patch adds BENCHSET variable to benchtests/Makefile in order to provide the capability to run a list of subsets of benchmark tests, ie; make bench BENCHSET="bench-pthread bench-math malloc-thread" This helps users to benchmark specific glibc area ChangeLog: * benchtests/Makefile:Add BENCHSET to allow subsets of benchmarks to be run. * benchtests/README: Add documentation for: Running subsets of benchmarks. Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com> Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com> Reviewed-By: Siddhesh Poyarekar <siddhesh@sourceware.org>
2017-11-28	benchtests: Expand range of tests names in schema.json	Victor Rodriguez
	When executing bench-math the benchmark output is invalid with this error msg: Invalid benchmark output: 'workload-spec2006.wrf' does not match any of the regexes: '^[_a-zA-Z0-9]$¹ or Invalid benchmark output: Additional properties are not allowed ('workload-spec2006.wrf' was unexpected) The error was seen when running the test: workload-spec2006.wrf, 'stack=1024,guard=1' and 'stack=1024,guard=2'. The problem is that the current regex's do not accept the hyphen, dot, equal and comma in the output. This patch changes the regex in benchout.schema.json to accept symbols in benchmark tests names. ChangeLog: benchtests/scripts/benchout.schema.json: Fix regex to accept a wider range of tests names. Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com> Reviewed-By: Siddhesh Poyarekar <siddhesh@sourceware.org>
2017-11-28	benchtests: Adjust valid and accepted properties	Victor Rodriguez
	Benchmark workload-spec2006.wrf does not produce max, min or mean results but instead produce throughput. This is represented in benchtests/bench-skeleton.c. This patch adjust benchout.schema.json to consider bench.out from bench-math benchmarks as valid ChangeLog: * benchtests/scripts/benchout.schema.json: Add throughput as accepted result from property and remove "max", min" and "mean" from required properties based on benchtests/bench-skeleton.c. Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com> Reviewed-By: Siddhesh Poyarekar <siddhesh@sourceware.org>
2017-11-20	benchtests: Bump start size since smaller sizes are noisy	Siddhesh Poyarekar
	Numbers for very small sizes (< 128B) are much noisier for non-cached benchmarks like the walk benchmarks, so don't include them. * benchtests/bench-memcpy-walk.c (START_SIZE): Set to 128. * benchtests/bench-memmove-walk.c (START_SIZE): Likewise. * benchtests/bench-memset-walk.c (START_SIZE): Likewise.
2017-11-20	benchtests: Fix walking sizes and directions for *-walk benchmarks	Siddhesh Poyarekar
	Make the walking benchmarks walk only backwards since copying both ways is biased in favour of implementations that use non-temporal stores for larger sizes; falkor is one of them. This also fixes up bugs in computation of the result which ended up multiplying the length with the timing result unnecessarily. * benchtests/bench-memcpy-walk.c (do_one_test): Copy only backwards. Fix timing computation. * benchtests/bench-memmove-walk.c (do_one_test): Likewise. * benchtests/bench-memset-walk.c (do_one_test): Walk backwards on memset by N at a time. Fix timing computation.
2017-10-13	Benchtests for sinf, cosf and sincosf	Rajalakshmi Srinivasaraghavan
	Numbers used from cos and sin inputs.
2017-10-05	benchtests: Memory walking benchmark for memmove	Siddhesh Poyarekar
	This benchmark is an attempt to eliminate cache effects from string benchmarks. The benchmark walks both ways through a large memory area and copies different sizes of memory and alignments one at a time instead of looping around in the same memory area. This is a good metric to have alongside the simple memmove benchmark (which is only really useful for smaller sizes) especially for larger sizes where the likelihood of the call being done only once is pretty high. This benchmark is different from memcpy in that it also tests overlapping copies. * benchtests/bench-memmove-walk.c: New file. * benchtests/Makefile (string-benchset): Add it.
2017-10-05	benchtests: Memory walking benchmark for memset	Siddhesh Poyarekar
	This benchmark is an attempt to eliminate cache effects from string benchmarks. The benchmark walks backward through a large memory area and sets different sizes of memory and alignments one at a time instead of looping around in the same memory area. This is a good metric to have alongside the simple memset benchmark (which is only really useful for smaller sizes) especially for larger sizes where the likelihood of the call being done only once is pretty high. * benchtests/bench-memset-walk.c: New file. * benchtests/Makefile (string-benchset): Add it.
2017-10-05	benchtests: Memory walking benchmark for memcpy	Siddhesh Poyarekar
	This benchmark is an attempt to eliminate cache effects from string benchmarks. The benchmark walks both ways through a large memory area and copies different sizes of memory and alignments one at a time instead of looping around in the same memory area. This is a good metric to have alongside the other memcpy benchmarks, especially for larger sizes where the likelihood of the call being done only once is pretty high. * benchtests/bench-memcpy-walk.c: New file. * benchtests/Makefile (string-benchset): Add it.
2017-09-20	Add exp2f and log2f benchmark trace	Szabolcs Nagy
	exp2f and log2f benchmark traces are just copies of the existing expf and logf traces from wrf_r. * benchtests/Makefile: Add exp2f and log2f benchmarks. * benchtests/exp2f-inputs: Copy of expf-inputs. * benchtests/log2f-inputs: Copy of logf-inputs.
2017-09-19	Add logf trace	Wilco Dijkstra
	Add a trace for logf. This is a reduced trace based on 2.8 billion samples extracted from wrf_r. * benchtests/Makefile: Add logf benchmark. * benchtests/logf-inputs: Add reduced trace from wrf_r.
2017-09-19	Add expf trace	Wilco Dijkstra
	Add a trace for expf. This is a reduced trace based on 2.4 billion samples extracted from wrf_r. * benchtests/Makefile: Add expf benchmark. * benchtests/expf-inputs: Add reduced trace from wrf_r.
2017-09-19	Add benchtests for trunc and truncf.	Joseph Myers
	This patch adds benchtests for the trunc and truncf functions. The inputs listed are fairly arbitrary; I do not assert they are representative of any particular application. * benchtests/Makefile (bench-math): Add trunc and truncf. (CFLAGS-bench-trunc.c): New variable. (CFLAGS-bench-truncf.c): Likewise. * benchtests/trunc-inputs: New file. * benchtests/truncf-inputs: Likewise.
2017-09-16	benchtests: New -g option to generate graphs in compare_strings.py	Siddhesh Poyarekar
	The compare_strings.py option unconditionally generates a graph PNG image of the input data, which can be unnecessary and slow. Put this behind an optional flag -g. * benchtests/scripts/compare_strings.py: New option -g. (draw_graph): Print a message that a graph is being generated. (process_results): Generate graph only if -g is passed. (main): Process option -g.
2017-09-16	benchtests: Make compare_strings.py output a bit prettier	Siddhesh Poyarekar
	Make the column widths for the outputs fixed so that they look a little less messy. They will still look bad with lots of IFUNCs (like on x86) but it's still a step forward. * benchtests/scripts/compare_strings.py (process_results): Better spacing for output.
2017-09-16	benchtests: Use argparse to parse arguments	Siddhesh Poyarekar
	Make the script more usable by adding proper command line options along with a way to query the options. The script is capable of doing a bunch of things right now like choosing a base for comparison, choosing to generate graphs, etc. and they should be accessible via command line switches. * benchtests/scripts/compare_strings.py: Use argparse. * benchtests/README: Document existence of compare_strings.py.
2017-09-14	benchtests: Reallocate buffers for memset	Siddhesh Poyarekar
	Keeping the same buffers along with copying the same size of data into the same location means that the first routine is typically the slowest since it has to bear the cost of fetching data into to cache. Reallocating buffers stabilizes numbers by a bit. * benchtests/bench-string.h (realloc_bufs): New function. (test_init): Call it. * benchtests/bench-memset-large.c (do_test): Likewise. * benchtests/bench-memset.c (do_test): Likewise.
2017-09-14	benchtests: Make memset benchmarks print json	Siddhesh Poyarekar
	Make the memset benchmarks (bench-memset and bench-memset-large) print their output in JSON so that they can be evaluated using the compare_strings.py script. * benchtests/bench-memset-large.c: Print output in JSON format. * benchtests/bench-memset.c: Likewise.
2017-09-14	Add new codepage charmaps/IBM858 [BZ #21084]	Mike FABIAN
	This code page is identical to code page 850 except that X'D5' has been changed from LI61 (dotless i) to SC20 (euro symbol). The code points from /x01 to /x1f in the /localedata/charmaps/IBM858 file have the same mapping as those in localedata/charmaps/ANSI_X3.4-1968. That means they disagree with with ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00858.txt in that range. For example, localedata/charmaps/IBM858 and localedata/charmaps/ANSI_X3.4-1968 have: “<U0001> /x01 START OF HEADING (SOH)” whereas CP00858.txt has: “01 SS000000 Smiling Face” That means that CP00858.txt is not really ASCII-compatible and to make it ASCII-compatible we deviate fro CP00858.txt in the code points from /x01 to /x1f. [BZ #21084] * benchtests/strcoll-inputs/filelist#en_US.UTF-8: Add IBM858 and ibm858.c. * iconvdata/Makefile: Add IBM858. * iconvdata/gconv-modules: Add IBM858. * iconvdata/ibm858.c: New file. * iconvdata/tst-tables.sh: Add IBM858 * localedata/charmaps/IBM858: New file.
2017-08-21	benchtests: Do not compile benchmark objects as libc modules [BZ #21864]	Florian Weimer
	Otherwise, this will lead to link failures due to hidden symbol references.
2017-08-17	Add math benchmark latency test	Wilco Dijkstra
	This patch further improves math function benchmarking by adding a latency test in addition to throughput. This enables more accurate comparisons of the math functions. The latency test works by creating a dependency on the previous iteration: func_res = F (func_res * zero + input[i]). The multiply by zero avoids changing the input. It reports reciprocal throughput and latency in nanoseconds (depending on the timing header used) and max/min throughput in iterations per second: "workload-spec2006.wrf": { "reciprocal-throughput": 100, "latency": 200, "max-throughput": 1.0e+07, "min-throughput": 5.0e+06 } * benchtests/bench-skeleton.c (main): Add support for latency benchmarking. * benchtests/scripts/bench.py: Add support for latency benchmarking.
2017-08-11	benchtests: Print json in memmove benchmark	Siddhesh Poyarekar
	Make the memmove benchmarks (bench-memmove and bench-memmove-large) print their output in JSON so that they can be evaluated using the compare_strings.py script. * benchtests/bench-memmove-large.c: Print output in JSON format. * benchtests/bench-memmove.c: Likewise.
2017-08-11	benchtests: Remove verification runs from benchmark tests	Siddhesh Poyarekar
	The test run is unnecessary and interferes with the benchmark. The tests are done during make check, so they're unnecessary here. * benchtests/bench-memccpy.c (do_one_test): Remove checks. * benchtests/bench-memchr.c (do_one_test): Likewise. * benchtests/bench-memcpy-large.c (do_one_test): Likewise. * benchtests/bench-memcpy.c (do_one_test): Likewise. * benchtests/bench-memmove-large.c (do_one_test): Likewise. * benchtests/bench-memmove.c (do_one_test): Likewise. * benchtests/bench-memset-large.c (do_one_test): Likewise. * benchtests/bench-memset.c (do_one_test): Likewise. * benchtests/bench-string.h (test_init): Remove memsets.
2017-08-08	benchtests: Avoid a display error when running in text terminal	Siddhesh Poyarekar
	The compare_strings.py script generates a graph for the benchmarks it performs a comparison on and that fails if X is not available. Avoid the error and ensure that only the graph is generated and saved as a PNG file. * benchtests/scripts/compare_strings.py: Avoid display error when generating graph.
2017-08-08	benchtests: Allow selecting baseline for compare_string.py	Siddhesh Poyarekar
	This patch allows one to provide the function name using an optional -base option to compare all other functions against. This is useful when pitching one implementation of a string function against alternatives. In the absence of this option, comparisons are done against the first ifunc in the list. * benchtests/scripts/compare_strings.py (main): Add an optional -base option. (process_results): New argument base_func.
2017-08-08	benchtests: Use TEST_NAME instead of hardcoding memcpy	Siddhesh Poyarekar
	The hardcoded 'memcpy' name turns up in other derived tests like mempcpy. * benchtests/bench-memcpy.c (test_main): Use TEST_NAME instead of hardcoding memcpy. * benchtests/bench-memcpy-large.c (test_name): Likewise. * benchtests/bench-memcpy-random.c (test_name): Likewise.
2017-06-22	benchtests: New script to parse memcpy results	Siddhesh Poyarekar
	Read the memcpy results in json and print out the results in tabular form, in addition to generating a graph of the results to compare all of the implementations. The format of the output is extensible enough to allow this kind of analysis to be done on other string functions as well. * benchtests/scripts/benchout_strings.schema.json: New file. * benchtests/scripts/compare_strings.py: New file.
2017-06-22	benchtests: Make memcpy benchmarks print results in json	Siddhesh Poyarekar
	Print the benchmark output for various memcpy benchmarks in json so that it can be predictably parsed and analyzed. * benchtests/bench-memcpy-large.c: Include json-lib.h. (do_one_test): Print json. (do_test): Likewise. (test_main): Likewise. * benchtests/bench-memcpy-random.c: Include json-lib.h. (do_one_test): Print json. (do_test): Likewise. (test_main): Likewise. * benchtests/bench-memcpy.c: Include json-lib.h. (do_one_test): Print json. (do_test): Likewise. (test_main): Likewise.
2017-06-22	benchtests: Print string array elements, int and uint in json	Siddhesh Poyarekar
	Enhance the json module in benchtests to print signed and unsigned integers and string array elements. * benchtests/json-lib.h: Include inttypes.h. (json_attr_int, json_attr_int, json_element_string, json_element_int, json_element_uint): New functions. * benchtests/json-lib.c: (json_attr_int, json_attr_int, json_element_string, json_element_int, json_element_uint): New functions.
2017-06-20	Add powf trace	Wilco Dijkstra
	Add a workload for powf. This is a reduced trace based on 2.3 billion samples extracted from wrf. The distribution of values, in particular frequency of commonly used operands is the same as in the full trace. * benchtests/powf-inputs: Add reduced trace from wrf.
2017-06-20	Improve math benchmark infrastructure	Wilco Dijkstra
	Improve support for math function benchmarking. This patch adds a feature that allows accurate benchmarking of traces extracted from real workloads. This is done by iterating over all samples rather than repeating each sample many times (which completely ignores branch prediction and cache effects). A trace can be added to existing math function inputs via "## name: workload-<name>", followed by the trace. * benchtests/README: Describe workload feature. * benchtests/bench-skeleton.c (main): Add support for benchmarking traces from workloads.
2017-06-20	Add powf bench tests	Paul Clarke
	Add powf() bench test with input which covers these cases: - positive base to positive exponent - exponent 0 - negative base to even exponent - exponent 1 - exponent -1 - squared - squareroot - 1 to negative exponent - -1 to negative exponent - base 0 - -1 to even exponent - small base - small exponent * benchtests/Makefile (bench-math): Add powf. * benchtests/powf-inputs: New file.
2017-06-14	nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988)	Adhemerval Zanella
	Current allocate_stack logic for create stacks is to first mmap all the required memory with the desirable memory and then mprotect the guard area with PROT_NONE if required. Although it works as expected, it pessimizes the allocation because it requires the kernel to actually increase commit charge (it counts against the available physical/swap memory available for the system). The only issue is to actually check this change since side-effects are really Linux specific and to actually account them it would require a kernel specific tests to parse the system wide information. On the kernel I checked /proc/self/statm does not show any meaningful difference for vmm and/or rss before and after thread creation. I could only see really meaningful information checking on system wide /proc/meminfo between thread creation: MemFree, MemAvailable, and Committed_AS shows large difference without the patch. I think trying to use these kind of information on a testcase is fragile. The BZ#18988 reports shows that the commit pages are easily seen with mlockall (MCL_FUTURE) (with lock all pages that become mapped in the process) however a more straighfoward testcase shows that pthread_create could be faster using this patch: -- static const int inner_count = 256; static const int outer_count = 128; static void thread1(void arg) { return NULL; } static void sleeper(void arg) { pthread_t ts[inner_count]; for (int i = 0; i < inner_count; i++) pthread_create (&ts[i], &a, thread1, NULL); for (int i = 0; i < inner_count; i++) pthread_join (ts[i], NULL); return NULL; } int main(void) { pthread_attr_init(&a); pthread_attr_setguardsize(&a, 1<<20); pthread_attr_setstacksize(&a, 1134592); pthread_t ts[outer_count]; for (int i = 0; i < outer_count; i++) pthread_create(&ts[i], &a, sleeper, NULL); for (int i = 0; i < outer_count; i++) pthread_join(ts[i], NULL); assert(r == 0); } return 0; } -- On x86_64 (4.4.0-45-generic, gcc 5.4.0) running the small benchtests I see: $ time ./test real 0m3.647s user 0m0.080s sys 0m11.836s While with the patch I see: $ time ./test real 0m0.696s user 0m0.040s sys 0m1.152s So I added a pthread_create benchtest (thread_create) which check the thread creation latency. As for the simple benchtests, I saw improvements in thread creation on all architectures I tested the change. Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, arm-linux-gnueabihf, powerpc64le-linux-gnu, sparc64-linux-gnu, and sparcv9-linux-gnu. [BZ #18988] * benchtests/thread_create-inputs: New file. * benchtests/thread_create-source.c: Likewise. * support/xpthread_attr_setguardsize.c: Likewise. * support/Makefile (libsupport-routines): Add xpthread_attr_setguardsize object. * support/xthread.h: Add xpthread_attr_setguardsize prototype. * benchtests/Makefile (bench-pthread): Add thread_create. * nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and then mprotect the required area.
2017-06-04	benchtests: Add more tests for memrchr	H.J. Lu
	bench-memchr.c is shared with bench-memrchr.c. This patch adds some tests for positions close to the beginning for memrchr, which are equivalent to positions close to the end for memchr. * benchtests/bench-memchr.c (do_test): Print out both length and position. (test_main): Also test the position close to the beginning for memrchr.