Sep 10, 2019
On the R side, the work happens in the do_colsum function of src/main/array.c (it's been there since 2010). The parallel package, on the other hand, only runs on Unix systems.

Approach: write a large matrix of random doubles to a binary file, map it into memory (with mmap), and benchmark several ways of summing its entries. All benchmarks were performed using gcc 9.1.0 on an ancient 4-core Intel Core i5-3427U processor and an SSD.
First, generate the test matrix (250000 rows × 1000 columns of doubles):

$ ./generate_random_matrix 250000 1000 ~/matrix_250000_1000.bin
Code: generate_random_matrix.c
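The generator itself is only linked, not reproduced. A minimal sketch of what such a program could look like is below; the flat array layout with no header and the use of rand() for the values are assumptions for illustration, not a description of the linked file.

```c
// Sketch of a matrix generator: writes rows * cols random doubles
// to a binary file as one flat array (assumed layout, no header).
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s rows cols output\n", argv[0]);
        return EXIT_FAILURE;
    }
    long rows = strtol(argv[1], NULL, 10);
    long cols = strtol(argv[2], NULL, 10);

    FILE *out = fopen(argv[3], "wb");
    if (out == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    // Write the matrix one row at a time to keep memory usage small.
    double *row = malloc((size_t)cols * sizeof *row);
    if (row == NULL) {
        fclose(out);
        return EXIT_FAILURE;
    }
    srand((unsigned)time(NULL));
    for (long i = 0; i < rows; i++) {
        for (long j = 0; j < cols; j++)
            row[j] = (double)rand() / RAND_MAX;   // values in [0, 1]
        if (fwrite(row, sizeof *row, (size_t)cols, out) != (size_t)cols) {
            perror("fwrite");
            free(row);
            fclose(out);
            return EXIT_FAILURE;
        }
    }

    free(row);
    fclose(out);
    return EXIT_SUCCESS;
}
```

At 250000 × 1000 doubles the file comes out to roughly 2 GB.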
Each of the sum programs follows the same skeleton: map the matrix file into memory, compute the sum, unmap.

// Map matrix
struct file_mapping matrix;
if (map_file(path, &matrix) == -1) {
    fprintf(stderr, "Could not map file\n");
    exit(EXIT_FAILURE);
}

// Compute sum of entire array
[...]

// Unmap file
if (unmap_file(&matrix) == -1) {
    fprintf(stderr, "Could not unmap file\n");
    exit(EXIT_FAILURE);
}
Code: file_mapping.h
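file_mapping.h is also only linked. Judging from how it is used above (map_file, unmap_file, and the data and length members), it could look roughly like the sketch below; the read-only MAP_PRIVATE mapping, the size member, and the header-only static inline style are assumptions.

```c
// Sketch of file_mapping.h: mmap a file of doubles read-only and
// expose it as a pointer plus element count.
#ifndef FILE_MAPPING_H
#define FILE_MAPPING_H

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct file_mapping {
    double *data;       // start of the mapped array
    ptrdiff_t length;   // number of doubles in the file
    size_t size;        // size of the mapping in bytes
};

static inline int map_file(const char *path, struct file_mapping *mapping)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }

    void *data = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   // the mapping stays valid after the descriptor is closed
    if (data == MAP_FAILED)
        return -1;

    mapping->data = data;
    mapping->size = (size_t)sb.st_size;
    mapping->length = (ptrdiff_t)(sb.st_size / (off_t)sizeof(double));
    return 0;
}

static inline int unmap_file(struct file_mapping *mapping)
{
    return munmap(mapping->data, mapping->size);
}

#endif
```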
First, the sequential baseline:

double sum = 0;
for (ptrdiff_t i = 0; i < matrix.length; i++) {
    sum += matrix.data[i];
}
Code: sum_sequential.c
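It was presumably built along these lines (assuming the mapping helpers live entirely in the header, as in the sketch above):

$ gcc -O3 -o sum_sequential sum_sequential.c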
Benchmark (with -O3):
$ hyperfine './sum_sequential ~/matrix_250000_1000.bin'
Benchmark #1: ./sum_sequential ~/matrix_250000_1000.bin
  Time (mean ± σ):     447.3 ms ±   1.2 ms    [User: 337.6 ms, System: 107.9 ms]
  Range (min … max):   446.3 ms … 449.4 ms    10 runs
Next, threading the loop with an OpenMP parallel for reduction:

double sum = 0;
#pragma omp parallel for reduction(+:sum) schedule(static)
for (ptrdiff_t i = 0; i < matrix.length; i++) {
    sum += matrix.data[i];
}
Code: sum_omp_parallel_for.c
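The OpenMP variants need -fopenmp; without it, gcc simply ignores the omp pragmas and the loop stays sequential. A build command along these lines (again assuming header-only mapping helpers):

$ gcc -O3 -fopenmp -o sum_omp_parallel_for sum_omp_parallel_for.c

The same flag applies to the simd and parallel for simd variants below.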
Benchmark (with -O3):
$ hyperfine './sum_omp_parallel_for ~/matrix_250000_1000.bin'
Benchmark #1: ./sum_omp_parallel_for ~/matrix_250000_1000.bin
  Time (mean ± σ):     196.2 ms ±   1.5 ms    [User: 480.9 ms, System: 177.5 ms]
  Range (min … max):   193.8 ms … 199.8 ms    15 runs
Vectorization only, with OpenMP SIMD (no threads involved):

double sum = 0;
#pragma omp simd reduction(+:sum)
for (ptrdiff_t i = 0; i < matrix.length; i++) {
    sum += matrix.data[i];
}
Code: sum_omp_simd.c
Benchmark (with -O3):
$ hyperfine './sum_omp_simd ~/matrix_250000_1000.bin'
Benchmark #1: ./sum_omp_simd ~/matrix_250000_1000.bin
  Time (mean ± σ):     310.3 ms ±   1.2 ms    [User: 197.8 ms, System: 111.0 ms]
  Range (min … max):   308.6 ms … 312.1 ms    10 runs
Combining threads and vectorization with parallel for simd:

double sum = 0;
#pragma omp parallel for simd reduction(+:sum)
for (ptrdiff_t i = 0; i < matrix.length; i++) {
    sum += matrix.data[i];
}
Code: sum_omp_parallel_for_simd.c
Benchmark (with -O3):
$ hyperfine './sum_omp_parallel_for_simd ~/matrix_250000_1000.bin'
Benchmark #1: ./sum_omp_parallel_for_simd ~/matrix_250000_1000.bin
  Time (mean ± σ):     183.9 ms ±   0.9 ms    [User: 438.6 ms, System: 172.4 ms]
  Range (min … max):   182.3 ms … 186.0 ms    16 runs
Just for fun, here is the same reduction written by hand with pthreads.
Code: sum_pthread_collapsed_master.c
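The pthreads source is linked rather than shown. The sketch below illustrates the general shape such a reduction could take: the array is split into one contiguous chunk per thread, each worker sums its chunk, and the main ("master") thread takes the last chunk itself before collecting the partial sums. The thread count of 4 and the chunking scheme are assumptions, not necessarily what sum_pthread_collapsed_master.c does.

```c
// Sketch of a pthreads reduction over the mapped array: each worker
// sums one chunk, the main thread sums the last chunk and combines.
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4   // assumed; the linked code may choose this differently

struct chunk {
    const double *data;
    ptrdiff_t length;
    double sum;
};

static void *sum_chunk(void *arg)
{
    struct chunk *chunk = arg;
    double sum = 0;
    for (ptrdiff_t i = 0; i < chunk->length; i++)
        sum += chunk->data[i];
    chunk->sum = sum;
    return NULL;
}

static double sum_parallel(const double *data, ptrdiff_t length)
{
    pthread_t threads[NTHREADS - 1];
    struct chunk chunks[NTHREADS];
    ptrdiff_t per_thread = length / NTHREADS;

    // Hand out chunks; the last one (with any remainder) goes to the
    // calling thread, so only NTHREADS - 1 workers are spawned.
    for (int t = 0; t < NTHREADS; t++) {
        chunks[t].data = data + t * per_thread;
        chunks[t].length = (t == NTHREADS - 1) ? length - t * per_thread
                                               : per_thread;
        chunks[t].sum = 0;
    }
    for (int t = 0; t < NTHREADS - 1; t++)
        pthread_create(&threads[t], NULL, sum_chunk, &chunks[t]);

    sum_chunk(&chunks[NTHREADS - 1]);   // main thread does its own share

    double sum = chunks[NTHREADS - 1].sum;
    for (int t = 0; t < NTHREADS - 1; t++) {
        pthread_join(threads[t], NULL);
        sum += chunks[t].sum;
    }
    return sum;
}
```

A program built around this would be linked with -pthread.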
Benchmark (with -O3):
$ hyperfine './sum_pthread_collapsed_master ~/matrix_250000_1000.bin'
Benchmark #1: ./sum_pthread_collapsed_master ~/matrix_250000_1000.bin
  Time (mean ± σ):     194.3 ms ±   1.6 ms    [User: 475.3 ms, System: 163.4 ms]
  Range (min … max):   192.6 ms … 197.7 ms    15 runs
Summary:

| Task | Mean time (ms) |
|---|---|
| sequential | 447.3 |
| omp parallel for | 196.2 |
| omp simd | 310.3 |
| omp parallel for simd | 183.9 |
| pthreads | 194.3 |
To try the same OpenMP pragmas from C code called from R, the shared library can be compiled with:

$ PKG_CFLAGS=-fopenmp R CMD SHLIB code.c
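For reference, a minimal sketch of what such an OpenMP-enabled code.c could look like when called through the .C() interface; the function name par_sum and its signature are made up for illustration.

```c
// code.c: sketch of an OpenMP reduction callable from R via .C().
// .C() passes R doubles and integers to C as pointers.
void par_sum(double *x, int *n, double *result)
{
    double sum = 0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < *n; i++)
        sum += x[i];
    *result = sum;
}
```

From R this could then be loaded with dyn.load("code.so") and called as .C("par_sum", x, length(x), numeric(1)). Depending on the toolchain, -fopenmp may also be needed at link time (for example via PKG_LIBS=-fopenmp) so that the OpenMP runtime actually gets linked in.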