We used Cython multithreading (via OpenMP) in Smd0 to read and collect event timestamps in parallel. We noticed an overhead of about 0.3 ms from thread synchronization. The table below shows the time (ms) Smd0 takes to finish yielding one batch, with and without multithreading. We expected this time to decrease linearly as Smd0 yields smaller batches, but that does not hold with multithreading: a batch of 10,000 takes 2.3 ms, while a batch of 1 still takes about 1 ms (compare the second row of the table, with prange, against the first row, without multithreading).
Time per batch (ms) by BATCH_SIZE | 10000 | 1000 | 100 | 1 |
---|---|---|---|---|
Average time without multithreading | 6.20 | 0.63 | 0.07 | 0.0009 |
Average time with prange | 2.31 | 1.09 | 0.99 | 0.94 |
Max time without multithreading | 15.49 | 13.79 | 13.45 | 13.34 |
Max time with prange | 4.28 | 2.69 | 3.11 | 9.18 |
Min time without multithreading | 1.54 | 0.16 | 0.01 | 0.0 ... |
Min time with prange | 1.31 | 0.92 | 0.78 | 0.67 |
Std. dev. without multithreading | 6.56 | 2.44 | 0.77 | 0.08 |
Std. dev. with prange | 1.39 | 0.30 | 0.14 | 0.13 |
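
One way to read the averages in the table is as a fixed per-batch cost plus a per-event cost. The sketch below is a rough illustration, not part of the original measurement: it assumes that linear model, fits it through only the BATCH_SIZE = 1 and BATCH_SIZE = 10000 averages, and estimates the batch size above which prange becomes faster on average. The function name `fit_line` and the two-point fit are assumptions made here for illustration.

```python
# Average time per batch (ms) copied from the table, keyed by BATCH_SIZE.
avg_serial = {10000: 6.20, 1000: 0.63, 100: 0.07, 1: 0.0009}
avg_prange = {10000: 2.31, 1000: 1.09, 100: 0.99, 1: 0.94}


def fit_line(times, small=1, large=10000):
    """Two-point fit of time = fixed + per_event * batch_size.

    The linear model is an assumption; only the two end points are used.
    """
    per_event = (times[large] - times[small]) / (large - small)
    fixed = times[small] - per_event * small
    return fixed, per_event


fixed_s, slope_s = fit_line(avg_serial)
fixed_p, slope_p = fit_line(avg_prange)

# Batch size at which the two fitted lines cross.
break_even = (fixed_p - fixed_s) / (slope_s - slope_p)

print(f"serial: fixed = {fixed_s:.4f} ms, per event = {slope_s * 1e3:.3f} us")
print(f"prange: fixed = {fixed_p:.4f} ms, per event = {slope_p * 1e3:.3f} us")
print(f"estimated break-even batch size: {break_even:.0f}")
```

Under these assumptions the fixed cost with prange is close to the BATCH_SIZE = 1 average (about 0.94 ms), and the fitted lines cross between 1,900 and 2,000 events. That is consistent with the table, where the run without multithreading is faster at 1000 and prange is faster at 10000, but it is only an estimate from two averaged points.
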
We investigated how much time Cython needs to enter and exit a loop that activates and joins threads (note that Cython creates the threads only once). With the simple code below, this takes about 0.3 ms.
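
    from cython.parallel import prange
    from posix.time cimport timeval, gettimeofday

    def do_prange(int n, int batch_size):
        cdef timeval tv_st, tv_en
        # Start and stop timestamps; the prange loop being timed sits between these two calls.
        gettimeofday(&tv_st, NULL)
        gettimeofday(&tv_en, NULL)
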
This seems to be