We used Cython multithreading (via OpenMP prange) in Smd0 to read data and collect event timestamps in parallel. We noticed an overhead of roughly 0.3 ms per batch from thread synchronization. The table below shows the time (ms) Smd0 needs to finish yielding one batch, with and without multithreading. Although we expect this time to shrink roughly linearly as Smd0 yields smaller batches, this does not hold in the multithreaded case: for batch size = 10,000 it took 2.31 ms, while for batch size = 1 it was still around 1 ms (compare the second row of the table, with prange, to the first row, without multithreading). A minimal sketch of the parallel pattern is shown after the table.

 

Time Spent per Batch (ms) \ BATCH_SIZE   10000    1000     100      1
Average Time w/o multithreading           6.20     0.63     0.07     0.0009
Average Time w/ prange                    2.31     1.09     0.99     0.94
Max Time w/o multithreading              15.49    13.79    13.45    13.34
Max Time w/ prange                        4.28     2.69     3.11     9.18
Min Time w/o multithreading               1.54     0.16     0.01     0.0 ...
Min Time w/ prange                        1.31     0.92     0.78     0.67
Std. Time w/o multithreading              6.56     2.44     0.77     0.08
Std. Time w/ prange                       1.39     0.30     0.14     0.13
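
The pattern behind these measurements is roughly the following. This is a minimal sketch only, not the actual Smd0 code; collect_timestamps and the raw_ts/out arrays are hypothetical stand-ins for Smd0's event data.

from cython.parallel import prange
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def collect_timestamps(unsigned long[::1] raw_ts, unsigned long[::1] out, bint use_prange):
    # Copy event timestamps from raw_ts into out, either serially or with an
    # OpenMP-backed prange loop. Every iteration is independent, so the
    # parallel version needs no locking inside the nogil region.
    cdef Py_ssize_t i
    cdef Py_ssize_t n = raw_ts.shape[0]
    if use_prange:
        for i in prange(n, nogil=True, schedule='static'):
            out[i] = raw_ts[i]
    else:
        for i in range(n):
            out[i] = raw_ts[i]

When batches are small, the cost of such a loop is dominated by entering and leaving the prange region rather than by the work itself, which matches the behaviour in the table above.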

 

We investigated how much time Cython needs to enter and exit a prange loop, which activates and joins the OpenMP threads (note that the threads themselves are created only once by Cython). With the simple code below, it takes about 0.3 ms per call to complete.

from cython.parallel import prange
import numpy as np
from posix.time cimport timeval, gettimeofday

def do_prange(int n, int batch_size):
    cdef int i, j
    cdef int sum_i = 0
    cdef timeval tv_st, tv_en
    cdef unsigned long ut_st, ut_en
    gettimeofday(&tv_st, NULL)
    # Entering prange activates the OpenMP threads; leaving it joins them.
    # sum_i is treated as a reduction variable because of the in-place add.
    for i in prange(n, nogil=True, schedule='static'):
        for j in range(batch_size):
            sum_i += 1
    gettimeofday(&tv_en, NULL)
    # Convert both timestamps to microseconds and print the elapsed time.
    ut_st = 1000000 * tv_st.tv_sec + tv_st.tv_usec
    ut_en = 1000000 * tv_en.tv_sec + tv_en.tv_usec
    print(f'{ut_st} {ut_en} {ut_en - ut_st} {sum_i}')

This seems to be the fixed cost of entering and exiting the prange region (activating and joining the OpenMP threads), and would account for the thread-synchronization overhead observed in Smd0.
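
For completeness: prange only runs in parallel if the extension is compiled with OpenMP enabled. A minimal build sketch, assuming setuptools and gcc, with the code above saved as a hypothetical do_prange.pyx:

# setup.py (build sketch; file and module names are placeholders)
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "do_prange",
    sources=["do_prange.pyx"],
    extra_compile_args=["-fopenmp"],  # enable OpenMP so prange uses threads
    extra_link_args=["-fopenmp"],
)

setup(ext_modules=cythonize([ext]))

After building (for example with python setup.py build_ext --inplace), calling do_prange(4, 10000) from Python prints the start and end timestamps in microseconds, the elapsed time, and the final sum.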

 

 
