
Cython multithreading (via OpenMP) is used in Smd0 for reading in parallel and collecting event timestamps. This is currently a bottleneck, since one prange loop takes ~1 ms to complete, as demonstrated in this test. We noticed that there is a 0.3 ms overhead from thread synchronization. The table below shows the time (ms) Smd0 needs to finish yielding a batch, with and without multithreading. Although we expect this time to decrease linearly as Smd0 yields smaller batches, this does not hold with multithreading. For batch size = 10,000 it took 2.3 ms, while for batch size = 1 it is still around 1 ms (compare the second row of the table below with the first row, where multithreading is not used).

This test reads 30,000 events from 16 smalldata files.

Time Spent per Batch (ms)/ BATCH_SIZE	10000	1000	100	1
Average Time w/o multithreading	6.20	0.63	0.07	0.0009
Average Time w prange	2.31	1.09	0.99	0.94
Max Time w/o multithreading	15.49	13.79	13.45	13.34
Max Time w prange	4.28	2.69	3.11	9.18
Min Time w/o multithreading	1.54	0.16	0.01	0.0 ...
Min Time w prange	1.31	0.92	0.78	0.67
Std. Time w/o multithreading	6.56	2.44	0.77	0.08
Std. Time w prange	1.39	0.30	0.14	0.13
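
As a rough illustration of the numbers above, the sketch below fits a straight line (fixed cost + per-event cost × batch size) through the average times at batch sizes 1 and 10,000 for each case, then estimates the batch size at which prange starts to pay off. The linear model and the helper name `linear_fit_two_points` are assumptions made for this illustration only; they are not part of Smd0.

```python
# Average time per batch (ms), copied from the table above.
batch_sizes = [10000, 1000, 100, 1]
avg_serial = [6.20, 0.63, 0.07, 0.0009]   # w/o multithreading
avg_prange = [2.31, 1.09, 0.99, 0.94]     # w prange


def linear_fit_two_points(x0, y0, x1, y1):
    """Return (slope, intercept) of the line through two points.

    Hypothetical helper: assumes time = intercept + slope * batch_size.
    """
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


# Use the two extreme batch sizes (1 and 10,000) as anchor points.
serial_slope, serial_fixed = linear_fit_two_points(
    1, avg_serial[3], 10000, avg_serial[0])
prange_slope, prange_fixed = linear_fit_two_points(
    1, avg_prange[3], 10000, avg_prange[0])

# Batch size at which both lines predict the same time per batch.
break_even = (prange_fixed - serial_fixed) / (serial_slope - prange_slope)

print(f"serial: fixed={serial_fixed:.4f} ms, per-event={serial_slope:.6f} ms")
print(f"prange: fixed={prange_fixed:.4f} ms, per-event={prange_slope:.6f} ms")
print(f"estimated break-even batch size: {break_even:.0f}")
```

Under this assumed model the fixed cost of a prange loop is close to 1 ms, so multithreading only helps once a batch is large enough for the lower per-event cost to make up for it. This matches the table, where prange is slower at batch size 1,000 and faster at 10,000.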

We investigated how much time Cython needs to enter and exit a loop that activates and joins threads (note that threads are created only once by Cython). With the simple code below, it takes about 0.3 ms to complete.

from cython.parallel import prange
import numpy as np
from posix.time cimport timeval, gettimeofday

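def do_prange(int n, int batch_size):
  cdef int i, j
  cdef int sum_i = 0  # reduction variable, updated in place inside prange

  cdef timeval tv_st, tv_en
  cdef unsigned long ut_st, ut_en

  # Time only the parallel region: entering the loop, the work, and the join.
  gettimeofday(&tv_st, NULL)
  for i in prange(n, nogil=True, schedule='static'):
    for j in range(batch_size):
      sum_i += 1

  gettimeofday(&tv_en, NULL)
  # Convert both timestamps to microseconds.
  ut_st = 1000000 * tv_st.tv_sec + tv_st.tv_usec
  ut_en = 1000000 * tv_en.tv_sec + tv_en.tv_usec
  # Prints: start (us), end (us), elapsed (us), total iteration count.
  print(f'{ut_st} {ut_en} {ut_en - ut_st} {sum_i}')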

This seems to be

GDB shows that threads were reused in the multithreaded case.
