There are two groups of DAGs managed by Apache Airflow:

  • movement and registration of 'runs' (i.e. images and their associated metadata) from the TEMs to the final storage; these are named 'temN_daq'
  • management of the pre-processing (alignment and CTF estimation) for each 'experiment'; these are named 'YYYYMMDD_PID_TEMN_SAMPLEID'
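The two naming conventions above can be sketched as small helper functions. This is an illustrative sketch only; the function names and the exact zero-padding of the TEM number are assumptions, not the production code.

```python
from datetime import date

def daq_dag_id(tem_number: int) -> str:
    """DAG id for the data-acquisition pipeline of one TEM, e.g. 'tem3_daq'."""
    return f"tem{tem_number}_daq"

def preprocessing_dag_id(run_date: date, proposal_id: str,
                         tem_number: int, sample_id: str) -> str:
    """DAG id for an experiment/sample preprocessing pipeline,
    following the 'YYYYMMDD_PID_TEMN_SAMPLEID' convention."""
    return f"{run_date:%Y%m%d}_{proposal_id}_TEM{tem_number}_{sample_id}"
```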


Generally, the DAQ DAGs query an external data source to determine which experiment is 'active' and obtain the relevant data (proposal ID, TEM parameters, pre-processing parameters). In our case we use the eLogBook, where users can enter and update this information.
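The record a DAQ DAG pulls from the external source might look like the following. This is a minimal sketch: the field names (`proposal_id`, `tem_parameters`, `preprocessing_parameters`) are hypothetical stand-ins for whatever the eLogBook actually exposes.

```python
from dataclasses import dataclass

@dataclass
class ActiveExperiment:
    """Data describing the currently active experiment, as obtained
    from the external source (the eLogBook in our case)."""
    proposal_id: str
    tem_parameters: dict
    preprocessing_parameters: dict

def parse_elogbook_entry(entry: dict) -> ActiveExperiment:
    # Field names here are hypothetical; missing parameter blocks
    # default to empty dicts.
    return ActiveExperiment(
        proposal_id=entry["proposal_id"],
        tem_parameters=entry.get("tem_parameters", {}),
        preprocessing_parameters=entry.get("preprocessing_parameters", {}),
    )
```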

As new experiments and samples are activated, a new DAG for each experiment/sample tuple is registered as a preprocessing pipeline. We define a sample as a grouping of specimen plus specific imaging parameters. As new data is copied over from the TEMs, each 'run' is registered with the appropriate preprocessing pipeline.
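The activate-then-register flow can be sketched as a small registry keyed by the experiment/sample tuple. The class and method names are illustrative assumptions, not the actual implementation; in production this state would live in Airflow itself rather than in memory.

```python
class PreprocessingRegistry:
    """Illustrative sketch: tracks preprocessing pipelines per
    (experiment, sample) tuple and the runs assigned to each."""

    def __init__(self) -> None:
        # (experiment, sample_id) -> list of registered run names
        self.pipelines: dict[tuple[str, str], list[str]] = {}

    def activate(self, experiment: str, sample_id: str) -> None:
        """Register a new preprocessing pipeline for this tuple."""
        self.pipelines.setdefault((experiment, sample_id), [])

    def register_run(self, experiment: str, sample_id: str, run_name: str) -> None:
        """Attach a newly copied run to its preprocessing pipeline."""
        self.activate(experiment, sample_id)
        self.pipelines[(experiment, sample_id)].append(run_name)
```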

Apache Airflow manages the tasks necessary to preprocess the data and to register the data and metadata in the appropriate places. This includes performance monitoring (to InfluxDB/Grafana) and file and summary information (such as image resolution and drift, to the eLogBook).
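A metric destined for InfluxDB is ultimately one record in InfluxDB's line protocol (`measurement,tags fields timestamp`). The helper below is a minimal sketch of formatting such a record; the measurement and tag/field names are hypothetical, and real code would typically use an InfluxDB client library instead.

```python
def influx_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one InfluxDB line-protocol record (numeric field values only).
    Sketch for illustration; production code would use a client library."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"
```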
