This page describes the processes and scripts that transfer image data and metadata from the TEMs.

EPU and SerialEM are installed and running on the TEM machines. Each TEM has the following machines:

Internal to each TEM there is a private network interconnecting all of these machines. Across these servers an 'X:' drive is mounted, on which the collected data is stored.

Operators are expected to utilise the elogbook to control the backend data management systems. This primarily functions as a means to 

In the data center, an Apache Airflow instance manages the workflows required to support data management. It currently runs as a Docker Swarm stack on the cryoem-daq[01-5] nodes.

On these cryoem-daq nodes, the K2 server's disk is mounted via CIFS. The cryoem-daq nodes also mount the large GPFS filesystem where the data ultimately resides and where users can access it. Because the Docker Swarm services run as containers, these mount points are bind-mounted into the Airflow containers.
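Inside the containers, moving data is therefore an ordinary filesystem copy between the two bind-mounted paths. The following is a minimal sketch of such a copy step; the mount locations, the visit-based directory layout and the `.mrc` extension are assumptions for illustration, not the facility's actual configuration.

```python
from pathlib import Path
import shutil

# Assumed container-side bind-mount locations; the real paths will differ.
K2_MOUNT = Path("/mnt/k2")          # CIFS mount of the K2 server's disk
GPFS_MOUNT = Path("/gpfs/cryoem")   # GPFS filesystem where the data finally lives


def copy_new_images(visit: str) -> list[Path]:
    """Copy any image not yet present on GPFS, preserving the directory layout."""
    copied = []
    source_dir = K2_MOUNT / visit
    dest_dir = GPFS_MOUNT / visit
    for src in source_dir.rglob("*.mrc"):
        dest = dest_dir / src.relative_to(source_dir)
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```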

The Airflow stack is kept under revision control on GitHub.

Airflow defines workflows as DAGs (directed acyclic graphs). These are coded in Python and express the dependencies between tasks; a minimal illustrative sketch is given below, and the table after it describes the function of each DAG in this deployment.
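The sketch below is a hypothetical DAG showing how tasks and their dependencies are declared in Python. The DAG id, task names and schedule are illustrative only and do not reflect the real temN-daq.py implementation.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def copy_from_tem():
    """Placeholder: copy newly collected images from the TEM server to GPFS."""
    print("copying data ...")


def trigger_preprocessing():
    """Placeholder: kick off the per-experiment pre-processing DAG."""
    print("triggering pre-processing ...")


with DAG(
    dag_id="example_tem_transfer",
    start_date=datetime(2024, 1, 1),
    schedule_interval="*/10 * * * *",  # poll for new data every ten minutes
    catchup=False,
) as dag:
    copy_task = PythonOperator(task_id="copy_from_tem", python_callable=copy_from_tem)
    preprocess_task = PythonOperator(
        task_id="trigger_preprocessing", python_callable=trigger_preprocessing
    )

    # The >> operator declares the dependency graph: copy first, then trigger.
    copy_task >> preprocess_task
```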

| DAG | Purpose |
| --- | --- |
| `temN-daq.py` | Monitors the elogbook for the current experiment and sets up the storage and pre-processing pipelines in preparation. Also copies the data from the TEM servers to the GPFS filesystem and, as it does so, triggers the appropriate pre-processing task. |
| `<experiment name>_<sample id>.py` | Generated for every new experiment; contains the actual pre-processing pipeline that aligns, CTF-estimates and particle-picks every image, triggered from the temN-daq DAG. |
| `pipeline_single-particle_pre-processing.py` | Default template DAG for single-particle pre-processing. This file is copied to `<experiment name>_<sample id>.py` when a new experiment starts (see the sketch below). |
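The following is a minimal sketch of how the per-experiment DAG file could be generated from the template, assuming it amounts to a plain file copy into the Airflow dags folder. The dags folder path and the function name are assumptions, not part of the real codebase.

```python
from pathlib import Path
import shutil

# Assumed location of the Airflow dags folder inside the containers.
DAGS_FOLDER = Path("/opt/airflow/dags")
TEMPLATE = DAGS_FOLDER / "pipeline_single-particle_pre-processing.py"


def create_experiment_dag(experiment_name: str, sample_id: str) -> Path:
    """Copy the pre-processing template to <experiment name>_<sample id>.py."""
    target = DAGS_FOLDER / f"{experiment_name}_{sample_id}.py"
    if not target.exists():
        shutil.copyfile(TEMPLATE, target)
    return target


# Example (hypothetical names): create_experiment_dag("my_experiment", "sample1")
```

Because the Airflow scheduler periodically scans the dags folder, a file copied in this way is picked up automatically and becomes a schedulable DAG without any further registration step.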