All data is kept at the SLAC Shared Scientific Data Facility (S3DF), which provides the petaflops of compute and petabytes of storage needed for cryoEM data set management and analysis. The S3DF comprises many high-powered servers interconnected with high-speed networking fabrics, which in turn connect to high-performance storage available from all centrally managed hosts.
Prerequisites
- You will need an active SLAC unix account to access your data (contact Lisa Dunn).
- You will need to register as an S3DF user in the "CryoEM" facility with your SLAC unix username. Go to coact.slac.stanford.edu, then Repos→Request Facility Access, select "CryoEM", INCLUDE YOUR PROPOSAL NUMBER/PROJECT CODE IN THE NOTES FIELD, and submit. For further information, see: https://s3df.slac.stanford.edu/public/doc/#/accounts-and-access?id=access
- The proposal number/project code (e.g., CT### or CE###) MUST be included in the notes field of the S3DF CryoEM facility access request. Registration requests without this information will be rejected, and you will have to email s3df-help@slac.stanford.edu with that information to have your request re-opened for resubmission.
All experimental data is kept on the S3DF network filesystem:
The S3DF bastion hosts (accessed, e.g., by running ssh s3dflogin.slac.stanford.edu) do NOT mount the experimental and group data (see the message displayed in your terminal upon successful login to an S3DF bastion host). You will need to access your data via one of the load-balanced hosts listed below.
- S3DF interactive pool (see: https://s3df.slac.stanford.edu/public/doc/#/interactive-compute?id=interactive-pools).
- S3DF DTN hosts (see: https://s3df.slac.stanford.edu/public/doc/#/data-transfer?id=data-transfer)
- S3DF Globus endpoint via the #s3df_globus5 data collection (see: https://s3df.slac.stanford.edu/public/doc/#/data-transfer?id=globus)
The SDF cluster has been decommissioned as of August 2024. All Cryo-EM experimental and group project data is accessible from the S3DF cluster.
- All experimental data is stored under this directory path structure: /fs/ddn/sdf/group/cryoem/exp/YYYYMM/YYYYMMDD-<proposal>_<instrument>
- YYYY is the year, MM is the month, and DD is the day
- <proposal> is the 4- or 5-character alphanumeric project number assigned to your project.
- <instrument> is the instrument used in your experiment:
- TEM1
- TEM2
- TEM3
- TEM4
- TEMALPHA
- TEMBETA
- TEMGAMMA
- TEMDELTA
- FIB1
- FIB2
- Hydra
- For example, if your experiment name is 20220423-CA107 and your instrument is TEMBETA, your data will be found at /fs/ddn/sdf/group/cryoem/exp/202204/20220423-CA107_TEMBETA
- Access to the data is controlled via the CryoEM eLogBook.
- Anyone added as a collaborator on the experiment will have access to the data associated with that experiment.
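As a quick sanity check, the path scheme above can be assembled in a shell. The date, proposal code, and instrument below are the example values from this section, not real ones:

```shell
# Build the experiment path from its components (example values only).
YYYYMMDD=20220423
PROPOSAL=CA107
INSTRUMENT=TEMBETA
# The month directory is the first six characters (YYYYMM) of the date.
YYYYMM=$(echo "$YYYYMMDD" | cut -c1-6)
EXP_DIR=/fs/ddn/sdf/group/cryoem/exp/${YYYYMM}/${YYYYMMDD}-${PROPOSAL}_${INSTRUMENT}
echo "$EXP_DIR"
```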
Data Organization
Within your experiment's directory (i.e., YYYYMMDD-<proposal>_<instrument>), two directories are created for each sample recorded in the cryoEM eLogBook. The first is given a unique 24-character alphanumeric name and contains your data for that sample. The second is a symbolic link, named after the sample as recorded in the cryoEM eLogBook, that points to the corresponding alphanumeric directory for that sample. Inside each sample directory is the output from our bespoke data processing pipeline, which performs frame alignment (i.e., motion correction) and CTF estimation before and after alignment, and posts results to an experiment-specific private Slack channel for near real-time feedback during data collection. The output directories from our data pipeline are as follows:
- raw
- Raw contains the original data directory collected via EPU, Serial EPU, SerialEM, etc., as transferred to S3DF from the microscope.
- aligned
- Aligned contains the output from MotionCor2.
- summed
- Summed contains the output from CTF estimation via CTFFind4.
- particles
- Particles contains the output from particle picking via Relion.
- previews
- Previews contains the image files that are posted to the experiment's Slack channel.
- logs
- Logs contains the log files for all preprocessing pipeline tasks: frame alignment, CTF estimation, particle picking, and preview generation.
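The per-sample layout described above (an alphanumeric data directory plus a sample-named symbolic link) can be sketched with throwaway directories under /tmp; all names here are made up:

```shell
# Recreate the sample-directory pattern locally (all names are made up).
EXP=/tmp/demo_20220423-CA107_TEMBETA
mkdir -p "$EXP/abcdef0123456789abcdef01"          # unique 24-character data directory
ln -sfn abcdef0123456789abcdef01 "$EXP/my-sample" # symlink named after the sample
# Following the symlink lands in the alphanumeric data directory:
readlink -f "$EXP/my-sample"
```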
Data Collected using EPU
The format of the EPU file naming scheme is:
FoilHole_[Hole ID]_Data_[Acquisition Area ID]_[date]_[time].mrc
- For example, in FoilHole_31545690_Data_31547881_31547882_20190601_081945.mrc:
- [Hole ID] is 31545690
- [Acquisition Area ID] is 31547881_31547882
- [date] is 20190601 in yyyymmdd format
- [time] is 081945 in 24-hour hhmmss format
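The fields can be pulled out of a file name with standard shell tools; a minimal sketch using the example name above:

```shell
# Split an EPU file name into its underscore-delimited fields.
FNAME=FoilHole_31545690_Data_31547881_31547882_20190601_081945.mrc
BASE=${FNAME%.mrc}                       # drop the extension
HOLE_ID=$(echo "$BASE" | cut -d_ -f2)    # Hole ID
ACQ_AREA=$(echo "$BASE" | cut -d_ -f4,5) # Acquisition Area ID (two fields)
ACQ_DATE=$(echo "$BASE" | cut -d_ -f6)   # yyyymmdd
ACQ_TIME=$(echo "$BASE" | cut -d_ -f7)   # hhmmss
echo "hole=$HOLE_ID area=$ACQ_AREA date=$ACQ_DATE time=$ACQ_TIME"
```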
EPU organizes these files by grid square in the following directory scheme, with the top-level directory being the EPU session name, located in your raw directory:
- EPU Session
  - Metadata
  - Images-Disc1
    - Your copy of the Gain Reference File .gain
    - GridSquare_########
      - FoilHoles
      - Data
    - ...
    - GridSquare_########
      - FoilHoles
      - Data
    - GridSquare_########
Each "FoilHoles" directory contains images, in both .jpg and .mrc format, of the foil holes from which data was collected. Each "Data" directory under an EPU session contains the files stored for each image acquisition.
Each image acquisition in EPU results in six files:
- A high-quality MRC or TIFF image file, plus a .jpg copy. This is an unaligned summed image computed from the stack of dose-fraction images.
- A high-quality MRC or TIFF image stack file with "Fractions" appended to its name. This is your raw data: an unaligned stack of dose-fraction images. A checksum file ending in .dm5, created when this file is transferred to the data server, can also be found here; that is not your raw data file, it's a checksum file.
- Two XML files with metadata, one for the raw image stack and one for the unaligned summed image.
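When scripting over a Data directory, the raw stacks can be told apart from the summed images by the "Fractions" marker in the name. A minimal sketch, assuming "Fractions" appears as a _Fractions suffix just before the extension (check your own session's naming to confirm):

```shell
# Classify an EPU acquisition file by name (naming assumption noted above).
classify() {
  case "$1" in
    *_Fractions.*) echo raw-stack ;;
    *.xml)         echo metadata ;;
    *)             echo summed-or-other ;;
  esac
}
classify FoilHole_31545690_Data_31547881_31547882_20190601_081945_Fractions.tiff
classify FoilHole_31545690_Data_31547881_31547882_20190601_081945.mrc
```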
Data Transfer Methods
rsync/scp/bbcp
The unix command line utilities rsync , scp , and bbcp (https://www.slac.stanford.edu/~abh/bbcp/bbcp.pdf) can be used over ssh to bulk transfer/synchronise data across different locations. The old data transfer nodes (dtn01.slac.stanford.edu and dtn02.slac.stanford.edu) and the centos7.slac.stanford.edu cluster have been decommissioned; log into the load-balanced hostname s3dfdtn.slac.stanford.edu with your SLAC unix credentials instead. You will need to be registered as a CryoEM facility user in S3DF before you can access the S3DF DTN hosts (see Prerequisites above). Once logged into an S3DF DTN host, the data will be mounted at:
- Experimental data: /fs/ddn/sdf/group/cryoem/exp/<YYYYMM>/<YYYYMMDD>-<proposal>_<instrument>
- Group project data: /fs/ddn/sdf/group/cryoem/g/<proposal>
Globus
One can create a free Globus account at https://globusonline.org. If your institution is part of the Globus network, you can register with your institution's credentials. From there, download the client (Globus Connect Personal) so that files may be copied to a local host, or to another Globus endpoint run by your institution, if available. You may also use the Globus web client (https://www.globus.org/).
The Globus 4 endpoint at slac#cryoem is no longer supported (Globus shut down all version 4 endpoints worldwide on December 18th, 2023).
You will need to be registered as a CryoEM facility user in S3DF before you can access the S3DF Globus endpoint (see Prerequisites above). Once your S3DF registration has been approved, you can log into Globus with your home institution's credentials (if they are a part of the Globus network) or a personal Globus account and access the #s3df_globus5 data collection. You will be asked to authenticate with your SLAC unix credentials. Once authenticated, the data will be mounted at the following paths:
- Experimental data: /fs/ddn/sdf/group/cryoem/exp/<YYYYMM>/<YYYYMMDD>-<proposal>_<instrument>
- Group project data: /sdf/group/cryoem/g/<proposal>
SAMBA
If you are onsite at SLAC, you can access the data via samba/cifs. Connect to zslaccfs to browse the global directory; from there you can access the cryo-EM disks under cryoem. Log in with your SLAC Windows account.
- On your Linux machine, open a terminal window.
- Install the necessary software with the command sudo apt-get install -y samba samba-common python-glade2 system-config-samba.
- Type your sudo password and hit Enter.
- Allow the installation to complete.
- Open a new file browser window.
- At the bottom of the left navigation pane, click "Other Locations"
- At the bottom of the window, in the "Connect to Server" field, Type smb://zslaccfs/
- Open the cryoem directory.
- Log in with your SLAC Windows account username and password. Leave the "Workgroup" field at its default value; on Ubuntu, set the domain to "slac".
- You can now browse the CryoEM disks in your Linux file browser.
Alternatively, you can mount it via the command line. The example below is for Ubuntu/Debian hosts:
- Install cifs-utils: sudo apt-get install cifs-utils
- Create a directory where you want to mount it: sudo mkdir /mnt/slac
- Mount the share: sudo mount -t cifs -o username=SLAC_USERNAME,vers=1.0,domain=slac,uid=`id -u` //zslaccfs.slac.stanford.edu/cryoem/ /mnt/slac
Note that the uid option is important so that you, as the user, have the correct permissions on your local desktop.
SSHFS
You can also use the FUSE-based SSHFS to mount the filesystem via ssh. It is recommended to connect to s3dfdtn.slac.stanford.edu when using this method. Please note you will need to be a registered S3DF user (see Prerequisites above) to access the S3DF DTN hosts.
Using SSHFS, the filesystem will be mounted locally so that you may browse the directory as you would with SAMBA etc.
Summary
| | Globus | SAMBA | SSHFS | RSYNC/SCP | BBCP |
|---|---|---|---|---|---|
| Software Install | Globus Connect Personal clients available (https://www.globus.org/globus-connect-personal) | Implementation already baked into macOS and Windows. Simple install for Linux via package managers | Requires FUSE library. Not available for Windows | Command-line tools usually already installed on most Linux and macOS systems. GUIs available for SCP. | Command-line tools only |
| Graphical Interface | Yes (Web/Globus Connect Personal client) | Yes (OS) | Yes (OS) | Yes (WinSCP) | No |
| Command line interface | Yes | Yes (standard OS) | Yes (standard OS) | Yes (standard OS) | Yes (standard OS) |
| Performance | Fast | Fast | Slow | Slow | Fast |
| Access | Anywhere | At SLAC Only | Anywhere | Anywhere | Anywhere |
| Credentials | Globus ID + SLAC Unix + S3DF | SLAC Windows Active Directory | SLAC Unix + S3DF | SLAC Unix + S3DF | SLAC Unix + S3DF |
| Ease of Use | Easy | Easy | Medium | Easy | Difficult |