Before You Start

NOTE: Due to limited SLAC Cryo-EM resources in S3DF, access to compute resources (CPU/GPU/project data storage) is restricted to screening (CE) and training (CT) proposals. Data collection (CA) and other proposals will need to transfer their data to another location for further processing/analysis. For further information, see: Where is my Cryo-EM Data?

Please follow the instructions below before processing data with CryoSPARC on S3DF.

  1. Request a CryoSPARC license at https://cryosparc.com/download
  2. Contact an Information Systems Specialist to create a folder /sdf/group/cryoem/g/<proposal number>, where you will store the processed data (requires permission from S2C2 management).
  3. Register as a Cryo-EM facility user in S3DF by following the procedure below:
    Submit a request to be added to the Cryo-EM facility via the S3DF Coact user portal
      i)  In a browser, navigate to the S3DF Coact user portal at https://coact.slac.stanford.edu
      ii)  Click the "Log in with S3DF (unix)" button, then enter your SLAC Unix credentials
      iii) Click on "Repos" in the menu bar, click the "Request Access to Facility" button, select "CryoEM" from the drop down, INCLUDE YOUR PROPOSAL NUMBER/PROJECT CODE IN THE NOTES FIELD, and press the "Request Facility Access" button.

    The proposal number/project code (e.g., CT### or CE###) MUST be included in the notes field of the S3DF CryoEM facility access request. S3DF registration requests without this information will be rejected, and you will have to submit a ticket via email to s3df-help@slac.stanford.edu with that information to have your request re-opened for resubmission.

  4. Register a new repo for your proposal number(s) or request membership in an existing one. You will need a proposal number for each proposal for which you will be processing data with SLAC Cryo-EM compute resources on S3DF. For instance, if you are a member of proposals CX00 and CX01, and will use S3DF compute resources for processing data in both proposals, each proposal needs its own corresponding S3DF repo for compute usage accounting. Once you have completed S3DF user registration for the CryoEM facility, log into https://coact.slac.stanford.edu and do one of the following:
    a) To request a new repo for your proposal(s)
      i)   Click on "Repos" in the menu bar
      ii)  Click the "Request New Repo" button
      iii) Enter the name of your repo (should match the proposal, e.g. "CA00"), select "CryoEM" from the drop down, and enter the SLAC unix username of the principal who will be utilizing the S3DF compute resources
    b) To request membership in an existing repo
      i)  Click on "Repos" in the menu bar
      ii) Click the "Request Repo Membership" button
      iii) Enter the existing repo name (should match the proposal number) and select "CryoEM" from the drop down menu, then submit.
  5. Go to your project folder provisioned in step (2), make a subfolder to contain symlinks to your raw data, and another folder to store your CryoSPARC database.  

    For example, in the CT00 project folder I created (with mkdir) a folder called apoferritin, then created a folder called 20221121-temalpha under the apoferritin folder. In the 20221121-temalpha folder, I placed symlinks to my raw data on the exp drive. Another folder called cryosparc was created under the CT00 folder to store data generated by running CryoSPARC:

    # change to project folder
    $ cd /sdf/group/cryoem/g/CT00
    
    # create subfolders to store CryoSPARC data
    $ mkdir -p apoferritin/20221121-temalpha
    
    # change to newly-created CryoSPARC data directory
    $ cd apoferritin/20221121-temalpha
    
    # continue to step 6
  6. Create the symbolic links. First, find the path of your experimental data directory, which should be in the following format: /sdf/group/cryoem/exp/YYYYMM/YYYYMMDD-PP###_<instrument>/<sample>/raw/<path_to_sample_directory>/... Then type in the commands below; note that echo is used to preview each command before executing it.

    If the data is in tiff format:

    # ensure that you are in your project folder as in step (5) (e.g., /sdf/group/cryoem/g/<proposal_number>/...)
    $ pwd
    
    
    # Check the output of the `find` command in your experimental data directory to get a list of all *fractions.tiff raw image files,
    # and create symbolic links to the data in your project folder
    
    # Do a dry-run of the command
    $ find [directory with data] -name '*fractions.tiff' -exec echo ln -sf \{\} . \;  
    # Execute the command
    $ find [directory with data] -name '*fractions.tiff' -exec ln -sf \{\} . \;


    If the data is in mrc format:

    # ensure that you are in your project folder as in step (5) (e.g., /sdf/group/cryoem/g/<proposal_number>/...)
    $ pwd
    
    # Check the output of the `find` command in your experimental data directory to get a list of all *Fractions.mrc raw image files, 
    # and create symbolic links to the data in your project folder
    
    # Do a dry-run of the command
    $ find [directory with data] -name '*Fractions.mrc' -exec echo ln -sf \{\} . \;
    # Execute the command
    $ find [directory with data] -name '*Fractions.mrc' -exec ln -sf \{\} . \;

    For example, 

    # Do a dry-run of the command
    $ find /sdf/group/cryoem/exp/202101/20210105-CF01_TEM2/Apoferritin/raw/ApoTest/Images-Disc1/ -name '*fractions.tiff' -exec echo ln -sf \{\} . \;
    # Execute the command
    $ find /sdf/group/cryoem/exp/202101/20210105-CF01_TEM2/Apoferritin/raw/ApoTest/Images-Disc1/ -name '*fractions.tiff' -exec ln -sf \{\} . \;
  7. Create a symbolic link to the gain reference file in the same way:

    # Do a dry-run of the command
    $ find [directory with gainref data] -name '*.mrc' -exec echo ln -sf \{\} . \;
    # Execute the command
    $ find [directory with gainref data] -name '*.mrc' -exec ln -sf \{\} . \;
  8. Now you are ready to process data with CryoSPARC!
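Before running the find + ln -sf pattern from steps (6) and (7) against real data, it can be exercised safely in a sandbox. The sketch below uses temporary mock directories as stand-ins for the experimental data directory and your project folder; none of the paths are actual S3DF locations:

```shell
# Self-contained sketch of the find + ln -sf pattern from steps (6) and (7),
# using temporary mock directories instead of the real exp/ and project paths.
src=$(mktemp -d)   # stands in for the experimental data directory
dst=$(mktemp -d)   # stands in for your project data folder
mkdir -p "$src/Images-Disc1"
touch "$src/Images-Disc1/img_0001_fractions.tiff" \
      "$src/Images-Disc1/img_0002_fractions.tiff"

cd "$dst"
# dry run: echo prints each ln command without executing it
find "$src" -name '*fractions.tiff' -exec echo ln -sf {} . \;
# real run: creates one symlink per raw image in the current directory
find "$src" -name '*fractions.tiff' -exec ln -sf {} . \;

# verify: count the symlinks that were created
nlinks=$(find . -maxdepth 1 -type l -name '*fractions.tiff' | wc -l)
echo "created $nlinks symlinks"
```

The same verification step (counting symlinks and comparing against the number of raw images) is a useful sanity check after running the real commands in your project folder.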

Accessing CryoSPARC on S3DF

Note: Once your session has started, the batch scheduler reserves your requested resources for the full number of hours requested, whether or not you are actively running jobs in your CryoSPARC session. To reduce idling of compute resources and promote fair resource allocation, request session time only when you will be actively running jobs in CryoSPARC. For example, avoid requesting a 72-hour CryoSPARC session when you will only be running jobs for a few hours in that timeframe. To enforce this policy, we will be implementing an idle timeout for S3DF CryoSPARC sessions that have not run batch jobs for a certain length of time.


  • Log in to S3DF Ondemand by clicking the "Log in with S3DF (unix)" button, then enter your SLAC Unix credentials.
  • Under "Interactive Apps", select "CryoSPARC".
  • Fill out the Ondemand session request form and click “Launch” at the bottom of the page.

    The CryoEM facility on S3DF has the following resources available as of this writing (2024-06-03):

    Partition (cluster)  Nodes  CPU model             CPU cores/node      Usable memory/node  GPU model              GPUs/node  Local scratch/node
    turing               12     Intel Xeon Gold 5118  40 (hyperthreaded)  160GB               NVIDIA GeForce 2080Ti  10         300GB
    ampere               25     Rome 7542             112 (hyperthreaded) 960GB               Tesla A100 (40GB)      4          14TB
    roma                 7      Rome 7702             120                 480GB               -                      -          300GB


    By default, S2C2 and Chiu Lab-affiliated proposals are allowed at most the equivalent of 1 node's compute resources for each cluster type above, across all of the proposal's running jobs on that cluster. For example, a given proposal may request and be allocated up to the equivalent of 1 turing node (10 GPUs / 40 CPU cores / 160GB) across all jobs run under that proposal on the turing cluster. A user may request jobs in multiple clusters, but the allocation allowed per proposal, per cluster remains the equivalent of 1 node.

    It is generally recommended to request GPUs, CPU cores, and memory in a ratio of 1 GPU : 4 CPU cores : 16GB for most jobs. Please scale your resource requests appropriately according to the type of jobs being run.
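As a quick sanity check, the 1 GPU : 4 CPU : 16GB rule of thumb can be turned into shell arithmetic; with 2 GPUs it reproduces the CPU core count and memory figure used in the example session settings that follow:

```shell
# The 1GPU:4CPU:16GB rule of thumb as shell arithmetic.
# GPUS=2 matches the example session settings (8 CPU cores, 32768MB memory).
GPUS=2
CPUS=$((GPUS * 4))            # CPU cores to request
MEM_MB=$((GPUS * 16 * 1024))  # memory to request, in MB as the form expects
echo "GPUs=${GPUS} CPUs=${CPUS} Mem=${MEM_MB}MB"
```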


    Here are some example session settings:

    Cryosparc license ID: <a valid CryoSPARC license>

    Cryosparc Version: XXXXXX (currently deployed: 4.4.1+240110)

    Account: cryoem:<your proposal number> (e.g. cryoem:CT00)

    Partition: turing (requests 2080Ti GPUs)

    Number of hours: XX (if running live then this should be the length of your session)

    Number of CPU cores: 8

    Mem: 32768MB or higher

    Number of GPUs: 2 (Don't request more than 4 GPUs)

    Advanced Settings
    Cryosparc Datadir: /sdf/group/cryoem/g/<your proposal number> (do not use your home directory folder)

  • Your session will go into the scheduling queue; it may take a while to start depending on current cluster usage and resource availability. Your session will only start when all of the following conditions are met:
    - ALL of your resource requests for this CryoSPARC session are available
    - Your job has reached the top of the scheduling queue
    Once your session is ready, you will see a “Launch Cryosparc” button on the bottom left corner of your session (you can also check the "I would like to receive an email when the session starts" option in the interactive session request form to be notified when your session is available). Click "Launch Cryosparc" to start your session.

     5. If this is your first time launching CryoSPARC, you may need to log in with your SLAC email (e.g., <your_slac_unix_username>@slac.stanford.edu), using your CryoSPARC license ID as the password.
         

Note that the "<your_slac_unix_username>@slac.stanford.edu" email address does not need to actually exist; it is merely the default username generated when a user creates a new CryoSPARC session with a new database.


     6. In the web browser, enter the following in the address bar:

  • To open the running CryoSPARC instance, enter localhost:<CRYOSPARC_BASE_PORT>  in the address bar.
    The value of CRYOSPARC_BASE_PORT can be viewed by opening a terminal in your CryoSPARC interactive session desktop and running:

    # display the contents of cryosparc/config.sh and filter for the CRYOSPARC_BASE_PORT field (default value is 39100)
    $ grep CRYOSPARC_BASE_PORT cryosparc/config.sh
    export CRYOSPARC_BASE_PORT=39100

    Then enter the value you found in the config.sh  script into the browser address bar, e.g. localhost:39100.

    Note that if other CryoSPARC users have sessions on the same host, your CRYOSPARC_BASE_PORT value may differ.
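The lookup can also be scripted so the full address is printed in one step. The sketch below parses a config.sh-style file with sed; the temporary file is a mock standing in for your session's actual cryosparc/config.sh:

```shell
# Self-contained sketch: parse CRYOSPARC_BASE_PORT out of a config.sh-style
# file and build the address to enter in the browser address bar.
# The mock file below stands in for cryosparc/config.sh.
config=$(mktemp)
echo 'export CRYOSPARC_BASE_PORT=39100' > "$config"

# strip the "export CRYOSPARC_BASE_PORT=" prefix to get the bare port number
port=$(sed -n 's/^export CRYOSPARC_BASE_PORT=//p' "$config")
url="localhost:${port}"
echo "$url"   # -> localhost:39100
```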

  • To access CryoSPARC Live, connect to your running CryoSPARC instance as above, then click the CryoSPARC Live (⚡️) "All Live Sessions" icon in the navigation bar. For more details on using CryoSPARC Live, see: https://guide.cryosparc.com/live/new-live-session-start-to-finish-guide

     7. Now you should be able to see the CryoSPARC interface and start data processing. Detailed tutorials of using CryoSPARC to process data can be found here: https://guide.cryosparc.com/processing-data/tutorials-and-case-studies 

Best Practices For Using CryoSPARC in S3DF

  1. DO request only the resources that you need. If you ask for more time, CPUs, or GPUs than your jobs can actually use, your job will take longer to start, your fairshare will be reduced so that your later jobs may be de-prioritised, and others are prevented from using your reserved idle resources (see note above).
  2. DO be respectful of other users’ jobs - you will be sharing a limited set of nodes with many other users. Please consider the type, size, and quantity of jobs that you submit so that you do not starve others of compute resources.
  3. DO NOT request more than one CryoSPARC instance at a time. Having multiple instances will likely corrupt your database. Delete any currently running CryoSPARC sessions by logging in to S3DF Ondemand with your SLAC unix credentials, clicking on "My Interactive Sessions", and deleting any "Running" Cryosparc Ondemand session cards.
  4. DO NOT store data in your home directory (e.g., /sdf/home/<username_initial>/<username/... ) which has a limited disk quota of 25GB. Use a proposal directory under group space (e.g., /sdf/group/cryoem/g/<proposal_number> ) for each proposal you will be processing. See step (2) in the "Before You Start" section above.
  5. DO NOT use proposal compute resource allocations on a different proposal. For example, if you are a member of proposals CX00 and CX01, do not process CX00 data under the cryoem:CX01 account or vice versa. Only process the data associated with the proposal specified in the CryoSPARC session's Account field.
  6. DO NOT copy your experimental data (raw images) into your proposal folder; instead, create symlinks to the experimental data in your proposal folder as described in steps (6) and (7) above. This reduces storage quota usage on the filesystem.

FAQs

  1. How many GPU cores and how much memory do I need for my CryoSPARC jobs?

We recommend a maximum of 4 GPUs. Preprocessing can use 1 or more GPUs, scaling linearly in terms of throughput. Reconstruction uses one or more GPUs for 2D classification, and one GPU for 3D refinement.

A minimum of 11GB of memory per GPU is recommended to process most types of data successfully in CryoSPARC.

Minimum GPU Memory Requirements

Preprocessing
  • Gatan K3, Gatan K3 Super Resolution, TFS Falcon 4 Images, TFS Falcon 4 EER Images: 11GB+

Reconstruction
  • 2D Classification: 4GB+
  • 3D Refinement (heavily dependent on box size): 11GB+
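For scripted sanity checks of a session request, the figures above can be encoded in a small helper function. This is a hypothetical convenience, not part of CryoSPARC or S3DF tooling:

```shell
# Hypothetical helper (not part of CryoSPARC) encoding the minimum GPU memory
# figures above: 11GB for preprocessing and 3D refinement, 4GB for 2D classification.
min_gpu_mem_gb() {
  case "$1" in
    preprocessing|3d-refinement) echo 11 ;;
    2d-classification)           echo 4  ;;
    *)                           echo "unknown job type: $1" >&2; return 1 ;;
  esac
}
min_gpu_mem_gb preprocessing       # -> 11
min_gpu_mem_gb 2d-classification   # -> 4
```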


Troubleshooting

  • Issue: "User not found!" message when logging in to CryoSPARC instance

Migrating the CryoSPARC Database

If you need to move your CryoSPARC database (e.g., you started your CryoSPARC interactive session in your SDF home directory and exceeded your disk quota) follow the procedure under the "Move the Database" section here (start from "Step Two - Move the Database"). Make sure you are NOT running a CryoSPARC session when you move the database directory.

After you've migrated your database, you'll need to specify the new database location whenever you start a new CryoSPARC Interactive Session in Ondemand:

  1. Start a new CryoSPARC Interactive Desktop session as above in "Using CryoSPARC".
  2. Scroll down and check the "Show advanced settings..." option.
  3. In the "Cryosparc Datadir" form field, change "$HOME/cryosparc" to the path where you migrated your CryoSPARC database.
  4. Click Launch to get an Interactive Desktop session with a CryoSPARC instance running from your migrated database location.


Importing CryoSPARC Projects

If you've migrated your CryoSPARC database as above, you may need to import your existing projects into CryoSPARC when running from the new database location. Follow the instructions here to import your existing projects into a new CryoSPARC instance with a migrated database.
